
TECHNICAL COMMUNICATION No. TC1437 Ed. 01

OmniPCX Enterprise Nb of pages: 8 Date: 15 February 2011

URGENT

NOT URGENT TEMPORARY PERMANENT

SUBJECT: LOSS OF SIGNALING LINK ON IP DEVICES AND INFORMATION ABOUT "UDP-LOST" AND "UDP-KEEP ALIVE" TIMERS FROM QoS CATEGORIES

CONTENTS

1. PURPOSE OF DOCUMENT

2. HOW DOES THE SYSTEM MONITOR THE QUALITY OF SIGNALING LINKS OF IP DEVICES?

3. HOW TO ADAPT THE CONFIGURATION TO THE BEHAVIOR OF THE DATA NETWORK?

4. SOME CONFIGURATION RECOMMENDATIONS

5. SOME COMMON QUESTIONS
5.1 Why is a "UDP-Keep-Alive" message sent every 15 seconds?
5.2 Why do some IP devices not reboot after a short network interruption?
5.3 What is the "UDP Lost Reinit" delay?
5.4 Why do IP devices reboot after detecting the loss of the signaling link?
5.5 Can the "UDP-Lost" timer be increased to 255?

6. PREVENTIVE MAINTENANCE

OmniPCX Enterprise
LOSS OF SIGNALING LINK ON IP DEVICES
AND INFORMATION ABOUT "UDP-LOST"
AND "UDP-KEEP ALIVE" TIMERS FROM QoS
CATEGORIES

1. PURPOSE OF DOCUMENT
The purpose of this document is to provide configuration recommendations for IP devices such as
GD, INTIP, IP Phones and PCS that are connected to the call server over unstable IP links of the client
network.
Instability of the IP network shows up as a loss of the signaling link with the call server,
followed by a reset of the GD, INTIP and IP Phones.
If customers complain, assess the quality of those links and, if necessary, adapt
the system configuration.
This document does not address the case of call server duplication.

2. HOW DOES THE SYSTEM MONITOR THE QUALITY OF SIGNALING LINKS OF IP DEVICES?
When an IP device (GD, INTIP or PCS) is idle, the call server sends it a "UDP-Keep-Alive" message
every 15 seconds and expects a "Keep-Alive-ACK" response within 200 ms. For their part,
the IP devices send a "UDP-Keep-Alive" message to the call server and expect a "Keep-Alive-ACK"
response within 200 ms.
Special case for IP Phones: they do not send "Keep-Alive" messages to the call server but expect to
receive a "Keep-Alive" every 15 seconds.
When an IP device is in traffic, the data messages (signaling messages) exchanged with the call server
serve the same function as the "Keep-Alive". For each data message sent, a "Data-ACK" message is
expected within 200 ms. This explains why there is no "Keep-Alive" message when there is signaling
traffic on the IP link.
When an IP device is in conversation, the "Keep-Alive" exchange is active because no signaling
message is being exchanged.
If a "Keep-Alive-ACK" or "Data-ACK" acknowledgment is not received within 200 ms, the
sender of the "Keep-Alive" or data message retransmits the same message every second until the
"UDP-Lost" timer expires (7 seconds by default). This mechanism is used by both the call server
and the IP devices (GD, INTIP and PCS).
With the default value, the initial message is retransmitted 7 times before the signaling link is
considered interrupted. The call server then raises one of the following incidents:
0379 = Inter Link Crystal HS: 23 (19.1), 10.17.4.40, 00:80:9f:8d:00:3e
for the loss of the GD or INTIP.
0386 = Sig. My Computer IP: No response from the station 00:80:9f:56:05:f3, 34012
for the loss of IP Phones.
0430 = 10.17.1.1 PCS is inactive
for the loss of the PCS.
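
The retransmission rule described above can be sketched as a small model. This is an illustration only, not the call server or coupler implementation: the function name `send_with_retransmission` and the `ack_arrives` callback are hypothetical, and the sketch assumes exactly one retransmission per second until the "UDP-Lost" timer expires, as stated in the text.

```python
from typing import Callable, Tuple

RETRANSMIT_PERIOD_S = 1   # an unacknowledged message is resent every second
UDP_LOST_DEFAULT_S = 7    # default "UDP-Lost" timer of a QoS category


def send_with_retransmission(
    ack_arrives: Callable[[int], bool],
    udp_lost_s: int = UDP_LOST_DEFAULT_S,
) -> Tuple[bool, int]:
    """Model one Keep-Alive or data message and its retransmissions.

    ack_arrives(attempt) is a hypothetical callback that returns True when the
    ACK for that attempt comes back within 200 ms. Attempt 0 is the initial
    message; attempts 1..N are the retransmissions.

    Returns (link_up, retransmissions_used).
    """
    max_retransmissions = udp_lost_s // RETRANSMIT_PERIOD_S
    for attempt in range(max_retransmissions + 1):
        if ack_arrives(attempt):
            return True, attempt
    # Every attempt went unacknowledged: the link is considered interrupted.
    return False, max_retransmissions


# Link completely cut: all retransmissions fail and the link is declared down.
print(send_with_retransmission(lambda attempt: False))
# Link back after 2 seconds: the second retransmission is acknowledged.
print(send_with_retransmission(lambda attempt: attempt >= 2))
```

With a "UDP-Lost" timer raised to 35 seconds, the same model allows 35 retransmissions before the link is declared down.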

Ed. 01 / 15 February 2011 3 TC1437



If they are not rescued by a PCS or by backup signaling, the GD and INTIP reboot in a loop until the link
with the call server is restored. They then send incident 5857 to the call server, giving the reason for
the reboot: "Hardware reset on loss of IP DL" (DL = Data Link).
5857 = GD / GA / INTIP / RGD: Reason of reboot 2 for GD, GD3, INTIP3
5857 = GD / GA / INTIP / RGD: Reason of reboot 5 for INTIP2
The example given above corresponds to the default timer values defined in the IP "QoS" category:
− UDP Lost = 7s
− UDP Lost Reinit = 7s
− UDP-Keep-Alive = 15s
The following sections explain why and how these values can be changed depending on the behavior of the
client network.

3. HOW TO ADAPT THE CONFIGURATION TO THE BEHAVIOR OF THE DATA NETWORK?
Two types of network interruption are generally observed on client data networks.
1 Long cuts.
They are usually due to work on the data network or to exceptional incidents. In this case, it is not
necessary to change the default timer values in the QoS categories. When the signaling link is
lost, IP devices reboot repeatedly until they recover the link with the call server. IP devices
secured by a PCS or by backup signaling are rescued after 7 seconds at the earliest (UDP-Lost
timer, 7s) and after 22 seconds at the latest (UDP-Lost 7s + Keep-Alive 15s).
2 Short cuts.
The notion of "short cuts" is relative because it depends on the quality of the customer data
network. Instabilities are encountered mainly on links with remote sites (sometimes WANs), on
which the cuts last longer than the 7 seconds (UDP-Lost timer) defined by default in the QoS
category assigned to GD / INTIP and to IP domains. It is therefore necessary to adapt the UDP-Lost
timer to the behavior of these links.
The duration of the cuts must be estimated. The cuts are usually chronic and easy to detect by examining
incidents 5857 with reboot reason 2 for GD, GD3, INTIP3 and reboot reason 5 for
INTIP2. The UDP-Lost timer of 7 seconds was set by Alcatel-Lucent, which considers that an
interruption of more than 7 seconds is exceptional on a network that supports VoIP. In practice, however,
this timer sometimes has to be increased to avoid repeated reboots of IP devices or
repeated switchovers to the PCS or to the backup signaling link, which may be more disruptive for the end
user than repeated short cuts on an unstable network.


4. SOME CONFIGURATION RECOMMENDATIONS


Take the example of a remote installation behind an ADSL link, equipped with a GD, IP Phones and
a PCS, where the frequent interruptions of the link are estimated at 30 seconds.
Only the "UDP-Lost" timer of the QoS category assigned to the IP
domain and to the GD is changed.
The "UDP-Keep-Alive" timer should remain at 15s and "UDP Lost Reinit" should remain at 7s.
IP Phones and the PCS use the QoS category assigned to the IP domain of the remote site.
The GD uses the QoS category defined in its Ethernet parameters (mgr / Shelf / Board /
Ethernet parameters).
For the PCS to be able to accept connection requests from the GD and IP Phones, it must
switch to the ACTIVE state before the GD and IP Phones detect the loss of the signaling link.
For this reason, the IP address of the PCS must be excluded from the address ranges of the IP domain of
the GD and IP Phones. The PCS then falls into the default IP domain "0" and takes the QoS category
"0" assigned to that domain. This is why the default values of QoS
category "0" must never be changed.
16 QoS categories are available in the system, so one of them can be selected and its
"UDP-Lost" timer changed. In our example, the timer may be increased from 7 to 35 seconds before
the category is assigned to the IP domain and to the GD. Each QoS category can be given a name to identify it,
for example "QoS for ADSL unstable connections". To bring the new QoS category into effect, reset the
GD and IP Phones and run a pcscopy.
When the link interruption is longer than 50 seconds, the PCS switches to ACTIVE mode between
7s and 22s (22 = 7 UDP-Lost + 15 UDP-Keep-Alive), then the GD and IP Phones connect to the PCS between
35s and 50s (50 = 35 + 15).
When the link interruption lasts between 7 and 35 seconds, the PCS switches to ACTIVE mode
between 7s and 22s (22 = 7 + 15), but since no IP device connects to it, the PCS does not reboot when
the link is restored. It may nevertheless be useful to monitor incidents 0430 = PCS is disabled and
0429 = PCS is activated to assess the instability of the IP link.
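
The timing arithmetic of this example can be checked with a short sketch. It is a simplified model, not a description of the actual PCS logic: the function names are hypothetical, the "maybe" result for outages that fall inside a detection window is an interpretation of the earliest/latest bounds given above, and the model ignores devices that are in traffic when the cut begins.

```python
from typing import Dict, Tuple

KEEP_ALIVE_S = 15  # "UDP-Keep-Alive" period, left at its default value


def detection_window(udp_lost_s: int, keep_alive_s: int = KEEP_ALIVE_S) -> Tuple[int, int]:
    """Earliest and latest time (seconds after the cut) at which loss is declared."""
    return udp_lost_s, udp_lost_s + keep_alive_s


def likelihood(outage_s: float, window: Tuple[int, int]) -> str:
    """Classify an outage against a detection window: 'no', 'maybe' or 'yes'."""
    earliest, latest = window
    if outage_s < earliest:
        return "no"
    if outage_s < latest:
        return "maybe"
    return "yes"


def describe_outage(
    outage_s: float,
    pcs_udp_lost_s: int = 7,      # QoS category "0", used by the PCS
    device_udp_lost_s: int = 35,  # modified QoS category of the GD and IP Phones
) -> Dict[str, str]:
    """Summarize what an outage of the given length triggers in the example."""
    return {
        "pcs_active": likelihood(outage_s, detection_window(pcs_udp_lost_s)),
        "devices_connect_to_pcs": likelihood(outage_s, detection_window(device_udp_lost_s)),
    }


for outage in (5, 30, 40, 60):
    print(outage, describe_outage(outage))
```

For the 30-second cuts of the example, the model reports the PCS as active while the GD and IP Phones stay on the call server, which is the intended effect of raising "UDP-Lost" to 35 seconds.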
In summary
− Keep the default timer values of QoS category "0":
• UDP Lost = 7s
• UDP Lost Reinit = 7s
• UDP-Keep-Alive = 15s
− Keep the IP address of the PCS outside the address ranges of the IP domain of the GD and IP Phones.
− Change the "UDP Lost" timer of another QoS category, then give it an identifying name.
− Assign the modified QoS category to the IP domain and to the Ethernet parameters of the GD, based on the
quality of the link between the call server and the remote site.
− Validate the new QoS category by resetting the GD and IP Phones, then run a pcscopy.


5. SOME COMMON QUESTIONS

5.1 Why is a "UDP-Keep-Alive" message sent every 15 seconds?

The "Keep-Alive" timer was set to three seconds up to OmniPCX Enterprise Release 6.0. It was then
increased to 15 seconds by default because the number of users and IP Media Gateways
allowed on the system had increased, which made it necessary to limit control messages on the
customer's data network. This delay should not be changed.

5.2 Why do some IP devices not reboot after a short network interruption?
During short breaks in the network, some IP devices can be lost by the system while others are not. It
depends on where each device's "UDP-Keep-Alive" falls relative to the
outage. This is a further reason to increase the UDP-Lost timer where short network interruptions occur.
Network breaks shorter than 22 seconds (15 seconds of "Keep-Alive" + 7 seconds of "UDP-Lost") are not
necessarily detected.
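
The dependence on the position of the "Keep-Alive" can be illustrated with a simplified model. It assumes idle devices, and it assumes that a device is declared lost only when a "Keep-Alive" and all of its retransmissions fall inside the outage. The device names, offsets and the function `devices_declared_lost` are hypothetical.

```python
from typing import Dict, List


def devices_declared_lost(
    next_keepalive_offset_s: Dict[str, float],
    outage_s: float,
    udp_lost_s: int = 7,
) -> List[str]:
    """Return the devices whose link is declared lost during the outage.

    next_keepalive_offset_s maps a device name to the delay, in seconds,
    between the start of the outage and that device's next Keep-Alive
    (between 0 and 15 with the default period).
    """
    lost = []
    for name, offset in sorted(next_keepalive_offset_s.items()):
        # The Keep-Alive is sent at `offset` and retried until `offset + udp_lost_s`.
        # If the network is back before then, a retransmission is acknowledged.
        if offset + udp_lost_s <= outage_s:
            lost.append(name)
    return lost


# Illustrative offsets for three idle devices hit by the same 10-second cut.
offsets = {"phone-a": 1, "phone-b": 6, "phone-c": 12}
print(devices_declared_lost(offsets, outage_s=10))
print(devices_declared_lost(offsets, outage_s=10, udp_lost_s=35))
```

In this model only the device whose "Keep-Alive" fell early in the cut is lost with the default timer, and none is lost once "UDP-Lost" exceeds the duration of the cut.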

5.3 What is the "UDP Lost Reinit" delay?

The "UDP Lost Reinit" timer is a legacy timer that is no longer used. It allowed IP devices (IP Phones
and INTIPB) linked to a physical INTIPA to wait for a possible takeover by a second INTIPA
board.
This delay is no longer used because the signaling link is now handled by fictive INTIPA processes in the
call server. If the process of one fictive INTIPA fails (backtrace), it is immediately replaced by the
process of the second fictive INTIPA.
The "UDP Lost Reinit" timer should not be changed.

5.4 Why do IP devices reboot after detecting the loss of the signaling link?
Alcatel-Lucent considers that a network supporting VoIP should be stable. As a defensive measure, however,
IP devices are designed to reboot when they lose the IP signaling link with the call server. For
this reason it is necessary to analyze the 5857 incidents, which give the reason for the restart of the GD /
INTIP: "Hardware reset on loss of DL-IP" (DL = Data Link).
5857 = GD / GA / INTIP / RGD: Reason of reboot 2 for GD, GD3, INTIP3
5857 = GD / GA / INTIP / RGD: Reason of reboot 5 for INTIP2

5.5 Can the "UDP-Lost" timer be increased to 255?

This is not recommended because it can cause side effects such as buffer overflows. For values above 60
seconds, contact technical support.
When the "UDP Lost" timer is increased to keep equipment from rebooting, the client must be
aware that users may complain about interruptions during audio conversations or about terminals that are
blocked for a few seconds.


6. PREVENTIVE MAINTENANCE
Statistics tool of signaling link.
It is possible to observe the behavior of the signaling link. The statistics are held in memory and are
saved to the monitor.log file in /mnt/flash/info when the GD reboots. They
are also available in RAM while the coupler is running, and give an overview of the
health of the signaling link.
− To collect statistics in real time from live memory, proceed as follows.
• For GD / GA, GD2 / GA2:
  Connect with telnet then monitor
  board 0 0
  Other 0 0
  sig
  dump
• For GD3 / GA3, INTIP3:
  Connect with telnet then monitor
  system
  dl
  dump
• For INTIP / INTIP2:
  Do a cpl_online <crystal> <coupler>
  remoteip
  dump
− To collect the statistics recorded during a previous GD reboot, open the monitor.log files
stored in /mnt/flash/info.
Interpretation
• From A to B, the last 20 events that occurred on the signaling link are recorded.
• At 10:20 am, as there is traffic, no "Keep-Alive" messages are visible, only data
messages. Some data messages were not acknowledged, on the call
server side as well as on the GD side.
• At 10:21 am (C), the GD no longer receives any data ACK and, after 7 consecutive attempts to
send the same message, it reboots when the UDP-Lost timer (7s) expires.
The "DL Retrans profile" table in the dump below shows that, since the last restart of the GD:
• 579 times, the GD had to retransmit a message once because it was not acknowledged.
• 6 times, the GD had to retransmit the same message twice because it was not acknowledged.
• 5 times, the GD had to retransmit the same message 5 times because it was not acknowledged.
• After 7 consecutive failures, the GD reboots.


Dump DL incident:
02/12 10:19:40 - LOCAL nack 1 times seq num 16146 <---------------------- A
02/12 10:19:48 - DISTANT nack 1 times seq num 9259
02/12 10:19:48 - LOCAL nack 1 times seq num 16237
02/12 10:20:18 - LOCAL nack 1 times seq num 16248
02/12 10:20:36 - Msg lost
02/12 10:20:36 - DISTANT nack 1 times seq num 9320
02/12 10:20:39 - DISTANT nack 1 times seq num 9350
02/12 10:20:42 - DISTANT nack 1 times seq num 9352
02/12 10:20:56 - LOCAL nack 1 times seq num 16280
02/12 10:20:57 - DISTANT nack 1 times seq num 9355
02/12 10:21:15 - Msg lost
02/12 10:21:15 - DISTANT nack 1 times seq num 9360
02/12 10:21:21 - Msg lost <----------------------------------------- C
02/12 10:21:22 - Msg lost
02/12 10:21:23 - Msg lost
02/12 10:21:24 - Msg lost
02/12 10:21:25 - Msg lost
02/12 10:21:26 - Msg lost
02/12 10:21:27 - Msg lost <----------------------------------------- B
02/12 10:21:27 - State of the link changed to LINK_DOWN
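
An event dump of this kind can be scanned automatically for runs of consecutive "Msg lost" events. The sketch below is a hypothetical helper, not part of the GD tooling. It assumes the line layout shown above (date, time, " - ", event text, optional arrow marker) and uses the last lines of the dump as sample input.

```python
import re

# Date, time, " - ", event text, then an optional "<----- X" marker.
EVENT_RE = re.compile(r"^(\d{2}/\d{2}) (\d{2}:\d{2}:\d{2}) - (.+?)\s*(?:<-+.*)?$")

SAMPLE_DUMP = """\
02/12 10:21:15 - Msg lost
02/12 10:21:15 - DISTANT nack 1 times seq num 9360
02/12 10:21:21 - Msg lost <----------------------------------------- C
02/12 10:21:22 - Msg lost
02/12 10:21:23 - Msg lost
02/12 10:21:24 - Msg lost
02/12 10:21:25 - Msg lost
02/12 10:21:26 - Msg lost
02/12 10:21:27 - Msg lost <----------------------------------------- B
02/12 10:21:27 - State of the link changed to LINK_DOWN
"""


def longest_msg_lost_run(dump_text: str) -> int:
    """Return the length of the longest run of consecutive "Msg lost" events."""
    longest = current = 0
    for line in dump_text.splitlines():
        match = EVENT_RE.match(line.strip())
        if match is None:
            continue  # not an event line
        if match.group(3) == "Msg lost":
            current += 1
            longest = max(longest, current)
        else:
            current = 0  # any other event breaks the run
    return longest


print(longest_msg_lost_run(SAMPLE_DUMP))
```

A run equal to the "UDP-Lost" value (7 here) corresponds to the link being declared down. Shorter runs that recur often suggest cuts close to the current timer value.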

Dump of DL Stat :
000000AA-08BB676F: - Rx num = 26282 (Data 17510 + Ack 8772)
000000AB-08BB676F: - Rx Nack num = 698
000000AC-08BB676F: - Tx num = 9365
000000AD-08BB676F: - Tx Nack num = 1188
000000AE-08BB676F: - Tx Rp num = 0
000000AF-08BB676F:

000000B0-08BB676F: DL Retrans profile :


000000B1-08BB676F: | Nb retrans | Counter |
000000B2-08BB676F: | 1 | 579 |
000000B3-08BB676F: | 2 | 6 |
000000B4-08BB676F: | 3 | 0 |
000000B5-08BB676F: | 4 | 0 |
000000B6-08BB676F: | 5 | 5 |
000000B7-08BB676F: | 6 | 0 |
000000B8-08BB676F: | 7 | 1 |
000000BB-08BB676F: Mngt_Sig (callback_process_link_indic) DATA LINK 0 is DOWN
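
The retransmission profile can also be extracted from a saved dump for trend monitoring. The following sketch is a hypothetical helper that assumes the table layout shown above. The `near_misses` function and its `margin_s` parameter are illustrative assumptions: they count the messages that needed almost as many retransmissions as the "UDP-Lost" timer allows, on the basis of one retransmission per second.

```python
import re
from typing import Dict

# A table row: "| <number of retransmissions> | <counter> |"
ROW_RE = re.compile(r"\|\s*(\d+)\s*\|\s*(\d+)\s*\|")

SAMPLE_PROFILE = """\
000000B1-08BB676F: | Nb retrans | Counter |
000000B2-08BB676F: | 1 | 579 |
000000B3-08BB676F: | 2 | 6 |
000000B4-08BB676F: | 3 | 0 |
000000B5-08BB676F: | 4 | 0 |
000000B6-08BB676F: | 5 | 5 |
000000B7-08BB676F: | 6 | 0 |
000000B8-08BB676F: | 7 | 1 |
"""


def parse_retrans_profile(dump_text: str) -> Dict[int, int]:
    """Map each retransmission count to the number of times it occurred."""
    profile = {}
    for line in dump_text.splitlines():
        match = ROW_RE.search(line)
        if match:  # the header row has no digits and is skipped
            profile[int(match.group(1))] = int(match.group(2))
    return profile


def near_misses(profile: Dict[int, int], udp_lost_s: int = 7, margin_s: int = 2) -> int:
    """Count messages acknowledged only within margin_s retransmissions of the limit."""
    return sum(
        count
        for retrans, count in profile.items()
        if udp_lost_s - margin_s <= retrans < udp_lost_s
    )


profile = parse_retrans_profile(SAMPLE_PROFILE)
print(profile)
print(near_misses(profile))
```

A non-zero count of near misses, as with the five messages retransmitted 5 times in this dump, indicates cuts that came close to the current "UDP-Lost" value.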
