
Adaptive Thresholds for AMR Codec Mode Selection
Tomas Lundberg, Peter de Bruin, Stefan Bruhn, Stefan Håkansson, and Stephen Craig
Ericsson Research
Ericsson AB
Sweden
Email: {tomas.lundberg, peter.de.bruin, stefan.bruhn, stefan.lk.hakansson, stephen.craig}@ericsson.com

0-7803-8887-9/05/$20.00 (c)2005 IEEE

Abstract— The speech codecs of the Adaptive Multi-Rate (AMR) codec family enable excellent speech quality while also providing a way forward towards state-of-the-art, spectrally efficient, high-capacity cellular networks. One straightforward way to characterize the benefit of AMR speech codecs is that they increase robustness to interference and noise in radio networks. This advantage over other, non-adaptive speech codecs can be capitalized on in several ways, e.g., by enhancing speech quality or by improving system capacity. In this paper, improved mode adaptation, where codec mode switching thresholds are adaptive to radio conditions, is discussed. Example simulations show that an adaptive thresholds algorithm applied to GSM can significantly improve objective speech quality. Corresponding improvements were also found in informal listening tests.

I. INTRODUCTION

Adaptive Multi-Rate (AMR) codecs are standardized by 3GPP for GSM [1], [2], the world's most widespread cellular technology, as well as for WCDMA. In GSM, codec mode adaptation is based on estimations of the radio link quality, while in WCDMA the AMR adaptation concept works completely differently. In WCDMA, Quality of Service control, which roughly corresponds to AMR codec mode adaptation in GSM, is done by means of fast power control, whereas codec mode adaptation may be used as a tool for capacity control. This paper deals exclusively with AMR in GSM.

Two variants of AMR exist: narrowband AMR and wideband AMR (AMR-WB). Narrowband AMR consists of eight codec modes with source bit rates from 12.2 kbps down to 4.75 kbps. It provides the traditional PSTN telephony audio bandwidth of about 100–3500 Hz. AMR-WB contains nine codec modes with source bit rates from 6.6 kbps up to 23.85 kbps,¹ and has an audio bandwidth of 50–7000 Hz. The increased bandwidth significantly improves the intelligibility and naturalness of speech, and it also improves the quality of music and mixed-content material. Although this paper focuses on narrowband AMR, the same solution could also be implemented for AMR-WB.

¹ For GSM Full Rate channels, three codec modes, with source bit rates of 6.6 kbps, 8.8 kbps, and 12.65 kbps, are feasible.

Common to both AMR variants is that a number of codec modes are collected into a pre-defined Active Codec Set (ACS), which is usually fixed during a call. The level of channel coding is adjusted depending on the source bit rate so that the total (gross) bit rate is constant. Consequently, the lower the source bit rate, the more robust the codec mode is against bit errors, since a larger portion of the bit rate is used for channel error protection.

The gains from AMR are two-fold. When radio quality is sufficiently good, a higher codec mode can be used, thereby improving the perceived speech quality. On the other hand, since the lower AMR codec modes are very robust, radio network planning can allow an increased interference level in the system. This increases system capacity significantly, e.g., when narrowband AMR is compared with the common Enhanced Full Rate (EFR) codec in GSM. Thus, introducing AMR may improve speech quality, increase capacity, or yield a combination of both.

This paper proposes Adaptive Thresholds as an improvement to current AMR codec mode adaptation. Section II describes mode adaptation in more detail and motivates the need for adaptive thresholds. An algorithm is exemplified in Section III, and simulations in Section IV analyze its impact on speech quality. Finally, conclusions are outlined in Section V.

II. CODEC MODE ADAPTATION

A. General description

For codec mode adaptation, the receiving side performs radio link quality measurements of the incoming channel. These measurements yield a quality indicator (QI), which is defined as an equivalent carrier-to-interference (C/I) ratio [3]. The QI is then compared to a set of pre-defined thresholds to decide which codec mode to use. The thresholds are normally fixed during a call, but the system can initiate a change to the thresholds.

To obtain the best possible speech quality, the thresholds for codec mode adaptation must be selected properly. This requires a QI that correctly reflects the speech quality for any given radio condition, frequency hopping scheme, and network configuration, which can be quite complicated to achieve. Furthermore, conditions vary over time. There may also be performance variations between different receiver units, both in actual performance and in QI estimation. It is therefore likely that even well-selected adaptation thresholds will not be optimal at all times.

Fixed thresholds can be sub-optimal for the current conditions by being either too high or too low. If the thresholds are too high, a switch from a less robust codec mode to a more robust mode is initiated earlier than the radio conditions require. This causes a slight degradation of the speech quality, due to the lower intrinsic speech quality of the more robust codec mode. A more serious problem arises when the thresholds are too low, causing the switch from the less robust mode to occur too late. This may significantly increase both the frame and bit error rates after channel decoding, and in turn cause a severe degradation of the speech quality. Since both cases reduce speech quality, both should be avoided.

[Fig. 1: flowchart with the blocks "estimate long-term SQ", "long-term SQ outside codec limits?" (yes: "calculate threshold adjustment parameters"; no: "set threshold adjustment to 0.0"), and "return threshold adjustment".]
Fig. 1. Simplified block diagram of the adaptive thresholds algorithm. SQ denotes "speech quality".
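The following Python sketch makes the threshold comparison of Section II-A concrete. It maps a QI (in dB) to one mode of the ACS that is also used in Section IV. The threshold values and the helpers `select_mode` and `shift_thresholds` are illustrative assumptions. The stateless rule "move one step towards a less robust mode for every threshold at or below the QI" is also an assumption, and the sketch ignores hysteresis and any other details of the link adaptation specified in [3]. It shows that a QI overestimating the C/I by some offset behaves exactly like thresholds lowered by that offset. In that case the less robust mode is kept too long.

```python
from bisect import bisect_right
from typing import List, Sequence

# ACS from Section IV, ordered from most to least robust.
ACS = ["MR475", "MR59", "MR67", "MR102"]

# Hypothetical switching thresholds (C/I in dB) between adjacent modes.
THRESHOLDS_DB = [6.0, 9.0, 12.0]


def select_mode(qi_db: float, thresholds_db: Sequence[float] = THRESHOLDS_DB) -> str:
    """Return the codec mode for a QI: every threshold at or below the QI
    moves the selection one step towards a less robust mode (assumed rule)."""
    if len(thresholds_db) != len(ACS) - 1:
        raise ValueError("need one threshold between each pair of adjacent modes")
    if list(thresholds_db) != sorted(thresholds_db):
        raise ValueError("thresholds must be in ascending order")
    return ACS[bisect_right(thresholds_db, qi_db)]


def shift_thresholds(thresholds_db: Sequence[float], delta_db: float) -> List[float]:
    """Move every threshold by the same amount, which preserves their order."""
    return [t + delta_db for t in thresholds_db]


if __name__ == "__main__":
    true_ci_db = 8.0  # actual C/I of the link
    offset_db = 5.0   # C/I overestimation by the receiver
    print(select_mode(true_ci_db))              # MR59 (correct QI)
    print(select_mode(true_ci_db + offset_db))  # MR102 (biased QI, less robust)
    # A biased QI with correct thresholds equals a correct QI with lowered thresholds.
    print(select_mode(true_ci_db, shift_thresholds(THRESHOLDS_DB, -offset_db)))  # MR102
    # Raising all thresholds by the offset restores the correct selection.
    print(select_mode(true_ci_db + offset_db, shift_thresholds(THRESHOLDS_DB, offset_db)))  # MR59
```

Shifting all thresholds by one common amount is the simplest way to compensate a systematic QI bias while keeping the thresholds ordered.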
B. Adaptive thresholds for codec mode adaptation

Our proposed solution to the above-mentioned problems is to use thresholds that adapt to the current radio conditions. An algorithm to adapt thresholds could be applied either on the terminal side or on the network side, working on the uplink and/or the downlink. For downlink threshold adaptation, for example, either the receiving terminal could modify the thresholds directly, or the network could initiate the threshold adaptation in the terminal based on the measurement reports it receives. The current standard allows a "normalization factor" to be applied to normalize the QI for different receiver performances. It could be argued that terminals are not allowed to modify their own codec mode switching thresholds. Even so, a direct modification and adaptation of thresholds in the proposed manner is considered to be standard compliant.

In theory, there are several advantages to running the algorithm on the receiving side, i.e., in the network for the uplink or in the terminal for the downlink. In that case, more information about the quality of the link is available to the algorithm, which leads to faster and more accurate adaptation of the thresholds. In practice, however, operators often want to align the QI estimation, and hence the adaptation performance, of Mobile Stations (MSs) from different vendors, which is likely to vary. For downlink threshold adaptation, this requires an algorithm where the network initiates the threshold adaptation in the MS.

For such a network-based algorithm working on the downlink, the new set of thresholds could be sent to the MS using a RATSCCH (Robust AMR Traffic Synchronized Control Channel) message. A RATSCCH message steals one speech frame in AMR Full Rate and two speech frames in AMR Half Rate, effectively causing one or two erased frames. Consequently, the algorithm should not be allowed to update the thresholds too often. Simulations show, however, that this is not a problem in realistic scenarios.

III. DESCRIPTION OF THE ALGORITHM

In this paper we propose a network-based threshold adaptation algorithm for the downlink. This case is more constrained than the case where the threshold adaptation is done directly at the receiver. Its performance is therefore a lower bound for direct adaptation methods, where the adaptation is performed in the same node in which the radio link quality measurements are conducted.

The proposed algorithm estimates the speech quality on the receiving link (i.e., in the MS) and compares the estimate against given speech quality limits for each codec mode. A simplified block diagram of the algorithm can be seen in Fig. 1. The speech quality could be estimated from the Frame Erasure Rate (FER), from bit error rate measures, e.g., RxQual, or from objective speech quality measures, e.g., SQI [4]. In this particular implementation we have elected to use the FER reported by the MS in the Enhanced Measurement Report (EMR) [5]. Consequently, the algorithm requires that the MS supports EMR. EMR includes the number of correctly received frames during each measurement period of 480 ms, i.e., 24 speech frames, from which the FER can be derived.

The speech quality estimate for the receiving link is generally too noisy for direct use in the threshold adjustment decision, so a long-term average is calculated instead. The estimated long-term speech quality for a given mode may fall outside its limits, either too good or too bad. In that case, the associated threshold for switching to the appropriate adjacent codec mode is likely sub-optimal for the current radio conditions, and the algorithm modifies all codec mode switching thresholds accordingly. The reason for modifying all thresholds, instead of only the threshold in question, is mainly practical: it simplifies keeping the thresholds in a consistent order, i.e., non-overlapping [3].

It is, in fact, necessary to estimate two different long-term speech quality values: one for comparison with the upper speech quality limit and one for comparison with the lower limit. The reason is that a high FER in the lowest, most robust codec mode is obtained when the radio conditions are poor, regardless of the threshold values. Therefore, FER values obtained in the lowest codec mode are discarded when calculating the long-term speech quality used for comparison with the upper limit. The converse is true for the long-term speech quality used for comparison with the lower limit. Regardless of the threshold values, a low FER in the highest, least robust codec mode is obtained when the radio conditions are good. Consequently, FER values obtained in the highest codec mode are discarded in this case.

The calculation of the threshold adjustment is based on the two long-term speech quality estimates and on the attainable speech quality with the current ACS. Because of the averaging used to obtain the long-term speech quality estimates, several threshold adjustments usually have to be made. Thus, the threshold adaptation is much slower than the codec mode adaptation. The threshold adaptation can be thought of as a fine-tuning of the normal codec mode adaptation. The normal codec mode adaptation takes care of the time-critical adaptation in response to rapid changes in the radio environment. Concurrently, the threshold adaptation monitors the normal codec mode adaptation and compensates for long-term systematic errors.
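The loop in Fig. 1 can be sketched in Python for the FER-based variant described above. The sketch follows the text on four points:

* the FER is derived from the EMR count of correctly received frames over 24 frames;
* two separate long-term estimates are kept, one excluding the lowest mode and one excluding the highest mode;
* the adjustment is 0.0 when both estimates are within limits;
* all thresholds are shifted by the same amount.

Everything else is an assumption:

* the speech quality limits are expressed directly as long-term FER limits, with a single pair for all modes instead of per-mode limits;
* the long-term averages are exponential moving averages;
* the adjustment is a fixed step instead of a value computed from the attainable quality of the ACS;
* the estimates are reset after each adjustment.

All names and numbers (`ThresholdAdapter`, `fer_upper`, `step_up_db`, and so on) are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence

FRAMES_PER_REPORT = 24  # one EMR measurement period: 480 ms of speech frames


def fer_from_emr(correct_frames: int) -> float:
    """FER of one EMR period, derived from the number of correctly received frames."""
    return 1.0 - correct_frames / FRAMES_PER_REPORT


@dataclass
class ThresholdAdapter:
    """Hypothetical network-side adaptive thresholds loop (cf. Fig. 1)."""
    n_modes: int                 # modes in the ACS; index 0 is the most robust
    fer_upper: float = 0.02      # hypothetical upper long-term FER limit
    fer_lower: float = 0.002     # hypothetical lower long-term FER limit
    alpha: float = 0.05          # smoothing factor of the long-term averages
    step_up_db: float = 3.0      # large upward step: fast recovery
    step_down_db: float = 1.0    # small downward step: conservative
    min_reports: int = 10        # reports required before any decision
    fer_hi: Optional[float] = None  # long-term FER, lowest mode excluded
    fer_lo: Optional[float] = None  # long-term FER, highest mode excluded
    n_hi: int = 0
    n_lo: int = 0

    def _smooth(self, avg: Optional[float], x: float) -> float:
        return x if avg is None else (1.0 - self.alpha) * avg + self.alpha * x

    def update(self, mode: int, correct_frames: int) -> float:
        """Process one EMR report and return the threshold adjustment in dB."""
        fer = fer_from_emr(correct_frames)
        # A high FER in the most robust mode says nothing about the thresholds.
        if mode != 0:
            self.fer_hi = self._smooth(self.fer_hi, fer)
            self.n_hi += 1
        # A low FER in the least robust mode says nothing about the thresholds.
        if mode != self.n_modes - 1:
            self.fer_lo = self._smooth(self.fer_lo, fer)
            self.n_lo += 1

        adjustment = 0.0  # "set threshold adjustment to 0.0" branch of Fig. 1
        if (self.fer_hi is not None and self.n_hi >= self.min_reports
                and self.fer_hi > self.fer_upper):
            adjustment = self.step_up_db     # thresholds too low: raise them
        elif (self.fer_lo is not None and self.n_lo >= self.min_reports
                and self.fer_lo < self.fer_lower):
            adjustment = -self.step_down_db  # thresholds too high: lower them

        if adjustment != 0.0:
            # Start new long-term estimates under the new thresholds.
            self.fer_hi = self.fer_lo = None
            self.n_hi = self.n_lo = 0
        return adjustment


def apply_adjustment(thresholds_db: Sequence[float], adjustment_db: float) -> List[float]:
    """Shift all switching thresholds together so that they stay ordered."""
    return [t + adjustment_db for t in thresholds_db]
```

The downward step is smaller than the upward step. This mirrors the asymmetry discussed in Section IV, where too low thresholds are the more serious error and downward adjustments must be conservative. The minimum number of reports before a decision is one simple, assumed way to limit how often a RATSCCH message has to steal speech frames.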
IV. SIMULATION RESULTS

Simulations were performed in an AMR link simulation tool. The setup used Typical Urban 3 km/h (TU3) as the fast fading profile, additional slow fading (shadowing), and different frequency hopping scenarios. An ACS with the modes MR475, MR59, MR67, and MR102 was used to evaluate the performance gain, with a simulated receiver that had incorrect C/I estimation. Incorrect C/I estimation could, for instance, be caused by unusual radio environments, broken antennas, or a stationary MS in a non-frequency-hopping environment. Moreover, the standard does not describe the implementation in detail but merely lists performance requirements. Inconsistencies in the C/I estimation could therefore also arise from different, though standard-compliant, receiver and C/I estimator implementations.

For simplicity and ease of interpretation, a fixed, systematic error was studied. Note, however, that the algorithm is not limited to this kind of error. The incorrect C/I estimations were modeled by adding a positive or negative offset to the C/I estimates before the codec mode selection. The offset was kept fixed during a given simulation run and varied between runs. A positive C/I offset (i.e., an overestimation of the C/I) corresponds to setting the thresholds too low. A negative offset (i.e., an underestimation of the C/I) corresponds to setting the thresholds too high. As previously discussed, thresholds that are too low are a more serious problem than thresholds that are too high. Consequently, more emphasis was put on evaluating the former case.

The simulations were run for 22,000 speech frames (440 seconds). The speech quality was evaluated using PESQ, a tool for calculating objective speech quality [6], as well as informal listening tests. FER statistics and codec mode usage were also recorded and used in the evaluation.

A. Overestimated C/I

The speech quality evaluation results for TU3 with Ideal Frequency Hopping (IFH) are shown in Fig. 2. Here "correct" means correct C/I estimation, "+2 dB" means an overestimation of 2 dB, and so on. As the figure shows, the adaptive thresholds algorithm maintains the speech quality at approximately the same level as without any offset, even at the higher C/I estimation error levels.

[Fig. 2: speech quality (PESQ-LQO, axis 3.3 to 3.8) without and with adaptive thresholds for the conditions correct, +2 dB, +5 dB, and +8 dB.]
Fig. 2. Speech quality, according to [6], averaged over the whole simulation for TU3IFH and four different C/I overestimation levels.

Fig. 3 shows a detailed plot of the speech quality for each 8-second segment for the case of a 5 dB overestimation. For each segment, the speech quality is given as the difference relative to the corresponding segment in a simulation with correct C/I estimation. The speech quality for the first segment (given by the first two bars from the left) is considerably lower than in a simulation using correct C/I estimation. This is caused by not selecting a sufficiently robust codec mode, due to the overestimation of the C/I. At this initial stage the threshold adaptation is not yet in steady state. The first threshold adjustment occurred after approximately 5 seconds, and after that the speech quality, on average, improved considerably. For a few segments, the speech quality without adaptive thresholds is higher than with adaptive thresholds. This, and the fluctuations in speech quality in general, have a simple explanation. When codec mode switching thresholds are determined, the optimal thresholds are those that are optimal over the long term, i.e., on the order of minutes. The 8-second segments used in Fig. 3 are so short that random fluctuations in the radio environment can make a codec mode other than the globally optimal one give better speech quality for that particular segment. The aggregated speech quality, e.g., as shown in Fig. 2, is obtained by averaging the segments over time.

The corresponding view in the FER domain is shown in Fig. 4. Here the accumulated number of frame erasures is plotted against time, with and without adaptive thresholds, together with the location and size of the threshold adjustments. The first threshold adjustment, in which the thresholds were raised by 3 dB, came after only 4.8 seconds. A second adjustment of 2.5 dB occurred after a little less than two minutes. In this particular case, the algorithm slightly overcompensated the C/I overestimation of 5 dB, since the thresholds were adjusted by 5.5 dB in total. This was due to a deliberate design choice. Making rather large threshold adjustment steps to speed up the adaptation was deemed more important than avoiding the risk of overcompensating the C/I estimation bias. Such an
approach also reduces the number of threshold adjustments that have to be made, which means less signaling. The potential cost is slightly suboptimal thresholds in some situations. It is unrealistic to expect the algorithm always to compensate exactly for an incorrect C/I estimation. Depending on the algorithm parameter settings, however, the residual error almost always stays within ±1 dB.

[Fig. 3: speech quality difference (∆PESQ-LQO, axis -1.4 to 0.8) per 8-second segment versus time (s, 0 to 300) for +5 dB without and with adaptive thresholds.]
Fig. 3. Speech quality for each 8-second segment for the case of a 5 dB overestimation. The speech quality is given as the difference relative to a simulation with correct C/I estimation. Only the first 300 seconds are shown.

[Fig. 4: accumulated number of frame erasures (FE, axis 0 to 600) versus time (s, 0 to 450) for correct C/I estimation and for +5 dB without and with adaptive thresholds; arrows mark threshold adjustments of +3.0 dB and +2.5 dB.]
Fig. 4. The accumulated number of frame erasures (FE) for an overestimation of 5 dB, with and without adaptive thresholds, compared to a simulation with correct C/I estimation. The arrows show the location and the size of the threshold adjustments.

B. Underestimated C/I

The results from simulations with an underestimation of the C/I are more subtle. This is largely because the difference in speech quality in this case comes almost exclusively from selecting an unnecessarily robust codec mode, which gives an FER level that is too low.² In other words, the difference in speech quality in this case is mainly due to the difference in intrinsic speech quality between the codec modes. In the case of C/I overestimation, by contrast, the differences in speech quality caused by not selecting sufficiently robust codec modes are much more pronounced. Also unlike the overestimation case, the algorithm must not be too aggressive in its threshold adjustment here, since adjusting the thresholds downwards too much must be avoided at all costs.

Fig. 5 shows the speech quality averaged over the whole speech database for simulations with different levels of C/I underestimation. Using adaptive thresholds improves the speech quality, although not as drastically as in the case of overestimating the C/I.

[Fig. 5: speech quality (PESQ-LQO, axis 3.3 to 3.8) without and with adaptive thresholds for the conditions correct, -2 dB, -5 dB, and -8 dB.]
Fig. 5. Speech quality, according to [6], averaged over the whole simulation for TU3IFH and four different C/I underestimation error levels.

The reason for the improvement in speech quality becomes clear from the relative usage of the different codec modes in Fig. 6. With adaptive thresholds, the usage of the highest mode approaches the level obtained with a correct C/I estimation, and the usage of the lowest mode decreases. Despite this clear improvement in codec mode usage, adaptive thresholds do not quite reach the optimal codec mode distribution obtained with a correct C/I estimation. This is why the average speech quality with adaptive thresholds in Fig. 5 does not reach the level obtained with a correct C/I estimation. The higher relative usage of the MR475 codec mode, with its lower intrinsic speech quality, lowers the overall speech quality compared to the case with correct C/I estimation. However, as discussed previously, the adaptive thresholds algorithm must be rather conservative in its adjustments when compensating for underestimations of the C/I. The FER level after threshold adjustments should never be allowed to exceed the FER level obtained with correct C/I estimations, since that would lead to significant reductions in speech quality.

² The optimal speech quality is obtained at a certain FER, which in most cases is not the lowest attainable FER level. In fact, the lowest FER level is obtained by using only the most robust codec mode, but that does not generally give the optimal speech quality.

C. Listening tests

To verify the promising simulated speech quality results, informal listening tests were also performed. For all simulated
cases, for overestimated as well as underestimated C/I measurements, the speech quality improvements were confirmed. Compared to the case without adaptive thresholds, the relative improvements could be readily perceived.

[Fig. 6: relative codec usage (%, axis 0 to 70) for MR475, MR59, MR67, and MR102 with correct C/I estimation and with -5 dB without and with adaptive thresholds.]
Fig. 6. The relative codec mode usage for the simulation with a 5 dB underestimation, with and without adaptive thresholds, compared to the relative codec mode usage for the case with correct C/I estimation. In all three cases the statistics are collected starting at the speech frame corresponding to the last threshold adjustment in the simulation with adaptive thresholds.

V. CONCLUSIONS

Codec mode switching thresholds that are adaptive to radio conditions, referred to as Adaptive Thresholds, have been proposed as a means to further improve AMR mode adaptation. Simulations have shown that an adaptive thresholds algorithm applied to AMR in GSM can significantly improve objective speech quality. The improvements have also been confirmed in informal listening tests.

REFERENCES
[1] 3GPP TS 26.071, "AMR speech codec; general description."
[2] 3GPP TS 26.171, "Wideband AMR speech codec; general description."
[3] 3GPP TS 45.009, "Link adaptation."
[4] S. Wänstedt, J. Pettersson, X. Tan, and G. Heikkilä, "Development of an objective speech quality measurement model for the AMR codec," MESAQIN, 2002.
[5] 3GPP TS 45.008, "Radio subsystem link control."
[6] ITU-T P.862, "Perceptual evaluation of speech quality (PESQ)"; ITU-T P.862.1, "Mapping function for transforming P.862 raw result scores to MOS-LQO."
