Mode Selection
Tomas Lundberg, Peter de Bruin, Stefan Bruhn, Stefan Håkansson, and Stephen Craig
Ericsson Research
Ericsson AB
Sweden
Email: {tomas.lundberg, peter.de.bruin, stefan.bruhn, stefan.lk.hakansson, stephen.craig}@ericsson.com
Abstract— The speech codecs from the Adaptive Multi-Rate (AMR) codec family enable provisioning of excellent speech quality, at the same time providing a way forward towards state-of-the-art, spectrally efficient, high capacity cellular networks. One straightforward way to characterize the benefit of AMR speech codecs is that the robustness to interference and noise in radio networks is increased and that this advantage over other, non-adaptive, speech codecs can be capitalized on in several different ways, e.g., by enhancing speech quality or improving system capacity. In this paper, improved mode adaptation, where codec mode switching thresholds are adaptive to radio conditions, is discussed. Example simulations show that an adaptive thresholds algorithm applied to GSM can significantly improve objective speech quality. Corresponding improvements were also found in informal listening tests.
I. INTRODUCTION
Adaptive Multi-Rate (AMR) codecs are standardized by 3GPP for GSM [1], [2], the world’s most widespread cellular technology, as well as for WCDMA. In GSM the codec mode adaptation is based on estimations of the radio link quality, while in WCDMA the AMR adaptation concept works completely differently. In WCDMA, the Quality of Service control that basically corresponds to AMR codec mode adaptation in GSM is done by means of fast power control, whereas codec mode adaptation may be used as a tool for capacity control. In this paper we will exclusively deal with AMR in GSM.
Two variants of AMR exist, narrowband AMR and wideband AMR (AMR-WB). Narrowband AMR consists of eight codec modes with different source bit rates, from 12.2 kbps down to 4.75 kbps. It provides the traditional audio bandwidth of PSTN telephony of about 100–3500 Hz. AMR-WB contains nine different codec modes with source bit rates from 6.6 kbps up to 23.85 kbps,¹ and with an audio bandwidth of 50–7000 Hz. The increased bandwidth improves the intelligibility and naturalness of speech significantly at the same time as the quality for music and mixed content material is improved. While we have focused on narrowband AMR in this paper, the same solution could be implemented also for AMR-WB.
¹ For GSM Full Rate channels three codec modes, with source bit rates of 6.6 kbps, 8.8 kbps, and 12.65 kbps, are feasible.
Common to both AMR variants is that a number of codec modes are collected into a pre-defined Active Codec Set (ACS), which is usually fixed during a call. The level of channel coding is adjusted depending on the source bit rate so that the total (gross) bit rate is constant. Consequently, the lower the source bit rate becomes, the more robust the codec mode is against bit errors, since a larger portion of the bit rate is used for channel error protection.
The gains from AMR are two-fold: When radio quality is sufficiently good, a higher codec mode can be used, thereby improving the perceived speech quality. On the other hand, since lower AMR codec modes are very robust, radio network planning can be made to allow an increased interference level in the system, thereby increasing system capacity significantly when comparing, e.g., narrowband AMR to the common Enhanced Full Rate (EFR) codec in GSM. Thus, introducing AMR may improve speech quality, increase capacity or yield a combined effect of both quality and capacity improvement.
This paper proposes Adaptive Thresholds as an improvement to current AMR codec mode adaptation. Section II describes mode adaptation in more detail and motivates the need for adaptive thresholds. An algorithm is exemplified in section III, and simulations in section IV analyze its impact on speech quality. Finally, conclusions are outlined in section V.
II. CODEC MODE ADAPTATION
A. General description
For codec mode adaptation, the receiving side performs radio link quality measurements of the incoming channel yielding a quality indicator (QI), which is defined as an equivalent carrier-to-interference (C/I) ratio [3]. The QI is then compared to a set of pre-defined thresholds to decide which codec mode to use. The thresholds are normally fixed during a call, but the system can initiate a change to the thresholds.
To obtain the best possible speech quality it is important to properly select the thresholds for codec mode adaptation. This implies that it is necessary to obtain a QI that correctly reflects the speech quality for any given radio condition, frequency hopping scheme, and network configuration. Obviously, this procedure may be quite complicated. Furthermore, conditions vary over time. There may also be performance variations between different receiver units, both regarding actual performance and QI estimation. This means that it is likely that even well-selected adaptation thresholds will not be optimal at all times.
Fixed thresholds can be sub-optimal for the current conditions by being either too high or too low. In the case where
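The threshold comparison described in Section II-A can be illustrated with a minimal sketch. The mode names are those of the ACS used in the simulations of this paper; the threshold values, the function name `select_mode`, and the rule that a QI equal to a threshold selects the higher mode are assumptions made only for illustration and are not taken from [3].

```python
from bisect import bisect_right

# ACS ordered from the most robust mode (lowest source bit rate) upwards.
ACS = ["MR475", "MR59", "MR67", "MR102"]

# Hypothetical switching thresholds in dB, ascending; one fewer than the modes.
THRESHOLDS_DB = [4.0, 7.0, 11.0]


def select_mode(qi_db, thresholds_db=THRESHOLDS_DB, acs=ACS):
    """Return the codec mode for a quality indicator (equivalent C/I) in dB."""
    # bisect_right counts how many thresholds are less than or equal to qi_db,
    # which is the index of the mode to use in the ordered ACS.
    return acs[bisect_right(thresholds_db, qi_db)]


if __name__ == "__main__":
    for qi in (2.0, 4.0, 8.5, 20.0):
        print(qi, select_mode(qi))
```

With these assumed values, lowering every threshold by 5 dB makes a QI of 8.5 dB select MR102 instead of MR67, i.e. a less robust mode for the same radio conditions.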
Simulations were performed in an AMR link simulation tool, using Typical Urban 3 km/h (TU3) as the fast fading profile, additional slow fading (shadowing), and with different frequency hopping scenarios. An ACS with the modes MR475, MR59, MR67, and MR102 was used to evaluate the performance gain, where a receiver with incorrect C/I estimation was simulated. The incorrect C/I estimation could for instance be due to unusual radio environments, broken antennas, a stationary MS in a non-frequency hopping environment, etc. Moreover, since the standard does not contain any detailed description on implementation but merely lists performance requirements, another cause for inconsistencies in the C/I estimation could be different, though standard compliant, receiver and C/I estimator implementations.
For simplicity and ease of interpretation, a fixed, systematic error was studied. However, it should be noted that the algorithm is not limited to deal with only this kind of error. The incorrect C/I estimations were modeled by adding a positive or negative offset to the C/I estimates before the codec mode selection. The offset was kept fixed during a given simulation run, and varied between different runs. A positive C/I offset (i.e. an overestimation of the C/I) corresponds to setting the thresholds too low, while a negative offset (i.e. an underestimation of the C/I) corresponds to setting the thresholds too high. As previously discussed, using too low thresholds is a more serious problem than using too high thresholds. Consequently, more emphasis was put on evaluating the former case.
The simulations were run for 22,000 speech frames (440 seconds) and the speech quality was evaluated using PESQ, a tool for calculating objective speech quality [6], as well as informal listening tests. FER statistics and codec mode usage were also registered and used in the evaluation.
A. Overestimated C/I
The speech quality evaluation results for TU3 with Ideal Frequency Hopping (IFH) are shown in Fig. 2, where “correct” means correct C/I estimation, “+2 dB” an overestimation of 2 dB, etc. As can be seen in the figure, when employing the adaptive thresholds algorithm the speech quality is maintained at approximately the same level as without any offset, even at the higher C/I estimation error levels.
[Fig. 2 plot; x-axis categories: correct, +2dB, +5dB, +8dB]
Fig. 2. Speech quality, according to [6], averaged over the whole simulation for TU3IFH and four different C/I overestimation levels.
Fig. 3 shows a detailed plot of the speech quality for each segment of 8 seconds for the case of an overestimation of 5 dB. For each such segment the speech quality is given as the difference in speech quality relative to the corresponding segment in a simulation with the correct C/I estimation. The speech quality for the first segment (given by the first two bars from the left) is considerably lower compared to a simulation using correct C/I estimation, and this is caused by not selecting a sufficiently robust codec mode due to the overestimation of the C/I. At this initial stage the threshold adaptation is not yet in steady state. The first threshold adjustment occurred after approximately 5 seconds, and after that the speech quality, on average, improved considerably. The fluctuations in speech quality, and the seemingly strange phenomenon that the speech quality without adaptive thresholds can be higher than with adaptive thresholds for a few segments, have a simple explanation: when codec mode switching thresholds are determined, the optimal thresholds are those that are optimal over the long term, i.e. on the order of minutes. The eight second long segments used in Fig. 3 are so short that, due to random fluctuations in the radio environment, a codec mode other than the globally optimal codec mode can give better speech quality for just that particular segment. The aggregated speech quality, e.g. as shown in Fig. 2, is obtained by averaging the segments over time.
The corresponding view in the FER domain is shown in Fig. 4. Here the accumulated number of frame erasures is plotted against time, with and without adaptive thresholds, together with the location and size of the threshold adjustments. The first threshold adjustment, in which the thresholds were adjusted 3 dB upwards, came after only 4.8 seconds. A second adjustment of 2.5 dB occurred after a little less than two minutes. In this particular case, the algorithm slightly overcompensated the C/I overestimation of 5 dB, since the thresholds were adjusted by in total 5.5 dB. This was due to a deliberate design choice: it was deemed more important to make rather large steps in the threshold adjustments in order to speed up the adaptation than it was to avoid the risk of overcompensating the C/I estimation bias. Such an
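The error model above, a fixed offset added to the C/I estimate before mode selection, is equivalent to shifting all thresholds by the same amount in the opposite direction. The following sketch illustrates that equivalence and the arithmetic of the two reported threshold adjustments. The threshold values and all function names are hypothetical; only the ACS, the 5 dB offset, and the 3 dB and 2.5 dB adjustments are taken from the text.

```python
from bisect import bisect_right

ACS = ["MR475", "MR59", "MR67", "MR102"]  # ACS used in the simulations
THRESHOLDS_DB = [4.0, 7.0, 11.0]          # hypothetical values, ascending


def select_mode(ci_db, thresholds_db):
    """Pick the ACS mode whose threshold interval contains ci_db."""
    return ACS[bisect_right(thresholds_db, ci_db)]


def select_with_offset(true_ci_db, offset_db, thresholds_db=THRESHOLDS_DB):
    """Mode chosen when the estimator reports the true C/I plus a fixed offset."""
    return select_mode(true_ci_db + offset_db, thresholds_db)


def shifted(thresholds_db, delta_db):
    """Return the thresholds moved by delta_db (positive means upwards)."""
    return [t + delta_db for t in thresholds_db]


# A +5 dB overestimation behaves like thresholds set 5 dB too low.
offset_db = 5.0
too_low = shifted(THRESHOLDS_DB, -offset_db)

# Adjustments reported for the +5 dB case: 3 dB, then 2.5 dB, both upwards.
adjustments_db = [3.0, 2.5]
residual_bias_db = offset_db - sum(adjustments_db)  # negative: overcompensated

if __name__ == "__main__":
    print(select_mode(5.0, THRESHOLDS_DB), select_with_offset(5.0, offset_db))
    print(residual_bias_db)
```

In this sketch a true C/I of 5 dB selects MR59 with a correct estimate but MR67 with the +5 dB offset, which is the "not sufficiently robust" selection discussed for the first segment of Fig. 3.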
[Fig. 3 plot; y-axis: speech quality difference (∆PESQ-LQO); x-axis: time (s), 0 to 300; legend: +5 dB, without adaptive thresholds; +5 dB, with adaptive thresholds]
Fig. 3. Speech quality for each 8 second segment for the case of a 5 dB overestimation. The speech quality is given as the difference relative to a simulation with correct C/I estimation. Only the first 300 seconds are shown.
[Fig. 5 plot; x-axis categories: correct, -2dB, -5dB, -8dB; legend: without adaptive thresholds; with adaptive thresholds]
Fig. 5. Speech quality, according to [6], averaged over the whole simulation for TU3IFH and four different C/I underestimation error levels.
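The bookkeeping behind the per-segment comparison of Fig. 3 and the accumulated frame erasure count of Fig. 4 can be sketched as follows. The frame duration is derived from the stated run length (22,000 frames in 440 seconds). The function names are hypothetical, and the per-segment scores are assumed to come from an external PESQ tool, which is not modeled here.

```python
from itertools import accumulate

TOTAL_FRAMES = 22_000
TOTAL_SECONDS = 440
FRAME_S = TOTAL_SECONDS / TOTAL_FRAMES           # 0.02 s per speech frame
SEGMENT_S = 8                                    # segment length used in Fig. 3
FRAMES_PER_SEGMENT = round(SEGMENT_S / FRAME_S)  # frames in one 8 s segment
NUM_SEGMENTS = TOTAL_FRAMES // FRAMES_PER_SEGMENT


def segment_deltas(scores, reference_scores):
    """Per-segment quality difference relative to the correct-C/I run."""
    return [s - r for s, r in zip(scores, reference_scores)]


def mean(values):
    """Aggregate quality: plain average of the segments over time."""
    return sum(values) / len(values)


def accumulated_erasures(erased_flags):
    """Running count of erased frames, one entry per speech frame."""
    return list(accumulate(int(flag) for flag in erased_flags))


if __name__ == "__main__":
    print(FRAMES_PER_SEGMENT, NUM_SEGMENTS)
    print(accumulated_erasures([False, True, True, False, True]))
```

A single segment delta can be positive or negative because of short-term radio fluctuations; only the average over many segments reflects the long-term behavior that the thresholds are tuned for.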
[Plot; y-axis: accumulated number of FE; legend: correct; +5 dB, no adaptive thresholds; +5 dB, with adaptive thresholds]
speech quality in this case is mainly due to the difference in intrinsic speech quality of the different codec modes. This is in
C. Listening tests
To verify the promising simulated speech quality results, informal listening tests were also performed. For all simulated
² The optimal speech quality is obtained with a certain FER that in most cases is not the same as the lowest attainable FER level. In fact, the lowest FER level is obtained by exclusively using the most robust codec mode, but that does not generally give the optimal speech quality.
[Fig. 6 plot; y-axis: codec usage (%); x-axis categories: MR475, MR59, MR67, MR102; legend: correct; -5dB, without adaptive thresholds; -5dB, with adaptive thresholds]
Fig. 6. The relative codec mode usage for the simulation with a 5 dB underestimation, with and without adaptive thresholds, compared to the relative codec mode usage for the case with correct C/I estimation. In all three cases the statistics are collected starting at the speech frame corresponding to the last threshold adjustment in the simulation with adaptive thresholds.
V. CONCLUSIONS
Codec mode switching thresholds that are adaptive to radio conditions, referred to as Adaptive Thresholds, have been proposed as a means to further improve AMR mode adaptation. Simulations have shown that an adaptive thresholds algorithm
REFERENCES
[1] 3GPP TS 26.071, “AMR speech codec; general description.”
[2] 3GPP TS 26.171, “Wideband AMR speech codec; general description.”
[3] 3GPP TS 45.009, “Link adaptation.”
[4] S. Wänstedt, J. Pettersson, X. Tan, and G. Heikkilä, “Development of an objective speech quality measurement model for the AMR codec.” MESAQIN, 2002.
[5] 3GPP TS 45.008, “Radio subsystem link control.”
[6] ITU-T P.862, “Perceptual evaluation of speech quality (PESQ)”; ITU-T P.862.1, “Mapping function for transforming P.862 raw result scores to MOS-LQO.”