
Signal processing for Direct Stream Digital

A tutorial for digital Sigma Delta modulation and 1-bit digital audio processing

Derk Reefman
derk.reefman@philips.com and

Erwin Janssen
erwin.e.janssen@philips.com

version 1.0 18 December 2002

Contents
1 Introduction
2 Characteristics of Direct Stream Digital
  2.1 Example: Filtering
  2.2 Example: Non-linear operations
  2.3 Example: Anti-aliasing filters
3 Sigma Delta Modulation
  3.1 Overview
  3.2 A linear model
  3.3 Bit stream
4 Characteristics of SD modulators
  4.1 SDM silence
  4.2 SDM stability
  4.3 Idle tones
5 Design of SDM modulators: I
  5.1 Loop-filter design
  5.2 Enforcing SDM stability
6 Design of SDM modulators: II
7 Signal processing
8 Dithering and linearizing SDMs
9 Non-linearity in a SDM
  9.1 Pre-correction
  9.2 SDPC and dither
  9.3 Performance of a realistic SDM with SDPC
10 Acknowledgements
A SDM-code

Glossary

ADC: Analogue-to-Digital Converter. This device converts analogue input signals (from, e.g., a microphone) to a digital signal that can be used in computations (for example in a PC program).
(Anti-)aliasing filter: Filter designed to remove any signal above the Nyquist frequency.
Authoring: Process in which the final disc image is created. This includes lossless compression, creation of the table of contents, etc.
Class-D: Amplifier topology that relies on Pulse Modulation. The pulses drive switches which connect the load (loudspeaker) either to the positive or negative supply voltage. Characterised by high efficiency; often also called digital amplifier.
Clipping: The phenomenon that when a format is designed to handle signal levels no larger than a level C, every level larger than C is coded as C. For example, the digital format on a CD cannot handle more than 65536 sub-levels; any signal corresponding to a level larger than +32767 is represented as +32767 (and likewise, negative signals less than -32768 are represented as -32768).
Clock jitter: Technically, the unwanted phase shift of digital pulses over a transmission medium: a discrepancy between when a digital edge transition is supposed to occur and when it actually does occur.
DAC: Digital-to-Analogue Converter: the reverse of an ADC.
Distortion: Any deviation from a linear input/output relationship, where a linear relationship is defined such that the output equals (apart from a constant gain factor) the input.
Dithering: The addition of a (quasi-)random number to the signal which is subsequently quantised. Due to the dither, the quantization appears as an (almost) linear process.
DSD: Direct Stream Digital, the digital format stored on Super Audio CD. DSD is a format in which a 1-bit signal is stored 2822400 times per second. Lowpass-filtering this signal will restore the original waveform.
DST encoding: Direct Stream Transfer, a lossless compression algorithm specifically tailored to the lossless compression of DSD signals.
Editing: In its simplest form, editing is the process of cutting and pasting the music such that undesirable parts of the recording are removed. Often, volume changes are also applied and mixing of different channels is performed.
Filter ringing: The effect that a filter with a steep transition band in the frequency domain produces artefacts in the time domain that extend over a significant period of time.
Idle tone: Tone appearing at the output of a noise shaper that bears a simple relation to the input of the Sigma Delta Modulator.
Limit cycle: Signal at the output of a Sigma Delta Modulator that requires a precisely defined input in order to occur, and disappears if the input deviates slightly from that precise value.
Linearity: See distortion.
Lossless compression: A way of compacting digital audio streams such that, when they are unpacked, the original stream is restored. Comparable with the ZIP program on PCs.
Mastering: Process in which the edit master is subjected to processes such as EQ to obtain the best sound performance.
Matching: The accuracy to which electronic components are the same. This is important if an electronic circuit relies on the cancellation of two signals: if the components are not exactly identical, a residual (undesirable) signal will remain.
Noise shaping: The shift of spectral content of the (quantization) noise. For example, in a Sigma Delta Modulator the energy of the quantization noise is shifted to high frequencies, leaving little or no noise at low frequency.
Nyquist frequency: The largest frequency that can be represented by a digital format; the Nyquist frequency is half the sample frequency.
PCM: Pulse Code Modulation. A digital format, used for example in CD, whereby a digital signal is represented by an accurate representation (e.g., 16 bits, meaning that the range -1,+1 is subdivided in 65536 sub-intervals) of the wave form at equidistant points in time (for example, in CD a 16-bit approximation of the wave form is stored 44100 times per second).
Pulse Density Modulation: A form of pulse modulation where a large positive signal is represented by a long series of positive pulses; a zero signal is represented by alternating positive and negative pulses.
Recording: The process of storing the music signals on a medium - either in analogue form or in digital form.
(Re-)Quantization: The mapping of a signal of infinite precision to a signal with limited precision. On a CD, e.g., a signal is quantized to 16 bits.
Sigma Delta Modulator: Device which transforms an analogue or PCM signal into a DSD signal. Often abbreviated to SDM, and also often referred to as Delta Sigma Modulator.
Super Audio CD: Super Audio Compact Disc. Format for music distribution proposed by Philips and Sony. Super Audio CD is based on a new digital format called DSD.
Topology: Particular way of connecting building blocks to create a circuit.
Up/Down sampling: A signal processing technique whereby the sample rate of a digital signal is increased or reduced. In the latter case, this also corresponds to a loss of information.

1 Introduction

The introduction of Super Audio Compact Disc (SACD) as a successor to the CD has introduced the need for a change in signal processing. Underlying this change is the radically different signal format that is adopted in SACD compared to CD. Whereas in CD the audio format is Pulse Code Modulation (PCM), a 16-bit word at a sample rate of 44100 samples per second, for SACD this is Direct Stream Digital (DSD), a 1-bit word at a sample rate of 64 times 44100 samples per second. In the early nineties, the time of the conception of DSD, analogue-to-digital converters (ADCs) and digital-to-analogue converters (DACs) were built with 1-bit technology [9]. The driving forces for the use of this technology were purely technical: in the CD era, demands for distortion levels were becoming more stringent, and it proved virtually impossible to create low-distortion devices with many (16) bits. In contrast, it was much easier to create low-distortion converters using a digital format of 1 bit, running at very high sample rates such as 64 or 128 times 44.1 kHz. The conversion of this high-speed, 1-bit format to the 44.1 kHz/16-bit CD format can easily be accomplished in the digital domain using filtering and signal processing, which does not introduce any non-linear distortion. This technique has been highly successful, and the so-called oversampling and/or bitstream technology dramatically increased the performance of CD players in the nineties. In fact, those CD players were all generating their own DSD internally from the CD source; this DSD would then be fed into a high-quality 1-bit DAC. It therefore seemed logical to introduce a format that would store this 1-bit output directly, instead of the intermediate CD format: in this way, all filtering and signal processing needed to convert to and from the 1-bit format is eliminated which, by definition, can only increase the sound quality.
After the first experiments with DSD, it indeed appeared that the sound quality was significantly better compared to the 44.1 kHz/16-bit format. At the same time, new ADCs and DACs were appearing on the market that were still using high sample rates (64 or 128 times 44.1 kHz), but exploited a few bits (1.5 to 5) instead of 1. Again, this had purely technical reasons: ingenious tricks to reduce the distortion problems of a multi-bit converter had appeared, and were feasible to implement for a limited number of bits. Because 1-bit converters are more sensitive to clock jitter, the few-bit converters took their place in the high-end audio market. This re-introduced the need for some mild signal processing, because SACD can only store a 1-bit format. Interestingly, this did not lead to any observable degradation in sound quality. Therefore, it is now believed that the very high sample rate of DSD is the key factor in the extremely good sound quality of SACD. The fact that the data is 1 bit instead of a few bits, however, has retained its value because it reduces the storage requirements of the audio, thus creating the possibility to store over 70 minutes of stereo and multi-channel DSD on a single Super Audio CD.

The purpose of this document is to explain some technical details of Direct Stream Digital. It tries to give an overview of several signal processing steps which are needed in the world of DSD, and which are different from the accustomed way of doing things. Its purpose is not to give a full explanation of the perceived sound quality of DSD; this white paper is meant to be an introduction to DSD and DSD signal processing for the educated DSD novice. Reflecting its importance for SACD, a crucial part of this paper is the 1-bit Sigma Delta Modulator (SDM). The design of such a device will be discussed in detail, and a working example will be designed to illustrate the design process. Another important issue that will be dealt with is DSD signal processing. A typical signal processing chain for DSD is provided in Fig. 1.

In Fig. 1, several steps are envisaged which typically occur in the creation of an SACD. Most of these steps involve analog or digital signal processing in one way or another. Starting with the AD converter, this is not necessarily a native 1-bit converter. Often, high-end AD converters are 3-6 bit converters running at sample rates between 128 fs and 512 fs, where fs stands for a sample rate of 44.1 kHz or 48 kHz. These signal formats need to be converted to 1-bit formats, where any change to the signal information is to be avoided. As this introduces the need for a 1-bit SDM, we will start with some introduction to Sigma Delta Modulation, and the various options that exist to realize a SDM. In the editing phase, volume adjustments need to be made, and switching between bit streams is necessary. Switching of bit streams is a technique which is rather different from standard signal processing, and is detailed in a separate document [12]. In the mastering phase, heavy signal processing is often involved, ranging from relatively simple equalization to sophisticated reverberation techniques. In the sequel, it will be demonstrated how most of the sophisticated techniques developed for PCM can easily be adjusted for application to DSD. In this respect, it is essential to realize that DSD at 64 fs is a consumer format; hence, it is not necessarily the format that is used in the studio, which can in principle be any format as long as it is of equal or better quality compared to standard DSD.
In the authoring phase, finally, no changes to the signal content are made anymore. However, in most cases the format of the data will be transformed to DST (Direct Stream Transfer), which is the compressed format of DSD. This lossless compression scheme allows multichannel, high quality DSD data to fit on the approximately 4.7 Gbyte of a high density layer of an SACD disc.
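The need for lossless compression follows from simple arithmetic. The sketch below computes the raw DSD payload for a disc that carries both a stereo and a multi-channel area. The six channels for the multi-channel area and the 70-minute playing time are illustrative assumptions, and the capacity is taken as exactly 4.7e9 bytes.

```python
DSD_RATE = 64 * 44100          # bits per second per channel (2 822 400)
LAYER_CAPACITY_BYTES = 4.7e9   # approximate layer capacity quoted in the text


def raw_dsd_bytes(channels, minutes):
    """Uncompressed DSD payload in bytes for a channel count and duration."""
    return DSD_RATE * channels * minutes * 60 / 8


stereo_only = raw_dsd_bytes(2, 70)
# Assumption: the multi-channel area holds 6 channels next to the stereo area.
stereo_plus_multichannel = raw_dsd_bytes(2 + 6, 70)
required_ratio = stereo_plus_multichannel / LAYER_CAPACITY_BYTES

print(f"stereo only:         {stereo_only / 1e9:.2f} GB")
print(f"stereo + 6 channels: {stereo_plus_multichannel / 1e9:.2f} GB")
print(f"compression needed:  {required_ratio:.2f} : 1")
```

Under these assumptions the stereo area alone fits uncompressed, while the combined payload does not, which is where DST comes in.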

2 Characteristics of Direct Stream Digital

Before diving into the generation of Direct Stream Digital (DSD), we will first review some characteristics of the format as it is used within the context of Super Audio CD. First and foremost, DSD characterizes itself by the huge sample rate of 64 times 44.1 kHz, or approximately 2.8 MHz. Largely irrespective of the number of bits, high sample rates in the digital world are desirable because the larger the sample rate, the fewer the audio artefacts introduced by the time quantization. We will review a few examples which show that 44.1 kHz (or a small multiple of it) is not enough to avoid significant signal distortions due to the time quantization.

Figure 1: Typical signal processing chain for DSD applications: Recording, Editing, Mastering, Authoring/DST encoding, SACD Player.

Figure 2: The effect of warping the analog frequencies to the limited range of digital frequencies (warped frequency plotted against analog frequency); the frequencies are in reduced units (i.e., 0 . . . ). Red: the warped frequency. Green: the original frequency. The vertical line shows up to what frequency the warped frequency can be considered an accurate representation of the actual frequency.

2.1 Example: Filtering

A well-known issue in (time-discrete) digital systems is the problem of mapping (warping) the infinitely high frequencies, which are allowed in a time-continuous (analogue) system, to a system where the highest representable frequency is the Nyquist frequency (half the sample rate). Obviously, as the ultimate goal of digital signal processing is to present an improvement over analog signal processing, this is a very serious issue. Exemplary for this problem is the bilinear transform, which maps the analogue frequency $\omega_a$ to the digital frequency $\omega_d$ according to:

$$\omega_d = \frac{2}{T} \arctan\left(\frac{\omega_a T}{2}\right) \qquad (1)$$

where T is the sampling period. As illustrated in Fig. 2, this mapping is almost linear only for a limited frequency regime; for frequencies above 0.1 fs quite substantial deviations occur. As a result, mapping an analog filter (say, a Butterworth filter) to its digital equivalent causes significant distortion of its frequency response. If the sample rate is very high (as, for example, with DSD), the mapping artefacts are benign in the frequency regime which is most important for audio reproduction. Obviously, it is still possible to create a filter which has the characteristics of a digital filter at low sample rate; hence with the use of DSD, one has significant freedom in the choice of filters and filter characteristics.
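To get a feeling for the size of the warping, Eq. (1) can be evaluated directly. The following is a minimal sketch; the function name and the choice of a 20 kHz test frequency are illustrative and not taken from the original text.

```python
import math

CD_RATE = 44100.0
DSD_RATE = 64 * 44100.0


def warped_frequency(f_analog_hz, fs_hz):
    """Apply Eq. (1): map an analogue frequency to its bilinear-warped value (Hz)."""
    t = 1.0 / fs_hz                          # sampling period T
    omega_a = 2.0 * math.pi * f_analog_hz    # analogue angular frequency
    omega_d = (2.0 / t) * math.atan(omega_a * t / 2.0)
    return omega_d / (2.0 * math.pi)


for name, fs in (("44.1 kHz", CD_RATE), ("DSD", DSD_RATE)):
    f_d = warped_frequency(20000.0, fs)
    print(f"{name:>8}: 20000 Hz maps to {f_d:.1f} Hz")
```

At a 44.1 kHz sample rate a 20 kHz analogue frequency is pulled down by several kilohertz, whereas at the DSD rate the shift is only a few hertz.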

2.2 Example: Non-linear operations

In audio signal processing, operations such as compression/limiting and clipping are quite common. In compression/limiting, the gain of the signal is adjusted according to the signal; this clearly represents a non-linear operation. In clipping, too, the signal transfer is highly non-linear. If these non-linear operations are performed in the analog domain, they will cause higher harmonics to appear. For example, if a 14 kHz signal is clipped, this will give rise to a third harmonic component at 42 kHz. In the analog domain, this could then be filtered off, if desired. If the clipping were done in the digital domain at a sample rate of 44.1 kHz, however, the 42 kHz harmonic would alias back to low frequency: 42 kHz is 19.95 kHz above the Nyquist frequency (22.05 kHz). The third harmonic would thus be aliased to (22.05 - 19.95) = 2.1 kHz, which would give very audible distortion as that frequency is not harmonically related to 14 kHz. Also, there is no way to remove this distortion by a filter operation. The only remedy is to up-sample to a high frequency, and do the non-linear operation at that high rate, thus ensuring that only high-frequency, high-order harmonic components are aliased. This causes less harm, because high-order components tend to be of lower amplitude. The result is then down-sampled again to 44.1 kHz with the appropriate low-pass filtering. In DSD, the sample rate is so high that non-linear operations behave as they would in the analog domain. Hence, no up- and down-sampling is required, and the decision whether or not to remove high-order distortion components is up to the sound engineer - and not dictated by the format.
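The aliasing arithmetic in this example can be checked with a few lines of code. This is a small sketch with a hypothetical helper name; it folds a frequency back into the range from 0 to the Nyquist frequency.

```python
def alias_frequency(f_hz, fs_hz):
    """Frequency at which a component at f_hz appears after sampling at fs_hz."""
    f = f_hz % fs_hz                  # sampling cannot distinguish f and f + k*fs
    return fs_hz - f if f > fs_hz / 2 else f


third_harmonic = 3 * 14000.0          # third harmonic of the clipped 14 kHz tone
print(alias_frequency(third_harmonic, 44100.0))        # folded back into the audio band
print(alias_frequency(third_harmonic, 64 * 44100.0))   # unchanged at the DSD rate
```

At the DSD rate the harmonic stays where it is, far below the Nyquist frequency of about 1.4 MHz, so it can be filtered off or left alone as desired.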

2.3 Example: Anti-aliasing filters

Because of the extremely high sample rate, DSD sets only very relaxed requirements for the anti-aliasing filters, which, hence, can be chosen to be rather sloppy. As a result, the ringing in the time domain is substantially lower compared to systems of lower sample rate, where steep anti-aliasing filters are mandatory. This effect is clearly illustrated in Fig. 3. The impulse responses of 4 different systems in a multi-channel configuration are depicted: a 48 kHz system with a bandwidth of 20 kHz (that is, 8 kHz transition bandwidth is allowed for anti-aliasing filtering), a 96 kHz system with 35 kHz bandwidth (26 kHz transition bandwidth), a 192 kHz system with 75 kHz bandwidth (42 kHz transition bandwidth) and an SACD system with 95 kHz bandwidth (and about 120 kHz transition bandwidth).

Though none of the systems reproduces the input exactly, the DSD system shows the fewest artefacts. Clearly, the 48 kHz system has great difficulty in reproducing the click; due to the steep filtering it starts wobbling, or ringing, at a -30 dB level with respect to the top of the response approximately 0.4 ms before the click, which is very audible (this is also the reason why many people prefer sloppy anti-alias filters in CD players, even at the cost of reduced anti-aliasing characteristics). It also continues to ring after the click for the same length of time, but most probably this after-ringing is audibly masked by the click itself and, hence, not as important as the pre-ringing. Apart from this effect, the amplitude is also only a fifth of what it should be. Especially when the sound traverses a non-linear medium, such as the human ear, this may lead to even larger perceived differences than can be concluded directly from Fig. 3. At the higher sampling frequencies, too, the ringing phenomenon cannot be removed, though it is reduced significantly. Only the DSD system is very effective in suppressing the ringing effect, due to very slow filtering above 95 kHz. The price to pay for this is the increase in noise floor with respect to the other systems; however, as the noise floor contains only high-frequency components which are uncorrelated with the audio, they are not perceptible.

Figure 3: Responses (from left to right) of a DSD, a 192 kHz, a 96 kHz and a 48 kHz system to a -6 dB block input (click) of 3 µs duration and amplitude 0.25 (amplitude plotted against time in seconds). Note the linear amplitude scale.

3 Sigma Delta Modulation

In this section, it will be assumed throughout that the sample rate equals 64 times 44.1 kHz (approximately 2.8 MHz), i.e., the sample rate of SACD. By far the most common way to generate such a 1-bit DSD stream is by the use of a Sigma Delta Modulator (SDM), although it is nowhere stated in the definitions of Super Audio CD [10] that the bit stream present on the disc must be generated by a SDM. In fact, many other methods have recently been developed which are not simply a (single) SDM. For example, in [3] a type of SDM with an elaborate re-ordering scheme is presented, and in [5] a so-called Trellis-SDM is presented. In [11], a cascaded structure of 2 SDMs is presented, which will be discussed in a slightly modified form in Sec. 9. All of these new developments have in common that their performance is in some way better than that of an ordinary SDM, but at the same time there is a substantial increase in complexity. Because a single SDM is still at the basis of all these new developments, and because a standard SDM is still by far the most widely used device to generate a bit stream, we will continue by elaborating on the principles of a simple SDM.

Figure 4: Typical output spectrum of an SDM (4 kHz, -6 dB input); power (dB) plotted against frequency on a logarithmic axis.

3.1 Overview

Sigma Delta Modulation, often also known as noise shaping, is in most general terms a technique which allows (digital) quantization errors to be spectrally shaped. In the SDMs that are typically used for DSD applications, the aim of this spectral shaping is to push the gross quantization errors made by the coarse 1-bit quantizer to high frequencies, where these errors are inaudible. This is possible due to the high oversampling factor of 64, which leaves a band from approximately 80-100 kHz (the lower edge is determined by the maximum allowable input, as will be discussed later in Sec. 5) to 1.4 MHz (the Nyquist frequency) to accommodate virtually all the quantization errors. An illustration of this phenomenon is given in Fig. 4. Indeed, the spectrum illustrates that this SDM design allows for a very high dynamic range in the audio band (0-20 kHz) and a decreasing dynamic range in the band from 20 to 80-100 kHz, from where the dynamic range remains constant up to 1.4 MHz.

Schematically, a SDM can be represented as in Fig. 5.


Figure 5: Above: Sigma Delta structure (in feed-forward configuration) with loop filter H(z). Below: equivalent noise shaper structure with filter F(z).

Historically, the SDM is preceded by the noise shaper (NS) (also see Fig. 5). The most significant difference between a noise shaper architecture and a sigma delta structure is the position of the filter: in a noise shaper, the filter is in the feedback loop; in a SDM, the filter is in the feed-forward path. Due to the filter in the feedback loop, the error of the quantizer is spectrally shaped by the filter F(z) and fed back to the input of the quantizer. It is this process which is called noise shaping of the quantization error. Though this appears rather different from a SDM, the noise shaper structure is virtually identical to the SDM topology. In fact, the SDM and the NS in Fig. 5 are identical if the filter F(z) in the noise shaper equals F(z) = H(z)/(H(z) + 1). In that case, the input still needs to be pre-filtered by H(z)/(H(z) + 1) to obtain an identical signal transfer function. It is important to realize that, because of their equivalence, both a noise shaper and an SDM perform noise shaping of the quantization noise. For that reason, a SDM is often (mistakenly) called a noise shaper, even though the topology of a noise shaper is different from that of a SDM.

The noise shaper architecture is not often used in analog-to-digital converters because matching in the analog domain is difficult, and thus leads to implementation problems. Generally one resorts to SDM topologies, which pose fewer analogue problems. In the digital domain, where precision is arbitrary, matching is not a fundamental problem and both structures can be used. Because of their identical nature, we will restrict the discussion to the SDM-like structures.
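The equivalence condition F(z) = H(z)/(H(z) + 1) can be made explicit with a short derivation. It is a sketch under two assumptions that cannot be read off the text: the quantizer error is written as E = Y - V, with V the quantizer input, and in the noise shaper the filtered error is subtracted from the input.

$$
\begin{aligned}
\text{SDM:}\quad & V = H\,(U - Y),\quad Y = V + E
  &&\Rightarrow\quad Y = \frac{H}{1+H}\,U + \frac{1}{1+H}\,E \\
\text{NS:}\quad & V = U - F\,E,\quad Y = V + E
  &&\Rightarrow\quad Y = U + (1 - F)\,E \\
\text{equal noise shaping:}\quad & 1 - F = \frac{1}{1+H}
  &&\Rightarrow\quad F = \frac{H}{1+H}
\end{aligned}
$$

With this choice of F the error term is identical in both structures, while the signal term of the noise shaper is U instead of H/(1+H) U, which is why the noise shaper input has to be pre-filtered by H/(1+H) to match the SDM.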

3.2 A linear model

For applications in SACD, the quantizer Q in a SDM is a 1-bit quantizer, which outputs only values of +1 and -1. This is a highly non-linear element, which has its ramifications for the operation of the SDM. To gain some initial insight into the characteristics of the SDM, however, we will resort to a simple linear model and replace the highly non-linear quantizer by a (linear) gain c and a noise source n, which models the quantization error, as indicated in Fig. 6.

Figure 6: Linearization of the Sigma Delta structure. The quantizer is replaced by a (signal-independent) gain and an additive noise source. The signal transfer function STF and noise transfer function NTF are defined by Y = STF·U + NTF·N, where Y is the Fourier transform of the output y, U is the Fourier transform of the input u and N the Fourier transform of the additive noise n.

Doing this, we can write the following expressions for the signal transfer function (STF) and the noise transfer function (NTF):

$$\mathrm{STF}(z) = \frac{c\,H(z)}{1 + c\,H(z)} \qquad (2)$$

$$\mathrm{NTF}(z) = \frac{1}{1 + c\,H(z)} \qquad (3)$$

Assuming that the quantizer gain c ≈ 1, this shows how, in a situation where the loop gain H(z) is very large, the signal transfer function approximates 1. The noise transfer function, on the contrary, is negligible for large H(z). As the loop filter H(z) typically is a low-pass filter with large LF gain, it follows that in SACD applications the quantization noise is suppressed in the audio band. In Fig. 4, for example, the loop filter is a Chebyshev type II design with a corner frequency of 90 kHz. It is of crucial importance, however, to realize that the replacement of the quantizer by a gain element c and an additive noise source is a very crude approximation, the more so if c = 1 is taken. Typically, the Signal-to-Noise Ratios (SNRs) as calculated from simulations on the actual SDM with the non-linearity included differ significantly from those obtained by the use of the linearized model. Other characteristics, discussed in Sec. 4, are also not properly, or not at all, explained by the linearized model.

Figure 7: Above: A fourth order Sigma Delta structure in feed-forward configuration. Below: a fourth order feedback topology. If c′1 = c1/c4; c′2 = c2/c4 etc., the NTFs of these modulators are identical.

There also exist other SDM realizations. Whereas the SDM structure in Fig. 5 is referred to as a feed-forward topology, there also exist feedback topologies. A feedback topology is displayed in Fig. 7. As in the comparison of the noise shaper vs. the feed-forward SDM, there is some equivalence between a feedback and a feed-forward topology. We will see this in a later section. The choice of which topology to use then depends on the design of the complete system.
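The behaviour predicted by Eqs. (2) and (3) can be explored numerically. The sketch below assumes, purely for illustration, a first-order loop filter H(z) = z^-1/(1 - z^-1) (a delaying integrator) and c = 1; this is not the Chebyshev type II loop filter of Fig. 4, and all function names are hypothetical.

```python
import cmath
import math

FS = 64 * 44100  # DSD sample rate in Hz


def integrator(z):
    """Assumed loop filter H(z) = z^-1 / (1 - z^-1); illustrative only."""
    return (1 / z) / (1 - 1 / z)


def transfer_functions(freq_hz, loop_filter=integrator, c=1.0):
    """Return (STF, NTF) of the linear model, Eqs. (2) and (3), at freq_hz."""
    z = cmath.exp(2j * math.pi * freq_hz / FS)
    h = loop_filter(z)
    stf = c * h / (1 + c * h)
    ntf = 1 / (1 + c * h)
    return stf, ntf


for f in (1e3, 20e3, 100e3, 1e6):
    stf, ntf = transfer_functions(f)
    ntf_db = 20 * math.log10(abs(ntf))
    print(f"{f:>9.0f} Hz  |STF| = {abs(stf):.3f}  |NTF| = {ntf_db:7.1f} dB")
```

For this simple loop filter the noise is strongly attenuated at audio frequencies, where the loop gain is large, and amplified near the Nyquist frequency. As stressed in the text, such linear-model figures are only a rough guide to the behaviour of the real, non-linear modulator.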

3.3 Bit stream

In Fig. 8, a characteristic output sequence of a SDM is shown, receiving a sine wave of amplitude 0.95 and frequency 20 kHz as its input. Even though Fig. 4 leaves no doubt about the very high accuracy with which the signal is represented in the SDM output, it is hard to visualize the sine wave from a series of +1s and -1s. One idea is that the signal represented by the bit stream can be obtained by taking a local average of the bit stream: clearly, when the input sine wave is positive/negative, most bits that are output from the SDM are positive/negative too, and outnumber the opposite bits by far. Likewise, around zero input the number of positive and negative bits is roughly identical. Hence, the global wave form of the underlying (low-frequency) signal can be estimated by taking a local average of the bit stream - akin to pulse-density modulation, which is sometimes used in Class-D amplifiers. Obviously, this local average will not be a highly accurate representation of the wave form. A better impression of the accuracy with which the input is represented is obtained by filtering the output of the SDM with a filter which removes the signal in the DSD stream above 20 kHz (in fact, local averaging is a low-pass filter, albeit not a very good one).

Figure 8: Comparison of the DSD output of a SDM (red) and the input to the SDM (blue), plotted against sample number. Clearly, in regions where the input sine wave is negative, the bits that are output from the SDM are predominantly negative, and vice versa.

It is, therefore, informative to build a system as presented in Fig. 9. This system allows us to compare the original input signal with the signal which has passed through the Sigma Delta Modulator. To this end, the bit stream output of the SDM is low-pass filtered with a steep filter, so as to remove any components above 20 kHz. The input signal can be any signal of large enough resolution; below, we will take signals with a (digital) resolution of 24 bits. Due to the filter after the SDM, the input signal has to be delayed by an appropriate amount to compensate for the delay introduced by the filter. If the input and output signals are subtracted, the residual signal can be inspected.

Figure 9: Setup which allows comparison of an upsampled, low-rate, high-resolution signal with its DSD equivalent. Note that the down-sampling is necessary only for the purpose of comparison.
Figure 10: Time domain representation of the difference signal (absolute difference plotted against time in seconds), for both a 1 kHz input signal (red) and a 20 kHz signal (green).

In Fig. 10, two results are displayed: a first residual signal, where the input signal was a sine wave (0 dB SACD, 1 kHz), and a second residual signal (0 dB SACD, 20 kHz). The resulting signal is, for both inputs, noise-like, with an amplitude that corresponds to a resolution of at least 120 dB. Likewise, this experiment can be performed with real audio signals, where the result will be the same. Obviously, when the low-pass filtering applied in the down-sampling process does not suppress the noise above 20 kHz to a level of -120 dB, the residual signal will be larger.
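The local-average picture of the bit stream can be tried out with a toy modulator. The sketch below uses a first-order SDM, chosen here only because it is short and unconditionally stable for inputs within ±1; it is an assumption for illustration and not one of the higher-order designs used for DSD, so its audio-band accuracy is far worse than the 120 dB quoted above.

```python
import math

FS = 64 * 44100  # DSD sample rate in Hz


def first_order_sdm(samples):
    """Toy first-order 1-bit SDM: integrate (input - previous output), take the sign."""
    integrator = 0.0
    previous_bit = 0.0
    bits = []
    for u in samples:
        integrator += u - previous_bit
        bit = 1 if integrator >= 0.0 else -1
        bits.append(bit)
        previous_bit = bit
    return bits


def block_average(values, start, length):
    """Local average of `length` values beginning at index `start`."""
    return sum(values[start:start + length]) / length


# 1 kHz sine at the SACD 0 dB level (amplitude 0.5), a few periods long
tone = [0.5 * math.sin(2 * math.pi * 1000 * k / FS) for k in range(20000)]
bits = first_order_sdm(tone)

# Compare local averages over part of the first positive half-period
print(block_average(tone, 200, 1000), block_average(bits, 200, 1000))
```

Even for this crude modulator the average of 1000 consecutive bits follows the average of the input closely, which is the pulse-density interpretation described above.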

A separate issue that becomes clear from Fig. 8 is the fact that, while negative parts of a sine wave are represented by predominantly negative output values, the pattern in which the +1s and -1s appear is never the same. This observation leads to the issue of editing DSD streams. While this is an important topic, the reader is referred to [12] for a discussion of editing and switching DSD, as this is a non-trivial issue.

4 Characteristics of SD modulators

Sigma Delta modulators represent a new class of devices, which display phenomena different from those we are used to in the PCM world. In the sequel, a few of these features which are important in practical applications will be highlighted.

4.1 SDM silence

Sigma Delta modulators have some characteristics which we are not familiar with in the PCM world. A first important aspect is that the output of the SDM always has a power equal to 1, because the output can only take the values ±1. As a result, silence, as referred to in DSD, only means that the power spectrum of the DSD signal is empty below a threshold frequency, above which no signal can be perceived. For example, the following repetitive patterns are often used and are referred to as DSD silence patterns:

pattern     hex code   pattern frequency
01010101    0x55       1.4 MHz
10101010    0xaa       1.4 MHz
10010110    0x96       352.8 kHz
01101001    0x69       352.8 kHz

Indeed, the patterns do not contain signal components below 80 kHz; however, they still represent signals with a total power of 1. Often, these patterns are referred to by their hexadecimal equivalent. The fact that these signals are silent, but still contain information, can be exploited to use these signals as synchronization words [4].
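The entries of the silence-pattern table can be verified with a small discrete Fourier transform over one 8-bit period. In the sketch below the bits are mapped as 0 to -1 and 1 to +1 and read most significant bit first; the bit order is an assumption and does not affect the result. Function names are illustrative.

```python
import cmath
import math

FS = 64 * 44100  # DSD sample rate in Hz


def pattern_bits(byte):
    """One period of the repeated byte as +1/-1 values (MSB first, assumed order)."""
    return [1 if (byte >> (7 - i)) & 1 else -1 for i in range(8)]


def lowest_tone_hz(byte):
    """Lowest frequency (including DC) with non-zero content in the repeated pattern."""
    x = pattern_bits(byte)
    n = len(x)
    for k in range(n // 2 + 1):
        dft_bin = sum(x[m] * cmath.exp(-2j * math.pi * k * m / n) for m in range(n))
        if abs(dft_bin) > 1e-9:
            return k * FS / n
    return None


for code in (0x55, 0xAA, 0x96, 0x69):
    x = pattern_bits(code)
    power = sum(v * v for v in x) / len(x)
    print(f"0x{code:02x}: power = {power:.0f}, lowest tone = {lowest_tone_hz(code) / 1e3:.1f} kHz")
```

All four patterns have zero DC content and unit power, and their lowest spectral component lies far above the audio band.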

4.2 SDM stability

Another important aspect is that, while the output of the SDM varies between ±1, its input most often cannot vary over this range, because the SDM becomes unstable for inputs of high amplitude. While a full theoretical description of this phenomenon is still lacking, a wealth of heuristic knowledge [9] is available on the stability of higher (> 2) order SDMs. Because of all this experimentally obtained insight, accurate descriptions of instability are available that can be used in the design of properly functioning modulators (see Sec. 5). In Fig. 11, the performance of a SDM is shown as a function of its input amplitude. Clearly, above a certain threshold, the performance collapses (in fact, the SDM gets into wild oscillations if no precautions are taken). The exact amplitude where the sudden collapse occurs depends on the waveform of the input and its frequency, and on the SDM design, and is thus not an easy quantity to determine. In Sec. 5, precautions that can be taken to prevent this uncontrolled behaviour are discussed; they lead to so-called 'graceful degradation': instead of a sudden collapse in performance, the performance drops in a much less aggressive way.

[Figure 11 plot: Signal to Noise Ratio (dB, 0 to 140) versus input amplitude (0.25 to 0.7).]

Figure 11: Graphical representation of the stability problems for large inputs: for a simple SDM (in red) discussed in Sec. 5 the SNR collapses for signal amplitudes (in this case: a 4 kHz sine wave) of more than 0.59. In green the result of so-called 'graceful degradation' is shown.

This overload phenomenon is the reason why the SACD 0 dB reference level has been set to 50% of the maximum theoretically possible modulation depth [10]; in the cases discussed here this means that allowable input levels vary between -0.5 and +0.5. This definition introduces the possibility to allow signal levels which are larger than 0 dB, in contrast to PCM which has a clear limit at 0 dB: all inputs larger than 0 dB are harshly clipped to 0 dB. As will be clear in following sections, for SDMs this overload is possible at the limited cost of increased distortion (clipping of the internal integrators). In this respect, the DSD format compares to analogue tape recordings, which also allowed for serious signal overload, but also at the cost of significant distortion. Obviously, for high fidelity recordings for Super Audio CD, the 0 dB level should never be crossed.
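The relation between SACD levels in dB and the modulation depth follows directly from the 50% definition. A minimal sketch of the conversion, with illustrative function names that do not come from the SACD specification:

```python
import math

SACD_REFERENCE_DEPTH = 0.5  # 0 dB SACD = 50% modulation depth (from the text)


def sacd_db_to_amplitude(level_db):
    """Amplitude, relative to the +/-1 feedback level, of a level in dB SACD."""
    return SACD_REFERENCE_DEPTH * 10 ** (level_db / 20)


def amplitude_to_sacd_db(amplitude):
    """Level in dB SACD of an amplitude given relative to the feedback level."""
    return 20 * math.log10(amplitude / SACD_REFERENCE_DEPTH)


print(sacd_db_to_amplitude(0.0))             # 0.5
print(round(amplitude_to_sacd_db(1.0), 2))   # 6.02, i.e. 100% modulation depth
```

With this convention, full modulation (amplitude 1) corresponds to about +6 dB SACD, which is the headroom that PCM lacks.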

4.3 Idle tones

As discussed in Sec. 4.1, silence in DSD is often equivalent to having a high-powered tone outside the signal band. These tones are called idle tones. For higher order SDMs, the 1-bit output signal still carries these idle tones, although they have much reduced amplitude compared to the purely repetitive patterns shown in Sec. 4.1, and are embedded in a large amount of uncorrelated noise. For non-zero DC inputs, these tones start to move down in frequency with increasing DC level; at the same time, tones may start to appear in the LF part, which can, potentially, be audible. The origin of the tones appearing in the LF part lies in the feedback character of a SDM. Suppose we have a DC input of 0.25. The most likely combination of bits which represents that value is a sequence of eight bits of which five are +1 and three are -1, so that the average equals 2/8 = 0.25. If this sequence is repeated, a tone at a frequency of 8 × 44.1 kHz = 352.8 kHz will result. For each halving of the DC input, this frequency will be halved too; eventually, this tone will end up below 20 kHz. This phenomenon can be reduced, or even removed, in several ways, by dithering and other means, as will be discussed more extensively in Sec. 8. If the SDM is undithered, audibility of these tones depends on the SDM used. Typically, the higher the order of the SDM, the lower the power of the tone in the audio band. For a typical (undithered) SDM the tones in the audible band are below -130 dB.
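The halving of the tone frequency with the DC level can be made visible with the simplest possible modulator. The sketch below uses a first-order SDM, which is an assumption made purely for illustration: for the higher order SDMs discussed in the text the tones are weaker and embedded in noise, so the output is not strictly periodic.

```python
FS = 64 * 44100  # DSD sample rate in Hz


def first_order_sdm(dc_input, n_samples):
    """First-order SDM (single delaying integrator) driven by a DC input."""
    state = 0.0
    bits = []
    for _ in range(n_samples):
        y = 1 if state >= 0 else -1  # 1-bit quantizer
        state += dc_input - y        # integrate the error between input and output
        bits.append(y)
    return bits


def pattern_period(bits):
    """Smallest period (in samples) of the bit sequence, or None if not periodic."""
    for p in range(1, len(bits) // 2 + 1):
        if all(bits[i] == bits[i + p] for i in range(len(bits) - p)):
            return p
    return None


for dc in (0.25, 0.125, 0.0625):
    period = pattern_period(first_order_sdm(dc, 256))
    print(dc, period, FS / period / 1e3, "kHz")
```

For a DC input of 0.25 the output repeats every 8 samples (five +1, three -1), giving a fundamental at 352.8 kHz; each halving of the DC input doubles the period.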

5 Design of SDM modulators: I

In this section, a fully operational SDM will be designed. We will use the linearized model of the SDM to obtain values for the coefficients of the SDM, following to a large extent the design route proposed in [2], and also discuss ways to ameliorate the stability problem. From the start, it is important to know that the only way to obtain reliable insight into the performance of the SDM is by simulation; although the linear approximation usually results in a working SDM, it is too crude to provide numbers about SNR and, even more important, it does not provide any insight into stability. Also, in the design process, we assume an effective quantizer gain c = 1. Simulations based on this design can give some idea about what the effective gain c actually is within the limitations of the linear model, and can be used for further refinement of the loop-filter.

5.1 Loop-filter design

A very convenient way to start the design of a SDM is the linear model of Fig. 6, where we take the gain c = 1. We take the feed-forward structure from Fig. 7, and write down the NTF that is associated with it. We can write for the loop-filter $H(z)$:

$$H(z) = c_1 \frac{z^{-1}}{1-z^{-1}} + c_2 \left(\frac{z^{-1}}{1-z^{-1}}\right)^2 + c_3 \left(\frac{z^{-1}}{1-z^{-1}}\right)^3 + c_4 \left(\frac{z^{-1}}{1-z^{-1}}\right)^4 \tag{4}$$

and making use of the relation $NTF(z) = 1/(1 + H(z))$ we arrive at:

$$NTF(z) = \frac{(1-z^{-1})^4}{(1-z^{-1})^4 + c_1 z^{-1}(1-z^{-1})^3 + c_2 z^{-2}(1-z^{-1})^2 + c_3 z^{-3}(1-z^{-1}) + c_4 z^{-4}} \tag{5}$$

which is to be recognized as a filter of the form $NTF(z) = (1-z^{-1})^n / P_n(z^{-1})$. This is the form of a Butterworth or a Chebyshev type II filter (footnote 1); the choice of either of those realizations dictates the final appearance of the polynomial $P_n(z^{-1})$. Likewise, the STF can be computed as $STF(z) = 1 - NTF(z)$, resulting in:

$$STF(z) = \frac{c_1 z^{-1}(1-z^{-1})^3 + c_2 z^{-2}(1-z^{-1})^2 + c_3 z^{-3}(1-z^{-1}) + c_4 z^{-4}}{(1-z^{-1})^4 + c_1 z^{-1}(1-z^{-1})^3 + c_2 z^{-2}(1-z^{-1})^2 + c_3 z^{-3}(1-z^{-1}) + c_4 z^{-4}} \tag{6}$$

Footnote 1: albeit scaled such that the first term $c_0 z^0$ of $H(z)$ equals zero. If this term were non-zero, the resulting SDM would not contain a delay in the closed loop and hence would not be realizable.

The approach that can now be followed is to design a high-pass filter for $NTF(z)$, according to Butterworth or Chebyshev-II (or any other) rules, and to reorganize terms such that it is in the shape of Eq. (5). One way of approaching this is to use a symbolic manipulation package such as Mathematica [14], or to collect terms in powers of $z$ and equate identical powers. From an engineering point of view, a very easy way of obtaining the coefficients $c_i$ is by recognizing that $1/NTF(z)$ is linear in the coefficients $c_i$. It is then possible to set up a linear system by evaluating $1/NTF(z)$ for different values of $z$ (at least as many as the order of the system). These values must have no simple relation to each other, but need not be complex. In this way, it is also irrelevant whether the Butterworth filter is provided as a cascade of biquads, or as a direct realization.

When we inspect the feedback structure (lower part of Fig. 7), we see that the transfer characteristic for the $NTF(z)$ is identical to the NTF of the feed-forward structure discussed above. However, the STF is given by

$$STF(z) = \frac{z^{-4}}{(1-z^{-1})^4 + c_1 z^{-1}(1-z^{-1})^3 + c_2 z^{-2}(1-z^{-1})^2 + c_3 z^{-3}(1-z^{-1}) + c_4 z^{-4}} \tag{7}$$

which, for low frequencies, equals about 1 if the coefficient $c_4$ equals unity (this refers to the scaling as applied in Fig. 7). For higher frequencies, the STF displays an almost third-order roll-off. This is in contrast to the feed-forward topology, where the STF rolls off only very slightly (first order) for high frequencies.

As an example, we will design a fourth order SDM, with a NTF according to a Butterworth high-pass filter design. The cut-off frequency is chosen as 150 kHz. Because the SDM needs to be realizable, the total loop needs to embody at least a single delay, i.e., the term with $z^0$ in the STF needs to be zero. This corresponds with the requirement that the high pass filter should have 1 as the first value of its impulse response. This can be accomplished by multiplying the high pass filter with a certain coefficient (larger than 0), resulting in a HF gain which is larger than 1. With the above in mind, we obtain for the NTF:

$$NTF(z) = \frac{1.00 - 4.00 z^{-1} + 6.00 z^{-2} - 4.00 z^{-3} + 1.00 z^{-4}}{1.00 - 3.13 z^{-1} + 3.75 z^{-2} - 2.03 z^{-3} + 0.42 z^{-4}} \tag{8}$$

This results in the following coefficients in the feed-forward structure:

$$c_1 = 0.8707115357, \quad c_2 = 0.3594322506, \quad c_3 = 0.0811807847, \quad c_4 = 0.0083240406 \tag{9}$$
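The route from a Butterworth high-pass NTF to the coefficients of Eq. (9) can be sketched as follows. This is a minimal illustration, not the authors' code: it assumes a bilinear-transform Butterworth design at a sample rate of 64 × 44.1 kHz, drops the filter gain so that the first impulse response value equals 1, and solves the linear system at the arbitrarily chosen real points z = 2, 3, 4, ... All function names are hypothetical.

```python
import cmath
import math

FS = 64 * 44100  # DSD sample rate in Hz


def butterworth_highpass_poles(order, cutoff_hz, fs=FS):
    """z-plane poles of a digital Butterworth high-pass (bilinear transform)."""
    wc = math.tan(math.pi * cutoff_hz / fs)  # pre-warped cut-off frequency
    poles = []
    for k in range(order):
        theta = math.pi * (2 * k + order + 1) / (2 * order)
        s = wc * cmath.exp(1j * theta)   # analogue high-pass pole (left half plane)
        poles.append((1 + s) / (1 - s))  # bilinear map to the z-plane
    return poles


def inverse_ntf(z, poles):
    """1/NTF(z) for NTF(z) = (1 - z^-1)^n / prod_k (1 - p_k z^-1), real z != 1."""
    value = 1.0 + 0.0j
    for p in poles:
        value *= (1 - p / z) / (1 - 1 / z)
    return value.real  # the imaginary parts of conjugate pole pairs cancel


def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def feedforward_coefficients(order, cutoff_hz, fs=FS):
    """Coefficients c_1..c_n such that 1/NTF(z) - 1 = sum_k c_k h(z)^k."""
    poles = butterworth_highpass_poles(order, cutoff_hz, fs)
    rows, rhs = [], []
    for m in range(order):
        z = 2.0 + m                  # arbitrary distinct real evaluation points
        h = 1.0 / (z - 1.0)          # integrator h(z) = z^-1 / (1 - z^-1)
        rows.append([h ** (k + 1) for k in range(order)])
        rhs.append(inverse_ntf(z, poles) - 1.0)
    return solve_linear(rows, rhs)


print([round(c, 7) for c in feedforward_coefficients(4, 150e3)])
```

Under these assumptions the fourth order, 150 kHz design returns values that agree with Eq. (9) to within about 1e-5.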

For the feed-forward structure, the STF is now given by:

$$STF(z) = \frac{0.00 + 0.87 z^{-1} - 2.25 z^{-2} + 1.97 z^{-3} - 0.58 z^{-4}}{1.00 - 3.13 z^{-1} + 3.75 z^{-2} - 2.03 z^{-3} + 0.42 z^{-4}} \tag{10}$$

For the feedback structure, the STF is given by:

$$STF(z) = \frac{z^{-4}}{1.00 - 3.13 z^{-1} + 3.75 z^{-2} - 2.03 z^{-3} + 0.42 z^{-4}} \tag{11}$$

In Fig. 12, the different STFs for a feed-forward and a feedback structure, with an identical NTF, have been calculated. The NTFs are designed as 4th order Butterworth high pass characteristics, with a cut-off frequency of 150 kHz. Clearly, the strong roll-off characteristic of the feedback structure can be observed. Interestingly, the feed-forward topology displays a strong peak in its transfer characteristic at the cross-over frequency. This feature is not obvious from Eq. (3) if only the magnitude response |H| is used. The maximum peak height is in this case about 6 dB. This loop-filter design gives rise to an SDM with a maximum input of about -5 dB (i.e., 0.57 w.r.t. the feedback signal from the quantizer). At an input of a sine with an amplitude of 0.5, the (unweighted) Signal to Noise Ratio (SNR) in the band 0-20 kHz is about 97 dB. In SACD applications, this is not sufficient: a signal-to-noise ratio of at least 100 dB is desirable. However, one might argue that the A-weighted SNR is much better, because the noise floor is large only for frequencies close to 20 kHz. Indeed, for this example, the A-weighted SNR amounts to about 105 dB.

[Figure 12 plot: Gain (dB) versus frequency (Hz, 10 Hz to 10 MHz), traces FF.STF and FB.STF.]

Figure 12: Signal transfer functions for a feed-forward topology (red) and a feedback topology (green) with identical NTFs.

More important is the maximum modulation depth of the modulator. The definition of the 0 dB level in SACD is 50% modulation depth, i.e., the sine wave from the previous example would correspond to 0 dB SACD exactly. Peaks in the signal of +3.1 dB are allowed (though for a short period only) (footnote 2). Hence, the SDM needs to be stable for inputs up to a level of about 0.71. For every SDM design, there is a trade-off between stability of the modulator and the SNR in the base-band. As an example, consider the results in Table 1 for different 4th order SDMs, which have all been created using Butterworth high pass filters as design NTF. Clearly, for these modulators it is not possible to obtain a dynamic range (unweighted) exceeding 100 dB, while maintaining the possibility of seriously overloading the SDM to a level of +3.1 dB.

cut-off (kHz)    DR (dB)    max. input level
100              85         0.77
120              90         0.70
150              97         0.57
170              100        0.49

Table 1: Trade-off of the maximum input range and the SNR in the base-band.

One way of increasing the SNR in the audio band, while hardly reducing the maximum input level, is to use higher order filters for the NTF, and to use a Chebyshev type II-like high pass filter for the NTF design instead of a Butterworth characteristic. Chebyshev type II high pass filters can easily be created in SDMs by the construction of resonator sections, as displayed in Fig. 13. The construction in Fig. 13 is, in principle, applicable to a feed-forward topology; for a feedback topology, a similar arrangement with a feedback loop over two integrator sections is possible. In Fig. 13, two outputs of the resonator section are indicated, $R_1$ and $R_2$; the relation between these is $R_2(z) = h(z) R_1(z)$, designating the transfer characteristic of the integrator section as $h(z) = z^{-1}/(1-z^{-1})$. Also, two different realizations of the feedback path (with coefficient $f$) are possible. The full drawn curve in Fig. 13 does not incorporate the delay that the dotted realization does. The effects of the dotted feedback structure can be obtained as follows. The transfer $R_2(z)$ of the resonator section becomes:

$$R_2(z) = \frac{h^2(z)}{1 + f\,h(z)^2} \tag{12}$$

Footnote 2: These and other audio requirements are in part 2 of the SACD scarlet book [10].

[Figure 13 diagram: two cascaded integrator sections with outputs R1 and R2.]

Figure 13: A cascade of two integrator sections in a SDM, with a feedback loop between the integrators. The two different ways of incorporating the feedback loop result in slightly different pole characteristics. Indicated are the two different outputs, which are characterized by a transfer function $R_1(z)$ and $R_2(z)$, respectively.

This function has a pole at $z_p$ when $z = z_p$ solves $(1+f)z^{-2} - 2z^{-1} + 1 = 0$, i.e.,

$$z_p = 1 \pm i\sqrt{f} \tag{13}$$

Hence, the norms $|z_p| > 1$. The reduced frequencies $f_{pole}$ of these poles are thus given by

$$f_{pole} = \mathrm{atan}(\sqrt{f}) \tag{14}$$

In the case of the full feedback path in Fig. 13, the resonator has transfer functions $R_1(z)$, $R_2(z)$ given by:

$$R_1(z) = \frac{h(z)}{1 + z f\,h(z)^2}; \qquad R_2(z) = h(z) R_1(z) \tag{15}$$

In this case, the poles are given by

$$z_p = 1 - \frac{f}{2} \pm \frac{i}{2}\sqrt{4f - f^2} \tag{16}$$

Contrary to the previous case, these poles are exactly on the unit circle. The pole frequencies are given by:

$$f_{pole} = \mathrm{acos}\left(1 - \frac{f}{2}\right) \tag{17}$$

which, for small values of $f$, virtually coincides with the pole frequencies given by Eq. (14). As such a feedback loop over two integrator sections transforms the two poles at DC ($z^{-1} = 1$) into two complex conjugate poles away from DC, care should be taken that there is enough DC gain in the loop-filter to avoid DC drift.

As an example, consider the 4th order SDM with a Butterworth design, corner frequency 150 kHz. Choosing the poles to move from DC to 10 and 19 kHz, the numerical values of the feedback coefficients obtained are 0.000496 and 0.001789. The SDM obtained has a maximum input of 0.57 (0.57 without resonators) and a SNR of 107 dB (97 dB without resonators). Indeed, the addition of the poles, turning the Butterworth characteristic into a Chebyshev II-like characteristic, gives a significantly better SNR; the DC suppression of the loop-filter is still better than 120 dB, which is sufficient. Compared to the A-weighted SNR figures, the improvement is less, because the poles primarily serve to suppress the noise between 10 and 20 kHz. A further improvement can be obtained when using a fifth order SDM, with a Butterworth NTF design (corner frequency 110 kHz) plus the poles at 10 and 19 kHz: in that case the SDM is stable for inputs up to 0.58, with a SNR of 120 dB. Note that in this case there is still 1 integrator with a pole at DC, and thus there cannot be any DC drift. To clarify the operation of such a SDM, pseudo-code of the SDM is provided in App. A.

[Figure 14 diagram: integrator with clip level +C.]

Figure 14: Principle of a clipped integrator. The absolute value of the output of the integrator cannot exceed a value of C.

A drawback of the above implementations of resonator sections is that the resulting filter is not minimum phase; due to this, not the full potential that noise shaping offers can be realized. Although the improvement that can be realized by a minimum-phase filter is (in this case) limited, a very interesting suggestion is the following (footnote 3). Suppose that we create a resonator section which contains both the dotted and the full drawn realization of the feedback. Denote the feedback coefficient in the full drawn realization by $f_1$, and the feedback coefficient in the dotted structure by $f_2$. The poles are then given by

$$z_p = \left(1 - \frac{f_1}{2}\right) \pm i\sqrt{(1+f_2) - \left(1 - \frac{f_1}{2}\right)^2} \tag{18}$$

and have $|z_p| = \sqrt{1+f_2}$, with reduced pole frequencies $f_{pole} = \mathrm{atan}\left(\sqrt{\frac{1+f_2}{(1-f_1/2)^2} - 1}\right)$. Hence, the radius and pole position can be adjusted independently, and it is possible to have $|z_p| < 1$ at the cost of an additional feedback path in the resonator section.
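The feedback coefficients quoted for poles at 10 and 19 kHz can be reproduced by inverting the pole-frequency relations for the two feedback realizations. The sketch below assumes that the 'reduced frequency' is the angular frequency 2πf/fs at fs = 64 × 44.1 kHz; the function names are illustrative.

```python
import math

FS = 64 * 44100  # DSD sample rate in Hz


def feedback_for_pole(freq_hz, delayed, fs=FS):
    """Feedback coefficient f that places the resonator pole at freq_hz."""
    w = 2 * math.pi * freq_hz / fs       # reduced (angular) frequency
    if delayed:
        return math.tan(w) ** 2          # dotted path: w = atan(sqrt(f))
    return 2 * (1 - math.cos(w))         # full drawn path: w = acos(1 - f/2)


def resonator_pole(f, delayed):
    """Upper-half-plane pole of the resonator section for coefficient f."""
    if delayed:
        return complex(1.0, math.sqrt(f))                    # 1 + i sqrt(f)
    return complex(1 - f / 2, math.sqrt(4 * f - f * f) / 2)  # on the unit circle


for freq in (10e3, 19e3):
    for delayed in (False, True):
        f = feedback_for_pole(freq, delayed)
        print(freq, delayed, round(f, 6), abs(resonator_pole(f, delayed)))
```

For the non-delayed path the results round to 0.000496 and 0.001789, and the pole magnitude equals 1; for the delayed path the magnitude is slightly larger than 1, as stated in the text.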

Footnote 3: This observation has been made by Prof. S.P. Lipshitz.

5.2 Enforcing SDM stability

So far, we have not bothered about what happens if the SDM input exceeds its maximum: the SDM gets into wild oscillations, with constantly increasing amplitude in the integrator states and decreasing frequency. Even worse, when the input is removed from the system, the SDM does not return to its original state. To avoid such a situation, it is customary to use clippers in each integrator stage. In Fig. 14, a schematic representation of a clipped integrator is given. The idea is that the output of the integrator can never exceed its clip value, C. In other words, the integrator section simply stops integrating when the clip level C has been reached. The purpose of these clippers is to avoid a situation where the values in the integrator stages get too high (and cause the SDM to start to oscillate), while still allowing integrator values which occur during normal operation. Whereas the main purpose of the clippers is to let the SDM return to normal operation after overload, it is also desirable to avoid serious distortion in the signal if clipping occurs.

Heuristic ways of obtaining reasonable numerical values for the clipper levels are monitoring the integrator levels during very large sine wave inputs and square wave inputs, close to overload of the SDM. The clipper levels C1 and C2 of the first 2 integrator stages can be set according to these values. If the higher integrator stages are assigned values according to this recipe as well, the situation occurs that the SDM returns to normal operation after overload, but can have all clippers activated simultaneously. This will cause serious clicks and pops (especially if the first integrators run into their clippers). Hence, the clippers should be designed such that the high order clippers are activated first, before the low order clippers are activated.

As an example, let us consider the fifth order SDM designed previously. Its feed-forward coefficients are:

c1 = 0.79188240; c2 = 0.30454538; c3 = 0.06992965; c4 = 0.00949572; c5 = 0.00060680

with resonator coefficients:

f1 = 0.000496; f2 = 0.001789

The pseudo-code of this SDM is provided in App. A, suitable for easy implementation in any programming language. Without any clippers, the SDM is stable for sine inputs up to 0.58; for higher amplitudes, the SDM gets fully unstable. Looking at the maximum integrator values during operation close to overload, we obtain values C1 and C2 for the first and second clipper of about 4 and 9, respectively. The following clipper values are chosen such that the product Ci·ci of the clipper value and the corresponding feed-forward coefficient is reduced by a factor of about 1.5 - 2 per integrator stage. This is illustrated in Table 2.

From Table 2, we can obtain some idea about the influence of the clippers on the SDM operation. The clippers are sometimes activated during operation at 0.5 input level, which causes a small reduction in SNR with respect to the 120 dB without clippers. However, whereas the original SDM turned unstable at inputs of 0.59, its clipped version shows continuous stable operation. Even at inputs of 0.65, the first integrator is not clipped, indicating that the signal distortion is still limited, and highly audible clicks are absent. In fact, only at input levels exceeding 0.75 will the initial integrator clip, which causes a clearly audible effect. At the level of 0.75, the SNR has dropped to about 60 dB. As an alternative to clipping in the SDM, clipping before the SDM might be considered. However, in this case dynamic range must be sacrificed, although the resulting system is unconditionally stable for large inputs.

Integrator    Ci     ci              ci·Ci
1             4      0.7918824022    3.16
2             9      0.3045453872    2.7
3             25     0.0699296548    1.75
4             92     0.0094957213    0.87
5             700    0.0006068024    0.42

Input level    C1    C2     C3      C4      C5       SNR (dB)
0.5            0     0      0       0       1836     118
0.55           0     0      0       0       6595     117
0.59           0     0      12      57      16285    107
0.60           0     5      48      175     18829    104
0.65           0     512    2283    3258    38155    67

Table 2: Determination of clipper values for a SDM (above) and the influence of the clippers on the normal SDM operation (below). The columns with clippers Ci indicate the number of times a clipper was activated in a run of 300,000 samples.

The above route has given a complete design example of a modulator reaching the 'magical' 120 dB SNR limit. In practice, however, it is dubious whether 120 dB SNR is necessary. As most electronic equipment seems to be closer to 110 dB, and human hearing seems not capable of reaching a dynamic range of more than 100 dB, 110 dB SNR in the SDM design seems realistic. As an alternative, one might consider a SDM design according to a Butterworth NTF design with a corner frequency of 95 kHz. The resonator poles remain unchanged. The coefficients for such a modulator are:

c1 = 0.68402124; c2 = 0.22813609; c3 = 0.04563584; c4 = 0.00542804; c5 = 0.00030590

with clipper levels C1 = 5; C2 = 12; C3 = 40; C4 = 200; C5 = 1100. This SDM is stable (without clipping) up to inputs of about 0.65, while reaching a SNR figure of about 115 dB. These figures seem to represent a very agreeable compromise between dynamic range and maximum allowable input. However, for every application this balance should be re-judged.
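To make the role of the clippers concrete, the following is a minimal sketch of a fifth order feed-forward SDM with clipped integrators, using the coefficients and clipper levels of the 110 kHz design. It is not the pseudo-code of App. A. It assumes delaying integrators, resonator feedback around integrators 2-3 and 4-5 in the delayed ('dotted') form, and a quantizer that maps zero to +1; these choices are assumptions made for illustration.

```python
C_FF = [0.79188240, 0.30454538, 0.06992965, 0.00949572, 0.00060680]  # c1..c5
F_RES = [0.000496, 0.001789]           # resonator coefficients f1, f2
CLIP = [4.0, 9.0, 25.0, 92.0, 700.0]   # clipper levels C1..C5 (Table 2)


def clip(value, limit):
    """Limit value to the range [-limit, +limit]."""
    return max(-limit, min(limit, value))


def sdm5(samples):
    """Run the sketched 5th order SDM; return (bits, clip_counts, final_states)."""
    s = [0.0] * 5
    clip_counts = [0] * 5
    bits = []
    for x in samples:
        # Quantizer input: weighted sum of all integrator outputs.
        v = sum(c * si for c, si in zip(C_FF, s))
        y = 1 if v >= 0 else -1
        # Inputs of the five delaying integrators, taken from the current states.
        inputs = [x - y,
                  s[0] - F_RES[0] * s[2],
                  s[1],
                  s[2] - F_RES[1] * s[4],
                  s[3]]
        for i in range(5):
            unclipped = s[i] + inputs[i]
            clipped = clip(unclipped, CLIP[i])
            if clipped != unclipped:
                clip_counts[i] += 1
            s[i] = clipped
        bits.append(y)
    return bits, clip_counts, s


bits, counts, states = sdm5([0.25] * 20000)
print(sum(bits) / len(bits), counts)
```

For a DC input well below the overload level, the average of the output bits follows the input, and every integrator state stays within its clipper level by construction.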

6 Design of SDM modulators: II

In the previous section, a design method was presented which in general leads to SDMs of good performance with a Butterworth high-pass type NTF. However, sometimes there may be specific demands which necessitate the use of other designs. An example of such a demand may be the specification of a limited amount of HF noise in the band above 40 kHz. Though several designs exist which allow for this, we will outline two.

[Figure 15 plot: trace test.AvgPwr versus frequency, 10 Hz to 10 MHz.]

Figure 15: Example of a SDM which has been created according to an NTF design by cascading a third order high pass filter and a fourth order high pass filter.

The first, in line with the previous section, consists of cascading 2 (or more) high pass filters, which then make up the SDM NTF. For example, one could wish to create a SDM which is third order starting from 150 kHz, and then turns 7th order at about 40 kHz. An example of such a design is given in Fig. 15. That SDM has been obtained by designing an NTF as a cascade of a third order Chebyshev high pass filter, with a corner frequency of 150 kHz, and a fourth order filter of the same type with a corner frequency of 40 kHz. The cascade is hence 7th order below 40 kHz, and in this way some of the merits of a low order and a high order SDM can be combined.

A more heuristic approach is to set each coefficient $c_i$ in the SDM to a fraction of its previous coefficient $c_{i-1}$. An example of such a SDM in non-delayed feed-forward topology is given in Fig. 16, which represents a 7th order SDM where each coefficient is 0.475 times its previous coefficient. Note that this is really a 'recipe'; the actual performance of the SDM is also determined by its topology (e.g., a SDM with delayed feedback topology would be unstable with these coefficients). It is interesting to see that a NTF characteristic as displayed in Fig. 16 can be approximated by a cascade of first order filters with different corner frequencies. In that case, there is full control over the SDM design.

[Figure 16 plot: trace test.AvgPwr versus frequency, 10 Hz to 10 MHz.]

Figure 16: Example of a SDM which has been created by setting each feed-forward coefficient $c_i$ to $0.475\,c_{i-1}$ ($c_1 = 1$).

7 Signal processing

A crucial point in any audio chain is signal processing, ranging from simple volume adjustments to complex equalizations. It is immediately apparent that a direct translation of the PCM way of signal processing does not exist in DSD. For example, if a DSD signal is volume-adjusted with a gain g = 0.123456, the resulting output (the one-bit signal multiplied with g) is a multi-bit word. Hence, any signal processing for DSD always consists of a cascade of the actual processing step, followed by a re-quantization as shown in Fig. 17. It is possible to contract some signal processing steps and the SDM re-modulator. An example, where an IIR filter is contracted with a SDM, is shown in Fig. 18. It is important to note, however, that such a device is not different from the cascade of signal processing and remodulation, although the intermediate multi-bit path is absent.

[Figure 17 diagram: DSD input, Gain, multi-bit intermediate (high rate), DSD output; and DSD input, IIR, multi-bit intermediate, DSD output.]

Figure 17: Examples of DSD signal processing: gain adjustment and filter operations.

[Figure 18 diagram: DSD input, chain of delay elements T, DSD output.]

Figure 18: Contraction of IIR filter characteristic and SDM, giving a structure with DSD input and DSD output.

To obtain a realizable system, a low pass filter is generally necessary, as indicated in Fig. 19. The reason for this is that the SDM which is used as a re-modulator cannot cope with the high signal levels the DSD presents. As virtually all of the power of these signals is above 100 kHz, a low pass filter operating above this frequency is sufficient to remove enough power such that the re-modulator remains in stable operation. In this respect, the feed-forward and feedback structures have quite different behaviour. As elaborated in Sec. 5, the feed-forward structure has little suppression of the input signal over the whole band (up to Nyquist), and sometimes even a gain just at the corner frequency of the NTF filter characteristic. The feedback structure, on the contrary, has strong suppression of the input signal from the aforementioned corner frequency (see also Fig. 12). Hence, a feed-forward SDM will need more severe filtering of its input signal compared to a feedback SDM in order to maintain stability. The response of a (64 taps) FIR filter which gives sufficient HF suppression to allow subsequent re-quantization is shown in Fig. 20. The total signal transfer characteristic of the cascade of a feed-forward SDM and this filter will be roughly identical to the STF of a feedback SDM.

[Figure 19 diagram: DSD input, Gain, multi-bit intermediate (high rate), DSD output.]

Figure 19: Advisable way of performing two operations on DSD data. First, a gain adjustment is applied, after which an IIR filter operation is applied without leaving the intermediate high rate, multi-bit domain.

[Figure 20 plot: Magnitude (dB) versus frequency (Hz, up to 350 kHz), trace Total.]

Figure 20: Transfer function of a filter which can be used to remove the HF of a DSD signal, such that it can be input to a subsequent SDM.

Clearly, the application of such a filter will turn the 1-bit signal directly into a multi-bit signal. It is therefore important to realize that the benefits of DSD are in the high sample rate; they are not in the fact that DSD is 1-bit! The importance of this remark is further emphasized by the following notion: suppose that a sequence of signal processing steps is necessary. If each of these steps is built according to Fig. 17, the total signal path will contain multiple requantizations. As a result of this, build-up of HF noise will occur. This effect is illustrated in Fig. 21, where the effect of multiple requantizations is displayed schematically. This figure can be explained as follows. If we have a DSD signal, its noise starts to rise above 20-30 kHz, and reaches an almost flat level at about 90 kHz. If, in a subsequent re-quantization, the bandwidth of DSD is maintained, the signal is low pass filtered at a frequency of about the same value (90 kHz). If this signal is fed to a next SDM, its output signal will contain both its own quantization noise and the quantization noise that has been input to it. If this cascade is repeated, it is easy to see why there will be a build-up of HF noise in the area of about 80-90 kHz. Eventually, this signal will be large enough to drive the SDM into its clippers, or, worse, into instability. This effect is shown on the right of Fig. 21; as the number of requantizations increases, the signal quality drops slowly. At the moment that the HF noise is large enough to activate the clippers, the signal quality drops rapidly.

[Figure 21 diagrams: Amplitude (dB) versus frequency (kHz), with markers at 20 and 100 kHz; signal quality versus number of requantizations.]

Figure 21: Schematic presentation of the effect of multiple quantizations.

Hence, all signal processing should be done in a multi-bit domain; only after the final signal processing step should the conversion to 64fs 1-bit signals be made.
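The recommended order of operations (all processing in the multi-bit domain, one requantization at the end) can be sketched as below. The sketch makes simplifying assumptions that are not in the text: the re-modulator is a first-order SDM, and the HF-removal filter is a plain 64-tap moving average standing in for the FIR of Fig. 20. Function names are illustrative.

```python
def first_order_sdm(samples):
    """Requantize a multi-bit sequence (|x| <= 1) to +/-1 with a first-order SDM."""
    state = 0.0
    bits = []
    for x in samples:
        y = 1 if state >= 0 else -1
        state += x - y
        bits.append(y)
    return bits


def moving_average(samples, taps=64):
    """Causal moving-average low-pass with unity DC gain (zero initial history)."""
    window = [0.0] * taps
    acc = 0.0
    out = []
    for n, x in enumerate(samples):
        acc += x - window[n % taps]   # add newest sample, drop the oldest
        window[n % taps] = x
        out.append(acc / taps)
    return out


def process_dsd(bits, gains, taps=64):
    """Apply all gains in the multi-bit domain, low-pass, then requantize once."""
    multibit = [float(b) for b in bits]
    for g in gains:
        multibit = [g * v for v in multibit]   # values are no longer +/-1
    filtered = moving_average(multibit, taps)
    return multibit, first_order_sdm(filtered)


dsd_in = first_order_sdm([0.5] * 8192)               # 1-bit test signal with DC level 0.5
multibit, dsd_out = process_dsd(dsd_in, [0.5, 0.5])
print(sorted(set(multibit)), sum(dsd_out) / len(dsd_out))
```

The intermediate signal takes the values ±0.25, i.e. it is multi-bit, while the single requantization at the end delivers a 1-bit stream whose average is close to 0.5 × 0.5 × 0.5 = 0.125.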

8 Dithering and linearizing SDMs

SDMs are devices with a quantizer; as we are used to with the quantizers from the PCM world, we need to linearize the devices that use a quantizer. With the multi-bit quantizing PCM devices, it is common knowledge that the quantizers need to be dithered with TPDF dither (dither distributed according to a Triangular shaped Probability Density Function) with a full width at half height of 1 LSB [6]. Such dither can easily be obtained by adding 2 random numbers from a uniform distribution of width 1 LSB. For a SDM, this recipe is a contradiction in terms, since the quantizer spans only one bit and hence cannot accommodate the aforementioned TPDF dither, which spans 2 bits. Still, dithering in what we will coin the 'classical' sense is a very useful technique and has been well researched; see [8] and [9] and references therein. Even so, new dither techniques are being discussed which are more appropriate for 1-bit coders; see, e.g., [3].

Next, we will discuss some aspects of dithering in the classical sense. As dither is used to remove the effect of non-linearity, we can distinguish two different appearances of the non-linearity: limit cycles (footnote 4) on the one hand, and idle tones and distortion on the other. As the idle tones and distortion are heavily suppressed by the loop-filter, we will ignore them for the moment. In Sec. 9, a more detailed discussion about non-linearity in an SDM is presented. Limit cycles, however, can be very annoying: they can appear in the audible range and, even in the audible range, have high power. Consider, for example, an SDM with the topology at the top of Fig. 7, characterized by the following feed-forward coefficients: c1 = 2048; c2 = 768; c3 = 128; c4 = 16; c5 = 1. Clearly, this SDM is extremely well suited for implementation in hardware, as the coefficients represent simple powers of 2, except c2, which is the sum of two powers. Its spectrum, for zero input, is displayed in Fig. 22; it does not show any resemblance to the familiar noise-shaped curve: it is a limit cycle. A limit cycle is a purely repetitive pattern of certain length; for example, a repeated zero-mean sequence of eight bits, such as the pattern 0x96 of Sec. 4.1 (+1, -1, -1, +1, -1, +1, +1, -1), represents zero input and is a limit cycle of length 8. The limit cycle in Fig. 22 has length 32, as can be read from its fundamental at 88 kHz.

Footnote 4: In the literature, these tones are sometimes also called idle tones. We reserve the name idle tones for signals which are not purely repetitive; see also Sec. 9.

[Figure 22 plot: Power (dB, -300 to -50) versus frequency (Hz, 100 Hz to 1 MHz).]

Figure 22: Example of a limit cycle occurring in a SDM with zero input (green). In red, the spectrum after application of dither (also zero input) is shown.

[Figure 23 plot: Power (dB, -220 to -20) versus frequency (100 Hz to 1 MHz).]

Figure 23: A noise shaper which is typically used in SACD applications. The spectrum has been coherently averaged 100 times, and this has been repeated 10 times to obtain a power averaged spectrum.

Fortunately, little needs to be done to break up the limit cycle. For example, any input signal exceeding an amplitude of -90 dB will remove the limit cycle completely. To allow for digital silence, though, the use of dither is required, and a very useful way is by applying dither with a rectangular PDF (RPDF dither) just before the quantizer. In the case of the SDM we are discussing here, an appropriate amount of dither has a pdf with a width of 200 (and a mean of 0), and needs to be added immediately before the quantizer. The resulting spectrum is displayed in Fig. 22. Adding the dither at this point has the advantage that the dither will become noise-shaped too (like the quantization error), and the increase in noise floor will be marginal. In this case, the undithered SDM has a dynamic range of 98.4 dB (full scale SACD), whereas the dithered SDM has a dynamic range of 98.0 dB. The maximum input, before the SDM turns unstable, has been reduced from 0.7104 to 0.7098 for an input of a 1 kHz sine wave. Hence, this amount of dither has hardly any drawbacks, and significant advantages. The distortion introduced by the SDM amounts to -150 dB in the band 0-20 kHz (see Sec. 9 for a more detailed discussion about non-linearity in an SDM). The dither added to the quantizer will hardly change that number, but it is disputable that this amount of distortion (in PCM, this would have been below the 25 bit level) would lead to audible effects.
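The two dither types mentioned above can be sketched in a few lines. The code below is a minimal illustration with hypothetical function names: RPDF dither is a single uniform random number, TPDF dither is the sum of two such numbers of width 1 LSB, and for the 1-bit quantizer the RPDF dither is added to the quantizer input just before the sign decision.

```python
import random


def rpdf_dither(width, rng):
    """Rectangular-PDF dither: uniform in [-width/2, +width/2], mean 0."""
    return rng.uniform(-width / 2, width / 2)


def tpdf_dither(lsb, rng):
    """Triangular-PDF dither: sum of two uniform numbers of width 1 LSB each."""
    return rpdf_dither(lsb, rng) + rpdf_dither(lsb, rng)


def dithered_one_bit_quantizer(v, width, rng):
    """1-bit quantizer with RPDF dither added immediately before the decision."""
    return 1 if v + rpdf_dither(width, rng) >= 0 else -1


rng = random.Random(2002)  # fixed seed, only to make the sketch repeatable
samples = [tpdf_dither(1.0, rng) for _ in range(100000)]
mean = sum(samples) / len(samples)
variance = sum((s - mean) ** 2 for s in samples) / len(samples)
print(round(mean, 3), round(variance, 3))  # variance close to 2 * (1/12) LSB^2
```

The TPDF samples span 2 LSB in total and have twice the variance of a single uniform number, which is why such dither cannot be accommodated in a quantizer that spans only one bit.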

9 Non-linearity in a SDM

To present a realistic situation, a spectrum of a SDM that is typically used in SACD applications is presented in Fig. 23. For the purpose of this discussion, this SDM has not been dithered. The input to this SDM has been a 4 kHz sine (-6 dB SACD amplitude). If we are interested in the base-band, extending from 0 to 20 kHz, the relevant distortion products are the 2nd up to the 4th component. From inspection of Fig. 23, it can be concluded that the distortion components are all at most -165 dB, where the noise in the FFT obscures any information deeper than that. The noise floor of this SDM is at -127 dB, resulting in a DR of about 120 dB (recall that the SACD reference 0 dB level has been defined as -6 dB with respect to the level in the feedback path). It is also instructive to extend the region of interest to the band 0-80 kHz. Obviously, the noise floor is increasing steeply (in the case presented in Fig. 23, this increase is fifth order), causing the maximum Signal-to-Noise Ratio (SNR) to drop to about 90 dB in the band 0-40 kHz, and about 55 dB in the band 0-80 kHz. Any harmonic distortion component, however, is at a level below -95 dB. Clearly, any harmonic distortion component that we are dealing with in the broader sense of the audio band is extremely small, and its importance for the perceptual audio quality can be doubted. In view of the fact that this SDM has not been dithered, it is clear that dithering will reduce these numbers even further. In fact, if this SDM is dithered to its maximum level (where it is just not overloaded), the distortion components in the audio band are all below -180 dB, only observable after 5000 coherent averages, and the components in the broader audio band are below -110 dB.

Figure 24: Example of an audio chain found in an SACD-capable player. The DSD is first low pass filtered in the digital domain, followed by up-sampling to m·64 fs, typically 128 or 256 fs. This high-rate signal is then fed to an n-bit SDM, where n typically varies between 1.5 and 5. Finally, the analog output is passed through an analog low pass filter. [Block diagram: DSD (64 fs), digital LPF, multi-bit signal (m·64 fs), n-bit SDM, n-bit signal (m·64 fs), DAC, analogue signal, analogue LPF.]

Still, the total amount of coherent power that is present in the dithered signal is significant. The amount of coherent power can easily be estimated if the actual noise is assumed to have no correlation with the signal. It appears that the total amount of coherent power which is present in Fig. 23 is about -10 dB. This power is mostly above 1 MHz; 99.99% of the coherent power is found in this high frequency area. The exact value of the frequency above which most of the correlated signal is found depends on the signal which is input to the SDM; it will, however, never be very much lower than the quoted 1 MHz. It is beyond doubt that the origin of these signals in the very high frequency area is in the non-linear behavior of the SDM.
Indeed, if a triangular pdf dithered multi-bit quantizer is used in the noise-shaper, the high frequency components disappear. Thus, the coherent signal above 1 MHz can be considered in some sense to be distortion.

To judge whether these distortion components are harmful, we need to look at the full audio chain which is used to replay DSD in a typical SACD-capable player. Such a configuration is shown in Fig. 24. A typical DAC-chip (see e.g. [1] or [7]) contains the first 4 blocks displayed in Fig. 24. The digital filter in the path leading to the n-bit SDM is a crucial part, where most of the HF signal present in the DSD signal can be removed without any compromise. As an example, consider a filter that is designed according to the following criteria: pass-band: 0-100 kHz, flat within 0.01 dB; transition band: 100 kHz - 900 kHz; stop band: 900 kHz - 1.4 MHz, suppression 100 dB. This leads to a filter with only 22 taps, and thus does not pose any additional constraint in terms of hardware; the filters which are necessary to do proper up-sampling from a low sample rate format to the required m·64 fs are much more demanding. Also, the digital LPF does not influence the impulse response of DSD [13], as the transition width is extremely large. It is clear that the application of this filtering will lead to significant suppression of the high frequency components present in the original DSD stream. Still, the signal contains substantial amounts of HF, which is foremost white noise.

The signal is then up-sampled to the frequency at which the digital-to-analog conversion is performed. The SDM will noise-shape this signal into an n-bit signal, where n typically varies between 3 [1] and 5 [7]. It is this signal which is converted to the analog domain. Due to the noise shaping process, which is intrinsic in modern, high-end DA converters and is the sole basis for their very high performance, some additional high frequency noise extending to frequency regimes well above 1 MHz is introduced. This noise is usually removed by an analog low pass filter of first or second order. This filtering is most often passive, and can thus be performed with exceptionally low distortion and inter-modulation. In most SACD players, some additional filtering is provided to reduce the amount of HF noise (which by then is mostly due to the DSD signal) even further, to levels well below -30 dB. It is important to remark that the HF signal levels at which these additional filters need to operate are quite low due to the digital pre-filtering (which removed a very substantial amount of HF signal, causing the total signal power to be substantially less than 1); hence, the linearity of the filters can be quite high and the filtering operation is performed without additional inter-modulation products.

This example of a typical SACD signal path shows that the non-linearity above 1 MHz is not important at all, and does not influence the signal quality. In fact, one can argue that these components are favorable. Because the total power of the SDM output is constant and equals 1, the power which is present in these high frequency tones causes the SNR in the lower frequencies to be higher than anticipated on the basis of the linear noise transfer function. Hence, they contribute favorably to the dynamic range of an SDM. This discussion then leads to the question whether it would be possible to linearize a SDM in the important signal band, without bothering about its high frequency behavior.
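The statement that the relaxed digital filter specification quoted in this section needs only about 22 taps can be made plausible with Kaiser's empirical order estimate for FIR low-pass filters, N ≈ (A - 7.95) / (2.285 Δω), with A the stop-band attenuation in dB and Δω the transition width in radians per sample. This is a sketch and not the design procedure used for the filter in the text; the function name and the choice of the DSD rate 64 fs = 2822400 Hz as the sample rate of the filter are assumptions.

```c
/* Kaiser's empirical estimate of the order of an FIR low-pass filter.
   atten_db:      required stop-band attenuation in dB
   transition_hz: width of the transition band in Hz
   fs_hz:         sample rate at which the filter runs (assumed here) */
double kaiser_fir_order(double atten_db, double transition_hz, double fs_hz)
{
    const double pi = 3.14159265358979323846;
    /* transition width in radians per sample */
    double delta_omega = 2.0 * pi * transition_hz / fs_hz;
    return (atten_db - 7.95) / (2.285 * delta_omega);
}
```

With A = 100 dB and a transition band from 100 kHz to 900 kHz, `kaiser_fir_order(100.0, 800e3, 2822400.0)` gives an order of about 22.6, i.e. a filter length in the low twenties, in line with the tap count quoted for this specification.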

9.1

Pre-correction

In order to have a system which clearly demonstrates the effects that we will study in this section, a third-order SDM has been designed. Such low order SDMs are notorious for their relatively bad signal properties [9]. The spectrum of the third order SDM that will be used in the remainder of this paper is shown in Fig. 25. While this third-order SDM has a dynamic range of about 90 dB, its third harmonic is at a level of -104 dB. While this is still a rather respectable number, it is about 60 dB larger than the distortion component of the SDM shown in the previous section. The higher order harmonic distortion products are significant, too. Also in the broader signal band (0-80 kHz) the distortion components are larger. It should be remarked that this type of SDM is not recommended for practical use.

Figure 25: Spectrum of the third order noise-shaper used in the analysis of the pre-correction technique. The input signal is a 3 kHz sine wave, -6 dB SACD. To obtain this spectrum, a series of 4 coherent averages and 10 power averages has been used. [Axes: power (dB) versus frequency, logarithmic from 100 Hz to 1 MHz.]

When we model the SDM as a non-linear element $\phi$, its transfer characteristic can be written as:

\[ \phi(x) = x + \alpha_2 x^2 + \alpha_3 x^3 + \ldots \tag{19} \]

Now, if we could create a signal $s(x)$ according to:

\[ s(x) = x - \alpha_2 x^2 - \alpha_3 x^3 - \ldots \tag{20} \]

then the resulting output signal $\phi(s(x))$ would be given by:

\[ \phi(s(x)) = x - 2\alpha_2^2 x^3 + O(x^4) \tag{21} \]
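The step from Eqs. (19) and (20) to Eq. (21) can be made explicit by substituting $s(x)$ into the series and collecting powers of $x$. The symbols $\phi$ and $\alpha_i$ for the SDM characteristic and its distortion coefficients are the notation assumed here.

```latex
\begin{align*}
s(x)^2 &= x^2 - 2\alpha_2 x^3 + O(x^4), \qquad s(x)^3 = x^3 + O(x^4),\\
\phi(s(x)) &= s(x) + \alpha_2\, s(x)^2 + \alpha_3\, s(x)^3 + \ldots\\
           &= \left(x - \alpha_2 x^2 - \alpha_3 x^3\right)
              + \alpha_2\left(x^2 - 2\alpha_2 x^3\right)
              + \alpha_3 x^3 + O(x^4)\\
           &= x - 2\alpha_2^2 x^3 + O(x^4).
\end{align*}
```

The $\alpha_3$ terms cancel as well as the quadratic ones, so the remaining cubic term is of second order in the small coefficient $\alpha_2$.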

In other words, the second harmonic distortion component has been completely removed, and the third harmonic component has been substantially reduced (note that for the low distortions we are dealing with, $\alpha_i \ll 1$). An estimate of the signal s(x) can be obtained using the structure depicted in Fig. 26.

Figure 26: Basic Sigma Delta Pre-Correction (SDPC) structure. [Block diagram: the input x feeds a first SDM and a delay; the difference signal v is filtered to F(v) and added to the delayed input to form s'(x), which drives a second SDM.]

The topology of Fig. 26 operates as follows. The first SDM generates a signal, which is subtracted from the original input signal x. This difference signal v now contains all the distortion components which are generated by the SDM, and the uncorrelated noise which has been added to the signal because of the noise shaping. This signal v is now low-pass filtered in the filter F, which has, for example, a cut-off frequency of 100 kHz. This results in the signal denoted F(v) in Fig. 26. Next, the original input signal x (after the appropriate delay to correct for the delay in the filter F) is added to F(v), resulting in the signal s'(x). While the filtering action has removed all HF noise, and in particular the strong signals above 1 MHz, it has not removed any noise in the band below 100 kHz. Hence, the signal s'(x) presents only an approximation to the signal s(x) in Eq. (20). The signal s'(x) is then input to a second SDM, which is identical to the SDM used to generate v, resulting in the final output signal y.

To gain some insight into the performance of this algorithm, which we will refer to as Sigma Delta pre-correction (SDPC), we have applied it to the third order SDM displayed in Fig. 25. The spectrum of the resulting signal y is displayed in Fig. 27 in the range 0-100 kHz. The large suppression of the distortion components is clearly visible. Typically, the distortion has been reduced by about 20 dB. For higher frequencies, the suppression becomes less effective, even though the signal s'(x) contains all distortion components unattenuated in this frequency regime. As always, there is a price to pay for this improvement in THD, which in this case is an increase in the noise floor by 3 dB. This is clear from inspection of Fig. 27, when one realizes that the corrected spectrum has been obtained using twice as many coherent averages, which lowers the noise floor by 3 dB, and that its noise floor is nevertheless identical to the noise floor of the uncorrected spectrum. This also corroborates the fact that this is white noise indeed; if it were correlated, it would result in an increase of more than 3 dB. The origin of the increase of the noise floor is the fact that the signal s'(x) still contains the quantization noise present in the low frequency range; the second SDM in the cascade adds its own quantization noise to it. Though not visible in Fig. 27, the high frequency signals above 1 MHz are completely unchanged using the new topology, which is expected on the basis of the absence of correction components in the signal s'(x).
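The argument above relies on the fact that doubling the number of coherent averages lowers the floor of uncorrelated noise by 3 dB, i.e. by a factor of two in power. A minimal numerical sketch of that property is given below; the pseudo-random generator, the function names and the use of uniform noise are illustrative assumptions and are unrelated to the simulations behind Fig. 27.

```c
#include <stdint.h>

/* Hypothetical noise source: 32-bit linear congruential generator,
   mapped to a uniform value in [-0.5, 0.5). */
static uint32_t lcg_state = 12345u;

static double noise_sample(void)
{
    lcg_state = 1664525u * lcg_state + 1013904223u;
    return (double)(lcg_state >> 8) / 16777216.0 - 0.5;
}

/* Mean power of n output points, each being the average of k
   pseudo-random noise samples (one per simulated record). */
double averaged_noise_power(int k, long n)
{
    double acc = 0.0;
    for (long i = 0; i < n; i++) {
        double avg = 0.0;
        for (int j = 0; j < k; j++)
            avg += noise_sample();
        avg /= k;
        acc += avg * avg;
    }
    return acc / n;
}
```

Comparing `averaged_noise_power(1, n)` with `averaged_noise_power(2, n)` for large n gives a power ratio close to 2, which is the 3 dB step used in the comparison of the two noise floors; a coherent component, being identical in every record, would not be reduced by the averaging.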

9.2

SDPC and dither

To appreciate the effect of SDPC, it is also instructive to study the combined action of dither and pre-correction. To that end, we have applied a dither level of 0.1 (the SDM starts overloading at levels of 0.8) to the SDM. Spectra of the original SDM and the SDPC spectrum are displayed in Fig. 28. Also in this case, the suppression of the distortion components is at least 22 dB in the band 0-20 kHz; in fact, even after 64 coherent averages, no distortion components can be observed. Note that distortion has decreased to levels below -135 dB! Hence, the combined action of small amounts of dither and the pre-correction technique results in extremely low distortion figures. Again, the reduced distortion suppression for higher frequencies is visible; for example, in the region above 40 kHz the suppression is typically only 8-10 dB.

Figure 27: Spectra of the original SDM (green), and its implementation according to Fig. 26 (red). The spectrum of the original SDM has been obtained using 4 coherent averages and 10 power averages; the other using 8 coherent averages and 10 power averages. The fact that the noise floors of the spectra coincide precisely illustrates the 3 dB loss in SNR due to SDPC. [Axes: power (dB) versus frequency, logarithmic from 1 kHz to 100 kHz.]

Figure 28: Spectra of the original (dithered) SDM (green), and its implementation according to Fig. 26 (red) using the same dither. The spectrum of the original SDM has been obtained using 8 coherent averages and 10 power averages; the other using 64 coherent averages and 5 power averages. [Axes: power (dB) versus frequency, logarithmic from 1 kHz to 100 kHz.]

While the higher harmonics are suppressed less than the lower harmonics, which is shown by Eq. (21), this does not fully explain the reduced suppression. Another origin of this reduced suppression for higher frequencies lies in the fact that the phase characteristic of the SDM used here (Fig. 29) is not straight for frequencies above 20 kHz. This results in some phase distortion, which is not accounted for in the pre-correction technique according to Fig. 26. To obtain an estimate of the significance of these errors, consider a single harmonic $h(t) = A \sin(\omega t)$, which is positioned around 50 kHz. The absence of phase correction will cause incomplete cancellation of the harmonic; a residual power of $4A^2\varphi^2$ will remain, with $\varphi$ the uncorrected phase error. In this case, this results in a maximum power reduction of the harmonic by only 14 dB.

Figure 29: Phase characteristic of the signal transfer function of the third order SDM used in this paper. [Axes: phase (rad), from -0.01 to 0.06, versus frequency, logarithmic from 100 Hz to 10 kHz.]

An improved pre-correction technique is therefore displayed in Fig. 30. In this diagram, the phase error introduced by the SDM is corrected for by the filter L. Another improvement can be obtained by cascading the structures displayed in Figs. 26 and 30. In a non-cascaded structure, the cancellation of lower order terms causes the generation of higher order terms, albeit of much lower amplitude, as can be concluded from Eq. (21). These new, higher order terms can in turn be canceled in exactly the same way as the lower order ones were canceled, which amounts to cascading the structure in Fig. 30.
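The order of magnitude of the 14 dB bound quoted above for a harmonic near 50 kHz can be checked from first principles. Suppose the correction signal reproduces the harmonic with the right amplitude but with a total phase mismatch $\Delta$ (a symbol introduced here for illustration only; it is not the notation of the text). Subtracting the two sinusoids gives:

```latex
\begin{align*}
A\sin(\omega t) - A\sin(\omega t - \Delta)
  &= 2A\,\sin\!\left(\tfrac{\Delta}{2}\right)
     \cos\!\left(\omega t - \tfrac{\Delta}{2}\right),\\
\text{suppression}
  &= -20\log_{10}\!\left(2\sin\tfrac{\Delta}{2}\right)\ \text{dB}
  \;\approx\; -20\log_{10}\Delta \quad (\Delta \ll 1).
\end{align*}
```

Under this assumption, a suppression of 14 dB corresponds to a mismatch of about 0.2 rad, and halving the mismatch buys roughly another 6 dB, which is the motivation for an explicit phase-correcting filter.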

Figure 30: Improved pre-correction structure. By cascading the Sigma Delta Pre-Correction structure (SDPC) n times, n harmonics can be removed. [Block diagram: the structure of Fig. 26 extended with the phase-correcting filter L.]

Figure 31: Fifth order SDM, with a 3 kHz input of -6 dB. The uncorrected spectrum (green) has been obtained after 16 coherent and 10 power averages; the corrected spectrum (red) after 2048 coherent and 10 power averages. [Axes: power (dB) versus frequency, linear from 10 kHz to 80 kHz.]

Figure 32: Fifth order SDM, with a DC input of 1/1024. The uncorrected spectrum (green) has been obtained after 4 coherent and 10 power averages; the corrected spectrum (red) after 32 coherent and 10 power averages. [Axes: power (dB) versus frequency, logarithmic from 100 Hz to 100 kHz.]

9.3

Performance of a realistic SDM with SDPC

To end with a realistic situation, and to show how SDPC also suppresses DC tones, a standard fifth order SDM has been designed, with a SNR of 118 dB over 0-20 kHz. As illustrated in Fig. 31, harmonic distortion levels of this SDM in the phase-corrected SDPC structure are reduced to well below -185 dB if undithered, which amounts to an improvement of about 35 dB, compared to the 20 dB improvement with the standard SDPC. If the SDM is slightly dithered, the distortion drops to much deeper levels, which appeared to be numerically inaccessible (i.e., below -220 dB). Also, distortion levels at higher frequencies are reduced more than with the standard SDPC algorithm. Compared with the uncorrected SDM, the SNR in the base-band (0-20 kHz) is slightly reduced, from 118 dB to about 115 dB (no dithering) or 114 dB (with dithering).

The effects of a DC input to the SDPC system are illustrated in Fig. 32. As input to this system, a DC value of 1/1024 has been applied, which results in a tone around 5.5 kHz. The SDM has not been dithered. In the spectrum of Fig. 32, a tone can be observed with an amplitude of about -145 dB. Application of the pre-correction algorithm in its basic form reduces this amplitude to about -165 dB. If a small amount of dithering (RPDF with amplitude 0.05) is applied, which is much less than the maximum allowed amount of dither (0.4 RPDF), the tone cannot be observed after 256 coherent averages, indicating that it lies below about -175 dB. Application of the improved SDPC likewise results in values for spurious signals that are not easily accessible numerically.
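How RPDF dither enters such a simulation can be sketched with the modulator listed in Appendix A. The version below is an illustration under several assumptions: it reuses the appendix coefficients as a stand-in for the fifth order SDM of this section, it takes an RPDF amplitude a to mean a uniform value in [-a, a) added to the quantizer input, and it uses a simple hypothetical pseudo-random generator. It returns the mean of the bit stream, which for a stable modulator stays close to the DC input.

```c
#include <stdint.h>

/* Hypothetical dither source: 32-bit LCG mapped to a uniform value in [-1, 1). */
static uint32_t dither_state = 1u;

static double rpdf_unit(void)
{
    dither_state = 1664525u * dither_state + 1013904223u;
    return (double)(dither_state >> 8) / 8388608.0 - 1.0;
}

/* Runs n clock cycles of the appendix SDM with DC input x and RPDF dither
   of amplitude dither_amp; returns the mean value of the 1-bit output. */
double sdm_mean_output(double x, double dither_amp, long n)
{
    const double c[5] = { 0.791882, 0.304545, 0.069930, 0.009496, 0.000607 };
    const double f[2] = { 0.000496, 0.001789 };
    double s0 = 0, s1 = 0, s2 = 0, s3 = 0, s4 = 0;
    double total = 0.0;

    for (long i = 0; i < n; i++) {
        double sum = c[0]*s0 + c[1]*s1 + c[2]*s2 + c[3]*s3 + c[4]*s4;
        /* the dither is added to the quantizer input only */
        int y = (sum + dither_amp * rpdf_unit() >= 0) ? 1 : -1;
        total += y;

        /* same state update order as in the appendix listing */
        s4 = s4 + s3;
        s3 = s3 + s2 - f[1]*s4;
        s2 = s2 + s1;
        s1 = s1 + s0 - f[0]*s2;
        s0 = s0 + (x - y);
    }
    return total / n;
}
```

A call such as `sdm_mean_output(1.0/1024, 0.05, 100000)` corresponds to the DC input and dither amplitude quoted above; inspecting the tone itself would require a spectrum of the bit stream rather than its mean.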


10

Acknowledgements

The authors want to thank Prof. S.P. Lipshitz, Prof. J. Vanderkooy, Dr. J.D. Reiss and H. ten Pierick for their valuable comments and proofreading of the manuscript.


A

SDM-code

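In this appendix, we provide C code for the SDM discussed in Sec. 5.2. The code simulates 100000 clock cycles of the SDM, with a DC input of 0.1, and prints the resulting bit stream.

    #include <stdio.h>

    int main(void)
    {
        /* Coefficients: feed-forward weights c and resonator feedback f */
        const double c[5] = { 0.791882, 0.304545, 0.069930, 0.009496, 0.000607 };
        const double f[2] = { 0.000496, 0.001789 };

        /* Initialization */
        double s0 = 0, s1 = 0, s2 = 0, s3 = 0, s4 = 0;  /* integrator states */
        double x, sum;
        int y = 1;                                      /* 1-bit output: +1 or -1 */
        const long N = 100000;
        long i;

        /* Main loop */
        for (i = 0; i < N; i++) {
            /* quantizer input: weighted sum of the integrator states */
            sum = c[0]*s0 + c[1]*s1 + c[2]*s2 + c[3]*s3 + c[4]*s4;
            if (sum >= 0) y = 1; else y = -1;
            printf("%d\n", y);                          /* emit the bit stream */

            x = 0.1;                                    /* DC input */

            /* state update, last integrator first */
            s4 = s4 + s3;
            s3 = s3 + s2 - f[1]*s4;
            s2 = s2 + s1;
            s1 = s1 + s0 - f[0]*s2;
            s0 = s0 + (x - y);
        }
        return 0;
    }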


References
[1] B. Adams, K. Nguyen, and K. Sweetland. A 116 dB SNR multi-bit noise shaping DAC with 192 kHz sample rate. In Proceedings of the 106th AES Convention, Munich, 1999. Preprint 4963.

[2] R.W. Adams, P.F. Ferguson, A. Ganesan, S. Vincelette, A. Volpe, and A. Libert. Theory and practical implementation of a fifth order sigma-delta A/D converter. J. Audio Eng. Soc., 39:515-528, 1991.

[3] M.O.J. Hawksford. Time-quantized frequency modulation with time dispersive codes for the generation of sigma-delta modulation. In Proceedings of the 112th AES Convention, Munich, 10-13 May 2002. Preprint 5618.

[4] H. Inose and Y. Yasuda. A unity bit coding method by negative feedback. Proc. IEEE, 51:1524-1535, 1963.

[5] H. Kato. Trellis noise-shaping convertors and 1-bit digital audio. In Proceedings of the 112th AES Convention, Munich, 10-13 May 2002. Preprint 5615.

[6] S.P. Lipshitz, R.A. Wannamaker, and J. Vanderkooy. Quantization and dither: a theoretical survey. J. Audio Eng. Soc., 40:355-375, 1992.

[7] S. Nakao, H. Terasaw, F. Aoyagi, N. Terada, and T. Hamasaki. A 117 dB D-range current-mode multi-bit audio DAC for PCM and DSD audio playback. In Proceedings of the 109th AES Convention, Los Angeles, 2000. Preprint 5190.

[8] S.R. Norsworthy and D.A. Rich. Idle channel tones and dithering in delta-sigma modulators. In Proceedings of the 95th AES Convention, New York, October 1993. Preprint 3711.

[9] S.R. Norsworthy, R. Schreier, and G.C. Temes. Delta-Sigma Converters, Theory, Design and Simulation. IEEE Press, New York, 1997.

[10] Philips and Sony. Super Audio CD System Description. Philips Licensing, Eindhoven, The Netherlands, 2002.

[11] D. Reefman and E. Janssen. Enhanced sigma delta structures for Super Audio CD application. In Proceedings of the 112th AES Convention, Munich, 10-13 May 2002. Preprint 5616.

[12] D. Reefman and P.A.C.M. Nuijten. Editing and switching in 1-bit audio streams. In Proceedings of the 110th AES Convention, Amsterdam, 12-15 May 2001. Preprint 5399.

[13] D. Reefman and P.A.C.M. Nuijten. Why Direct Stream Digital is the best choice as a digital format. In Proceedings of the 110th AES Convention, Amsterdam, 12-15 May 2001. Preprint 5396.

[14] S. Wolfram. The Mathematica Book. Wolfram Media/Cambridge University Press, Cambridge, 4th edition, 1999.
