SVERKER NYSTROM
IR-SB-EX-0501
Master of Science thesis work by Sverker Nyström, December 2004
Westinghouse/WesDyne TRC
Department of Signals, Sensors and Systems, Royal Institute of Technology (KTH)
Examiner: Björn Völcker
Contents
1 Abstract
2 Introduction
2.1 Background
3 Ultrasonic Equipment and Techniques
3.1 Ultrasonic Instruments
3.2 Ultrasonic Theory
3.3 Inspection Sequences
4 Compression Methods
4.1 Lossless Compression
4.1.1 Huffman Coding
4.1.2 Lempel-Ziv Coding
4.2 Lossy Compression
5 Transform Theory
5.1 Fourier Transform
5.2 Short Time Fourier Transform (STFT)
5.3 Wavelet Transforms
5.3.1 Scale and Translation
5.4 Standard Wavelets vs. Wavelet Packets
5.5 Two Dimensional Transforms
5.6 Transform Compression
5.7 Mathematical Error
6 File Compression
6.1 Hardware Compression
6.2 Compression Scheme
6.3 Lossy File Compression
6.4 Noise
7 Experimental Result and Evaluation
7.1 Signal Pre Processing
7.2 Error Analysis
7.2.1 Visual Error
7.2.2 Visual File Evaluation
7.3 Lossless Compression Results
7.4 Lossy Compression Evaluation Results
7.5 Reality Compression Tests
8 Conclusions and Future Work
9 References
Appendices
A1 Defect 4
A1.1 Mathematical Error Calculation Results
A1.2 Visual Pulse-echo Evaluation Results
A1.3 Pulse-echo Plots
A1.4 Visual TOFD Evaluation Results
A1.5 TOFD Plots
A2 Defect 5
A2.1 Mathematical Error Calculation Results
A2.2 Visual Pulse-echo Evaluation Results
A2.3 Pulse-echo Plots
A2.4 Visual TOFD Evaluation Results
A2.5 TOFD Plots
A3 Defect 8
A3.1 Mathematical Error Calculation Results
A3.2 Visual Pulse-echo Evaluation Results
A3.3 Pulse-echo Plots
A3.4 Visual TOFD Evaluation Results
A3.5 TOFD Plots
1 Abstract
This thesis was carried out at WesDyne International in Pittsburgh, USA, and at WesDyne TRC in Täby, Sweden. WesDyne International is a company that performs non-destructive testing using different inspection techniques such as ultrasonic and eddy current testing. The inspection objects are mainly welds in nuclear power plants. Normally, data collection and data analysis are done on site, but there has been a wish to have some of the analysis done at the headquarters instead. The task of this thesis is to investigate the possibilities of transferring inspection data from the inspection site to the headquarters via a slow transmission line. Because many jobs are of short duration, it is not always cost effective to set up a high-speed transmission line, and the only available transmission channel is then a telephone modem. Relatively large file sizes, around 200 MB, demand a high compression ratio, and the suggested solution is to use a destructive (lossy) transform compression technique with wavelets in combination with WinZip. The results show that a compression ratio of 1:13 is achievable if a wavelet from the Daubechies family is used, which would reduce a 200 MB file to around 15 MB.
2 Introduction
2.1 Background
WesDyne International is a company that performs non-destructive testing of welds in nuclear power plants worldwide. The inspection techniques used are the ultrasonic technique (UT) and the eddy current technique (ET). An inspection team consists of around 20 persons, each with his or her own field of expertise. A crew can roughly be divided into three groups: equipment personnel, acquisition personnel and analyst personnel. Inspections are often performed at different sites simultaneously, which sometimes causes a shortage of qualified personnel. This shortage could be reduced if some of the inspection work could be done off site, e.g., from the headquarters. Only one of the three tasks mentioned previously is suited for remote work, namely the analysis of the inspection data. If the collected data could be transferred from the inspection site to the headquarters, the analyst personnel could evaluate data from more than one inspection at the same time and thereby reduce the shortage to some extent. Because many jobs are of short duration, it is not always cost effective, or even possible, to set up a high-speed transmission line, and the remaining alternative is to use the standard telephone network. The problem is that if a 200 MB file is transferred over the telephone network via a standard telephone modem with a maximum bit rate of 56600 bits/s, the total transfer time would theoretically be:
$$\frac{200\,000\,000 \cdot 8}{56\,600}\ \text{s} \approx 7\ \text{hours}\ 51\ \text{minutes} \tag{1}$$
In reality the transmission time will most likely be twice as long due to network overhead, congestion and shared use. This means that, to get a reasonable transmission time of around 60 minutes, the file has to be compressed at least 15 times. A natural first step is to try some of the many commercial compression programs on the market and see how effective they are. As seen later in the report, this simple approach only yields a compression ratio of around 1:2, which is not high enough to meet the original criterion. The idea in this thesis is to investigate whether this compression ratio can be improved by pre-processing the raw ultrasonic data and then, as a final step, using the well-known compression program WinZip.
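The arithmetic behind equation (1) and the factor of 15 can be checked with a few lines of Python. This is only a sketch: the function names are hypothetical, and the doubling of the transfer time is modelled as a simple multiplicative `slowdown` factor, which is an assumption made for illustration.

```python
def transfer_time_seconds(file_size_bytes, bit_rate_bps, slowdown=1.0):
    """Ideal transfer time in seconds; slowdown > 1 models overhead and congestion."""
    return file_size_bytes * 8 / bit_rate_bps * slowdown


def required_compression_factor(file_size_bytes, bit_rate_bps,
                                target_seconds, slowdown=1.0):
    """How many times smaller the file must be to meet the target time."""
    return transfer_time_seconds(file_size_bytes, bit_rate_bps, slowdown) / target_seconds


FILE_SIZE = 200_000_000   # 200 MB, as in the text
BIT_RATE = 56_600         # bits/s, as in the text

ideal = transfer_time_seconds(FILE_SIZE, BIT_RATE)
hours, remainder = divmod(int(ideal), 3600)
minutes = remainder // 60

# Transfer time assumed to double in practice, target of 60 minutes.
factor = required_compression_factor(FILE_SIZE, BIT_RATE, 3600, slowdown=2.0)

print(f"ideal transfer time: {hours} h {minutes} min")
print(f"required compression factor: {factor:.1f}")
```

With the doubled transfer time the required factor comes out slightly above 15, which agrees with the "at least 15 times" stated above.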
[Figure: UT file structure. First layout: FILE HEADER, .., FILE CONTENTS GROUP, .., ACQUISITION DATA. Second layout: FILE HEADER, ACQUISITION DATA.]
To investigate whether a weld or a piece of metal has any defects, such as internal cracks or material defects, a sound wave is sent into the material. Sound travels at different speeds in different materials, and it is therefore possible to calculate the time

$$t = \frac{s}{v}, \qquad s = \text{distance}, \quad v = \text{velocity} \tag{2}$$

it should take for the sound wave to travel through the material, reflect on the backside and return to the transducer. If the transmitted wave returns after t' < t time units, something else has reflected the pulse. This could for example be an air cavity or a crack, both of which have a lower density than metal. The frequency of the signal is selected depending on the material to be inspected, since the attenuation in the material is frequency dependent. The pulse generated by the UT instrument has the form of a negative square wave, half a period long, see figure 3.3. The pulse length t can be varied between 25 and 500 ns. The pulse amplitude can be set from 0 to 200 volts. Although the pulse from the instrument is a square wave, the actual sound wave coming out of the transducer has a sinusoidal shape due to the stiffness of the piezoelectric material, see figure 3.4. The knowledge of the pulse shape will be used later in this work.
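As a small worked example of equation (2), the sketch below computes the expected back-wall echo time and converts an earlier echo into a reflector depth. The numbers are assumptions chosen for illustration only: a longitudinal sound velocity of about 5900 m/s (a typical value for steel), a 50 mm thick object and an echo arriving after 10 microseconds. The distance s is taken as twice the depth because the pulse travels to the reflector and back.

```python
def round_trip_time(depth_m, velocity_m_s):
    """Time for the pulse to reach a reflector at depth_m and return (s = 2 * depth)."""
    return 2.0 * depth_m / velocity_m_s


def reflector_depth(echo_time_s, velocity_m_s):
    """Depth of the reflector that produced an echo after echo_time_s."""
    return velocity_m_s * echo_time_s / 2.0


STEEL_VELOCITY = 5900.0   # m/s, assumed typical value for steel
THICKNESS = 0.050         # m, hypothetical wall thickness

t_back_wall = round_trip_time(THICKNESS, STEEL_VELOCITY)

t_echo = 10e-6            # s, hypothetical measured echo time
early_echo = t_echo < t_back_wall          # t' < t: something else reflected the pulse
depth = reflector_depth(t_echo, STEEL_VELOCITY)

print(f"back-wall echo expected after {t_back_wall * 1e6:.2f} microseconds")
print(f"early echo: {early_echo}, reflector depth: {depth * 1e3:.1f} mm")
```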
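[Figure 3.3: voltage U of the square pulse generated by the UT instrument.]
[Figure 3.4: voltage U of the sinusoidal pulse coming out of the transducer.]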
The ultrasonic instruments also have a number of signal processing features. Different filters can be combined so that band pass, low pass and high pass filters can be applied. Another form of filtering is the averaging function, which takes the average of a number of samples. If the built-in compression function is activated, a choice can be made to store only the highest peak of 2, 4, 8 or 16 consecutive samples. This can be seen as a form of down sampling, which changes the relative bandwidth of the signal. The effect can be seen on the stored pulse, whose shape becomes more saw-tooth like due to the removal of samples, see figures 3.5 and 3.6. There are two different types of ultrasonic techniques that are used when inspecting a weld.
Pulse-echo: A sound pulse is transmitted and received by either one or two transducers. The most common penetration angles are 0 degrees, 45 degrees and 70 degrees. The aim is to detect any potential indications (cracks, air pockets etc.). When evaluating a file the amplitude of the signal is important, i.e., signal amplitudes higher than a certain level are considered to be indications and need further evaluation. Figure 3.5 shows the pulse-echo principle together with a plot of a so-called A-scan. An A-scan is a picture that shows the echo signal from a transmitted pulse.
[Figure 3.5: pulse-echo principle (UT probe(s), weld, crack, wave) and an A-scan plot of amplitude versus samples.]
Time of Flight Diffraction (TOFD): This method is used only if any indications have been found with the pulse-echo method and it is used to measure them with respect to depth, length and width. The important information in this method is the phase of the received signal. The amplitude is of less interest. It is important that the above mentioned signal properties are preserved as much as possible by the compression method. Figure 3.6 shows the TOFD principle together with a plot of an A-scan.
[Figure 3.6: TOFD principle (transmitting probe, receiving probe, weld, crack) and an A-scan plot of amplitude.]
[Figure 3.7: Scan sequence (top), colour coded A-scan (bottom). Labels: A-scan, B-scan view, amplitude, time, step, scan, A-scan direction.]
4 Compression Methods
The purpose of compressing a file is to reduce its size, and this can be done in a number of different ways. Generally, compression is achieved by representing an original set of data more efficiently. This can be seen as reducing the amount of redundancy. The compression ratio is defined as:
$$\text{Compression Ratio} = \frac{\text{compressed file size}}{\text{original file size}} \tag{3}$$
The compression techniques can be separated into two groups, lossy and lossless. In the case of lossless compression the file is completely reproducible and does not lose any information. This compression is achieved by representing the compressed data with fewer bits than the original data. Two lossless compression algorithms will be described in the section below, Huffman and Lempel-Ziv. They are both used in combination in the WinZip program.
4.1.2 Lempel-Ziv Coding

Lempel-Ziv is sometimes referred to as a substitutional or dictionary based encoding algorithm. The algorithm builds a dictionary from the data in an uncompressed data stream. Patterns of data (substrings) are identified in the data stream and matched to entries in the dictionary. If a substring is not present in the dictionary, a code phrase is created based on the data content of the substring and stored in the dictionary. The phrase is then written to the compressed output stream. When a matching substring occurs in the data, the phrase in the dictionary is written to the output instead. Because the phrase value is physically smaller than the substring it represents, data compression is achieved. The example in figure 4.1 [Proakis, Salehi 94] shows a data sequence encoded with the Lempel-Ziv algorithm.
Data seq.: 0100001100001010000010100000110000010100001001001

Dictionary Location   Dictionary Contents   Codeword
 1  0001              0                     0000 0
 2  0010              1                     0000 1
 3  0011              00                    0001 0
 4  0100              001                   0011 1
 5  0101              10                    0010 0
 6  0110              000                   0011 0
 7  0111              101                   0101 1
 8  1000              0000                  0110 0
 9  1001              01                    0001 1
10  1010              010                   1001 0
11  1011              00001                 1000 1
12  1100              100                   0101 0
13  1101              0001                  0110 1
14  1110              0100                  1010 0
15  1111              0010                  0100 0
16                                          1110 1

Encoded sequence: 0000 0, 0000 1, 0001 0, 0011 1, 0010 0, 0011 0, 0101 1, 0110 0, 0001 1, 1001 0, 1000 1, 0101 0, 0110 1, 1010 0, 0100 0, 1110 1

Figure 4.1
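The parsing in figure 4.1 can be reproduced with a short Python sketch. It implements the dictionary scheme described above in its LZ78 form, where each codeword is the 4-bit dictionary location of the longest known prefix followed by the one new bit. The function names are hypothetical, and the sketch does not flush an incomplete phrase left at the end of the input (the example sequence ends exactly on a phrase boundary).

```python
def lz_encode(bits, index_width=4):
    """Parse a bit string into phrases and return (codewords, dictionary contents)."""
    dictionary = {}          # phrase -> dictionary location (1-based)
    codewords = []
    phrase = ""
    for bit in bits:
        candidate = phrase + bit
        if candidate in dictionary:
            phrase = candidate                      # keep extending the match
        else:
            prefix_location = dictionary.get(phrase, 0)   # 0 means empty prefix
            codewords.append(format(prefix_location, f"0{index_width}b") + " " + bit)
            dictionary[candidate] = len(dictionary) + 1
            phrase = ""
    return codewords, list(dictionary)


def lz_decode(codewords):
    """Rebuild the bit string from the codewords produced by lz_encode."""
    phrases = [""]           # location 0 is the empty prefix
    for codeword in codewords:
        location, bit = codeword.split()
        phrases.append(phrases[int(location, 2)] + bit)
    return "".join(phrases)


data = "0100001100001010000010100000110000010100001001001"
codewords, contents = lz_encode(data)
print(", ".join(codewords))
print(lz_decode(codewords) == data)
```

Note that the example needs 16 codewords of 5 bits each (80 bits) for 49 input bits. The dictionary approach only pays off on longer inputs, where phrases become long compared with their codewords.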
5 Transform Theory
A useful definition in transform theory is the scalar product of two continuous functions defined as:
$$\langle f, g \rangle = \int_a^b f(x)\, g(x)\, dx \tag{4}$$
This can also be seen as the projection of the function f(x) onto the basis function g(x). Another interpretation is that it measures how similar f(x) and g(x) are. If $\langle f, g \rangle = 0$, the two functions are called orthogonal. Figure 5.1 shows a simple projection example in a Cartesian coordinate system.
[Figure 5.1: projection example with the vectors A = (0,3), V = (3,1) and B = (4,0).]
V · A = (3 · 0) + (1 · 3) = 3
V · B = (3 · 4) + (1 · 0) = 12
A · B = (0 · 4) + (3 · 0) = 0 (orthogonal)
$$f(t) = \frac{a_0}{2} + \sum_{n=1}^{\infty} \left( a_n \cos(n \Omega_T t) + b_n \sin(n \Omega_T t) \right), \qquad \Omega_T = \frac{2\pi}{T} \tag{5}$$
This sum is called the Fourier series of a signal [Petersson 97]. The basis functions used in the Fourier series are sine and cosine and have the following properties
The coefficients $a_0$, $a_n$, $b_n$ can easily be calculated due to the orthogonality of the basis functions. A continuous aperiodic signal cannot be written as a Fourier series but as a Fourier integral [Petersson 97].

$$F(\omega) = \int_{-\infty}^{\infty} f(t)\, e^{-j\omega t}\, dt \tag{9}$$

$$f(t) = \frac{1}{2\pi} \int_{-\infty}^{\infty} F(\omega)\, e^{j\omega t}\, d\omega \tag{10}$$
The function $F(\omega)$ is called the Fourier transform (FT) of $f(t)$, and conversely $f(t)$ is called the inverse Fourier transform (IFT) of $F(\omega)$. It should be noted that a Fourier transformed signal does not contain any information about where the different frequencies occur in time. It only gives the overall spectral content of the signal. This is due to the assumption that the signal to be transformed is stationary. One criterion of stationarity is that the frequency content does not change over time. In the discrete time domain the discrete Fourier transform (DFT) has to be used. The formulas for the DFT and the IDFT are [Proakis, Manolakis 96]
DFT

$$X(k) = \sum_{n=0}^{N-1} x(n)\, e^{-j 2\pi k n / N}, \qquad k = 0, 1, 2, \ldots, N-1 \tag{11}$$

IDFT

$$x(n) = \frac{1}{N} \sum_{k=0}^{N-1} X(k)\, e^{j 2\pi k n / N}, \qquad n = 0, 1, 2, \ldots, N-1 \tag{12}$$
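Equations (11) and (12) translate almost literally into code. The sketch below is a direct O(N²) evaluation written for clarity, not a fast Fourier transform; the function names `dft` and `idft` and the test sequence are chosen here for illustration.

```python
import cmath


def dft(x):
    """Discrete Fourier transform, equation (11)."""
    N = len(x)
    return [sum(x[n] * cmath.exp(-2j * cmath.pi * k * n / N) for n in range(N))
            for k in range(N)]


def idft(X):
    """Inverse discrete Fourier transform, equation (12)."""
    N = len(X)
    return [sum(X[k] * cmath.exp(2j * cmath.pi * k * n / N) for k in range(N)) / N
            for n in range(N)]


x = [1.0, 2.0, 0.0, -1.0]
X = dft(x)
x_back = idft(X)

print([complex(round(c.real, 6), round(c.imag, 6)) for c in X])
print([round(c.real, 6) for c in x_back])
```

Applying `idft` to the output of `dft` returns the original samples up to rounding errors, which is the property that the transform compression later in the report relies on.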
5.2 Short Time Fourier Transform (STFT)

To overcome the time resolution problem of the Fourier transform, the signal is cut into small slices, followed by a Fourier transformation of these slices. This can be seen as moving a rectangular window along the signal $\tau$ time units at a time, as shown in figure 5.2. At each instant the window function w(t) is multiplied with the signal f(t), and the product is then Fourier transformed.

[Figure 5.2: Windowing principle. Shows f(t), w(t) and the product f(t)w(t) along the time axis t.]

To avoid large Fourier coefficients due to the sharp edges of the window function in figure 5.2, smoother window functions are normally used. Hanning, Hamming and Bartlett are examples of commonly used windows. Figure 5.3 shows a smoother window function.

[Figure 5.3: smoother window function. Shows f(t), w(t) and the product f(t)w(t).]
The resulting local time-frequency analysis procedure is called Short Time Fourier Transform (STFT) or windowed Fourier transform. The STFT is defined as

$$\mathrm{STFT}(\tau, \omega) = \int_{-\infty}^{\infty} f(t)\, w(t - \tau)\, e^{-j\omega t}\, dt \tag{13}$$

so the window function basically controls the time-frequency resolution according to

Narrow window: good time resolution, poor frequency resolution.
Wide window: good frequency resolution, poor time resolution.
This time-frequency resolution compromise has its roots in the Heisenberg uncertainty principle. It simply states that one cannot know the exact time-frequency representation of a signal, i.e., one cannot know which spectral components exist at which instants of time.
The variable time-frequency resolution of the wavelet transform (WT) compared to the FT and the STFT is shown in figure 5.4. The pictures show that the STFT and the wavelet transform have a time resolution as well as a frequency resolution, in contrast to the FT. It can be seen that the wavelet transform has a varying time-frequency resolution depending on the frequency range of the signal. The high frequency content of a signal gets a lower frequency resolution but a better time resolution than the low frequency content.
[Figure 5.4: time-frequency resolution, panels a) to c); axes: frequency versus time.]
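$$\gamma(s, \tau) = \int_{-\infty}^{\infty} f(t)\, \psi_{s,\tau}(t)\, dt \tag{14}$$

This shows how a signal f(t) is decomposed into a set of basis functions $\psi_{s,\tau}(t)$, where the two variables $s$ and $\tau$ represent scale and translation. The function $\psi_{1,0}(t)$ is called the mother wavelet, and all the wavelets are generated from this single function according to

$$\psi_{s,\tau}(t) = \frac{1}{\sqrt{s}}\, \psi\!\left(\frac{t - \tau}{s}\right) \tag{15}$$

where the factor $1/\sqrt{s}$ normalizes the energy across the different scales.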
The mother wavelet is similar to the windowing function in the STFT. A difference between wavelet transforms and other transforms, e.g., the Fourier transform, is that the basis function $\psi(t)$ can be chosen and designed by the user as long as it satisfies certain conditions. Each wavelet family has a number of subclasses distinguished by the number of coefficients. The wavelets in each family are often classified by the number of vanishing moments. This is an extra set of mathematical relationships that the coefficients must satisfy, and it is directly related to the number of coefficients. A higher number gives a smoother wavelet due to the increased number of coefficients. Figure 5.5 shows the Daubechies wavelet with 3 different numbers of vanishing moments and the Haar wavelet. The wavelets used in this report are the Daubechies wavelet with 8 vanishing moments and the Haar wavelet.
[Figure 5.5: the Daubechies wavelet with 3 different numbers of vanishing moments and the Haar wavelet.]
From a formal point of view the mother wavelet $\psi(t)$ has to satisfy a number of conditions, and the two most important ones are the admissibility and the regularity conditions [Valens 99]. The admissibility condition is defined as

$$\int_{-\infty}^{\infty} \frac{|\Psi(\omega)|^2}{|\omega|}\, d\omega < +\infty, \qquad \Psi(\omega) = \text{Fourier transform of } \psi(t) \tag{16}$$
$$\left. |\Psi(\omega)|^2 \right|_{\omega = 0} = 0 \tag{17}$$
This means that the average value in the time domain must be zero as well, i.e., the positive and negative areas under the curve must cancel out
$$\int_{-\infty}^{\infty} \psi(t)\, dt = 0 \tag{18}$$
A function with these properties is an oscillating function, which means that $\psi(t)$ is a wave. This is where the word wavelet comes from. The regularity conditions have to do with the approximation order of the wavelet transform and the decay of the coefficients $\gamma(s, \tau)$: if the wavelet has N vanishing moments, then the approximation order of the wavelet is also N.
5.3.1 Scale and Translation

The two parameters $s$ (scale) and $\tau$ (translation) are used when a signal f(t) is transformed. The parameter $s$ can be seen as a zooming tool which dilates and compresses the mother wavelet and, thus, changes the frequency resolution. A small value of $s$ compresses the mother wavelet, whereas a high value dilates it, i.e., stretches it out. The parameter $\tau$ is used to slide the wavelet over the signal at the different scales. At every time instant the signal f(t) is multiplied with the wavelet and integrated over all times. This procedure is repeated until the end of the signal is reached. Then the scale is changed and it all repeats again. When the scale has a value that makes the wavelet curve similar to the signal f(t) at a certain time instant, the multiplication and integration give a large value compared to when they do not match very well. Figures 5.6, 5.7 and 5.8 illustrate how an input signal is affected by a wavelet with 3 different scale values. In this example the input signal f(t) (fig. 5.6) is a wavelet called Mexican hat. Figure 5.7 shows 3 plots of the Daubechies wavelet with 8 vanishing moments and 3 different scale values. The last plot, figure 5.8, shows how the output signal varies in amplitude depending on how well the input signal f(t) matches the wavelet. From the output plot it can be seen that the middle wavelet matches the input signal best, as it gives the highest output of all 3 wavelets.
[Figure 5.6: Mexican hat wavelet as input signal f(t); amplitude versus time.]
[Figure 5.7: Daubechies wavelets with 3 different scale values; amplitude versus time.]
[Figure 5.8: The output signals from the 3 differently scaled wavelets with the Mexican hat as input signal.]
$$y[n] = \sum_{k=-\infty}^{\infty} x[k]\, h[n - k] \tag{19}$$
[Figure 5.9: Low and high pass filtering of the signal x[n]. The LP filter l[n] gives ylow[n] and the HP filter h[n] gives yhigh[n].]
If the original signal x[n] has a frequency content of $(0, f_s/2)$ and the HP and LP filters used are half band filters, the filtered outputs ylow[n] and yhigh[n] will have the frequency bands $(0, f_s/4)$ and $(f_s/4, f_s/2)$ respectively. After the filtering operation, half of the samples can be eliminated according to Nyquist's rule, and the signal is therefore decimated by 2 by discarding every other sample. Figure 5.10 shows the filtering and decimation process.
[Figure 5.10: filtering and decimation of x[n] into ylow[n] and yhigh[n], shown along the frequency axis f.]
A signal that is passed through such a filter is said to be decomposed one level. The level of decomposition is varied by repeatedly filtering the LP part of the signal until only two samples remain, which means that the signal has been fully decomposed. A three level DWT decomposition of the signal x[n] is shown in figure 5.11 as an example.
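The repeated filter-and-decimate scheme can be sketched in a few lines of Python. The sketch uses the two-tap Haar filters because they are the shortest possible half band pair; this is a choice made here for brevity, and the Daubechies filters used elsewhere in the report would need longer filters and boundary handling. The function names and the test signal are hypothetical, and the signal length is assumed to be divisible by 2 for every level.

```python
import math


def haar_step(x):
    """One decomposition level: LP and HP filtering followed by decimation by 2."""
    s = math.sqrt(2.0)
    low = [(x[i] + x[i + 1]) / s for i in range(0, len(x), 2)]
    high = [(x[i] - x[i + 1]) / s for i in range(0, len(x), 2)]
    return low, high


def haar_dwt(x, levels):
    """Repeatedly split only the LP part, as in a standard DWT decomposition."""
    approx = list(x)
    details = []
    for _ in range(levels):
        approx, high = haar_step(approx)
        details.append(high)
    return approx, details


def haar_idwt(approx, details):
    """Invert haar_dwt by merging the LP and HP parts level by level."""
    s = math.sqrt(2.0)
    x = list(approx)
    for high in reversed(details):
        x = [v for lo, hi in zip(x, high) for v in ((lo + hi) / s, (lo - hi) / s)]
    return x


signal = [4.0, 6.0, 10.0, 12.0, 8.0, 6.0, 5.0, 5.0]
approx, details = haar_dwt(signal, levels=3)
restored = haar_idwt(approx, details)

print([len(d) for d in details], len(approx))
print([round(v, 6) for v in restored])
```

For the 8-sample signal, three levels leave one LP sample and one HP sample at the last level, which corresponds to the fully decomposed case described above.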
[Figure 5.11: three level DWT decomposition of x[n]. Each level consists of an LP and an HP filter followed by decimation by 2.]
As can be seen only the LP part of the signal is passed on to the next level of filtering and this is what differs between the wavelet packet and the wavelet decomposition. A wavelet packet decomposition uses both the LP and the HP part of the filtered signal, see figure 5.12, which will increase the possibilities of an efficient representation of the signal.
[Figure 5.12: wavelet packet decomposition of x[n]. Both the LP and the HP outputs are filtered again and decimated by 2.]
Without going too deep into the theory of wavelet packets, an advantage is the possibility to choose the best basis for a given application. This means that the optimal representation of a signal is calculated with the help of a so-called cost function [Jensen, la Cour-Harbo 01]. After the signal has been fully decomposed, the cost function is applied to the elements in each level of the decomposition. The best representation of the signal is then found by taking the elements which correspond to the lowest cost values. This means that the user can design the cost function in such a way that it chooses the best basis for a specific application. One example is the picture format JPEG 2000, where the cost function has been designed according to the sensitivity of the human eye. The use of different cost functions changes the time-frequency resolution of a signal, and it will no longer always look like it does in figure 5.4c. Figure 5.13 shows four example plots of different time-frequency resolutions obtained with different cost functions. In this report we will however stick with the description in figure 5.4c.
[Figure 5.13: Examples of time-frequency representations obtained with different cost functions. Four panels, frequency versus time.]
$$F(u, v) = \frac{1}{MN} \sum_{x=0}^{M-1} \sum_{y=0}^{N-1} f(x, y)\, e^{-j 2\pi \left( \frac{ux}{M} + \frac{vy}{N} \right)}, \qquad M = \text{rows}, \quad N = \text{columns} \tag{20}$$
When a 2D transform is applied to a data matrix it involves a number of 1D transformations. More precisely, it is achieved by first transforming each row, replacing each row with its transform, and then transforming each column, replacing it with its transform. The same principle applies to the 2D wavelet transforms.
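The row-then-column procedure can be checked against a direct evaluation of equation (20). The sketch below is a minimal illustration with hypothetical function names and a small test matrix; both versions include the 1/(MN) factor of equation (20).

```python
import cmath


def dft_1d(x):
    """Unnormalized 1D DFT of a sequence."""
    N = len(x)
    return [sum(x[n] * cmath.exp(-2j * cmath.pi * k * n / N) for n in range(N))
            for k in range(N)]


def dft_2d_separable(f):
    """2D DFT computed as 1D transforms of all rows, then of all columns."""
    M, N = len(f), len(f[0])
    rows = [dft_1d(row) for row in f]                       # transform each row
    cols = [dft_1d([rows[x][v] for x in range(M)])          # transform each column
            for v in range(N)]
    return [[cols[v][u] / (M * N) for v in range(N)] for u in range(M)]


def dft_2d_direct(f):
    """2D DFT evaluated directly from equation (20)."""
    M, N = len(f), len(f[0])
    return [[sum(f[x][y] * cmath.exp(-2j * cmath.pi * (u * x / M + v * y / N))
                 for x in range(M) for y in range(N)) / (M * N)
             for v in range(N)]
            for u in range(M)]


data_matrix = [[1.0, 2.0, 3.0],
               [4.0, 5.0, 6.0]]
F_sep = dft_2d_separable(data_matrix)
F_dir = dft_2d_direct(data_matrix)

max_diff = max(abs(F_sep[u][v] - F_dir[u][v]) for u in range(2) for v in range(3))
print(max_diff < 1e-12)
```

The separable version needs M + N one-dimensional transforms instead of a double sum for every output element, which is why 2D transforms are normally implemented this way.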
(21)
In the time domain the zeroing operation would result in the signal g(t) which is plotted in figure 5.15.
This signal is completely different from the original signal and it contains only one frequency. If the original signal f(t) is Fourier transformed it will have two peaks with different heights corresponding to the two sinusoid signals, see figure 5.16. The height differences represent different energies and are due to the different amplitudes of the two signals.
[Figure 5.16: F(f), energy versus frequency.]
If the same operation is done on F(f), i.e., the smallest 50% of the coefficients are set to zero, giving E(f), and the signal is then inverse transformed, it will look like figure 5.17.
[Figure 5.17: the inverse transformed signal, amplitude versus samples.]
This signal is almost identical to the original, and it has been reconstructed from only half of the coefficients. This is the basic principle of lossy signal compression with transforms that still achieves a good result. The result is however dependent on what kind of signal it is. In this example the signal contained only two sinusoids, and since the Fourier transform has sinusoids as its basis functions a good result should be expected. A downside of the Fourier transform is that it has no time resolution. It only shows the different frequencies in a signal but not where in time they occur. This is no problem if the signal is stationary, i.e., all frequencies exist at all times. The signal f(t) is however non-stationary. To visualize this effect a stationary signal s(t), containing the same two frequencies, is Fourier transformed. The signal is:
$$s(t) = \sin(\omega_0 t) + 0.05 \sin(50\, \omega_0 t), \qquad \omega_0 = 2\pi f_0 \tag{22}$$
In this signal the two frequencies occur at the same time and have the same amplitude and frequency as the previous signal f(t). Figure 5.18 shows a plot of s(t).
[Figure 5.18: s(t), amplitude versus samples.]
To show the similarities between F(f) and S(f) two comparative plots have been made. Figure 5.19 shows F(f) again and figure 5.20 shows S(f).
[Figure 5.19: F(f), energy versus frequency.]
[Figure 5.20: S(f) with normalized energy.]

As can be seen, it is not possible to determine which of the two frequency plots comes from which signal.
$$\text{Error} = \frac{\sum_{i=1}^{m} (x_i - \hat{x}_i)^2}{\sum_{i=1}^{m} x_i^2}, \qquad m = \text{number of samples} \tag{23}$$

where $x$ is the original signal and $\hat{x}$ is the reconstructed signal after compression. The squared error from the different compression methods is thus divided by the sum of the squares of the original samples and plotted in a diagram. This way of measuring the error is motivated by the compression technique used, i.e., setting small coefficient values to zero. Plots of the calculated errors are found in section 7.2. The error tables are sorted by defect in the appendices.
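The zeroing of small coefficients and the error measure of equation (23) can be combined in a short sketch. It uses a plain DFT and a sampled stationary signal of the form in equation (22); the 256 samples, the choice of 2 and 100 cycles per record, and all function names are assumptions made for this illustration, not the settings used for the thesis results.

```python
import cmath
import math


def dft(x):
    N = len(x)
    return [sum(x[n] * cmath.exp(-2j * cmath.pi * k * n / N) for n in range(N))
            for k in range(N)]


def idft(X):
    N = len(X)
    return [sum(X[k] * cmath.exp(2j * cmath.pi * k * n / N) for k in range(N)) / N
            for n in range(N)]


def keep_largest(coeffs, fraction):
    """Keep the given fraction of coefficients with largest magnitude, zero the rest."""
    keep = max(1, round(len(coeffs) * fraction))
    order = sorted(range(len(coeffs)), key=lambda i: abs(coeffs[i]), reverse=True)
    kept = set(order[:keep])
    return [c if i in kept else 0.0 for i, c in enumerate(coeffs)]


def relative_error(x, x_hat):
    """Equation (23): squared error divided by the energy of the original signal."""
    return sum((a - b) ** 2 for a, b in zip(x, x_hat)) / sum(a * a for a in x)


N = 256
x = [math.sin(2 * math.pi * 2 * n / N) + 0.05 * math.sin(2 * math.pi * 100 * n / N)
     for n in range(N)]
X = dft(x)

x_half = [c.real for c in idft(keep_largest(X, 0.5))]      # 50 % of coefficients kept
x_two = [c.real for c in idft(keep_largest(X, 2 / N))]     # only the 2 largest kept

err_half = relative_error(x, x_half)
err_two = relative_error(x, x_two)
print(f"error with 50 % kept: {err_half:.2e}")
print(f"error with 2 coefficients kept: {err_two:.6f}")
```

Keeping half of the coefficients leaves the signal essentially unchanged, because only four DFT coefficients are significantly different from zero. Keeping just the two largest removes the weak high frequency sinusoid, and the error then equals that component's share of the total signal energy.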
6 File Compression
The examined files contain three cracks in total and each crack has been scanned with one pulse-echo and one TOFD probe. This gives a total of six files to be compressed. To make sure no false indications are created in the compression process one of the examined cracks is below detection level which means that it is a crack but not big enough to be reportable. To save computational time the files have been modified so they only contain the defect areas. This has saved a lot of time as each file has been compressed and decompressed four times using three different compression algorithms. The software used is Matlab Student Version 6.0 R12 with the Wavelet toolbox and Borland C++ 5.02. All compression work has been done with Matlab and the data extraction and file modification has been done with Borland C++. To be able to read the headers in the UT files a special RDTIFF reading program, RDTV from R/D Tech has been used. This program makes it possible to determine the size of the raw UT data and where it is stored in the file.
[Figures: two A-scan plots, amplitude versus samples.]
[Figure 6.3: File decompression scheme. Flow chart boxes: UT-file, Extract UT-data, Split File, File Header, UT-data, Transform, Lossy Compression, WinZip Compression, WinZip Decompression, Inverse Transform, Rebuild Compressed File, Rebuild Original File.]
The files have been compressed with four different compression ratios and only with the 2D compression method. A comparison with the 1D transform would be interesting but would be too time consuming due to the extra evaluation work required. The compression ratios are 1:5, 1:6.7, 1:20 and 1:100. A compression ratio of 1:5 means that one fifth of the transformed coefficients have been saved and that four fifths have been set to zero.
6.4 Noise
The addition of unwanted noise is always a problem because it distorts the signal and thus makes the data more difficult to analyse. The noise has various sources. The most common is thermal noise, which is present all the time. The two methods used (pulse-echo and TOFD) are not equally sensitive to noise because they are evaluated differently. As the purpose of the pulse-echo method is to detect cracks, the only interest is the amplitude of the received signal, and normally the noise is not a problem. The TOFD method, on the other hand, requires a preamplifier to boost the received signal, which means that the noise is amplified as well. When a TOFD signal is evaluated, the phase of the received signal is what is important, and the added noise often makes it hard to determine the phase. The UT instruments have a built-in noise reducing feature called averaging, which means that the average of 2, 4, 8 or 16 consecutive A-scans is taken, reducing the noise to some extent. One downside of this function is that the scanning speed has to be reduced when higher averaging is used, e.g., 8 and 16.
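The effect of the averaging feature can be illustrated with a small simulation. The sketch below is not the instrument's implementation: the clean A-scan is a hypothetical sine burst, the noise is assumed to be independent Gaussian noise with the same level in every A-scan, and the function names are chosen here for illustration. Under these assumptions, averaging 16 A-scans should reduce the noise power by roughly a factor of 16.

```python
import math
import random


def average_ascans(ascans):
    """Sample-by-sample average of a list of equally long A-scans."""
    count = len(ascans)
    return [sum(samples) / count for samples in zip(*ascans)]


def noise_power(signal, reference):
    """Mean squared deviation of a signal from the noise-free reference."""
    return sum((a - b) ** 2 for a, b in zip(signal, reference)) / len(signal)


random.seed(0)   # fixed seed so the illustration is repeatable

# Hypothetical noise-free A-scan: 5 periods of a sine over 200 samples.
clean = [math.sin(2 * math.pi * 5 * i / 200) for i in range(200)]

# 16 consecutive A-scans with independent Gaussian noise (assumed noise model).
noisy_scans = [[c + random.gauss(0.0, 0.5) for c in clean] for _ in range(16)]

single_power = noise_power(noisy_scans[0], clean)
averaged_power = noise_power(average_ascans(noisy_scans), clean)

print(f"noise power, single A-scan:   {single_power:.4f}")
print(f"noise power, 16 averaged:     {averaged_power:.4f}")
```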
[Figures: six error plots, error (logarithmic scale) versus % saved elements.]
To show the importance of combining a visual evaluation with a mathematical error calculation, the following example can be studied. It shows an uncompressed A-scan compared with the same A-scan compressed with the FFT and the Haar transform respectively. The A-scan comes from defect 8 and is scanned with a pulse-echo probe. The chosen compression ratio is 1:20, which means that 95% of the coefficients are set to zero. According to table AT3 in appendix A3.1 this compression gives approximately the same mathematical error for both methods, i.e., FFT = 0.435510 and Haar = 0.440680. The difference between the two compressed scans, figures 7.8 and 7.9, and the original scan, figure 7.7, is clearly visible.
[Figures 7.7 to 7.9: the original A-scan (7.7) and the two compressed A-scans (7.8 and 7.9), amplitude versus samples.]
7.2.1 Visual Error

As mentioned earlier there are two different types of UT files, pulse-echo and TOFD. The evaluation uses different criteria depending on the file type.

As the pulse-echo method is used to detect cracks, the received signal amplitude is of most interest. When such a file is analyzed the signal is often rectified, and if the peak amplitude at a certain distance exceeds a predefined level it is considered to be an indication. From a compression point of view it is important that the compression affects the signal amplitude as little as possible. The way the compression is achieved in this case, i.e., by setting a number of small coefficients in the frequency domain to zero, should not affect a strong echo signal too much, as it would be represented by a large coefficient in the frequency domain.

When a TOFD file is analyzed the phase of the signal is most important. When there is a crack or air pocket in the material, the signal is reflected at the material/air interface because of the different reflection indices. This reflection causes a phase change of 180 degrees, so in this case it is important that the compression does not change the phase of the signal. Again, zeroing a number of small coefficients only decreases the frequency content of the original signal and not the phase, so the assumption is that there will not be any significant changes of the signal phase.

Even if the compression does not change the phase notably, it must not change the shape of the ultrasonic signal, in this case the A-scan. It is however inevitable that the signal shape is not preserved if it has been transformed with basis functions that do not match the signal in the first place. This phenomenon was seen on the Haar compressed files when they were to be evaluated. The original pulse shape was so distorted by the Haar wavelet basis function, see figure 5.5, that the method was dismissed from the beginning and thus never evaluated. This conclusion would not have been drawn with the mathematical error calculations as the only quality measurement, as the Haar and the FFT methods give roughly similar error results. The signal distortion is most apparent on the Haar compressed TOFD files, as they are recorded without the hardware compression and are thus not as saw-tooth shaped as the pulse-echo files. Figure 7.10 shows a TOFD A-scan which has been compressed and reconstructed with the Daubechies and the Haar wavelets, figures 7.11 and 7.12. The square wave look of the Haar signal originates from its basis function, and the signal was considered too distorted to evaluate.
[Figures 7.10 to 7.12: a TOFD A-scan (7.10) and its reconstructions after compression with the Daubechies (7.11) and the Haar (7.12) wavelets, amplitude versus samples.]
7.2.2 Visual File Evaluation

The compressed files have been evaluated by two persons with a level 2 ultrasonic certificate. This means that they are qualified to collect and evaluate ultrasonic data at nuclear power plants in Sweden and in other countries around the world. The persons were given the original files together with the compressed ones without knowing the compression level of each file. The compressed files are labelled as the example below shows.

pe4_2d_db8_1 = FileType_CompDimension_Transform_FileNumber

FileType: pe = pulse-echo or to = TOFD + defect number.
CompDimension: Dimension of compression method, always 2D.
Transform: Used transform. FFT, Db8 or Haar.
FileNumber: Random number between 1 and 4.
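The labelling convention is regular enough to be parsed automatically, for example when collecting the evaluators' remarks per file. The sketch below is only an illustration: the `CompressedFileLabel` type and the `parse_label` function are hypothetical names, and lower-case spelling of the transform (`fft`, `db8`, `haar`) is assumed from the single example label above.

```python
import re
from typing import NamedTuple


class CompressedFileLabel(NamedTuple):
    technique: str      # "pulse-echo" or "TOFD"
    defect: int         # defect number, e.g. 4, 5 or 8
    dimension: str      # dimension of the compression method, always "2D"
    transform: str      # "fft", "db8" or "haar"
    file_number: int    # random number between 1 and 4


_LABEL = re.compile(r"^(pe|to)(\d+)_(2d)_(fft|db8|haar)_([1-4])$", re.IGNORECASE)


def parse_label(label):
    """Split a label such as 'pe4_2d_db8_1' into its fields."""
    match = _LABEL.match(label)
    if match is None:
        raise ValueError(f"not a valid compressed file label: {label!r}")
    kind, defect, dimension, transform, number = match.groups()
    technique = "pulse-echo" if kind.lower() == "pe" else "TOFD"
    return CompressedFileLabel(technique, int(defect), dimension.upper(),
                               transform.lower(), int(number))


print(parse_label("pe4_2d_db8_1"))
print(parse_label("to8_2d_haar_3"))
```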
The random numbering of the files is done to remove the compression ratio information that could affect the evaluation result. Having the original file as a reference does however make the evaluation somewhat biased, but this could not be avoided. The three defects are called defect 4, defect 5 and defect 8. The three files all differ somewhat from each other.
Defect 4: This file contains an indication that is below the reportable limit. In this case reportable limit means that the received signal amplitude must exceed a specific level in three consecutive scans. A reportable defect has to be scanned with a TOFD probe to determine its size. This defect was chosen to see if any of the compression methods and ratios made it reportable, i.e. raised the signal level above the critical level. The TOFD file is evaluated for this report anyway. The signal does not contain a lot of noise.
Defect 5: This file has one reportable indication which is measured with a TOFD probe. The TOFD file is however very noisy. This makes it interesting to see if the compression reduces the noise and thus has a positive effect on the signal.
Defect 8: This file also has one reportable indication but the TOFD file is much less noisy than that of defect 5.
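The reportable limit described for defect 4 can be written down as a short check. This is a sketch under assumptions: the function names, the level of 80 and the sample values are invented for illustration, and the real evaluation software may define a scan and its peak differently.

```python
def rectified_peak(a_scan):
    """Peak amplitude of the rectified A-scan."""
    return max(abs(sample) for sample in a_scan)

def is_reportable(scans, level, consecutive=3):
    """True if the rectified peak exceeds `level` in `consecutive` scans in a row."""
    run = 0
    for a_scan in scans:
        if rectified_peak(a_scan) > level:
            run += 1
            if run >= consecutive:
                return True
        else:
            run = 0  # the run of scans above the level is broken
    return False

# Hypothetical data: peaks of 40, 90, 95, 30 and 88 against an assumed level of 80.
original_scans = [[10, -40, 20], [15, -90, 30], [5, 95, -20], [0, 30, -10], [20, -85, 88]]
# Same scans after a compression that (hypothetically) raised the fourth peak to 82.
altered_scans = [[10, -40, 20], [15, -90, 30], [5, 95, -20], [0, 82, -10], [20, -85, 88]]

before = is_reportable(original_scans, level=80)
after = is_reportable(altered_scans, level=80)
```

The second data set illustrates the risk this defect was chosen to probe: a small amplitude change in a single scan is enough to turn a non-reportable indication into a reportable one.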
Table 7.1
The compression ratios achieved with this straightforward method are however not sufficient, but they give a good indication of the efficiency of lossless compression applied to ultrasonic files.
Defect 4, Daubechies compressed. One non-reportable indication, original signal not noisy.
% saved elements | Pulse-echo | TOFD
20 and 15 | No visual change in A-scan shape. | No visual change in A-scan shape.
5 | No visual change in A-scan shape. | Minor change in A-scan shape.
1 | Change in A-scan shape. | A-scan not too good.
Table 7.2
From table 7.2 it is seen that saving 15% of the pulse-echo and TOFD samples can be done without losing any significant information. From the Daubechies curve in figure AP1.2 in appendix A1.1 it seems likely that 8-10% of the TOFD samples can be saved without significant changes in pulse shape, as the error only increases marginally.
Defect 5, Daubechies compressed. One reportable indication, original signal noisy.
% saved elements | Pulse-echo | TOFD
20 and 15 | No visual change in A-scan shape and ampl. | No visual change in A-scan shape.
5 | Small change in A-scan shape and ampl. | Minor change in A-scan shape, most noise gone.
1 | Change in A-scan shape. | A-scan shape not good.
Table 7.3
From table 7.3 it is seen that saving 5% of the TOFD samples and 15% of the pulse-echo samples can be done without losing any significant information. From the Daubechies curve in figure AP2.1 in appendix A2.1 it seems likely that 10% of the pulse-echo samples can be saved without significant changes in pulse shape, as the error only increases marginally.
Defect 8, Daubechies compressed. One reportable indication, original signal not noisy.
% saved elements | Pulse-echo | TOFD
20 and 15 | No visual change in A-scan shape and ampl. | No visual change in A-scan shape.
5 | Small change in A-scan shape and ampl. | Minor change in A-scan shape.
1 | Change in A-scan shape and ampl. | A-scan shape not good.
Table 7.4
From table 7.4 it is seen that saving 15% of the TOFD and the pulse-echo samples can be done without losing any significant information. From the Daubechies curve in figure AP2.1 in appendix A2.1 it seems likely that 5-10% of the pulse-echo and TOFD samples can be saved without significant changes in pulse shape, as the error only increases marginally.
All background data to these tables can be found in the appendices. Error calculations, error plots and pictures from the visual evaluations are listed defect by defect. The Haar transform is only represented in the error calculation table and in the error plots, not in any of the evaluation tables, due to its poor visual compression results.
The problems described above could in the worst case make the compressed file even bigger than the original one, for example if a file with 8 bit data values is saved with a 32 bit representation. Fortunately WinZip removes most of the effect of this problem, but not all of it, which means that there is still room for improvement when it comes to effective data storage.
The file in this test is to4, i.e. the TOFD file from defect 4. The whole file is now compressed, in contrast to the modified versions used previously, where the defect area was cut out. A TOFD file was chosen because it only contains data from one transducer, whereas the pulse-echo files contain data from two different transducers. The coefficient vector has been saved in two parts to overcome the storage problem described above (nr 2). Part one holds ~7% of the coefficients and was saved with 32 bits, and part two holds the remaining 93% and was saved with 16 bits.
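The storage effect of saving the coefficient vector in two parts can be illustrated with a short calculation. The sketch below is not the routine used in this work. It assumes integer coefficients and assumes that the split is made on whether a value fits in a signed 16 bit word, which the text does not state. All function names and the example data are hypothetical.

```python
import struct

INT16_MIN, INT16_MAX = -32768, 32767

def split_coefficients(coeffs):
    """Separate values that need 32 bits from values that fit in 16 bits."""
    wide_index, wide, narrow = [], [], []
    for i, c in enumerate(coeffs):
        if INT16_MIN <= c <= INT16_MAX:
            narrow.append(c)
        else:
            wide_index.append(i)  # position is needed to restore the order
            wide.append(c)
    return wide_index, wide, narrow

def pack_parts(wide, narrow):
    """Part one as little-endian 32 bit words, part two as 16 bit words."""
    part_one = struct.pack("<%di" % len(wide), *wide)
    part_two = struct.pack("<%dh" % len(narrow), *narrow)
    return part_one, part_two

def merge_coefficients(wide_index, wide, narrow, total):
    """Rebuild the original coefficient order from the two parts."""
    merged = [0] * total
    wide_positions = set(wide_index)
    for i, c in zip(wide_index, wide):
        merged[i] = c
    narrow_values = iter(narrow)
    for i in range(total):
        if i not in wide_positions:
            merged[i] = next(narrow_values)
    return merged

# Hypothetical vector: 7 large and 93 small coefficients (7% / 93%).
coeffs = [120000, -70000, 45000, 90000, -33000, 50000, 40000]
coeffs += [(-1) ** k * (k % 50) for k in range(93)]

wide_index, wide, narrow = split_coefficients(coeffs)
part_one, part_two = pack_parts(wide, narrow)
all_32_bit = struct.pack("<%di" % len(coeffs), *coeffs)
restored = merge_coefficients(wide_index, wide, narrow, len(coeffs))
```

For this example the two parts take 28 + 186 = 214 bytes against 400 bytes for a uniform 32 bit representation. The position list needed to restore the order is not counted here, which is one reason a split of this kind is far from optimal.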
Three different compression ratios were WinZipped and compared with the original TOFD file. Table 7.5 shows the result.
to4, size = 6406 KB
Pre-processed compression ratio | WinZipped pre-processed file size (KB) | WinZipped original file size (KB)
1:6.7 (85% zeros) | 1181 | 3903
1:10 (90% zeros) | 854 | 3903
1:20 (95% zeros) | 486 | 3903
Table 7.5
As can be seen, the calculated compression ratio does not exactly match the compression achieved in reality when the file has been WinZipped. A possible explanation could be the storage problem mentioned before.
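The mismatch can be quantified directly from the figures in table 7.5. The short calculation below is only a sketch. It assumes that 1181, 854 and 486 KB are the WinZipped sizes for the 1:6.7, 1:10 and 1:20 pre-processed files respectively, and that 3903 KB is the WinZipped size of the unprocessed original. The variable names are hypothetical.

```python
ORIGINAL_KB = 6406          # size of to4 before any compression
ZIPPED_ORIGINAL_KB = 3903   # original file compressed with WinZip only

# Assumed reading of table 7.5: pre-processing ratio -> WinZipped size in KB.
zipped_kb = {
    "1:6.7 (85% zeros)": 1181,
    "1:10 (90% zeros)": 854,
    "1:20 (95% zeros)": 486,
}

def final_ratio(original_kb, compressed_kb):
    """Final compression expressed as the x in 1:x."""
    return original_kb / compressed_kb

final_ratios = {label: final_ratio(ORIGINAL_KB, size) for label, size in zipped_kb.items()}
lossless_only = final_ratio(ORIGINAL_KB, ZIPPED_ORIGINAL_KB)

for label, ratio in final_ratios.items():
    print("%-18s -> final ratio 1:%.1f" % (label, ratio))
print("WinZip only        -> final ratio 1:%.1f" % lossless_only)
```

Under this reading the final ratios come out lower than the pre-processing ratios in all three cases, so a pre-processing ratio well above 1:10 is needed to reach a final ratio of 1:10.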
as opposed to the original 7 hours and 51 minutes. Normally the files are smaller than that and the transfer time of course decreases accordingly. The results also show the importance of choosing the right transform depending on the signal shape. In this case the Daubechies wavelet matched the transmitted pulse best, which could be seen in both the visual and the mathematical plots and tables. A factor that has not been taken into account is the time the compression and decompression process takes, which can be substantial if the files are large. The compression/decompression time can probably be reduced if the algorithms are implemented in C instead of using the wavelet toolbox in Matlab.
The practical test results in section 7.5 show that in order to get a final compression ratio of 1:10 a slightly higher pre-processing compression ratio must be used. This result is however based on the way the file was saved with the 32 and 16 bit word lengths. This file splitting was far from optimal and a more effective method would probably result in a better final compression ratio.
Interesting work for the future would be to use wavelet packets together with the best basis concept and see how much the compression ratio and signal quality improve.
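The relation between file size and transfer time can be sketched with a few lines of arithmetic. The modem rate of 56 kbit/s, the use of 1 MB = 1 000 000 bytes and the function names are assumptions made for this illustration. Protocol overhead and the actual line rate used for the figures in this report are not known here, so the output should not be expected to reproduce them exactly.

```python
def transfer_time_seconds(size_bytes, bits_per_second):
    """Ideal transfer time, ignoring protocol overhead and retransmissions."""
    return size_bytes * 8 / bits_per_second

def format_h_min(seconds):
    """Render a duration as whole hours and minutes."""
    total_minutes = int(seconds // 60)
    return "%d h %d min" % (total_minutes // 60, total_minutes % 60)

MODEM_BPS = 56_000   # assumed nominal telephone modem rate
MB = 1_000_000       # assumed definition of a megabyte

# File sizes taken from the abstract: around 200 MB original, around 15 MB compressed.
for label, size_mb in (("original", 200), ("compressed", 15)):
    seconds = transfer_time_seconds(size_mb * MB, MODEM_BPS)
    print("%-10s %3d MB -> %s" % (label, size_mb, format_h_min(seconds)))
```

Since the transfer time scales linearly with file size, the saving in time equals the final compression ratio, whatever the actual line rate is.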
Appendices
Error
% saved elements | 20 | 15 | 5 | 1
Pulse-echo, FFT | 0.317940 | 0.406600 | 0.687840 | 0.811960
Pulse-echo, Haar | 0.233810 | 0.301030 | 0.622270 | 1.000000
Pulse-echo, Db8 | 0.175670 | 0.248250 | 0.568250 | 0.881240
TOFD, FFT | 0.004870 | 0.010118 | 0.086005 | 0.152450
TOFD, Haar | 0.017057 | 0.026142 | 0.069284 | 0.220020
TOFD, Db8 | 0.001660 | 0.004193 | 0.025588 | 0.091936
FFT
File name | pe4m | pe4_2d_fft_3 | pe4_2d_fft_1 | pe4_2d_fft_2 | pe4_2d_fft_4
% saved elem. | Original | 20 | 15 | 5 | 1
Max ampl. dB | -3,5 | -3,4 | -3,3 | -4,4 |
Max scan deg | 84,2 | 84,6 | 84,6 | 84,6 |
Max index mm | 149 | 149 | 149 | 149 |
Soundpath mm/2 | 14,7 | 13,9 | 13,9 | 13,9 |
Scan 1 6dB deg | 79 | 79 | 83,4 | 83,4 |
Scan 2 6dB deg | 87,8 | 86,2 | 86,2 | 86,6 |
Index 1 -6dB mm | 149 | 149 | 149 | 149 |
Index 2 -6dB mm | 161 | 161 | 149 | 153 |
Scan length deg. | 8,8 | 7,2 | 2,8 | 3,2 |
Length index mm | 12 | 12 | 0 | 4 |
Comment | Uncompressed file | Distorted A-scan | Distorted A-scan | Distorted A-scan | Signal amplitude too low
Db8
File name | pe4_2d_db8_3 | pe4_2d_db8_2 | pe4_2d_db8_1 | pe4_2d_db8_4
% saved elem. | 20 | 15 | 5 | 1
Max ampl. dB | -3,5 | -3,5 | -3,5 | -3,1
Max scan deg | 84,2 | 84,2 | 84,6 | 84,6
Max index mm | 149 | 149 | 149 | 149
Soundpath mm/2 | 14,7 | 14,7 | 13,9 | 13,9
Scan 1 6dB deg | 83 | 83 | 79 | 83
Scan 2 6dB deg | 86,2 | 86,2 | 87,4 | 85,8
Index 1 -6dB mm | 149 | 149 | 149 | 149
Index 2 -6dB mm | 149 | 149 | 161 | 149
Scan length deg. | 3,2 | 3,2 | 8,4 | 2,8
Length index mm | 0 | 0 | 12 | 0
Comment | | | |
Pulse-echo plots, FFT: pe4m original, pe4_2d_fft_3, pe4_2d_fft_1, pe4_2d_fft_2 (5% saved elements), pe4_2d_fft_4 (1% saved elements)
Pulse-echo plots, Db8: pe4_2d_db8_3, pe4_2d_db8_2, pe4_2d_db8_1 (5% saved elements), pe4_2d_db8_4 (1% saved elements)
Defect 4
FFT
File name | to4m | to4_2d_fft_1 | to4_2d_fft_2 | to4_2d_fft_4 | to4_2d_fft_3
% saved elements | Original | 20 | 15 | 5 | 1
Max ampl. dB | 24,4 | 24,8 | 24,9 | 21,7 | 17,6
Lath | 7,894 | 7,894 | 7,894 | 7,894 | 7,894
Tip 1 | 9,877 | 9,877 | 9,877 | 9,877 | 9,877
time | 1,983 | 1,983 | 1,983 | 1,983 | 1,983
Depth (mm) | 12,63 | 12,63 | 12,63 | 12,63 | 12,63
Num. of scan lines | 11 | 11 | 11 | 11 | 11
Noise / Comment: Signal shape not too good
Db8
File name | to4_2d_db8_3 | to4_2d_db8_1 | to4_2d_db8_2 | to4_2d_db8_4
% saved elements | 20 | 15 | 5 | 1
Max ampl. dB | 24,1 | 24,7 | 24,5 | 25,0
Lath | 7,894 | 7,894 | 7,894 | 7,894
Tip 1 | 9,877 | 9,877 | 9,877 | 9,877
time | 1,983 | 1,983 | 1,983 | 1,983
Depth (mm) | 12,63 | 12,63 | 12,63 | 12,63
Num. of scan lines | 11 | 11 | 11 | 11
Noise / Comment: Signal shape not too good
Defect 5
Error
% saved elements | 20 | 15 | 5 | 1
Pulse-echo, FFT | 0.198140 | 0.287320 | 0.627750 | 0.745690
Pulse-echo, Haar | 0.203560 | 0.256490 | 0.549540 | 0.817920
Pulse-echo, Db8 | 0.167630 | 0.249140 | 0.438320 | 0.821220
TOFD, FFT | 0.040112 | 0.089521 | 0.343180 | 0.580810
TOFD, Haar | 0.059895 | 0.089051 | 0.227780 | 0.561480
TOFD, Db8 | 0.020100 | 0.036558 | 0.147360 | 0.374410
FFT
File name | pe5m | pe5_2d_fft_3 | pe5_2d_fft_2 | pe5_2d_fft_1 | pe5_2d_fft_4
% saved elem. | Original | 20 | 15 | 5 | 1
Max ampl. dB | -2,6 | -2 | -2 | -3,4 |
Max scan deg | 169,2 | 169,2 | 169,2 | 171,2 |
Max index mm | 27 | 27 | 27 | 31 |
Soundpath mm/2 | 25,6 | 25,6 | 25,6 | 29,9 |
Scan 1 6dB deg | 165,7 | 165,7 | 164,2 | 163,7 |
Scan 2 6dB deg | 173,7 | 173,7 | 178,7 | 182,8 |
Index 1 -6dB mm | 21 | 21 | 18 | 18 |
Index 2 -6dB mm | 47 | 47 | 49 | 49 |
Scan length deg. | 8 | 8 | 14,5 | 19,1 |
Length index mm | 26 | 26 | 31 | 31 |
Comment | Uncompressed file | | | | No detectable indication
Db8
File name | pe5_2d_db8_2 | pe5_2d_db8_1 | pe5_2d_db8_4 | pe5_2d_db8_3
% saved elem. | 20 | 15 | 5 | 1
Max ampl. dB | -2,6 | -2,2 | -1,4 | -1,9
Max scan deg | 169,2 | 169,2 | 169,2 | 169,2
Max index mm | 27 | 27 | 27 | 31
Soundpath mm/2 | 25,6 | 25,6 | 25,6 | 26,4
Scan 1 6dB deg | 165,7 | 167,2 | 167,2 | 167,2
Scan 2 6dB deg | 173,7 | 174,2 | 173,7 | 172,7
Index 1 -6dB mm | 18 | 23 | 23 | 23
Index 2 -6dB mm | 49 | 47 | 39 | 39
Scan length deg. | 8 | 7 | 6,5 | 5,5
Length index mm | 31 | 24 | 16 | 16
Comment | | | |
Pulse-echo plots, FFT: pe5m original, pe5_2d_fft_3, pe5_2d_fft_2, pe5_2d_fft_1 (5% saved elements), pe5_2d_fft_4 (1% saved elements)
Pulse-echo plots, Db8: pe5_2d_db8_2, pe5_2d_db8_1, pe5_2d_db8_4 (5% saved elements), pe5_2d_db8_3 (1% saved elements)
Defect 5
FFT
File name | to5m | to5_2d_fft_3 | to5_2d_fft_1 | to5_2d_fft_2 | to5_2d_fft_4
% saved elements | Original | 20 | 15 | 5 | 1
Max ampl. dB | 26,4 | 17,6 | 24,8 | 24,9 | 21,7
Lath | 13,694 | 13,694 | 13,694 | 13,710 | 13,710
Tip 1 | 15,360 | 15,344 | 15,360 | 15,360 | 15,377
time | 1,666 | 1,650 | 1,666 | 1,650 | 1,667
Depth (mm) | 14,90 | 14,81 | 14,90 | 14,81 | 14,90
Num. of scan lines | 11 | 11 | 11 | 11 | 11
Noise / Comment | Noisy signal | Noisy signal | Noisy signal | Noisy signal | Noisy signal, signal shape not ok
Db8
File name | to5_2d_db8_1 | to5_2d_db8_2 | to5_2d_db8_3 | to5_2d_db8_4
% saved elements | 20 | 15 | 5 | 1
Max ampl. dB | 24,7 | 24,5 | 24,1 | 25,0
Lath | 13,694 | 13,710 | 13,694 | 13,694
Tip 1 | 15,344 | 15,344 | 15,360 | 15,344
time | 1,650 | 1,634 | 1,666 | 1,650
Depth (mm) | 14,81 | 14,73 | 14,90 | 14,81
Num. of scan lines | 11 | 11 | 11 | 11
Noise / Comment | Noisy signal | Noisy signal | Most noise gone | Signal shape not ok
Defect 8
Error
% saved elements | 20 | 15 | 5 | 1
Pulse-echo, FFT | 0.143830 | 0.204240 | 0.435510 | 0.689470
Pulse-echo, Haar | 0.091941 | 0.144500 | 0.440680 | 0.795590
Pulse-echo, Db8 | 0.069915 | 0.096112 | 0.272820 | 0.865840
TOFD, FFT | 0.005039 | 0.021851 | 0.206570 | 0.383470
TOFD, Haar | 0.007032 | 0.012428 | 0.051234 | 0.117070
TOFD, Db8 | 0.000722 | 0.001901 | 0.015114 | 0.069977
FFT
File name | pe8m | pe8_2d_fft_2 | pe8_2d_fft_1 | pe8_2d_fft_4 | pe8_2d_fft_3
% saved elem. | Original | 20 | 15 | 5 | 1
Max ampl. dB | -1,1 | -1,1 | -1,1 | -1,6 | -3,2
Max scan deg | 295,9 | 295,9 | 295,9 | 295,9 | 296,2
Max index mm | 140 | 140 | 140 | 140 | 140
Soundpath mm/2 | 26,3 | 25,5 | 25,5 | 25,5 | 25,5
Scan 1 6dB deg | 294,1 | 293,8 | 294,1 | 294,1 | 294,1
Scan 2 6dB deg | 298,9 | 298,9 | 298,9 | 298,9 | 298,9
Index 1 -6dB mm | 120 | 120 | 120 | 120 | 120
Index 2 -6dB mm | 140 | 140 | 140 | 140 | 140
Scan length deg. | 4,8 | 5,1 | 4,8 | 4,8 | 4,8
Length index mm | 20 | 20 | 20 | 20 | 20
Comment | Uncompressed file | | | |
Db8
File name | pe8_2d_db8_2 | pe8_2d_db8_1 | pe8_2d_db8_4 | pe8_2d_db8_3
% saved elem. | 20 | 15 | 5 | 1
Max ampl. dB | -1,4 | -1,4 | -1,4 | -1,3
Max scan deg | 295,9 | 295,9 | 295,9 | 295,6
Max index mm | 140 | 140 | 140 | 140
Soundpath mm/2 | 26,3 | 26,3 | 26,3 | 27,1
Scan 1 6dB deg | 293,8 | 293,8 | 294,1 | 294,1
Scan 2 6dB deg | 298,9 | 298,9 | 298,9 | 298,9
Index 1 -6dB mm | 120 | 120 | 120 | 120
Index 2 -6dB mm | 140 | 140 | 140 | 140
Scan length deg. | 5,1 | 5,1 | 4,8 | 4,8
Length index mm | 20 | 20 | 20 | 20
Comment | | | |
Pulse-echo plots, FFT: pe8m original, pe8_2d_fft_2, pe8_2d_fft_1, pe8_2d_fft_4 (5% saved elements), pe8_2d_fft_3 (1% saved elements)
Pulse-echo plots, Db8: pe8_2d_db8_2, pe8_2d_db8_1, pe8_2d_db8_4 (5% saved elements), pe8_2d_db8_3 (1% saved elements)
Defect 8
FFT
File name | to8m | to8_2d_fft_3 | to8_2d_fft_1 | to8_2d_fft_2 | to8_2d_fft_4
% saved elements | Original | 20 | 15 | 5 | 1
Max ampl. dB | 24,4 | 24,6 | 25,6 | 17,8 |
Lath | 7,277 | 7,277 | 7,277 | 7,277 |
Tip 1 | 8,444 | 8,444 | 8,444 | 8,460 |
time | 1,167 | 1,167 | 1,167 | 1,183 |
Depth (mm) | 8,29 | 8,29 | 8,29 | 8,35 |
Num. of scan lines | 18 | 18 | 18 | 18 |
Noise / Comment | Some noise before lateral | Some noise before lateral | Signal noisier | Signal noisier | No measurable signal
Db8
File name | to8_2d_db8_3 | to8_2d_db8_1 | to8_2d_db8_2 | to8_2d_db8_4
% saved elements | 20 | 15 | 5 | 1
Max ampl. dB | 24,9 | 24,4 | 20,7 | 12,5
Lath | 7,277 | 7,277 | 7,244 | 7,277
Tip 1 | 8,444 | 8,444 | 8,427 | 8,427
time | 1,167 | 1,167 | 1,183 | 1,150
Depth (mm) | 8,29 | 8,29 | 8,35 | 8,21
Num. of scan lines | 18 | 18 | 18 | 18
Noise / Comment | No noise | No noise | No noise | Bad signal shape