
DIGITAL IMAGE PROCESSING AND INTERPRETATION*


Qiming Zhou
Department of Geography, Hong Kong Baptist University
Kowloon Tong, Kowloon, Hong Kong
Phone: (852) 23395048, Fax: (852) 23395990, E-mail: qiming@hkbu.edu.hk

Digital images, particularly those from remote sensing technology, have become an important source of spatial information. In modern Geographical Information Systems (GIS), digital remotely sensed images are widely recognised as one of the most practical means for spatial information updating, especially in real-time applications. However, in most of today's applications, the remotely sensed data can only be used to their greatest potential if they can be correctly interpreted, classified and presented in the same way as other terrestrial spatial information, such as thematic maps. This lecture note demonstrates the methodologies and techniques of extracting thematic information from digital images. As an introduction, the nature of digital images and the characteristics of earth objects in relation to image interpretation are discussed. The discussion then focuses on the techniques of image enhancement, interpretation and automated classification using black-and-white (or single-band) images or multispectral images. Methods and techniques used for integrating digital images with spatial information systems are also discussed. For the purpose of this note and ease of discussion, the digital images referred to here only include those from passive remote sensing.

1. The Nature of Digital Images

In our everyday experience, we are exposed to images in magazines, on billboards, on television and with snapshot photos. It is easy to think of all images as photographs, but photography is only one way that an image can be made. It is common today to use video cameras, digital cameras or other machines to record images, as well as conventional photography. In many cases, a hard copy of the image is distributed on photographic film, when the original images were recorded in some other manner. Photographs are routinely converted to digital images for computer enhancement. It is important to understand the distinction between the photographic process and electronic image recording processes.

1.1 Photographic Film


Photographic film reacts chemically to the presence of light. The more light, the greater is the chemical reaction in the film. Light causes a reaction in the silver salts in the film that turns part of the grains to silver. A chemical developer is used to complete the conversion of exposed grains to pure silver. In a black and white negative film, the silver salts are washed away leaving only black silver where the light exposed the film. The density of silver grains controls how dark the negative appears.

* Lecture Notes for Subject GEOG3610: Remote Sensing and Image Interpretation, Department of Geography, Hong Kong Baptist University, Kowloon Tong, Kowloon, Hong Kong. Copyright 1999 Qiming Zhou. All rights reserved.


The shading between very light and very dark objects in the film appears as a smooth transition to the eye, even though it is caused by a number of solid black grains (Figure 1). Through a microscope, however, photographic film is merely a sea of black particles on a clear base. Photographic paper is essentially the same as film except that it is backed with paper rather than clear plastic. To create a print, photographic paper is exposed using an apparatus that focuses the negative onto the paper. Colour photography works in a similar manner to black and white except that the silver grains are replaced with colour pigments during the developing process.

Figure 1. A photo appears continuous; however, individual grains of silver halide turn dark in response to light.

1.2 Photo-Electrical Imaging


Electronically recorded images are not made using a chemical reaction, but are created using light-sensitive computer chips. Certain types of computer chips create an electrical current when illuminated by light. This is known as the photo-electrical effect. If light is focused on a photoelectric chip (a photo-detector), the amount of electricity created is proportional to the amount of light hitting the chip. If an array of detectors is placed onto the back of a camera instead of film, an electrical image is produced whereby each picture element (or pixel) has an electrical intensity instead of a different density of exposed silver halide grains (Figure 2). The size of the pixel depends on the size of the array of detectors and the size of the object that is focussed onto the array (Figure 3).

Figure 2. Light from the object is focused onto a light-sensitive computer chip. The electricity produced is proportional to the brightness.

Figure 3. An image made using photo-electric chips records the relative amount of light hitting each chip. If the film in a camera were replaced with an array of 16 chips, the light pattern shown at right would create an image as shown at left.

Digital still-frame cameras and video cameras work on much the same principle as a photographic camera (Figure 4, top and middle). The digital still-frame camera replaces the film with an array of photo-detectors. Video cameras store the image as a series of lines sampled from the image in the camera. Each line is a record of the brightness of that strip across the image. Many earth resource remote sensing systems focus the image directly onto the photo-detector, rather than using a camera-like system. The idea is the same as if you viewed the earth through a cardboard tube, recording the image in strips much like a video camera, scanning across each line (Figure 4 bottom).


Figure 4. Digital images can be created by measuring the light from a number of sources: a digital still-frame camera focuses the scene onto an array of photo-detectors; a digital video camera scans lines across the image onto photo-detectors; and an electro-optical scanner uses a mirror to scan the object onto photo-detectors.

1.3 Analogue versus Digital


An analogue signal is a continuous record of phenomena. The record can be physically etched into a surface (e.g. the sound energy etched into the grooves of an LP record, or the inked lines drawn by a seismograph); stored on magnetic tapes (e.g. the sound energy stored by magnetising magnetic tape in a cassette); or broadcast electromagnetically (e.g. the sound energy carried on an electromagnetic wave as radio sound). Figure 5 shows an analogue signal of the level of brightness across the image above the graph (draw any line horizontally across the shaded image). One of the main problems with a recording of an analogue signal is that any smudges, impurities, marks or anything that changes the recording in any way is interpreted as part of the signal during playback. The crackle and pop of old LP records, for example, are dust and marks in the grooves, whilst the hiss on a cassette tape is the random background magnetic signal on the tape.

Figure 5. Analogue signal: signal strength plotted against distance (a cross-section of the top image is shown on the bottom).


1.4 The Digital Signal


One solution to the problems with analogue signals is to convert them to a numeric representation of the original signal. The numbers can be stored in any available manner, and then used to reproduce the original signal without noise or distortion. The process of dividing the analogue signal up into intervals is called sampling. Figure 6 shows how both the intensity (Y axis) and distance (X axis) can be broken up into a number of discrete regions. The greater the number of regions or bins, the greater the sampling density.

Figure 6. The first step of digitising is determining the sampling interval. If this is applied to Figure 5, X is distance across the image and Y is the signal strength.

Once the sampling density along the distance axis (Figure 5; X in Figure 6) has been determined, the signal must be reduced to a single value for each bin (Figure 7a and Figure 7b), which ideally is the average of the signal within the bin. Generally the number of intensity levels is fixed, requiring that the digital value be set to the closest possible alternative (Figure 7c; the darkest boxes show the digital signal that is to be stored). The sampling interval along the horizontal axis in Figure 7 (the base in Figure 8) is a known interval, and thus does not have to be recorded explicitly. The intensity values (vertical axis) are all that need to be recorded, as long as they are kept in order. The digital values for the signal shown in Figure 5 are shown along the base in Figure 8 (4, 8, 11, 11, 6, 2, ...). The sampling process described here is performed using an analogue-to-digital converter (ADC).
Figure 7 (up). Sampling an analogue signal along one axis (a), the sampled signal (b), and reducing it to a discrete intensity value (c).

Figure 8 (right). The resulting digital signal, shown along the bottom: 4, 8, 11, 11, 6, 2, 3, 4, 6, 8, 8, 6, 5, 3, 2, 2, 3, 3, 4, 5, 6, 7, 7, 7, 6, 6, 5.
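As a simple illustration of the sampling and quantisation just described, the short Python sketch below samples a made-up one-dimensional brightness profile at a fixed interval and rounds each sample to one of a fixed number of intensity levels. It is only a software caricature of what an ADC does in hardware; the profile, sampling interval and number of levels are all assumed values chosen to echo Figures 5 to 8.

import math

def digitise(signal, x_interval, num_samples, num_levels, full_scale):
    """Sample a continuous 1-D signal at a fixed interval and quantise each
    sample to the nearest of a fixed number of intensity levels (a software
    sketch of what an analogue-to-digital converter does)."""
    digital = []
    for i in range(num_samples):
        x = i * x_interval                      # sampling along the distance axis
        value = signal(x)                       # ideally this would be a bin average
        level = round(value / full_scale * (num_levels - 1))
        digital.append(max(0, min(num_levels - 1, level)))  # clamp to the valid range
    return digital

# A made-up brightness profile standing in for the cross-section in Figure 5.
brightness = lambda x: 0.5 + 0.5 * math.sin(x / 3.0)
print(digitise(brightness, x_interval=1.0, num_samples=16, num_levels=14, full_scale=1.0))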

1.5 Pixels and Digital Images


The numeric value recorded for each sampled element of the signal is known as a digital number, or DN. To convert a photograph into a digital image, the image is divided into a regularly spaced grid (an array) where the average brightness of each cell is measured and stored as a DN. Each cell of the grid that makes up the image is known as a picture element or pixel. The DN recorded for each pixel represents the average brightness for the area covered by that pixel. The process of measuring each pixel is known as scanning, which is why the term scanner crops up repeatedly in desktop publishing and remote sensing (desktop scanner, multispectral scanner). Figure 9 shows a photograph that was scanned using various numbers of pixels.

Figure 9. A digital image and its spatial resolution. As fewer and fewer pixels are used, the area within each pixel is averaged. From left to right, 275x337 pixels, 92x112, 28x34 and 7x8.

The number of pixels used to create a digital image depends on the degree of clarity that is needed and the particular characteristics of the scanner. The fewer pixels that the image is divided into, the less clear the image appears. If any digital image is looked at very closely, it will appear blocky, as the individual pixels become visible. Satellite sensors focus a small patch of the surface of the earth on a single detector, recording a pixel value for an area that varies from a few metres across, to tens or hundreds of metres across. A digital sensor in a camera that is used by news reporters can produce an image with pixels so small that it is difficult to tell the difference between the digital image and a photograph. A desktop scanner used for computer publishing can be controlled to measure pixels in micrometers. The actual size of each pixel on the ground or on the subject is controlled by the distance from the target and the individual characteristics of the sensor. The size of the pixels in an image scanned from a photograph depends on the scale of the original photograph, and the number of pixels that it is divided into.
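The effect shown in Figure 9 can be imitated with a few lines of code. The sketch below is an illustrative implementation (not taken from any particular package): it averages square blocks of DNs to produce a coarser-resolution version of a grey-scale image held as a list of rows.

def block_average(image, factor):
    """Reduce the spatial resolution of a grey-scale image (a list of rows of
    DNs) by averaging square blocks of pixels, as illustrated in Figure 9.
    Edge pixels that do not fill a whole block are simply dropped here."""
    rows, cols = len(image), len(image[0])
    out = []
    for r in range(0, rows - factor + 1, factor):
        out_row = []
        for c in range(0, cols - factor + 1, factor):
            block = [image[r + i][c + j] for i in range(factor) for j in range(factor)]
            out_row.append(round(sum(block) / len(block)))   # average DN of the block
        out.append(out_row)
    return out

# Toy 4 x 4 image reduced to 2 x 2: each output pixel is the average of a 2 x 2 block.
tiny = [[8, 8, 0, 0],
        [8, 8, 0, 0],
        [2, 2, 6, 6],
        [2, 2, 6, 6]]
print(block_average(tiny, 2))   # [[8, 0], [2, 6]]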

1.6 Digital Numbers


In a remote sensing image, the range of DN values in the image depends on the particular sensor. The electric signal that is created by the sensor is converted to a digital number by an analogue-to-digital converter (ADC); the resulting numeric value ranges from some minimum value (when the target is very dark) to a maximum value (when the target is extremely bright). The ADC converts that range into numerical values that are stored as binary integers, often in the range of 0 to 255 or 0 to 65,535 (in one or two bytes). The data range is often arbitrary, although it is related to the radiometric sensitivity of the sensor. Figure 10 shows an image with an arbitrary range of ten values from 0 to 9.
Figure 10. The string of digital numbers (8 8 8 8 8 8 8 6 4 4 4 6 8 6 0 0 2 6 8 6 0 0 2 6 8 6 2 2 2 6) represents the image on the right.


The entire range of data values is often not used. The sensor and the ADC are designed so that the image source is very seldom darker or brighter than the sensor is capable of reading (and thus the DNs are seldom at the minimum or maximum value). Most images are neither very dark nor extremely bright, and fall somewhere in between the minimum and maximum digital number. However, in some rare cases an object may be brighter than the sensor can measure, as is the case with snow or clouds; the surface is then recorded with the maximum value (the pixel is said to be saturated).
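A hedged sketch of this scaling step is given below: a made-up sensor reading is mapped onto an 8-bit DN range, and readings beyond the sensor's assumed maximum are clipped to 255, i.e. saturated. Real sensors use instrument-specific calibration, so the numbers here are purely illustrative.

def radiance_to_dn(reading, min_reading, max_reading, bits=8):
    """Map a sensor reading onto an integer DN range (here 0-255 for 8 bits).
    Readings brighter than the sensor's maximum are clipped to the top DN,
    i.e. the pixel is saturated, as happens over snow or bright clouds."""
    levels = 2 ** bits                              # 256 levels for one byte
    scaled = (reading - min_reading) / (max_reading - min_reading)
    dn = int(round(scaled * (levels - 1)))
    return max(0, min(levels - 1, dn))              # clamp: 0 = darkest, 255 = saturated

print(radiance_to_dn(50.0, 0.0, 100.0))    # a mid-range target maps near the middle of 0-255
print(radiance_to_dn(150.0, 0.0, 100.0))   # brighter than the sensor can measure -> 255 (saturated)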

1.7 Image Geometry


Digital images can be thought of as a regular grid of (normally) square pixels. The pixel shape is generally assumed to be square by most computer image processing programs, which may not be the case with raw data. The degree of distortion depends on the type of sensor and platform, the width of the view angle or Field of View (FOV), and the altitude of the platform. Distortion is of two types: systematic distortion and non-systematic distortion. Systematic distortion occurs in a regular, predictable manner, usually due to known characteristics of the platform, the scanner or the orientation of the platform relative to the earth, while non-systematic distortion is due to external factors that cannot be calculated or computed from known variables. The distortions in aircraft scanner images are often much more complex than with satellite-based scanners because the attitude of the aircraft changes along the flight path, and the scanning arc is generally greater than with a satellite-based system. Distortion due to flight-path perturbation can be removed systematically only if the exact orientation of the aircraft is known from gyroscopic data recorded at the time of the flight.

1.7.1 Systematic distortion

Systematic distortion is a regular distortion of an image that can be calculated and corrected. The causes of systematic distortion include, for example, the FOV of the sensor, the earth's curvature, and the earth's rotation. The type of systematic distortion that occurs in an image depends largely on the altitude of the platform, and the type of sensor and platform used.

The large view angle common to aircraft-based scanners and satellite-based sensors such as NOAA AVHRR causes the pixel size to change as the distance from the flight path increases (Figure 11). The extent of the distortion can be calculated from the FOV of the scanner, the altitude of the platform, and the look angle for each pixel. The curvature of the earth also needs to be taken into account with high-altitude imagery that has a large view angle, such as NOAA AVHRR. This is not a problem with Landsat MSS or TM, or SPOT HRV, due to their very small view angles.

Satellite-based platforms such as Landsat or SPOT have a near-polar orbit that passes from north to south. If the earth did not rotate, the strip of image data captured by the sensor would have little geometric distortion, since the sensors on board both satellites have a small view angle and normally look straight down at the earth (SPOT HRV can also look to the side). The earth rotates out from under the satellite, however, resulting in the strip slowly moving westward towards the south end of the pass (Figure 12). The westward skew can be calculated from the trajectory of the satellite orbit and the speed of the earth underneath the satellite at any given latitude. The data vendor generally takes care of systematic distortion unless the raw data is specifically requested.
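The two systematic effects mentioned above can be approximated with very simple geometry. The Python sketch below uses a flat-earth approximation for the growth of pixel size away from nadir (ignoring the earth's curvature, which matters for wide-swath sensors such as AVHRR) and a rough estimate of the westward shift of the ground during scene acquisition; the altitude, IFOV, latitude and acquisition time in the example are assumed values, not official sensor specifications or the data vendors' correction formulas.

import math

def pixel_size_at_scan_angle(altitude_m, ifov_rad, scan_angle_deg):
    """Approximate ground pixel dimensions for a scanner looking off-nadir.
    At nadir the pixel is roughly altitude * IFOV across; away from nadir the
    across-track dimension grows roughly as sec^2(theta) and the along-track
    dimension as sec(theta) (flat-earth approximation)."""
    theta = math.radians(scan_angle_deg)
    nadir = altitude_m * ifov_rad
    return nadir / math.cos(theta) ** 2, nadir / math.cos(theta)   # (across-track, along-track)

def earth_rotation_skew_m(latitude_deg, scene_acquisition_s):
    """Approximate westward shift of the ground while a scene is being acquired."""
    earth_radius_m = 6_378_137.0
    sidereal_day_s = 86_164.1
    surface_speed = 2 * math.pi * earth_radius_m * math.cos(math.radians(latitude_deg)) / sidereal_day_s
    return surface_speed * scene_acquisition_s

# Example with assumed values: an AVHRR-like scanner near the swath edge,
# and the skew accumulated over roughly 25 seconds of scene acquisition.
print(pixel_size_at_scan_angle(833_000, 1.4e-3, 55))   # pixel growth far from nadir (metres)
print(earth_rotation_skew_m(35, 25))                   # metres of westward skew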


Figure 11. If a scanner has a wide view angle, pixels that are farther from the nadir (straight down) point are increasingly distorted. If this is not compensated for, the edges of the image become compressed.

Figure 12. Earth resource satellites, such as Landsat, orbit at approximately 700 km above the earth's surface in a near-polar orbit. The earth rotates out from under the satellite, resulting in the image area moving progressively westward as the satellite passes from the north to the south.

1.7.2 Non-systematic distortion

Many factors that affect the geometric orientation of the image cannot be calculated because of unknown variables or perturbations in the orientation of the scanner. In the case of two satellite images, the absolute orientation will be different on each pass of the satellite (although it will be close). The orbital characteristics of the satellite dictate where each IFOV is located, and although the corner of the image can be estimated to some degree using the known orbit, the images usually are offset from one another.

1.7.2.1 Radial Distortion

The physical manner in which the photograph or image is captured leads to other types of distortion (see Figure 11 for the geometry of an aircraft-based scanner). The arrows shown in the photograph and the image in Figure 13 show how tall objects appear in a photograph and a side-to-side image scanner. The camera focuses an entire scene onto film at one time, while the scanner image is built up a line at a time (the tall object is laid over along the scanned line). In these examples the aircraft is flying from the top of the photo or image to the bottom.

Figure 13. Radial distortion in photographs, outward distortion in airborne scanner images, and displacement due to elevation differences. The degree of displacement on an image plane is shown in the right diagram: a shows the location on the image plane for a tree located at A, while b shows the location of the elevated tree B.


1.7.2.2 Displacement due to Elevation

The relative location of objects in both cases will be different from that on a map, due to location displacement caused by elevation differences. In Figure 13, the tree B appears on the photograph or image at point b on the imaging plane (a side view of the photo or image). If the tree were located at A, the position on the imaging plane would be a. The displacement from elevation is used by the eye to gauge distances. The difference in location of objects in two images or photographs can be used to determine the elevation of the ground. The size of the difference is known as parallax; the closer the object is to the camera, the greater the parallax will be in successive stereo photographs. The same effect can be seen when driving down the highway: the nearby telephone poles or fence posts pass by quickly (large displacement), while distant objects pass by slowly.

2. Factors to be Considered for Thematic Information Extraction

Remotely sensed images are presented in a way similar to a normal photograph, except that they consist of digital numbers that represent brightness values. To extract thematic information from these digital numbers, three basic factors must be considered, namely spectral, spatial and temporal characteristics. The spectral characteristics refer to the nature of the electromagnetic radiation that is emitted or reflected from the earth's surface and the capability of the sensor to detect it. The spatial characteristics describe the size of the earth objects to be detected in comparison with the spatial resolution of the sensors. Given that we live in a changing world, the temporal characteristics must be considered when attempting to monitor our environment, in terms not only of the nature of the phenomena, but also of the capability of the sensor.

2.1 Spectral Characteristics


Human eyes are a form of remote sensing detector. They react to light that comes from or is reflected from things around us. The light that human eyes see, however, is just a small portion of a continuous spectrum of energy called the electromagnetic spectrum. The electromagnetic spectrum is made up of energy that is known as electromagnetic radiation (EMR) because the pulses or waves can be measured both electrically and magnetically. Specific names are used to describe the different wavelengths that make up the electromagnetic spectrum (Figure 14). The light that human eyes can see is called visible light only because it is visible to the eye. Indeed, there is nothing particularly special about the visible portion of the electromagnetic spectrum other than the fact that humans can see it.


Figure 14. The range of wavelengths known as the electromagnetic spectrum, from short wavelengths (high frequency) to long wavelengths (low frequency): gamma ray, X-ray, visible, infrared, thermal infrared, microwave and radio. Our eyes can only see a small part of this energy known as visible light (0.4 - 0.7 micrometres, from blue through green to red).

2.1.1 The Nature of Electromagnetic Radiation

The energy of EMR can be detected when it interacts with matter. In the absence of matter (in a vacuum) EMR travels at just under 300,000 km per second. In matter the speed is slightly slower (such as through the atmosphere). In fact, the denser the matter, the slower is the speed. Electromagnetic radiation interacts with different types of matter in different ways. In fact, the way that the EMR is viewed causes it to appear to be either a wave or a particle. For instance, the refraction (or bending) of light as it passes through glass is best explained by describing the radiation as a wave. The light with a shorter wavelength, such as blue, is slowed more than that with a longer wavelength, such as red. The shorter waves are bent or refracted more than longer waves causing the light to spread out like a fan with the longest wavelengths at one side of the fan and the shortest wavelengths at the other (Figure 15).

Figure 15. White light is separated into a fan of light with distinct wavelengths as it passes through a prism. The shortest wavelengths slow the most (and thus are bent the greatest amount).

The wave oscillates at a frequency that is inversely proportional to the length of the wave (the distance between the peaks of two waves).
c = νλ     (1)

where
  c = speed of light = 299,792.8 km/second
  ν = frequency (oscillations per second, in Hertz)
  λ = wavelength (m)
The energy associated with EMR is proportional to the frequency of the radiation. As the energy level of the radiation increases, it does so in discrete steps, as if the radiation were made of individual vibrating bundles of energy. These bundles are known as photons or quanta and can behave in a manner similar to particles except that they have no mass. The amount of energy in a photon can be calculated using either the frequency or the wavelength of the energy.


E = hν = hc/λ     (2)

where
  E = energy of a photon (Joules)
  h = Planck's constant = 6.626 × 10⁻³⁴ Joule·seconds
  ν = frequency (oscillations per second, in Hertz)
  λ = wavelength (m)

Most remote sensing is based on the detection of EMR, whether the detector is a human eye, a camera or a scanner on a satellite (an electronic camera). What can be learned from remote sensing depends on what type of radiation is detected and how it interacts with the surface of the earth (or any other surface that is being looked at). Human vision is particularly important because any information that is collected is generally interpreted visually from a printed image or on a computer screen.
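To make the relations concrete, the short Python sketch below (added here for illustration; the wavelength is just an example value for green light) computes the frequency and photon energy from relations (1) and (2).

# A quick numerical check of relations (1) and (2) for green light
# (wavelength of roughly 0.55 micrometres).
C = 2.997928e8        # speed of light, m/s
H = 6.626e-34         # Planck's constant, J*s

wavelength = 0.55e-6                 # metres
frequency = C / wavelength           # relation (1): c = (nu)(lambda)
energy = H * frequency               # relation (2): E = h(nu) = hc/(lambda)

print(f"frequency = {frequency:.3e} Hz")   # about 5.5e14 Hz
print(f"photon energy = {energy:.3e} J")   # about 3.6e-19 J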

2.1.2 The Electromagnetic Radiation from Earth Materials

All matter in the universe that is warmer than 0 K (or -273.15°C) emits electromagnetic energy. Molecular motion stops at 0 K, the coldest possible temperature, which is also known as absolute zero. All objects in everyday life are warmer than absolute zero. The wavelength of EMR ranges from very short (nanometres) to very long (kilometres), as was shown in Figure 14. The amount and type of energy that is emitted depends on the temperature of the object. Very cold objects only emit energy with very long wavelengths, while warm objects emit both long and short wavelengths. In addition to emitting a wider range of wavelengths, the warmer object emits more energy than the cold object. This higher energy level is caused not only by an increase in the amount of EMR being emitted, but also by the fact that shorter wavelength EMR contains more energy (Figure 16).

Figure 16. A very cold object (bottom) only emits long-wave energy. As the object is heated (middle) the amount of emitted energy increases, and the average wavelength of the energy decreases. The hottest object (top) not only emits the most energy, but also emits short-wave energy.

The total amount of energy and the range of wavelengths of that energy that are emitted from a surface are predicted by a set of equations. The term blackbody refers to a surface that absorbs all incoming energy (and hence, looks black) which re-emits that energy in perfect accordance with the Stefan-Boltzmann law:
M = σT⁴     (3)


where
  M = total radiant exitance from the surface in W/m² (W = Watts)
  σ = Stefan-Boltzmann constant = 5.6697 × 10⁻⁸ W m⁻² K⁻⁴
  T = absolute temperature of the surface (K).

To quote Lillesand and Kiefer (1994, p. 7), a blackbody is "a hypothetical, ideal radiator that totally absorbs and re-emits all energy incident upon it". Actual objects only approach this ideal. The wavelength at the peak of the blackbody curve is the wavelength of maximum emittance, which is directly related to the temperature of the surface (Wien's displacement law):

λmax = W / T     (4)

where
  λmax = wavelength of maximum emittance (μm)
  W = Wien's constant = 2,897 μm·K
  T = absolute temperature (K).

A surface needs to be very hot before it emits energy at such short wavelengths. Figure 17 shows a diagram of the amount of energy and the wavelengths of energy emitted from the sun, a hot stove element, and the human body. The light from the sun is slightly yellow because there is slightly more green and red light than blue light being emitted. In addition, some of the blue light that is emitted by the sun is scattered by the atmosphere, further reducing the amount of blue and making the sky appear blue. A hot stove element looks red because it is not hot enough to emit blue and green light, only red light and energy with longer wavelengths than we can see (infrared and longer). The energy that is emitted from the human body cannot be seen at all by human eyes because all of the energy is at wavelengths that are much longer than the eye can see. A surface that emits energy close to that of a blackbody is called a greybody, while a surface that emits close to a blackbody at some wavelengths but not at others is called a selective radiator. Most earth materials are selective radiators (Figure 18).
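As an illustration of equations (3) and (4), the short Python sketch below computes the total exitance and peak wavelength for ideal blackbodies at roughly the three temperatures sketched in Figure 17; real materials, being selective radiators, depart from these ideal values.

SIGMA = 5.6697e-8    # Stefan-Boltzmann constant, W m^-2 K^-4
W = 2897.0           # Wien's constant, micrometre-kelvins

def blackbody_summary(name, temp_k):
    """Total exitance (equation 3) and peak wavelength (equation 4) for an
    ideal blackbody at the given absolute temperature."""
    exitance = SIGMA * temp_k ** 4            # W/m^2
    peak_um = W / temp_k                      # micrometres
    print(f"{name}: M = {exitance:.3g} W/m^2, peak wavelength = {peak_um:.2f} um")

# The three surfaces sketched in Figure 17 (temperatures converted to kelvin).
blackbody_summary("Sun (~5700 C)", 5700 + 273.15)       # peak in the visible
blackbody_summary("Hot iron (~800 C)", 800 + 273.15)    # peak in the infrared
blackbody_summary("Human body (37 C)", 37 + 273.15)     # peak in the thermal infrared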
Figure 17. The wavelength and amount of energy that is emitted from the sun at about 5700°C (top curve), hot iron at about 800°C (such as a glowing stove element, middle curve), and the human body at 37°C (bottom curve).

Figure 18. Curves A, B and C show the energy emitted from three surfaces with different emission characteristics: (A) a blackbody, (B) a greybody, and (C) a selective radiator.

2.1.3 Spectral Characteristics of Sensors

One important consideration in selecting the wavelength range in which a remote sensor will detect is the atmospheric transmittance. The Earth's atmosphere itself selectively scatters and absorbs energy in certain spectral ranges, allowing the rest of the solar energy to be transmitted through it. Areas of the spectrum where specific wavelengths can pass relatively unimpeded through the atmosphere are called transmission bands, or atmospheric windows, whereas absorption bands, or atmospheric blinds, define those areas where specific wavelengths are totally or partially blocked. For a remote sensor that is intended to see objects on the ground, the detectors must use the transmission bands (Figure 19).

Figure 19. Spectral characteristics of (a) energy sources (the sun's energy at about 6000 K and the earth's energy at about 300 K), (b) atmospheric transmittance, and (c) common remote sensing systems (the human eye, photography, electro-optical sensors, thermal IR scanners, imaging radar and passive microwave), plotted against wavelength from 0.3 μm to 1 m (after Lillesand and Kiefer, 1994).

The spectral responses that a remote sensor can see depend upon the spectral bands that the sensor detects. In remote sensing, the spectral range is usually composed of a number of spectral bands (falling within the atmospheric windows), ranging from a single-band image (a panchromatic image) to several hundred bands (a hyperspectral image). Usually, the term multispectral is applied to images that are composed of several spectral bands. The spectral characteristics of commonly used space-borne sensors, their spectral bands and their primary uses are listed in Table 1.


Table 1. The spectral characteristics of some currently operational space-borne remote sensors.

Landsat MSS (4 bands):
  Band 1: 0.5 - 0.6 μm - Culture features, clear water penetration
  Band 2: 0.6 - 0.7 μm - Vegetation/soil discrimination
  Band 3: 0.7 - 0.8 μm - Delineating water bodies, geology
  Band 4: 0.8 - 1.1 μm - Delineating water bodies, vegetation vigour and biomass

Landsat TM (7 bands):
  Band 1: 0.45 - 0.52 μm - Coastal water mapping, soil/vegetation discrimination, forest type mapping, and cultural features
  Band 2: 0.52 - 0.60 μm - Vegetation discrimination and vigour assessment, and cultural features
  Band 3: 0.63 - 0.69 μm - Plant species differentiation, and cultural features
  Band 4: 0.76 - 0.90 μm - Vegetation types, vigour and biomass content, delineating water bodies, and soil moisture
  Band 5: 1.55 - 1.75 μm - Vegetation and soil moisture content, differentiation of snow from clouds
  Band 6: 10.4 - 12.5 μm - Vegetation stress analysis, soil moisture discrimination, and thermal mapping applications
  Band 7: 2.08 - 2.35 μm - Discrimination of mineral and rock types, and vegetation moisture content

SPOT PAN (1 band):
  Band 1: 0.51 - 0.73 μm - General mapping, DTM generation

SPOT XS (3 bands):
  Band 1: 0.50 - 0.59 μm - Cultural features, clear water penetration
  Band 2: 0.61 - 0.68 μm - Vegetation/soil discrimination and plant species differentiation
  Band 3: 0.79 - 0.89 μm - Delineating water bodies, vegetation types, vigour and biomass

NOAA AVHRR (5 bands):
  Band 1: 0.58 - 0.68 μm - Daytime cloud and surface mapping, snow and ice extent
  Band 2: 0.725 - 1.1 μm - Surface water delineation, snow and ice extent
  Band 3: 3.55 - 3.93 μm - Detecting hot targets (e.g., forest fires), nighttime cloud mapping
  Band 4: 10.3 - 11.3 μm - Determining cloud and surface temperatures, day or night cloud mapping
  Band 5: 11.5 - 12.5 μm - Determining cloud and surface temperatures, day or night cloud mapping, water vapour correction

2.1.4 Spectral Signatures of Some Earth Materials

For passive remote sensing, the light that a sensor detects is mainly the reflectance of sunlight, which has an energy distribution over the entire spectrum, although some sensors do have the ability to detect energy emitted from the earth's surface itself (e.g. thermal infrared). In theory, the reflectance of sunlight differs from one kind of earth material to another. The spectrum of the reflectance for a certain earth material is often unique, and it is therefore called the spectral signature of the material. In remote sensing, it is fundamental to investigate spectral signatures before a correct image interpretation can be achieved.


There is a huge variety of materials on the earth's surface, so recording their spectral signatures (in so-called spectral libraries) requires substantial financial and time investments. For years, efforts have been made to establish such spectral libraries, and some of them are already available, often associated with remote sensing image processing software packages. The differences among the spectral signatures of some typical earth materials, such as vegetation, soil and water, are now common knowledge (Figure 20).
Figure 20. Typical spectral reflectance curves of common earth surface materials (green vegetation, dry grey-brown bare soil, and clear water) in the visible and near-to-mid infrared range (0.4 - 2.6 μm). The positions of the spectral bands of some remote sensors (NOAA AVHRR, SPOT HRV, Landsat TM and Landsat MSS) are also indicated (after Richards, 1993).

For information extraction and image interpretation, the selection of appropriate bands of multispectral images for the application objectives is a crucial task. Comparing the multispectral bands that present the most distinct differences between the cover types of interest gives the most promising hope for correct interpretation and classification, whereas difficulties are often experienced in separating cover types using bands that record spectral regions where the cover types present similar responses.
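A simple and widely used band comparison, not discussed by name in this note, is the normalised difference of the near-infrared and red bands; it exploits exactly the contrast visible in Figure 20. The Python sketch below applies it to a few made-up reflectance values that roughly echo the curves in that figure.

def normalised_difference(nir, red):
    """Per-pixel normalised difference of two co-registered bands (here NIR
    and red). Green vegetation gives strongly positive values, bare soil
    values near zero, and clear water negative values."""
    out = []
    for nir_row, red_row in zip(nir, red):
        row = []
        for n, r in zip(nir_row, red_row):
            row.append((n - r) / (n + r) if (n + r) != 0 else 0.0)
        out.append(row)
    return out

# Toy reflectance values (percent) standing in for vegetation, dry soil and clear water.
nir_band = [[45.0, 30.0, 3.0]]
red_band = [[ 8.0, 25.0, 6.0]]
print(normalised_difference(nir_band, red_band))   # roughly [0.70, 0.09, -0.33]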

2.2 Spatial Characteristics


Another important factor for information extraction from digital images is the spatial extent of the objects to be interpreted in relation to the sensor's spatial resolution. In theory, if the objects are smaller than the image resolution, the objects cannot be reliably perceived and so will not be correctly interpreted. In reality, however, some small objects may be visible on images with a lower resolution, provided that the objects have enough contrast against their background. In Figure 21, the linear feature crossing the ancient lake is clearly visible, indicating the road crossing the area. Since the road has a high contrast against its surrounding background, it is shown on the image even though its width is far less than the 82 m resolution of the MSS imagery.

According to sampling theory, the spatial resolution of the digital image must be no coarser than half the size of the smallest object of interest, so that the shape of the object can be reliably presented on the image. This sets the bottom-line limitation of the digital image concerned. Table 2 outlines the spatial resolution of commonly used remotely sensed data, together with the interpretation features and primary applications related to that spatial resolution.

Figure 21. Landsat MSS image of an ancient dry lake near Pooncarie, western New South Wales, Australia.

Table 2. Examples of spatial resolution of some commonly used remotely sensed imagery (sensor, spatial resolution, interpretation features, primary applications).

  Aircraft digital airphoto, 1 - 2 m: control points on cultural features; photogrammetry and mapping, urban management
  SPOT PAN, 10 m: houses and streets; urban planning
  SPOT XS, 20 m: crop fields, water bodies, urban areas; regional planning, agriculture, land use change
  Landsat TM, 30 m: crop fields, water bodies, urban areas; regional planning, agriculture, land use change
  Landsat MSS, 82 m: landforms, forest, pasture and agriculture areas; environment and pasture, rangeland management
  NOAA AVHRR, 1.1 km: regional landforms, coastline; regional monitoring, coastline and oceanography
  GOES, 2.5 - 5 km: clouds, coastline; weather forecast, oceanography, global change

One of the greatest challenges in remote sensing is the set of techniques and methodologies that deal with the so-called mixel (i.e. a pixel covering various cover types within its spatial resolution). There have been a large number of reports in the literature on sub-pixel component modelling intended to extract quantitative information from these mixels. Detailed discussion of this issue, however, is beyond the scope of this note.


2.3 Temporal Characteristics


Since we are living in a changing world, frequent and regular monitoring of our environment is one of the major application areas for digital remote sensing. Multitemporal remotely sensed images make this possible. For thematic information extraction, the temporal factors that may influence the interpretation process and potential applications include, for example, acquisition date and time, frequency of coverage, and history of coverage.

2.3.1 Acquisition date and time

Date and time are often, in addition to the geographical position, the first criteria applied to the acquisition of remotely sensed data. For many application projects, it is fundamental to acquire simultaneous (or near simultaneous) imagery to match up with the ground truthing information. Even for applications in which simultaneous coverage is less critical (e.g. geological applications), appropriate timing of image acquisition can often provide great assistance to the image interpretation. For example, the shading effect and shadow, which vary with acquisition date (e.g. winter or summer), may provide great assistance in the interpretation of topography, whereas day and night images acquired by some sensors may be used to distinguish different types of surface materials.

2.3.2 Frequency of coverage

The frequency of coverage of a particular type of remote sensor determines how often we may use the derived images to monitor a given area. For most of today's commonly available satellite data, which are basically orthographic in nature, the re-visit frequency is determined by the satellite's orbital characteristics. A satellite designed for earth resource monitoring usually takes a sun-synchronous orbit, meaning that the satellite passes over all places on the earth having the same latitude at approximately the same local time. However, some satellites (e.g. SPOT) can be tasked to acquire side-looking images, and thus provide more frequent coverage than orthographic images alone.
Table 3. Temporal characteristics of some commonly used satellite data and their area coverage (sensor, daylight crossing local time at the equator, re-visit period at the equator, pixel resolution, swath width).

  SPOT HRV PAN / XS: 10:30 am; 26 days (up to 7 passes/26 days); 10 m / 20 m; 60 km
  Landsat MSS / TM: 9:45 am; 16 days; 82 m / 30 m; 185 km
  NOAA AVHRR: even-numbered satellites 7:30 am/pm, odd-numbered 2:30 am/pm; 2 passes/day; 1.1 km; 2800 km
  GOES: geo-stationary; 2.5 - 5 km; full-disk view

For some applications, such as flood and bush fire monitoring, the re-visit period is crucial. These events place a critical demand on timely coverage, and situations may change very rapidly. Generally speaking, for a given sensor, the re-visit period and the spatial resolution are negatively related. This is because lower resolution imagery often covers a larger area, so the same area is covered again more frequently. Given this fact, it is quite understandable that most of today's large-area remote sensing applications relying on land monitoring use lower resolution data (e.g. NOAA AVHRR).

2.3.3 History of coverage

Since the launch of the first Landsat satellite, we now have a nearly 30-year regular collection of satellite-borne imagery. This does not count the even longer collection history of aerial photographs and meteorological satellites. The historical collection of remotely sensed imagery gives us an opportunity to study long-term change and human impact on many aspects of our living environment. For a particular objective involving past remotely sensed imagery, it is necessary to examine the history of coverage of the data concerned. This also involves the way in which the data were acquired and archived, as many past data sets have already become unavailable because of technical and management problems.

3. Feature Identification and Image Interpretation

In today's GIS environment, digital remotely sensed data have become one of the major input data sources for information updating. However, the digital images need to be interpreted and transformed into thematic maps (or classified images) that are presented in the same way that other data layers are stored and manipulated. Correct feature identification and image interpretation in the given application context, therefore, are the key to the successful utilisation of remotely sensed information.

3.1 Image Interpretation Keys


Identifying cover types that represent basic materials on the ground is an exercise both in recognising patterns and in interpreting the colours or shades in an image. The ability to recognise the shape of particular features and the colour of various cover types is crucial to determining what is happening on the landscape. Those who are familiar with the landscape are one step ahead. The colour, or spectral response, of natural and man-made features is often similar because of the similarity of the materials. Trees, whether planted by man or occurring naturally, appear the same colour. However, the patterns formed by the plants usually make it possible to distinguish between them, due to the species mix and the symmetrical nature of artificially planted forests. Crops are easily detected from their colour and dense pattern. Pavement and many rock types are often indistinguishable by colour alone because soils, sand and gravel are often used as construction materials. Concrete can look like certain rock types because its primary constituents are gravel and limestone derivatives. Looking for patterns is the first step in interpreting an image, and the next step is to determine what material a surface is likely to be made from.


3.1.1 Grey Scale and Colour

The brightness and appearance of a cover type in an image depends on the wavelength band of the image, i.e., how much radiation is reflected in that colour or wavelength, and on the relative brightness of other features in the same image (contrast). Boundaries between vegetated areas and non-vegetated areas show up well in the near infrared (NIR) because of the high reflection of vegetation, the relatively lower reflection of other surfaces, and the lack of haze and atmospheric scattering in the infrared. Moreover, water absorbs most NIR, resulting in sharp water-land boundaries. The visible bands are good for identifying man-made objects, partially because they tend not to be made of growing vegetation. This also applies to identifying agricultural crop fields and monitoring the growth of crops (Figure 22). Vegetation absorbs most red light because chlorophyll in the leaves uses red light (and blue light, and to a lesser extent, green light) to turn nutrients and water into plant matter, in the process consuming carbon dioxide and releasing oxygen into the atmosphere. The bright response from concrete, brick and gravel tends to show up well against the dark background of vegetation (in parks, gardens, fields and pastures). The stark contrast between bright vegetation in NIR, and bright concrete and rock in red, means that a colour composite that includes red and NIR wavelengths is very useful for identifying features.

The mid-infrared bands of Landsat TM imagery are especially useful in rural environments. Soil differences tend to show up in the mid-infrared (MIR), generally because of differences in water and mineral content. Water absorbs radiation in the MIR area of the spectrum. Certain minerals, specifically clays that contain water in their structure, absorb mid-infrared radiation for this reason. The water content of plants will change the amount of MIR radiation that is absorbed. For example, dry grass will reflect much more MIR than wetter grass. Other minerals such as carbonates absorb radiation only in the longer MIR. The usefulness of the MIR bands is one reason that TM data, even with lower spatial resolution compared with SPOT HRV, is most commonly used in rural areas.

Figure 22. Digital airphoto showing an agricultural area in northern New South Wales, Australia. The difference in brightness indicates whether a field is cropped or left bare. Man-made features such as a road, a quarry and houses are also clearly shown.

3.1.2 Pattern and Texture

Spatial patterns are formed by the overall shape of things on the ground, and the texture of the image is due to repetitive features such as rows of trees, patches of woody weed or blocks of cultivated land. Naturally occurring patterns tend to be somewhat curvilinear and random, while man-made patterns are often rectilinear and regular. Table 4 outlines some of the key cues that can be used to determine whether a particular feature is likely to be naturally occurring or artificial. Figure 23 shows examples of typical spatial patterns and textures for natural and artificial features.
Table 4. Pattern and texture of some natural and artificial features.

Natural Features Irregular patterns - curves, meanders Hills, valleys, gullies Rivers, streams, lakes, dry riverbeds Vegetation type - trees, bushes, grasses Drainage patterns Vegetation patterns

Artificial Features Regular patterns - lines, regular shapes Paddocks, fields, bare soil or dense crops Roads, railroad tracks, canals Buildings, bridges, dams Road networks, fire breaks, power lines Vegetation patterns (linear, rectangular)

Figure 23. Some examples of spatial patterns and textures for natural and artificial features: forest (airphoto), gullies (SPOT Pan), buildings (SPIN-2), and a road network (airphoto).

In Figure 23, it is quite noticeable that artificial features are likely to follow regular patterns with sharp contrast between the features and the background. Natural features, on the other hand, rarely show such regular patterns. Rather, they often show highly irregular patterns with relatively lower contrast between the features and the background.

3.1.3 The Shape of Objects

The shape of an object describes its external form or configuration. Cultural objects tend to have geometrical shapes and distinct boundaries, whereas natural features tend toward irregular shapes with irregular boundaries. From a vertical perspective, such as the view from a satellite, the shape of some objects is so distinctive that they can be conclusively identified by their shape alone. Figure 24 shows a well-known cultural feature whose unique shape needs very little effort to interpret. Other cultural features with easily recognised shapes include airport runways (on the right side of Figure 25), agricultural fields, and certain archaeological sites. Many natural features also have distinctive shapes, such as sand dunes, volcanic cinder cones, alluvial fans and riverbeds (Figure 25).


Figure 24. SPIN-2 imagery showing Great pyramids of Giza (Cheops, Cephren and Mycerinus) near Cairo, Egypt.

Figure 25. SPOT Panchromatic image of Yulin City, Shaanxi Province, China. The sand dunes at the bottom show distinctive shapes for interpretation.

3.1.4 The Spatial Context

Certain objects are geographically linked to other objects, so that identifying one tends to indicate or confirm the other. The spatial context is one of the most helpful clues for identifying cultural features that comprise aggregate components. For example, in Figure 26 the bright area next to the runway, which shows the spectral response of concrete materials, is a characteristic feature of an airport terminal, whereas scattered brighter features on the water can reasonably be guessed to be sea vessels. Spatial context for natural features, however, needs some knowledge about geography. For example, Figure 21 shows an ancient lake that is dry in modern days. Its lee side (the right side of the image) shows a bright strip that indicates a sandbank (also called a lunette), the result of a long history of aeolian sedimentation.

Figure 26. Landsat TM image (band 3) showing the geographical association between airport runway and adjacent concrete area.


3.2 Procedure for Image Interpretation


Image interpretation is a task that relies on the human analyst's knowledge and experience. Although knowledge of the general principles of remote sensing is fundamental for successful interpretation, extensive knowledge and experience of the features of interest (i.e. application fields such as geography) will in many ways enhance our ability to make correct feature identifications and image interpretations. To make a map through image interpretation, the following procedure helps, although in practice some of the steps may be varied.

Establish interpretation goals: This is arguably the most important step. The aim and objectives of the image interpretation must be clearly established at the beginning of the project. Without clear goals, the analyst may waste a great deal of resources and time on something that may not even be relevant. Decisions to be made at this stage include: a) the types of maps to be produced, b) the map scale, and c) the output products (digital or hardcopy).

Establish a classification system: Since most interpretation projects aim to produce maps, it is essential to establish the classification for the end product. Sometimes a disciplinary or national standard must be followed. Practically, however, it may be difficult to follow the standard completely, and a compromise might be needed to simplify or modify the classification system so that it differs from the standard. Regardless of the extent to which the classification has to be varied, it is important to understand that the classification must serve the project goals properly.

Image acquisition: Based on the map scale and classification system, images are acquired for the interpretation exercise. While acquiring the images, the spectral, spatial and temporal characteristics of the images need to be matched with the corresponding requirements for feature identification and image interpretation. In many cases, the ratio of image cost against the benefit for image interpretation also has to be balanced.

Collection of reference material: Reference materials such as maps, field records and photographs can greatly help image interpretation, particularly in areas with which the analyst is not very familiar. With today's development of GIS technology, thematic information can also be obtained from sources such as national and regional geographical databases, or even from the Internet. For example, Microsoft's TerraServer web site now provides on-line catalogue and ordering services for image data distribution, together with ancillary materials such as digital maps, text and multimedia presentations.

Establish interpretation keys: Interpretation keys need to be established in the early stage of interpretation. Usually this is done by selecting the typical combination of colour, pattern and texture for a known feature. These keys are then used as guidance for later mapping and interpretation operations.

Interpretation and draft map: With the established interpretation keys and classification system, the image can be interpreted and classified, and a draft map can be produced.

Accuracy assessment: The draft map needs to be further checked against the reference data sets, so that appropriate accuracy assessment exercises can be carried out as outlined in Section 4.5 of this note.


3.3 Limitations
Although feature identification and image interpretation are basic skills that a remote sensing analyst must master, they have some significant limitations in real-world applications if they are the sole methodology used.

First, image interpretation (here we mean image interpretation by a human interpreter) makes high demands on the analyst's skills and experience, not only in remote sensing, but also in the application discipline (e.g. geography, geology, agriculture or urban studies). This often presents a major constraint in an application project and could be very costly if the required expertise is not readily available.

Secondly, because manual operations form the major methodological input to the interpretation, it can be quite subjective, with different interpreters delivering different results.

Thirdly, for monitoring purposes, visual interpretation would take too long for some applications to deliver the results required for decision making in real time. Examples include bush fire monitoring and control, agricultural management (particularly with today's precision farming concept), and environmental hazard detection and monitoring (e.g. oil slicks in the sea).

Due to these limitations, image interpretation operations are now commonly associated with machine processing of digital images, which can largely reduce the demand on human resources and also make it possible to deliver the interpretation results in real time or near real time.

4. Image Processing for Thematic Information Extraction

Machine processing of digital images involves the manipulation and interpretation of digital images with the aid of a computer. The central idea behind digital image processing is quite simple. The digital image is fed into a computer one pixel at a time with its brightness value, or digital number (DN). The computer is programmed to insert these data into an equation, or series of equations, and then store the results of the computation for each pixel. These results form a new digital image that may be displayed or recorded in pictorial format, or may be further manipulated by additional programs. Although the possible forms of digital image manipulation are literally infinite, from the point of view of this note we only discuss some important operations related to thematic information identification and extraction, namely geometric correction, image enhancement, image arithmetic and image classification.

4.1 Geometric Correction


This section looks at image warping techniques that are used to geometrically correct a distorted image. Systematic distortion, such as image skew due to the earths rotation (see 1.7.1), can be corrected using a known mathematical correction. Image distortion due to elevation differences (see 1.7.2.2), for example, cannot be predicted as easily.


In most cases the goal of geometric correction is to create a new image that is oriented the same as a map grid such as UTM, or in such a manner that it fits exactly on top of an existing image. The data can then be used with existing knowledge of the ground, information about specific locations can be easily found, and images from different times can be compared. For the purposes of this note, a map is used to define the orientation of the new image, but another image could just as easily be used. The process of geometric correction involves four distinct steps:

Step 1: Creating a new blank image oriented with the pixels centred on evenly spaced map coordinates.

Step 2: Locating points that can be easily identified in both the original image and the map. The location in both the original image and the new map-orientated image can then be determined. Figure 27 shows how the original image would look if fitted to a map (1 on top). A new image fitted to map coordinates is also shown in Figure 27 (2 on bottom).

Step 3: Calculating an equation that transforms the original image coordinates (i.e. in columns and rows) to map image coordinates (in Easting and Northing), as in Figure 27 (1). The equation is a best average in most cases, because the geometry of the original image is different from the map image, the map may be incorrect, and it is difficult to locate the discrete pixels on the continuous map coordinate system (errors are common).

Figure 27. The process of using Ground Control Points as a reference between an existing image and an image oriented to map coordinates.

Step 4: Inverting the equation to find a value for each pixel in the map image. If an equation can satisfactorily transform the image pixels into their known map coordinates, the inverse of the equation can be used to locate the pixels in the map image (which are presently valueless) in the original image. The pixel values shown in Figure 27 (2) can then be filled.

4.1.1 Image Coordinates versus Map Coordinates

4.1.1.1 Image coordinates

The coordinate system of a raw image is expressed simply in terms of columns and rows, or pixels and lines, with the origin of the data set located in the top left hand corner at location [1, 1] (which may also be referred to as [0, 0]). Each pixel usually has a known width and height, enabling the calculation of relative distances within the image (e.g. TM has a pixel size of 30 m). The distance between pixels is taken from the pixel centre, resulting in even multiples of the pixel size along columns or rows. Two similar satellite images, such as two TM scenes of the same area on different dates, may seem close but they usually won't overlap exactly.


For this reason, image-to-image resampling often needs to be done before images from different dates can be compared.

4.1.1.2 Map coordinates

Map coordinate systems are usually oriented in a Cartesian manner with the origin at the lower left corner. The units may be Northing and Easting, such as in UTM, latitude and longitude, or even x and y. The coordinates are continuous. The first step towards creating a new image is to decide if the pixels or pixel boundaries in the new rectified image are centred on map grid lines. The location of pixel centres then becomes a multiple of the pixel size, which may be different from that in the original raw image (Figure 28).
Figure 28. Image coordinates (left, in columns/pixels and rows/lines) and map coordinates (right, in Easting/x and Northing/y).

4.1.2 Locating Ground Control Points

Finding the location of Ground Control Points (GCPs) is often a difficult task. The process is complicated by the fact that human activity and natural processes change the landscape over time, altering the location and appearance of features in the landscape. Points that are relatively stable could be road intersections, airport runway intersections, bends in a river, prominent coastline features, and perhaps purposely located reflectors on the ground. If such points can be found, the first step is to locate the map coordinates of each point, and develop transformation equations that would convert the Ground Control Points from the image coordinate system to the map coordinate system.

4.1.3 Linear transformations

If the transformation involves only rescaling and rotation, the equation will be quite simple. If an image is being placed into a map projection, however, the process usually involves a change of scale and perhaps warping to fit the image into the map projection. The equations for basic changes of scale, translation (shift of coordinates) and rotation are straightforward. Let [x, y] denote the old coordinates and [u, v] the new coordinates, then:
Translation: u = x + A; v = y + B
Scaling: u = Cx; v = Dy
Rotation: u = x cos θ + y sin θ; v = y cos θ - x sin θ

where A and B are the shifts in x and y, respectively; C and D are the scaling factors in x and y, respectively; θ is the rotation angle.
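As an illustration (not part of the original note; the coordinate values are hypothetical), a minimal Python sketch of these basic transformations might look like the following:

import numpy as np

def linear_transform(x, y, A=0.0, B=0.0, C=1.0, D=1.0, theta_deg=0.0):
    """Apply translation (A, B), scaling (C, D) and rotation (theta)
    to coordinates, following the equations above."""
    t = np.radians(theta_deg)
    # Scale and translate first, then rotate; the order of combination is a
    # choice made here, the note only lists the three basic equations.
    xs, ys = C * x + A, D * y + B
    u = xs * np.cos(t) + ys * np.sin(t)
    v = ys * np.cos(t) - xs * np.sin(t)
    return u, v

# Example: shift a pixel coordinate and rotate it by 10 degrees.
u, v = linear_transform(np.array([120.0]), np.array([45.0]),
                        A=5.0, B=-3.0, theta_deg=10.0)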


These equations can be added together to produce a combined transformation known as a linear transformation. A linear transformation, also known as a first-order transformation, changes the orientation and scaling of the image but does not bend or warp it. A first-order equation contains only variables of the first degree (x, y as opposed to second-degree x², y²; Figure 29).
Figure 29. The shape of an image can be changed with a first-order (linear) equation. The above examples show a) the original image, b) a scale change in x, c) a scale change in y, d) a skew, or translation, that varies with y, e) a skew that varies with x, and f) a rotation.

4.1.4 Non-linear transformations

If the image needs to be warped, rather than just re-oriented, second-degree or greater equations need to be used. Non-linear equations, such as a second-order equation, form a curve. Figure 30 shows the difference between a simple two-dimensional linear equation and a second- and third-order equation. The second-order equation contains at least one minimum or maximum; this example contains a maximum (the top of the curve). The third-order equation has both a minimum and a maximum.

Figure 30. The diagrams show the appearance of a line that results from a first-order equation (linear on the left), a second-order equation (middle) and a third-order equation (right).

A non-linear transformation applied to an image can be used for warping (Figure 31). The higher the order of the transformation, the greater the undulations in the surface. One problem with high-order polynomials (the type of equation described here) is that they can become unpredictable at high orders. Generally a second- or third-degree polynomial is sufficient for image warping. Equations of higher order than this will often become unpredictable at the edges and may oscillate between high and low values if there are not enough control points (Figure 32).


Figure 31. The original image (a) can be warped using a non-linear polynomial equation (b).

Figure 32. A three dimensional surface that is produced by a linear equation (a), a second-order equation (b), and a higher-order equation (c).

4.1.5 Residuals

If the orientation of GCPs on a map is slightly different from those on an image (due to real distortion, or measurement error), the new transformed image coordinates will not exactly coincide with the chosen GCPs on the map. This difference is known as a residual. If the residual for a GCP is much larger than that of the others, a measurement error is often the likely cause. The concept of residuals can be shown easily using a line of best fit in a two-dimensional graph. Figure 33 shows a series of points with a first-, second- and third-order line-of-best-fit. The fit, in this case, was done to minimise the distance from the line to each point. The difference between the predicted value (the line) and the actual point is known as the residual or error. Note that in this case, the third-order line-of-best-fit has greater error than the second-order line.
Figure 33. A figure showing lines-of-best-fit (regression on y).

Figure 34. The error is given as a radius from a known GCP. The root-mean-square (RMS) error is the square root of the mean of the squared errors.

In the image, this residual is an error between the location of a known GCP and the value calculated from the transformation equation. In some cases the use of a very high-order equation could fit the new image almost exactly to the coordinates, but the precision associated with this can be spurious. Not only is there expected to be some measurement error associated with the GCP location coordinates, but high-order equations can also become unpredictable between control points, actually increasing error away from the GCPs. Root Mean Square (RMS) error, defined as the distance between the map coordinates and the transformed coordinates of the GCPs, is a common means of reporting error in a transformed image (Figure 34).
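As a hedged illustration of how such a transformation and its residuals might be computed (the GCP coordinates below are invented, and the least-squares fitting approach is an assumption rather than a method prescribed by this note), a first-order transform can be fitted to the GCPs and the RMS error reported:

import numpy as np

# Hypothetical GCPs: image coordinates (col, row) and map coordinates (E, N).
img = np.array([[10, 12], [200, 15], [25, 180], [210, 190], [110, 100]], float)
map_xy = np.array([[305010, 2450020], [310020, 2450050],
                   [305040, 2444980], [310060, 2444950], [307550, 2447500]], float)

# First-order model: E = a0 + a1*col + a2*row and N = b0 + b1*col + b2*row.
A = np.c_[np.ones(len(img)), img]                 # design matrix
coef_e, *_ = np.linalg.lstsq(A, map_xy[:, 0], rcond=None)
coef_n, *_ = np.linalg.lstsq(A, map_xy[:, 1], rcond=None)

pred = np.c_[A @ coef_e, A @ coef_n]              # predicted GCP map coordinates
residuals = np.hypot(*(pred - map_xy).T)          # distance error for each GCP
rms = np.sqrt(np.mean(residuals ** 2))            # root-mean-square error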


4.1.6 Selection of GCPs

The choice and accuracy of ground control points determine how good the transformation will be. GCPs must be well distributed throughout the image in order to give the best fit over the entire image. In a region with little control (few GCPs) the transformation equation can extrapolate large amounts of distortion that may make the transformation result unrealistic (Figure 35). The effect is the same as in Figure 33, where the end-points of the line (beyond which there is no control) continue increasing or decreasing.

a) Map showing the distribution of GCPs.

b) Transformed image using the GCPs and linear equation.

c) Transformed image using the GCPs and high-order equation.

Figure 35. GCPs should be spread out as evenly as possible around the image, generally just beyond the area of interest. The map (a) shows the distribution of the GCPs, which correspond to the area in and around the box shown on images (b) and (c). Within the box, the image is well registered. In contrast, outside the box, the transformation extrapolated a large amount of distortion. It should also be noted that the extrapolated distortion increases dramatically with a high-order transformation equation.

4.1.7 Resampling

Resampling is the process of determining the new DN (or brightness) value for each of the pixels in the transformed image. The transformation equation that was calculated to locate the GCPs on the new image is inverted so that the DN for each pixel in the new image can be determined. Even if the equation pinpoints where each new pixel is located on the old image, it is most likely that it will not coincide with the centre of pixels (Figure 36). The process of determining what DN to use, or how to estimate the new DN, is known as resampling. When DN values are estimated from those of nearby pixels, the process is called interpolation.

Figure 36. The location of each pixel centre in the new image is located on the original image in order to determine the DN value of each pixel. The pixel centres seldom coincide.


4.1.7.1 Nearest Neighbour Resampling

In nearest neighbour resampling the DN value of the closest neighbour is chosen as the DN for the new image. In Figure 37 the actual location of a pixel is w; however, the closest pixel in the original image is d. The DN value is therefore taken from d. This technique often produces images that have a ragged appearance, since the original location of pixel values has been changed. However, since the original values have not been altered, this method is the best if further manipulation (e.g. classification) of the DN values is required. An example of a resampled image using nearest neighbour is shown in Figure 38.

Figure 37. Nearest neighbour resampling locates the closest pixel value in the original image and uses that in the new image.

Figure 38. The original image (a) is transformed into (b). There is a scale change and rotation associated with the transformation. The overlaid images are shown in (c). The resulting resampled image is shown in (d).

4.1.7.2 Bilinear Interpolation

Bilinear interpolation treats the four neighbouring values as corners of a plane or facet. The DN of the transformed pixel is determined at the location where the centre of the pixel intersects with the facet. The new DN is essentially a weighted average of the neighbouring points. Figure 39 shows how a DN for point w is determined first by estimating u and v, where u is the weighted average of a and b, and v is the weighted average of c and d. The distances between all points are known. The final value for w is calculated from the weighted average of u and v.

Figure 39. The point w is determined using weighted averages.
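A minimal sketch of this weighted-average idea is given below (the coordinates and DN values are hypothetical, and an operational system would use a library resampler rather than this hand-rolled version):

import numpy as np

def bilinear(image, col, row):
    """Bilinearly interpolate a DN at the fractional location (col, row):
    average along each of the two bracketing rows, then between the rows."""
    c0, r0 = int(np.floor(col)), int(np.floor(row))
    dc, dr = col - c0, row - r0
    a, b = image[r0, c0], image[r0, c0 + 1]          # upper pair
    c, d = image[r0 + 1, c0], image[r0 + 1, c0 + 1]  # lower pair
    u = a * (1 - dc) + b * dc                        # weighted average of a and b
    v = c * (1 - dc) + d * dc                        # weighted average of c and d
    return u * (1 - dr) + v * dr                     # weighted average of u and v

img = np.array([[10, 20], [30, 40]], dtype=float)
print(bilinear(img, 0.25, 0.5))                      # 22.5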


4.1.7.3 Cubic Convolution

Cubic convolution, like bilinear interpolation, uses approximating curves to define a surface or facet to determine the value of the transformed pixel. However, the cubic convolution method uses a cubic function rather than a linear function to approximate the surface. A cubic function has third-degree terms, which allow a fair degree of flexibility along an interpolating curve. Figure 40 shows the two-step operation of cubic convolution. At Step A, a curve is fitted along each of the four rows, which is subsequently used to interpolate the DN values at locations a, b, c and d. A curve can then be fitted through the interpolated values to estimate the value at w (Step B).

Figure 40. A line is fitted through the points on each of the four lines (think of the pixel DN as a z value on a 3-dimensional graph). The value of a, b, c and d is calculated using the known distance along the line. Line B is then calculated to fit those points, and subsequently estimate the value at w.

4.2 Image Enhancement


Image interpretation is often facilitated when the radiometric nature of the image is enhanced to improve its visual impact. Specific differences in vegetation and soil types, for example, may be better visualised by increasing the contrast of an image. In a similar manner, differences in brightness value can be highlighted either by contrast modification or by assigning different colours to those levels. Moreover, the geometric details in an image may be modified and enhanced. In contrast to the pixel-by-pixel operation used for radiometric enhancement, techniques for geometric enhancement are characterised by operations over neighbourhoods. Although the procedure still aims to modify the DN of an image pixel, the new value is derived from the DNs of its surrounding pixels. It is this spatial interdependence of the pixel values that leads to variations in the perceived geometric detail of the image. The geometric enhancements that are of most interest in remote sensing generally relate to smoothing, edge detection and sharpening.

4.2.1 Contrast stretch

Given one digital image with poor contrast, such as that in Figure 42a, it is desired to improve its contrast to obtain an image with a good spread of pixels over the available brightness range. In other words, a so-called contrast stretch of the image data is required. Consider a transfer function that maps the intensities in the original image into intensities in a transformed image with an improved contrast. The mapping of brightness values associated with contrast modification can be described as

y = f(x)    (5)


where x is the original DN and y is the corresponding new brightness value after the contrast stretch. One of the simplest contrast modifications is the linear contrast enhancement, which can simply be described as

y = f(x) = ax + b    (6)

Relative to the original image, the modified version is shifted owing to the effect of b, and is spread or compressed (and thus modified in amplitude) depending on whether a is greater or less than 1. Practically, the linear stretch is applied to each pixel in the image using the algorithm

y = (x - Bmin) / (Bmax - Bmin) × (Lmax - Lmin)    (7)

where Bmin, Bmax, Lmin and Lmax denote the minimum and maximum of the old DNs, and the minimum and maximum of the corresponding new brightness values, respectively. For most of today's image visualisation systems, Lmin = 0 and Lmax = 255. Frequently a better image product is obtained when linear contrast enhancement is used to give some degree of saturation (Figure 42b). In this way the variance in the very dark or bright areas of the image (which are of no interest) is sacrificed to expand the range of interest over the maximum possible dynamic range of the display device (Figure 41a).
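A small Python sketch of a saturating linear stretch following Equation 7 is given below (the percentile cut-offs are an illustrative assumption; as noted in the next paragraph, practical systems often derive Bmin and Bmax from the mean brightness and its standard deviation):

import numpy as np

def linear_stretch(band, low_pct=2, high_pct=98, l_min=0, l_max=255):
    """Saturating linear contrast stretch (Equation 7); DNs below or above
    the chosen cut-offs are clipped, i.e. saturated."""
    b_min, b_max = np.percentile(band, [low_pct, high_pct])
    y = (band - b_min) / (b_max - b_min) * (l_max - l_min) + l_min
    return np.clip(y, l_min, l_max).astype(np.uint8)

dn = np.random.randint(30, 90, size=(100, 100))      # a dull, low-contrast band
stretched = linear_stretch(dn)                       # now spans roughly 0-255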
Figure 41. Contrast stretch functions (input DN vs. output DN, 0-255): a) saturating linear stretch, b) sinusoidal stretch, c) non-linear stretch.

The algorithm for saturating linear contrast enhancement is the same as Equation 7 except that Bmin and Bmax are user-defined. Typically, an image processing system employs the saturating linear stretch for automatic contrast enhancement by determining the cut-off and saturation limits (i.e. Bmin and Bmax) using the mean brightness and its standard deviation. A sinusoidal stretch is designed to enhance variance within homogeneous areas in the image, such as urban areas and water bodies (Figure 42c). The stretch parameters are usually determined by interpreting the histogram of the image. The distribution is divided into several intervals or ranges and each of these is expanded over the output range (Figure 41b). The reason this stretch is called sinusoidal is that when input and output DNs are plotted against each other, a sinusoidal curve is formed.


Because several different old DNs can be mapped to one output value, sinusoidal stretches are usually applied to three multispectral bands to form a colour composite, to reduce the possibility of mapping different features with an identical colour. Non-linear stretches have flexible parameters that are controlled by DN frequencies and the shape of the original distribution (Figure 41c). One frequently used non-linear stretch is the uniform distribution stretch (or histogram equalisation), with which the original DNs are redistributed on the basis of their frequency of occurrence. The greatest contrast enhancement occurs within the range containing the most original DNs (Figure 42d).
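One minimal way to implement such a histogram equalisation is sketched below (a simplification for illustration; image processing packages provide their own, more refined routines):

import numpy as np

def histogram_equalise(band, levels=256):
    """Redistribute DNs on the basis of their frequency of occurrence:
    the cumulative histogram becomes the transfer function."""
    hist, _ = np.histogram(band, bins=levels, range=(0, levels))
    cdf = hist.cumsum() / band.size                  # cumulative frequency, 0 to 1
    lut = np.round(cdf * (levels - 1)).astype(np.uint8)
    return lut[band]                                 # look up each pixel's new DN

dn = np.random.randint(80, 140, size=(200, 200))     # narrow-range image
equalised = histogram_equalise(dn)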

Figure 42. Effect of contrast stretch on a digital image: (a) original image, (b) saturating linear contrast enhancement, (c) sinusoidal stretch and (d) non-linear stretch by histogram equalisation.

Other frequently used non-linear functions are the logarithmic and exponential contrast enhancements, which map DNs between the original and modified images to enhance dark and light features, respectively. The logarithmic and exponential functions can be expressed as

y = b log(ax) + c  and  y = b e^(ax) + c    (8)


where parameters a, b and c are usually included to adjust (or normalise) the overall brightness and contrast of the output values.

4.2.2 Density Slice

Density slicing is an enhancement technique whereby the DNs distributed along the x-axis of an image histogram are divided into a series of pre-determined intervals or slices. All DNs falling within a given interval in the input image are then displayed at a single brightness value. In this way the overall discrete number of brightness values used in the image is reduced and some detail is lost. However, the effect of noise can also be reduced and the image becomes segmented, or looks like a contour map, except that the areas between boundaries are occupied by pixels displayed at the same DN (Figure 43).
Figure 43. Density slicing. Up: the brightness value mapping function (old pixel DNs from Bmin to Bmax mapped to new pixel brightness from Lmin to Lmax) corresponding to black and white density slicing. Right: the resulting density-sliced image becomes segmented, with reduced noise and detail.

Different colours can also be used in density slicing instead of using grey levels. This is known as colour density slicing. Provided the colours are chosen suitably, it can allow fine details to be clearly visualised.

4.2.3 Spatial Filtering

The most basic algorithms that are used to enhance patterns in imagery are based on comparisons of pixels within small regions of an image. The algorithm is usually designed to identify large differences in brightness value, or more subtle differences that occur in a specific orientation. The simplest of these is known as a neighbourhood function. The neighbourhood of any given pixel is comprised of a series of adjacent or nearby pixels. A simple 3 by 3 pixel neighbourhood is shown in Figure 44.

Figure 44. The immediate neighbours of a pixel form a 3 × 3 neighbourhood.


Figure 45. The pixels within the 3 by 3 neighbourhood (at left) can be used to calculate a range, which can be used to create a new grid (at right) in which high values indicate an edge.

The data values of the pixels that form the neighbourhood can be used to describe brightness differences within the neighbourhood. For example, the maximum and minimum values (the range) from the neighbourhood at the left of Figure 45 could be used to quantify the smoothness of the image. If the range value is placed in a new grid (at the right of Figure 45), the larger the number, the greater the variation in the area local to that pixel. A connected region with high range values may represent an edge, while low values would represent unchanging areas (smooth areas).

In the image in Figure 46, area A has a dark region on the left and a lighter region on the right. Neighbourhoods that straddle the boundary would have a larger range of data values than pixels within area B. In an image, the difference between the brightness of neighbouring pixels can be large where there is an abrupt edge (on a ridge, or along a road), or small where a gradual contrast change is occurring (such as on a smooth hill). If a profile across an image is drawn such that very bright values are high and dark values are low, steep slopes in the profile represent sharp light/dark boundaries (Figure 47).

Figure 46. Areas defined show a region within edges (A, C) and without (B, D).

The cross-section in Figure 48 shows a profile starting at the left side of the image, through a relatively dark shadowed area, over lighter face tones, and back to the dark area of hair. In a real-world satellite image (Figure 49) the profile is not as smooth as in Figure 48, but the same characteristics apply. Bright pixels have high values, darker pixels have low values, and the changes between bright and dark areas show up in the profile.
Figure 47. A cross-section of an image along a row, showing brightness as height. Abrupt changes in contrast can be seen as steep slopes.


Figure 48. A photo of a human face shows abrupt boundaries between generally smooth regions. The profile line is shown in white on the photo.

Figure 49. Real-world images, such as this one of Hong Kong (Landsat TM band 4), are not as smooth. The profile above right follows the horizontal line across the image.

The profile across an image is a mixture of rapidly changing brightness values and slowly changing shades. The abrupt edges may be due to a boundary such as a change from grass to a concrete surface, while the more gradual changes may be due to topographic shading of a hill. Slow, gradual changes in brightness over an image are referred to as low frequency changes, while abrupt changes are high frequency changes.


The image at the left in Figure 50 is an example of a case with the greatest possible amount of difference between two pixels (black to white: the checkerboard image). The range calculated for any 3 by 3 neighbourhood would be large and the profile would be very rugged when looked at closely. In the gradually changing middle image the range calculated for neighbourhoods would be much lower, and the profile would be a smooth, gently rising slope. The image at the right would have a number of very high and low values (and a variable profile).

Figure 50. The image on the left has the greatest difference possible between two pixels; minimum to maximum brightness next to each other. The difference between pixels in the middle image is much less, while the right image has a mixture of large and small changes between pixels.

The neighbourhood can also be used to detect, remove or enhance the variability using a numerical comparison of the values within the neighbourhood. The simplest function is the calculation of an average for the neighbourhood, which then becomes the value of the central pixel in the output image. For a neighbourhood X with elements

x(1,1)  x(1,2)  x(1,3)
x(2,1)  x(2,2)  x(2,3)
x(3,1)  x(3,2)  x(3,3)

the average value is the sum of all neighbourhood values divided by the number of neighbourhood members (9 for a 3 × 3 neighbourhood). For this simple algorithm (known as an Average Filter) it is rather obvious what is happening. However, if the algorithm is more complicated it becomes useful to describe the calculation applied to each element in the neighbourhood more completely. A generic equation for this is

y(i, j) = (1/MN) Σ_{m=1}^{M} Σ_{n=1}^{N} w(m, n) x(m, n)    (9)

where y(i, j) denotes the output filtered value for pixel x(i, j), w is the weighting value (for the average, w = 1), and M and N are the number of columns and rows of the neighbourhood (in this case M = N = 3), respectively. Following this example, if every element is multiplied by its weight and then all of the results are summed, we only need to represent the weights in a grid representing the neighbourhood. In the sense of image processing, the normalisation of the output value (y) is largely unnecessary, since the image would most likely be contrast stretched (see 4.2.1) after filtering. Note that the original image (with no changes) can also be shown using this notation:
Average:
1 1 1
1 1 1
1 1 1

Original (no change):
0 0 0
0 1 0
0 0 0
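The kernel notation translates directly into a neighbourhood (convolution) operation. A minimal sketch using scipy is shown below (the choice of scipy.ndimage and of reflected border handling are implementation assumptions, not prescribed by this note):

import numpy as np
from scipy import ndimage

dn = np.random.randint(0, 256, size=(400, 400)).astype(float)

# 3 x 3 average (low-pass) kernel: every weight w(m, n) = 1, normalised by MN.
average_kernel = np.ones((3, 3)) / 9.0
smoothed = ndimage.convolve(dn, average_kernel, mode="reflect")

# Larger neighbourhoods smooth more strongly (cf. the 7 x 7 example in Figure 52).
smoothed_7x7 = ndimage.convolve(dn, np.ones((7, 7)) / 49.0, mode="reflect")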


The example in Figure 51 shows that a flatter, smoother profile results after an average filter is applied. Figure 52 shows the appearance of images after average filters of different neighbourhood sizes are applied. The larger the neighbourhood, the greater the degree of smoothing.

Figure 51. The profile above shows how the profile changes due to an averaging filter. The bottom profile is a 3-pixel-wide average of the top profile.

Figure 52. The original image (left), a 3 × 3 average image (middle), and a 7 × 7 average image (right).

The output from the average filter appears blurred because high-frequency information is removed. For this reason, the averaging filter is also known as a low-pass filter, because low-frequency information passes through the filter and is retained. Edge detection filters enhance abrupt changes in the image and filter out the background. One of the simplest means to detect edges is to subtract the smoothed average image from the original, leaving only the high and low points in the data that were removed from the original. The result of this is shown in profile in Figure 53 (the residual information). This process is called edge detection because only the edges are left in the resulting output image (Figure 54). The edge detection filters are also called high-pass filters since only the high-frequency information passes through.

Figure 53. The smooth profile (top black line) is subtracted from the original profile (top grey profile) to produce the bottom residuals.

Figure 54. The original image (left) minus the smoothed image (middle) leaves the residual edges (right).


0 0 0      1 1 1      -1 -1 -1
0 9 0  -   1 1 1  =   -1  8 -1
0 0 0      1 1 1      -1 -1 -1

Figure 55. The edge detection kernel (right) as calculated from the original and smoothed kernels (left and middle).

The process of subtracting the smoothed image from the original can be done using the kernels. Arithmetically it is the same as going through the filtering process first and then subtracting the images.

The edges that have been isolated using the edge detection technique can be added back on top of the original to enhance the edges (Figure 56 and Figure 57). This process is called unsharp masking and produces an edge enhanced image (Figure 58).
0 0 0      -1 -1 -1      -1 -1 -1
0 1 0  +   -1  8 -1  =   -1  9 -1
0 0 0      -1 -1 -1      -1 -1 -1

Figure 56. The unsharp masking procedure using the kernel values.

Figure 57. The profile showing the residuals added on top of the original profile.

Figure 58. The original image (left) plus the residual image (middle) gives the edge-enhanced image (right).
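Continuing the hedged scipy sketch from the average filter above, the same convolution reproduces edge detection and unsharp masking (the kernel values follow Figures 55 and 56; everything else is illustrative):

import numpy as np
from scipy import ndimage

edge_kernel = np.array([[-1, -1, -1],
                        [-1,  8, -1],
                        [-1, -1, -1]], dtype=float)

unsharp_kernel = np.array([[-1, -1, -1],
                           [-1,  9, -1],
                           [-1, -1, -1]], dtype=float)

dn = np.random.randint(0, 256, size=(400, 400)).astype(float)
edges = ndimage.convolve(dn, edge_kernel, mode="reflect")        # high-pass result
sharpened = ndimage.convolve(dn, unsharp_kernel, mode="reflect") # edges added back
# Because the 9-kernel is the identity kernel plus the 8-kernel, 'sharpened'
# equals the original image plus 'edges' (up to floating-point rounding).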

4.3 Image Arithmetic


One of the most important properties of multispectral images is their ability to detect differences between spectral bands, which can then be compared qualitatively or quantitatively with the spectral signatures of different earth materials (refer to Figure 20). Often the image processing functions using multispectral bands can be presented in a similar way to the variables in a mathematical equation. For example, one image band can be added to, or subtracted from, another band on a pixel-by-pixel basis. This approach is, therefore, commonly called image arithmetic.

4.3.1 Band Ratio and the Topographic Effect

Among a large variety of image arithmetic functions, the band ratio is arguably the most commonly used. Typical applications include the removal of topographic effects from images and the detection of different cover types, including the derivation of vegetation indices.


The amount of illumination that a point on a surface receives is a function of the angle at which the light hits the slope. A typical area on the surface of the earth receives the most light when the sun is directly overhead. The amount of light that is reflected back to a sensor is thus a function not only of the properties of the surface material, but also of the angle of illumination. This property is known as the Topographic Effect, and it is shown in a simulated image of terrain (Figure 59). One cover type will usually reflect a different amount of EMR in each band due to the particular absorption characteristics associated with it.

Figure 59. A simulated image of the topographic effect with a constant cover type.

Two stands of identical material will appear different if they are receiving differing amounts of illumination. The ratio between matching pixels in each band will remain the same even under different light conditions, if the only variable affecting the response is illumination due to slope angle. This is because the percent of the total light hitting the surface remains the same for each band, even if the absolute amount is different (Figure 60). In reality, atmospheric effects, variation in the cover type response with the angle of illumination, and sensor calibration make quantitative analysis using band ratios difficult. The identification of spectral features using ratios is more promising.
DN on slope facing away from sun:
Unit   Red   NIR   Red/NIR
A      45    60    0.75
B      20    40    0.50

DN on slope facing sun:
Unit   Red   NIR   Red/NIR
A      60    80    0.75
B      30    60    0.50

Figure 60. The ratio between areas of the same cover type that are illuminated differently should be constant (sun illumination from the right).

Although band ratios do not entirely remove the effects of illumination, they do reduce its effect to the point that spectral differences between cover types can be more easily identified. In many cases two surfaces look similar in most bandwidths. A particular characteristic, such as selected absorption due to a mineral component, may impart a subtle difference between two regions in one band. These subtle differences can be enhanced using a ratio.

Band ratios can be used in any instance where the absorption in one band is different from that in another band. Stressed vegetation can be separated from healthy vegetation in many cases; algae can be found in turbid water; and the parent materials of soils can be traced. One typical example of this is based on the sharp difference in spectral signatures between bare soil and green vegetation. The former presents a near-linear spectral curve in the visible and near infrared (NIR) region, while the latter typically has a low reflectance in the red (R) and very high reflectance in the NIR. Therefore the R/NIR ratio should largely distinguish bare soil (with a high value) from green vegetation (with a low value).


A summary of band ratios that are useful for discriminating various surface materials is as follows:

Soil, Vegetation: R/NIR, R/G or B/G
Clays: TM5/TM7
FeO, Hydroxides: TM5/TM3 or TM7/TM5
Plant stress: MSS3/MSS4

4.3.2 Vegetation Indices and their interpretation

A very popular application of band ratio functions is the vegetation index. Vegetation can be separated from other cover types because it has characteristic absorption features in visible wavelengths and high reflectance in near-infrared wavelengths. The spectral curve of vegetation presented in Figure 20 clearly has a different shape from that of soil. On the red TM image presented below (Figure 61), the fields with green crops appear only slightly darker and are not easily distinguishable from those without. On the other hand, the high reflectance of green vegetation is detected by the NIR sensor, so that the cropped fields are clearly distinguishable (Figure 62).

Figure 61. Red TM channel (band 3).

Figure 62. NIR TM channel (band 4).

When vegetation becomes stressed, absorption decreases in visible wavelengths and increases in the near infrared. Additionally, the higher the density of broad-leaf plants, the more distinct the difference between visible and NIR will be. To date, tens of different vegetation indices have been developed and reported in the literature. Most of them are fundamentally based on a comparison between the red band and the NIR band of remotely sensed images. The simplest index using these features is simply an NIR and red band difference (DVI):
DVI = NIR - R    (10)

If there is a substantial topographic effect due to rugged terrain, the variation in illumination (due to slope orientation) will cause the resulting difference to vary substantially throughout the image.


The topographic effect can be minimised with a ratio, as discussed in 4.3.1. The current most widely used vegetation index is the Normalised Difference Vegetation Index (NDVI) which can be expressed as:
NDVI = (NIR - R) / (NIR + R)    (11)

The NDVI is primarily based on the NIR and red ratio, but normalises the output values to the range [-1, 1]. This provides advantages not only for reducing the problems related to illumination differences due to topography, but also makes the result easier to interpret. Practically, when NDVI ≤ 0 one can quite comfortably assume that the pixel is not vegetation. The more active (or greener) the plant is, the higher the NDVI value returned on the image; as NDVI approaches 1, the pixel is most likely covered by active (green) plants. A comparison between the simple NIR/red ratio and NDVI is shown in Figure 63 and Figure 64.
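A minimal numpy sketch of Equations 10 and 11 is given below (the band arrays and the vegetation threshold are illustrative assumptions; the small epsilon only guards against division by zero):

import numpy as np

def dvi(nir, red):
    """Simple NIR minus red difference (Equation 10)."""
    return nir.astype(float) - red.astype(float)

def ndvi(nir, red, eps=1e-6):
    """Normalised Difference Vegetation Index (Equation 11)."""
    nir = nir.astype(float)
    red = red.astype(float)
    return (nir - red) / (nir + red + eps)

# Hypothetical TM band 4 (NIR) and band 3 (red) arrays:
tm4 = np.random.randint(0, 256, size=(512, 512))
tm3 = np.random.randint(0, 256, size=(512, 512))
veg = ndvi(tm4, tm3)
vegetated = veg > 0.3          # an illustrative threshold, not taken from the note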

Figure 63. Simple ratio of NIR/red (TM4/TM3).

Figure 64. NDVI.

Quite often in applications, vegetation indices are derived to separate vegetation cover from bare soils, rocks, urban areas, etc. A common technique is to compute the vegetation index and then density slice the resulting image so that areas with different levels of vegetation coverage may be distinguished. Attempts have also been made, with varying degrees of success, to quantify the vegetation cover by relating ground measurements to vegetation indices.

4.4 Image Classification


Image classification is the process of creating a meaningful digital thematic map from an image data set. The classes in the map are derived either from known cover types (wheat, soil) or by algorithms that search the data for similar pixels. Once data values are known for the distinct cover types in the image, a computer algorithm can be used to divide the image into regions that correspond to each cover type or class. The classified image can be converted to a land use map if the use of each area of land is known. The term land use refers to the purpose that people use the land for (e.g. city, national parks or roads), whereas cover type refers to the material that an area is made from (e.g. concrete, soil or vegetation).


Figure 65. The process of making a classed thematic map from a digital data set: for each pixel in the remotely sensed imagery the computer decides, for example, "What cover type is this? It has a spectral signature like wheat, so the cover type is likely to be wheat. In this area, wheat is likely to be a farming land use. The thematic class is therefore farmland", producing the resulting class image.

Image classification can be done using a single image data set, multiple images acquired at different times, or even image data with additional information such as elevation measurements or expert knowledge about the area. Pattern matching can also be used to help improve the classification. The discussion here concentrates on the use of a single image data set to create a classified thematic map where each pixel is classified based on its spectral characteristics. The process that would be used for multiple images is essentially the same, with perhaps some extra effort needed to match the images together. If soil type or elevation is used, the algorithm would need to take into account the fact that thematic soil classes need to be treated differently from measured radiance data.

4.4.1 Turning Pixel Values into a Thematic Map

Classification algorithms are grouped into two types: supervised and unsupervised classification. With supervised classification the analyst identifies pixels of known cover types and a computer algorithm is then used to group all the other pixels into one of those groups. With unsupervised classification a computer algorithm is used to identify unique clusters of points in data space, which are then interpreted by the analyst as different cover types. The resulting thematic image shows the area covered by each group or class of pixels. This image is usually called a thematic image, or classified image.
Figure 66. Diagram showing the steps to produce a thematic or classified image. The image data are either clustered (unsupervised classification), with the clusters used to define signatures or used directly as classes, or seed areas (example pixels) are selected (supervised classification) to derive signature information; a decision rule then classes each pixel into the thematic image.


Figure 66 shows the processes that are used to create the classified image. Unique cover types are identified either by the computer (clustering) or by the analyst (from the image). If clustering is used, the pixels in an image are sorted into groups that are spectrally unique. These can either be used directly as a classified image, or they can be used to define a set of spectrally unique signatures (the statistical description of the class). If the user has chosen example pixels, the pixel samples are used to calculate the signatures of each cover type class (vegetation, sand, etc.; signatures will be discussed further below). Once signatures have been defined, an algorithm called a decision rule is used to place each pixel from the image data set into one of the classes. The process is often repeated a number of times, adjusting the signatures and decision rule before each run, each time checking the results against areas in the image in which the cover types are known.

4.4.2 Supervised Classification

Supervised classification relies on the analyst, who provides the training for the computer to recognise different cover types. Usually there are three basic steps involved in a typical supervised classification procedure, namely training, classification and output. The purpose of the training stage is to derive spectral signatures for the cover types of interest, which serve as seeds for classification in the later stage. The analyst identifies representative training areas and develops a numerical description of the spectral attributes of each land cover type of interest. This training can be carried out interactively on the image processing system by selecting training areas in which the pixel DNs of the multispectral bands can be statistically analysed to derive the spectral signature of the class (Figure 67). Alternatively, one can train the computer by selecting a certain DN range in a multidimensional spectral space (e.g. on a scattergram) and then examining the corresponding selected areas on the image (Figure 68).
Cover Type        Colour    No. Points
Water             Cyan      3793
Concrete          Purple    975
High buildings    Thistle   1866
Bare soils        Coral     784
Grass slope       Yellow    924
Forest            Green     3122

Figure 67. Training areas are interactively selected on the image for different cover types to derive their spectral signatures for classification.


Figure 68. Spectral ranges (TM band 3 vs. TM band 4) are selected on the scattergram (right) and the pixels with the selected spectral characteristics are interactively marked as the training areas (left).

In the classification stage, each pixel in the image is categorised into the cover class it most closely resembles. If the pixel is not spectrally similar enough to any seed created by the training process, it is labelled unknown. The class label (or theme) assigned to each pixel is then recorded in the corresponding cell of an interpreted data set, or classified image. Today the analyst has a variety of choices in defining how close a pixel is to the nearest seed of the pre-defined classes. This choice refers to the selection of classifiers, which are based on spectral pattern recognition. Numerous mathematical approaches to spectral pattern recognition have been developed and extensive discussion of this subject can be found in the literature. For the purpose of this note, our discussion only touches the surface of the vast knowledge base about how spectral patterns may be classified into categories, demonstrating with a limited set of examples, namely the parallelepiped, minimum distance and maximum likelihood classifiers. For ease of presentation, the various approaches to classification are illustrated with a 2-band multispectral image. In reality, rarely are just two bands employed in an analysis.


Assume that we sample an image with pixel observations from areas of known cover type (i.e. from the training areas). Each pixel value is plotted on a scattergram that shows the distributions of the spectral response patterns of each cover type to be interpreted in the 2-dimensional spectral space (Figure 69). Our consideration is thus the strategy for using these training spectral response patterns as interpretation keys by which other pixels are categorised into their appropriate classes. The parallelepiped classifier considers the range of values in each class's training set. This range may be defined by the highest and lowest digital number values in each band, and appears as a rectangular box in our 2-dimensional scattergram (Figure 70). When a pixel lies inside one of the boxes, it is classified into the corresponding class (e.g. point 2 in Figure 70). If a pixel lies outside all regions, it is classified as unknown. Difficulties are encountered when class ranges overlap, where a pixel has to be classified as not sure or be arbitrarily placed in one of the two overlapping classes.

Figure 69. The scattergram (TM band 3 vs. TM band 4) illustrating the pixel observations of six cover types (concrete, high buildings, grass slope, water, bare soils, forest).

Figure 70. Parallelepiped classification strategy.

Because spectral response patterns often exhibit correlation, or high covariance, the rectangular decision regions fit the class training data very poorly, resulting in confusion for a parallelepiped classifier. For example, point 1 shown in Figure 70 would probably be better classified into the class grass slope rather than bare soils as shown. This problem can be somewhat amended by modifying the single rectangles into a series of rectangles with stepped borders. The minimum distance classifier comprises three steps. First the mean of the spectral values in each band for each class is computed (represented in Figure 71 by the symbol +). Then the distance between the spectral value of an unknown pixel and each of the category means is computed. The pixel is then assigned to the closest class.


The minimum distance classifier is mathematically simple and it overcomes the poor representation of the rectangular decision regions used by the parallelepiped classifier. For example, point 1 shown in Figure 71 would be correctly classified as grass slope. This strategy, however, has its limitations. It is insensitive to different degrees of variance in the spectral response data. In Figure 71, point 2 would be classified as concrete in spite of the fact that the pixel would probably be more appropriately assigned to bare soils because of that class's greater variability.

Figure 71. Minimum distance classification strategy.
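A compact sketch of the minimum distance rule is shown below (the two-band class means are invented for illustration; a real classifier would use all image bands and normally include an unknown-distance threshold):

import numpy as np

# Hypothetical training statistics: per-class mean DN in two bands (TM3, TM4).
class_means = {
    "water":       np.array([20.0, 15.0]),
    "grass slope": np.array([45.0, 90.0]),
    "bare soils":  np.array([80.0, 70.0]),
}

def minimum_distance(pixel, means):
    """Assign the pixel to the class whose mean is nearest (Euclidean distance)."""
    distances = {name: np.linalg.norm(pixel - mean) for name, mean in means.items()}
    return min(distances, key=distances.get)

print(minimum_distance(np.array([50.0, 85.0]), class_means))     # "grass slope"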

The maximum likelihood classifier quantitatively evaluates both the variance and covariance of the category spectral response patterns when classifying an unknown pixel. To do this, an assumption is made that each pixel spectral cluster forms a normal distribution, which is considered reasonable for common spectral response distributions. Under this assumption, we may compute the statistical probability of a given pixel value being a member of a particular cover class by applying a probability density function for each class derived from its training data. Using the probability density functions, the classifier calculates the probability of the pixel value occurring in the distribution of class concrete, then the likelihood of its occurring in class high buildings, and so on. After evaluating the probability in each class, the pixel is assigned to the most likely class (i.e. the class that presents the highest probability value), or labelled unknown if the probability values are all below a given threshold. Figure 72 shows the probability values plotted on our 2-dimensional scattergram, where the contour lines are associated with the probability of a pixel value being a member of one of the classes. Basically the maximum likelihood classifier delineates ellipsoidal equal-probability contours, the shape of which shows the sensitivity of the classifier to both variance and covariance. For example, pixels 1 and 2 would be appropriately assigned to the classes grass slope and bare soils, respectively.
Figure 72. Equal-probability contours defined by a maximum likelihood classifier.
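The Gaussian assumption can be sketched with scipy's multivariate normal density (a simplified illustration: the training means and covariance matrices are invented, and operational classifiers usually work with log-likelihoods, prior probabilities and many more classes):

import numpy as np
from scipy.stats import multivariate_normal

# Hypothetical training statistics: mean vector and covariance matrix per class.
classes = {
    "grass slope": (np.array([45.0, 90.0]),
                    np.array([[30.0, 10.0], [10.0, 40.0]])),
    "bare soils":  (np.array([80.0, 70.0]),
                    np.array([[120.0, 60.0], [60.0, 150.0]])),
}

def maximum_likelihood(pixel, classes, threshold=-20.0):
    """Assign the pixel to the class with the highest (log) probability density,
    or label it 'unknown' if every density falls below the threshold."""
    scores = {name: multivariate_normal(mean, cov).logpdf(pixel)
              for name, (mean, cov) in classes.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > threshold else "unknown"

print(maximum_likelihood(np.array([70.0, 75.0]), classes))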


The principal drawback of maximum likelihood classification is the extensive computation required to classify each pixel. When a large number of spectral bands are involved or a large number of classes must be differentiated, the maximum likelihood classifier performs much more slowly than the other classifiers described. This drawback was one of the major limitations in the past, but is becoming much less critical today with the rapid development of computer hardware. In the output stage, the results are presented in the form of thematic maps, tables of statistics for the various cover classes, and digital data files suitable for inclusion in a GIS. Since the multispectral classification methods are primarily based on spectral characteristics, with minimal consideration of the spatial extent of the resulting classes, the classified image often contains considerable high-frequency spatial variation (Figure 73, left image). Because many applications require the classification results to be input into a GIS in a form similar to other thematic data layers, a post-classification process often needs to be performed. The most common demand on post-classification processing is to remove high-frequency spatial variance (or noise) from the classified image. This is often achieved by analysing the neighbourhood of each pixel and removing scattered single pixels (the sieve process), and then merging small patches of pixels together to make more continuous and coherent units (the clump process). The effect of this process is illustrated in Figure 73.

Figure 73. Classified image before (left) and after (right) the post-classification process (classes: water, concrete, high buildings, bare soils, grass slope, forest).

4.4.3 Unsupervised classification

Unsupervised classifiers do not utilise training data as the basis for classification. Rather, this kind of classifier involves algorithms that examine the unknown pixels in an image and aggregate them into a number of classes based on the natural groupings or clusters present in the image values. The basic assumption here is that values within a given cover type should be close together in the multi-dimensional spectral space, whereas data in different classes should be comparatively well separated.


Unlike supervised classification, the classes that result from unsupervised classification are spectral classes. Because they are based solely on the clusters in the image values, the identity of the spectral classes is not initially known. The analyst must compare the classified data with some form of reference data to determine the identity and informational value of the spectral classes. Clustering algorithms use predefined parameters to identify cluster locations in data space, and then determine whether individual pixels are in those clusters or not. In many algorithms the number of clusters may be defined at the start, while others just use cluster size and separation parameters to control the number of clusters that are found. Figure 74 illustrates the type of parameters that can be used to define clusters and to decide whether pixels belong in a cluster. Clustering algorithms either pass once through the data, grouping pixels during that pass, or they pass through a number of times to adjust and improve the clustering assignments. It is impossible to discuss all forms of clustering in this text; however, most clustering algorithms used in remote sensing software operate in a similar manner. A typical multiple-pass, or iterative, clustering algorithm works as shown in Figure 75. Pass One: (A) Cluster centres are arbitrarily assigned. (B) Each pixel is assigned to the cluster centre nearest to it in data space (spectral distance). (C) The cluster means are then calculated from the average of the cluster members (the middle cluster is shown with grey points) and the pixels are reassigned to the new cluster centres. Pass Two: (D) the process is repeated. The iteration stops when the cluster centres (or means) move by less than a pre-set amount during each iteration. With a number of iterations the locations of the clusters tend to stabilise, as the movement of the cluster centres between passes becomes smaller and smaller.

Figure 74. Measures that define a cluster include the size of a cluster and the distance between clusters (cluster size, distance between cluster means, distance of a pixel to a cluster mean).
Figure 75. Iterative clustering of points in data space (Band A vs. Band B).

Algorithms that pass through the image only once tend to be more affected by the initial conditions than iterative algorithms that repeatedly adjust the cluster means. After each pass through the data, cluster means can be calculated along with other measures such as the standard deviation. In addition to simple straight-line distances, statistical measures of distance can be used, where the distance to a cluster is weighted by the size and importance of that cluster.
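The iterative scheme of Figure 75 is essentially a k-means style procedure; a bare-bones sketch is given below (the number of clusters, the random initialisation and the convergence tolerance are all illustrative choices, not values prescribed by this note):

import numpy as np

def iterative_clustering(pixels, n_clusters=6, tol=0.5, max_iter=50, seed=0):
    """Pass repeatedly through the data: assign each pixel to its nearest
    cluster centre, recompute the centres, and stop when they barely move."""
    rng = np.random.default_rng(seed)
    centres = pixels[rng.choice(len(pixels), n_clusters, replace=False)]
    for _ in range(max_iter):
        # Spectral distance from every pixel to every cluster centre.
        dist = np.linalg.norm(pixels[:, None, :] - centres[None, :, :], axis=2)
        labels = dist.argmin(axis=1)
        new_centres = np.array([pixels[labels == k].mean(axis=0)
                                if np.any(labels == k) else centres[k]
                                for k in range(n_clusters)])
        if np.max(np.linalg.norm(new_centres - centres, axis=1)) < tol:
            break
        centres = new_centres
    return labels, centres

# Hypothetical 2-band image flattened to (number of pixels, 2):
data = np.random.randint(0, 256, size=(5000, 2)).astype(float)
labels, centres = iterative_clustering(data)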


The result from unsupervised classification may also need the post-classification processing described above. In addition, because the real-world nature of the spectral classes derived from the classification is largely unknown, considerable analysis and interpretation will be required. Often the resulting classes need to be merged into fewer classes to make the classified image more acceptable as a thematic map.

4.5 Accuracy assessment


Classification accuracy analysis is one of the most active research fields in remote sensing. A meaningless or inconclusive assessment of the image classification results sometimes precludes the application of automated land cover classification techniques, even when their cost compares favourably with more traditional means of data collection. It is widely held in the remote sensing community that a classification is not complete until its accuracy is assessed (Lillesand and Kiefer, 1994). One of the most common methods of expressing classification accuracy is the preparation of a classification error matrix (or confusion table). The error matrix compares the relationship between known reference data and the corresponding results of the classification. Table 5 shows an error matrix. The numbers listed in the table represent the number of training pixels, for each cover type, that were correctly and incorrectly labelled by the classifier. It is common to average the correct classifications and regard this as the overall classification accuracy (in this case 81%), although a better global measure would be to weight the average according to the areas of the classes in the map.
Table 5. An error matrix expressing classification accuracy.

Classification results (rows) against training data, i.e. known cover types (columns):

                  Water  Concrete  High buildings  Bare soils  Grass slopes  Forest  Row total
Water                93         0               2           1             0       0         96
Concrete              0        65               4           6             0       0         75
High buildings        2         3             124           5             9      12        155
Bare soils            2         3              21         165            24      12        227
Grass slopes          0         0               6          16           201      45        268
Forest                0         0               8           9            76     512        605
Column total         97        71             165         202           310     581       1426

Producer's accuracy: W = 93/97 = 96%; C = 65/71 = 92%; H = 124/165 = 75%; B = 165/202 = 82%; G = 201/310 = 65%; F = 512/581 = 88%
User's accuracy: W = 93/96 = 97%; C = 65/75 = 87%; H = 124/155 = 80%; B = 165/227 = 73%; G = 201/268 = 75%; F = 512/605 = 85%

Overall accuracy = (93 + 65 + 124 + 165 + 201 + 512) / 1426 = 81%
κ = (1160 - 365.11) / (1426 - 365.11) = 0.749


A distinction is made between omission errors and commission errors. Omission errors correspond to those pixels belonging to the class of interest that the classifier has failed to recognise, whereas commission errors correspond to pixels from other classes that the classifier has labelled as belonging to the class of interest. The former refer to columns of the error matrix, whereas the latter refer to rows. For example, in the case presented in Table 5, the omission error for the class concrete is (0 + 3 + 3 + 0 + 0)/71 = 8%, whereas the commission error for the class is (0 + 4 + 6 + 0 + 0)/75 = 13%. The producer's accuracy shown in Table 5 is interpreted as the probability that the classifier has classified an image pixel as, for example, water given that the actual class is water, as indicated by the training data. As a user of the classified image we are more interested in the probability that the actual class is water given that the pixel has been labelled as water by the classifier (the user's accuracy). In our case, the producer's accuracy for the class forest is 512/581 = 88%, whereas the user's accuracy is 512/605 = 85%. For the global assessment of classification accuracy, a measure called Cohen's kappa coefficient (κ) is often employed. The kappa coefficient is a measure that considers significantly unequal sample sizes and the expected probabilities for each class. Let x_i+ = Σ_j x_i,j (i.e. the sum over all columns for row i), and x_+j = Σ_i x_i,j (i.e. the sum over all rows for column j); then

κ = (d - q) / (N - q)    (12)

where N is the total number of samples, d is the total number of cases in the diagonal cells of the error matrix, and

q = (1/N) Σ_k x_k+ x_+k    (13)

The optimal score is 1.0 (a perfect classification). In our case, N = 1426, d = 1160, q = 365.11, and κ = 0.749.
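The worked numbers above can be reproduced with a few lines of numpy (the matrix layout follows Table 5, with rows as classification results and columns as reference data):

import numpy as np

# Error matrix from Table 5 (rows = classification results, columns = training data).
error_matrix = np.array([
    [93,  0,   2,   1,   0,   0],   # water
    [ 0, 65,   4,   6,   0,   0],   # concrete
    [ 2,  3, 124,   5,   9,  12],   # high buildings
    [ 2,  3,  21, 165,  24,  12],   # bare soils
    [ 0,  0,   6,  16, 201,  45],   # grass slopes
    [ 0,  0,   8,   9,  76, 512],   # forest
])

N = error_matrix.sum()                        # 1426
d = np.trace(error_matrix)                    # 1160, total of the diagonal cells
row_tot = error_matrix.sum(axis=1)            # x_i+
col_tot = error_matrix.sum(axis=0)            # x_+j

overall = d / N                               # about 0.81
producers = np.diag(error_matrix) / col_tot   # producer's accuracy per class
users = np.diag(error_matrix) / row_tot       # user's accuracy per class
q = (row_tot * col_tot).sum() / N             # about 365.11
kappa = (d - q) / (N - q)                     # about 0.749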

Summary

Digital remote sensing imagery is widely recognised as an important source for spatial information technology. However, to maximise its potential benefit, the images need to be correctly interpreted, classified and integrated with a GIS operating environment, so that support for real-time decision making can be delivered. This lecture note discusses the considerations and techniques involved in information extraction from digital images. The discussion is largely focused on remotely sensed images from various satellite platforms. The majority of such images share multispectral capabilities, but with variable spatial resolutions.


The spectral, spatial and temporal characteristics of remotely sensed images, and some typical natural and artificial features, are reviewed in order to provide general background for the techniques described later. The discussion of methodology is focused on two major areas, namely feature identification and image interpretation, and image processing for thematic information extraction. The former describes keys and methods to be employed in recognising natural and cultural features on the Earth's surface, which may be made of different materials. The latter discusses the computer-based image processing techniques for extracting thematic information from digital images. It is important to understand that we have only scratched the surface of the vast knowledge base on interpretation and machine processing of digital images. Within the scope of this volume, it is impossible to cover the full extent of related topics. Interested readers may find themselves getting lost in the large literature base of remote sensing technology, but the few references listed below may provide useful initial help.

Further Readings

ASPRS, 1997, Manual of Remote Sensing, 3rd Edition, American Society for Photogrammetry and Remote Sensing, Bethesda, Maryland, CD-ROM.
Avery, T.E. and Berlin, G.L., 1992, Fundamentals of Remote Sensing and Airphoto Interpretation, 5th Edition, Prentice-Hall, Upper Saddle River, NJ.
Cracknell, A.P. and Hayes, L.W.B., 1991, Introduction to Remote Sensing, Taylor and Francis, London.
Lillesand, T.M. and Kiefer, R.W., 1994, Remote Sensing and Image Interpretation, 3rd Edition, John Wiley & Sons, New York.
Philipson, W.R. (ed.), 1997, Manual of Photographic Interpretation, 2nd Edition, American Society for Photogrammetry and Remote Sensing, Bethesda, Maryland.
Richards, J.A., 1993, Remote Sensing Digital Image Analysis: An Introduction, 2nd Edition, Springer-Verlag, Berlin.
