
GIS DATA CAPTURE HARDWARE AND SOFTWARE

INTRODUCTION
Progress in the commercial application of GIS technology is in practice more likely to be limited in the foreseeable future by the rate at which GIS databases can be populated rather than by shortcomings in the applications software. It is now widely accepted in the literature, and has been apparent for some time to the practitioners, that the cost of data collection (or data capture) is by far the dominant component of overall GIS project costs. For example, Prof. G. Konecny in his Keynote Paper to the fourth European AM/FM Conference in 1988 (Konecny 1988) analysed a range of mature Land Information System projects and concluded that acquisition of the data for the database constituted the single largest expenditure element, between 38 and 84 per cent of total cost. The larger the project, the less the hardware and software costs mattered.

Certain GIS applications can be supported entirely by data in raster form, but for many GIS purposes data have to be available in feature-coded, vector form. Increasingly there is a requirement for structured data, either in a link-and-node form or in some object-oriented form. In practice, with the growth of hybrid raster/vector GIS capabilities, both forms of data are required. This presentation describes hardware and software techniques for raster and vector data capture, and the associated issues of data structuring and attribute tagging.

The data capture process can be split into two different operations:

- Primary data collection, for example from aerial photography or from remotely sensed imagery.
- Secondary data collection, for example from conventional cartographic sources.

Once the data are integrated into the database for analysis, many issues are raised, for example: What was the source of information for the map and what are the characteristics of this source? What was the inherent precision of the source materials? What interpretation was applied in the mapping process? Were there multiple sources? Is the categorization of data defined, for example what constitutes the difference between urban open space on the periphery of an urban area and non-urban land use? Is the categorization applicable to the current GIS application?

These and further aspects of secondary data have given many scientists and end-application users an instinctive preference for primary data capture, where greater control and specificity can be applied. While tailored survey is possible in some instances, it is frequently ruled out on the grounds of cost and the elapsed time necessary to undertake the work. This has particularly been the case where comprehensive, large-area coverage of topography is required and recourse is made to national generalized map series information. In practice both forms of data are essential to GIS and, for practical reasons, secondary data dominate. Both approaches are discussed below.

PRIMARY DATA CAPTURE


The introduction of remote sensing, particularly from the early 1970s with the land resource satellites, produced a climate of expectation that generalized access to primary data sources would be available together with automated techniques of land use and feature extraction. Unduly optimistic attempts at totally automated classification of imagery led to a period of scientifically interesting, but ultimately unsuccessful, research into ever more sophisticated attempts to use remotely sensed data in an isolated image processing environment.

The early approaches to using digital remotely sensed data were preoccupied with the issues of handling and real-time processing of large raster data sets and with the full-colour display capability needed for visualization. These systems employed dedicated and often specialized pipeline or parallel processing hardware. The software was focused on raster data manipulations and classification of the image data. The image processing environments which emerged were very different from the typically vector graphics-based cartographic and mapping systems.

The evolution of hardware and software for primary data collection has, however, been more rapid in the late 1980s. Direct surveying techniques employing in-the-field digital recording, GPS technology for precision positioning and vehicle location tracking are becoming routine tools. The use of remotely sensed data is gradually delivering the theoretical benefits identified in the early 1970s. These benefits are only being achieved, however, by introducing a radical change of thinking in the user community.

The new hardware and software resulting from this trend reflect the need for an open architecture and a powerful integrated data processing environment with good visualization. This is met in hardware terms by the current and emerging generation of desktop workstations. These offer an easily networkable environment for local or wide area processing, yet provide locally to a single user a dedicated fast processor (e.g. more than 20 MIPS), large memory (e.g. more than 24 Mb) and substantial disk storage capacity (e.g. over 1 Gb) on the desktop.

The corresponding evolution of software has required a much greater consideration to be given to the overall design of GIS. This has been necessary to ensure access to multiple data structures, to allow greater attention to be given to quality assurance and error train analysis, and to provide a user interface that gives a consistent view of all data and facilities available rather than one specific to a single type of data. Much has been achieved in the interim by developing more efficient links between remote sensing packages and mapping systems, as in the ARC/INFO-ERDAS 'live-link'. This approach is now being replaced, however, by new 'integrated GIS' packages which incorporate the fundamental redesign necessary.

Fig.  Conceptual design of an integrated GIS.

SECONDARY DATA CAPTURE


Despite the perceived benefits of primary data, for the immediate future the largest source of GIS data will continue to be existing maps. In some countries, the scale of map digitizing programmes is considerable, with near-term targets of substantial or complete coverage at certain scales.

Factors which affect the capture techniques that can be used include the following:

- Maps are accurate and to scale. The scanner or digitizer, and the processing algorithms, must deliver a high and consistent planimetric accuracy.
- Maps are high resolution and contain fine detail. Line weights of 0.004 in (0.1 mm) are commonplace.
- Maps contain a wide variety of symbolization and linestyle, and different map series differ widely in these respects.
- Maps, in particular small-scale maps, are multicoloured documents.
- Map sheets represent parts of a large continuum and edge matching is generally required.
- Maps are multi-purpose documents and, in consequence, map data formats and quality standards have to support a range of users and applications. The possibilities for redesign of maps in order to improve data capture have been discussed at length by Shiryaev (1987), but generally the needs of data capture have had little impact on map design.
- Maps, like all other documents, come in variable qualities. The paper map is subject to substantial distortion when folded, or when affected by varying humidity.

In addition to the definition of the format and structure of the digital data required, the following factors have to be defined to arrive at an adequate specification of the data capture task:

- Accuracy. A traditional specification is to require that the digital data represent the source to within one line width (or a half line width). It is important that any automated conversion process provides an inbuilt accuracy check to the required tolerance.
- Representation. Data volumes should be minimized; for example, a rectangle should be represented by four points. Different classes of features require different representations.
- Abstraction. For some classes of features the task is not to reproduce the geometry on the map, but an abstraction: for example, cased roads by centre lines, broken lines by coordinate strings with appropriate codes, point symbols by coordinate pairs with a symbol code, and text by ASCII codes.
- Selection/completeness. In many cases, not all the information recorded on the source map is required in the GIS database. None of the required data may be omitted or repeated.

The specification is not complete until the external Quality Assurance procedure is also defined.

HARDWARE
Manual digitizers

The most commonly used equipment for graphical data capture is the manual digitizing table or tablet. The key elements of the technology are the digitizing surface and the puck or cursor used by the operator to record coordinates. The whole may be regarded conceptually as a 'reverse drawing board'. The surface may be translucent, providing a backlit capability which is very useful for film, as opposed to paper, source documents. In the most widely used technologies, the surface contains a precise grid of current-carrying fine wires. The precision of this grid determines the basic accuracy of the table. Accuracy specifications are typically expressed as root-mean-square (RMS) deviations from a true and square grid over the active area of the table.

The cursor used by the operator to record coordinate measurements consists of a target (usually cross-hairs), possibly viewed under magnification, embedded in a conveniently held puck. This incorporates buttons used to trigger coordinate measurement and to communicate with the controlling software. Sometimes 'stream digitizing' is used to control the frequency of coordinate measurement, on a time or distance basis. Typically the electrical interface to the receiving computer system is a serial line and the data format is coordinate strings in ASCII format.

Operator fatigue is the major issue in manual digitizing, and table ergonomics are of crucial importance. A good modern design such as the Altek table illustrated in Plate 17.1 incorporates fully adjustable height and tilt, variable backlighting and a lightweight cursor with four control buttons and 16 feature coding buttons. Accuracies typically range from 0.003 inch (0.075 mm) to 0.010 inch (0.25 mm). Digitizing tablets are similar in concept to digitizing tables, offering reduced accuracy at lower cost.
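As an illustration of the kind of coordinate stream such a serial interface delivers, the short Python sketch below parses hypothetical ASCII records of the form 'x,y,button'; the record layout, table resolution and button codes are assumptions made for illustration, not any manufacturer's actual protocol.

```python
# Minimal sketch: decoding ASCII coordinate records from a digitizing table.
# The 'x,y,button' layout and the table resolution are illustrative assumptions.

TABLE_RESOLUTION_MM = 0.025   # assumed size of one table count, in millimetres

def parse_record(line):
    """Turn one ASCII record 'x,y,button' into table coordinates (mm) and a button code."""
    x_counts, y_counts, button = line.strip().split(",")
    return (int(x_counts) * TABLE_RESOLUTION_MM,
            int(y_counts) * TABLE_RESOLUTION_MM,
            int(button))

# Example records as they might arrive over the serial line.
sample_stream = ["10234,20456,1", "10241,20467,1", "10250,20481,16"]
for x_mm, y_mm, btn in (parse_record(rec) for rec in sample_stream):
    print(f"x={x_mm:.3f} mm  y={y_mm:.3f} mm  button={btn}")
```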

Scanners

A scanner is a piece of hardware for converting an analogue source document into digital raster form. The most commonly encountered scanner in everyday life is the fax machine. The key characteristics of a scanner reflect the documents it can handle (size, accuracy and speed) and the nature of the data it produces (resolution, greyscale and colour). All scanning involves systematic sampling of the source document, by either transmitted or reflected light.

The fundamental technology used in many scanners is the charge-coupled device (CCD) array. CCD arrays are available as one- or two-dimensional regular rectangular structures of light-sensitive elements. Two-dimensional arrays are not at present economically available at resolutions useful for map source documents. A single linear CCD array typically has a resolution of 5000 elements. The key decision in utilizing this in a scanner design is whether to move the document or the scanning element. A low-cost arrangement involves scanning a single linear CCD array over a magnified image of the source document - the so-called 'digital camera'. These are extremely rapid in operation (a whole image in a matter of seconds) but are currently restricted in resolution to about 5000 by 6000 elements.

For most GIS applications, a larger information content (resolution) is required than can be provided by digital cameras, and scanners based on multiple linear CCD arrays are necessary. A commonly used arrangement is the 'continuous feed' scanner illustrated in Plate 17.2, in which the document is passed rapidly by a set of, say, five or ten concatenated linear arrays, or CCD cameras. The key elements are the pinch roller document handler (accuracy in the direction of document motion is determined entirely by this component) and the alignment optics (see Plate 17.3). The hardware design must manage the overlap between the individual CCD cameras in a mechanically stable manner, in addition to compensating for differences in sensitivity between them. Continuous feed scanners provide high throughput rates [an A0 sheet at a resolution of 500 dots per inch (dpi) or 20 dots per millimetre in a few minutes] at reasonable cost, with accuracies of the order of 0.02 inches (0.5 mm). Input widths up to 60 inches (150 cm) are available and document length is theoretically unlimited. Documents can be paper, film, vellum, sepia, linen or cardboard.
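To give a feel for the data volumes involved, the following sketch estimates the raw (uncompressed) raster size of an A0 sheet scanned at 500 dpi; the sheet dimensions and the assumption of 1-bit and 8-bit pixels are illustrative.

```python
# Back-of-envelope estimate of the raw raster produced by scanning an A0 sheet
# at 500 dpi (about 20 dots per millimetre), before any data compaction.

A0_WIDTH_MM, A0_HEIGHT_MM = 841, 1189
DOTS_PER_MM = 500 / 25.4                 # 500 dpi expressed per millimetre (~19.7)

cols = round(A0_WIDTH_MM * DOTS_PER_MM)
rows = round(A0_HEIGHT_MM * DOTS_PER_MM)
pixels = cols * rows

binary_mb = pixels / 8 / 1e6             # 1 bit per pixel (black and white)
greyscale_mb = pixels / 1e6              # 8 bits per pixel

print(f"{cols} x {rows} pixels = {pixels / 1e6:.0f} Mpixel")
print(f"about {binary_mb:.0f} MB binary or {greyscale_mb:.0f} MB greyscale, uncompressed")
```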

Workstations and data compaction

The pace of advances in workstation technology is such that the availability of computing power is rapidly ceasing to be a limiting factor in GIS data capture applications, particularly in distributed systems using local area network (LAN) or cluster technology. Standard platforms with open system architectures and windowing environments need little or no augmentation. Special-purpose parallel hardware may have some place, but it is at least arguable that the same results will be achieved by tomorrow's conventional processors. Despite the advances in optical storage technology, the widespread use of raster data is likely to pose a continuing requirement for data compaction implemented in hardware or by software. Since this requirement is particularly associated with GIS applications, it is appropriate to outline some of the principles involved.

The simplest forms of raster data compaction use run length encoding (RLE), based on the observation that it takes fewer bits to say '123 blank pixels' - namely 7 bits for 123 and 1 control bit - than 123 bits each of zero (see discussion in Egenhofer and Herring 1991 in this volume and Blakemore 1991 in this volume). PackBits (Aldus Corporation 1988) is a byte-oriented run length scheme modified to handle literal data for areas that do not compress well. Where space efficiency is paramount, the CCITT-3 and -4 standards established by the International Telecommunications Union (ITU) for facsimile transmission have become de facto standards (CCITT 1985). Both 1-D (modified Huffman RLE) and 2-D forms exist. The main characteristic of the 2-D formats is that each scan line is described in terms of changes from the previous scan line. Hardware for compression and decompression of large raster data sets is not yet readily available, but tiling techniques can be used to overcome this.

LZW (Lempel-Ziv and Welch) is an encoding scheme (Welch 1984) which can handle all kinds of data from binary to full RGB colour (colour defined by its red, green and blue components) at good compression factors, while being fully reversible. Originally designed to be implemented in hardware, it has proved challenging to implement efficiently (Welch 1984). All the above data compaction schemes are encompassed in TIFF - the Tag Image File Format devised by Aldus/Microsoft for transfer of image data between desktop publishing systems (Aldus Corporation 1988). This is now seeing increased use for map image data and is becoming a de facto standard for scanner output formats. Extensions are also under way to encompass tiled raster images. TIFF defines a standard housekeeping header around the various encodings. It is an example of a standard arising in the wider information technology context, but having relevance to geographical information systems.
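The run length idea can be shown in a few lines. The sketch below is a toy encoder/decoder for a binary scan line, intended only to illustrate the principle; it is not an implementation of PackBits or of the CCITT-3/4 standards.

```python
# Toy run-length encoding of a binary scan line: the line is stored as
# (pixel value, run length) pairs rather than as individual pixels.

def rle_encode(scanline):
    runs = []
    for pixel in scanline:
        if runs and runs[-1][0] == pixel:
            runs[-1][1] += 1              # extend the current run
        else:
            runs.append([pixel, 1])       # start a new run
    return [(value, count) for value, count in runs]

def rle_decode(runs):
    out = []
    for value, count in runs:
        out.extend([value] * count)
    return out

scanline = [0] * 123 + [1] * 5 + [0] * 72
encoded = rle_encode(scanline)
assert rle_decode(encoded) == scanline    # the scheme is fully reversible
print(encoded)                            # [(0, 123), (1, 5), (0, 72)]
```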

SOFTWARE
Manual data capture

The reader will have already observed incursions across the hardware/software divide, if such can be said to exist. Software for manual data capture using digitizing tables is sufficiently well established not to need detailed description, and in any case it is reviewed in Rhind (1974), Marble, Lauzon and McGranaghan (1984) and Yeung and Lo (1985). Efficiency is determined at least as much by operator procedures and flowline design as by software functionality. A macro command language is highly desirable to enable flowlines to be efficiently tailored, and most systems now incorporate on-line digitizing with immediate graphical feedback, including colour display of feature coding. Some protagonists still argue, however, that off-line digitizer operation is more efficient, because of the constant operator distraction caused by viewing the graphics display. Despite the use of pop-up or pull-down menus, complex feature coding schemata are an unavoidable burden on the operator. Some users have reported success in using voice input to alleviate the feature coding burden, but this technique is by no means well established. Further improvements in the cost of voice recognition technology remain to be exploited.

Overlay digitizing

The widespread availability of hybrid vector/raster GIS software, or at least of vector editing/drafting software supporting raster image data as a backdrop, has led to new methods of manual data capture. Such systems were originally developed as 'interim solutions' which allowed many GIS applications, for which map data are required only as a passive background frame of reference, to proceed in the absence of vector map data. They depend on establishing a means of registration between the vector data and the raster image, and on providing fast zoom and pan capabilities. These capabilities also provide the means of 'heads-up' or screen digitizing from raster images of map sources. Vector data, created either by manual point input using a screen cursor or by use of higher level drafting functions, are immediately displayed, superimposed on the raster source image (Plate 17.6). Accuracy is still dependent on manual positioning, augmented, albeit clumsily, by the ability to work at high magnification. The content of the available display window is also a significant limitation. Nevertheless, many protagonists have reported significant gains over the use of digitizing tables, particularly for large-scale maps and plans. As raster storage of source documents becomes more the norm, the small footprint and other advantages of this technique make it increasingly attractive. It is worthy of note that if greyscale backgrounds are supported the technique can be applied to the creation of vector data from remote sensing images or scanned aerial photographs. Also, if interactive thresholding of the greyscale background is available, useful data can be captured from poor quality source documents. A recent advance has been the use of 'raster-snapping' to improve accuracy.

Interactive automatic systems

An important alternative to fully automatic raster-to-vector conversion techniques is exemplified by the Laser-Scan VTRAK system (Waters 1989) and the Hitachi CAD-Core Tracer system (Sakashita and Tanaka 1989). These systems involve the extraction of vector data from the raster source on a feature-by-feature basis, with real-time display of the results to operators, who control the overall sequence of data capture, provide the interpretation necessary for feature coding prior to feature capture, and intervene in the case of ambiguity or error. This approach also provides for selective data capture in the frequently occurring case where only some of the features present in the source documents are required in the GIS database. Coding of features prior to capture provides an invaluable aid to automatic feature extraction in that the extraction algorithm used can be matched to the class of feature. In an ideal system, feature recognition would be automatic, but in practice when working with cartographic sources this goal is rarely achievable.

Since coding has to be done at some stage it is a system advantage to do it early, so that the appropriate automatic feature extraction algorithms can be invoked, and the appropriate data representations created. Thus, using the VTRAK system as an example, for contours and other curvilinear features a centre-line extraction algorithm and a data point reduction algorithm (based on the Douglas-Peucker algorithm described below) which preserves shape to within prescribed tolerances is appropriate (Plate 17.7). Rectilinear features, on the other hand, require vertex extraction algorithms and, in the case of buildings, optionally a squaring algorithm (Plate 17.8). Broken lines, the edges of solid areas and the centre lines of cased roads can all be followed and the appropriate vector representation produced.

The data produced can be either vector spaghetti or, if junction recognition is invoked, a link-and-node structure. Thus in Plate 17.9 a network of road centre lines is being created from cased roads. Nodes and intermediate data points are differentiated in the data (by colour on the screen). In this mode, nodes and links are measured once only, are given unique values and a topological structure is created for further processing. Symbol measurement is also provided, for example for buildings and cadastral symbols.

The key elements of such systems are: the ability to zoom and pan rapidly across the combined raster source and vector overlay; appropriate local feature extraction algorithms using all the available raster information; and a 'paintout' facility as a visible and logical progress check. As features are captured, their representation in the raster source is changed, so that they are displayed to the operator in a different colour (as 'done'), and so that they are no longer 'visible' to the feature extraction algorithms. This avoids duplication, and also simplifies the data capture task as the whole process is subtractive. In cases where the source document is of variable quality, the source raster image can be held as greyscale. This increases the size of working files (e.g. by a factor of four). However, the ability to vary the threshold according to the context is very powerful and enables clean vector data to be produced from unpromising material (Plate 17.10).
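The following sketch shows, in simplified form, the kind of link-and-node structure such a flowline produces once junction recognition is invoked; the class layout and field names are hypothetical and do not represent the VTRAK (or any other vendor's) internal data model.

```python
# Illustrative link-and-node structure: nodes are measured once, given unique
# identifiers, and links reference them, carrying intermediate vertices and a
# feature code. Purely a sketch of the concept.

from dataclasses import dataclass, field

@dataclass
class Node:
    node_id: int
    x: float
    y: float

@dataclass
class Link:
    link_id: int
    start_node: int                                   # node_id at one end
    end_node: int                                     # node_id at the other end
    vertices: list = field(default_factory=list)      # intermediate (x, y) points
    feature_code: str = ""                            # e.g. 'road centre line'

# A T-junction of road centre lines: node 2 is shared by both links.
nodes = [Node(1, 0.0, 0.0), Node(2, 50.0, 0.0), Node(3, 50.0, 40.0)]
links = [Link(1, 1, 2, [(25.0, 0.4)], "road centre line"),
         Link(2, 2, 3, [], "road centre line")]

print(f"{len(nodes)} nodes, {len(links)} links; node 2 is measured once and shared")
```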

In practice, if separations are available or if the source contains features of only one class, operator interaction is not necessary and features can be extracted as a batch process in an 'autopass' mode. This works well for contour or drainage separations and for polygon networks, particularly as there is provision for indicating 'no-go' areas. The interactive automatic system software can be installed on a standard workstation platform, together with editing and post-processing software. A typical flowline is a combination of autopass, interactive feature extraction and overlay editing. At all stages there is a continuous visual assessment of the resulting vector data against the raster source, building in data quality checks as the data are created.

An interesting alternative technique for the creation of structured and attributed vector data is exemplified by the SysScan GEOREC system (Egger 1990). In this, the starting point is the set of vectors created by an automatic raster-to-vector conversion process. Features are recognized and extracted from this set of vectors by the application of a 'production line' which can utilize combinations of more than 150 algorithms held in a 'method bank'. Algorithms include vector geometry enhancements, methods which handle neighbourhood relationships, a statistical recognition package for text and methods for replacing vectorized geometry with symbol references. Geometrical elements are classed as nodes, symbols, lines, areas and texts. Topological information between these elements is maintained via a set of suitable forward and backward pointers, and groups of geometrical elements which form a logical entity can be combined in sets. Recognition and structuring proceeds by sequences of operations under the generic descriptions of 'select', 'grow' and 'apply'. A 'production line' is usually set up interactively, but is controlled by a programming language (GPL), so that once the control structures have been created for the classes of features in a given map series, the whole process of feature extraction can be invoked automatically, with only exceptions needing subsequent manual editing. Knowledge and experience of manual digitizing flowlines are invaluable in the development of GPL programs. Good quality, cost-effective results are reported in some instances from good quality, well-behaved source maps.

Data capture and processing algorithms


The resolution of the source raster image must be adequate for the geometry to be accurately extracted by the vectorization algorithms. Typically, there need to be at least 2-3 pixels across the finest lines in order to establish a cartographically acceptable vector representation. On the other hand, it is important that the vector representation contains an optimal number of points, approximately the same as would result from an experienced manual digitizer. Superfluous points will clutter up GIS databases for a long time! Data point reduction is therefore an important requirement, and the Douglas-Peucker algorithm (Douglas and Peucker 1973), originally devised in the context of cartographic generalization, is widely used for this purpose. The principle is illustrated in the next figure.

Fig.  The Douglas-Peucker algorithm to reduce the number of data points required to represent curvilinear features.

The following description is adapted for thinning data points from a dense set representing a line, as they emerge from a line-following algorithm applied to a raster source. The first point on the line is used as an 'anchor' point and the last point of the line segment currently under consideration as a 'floater'. The point with the greatest perpendicular distance from the line joining the anchor to the floater is examined. If this distance is less than the prescribed tolerance, the next point along the line as extracted from the raster image becomes the floater and the process is repeated. If the tolerance is exceeded, the point preceding the floater is passed through to the vector representation and becomes the new anchor point, and the whole process recommences. Intermediate points from the raster image are discarded. If the tolerance is chosen to be half the line width, say, an acceptable representation of the shape is obtained with an optimal number of data points.

Vertex extraction algorithms hinge on the recognition of changes of direction, and on the fitting of straight line segments to the data points on either side of the putative vertex. Special cases arise when vertices are separated by distances comparable to the line thickness. Squaring algorithms abound, differing in the sophistication of the control parameters they provide.
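A minimal Python sketch of the anchor/floater thinning just described is given below. It is a sequential adaptation in the spirit of the Douglas-Peucker algorithm rather than the original recursive formulation, and the tolerance and test line are purely illustrative.

```python
import math

def perp_distance(p, a, b):
    """Perpendicular distance from point p to the line through a and b."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    length = math.hypot(dx, dy)
    if length == 0.0:
        return math.hypot(px - ax, py - ay)
    return abs(dx * (ay - py) - dy * (ax - px)) / length

def thin_points(points, tolerance):
    """Anchor/floater thinning of a dense point string, as described above."""
    if len(points) <= 2:
        return list(points)
    kept = [points[0]]
    anchor, floater = 0, 2
    while floater < len(points):
        worst = max(perp_distance(points[i], points[anchor], points[floater])
                    for i in range(anchor + 1, floater))
        if worst < tolerance:
            floater += 1          # shape still within tolerance: extend the segment
        else:
            anchor = floater - 1  # point preceding the floater becomes the new anchor
            kept.append(points[anchor])
            floater = anchor + 2
    kept.append(points[-1])
    return kept

# A dense right-angled line: the corner survives, the collinear points do not.
dense = [(x / 10.0, 0.0) for x in range(0, 51)] + [(5.0, y / 10.0) for y in range(1, 51)]
thinned = thin_points(dense, tolerance=0.05)
print(f"{len(dense)} points reduced to {len(thinned)}; corner kept: {(5.0, 0.0) in thinned}")
```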

Software for dealing with source document distortion and with changes of geographical projection (Snyder 1987; see also Maling 1991 in this volume) is well established. It is good practice to ensure before any correction is applied that the vector data are totally congruent with the raster source, except where discrepancies are deliberate. Such checking can be performed on screen, by vector-on-raster overlay, or by the traditional checkplot. Such quality assurance procedures are treated in more detail in the next section. Distortion is typically removed by least-squares fitting to an appropriate number of control points, over and above the corner points required to register the coordinate system. In some instances it may be appropriate to use any orthophoto sources to improve or correct the control on the cartographic sources. Coordinate transformations for projection changes can then be applied before output in the required GIS data format.
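The least-squares idea can be illustrated with a simple affine fit to control points, as in the Python sketch below (NumPy assumed available). The control point values are invented, and a production flowline might use a higher-order polynomial or rubber-sheet transformation where an affine fit is insufficient.

```python
# Sketch: remove distortion with a least-squares affine fit from measured source
# coordinates to known grid coordinates. Control point values are invented.

import numpy as np

# (x, y) as measured on the distorted source, and the corresponding grid positions.
src = np.array([[0.0, 0.0], [100.2, 0.3], [99.8, 100.4], [-0.1, 99.9], [50.1, 50.3]])
dst = np.array([[0.0, 0.0], [100.0, 0.0], [100.0, 100.0], [0.0, 100.0], [50.0, 50.0]])

# Solve dst ~= [x, y, 1] @ A for the 3x2 affine coefficient matrix A.
design = np.hstack([src, np.ones((len(src), 1))])
affine, *_ = np.linalg.lstsq(design, dst, rcond=None)

corrected = design @ affine
rms = np.sqrt(np.mean(np.sum((corrected - dst) ** 2, axis=1)))
print("affine coefficients (2x3):\n", affine.T)
print(f"RMS residual at the control points: {rms:.3f} map units")
```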

The problem of creating a seamless continuous map from a set of not necessarily homogeneous map sheets is peculiar to GIS applications. Quasi-automatic edge-matching software is available, but in practice the prevalence of anomalies can dictate a considerable human input to the process if fully edge-matched data are a requirement. Techniques for organizing sheet-based source data to present effectively continuous or 'seamless' cover are now well established. The practical problems arise from source data overlap or inconsistency (see Fisher 1991 in this volume). The automatic creation of link-and-node structured data greatly facilitates the creation of correct polygon or parcel data, although software is also available to create such data from unstructured 'spaghetti'. Again the task is considerably complicated by issues of matching across sheet boundaries.

Quality assurance
The key to the reduction of the burden of data capture costs on a project is data sharing. The most important aspect of data sharing is validation that the data are of a quality acceptable to the needs of a wide community of users. Validation needs to be based on objective tests that can be externally applied however the digital data are captured. Considerable effort by interested parties in the United Kingdom led to the establishment of agreed criteria between the Ordnance Survey and the National Joint Utilities Group (NJUG 1988). Although these criteria are drawn up in terms of large scale (1 : 1250 and 1 : 2500) plans, the principles are of general applicability. Eight tests are applied:

1. Data format - readability
2. Number of data points - no more than 25 per cent excess
3. Coding accuracy - colour code, visual check
4. Positional accuracy - remeasure random samples; mean and standard deviation criteria
5. Squareness of buildings - tested on a sample basis
6. Line junction fitting - by visual inspection on a workstation screen
7. Text - by random sampling in each category
8. Completeness - using an overlaid check plot

Given a specification and an appropriate validation procedure, how are valid data to be captured in a cost-effective and timely manner? Whether manual or automatic techniques are used, checks and feedback mechanisms must be built in throughout the process, as it is not sufficient to check quality only at the end. Automatic techniques, properly applied and controlled, can produce consistent and reliable data quality much more rapidly and cheaply than manual techniques, but flowlines must be designed so that automatic processes fail safe rather than producing copious errors that are then expensive to correct. Whenever possible, structure inherent in the data (e.g. a link and node structure) should be used to ensure that data are correct.
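As an illustration of how a check such as test 4 above might be automated, the sketch below compares a random sample of re-measured points against their digital counterparts and reports the mean and standard deviation of the discrepancies; the thresholds and the simulated measurements are placeholders, not the NJUG figures.

```python
# Sketch of a positional accuracy check: re-measure a random sample of points and
# compare with the digital data. Tolerances here are illustrative placeholders.

import math
import random

def positional_check(digital_pts, remeasured_pts, mean_limit, sd_limit):
    errors = [math.hypot(dx - rx, dy - ry)
              for (dx, dy), (rx, ry) in zip(digital_pts, remeasured_pts)]
    mean = sum(errors) / len(errors)
    sd = math.sqrt(sum((e - mean) ** 2 for e in errors) / len(errors))
    return mean, sd, (mean <= mean_limit and sd <= sd_limit)

# Simulated sample: re-measurements scattered around the digital positions.
random.seed(1)
digital = [(random.uniform(0, 500), random.uniform(0, 500)) for _ in range(30)]
remeasured = [(x + random.gauss(0, 0.15), y + random.gauss(0, 0.15)) for x, y in digital]

mean, sd, passed = positional_check(digital, remeasured, mean_limit=0.3, sd_limit=0.3)
print(f"mean error {mean:.2f} m, standard deviation {sd:.2f} m, {'PASS' if passed else 'FAIL'}")
```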

SPATIAL DATA SOURCES AND DATA PROBLEMS

INTRODUCTION
Many of the data that are incorporated into GIS are initially in analogue form, most commonly as hardcopy maps. To be used in GIS, however, map data must undergo a conversion process known as digitizing, a labour-intensive task which is time consuming and prone to error. Fortunately, increasing amounts of data are now obtainable directly in digital form.

ANALOGUE DATA SOURCES


The most important source of analogue spatial data is the map. Since prehistory, maps have been produced with the specific purpose of recording the spatial relationships observed and measured by the map's compiler, and they are used to convey spatial information to their readers. Many of the problems encountered in developing geographical databases from maps have not been hindrances to the use of maps in the past, because conventional uses place less stringent demands on the analogue medium. But users of digital geographical databases are often unaware of the limitations of conventional maps and consequently may make unreasonable or inappropriate assumptions about the data derived from them. The following discussion about the potential limitations of analogue maps is based mainly on Rhind and Clark (1988).

Map scale
Scale determines the smallest area that can be drawn and recognized on a paper map (Table 13.1).

On a topographic map at a scale of 1 : 50 000, it is not possible to represent accurately any object of dimensions less than one line width, or less than about 25 m across. However, small features can be important, so cartographers have devised methods for selecting and symbolizing small but significant features, even though their physical dimensions on the ground may be less than one line width. Thus many roads and rivers which are less than 25 m across are nevertheless shown on 1 : 50 000 maps. Scale may determine which rivers are shown in a drainage network or which roads in a road network. Similarly, scale may determine whether the various features in a class, such as roads, are shown as a single feature class or differentiated (e.g. highway, motorway, main road, minor road, etc.).
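The underlying arithmetic is simply ground size = line width × scale denominator, as the short sketch below illustrates; the 0.5 mm line width is an assumption consistent with the 25 m figure quoted above.

```python
# Smallest ground distance representable by one map line width, for several scales.

def min_ground_size_m(line_width_mm, scale_denominator):
    return line_width_mm / 1000.0 * scale_denominator   # mm on the map -> m on the ground

for scale in (1250, 10000, 50000, 250000):
    size = min_ground_size_m(0.5, scale)
    print(f"1:{scale:>7}: a 0.5 mm line width covers about {size:7.1f} m on the ground")
```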

The following figures illustrate how presentation of the same information about the same area at different scales requires cartographic generalization, in which some features become exaggerated and others obscured. The progressive elimination of information with changing scale is shown for (A) a stream network and (B) the interaction of a road network and an urban area.

By contrast, a digital database appears, initially, to be independent of scale because it may be portrayed at any scale. If the data were originally collected from a map or maps, then the map scale is important because it determines the size of the minimum mapping area (Table 13.1) and the material included and excluded. As a piece of information with respect to the digital data, however, it is only an identifier of the original map series. In the database, it is more appropriate to identify the map series exactly, and then give the accuracy of the database as a representation of the map. Indeed, this is exactly the approach used by various agencies in producing digital databases for general use (USGS 1987; SCS 1984b). Scale is misused all too often as a measure of accuracy.

Map audience
Assumptions about the map's audience determine the intensity of information included, and the need for additional reference material. A map designed for a technical audience will probably have a higher information density compared to one designed for the public or one designed for a 'wide user community'. Those in the latter category may contain more by way of contextual information such as roads, buildings and towns, at the expense of accurate representation of feature position. The cartographer must juggle the conflicting needs of audience and scale. Similarly, the compiler of a database formed by digitizing maps may need to consider the purposes for which those maps were created.

Currency
A map is a representation of features in space as they existed at the time they were surveyed. The real world of geographical information changes continuously, but many maps remain static. Thus maps become increasingly inaccurate as a representation of the world over time. The long delay between mapping and publishing often means that most maps are not true records of spatial relations when they are used. Most human users expect this, and compensate for it, although many road-map users still may not understand exactly why some roads are not shown on their maps. Map sheets are revised periodically, of course, and all national mapping agencies maintain a revision programme, but features continue to change.

Fig. 13.2 The purpose for which a map is designed affects map content and precision of that content. Here the actual line plan is compared with the same area, at the same scale, but in a street atlas. Many roads have exaggerated widths, to accommodate road names and enhance visibility, while many building and block outlines are simplified (Source: Keates 1989).

Vegetation and land use maps require constant revision. Although soils and geology are less subject to change, even these classes of maps must be updated regularly to accommodate new field work and general improvements in the level of human understanding of soils and geology.

Map coverage
The actual geographical area for which a geographical database might be constructed is variable, from less than one to thousands of square kilometres. Therefore, the source-map coverage must be chosen as appropriate to the task in hand. Scales and completeness of map coverage of different geographical areas are, however, highly variable. This point can be illustrated by reference to mapping in two of the world's most advanced countries: the United States and Britain. In the United States the most detailed complete coverage scale of topographic maps is 1 : 24 000, whereas in Britain it is a combination of the 1 : 1250, 1 : 2500 and 1 : 10 000 scales.

Map accuracy
As noted above, maps are an abstraction of reality, and so map makers have been concerned to give concise statements of the accuracy of their products. The US National Map Accuracy Standard, issued by the Bureau of the Budget in 1947 and still in force, is perhaps the best known example of these (see Thompson 1988). Some of the major points included are summarized in Table 13.3, and the standard has recently been revised by a committee of the American Society of Photogrammetry and Remote Sensing (Merchant 1987) which specifies acceptable root-mean-square error terms for horizontal locations for various maps (Table 13.4).

Table 13.3 Summary of important parts of the US National Map Accuracy Standard (US Bureau of the Budget):

- On scales smaller than 1 : 20 000, not more than 10 per cent of points tested should be more than 1/50 inch in horizontal error, where points refer only to points which can be well defined on the ground.
- On maps with scales larger than 1 : 20 000 the corresponding error term is 1/30 inch.
- At no more than 10 per cent of the elevations tested will contours be in error by more than one half the contour interval.
- Accuracy should be tested by comparison of actual map data with survey data of higher accuracy (not necessarily with ground truth).
- If maps have been tested and do meet these standards, a statement should be made to that effect in the legend. Maps that have been tested but fail to meet the requirements should omit all mention of the standards on the legend.
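A worked illustration of the horizontal tolerance is given below; the inch-to-metre conversion is exact, and the treatment of the 1 : 20 000 boundary follows the wording above, so the figures are indicative only.

```python
# Ground-distance equivalent of the NMAS horizontal tolerance at various scales.

def nmas_ground_tolerance_m(scale_denominator):
    # Scales smaller than 1:20,000 (larger denominators): 1/50 inch on the map;
    # larger scales: 1/30 inch. One inch is 0.0254 m.
    map_error_inch = 1 / 50 if scale_denominator > 20000 else 1 / 30
    return map_error_inch * 0.0254 * scale_denominator

for scale in (10000, 24000, 62500, 100000):
    tol = nmas_ground_tolerance_m(scale)
    print(f"1:{scale:>6}: up to {tol:5.1f} m horizontal error allowed "
          "for 90 per cent of well-defined test points")
```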

Table 13.4 Planimetric coordinate accuracy requirement for well-defined points.

Map sheets and series


The traditional paper map series is sometimes designed, drafted and published as a collection of individual map sheets, because each separate map sheet or quadrangle is intended to stand alone as a single entity. This gives the individual paper map an internal coherence and a pleasing appearance. If the reader is interested in an area beyond that covered by the current map, it must be filed and another extracted from the library. There is no guarantee of conformity across the seam of the maps, however. Many researchers and other users have found to their cost that edge-matching between map sheets can be a major problem. In map series with overlap between contiguous sheets (e.g. 1 : 50 000 OS maps of Britain), large features in the zone of overlap may not conform between sheets. This can be even worse in the case of maps prepared on poorly rectified orthophotomaps, such as those included with county soil reports by the USDA SCS.

Map users may also experience difficulties when they attempt to compare different map attributes because of variations in scale and projection. In Britain, for example, the use of 1 : 63 360 base maps for both soil and geology has allowed these attribute themes to be compared, but lack of conformity with the topographic map series of that scale has meant that comparison with topographic information is not necessarily reliable. In the United States, geology is mapped at the various scales of the standard topographic map series, but soils are commonly mapped by county on orthophotomaps at a scale of 1 : 15 840, almost precluding precise comparison with either topography or geology. The ability to analyse and overlay maps with different attribute data types is, however, integral to GIS. Software has been developed to force data into common scales and projections, either through mathematical transformations or 'rubber-sheet' approximations.

ATTRIBUTE DATA
Attribute data are complementary to location data and describe what is at a point, along a line, or within a polygon. All spatial features have some immediately associated attribute or attributes, such as building type, soil type, etc. Some, such as low level census divisions, are no more than a code value to enable association with other attribute information.

Socio-economic attributes
Some of the most widely used sets of longitudinal (time series) attribute data are derived from national census offices. Census data are essential to planning by many government agencies. Most census offices prepare a number of different censuses (Bureau of the Census 1982, 1984a, 1984b), but the most common and important is the census of population. All population censuses collect a myriad of social variables wherever people occur and operatives can reach. Census results are reported at a number of spatial resolutions and using a variety of media. The results of the 1990 US Census are available in computer-readable form for all 7.5 million census blocks (basic enumeration areas), as well as in printed form for the sub-state local government units, and for the smaller block numbering areas (Fulton and Ingold 1989). Data are available as printed tabulations and in computer-readable form on both tape and CD-ROM (Fulton and Ingold 1989). As in most census reporting, some data are published as absolute counts within geographical areas, while others are based on only a sample of households.

Reported census data are, however, subject to numerous problems of accuracy and reliability. The foremost of these is that individuals are counted in the United States by their 'usual place of residence', which may be different from their legal residence, their voting residence or their domicile (Bureau of the Census 1982). Undercounting is a perennial problem, due to illiteracy, illegal immigration, homelessness and simple unwillingness to complete census returns despite legal inducements (Bureau of the Census 1982). Overcounting is also a problem. In some countries a more systematic bias may be introduced because census takers fail to penetrate particularly inaccessible regions, or have trouble counting all the inhabitants of a village. Furthermore, it is usual in a census to count people by their night-time location and, although location of workplace may be included in the census (Bureau of the Census 1982), only a poor representation of daytime and workplace population distributions may be recorded.

In some cases the counts for certain geographical areas may be so small that statistical representation is uncertain (Kennedy 1989). This introduces a considerable problem of confidentiality, since if too few individuals are in a sample it may be possible to identify the individuals concerned. Indeed, in the United Kingdom the data are specifically modified by randomly adding values of +1, 0 or -1 to low counts (Dewdney 1983), while in the United States low counts are simply suppressed (Bureau of the Census 1982).
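A toy sketch of the two disclosure-control approaches just described is given below; the threshold and the equal probabilities are illustrative assumptions, not the actual procedures used by either census office.

```python
# Toy disclosure control for small counts: random +1/0/-1 perturbation (UK style)
# or outright suppression (US style). Threshold and probabilities are illustrative.

import random

def protect_count(count, threshold=10, method="perturb"):
    if count >= threshold:
        return count
    if method == "suppress":
        return None                                   # value withheld from publication
    return max(0, count + random.choice((-1, 0, 1)))  # small random perturbation

random.seed(0)
raw_counts = [3, 7, 52, 1, 0, 14]
print("perturbed :", [protect_count(c, method="perturb") for c in raw_counts])
print("suppressed:", [protect_count(c, method="suppress") for c in raw_counts])
```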
