Prelim Raster and Vector Model

Raster and Vector Data and Data Structures
Jamie Wolfe CITE

Marshall University Huntington, WV - 25755 304-696-6042 Jawolfe@marshall.edu
IS 645 Introduction to GIS

Lecture 07, June 06, 2000
Today's class topics

Raster and Vector Models
Raster Models: File Formats, Data Models, Data Structures
Vector Models: Data Models, Topological Models, Representation of surfaces
Issues with data exchange
IS 645: Geographic Information Systems, Summer 2000, J. Wolfe
Information represented on a map

Geographic information is represented based on the projection and coordinate system used
Features include roads, rivers, rail lines, etc.
All information represented on a map must be translated into electronic form
The electronic form of the map may contain just the graphic representation of the map, or the information on the map may be separated into groups of objects (features) that are stored electronically as a group
Representing Feature and Attribute data

A feature on a map may have many attributes
For example, a road may have the attributes: Length, Surface type, Maintained by
Each attribute has a value:
Length = 500
Surface type = Cement
Maintained by = County
Raster and Vector

Graphic representations and geographical space can be presented in raster form or vector form
In the raster format, the graphic is represented as a combination of individual units, where each unit can represent only one value. All units are stored to represent the graphic
Example: bitmaps of images, where the image is composed of individual pixels
In the vector format, the graphic is represented by a set of points joined by a certain relationship or function. Only the points and the relationship are stored. Intermediate points are determined using the relationship
Example: A CAD drawing (engineering drawing)
Example of raster and vector files

Vector-based line
Flat File
4753456 623412
4753436 623424
4753462 623478
4753432 623482
4753405 623429
4753401 623508
4753462 623555
4753398 623634

Raster-based line
Flat File
0000000000000000
0001100000100000
1010100001010000
1100100001010000
0000100010001000
0000100010000100
0001000100000010
0010000100000001
0111001000000001
0000111000000000
0000000000000000
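The difference between the two flat files can be sketched in Python. This is only an illustration under stated assumptions: the function name `rasterize`, the small grid, and the three vertices are made up (they are not the coordinates listed above), and intermediate cells are found by simple straight-line stepping.

```python
def rasterize(points, rows, cols):
    """Burn a polyline given as (row, col) vertices into a grid of 0/1 cells."""
    grid = [[0] * cols for _ in range(rows)]
    for (r0, c0), (r1, c1) in zip(points, points[1:]):
        steps = max(abs(r1 - r0), abs(c1 - c0), 1)
        for i in range(steps + 1):
            # Intermediate cells come from the straight-line relationship
            r = round(r0 + (r1 - r0) * i / steps)
            c = round(c0 + (c1 - c0) * i / steps)
            grid[r][c] = 1
    return grid

# Vector form: only three vertices are stored (hypothetical values)
vector_line = [(0, 0), (3, 3), (3, 6)]

# Raster form: every cell of the grid is stored
raster_line = rasterize(vector_line, rows=4, cols=8)
for row in raster_line:
    print("".join(str(cell) for cell in row))
```

The vector version stores 3 coordinate pairs, while the raster version stores all 32 cells, 7 of which are set.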
Vector overlay over raster file
Raster and Vector Data models
Raster data model overlaid over raster image
Common Raster data file formats
Raster File formats:

BMP (bitmaps) - no compression
DIB (Device Independent Bitmaps)
GIF (CompuServe's Graphics Interchange Format)
RLE - Run Length Encoding
CCITT Group 3, 4 (International Telegraph and Telephone Consultative Committee) - RLE, Huffman coding
JPEG (Joint Photographic Experts Group)
TIFF (Tagged Image File Format)
DEM (Digital Elevation Model)
Data Compression Techniques

Lossy: a certain loss of accuracy is accepted in exchange for greatly increased compression
Lossless: the exact input stream is guaranteed to be regenerated after the compress/expand cycle
Data compression = Modeling + Coding
Input Stream -> (Symbols) -> Model -> (Probabilities) -> Encoder -> (Codes) -> Output Stream
Lossless coding techniques: Huffman, Run Length Encoding, LZ (uses an adaptive dictionary), LZW, etc.
Lossless Data Coding techniques
Variable-length coding: use short codes for the most frequent symbols and longer codes for less frequent symbols. Examples: Huffman, Shannon-Fano
Sliding window (dictionary) compression: uses previously seen data as a dictionary. Examples: Lempel-Ziv LZ77 and LZ78 (LZ77 is used in PKZip), LZW (used in ARC and GIF)
Run Length Encoding (RLE): replace a run of repeated data elements by the count and the element
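The variable-length idea can be illustrated with a small Huffman sketch in Python. The function name `huffman_code_lengths` and the sample string are assumptions made for this example; it reports only the code length per symbol, not the actual bit patterns.

```python
import heapq
from collections import Counter

def huffman_code_lengths(text):
    """Return {symbol: code length in bits} for a Huffman code built from text."""
    counts = Counter(text)
    if len(counts) == 1:                      # degenerate single-symbol input
        return {sym: 1 for sym in counts}
    # Heap entries: (weight, unique tie-breaker, {symbol: depth so far})
    heap = [(n, i, {sym: 0}) for i, (sym, n) in enumerate(counts.items())]
    heapq.heapify(heap)
    tie = len(heap)
    while len(heap) > 1:
        w1, _, a = heapq.heappop(heap)        # two least frequent subtrees
        w2, _, b = heapq.heappop(heap)
        merged = {s: d + 1 for s, d in {**a, **b}.items()}
        heapq.heappush(heap, (w1 + w2, tie, merged))
        tie += 1
    return heap[0][2]

cells = "WWWWWWWWGGGE"                        # 8 x W, 3 x G, 1 x E
lengths = huffman_code_lengths(cells)
total_bits = sum(lengths[s] for s in cells)
print(lengths, total_bits)
```

Here the frequent symbol W gets a 1-bit code and the rarer G and E get 2-bit codes, so the 12 symbols need 16 bits instead of the 24 bits a fixed 2-bit code would use.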
BMP

Used by MS Windows and OS/2 systems
An 8-bit bitmap uses a color table (palette) of 256 entries (2^8). Each pixel is coded as an index into the color table
File size is approximately the number of pixels multiplied by bits/pixel, plus the header and color table
Color may vary from computer to computer based on the default palette
A 24-bit bitmap uses 3 bytes for each pixel to provide a depth of about 16 million colors
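The file size rule can be worked through with a short Python sketch. It assumes the common Windows layout (a 14-byte file header, a 40-byte BITMAPINFOHEADER, 4 bytes per palette entry, and rows padded to a multiple of 4 bytes); other BMP header variants would give slightly different totals, and the function name `bmp_file_size` is made up for this example.

```python
def bmp_file_size(width, height, bits_per_pixel):
    """Estimate the size in bytes of an uncompressed Windows BMP file."""
    # Each row of pixels is padded to a multiple of 4 bytes
    row_bytes = (width * bits_per_pixel + 31) // 32 * 4
    # Palette is present only for 8 bits/pixel or fewer
    palette_entries = 2 ** bits_per_pixel if bits_per_pixel <= 8 else 0
    headers = 14 + 40        # file header + BITMAPINFOHEADER (assumed variant)
    return headers + palette_entries * 4 + row_bytes * height

print(bmp_file_size(640, 480, 8))    # 8-bit palette image
print(bmp_file_size(640, 480, 24))   # 24-bit true color image
```

For a 640 x 480 image, the 24-bit version is roughly three times the size of the 8-bit version, which matches the pixels times bits/pixel rule.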
DIB: Device Independent Bitmap
Device-independent bitmap (DIB) format allows Windows to display the bitmap on any type of display device. The term "device independent" means that the bitmap specifies pixel color in a form independent of the method used by a display to represent color.
GIF: Graphics Interchange Format

Interplatform format created by CompuServe
One file can contain multiple images
Uses the LZW (Lempel-Ziv-Welch) algorithm for bitmap compression. The algorithm is patented and requires a license for each use
Two versions: 87a and 89a
Allows images to be interlaced, have a transparent background, or be animated
Extension blocks provide mechanisms for file annotation
Can have only 256 colors in an image
RLE: Run Length Encoding

Earliest and simplest method of data compression
A repeated string of characters is replaced by two bytes: the number of times the character appears and the character itself
Basis for many CCITT group standards, where it is combined with Huffman coding
Two-dimensional coding schemes: the first line is coded using the 1D scheme, and the next K lines are coded using the first line
The size of K varies over applications
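A minimal Python sketch of the count-and-character idea follows. The row of cell codes is made up, and the pairs are kept as Python tuples rather than packed into two bytes.

```python
from itertools import groupby

def rle_encode(cells):
    """Replace each run of identical symbols by a (count, symbol) pair."""
    return [(len(list(run)), value) for value, run in groupby(cells)]

def rle_decode(pairs):
    """Expand (count, symbol) pairs back into the original string."""
    return "".join(value * count for count, value in pairs)

row = "WWWWWGGGWW"                 # hypothetical row of raster cell codes
encoded = rle_encode(row)
print(encoded)
print(rle_decode(encoded) == row)
```

The 10 cells are stored as 3 pairs. A row with no repeated neighbors would grow instead of shrink, which is why RLE suits maps with large uniform areas.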
RLE Example: Vector Map
RLE Example: Raster Map
RLE Example: Raster map with codes
RLE Example: RLE file based on codes
Color Characteristics
Colors can be mixed in two ways:
Additive mixing: generate colors by mixing various amounts of the primary colors red, green, and blue (RGB)
Subtractive mixing: generate colors by mixing secondary colors that absorb unwanted colors
Color characteristics:
Luminance or Brightness: measure of the brightness of light emitted by an object. The human eye responds differently to different colors; response is highest at a wavelength of about 555 nm (yellow-green)
Hue: sensation produced by the presence of certain wavelengths of color
Saturation: measure of the color intensity. Example: red and pink have the same predominant wavelength, but pink has more white
Color Models
Chromaticity model:
3-D model which uses x and y for color and the third dimension for luminance. Additive model
RGB model:
Combines different intensities of red, green, and blue to generate various colors. Additive model
HSI model (Hue, Saturation, Intensity):
Represents an artist's impression of tint, shade, and tone
CMYK model (Cyan, Magenta, Yellow, Black):
Subtractive color model used for printing
YUV model (Luminance-Chrominance):
Y is the luminance and contains black and white or grayscale information. U and V contain the color information. Computed as a linear transform of RGB values
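Two of these models can be related to RGB with simple arithmetic, sketched here in Python. The assumptions are that RGB components are scaled to the range 0 to 1, that the subtractive primaries are taken as simple complements (ignoring the black channel of CMYK), and that the YUV weights are the widely used ITU-R BT.601 values; the slides do not say which weights the course used.

```python
def rgb_to_cmy(r, g, b):
    """Subtractive primaries as complements of the additive ones."""
    return (1.0 - r, 1.0 - g, 1.0 - b)

def rgb_to_yuv(r, g, b):
    """Luminance Y plus two color-difference components U and V (BT.601)."""
    y = 0.299 * r + 0.587 * g + 0.114 * b
    u = 0.492 * (b - y)
    v = 0.877 * (r - y)
    return (y, u, v)

print(rgb_to_cmy(1.0, 0.0, 0.0))   # pure red
print(rgb_to_yuv(0.5, 0.5, 0.5))   # mid gray: color components near zero
```

Green carries the largest luminance weight, which is consistent with the eye being most sensitive in the yellow-green part of the spectrum.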
Lossy compression
Lossless compression is difficult, especially for continuous tone images, because of slight variations in color
Lossy compression:
Based on the principle that the human eye sees finer detail in an image more because of brightness variations than because of color variations. Hence, certain color pixels can be dropped without any perceptible loss
To determine which pixels should be dropped, the image is converted from the spatial domain to the frequency domain
Many raster image formats for continuous tone images use some form of lossy compression technique
JPEG compression
JPEG methodology consists of the following steps:

Apply the Discrete Cosine Transform (DCT) to an 8x8 image block to determine the DCT coefficients
Matrix quantization: determine which information can be dropped safely without perceptible loss
Use data coding methods such as Huffman coding to compress the data
The reverse process is used for decoding a JPEG image
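The first two steps can be sketched in plain Python. This is a simplified illustration, not the JPEG standard's exact procedure: it uses a single hypothetical quantization step of 16 for every coefficient, whereas JPEG uses a table with a different step per coefficient, and the flat input block is made up.

```python
import math

N = 8

def dct_2d(block):
    """Forward 2-D DCT of an 8x8 block (type II, JPEG-style scaling)."""
    def c(k):
        return 1 / math.sqrt(2) if k == 0 else 1.0
    out = [[0.0] * N for _ in range(N)]
    for u in range(N):
        for v in range(N):
            total = 0.0
            for x in range(N):
                for y in range(N):
                    total += (block[x][y]
                              * math.cos((2 * x + 1) * u * math.pi / (2 * N))
                              * math.cos((2 * y + 1) * v * math.pi / (2 * N)))
            out[u][v] = 0.25 * c(u) * c(v) * total
    return out

def quantize(coeffs, step):
    """Divide by the step and round; small coefficients become zero."""
    return [[round(value / step) for value in row] for row in coeffs]

flat_block = [[10] * N for _ in range(N)]   # uniform gray block (made up)
coeffs = dct_2d(flat_block)
quantized = quantize(coeffs, step=16)
print(round(coeffs[0][0], 6), quantized[0][0])
```

A uniform block has all of its energy in the top-left (DC) coefficient, so after quantization 63 of the 64 values are zero, which is what makes the later coding step effective.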
JPEG compression
Figure: JPEG image compression pipeline. An 8x8 image block passes through the DCT, then matrix quantization (using a quantizer table), then coding (using a Huffman table)
JPEG compression Example: Input DCT coefficients
Gray scale values of 8x8 pixel Image block
JPEG compression Example: Output DCT coefficients
DCT coefficients of 8x8 pixel Image block before quantization
JPEG compression Example: DCT coefficients after quantization
DCT coefficients of 8x8 pixel Image block after quantization
JPEG compression Example: Storing DCT coefficients
Using zig-zag technique of storing relevant matrix information
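The zig-zag storage order can be generated with a few lines of Python. The function name `zigzag_order` is made up for this sketch; it walks the anti-diagonals of the block, alternating direction, so that low-frequency coefficients come first and the zeros left by quantization cluster at the end.

```python
def zigzag_order(n=8):
    """Return (row, col) index pairs of an n x n block in zig-zag scan order."""
    order = []
    for s in range(2 * n - 1):                 # s = row + col, one anti-diagonal
        cells = [(r, s - r) for r in range(n) if 0 <= s - r < n]
        # Odd diagonals run top-right to bottom-left, even ones the other way
        order.extend(cells if s % 2 else reversed(cells))
    return order

scan = zigzag_order()
print(scan[:6])
```

Reading a quantized block in this order typically ends in a long run of zeros, which run length and Huffman coding then store compactly.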
TIFF: Tagged Image File Format
An image encoded in a TIFF file is wholly defined by its tags, and the file format is highly extensible because additional features can be added simply by defining additional tags
The TIFF file format specification defines more than 70 different types of tags. For example, tags are used for: image width in pixels; image height; a color table (if required); compression type
A TIFF file can contain multiple images
The TIFF file format is one of the best for transferring bitmaps across platforms, because it is flexible enough to allow virtually any image to be encoded in binary form without losing any of its attributes, visual or otherwise
DOQ: Digital Orthophoto Quads
Georeferenced image of a quad or quarter quad (3.75 minute) developed from a photograph and other data
Displacements due to sensor orientation and terrain relief have been removed
DOQQs have a ground pixel distance of 1 meter
A DOQ is created by mosaicking DOQQs and other photography chips
Photographs are exposed at 20,000 feet above mean terrain with a 6-inch focal length camera
The photograph is scanned at a resolution of 7.5 to 30 micrometers (generally 25 micrometers)
A black and white quarter quad generated from a 240 mm square photograph at 25 micrometers produces an image of 45-50 megabytes uncompressed, and yields a ground pixel of 1 meter
A standard ASCII header is used. The file stores the image west to east with north on top
Uses the UTM coordinate system and the NAD 83 datum
DEMs: Digital Elevation Models
Digital Elevation Model (DEM) data files are digital representations of cartographic information in a raster form. DEMs consist of a sampled array of elevations for a number of ground positions at regularly spaced intervals. These digital cartographic/geographic data files are produced by the U.S. Geological Survey (USGS) as part of the National Mapping Program. DEM data for 7.5-minute units correspond to the USGS 7.5-minute topographic quadrangle map series for all of the United States and its territories
DEMs are generated using several techniques, such as interpolation, the hypsography layer, etc.
Some uses of DEMs:
Cut and fill volume estimation
Coarse contour maps
Line of sight (viewshed) maps
Shaded relief maps
Several models are used for storing the information in a file, and hence appropriate software is required to read and analyze them
DEM Examples
A raster data model uses a grid

One grid cell is one unit and holds one attribute
Every cell has a value, even if that value is a code for missing data
A cell can hold a number or an index value standing for an attribute
A cell has a resolution, given as the cell size in ground units
A thematic map of grid cells where each cell represents the same theme is called a coverage
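How cell size ties a ground coordinate to a cell can be sketched in Python. Everything specific here is an assumption for illustration: the 3 x 3 grid, the 30-unit cell size, the origin at the upper-left corner, and the -9999 code for missing data.

```python
NODATA = -9999  # hypothetical code meaning "value is missing"

def cell_for_point(x, y, x_min, y_max, cell_size):
    """Map a ground coordinate to (row, col); row 0 is the top (north) row."""
    col = int((x - x_min) // cell_size)
    row = int((y_max - y) // cell_size)
    return row, col

# Index values standing for attributes, e.g. 1 = water, 2 = grass (made up)
grid = [
    [1, 1, 2],
    [1, 2, 2],
    [NODATA, 2, 2],
]

row, col = cell_for_point(x=65.0, y=40.0, x_min=0.0, y_max=90.0, cell_size=30.0)
value = grid[row][col]
print(row, col, value)
```

Any point inside the same 30 x 30 cell returns the same value, which is why the cell size limits the locational precision of a raster.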
Generic structure for a grid
Figure 3.1 Generic structure for a grid. Labels: grid extent, resolution, columns, rows, grid cell.
The mixed pixel problem
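Three rules for assigning a value to a mixed cell, each shown on a 3 x 3 grid of cells:
Water dominates: W W G / W W G / W W G
Winner takes all: W G G / W W G / W G G
Edges separate: W E W / E E E / G G G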
IDRISI Raster Model
IDRISI calls raster files images. Each image consists of a defined number of rows and columns, forming cells. These cells are stored as a sequence of numbers (byte, integer or real) representing values (vegetation class codes, reflectance numbers, political units, z-values in a DEM, ...)
A raster in IDRISI carries no information about itself; the metadata is stored separately in raster documentation files (*.DOC). Every image must have a corresponding DOC file. These are ASCII files made up of a sequence of lines, each holding one item of metadata
The values may represent some code for land usage
IDRISI starts in the upper-left corner (row 0, column 0), then advances column by column and row by row
In the simplest format, ASCII, the cell values are stored one per line
Raster Data Structure: Quad-trees

Figure 3.9 The quad-tree structure. Reference to code 210. Labels: quadrant numbers 0, 1, 2, 3, repeated at each level of subdivision.
Pros and Cons of Raster files
Pros
Can be easily created from existing pixel data in memory
Pixel values can be modified individually or in a group by using a palette
Translate well to CRT-based output
Cons
Can be very large, depending on image size and number of colors
Do not scale very well. Decimation (throwing away pixels) may make an image unacceptable
The Vector Model

Vectors are line segments minimally defined by a starting point, a direction, and a length
A vector data model uses points stored by their real (earth) coordinates
Lines and areas are built from sequences of points in order. Lines have a direction based on the ordering of the points
Straight lines, curved lines, and simple shapes can be used to create more complex shapes
Vector models can store information about topology
Topological model

A line is a segment between two points (vertices)
A link (or arc or chain) is a connection between two nodes. A link may consist of several lines joined at points (vertices)
Links can only originate, terminate, or be connected at nodes
A point (vertex) is where a line originates or terminates
A node is where a link originates or terminates
A polygon (area) is composed of links. Adjacent polygons have only one link between them
Topological model characteristics

Direction
All links have a direction (the FROM node and TO node)
Connectivity
Keep track of which links are connected at a node
Adjacency
A link can determine the polygon to its left and its right
Nestedness
Which nodes, links, and other polygons lie within a polygon
These characteristics allow the software to determine the relationships between the individual graphic objects as well as values for length, perimeter, area, etc.
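Direction, connectivity, and adjacency can be read straight from a table of links, as this Python sketch suggests. The table layout, the identifiers (a1, n1, A, B, outside), and the function names are hypothetical and are not taken from any particular GIS package.

```python
from collections import defaultdict

# Hypothetical link table: id -> (from_node, to_node, left_polygon, right_polygon)
arcs = {
    "a1": ("n1", "n2", "outside", "A"),
    "a2": ("n2", "n1", "outside", "B"),
    "a3": ("n1", "n2", "A", "B"),      # the one link shared by A and B
}

def arcs_at_node(arcs):
    """Connectivity: which links meet at each node."""
    table = defaultdict(set)
    for arc_id, (from_node, to_node, _, _) in arcs.items():
        table[from_node].add(arc_id)
        table[to_node].add(arc_id)
    return table

def neighbors(arcs):
    """Adjacency: polygons lying on opposite sides of a shared link."""
    table = defaultdict(set)
    for _, _, left, right in arcs.values():
        table[left].add(right)
        table[right].add(left)
    return table

print(sorted(arcs_at_node(arcs)["n1"]))
print(sorted(neighbors(arcs)["A"]))
```

Neither lookup touches any coordinates; the relationships come entirely from the stored topology.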
Non-Topological Vector Model: Spaghetti model

The map is maintained as a conceptual model. It is a one-for-one translation of the analog map
Imagine covering each graphic object on the analog map with a piece of spaghetti, where each piece acts as a single entity, without any structure between them
Each entity is a single, logical record coded as a variable-length string of (X,Y) coordinate pairs
A polygon is a closed-loop coordinate string
Adjacent polygons do not share a spaghetti string, so their common boundary is stored twice
Lack of topology increases computational overhead
Generally used where analysis is not important
Good for plotting
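A spaghetti polygon is just a closed coordinate string, so measurements must be computed from the coordinates each time. The Python sketch below uses two made-up adjacent squares; the shoelace formula is a standard way to get area from such a string, not something the slides prescribe.

```python
import math

def ring_area(coords):
    """Area of a closed loop of (x, y) pairs via the shoelace formula."""
    total = 0.0
    for (x0, y0), (x1, y1) in zip(coords, coords[1:]):
        total += x0 * y1 - x1 * y0
    return abs(total) / 2.0

def ring_perimeter(coords):
    """Sum of the straight-line distances between consecutive points."""
    return sum(math.dist(p, q) for p, q in zip(coords, coords[1:]))

# Two adjacent polygons; the edge from (10, 0) to (10, 10) appears in both strings
polygon_a = [(0, 0), (10, 0), (10, 10), (0, 10), (0, 0)]
polygon_b = [(10, 0), (20, 0), (20, 10), (10, 10), (10, 0)]

print(ring_area(polygon_a), ring_perimeter(polygon_a))
print(ring_area(polygon_b), ring_perimeter(polygon_b))
```

Nothing in the data says the two squares are neighbors; a program would have to compare coordinates to discover the shared edge.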
From Demers, Page 109
Topological Models: Basic arc topology
ESRI's ARC model: Coverage Model
ESRI's ARC model: Point Feature Type
ESRI's ARC model: Arc Feature Type
ESRI's ARC model: Arc-Node Topology
ESRI's ARC model: Polygon Feature Type
ESRI's ARC model: Polygon-Arc Topology
ESRI's ARC model: Tics Feature Type
ESRI's ARC model: Boundary Feature Type
ESRI's ARC model: Annotation Feature Type
Copyright: ESRI
Intergraph's MGE Model
Uses the following feature types:

Point
Line
Area Centroid
Area Boundary
Label
Topological Vector Formats: GBF/DIME
The best known topological model is the Geographic Base File / Dual Independent Map Encoding (GBF/DIME) model
Developed by the US Census Bureau to store street map data for the decennial census
Street addresses and UTM coordinates of each link are defined, permitting street addresses to be accessed by geographic coordinates
Suffers from the same problem as the spaghetti model: the program must perform a sequential search to find a particular link (ex: a street is broken at intersections)
From Demers, Page 112
Topological Vector Formats: TIGER

Topologically Integrated Geographic Encoding and Referencing System (TIGER)
Designed for use in the 1990 census
Points, lines, and areas can be explicitly addressed
The TIGER data structure

Figure: a map with a landmark, streets (First St., Second St., Third St., Avenue A to Avenue D, Lake Drive), a lake, numbered blocks, and addresses on a block, together with the files that describe it:
Zero cells: nodes 13, 17, 18, 19, 21, 22, 23, 156, 157, 158, 159 with (x,y) values
One cells: 1 to 18, with addresses
Two cells: Lake, Blocks 86, 87, 88, 89, 90, 91
From Clarke, Pg 91
Topological Vector Formats: DLG

Developed by USGS to produce digital 7.5 minute and 15 minute Topographic Map series
Uses the currently available 7.5 minute topographic maps as a base
Information is separated into layers, such as hydrography, transportation, etc.
In addition to topological data, points, lines, and areas have attribute codes attached to them, consisting of a three digit major code and a four digit minor code. Ex: 050 0200 is a Shoreline on the hydrography layer
The file also has header records for metadata information
From Estes, Page 56
Topology Matters

Topology allows automated error detection and elimination
The tolerances controlling snapping, elimination, and merging must be considered carefully, because they can move features
Complete topology makes map overlay feasible
Topology allows many GIS operations to be done without accessing the point files. This can mean considerable speed improvements for many operations. Ex: bounding box tests, route analysis, polygon neighbor (contiguity) operations
Unsnapped node
From Clarke
The bounding rectangle

Figure: a bounding rectangle defined by its corners (xmin, ymin) and (xmax, ymax)
From Clarke
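A bounding box test can be sketched in Python as follows. The feature names and coordinates are made up, and the function names are illustrative only. The point of the test is that four comparisons can rule out an intersection before any detailed geometry is examined.

```python
def bounding_box(coords):
    """Return (xmin, ymin, xmax, ymax) for a list of (x, y) points."""
    xs = [x for x, _ in coords]
    ys = [y for _, y in coords]
    return (min(xs), min(ys), max(xs), max(ys))

def boxes_overlap(a, b):
    """True when two (xmin, ymin, xmax, ymax) rectangles share any area or edge."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]

lake = [(2, 2), (6, 3), (5, 7), (1, 5)]      # hypothetical features
road = [(8, 0), (9, 4), (12, 9)]
trail = [(0, 6), (4, 4), (7, 1)]

print(boxes_overlap(bounding_box(lake), bounding_box(road)))
print(boxes_overlap(bounding_box(lake), bounding_box(trail)))
```

Overlapping boxes only mean the features might intersect; a detailed check on the points is still needed in that case.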
Network Analysis
Network models are built on top of topological models to establish routes between connected nodes
Data associated with individual routes are stored in tables associated with the route
Vectors to represent surfaces and 3D

In vector models, the space between the graphical entities is implied
Volumes (continuous surfaces) are represented with the Triangulated Irregular Network (TIN) model, including edge or triangle topology
It allows surface models to be generated efficiently to analyze and display terrain and other types of surfaces
The fundamental building block of the TIN data model is the node. A surface is modeled by placing irregularly spaced nodes that act as vertices
Each node has an explicit topographic value
Nodes are connected to their nearest neighbors by edges, according to a set of rules, to represent an area of uniform topography
Finally, the TIN model creates a network of triangles by storing the topological relationships of the triangles
TINs use an optimal Delaunay triangulation of a set of irregularly distributed points
TINs are popular in CAD and surveying packages
TIN: Triangulated Irregular Network

Elevations as raster (top), TIN (bottom)
From: A. Turner, Latrobe.edu
Creating a TIN
Initially approximate the map by a square with 2 triangles, 4 points, and 5 edges
Find the most deviant point in either triangle and split that triangle into 3 by inserting a new point and 3 edges
Next, check all quadrilaterals composed of a new triangle and an old triangle to see if the diagonal should be swapped (based on certain criteria)
Finally, find the new most deviant point and repeat
From: W.R.Franklin, RPI.edu
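The "most deviant point" step can be sketched in Python. This is one plausible reading, assuming that deviation means the vertical difference between a sample elevation and the plane through the triangle's three nodes; the triangle, the sample points, and the function names are made up.

```python
def plane_z(tri, x, y):
    """Interpolate elevation at (x, y) on the plane through a triangle's 3 nodes."""
    (x1, y1, z1), (x2, y2, z2), (x3, y3, z3) = tri
    det = (y2 - y3) * (x1 - x3) + (x3 - x2) * (y1 - y3)
    # Barycentric weights of (x, y) with respect to the three nodes
    w1 = ((y2 - y3) * (x - x3) + (x3 - x2) * (y - y3)) / det
    w2 = ((y3 - y1) * (x - x3) + (x1 - x3) * (y - y3)) / det
    w3 = 1.0 - w1 - w2
    return w1 * z1 + w2 * z2 + w3 * z3

def most_deviant(tri, samples):
    """Sample whose elevation differs most from the triangle's plane."""
    return max(samples, key=lambda p: abs(p[2] - plane_z(tri, p[0], p[1])))

triangle = [(0, 0, 100.0), (10, 0, 100.0), (0, 10, 110.0)]   # nodes (x, y, z)
samples = [(2, 2, 102.5), (5, 1, 108.0), (1, 6, 106.0)]       # points inside it

worst = most_deviant(triangle, samples)
print(worst)
```

The sample at (5, 1) lies 7 units above the plane, so it would be the point inserted next, splitting the triangle into three.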
From Clarke Page 81
Vector File Formats
Vector file formats are of three types:

Page Description Languages (PDLs)
Vector information languages (such as AutoCAD DXF, MicroStation DGN)
Vector topological files such as GBF/DIME, TIGER
PDLs (such as PostScript, PDF) are used for output on print or display devices and are not true graphics file formats.
AutoCAD DXF Vector data file format

Consists of up to seven sections: header, tables, blocks, classes, objects, entities, and end-of-file
A DXF file consists of group codes and associated values. For example, code 9 introduces the name of a header variable, and code 999 marks a comment
Figure: a minimal, yet complete DXF file example (From GFF, Page 279)
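How group codes pair with values can be sketched in Python. This is not the GFF listing: the function name `dxf_line`, the layer name, and the coordinates are made up, and the file holds only an ENTITIES section with one LINE. Whether a particular CAD package accepts such a stripped-down file is an assumption that should be checked against that package.

```python
def dxf_line(x1, y1, x2, y2, layer="0"):
    """Return the text of a tiny DXF file holding a single LINE entity."""
    pairs = [
        (0, "SECTION"), (2, "ENTITIES"),   # start of the entities section
        (0, "LINE"), (8, layer),           # entity type and layer name
        (10, x1), (20, y1),                # start point X, Y
        (11, x2), (21, y2),                # end point X, Y
        (0, "ENDSEC"), (0, "EOF"),         # end of section, end of file
    ]
    # Each group code goes on its own line, followed by its value
    return "".join(f"{code}\n{value}\n" for code, value in pairs)

text = dxf_line(0.0, 0.0, 10.0, 5.0)
print(text)
```

Every odd line of the output is a group code and every even line is the value it introduces, which is the pattern described above.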
Comparing Raster Vs Vector
Raster Models:
Good for surface representation, whereas vector models must use TIN models to represent surfaces
Easy to conceptualize space representation
Allow easy integration of image data (satellite, remotely sensed, etc.)
Do not provide precise locational and area computation information, because of the grid cells
Require large storage capacity
Blocky appearance when the image is viewed in detail
Vector Models:
Provide precise locational information of points
Can represent point, line, and area features very accurately. Hence measurements are very accurate
Topological models enable many types of analysis
Vectors are far more efficient in storage than grids
Vectors work well with pen and light-plotting devices and tablet digitizers
Vectors are not good at continuous coverages or plotters that fill areas. Spatial analysis can be difficult
Image data overlay requires special image tools
Data Exchange
A GIS is based on either a vector model or a raster model. Within each, it may use (import) many file formats and convert them to its internal model/data structure
A vector-based GIS may support raster-based layers (and vice versa), but most analysis tools will be based on its native model (either raster or vector)
Changing vector to raster is less time consuming and simpler than raster to vector. Loss of data and inaccuracies may result with either conversion
Vector to raster exchange errors
From Clarke, Page 94
Data Exchange
Data are also often exchanged or transferred between entirely different GIS packages and computer systems
In the past, GIS data exchange has been isolated. However, with the development and use of Open Standards for systems, architecture, databases, and interfaces, handshaking between GISs of various types is now possible
Also, the use of GIS as a component of an Integrated System (Enterprise system) has made it necessary to share and exchange data
GIS Data Exchange
Blind data exchange by translation (export and import) can lead to significant errors in attributes and in geometry
In the United States, the Spatial Data Transfer Standard (SDTS) was developed to facilitate data transfer. It became a federal standard (FIPS 173) in 1992. Website: http://mcmcweb.er.usgs.gov/sdts/index.html
SDTS is quite complex and contains a terminology, a set of references, a list of features, a transfer mechanism, and an accuracy standard
Both DLG and TIGER data are available in SDTS format
GIS Data Exchange
Another standards effort is:

Tri-Service Spatial Data Standards http://fwgcom.wes.army.mil/projects/standard/tssds/
To be used effectively in an organization, a GIS must be able to exchange data (graphics, databases, analysis) and be a component in an Enterprise-wide system

