2.
(Raster Data Model)
A. THE DATA MODEL
• geographical variation in the real world is infinitely complex
• the closer you look, the more detail you see, almost without limit
• it would take an infinitely large database to capture the real world precisely
• data must somehow be reduced to a finite and manageable quantity by a process of generalization or abstraction
• geographical variation must be represented in terms of discrete elements or objects
• the set of rules used to convert real geographical variation into discrete objects is the data model
• Tsichritzis and Lochovsky (1977) define a data model as "a set of guidelines for the representation of the logical organization of the data in a database... (consisting) of named logical units of data and the relationships between them."
• current GISs differ according to the way in which they organize reality through the data model
• each model tends to fit certain types of data and applications better than others
• the data model chosen for a particular project or application is also influenced by:
  - the software available
  - the training of the key individuals
  - historical precedent
3.
• there are two major choices of data model: raster and vector
• the raster model divides the entire study area into a regular grid of cells in a specific sequence
  - the conventional sequence is row by row from the top left corner
  - each cell contains a single value
  - it is space-filling, since every location in the study area corresponds to a cell in the raster
  - one set of cells and associated values is a layer
  - there may be many layers in a database, e.g. soil type, elevation, land use, land cover
• the vector model uses discrete line segments or points to identify locations
  - discrete objects (boundaries, streams, cities) are formed by connecting line segments
  - vector objects do not necessarily fill space; not all locations in space need to be referenced in the model
• a raster model tells what occurs everywhere - at each place in the area
• a vector model tells where everything occurs - gives a location to every object
• conceptually, the raster models are the simplest of the available data models
4.
B. CREATING A RASTER
• consider laying a grid over a geologic map
• create a raster by coding each cell with a value that represents the rock type which occupies the majority of that cell's area
• when finished, every cell will have a coded value
• in most cases the values to be assigned to each cell in the raster are written into a file, often coded in ASCII
• this file can be created manually using a word processor, database or spreadsheet program, or it can be created automatically
• it is then normally imported into the GIS so that the program can reformat the data for its specific processing needs
• there are several methods for creating raster databases
5.
Cell by cell entry
• direct entry of each layer cell by cell is simplest
• entry may be done within the GIS or into an ASCII file for importing
• each program will have specific requirements
• the process is normally tedious and time-consuming
• a layer can contain millions of cells
• an average Landsat image is around 7.4 x 10^6 pixels; an average TM scene is about 34.9 x 10^6 pixels
6.
• run length encoding can be more efficient
• values often occur in runs across several cells
• this is a form of spatial autocorrelation - the tendency for nearby things to be more similar than distant things
• data are entered as pairs: first the run length, then the value
• e.g. the array 0 0 0 1 1 0 0 1 1 1 0 0 1 1 1 0 1 1 1 1 would be entered as 3 0 2 1 2 0 3 1 2 0 3 1 1 0 4 1
• this is 16 items to enter, instead of 20
• in this case the saving is 20%, but much higher savings occur in practice
• imagine a database of 10,000,000 cells and a layer which records the county containing each pixel
  - suppose there are only two counties in the area covered by the database
  - each cell can have one of only two values, so the runs will be very long
• only some GISs have the capability to use run length encoded files
Digital data
• much raster data is already in digital form, as images, etc.
• however, resampling will likely be needed so that pixels coincide in each layer
• because remote sensing generates images, it is easier to interface with a raster GIS than with any other type
• elevation data is commonly available in digital raster form from agencies such as the US Geological Survey
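The run length example on this slide can be reproduced with a short Python sketch. The function names are illustrative only and do not correspond to any particular GIS import format.

```python
from itertools import groupby

def run_length_encode(cells):
    """Return a flat list [run length, value, run length, value, ...]."""
    encoded = []
    for value, run in groupby(cells):
        encoded.extend([len(list(run)), value])
    return encoded

def run_length_decode(encoded):
    """Expand [run length, value, ...] pairs back into the cell sequence."""
    cells = []
    for length, value in zip(encoded[0::2], encoded[1::2]):
        cells.extend([value] * length)
    return cells

cells = [0, 0, 0, 1, 1, 0, 0, 1, 1, 1, 0, 0, 1, 1, 1, 0, 1, 1, 1, 1]
encoded = run_length_encode(cells)
print(encoded)                    # [3, 0, 2, 1, 2, 0, 3, 1, 2, 0, 3, 1, 1, 0, 4, 1]
print(len(cells), len(encoded))   # 20 16
```

Decoding the pairs returns the original 20 cells, so nothing is lost by the shorter form.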
7.
C. CELL VALUES
Types of values
• the type of values contained in cells in a raster depends upon both the reality being coded and the GIS
• different systems allow different classes of values, including:
  - whole numbers (integers)
  - real (decimal) values
  - alphabetic values
• many systems only allow integers; others which allow different types restrict each separate raster layer to a single kind of value
• if systems allow several types of values, e.g. some layers numeric, some non-numeric, they should warn the user against doing unreasonable operations
  - e.g. it is unreasonable to try to multiply the values in a numeric layer with the values in a non-numeric layer
• integer values often act as code numbers, which "point" to names in an associated table or legend
  - e.g. the first example might have the following legend identifying the name of each soil class:
    0 = "no class", 1 = "fine sandy loam", 2 = "coarse sand", 3 = "gravel"
8.
One value per cell
• each pixel or cell is assumed to have only one value
• this is often inaccurate - the boundary of two soil types may run across the middle of a pixel
• in such cases the pixel is given the value of the largest fraction of the cell, or the value of the middle point in the cell
• note, however, that a few systems allow a pixel to have multiple values
  - the NARIS system developed at the University of Illinois in the 1970s allowed each pixel to have any number of values and associated percentages, e.g. 30% a, 30% b, 40% c
D. MAP LAYERS
• the data for an area can be visualized as a set of maps of layers
• a map layer is a set of data describing a single characteristic for each location within a bounded geographic area
• only one item of information is available for each location within a single layer - multiple items of information require multiple layers
• on the other hand, a topographic map can show multiple items of information for each location, within limits
  - e.g. elevation (contours), counties (boundaries), roads, railroads, urbanized areas
  - these would be 5 layers in a raster GIS
• typical raster databases contain up to a hundred layers
• each layer (matrix, lattice, raster, array) typically contains hundreds or thousands of cells
• important characteristics of a layer are its resolution, orientation and zones
10.
Resolution
• in general, resolution can be defined as the minimum linear dimension of the smallest unit of geographic space for which data are recorded
• in the raster model the smallest units are generally rectangular (occasionally systems have used hexagons or triangles)
• these smallest units are known as cells or pixels
• note: high resolution refers to rasters with small cell dimensions
  - high resolution means lots of detail, lots of cells, large rasters, small cells
Orientation
• the angle between true north and the direction defined by the columns of the raster
Zones
• each zone of a map layer is a set of contiguous locations that exhibit the same value
• these might be:
  - ownership parcels
  - political units such as counties or nations
  - lakes or islands
  - individual patches of the same soil or vegetation type
• there is considerable confusion over terms here
  - other terms commonly used for this concept are patch, region, polygon
  - each of these terms, however, has different meanings to individual users and different definitions in specific GIS packages
• in addition, there is a need for a second term which refers to all individual zones that
11.
have the same characteristics
• class is often used for this concept
• note that not all map layers will have zones; cell contents may vary continuously over the region, making every cell's value unique (e.g. satellite sensors record a separate value for reflection from each cell)
• major components of a zone are its value and location(s)
Value
• is the item of information stored in a layer for each pixel or cell
• cells in the same zone have the same value
Location
• generally location is identified by an ordered pair of coordinates (row and column numbers) that unambiguously identify the location of each unit of geographic space in the raster (cell, pixel, grid cell)
• usually the true geographic location of one or more of the corners of the raster is also known
E. EXAMPLE ANALYSIS USING A RASTER GIS
Objective
• identify areas suitable for logging
• an area is suitable if it satisfies the following criteria:
  - is Jackpine (Black Spruce are not valuable)
  - is well drained (poorly drained and waterlogged terrain cannot support equipment, logging causes unacceptable environmental damage)
  - is not within 500 m of a lake or watercourse (erosion may cause deterioration of water quality)
13.
Procedure
• recode layer 2 as follows, creating layer 4:
  - y if value 2 (Jackpine)
  - n if other value
• recode layer 3 as follows, creating layer 5:
  - y if value 2 (good)
  - n if other value
• spread the lake on layer 1 by one cell (500 m), creating layer 6
• recode the spread lake on layer 6 as follows, creating layer 7:
  - n if in spread lake
  - y if not
• overlay layers 4 and 5 to obtain layer 8, coding as follows:
  - y if both 4 and 5 are y
  - n otherwise
• overlay layers 7 and 8 to obtain layer 9, coding as follows:
  - y if both 7 and 8 are y
  - n otherwise
Operations used
• recode
• overlay
• spread
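The procedure above can be sketched in Python on small made-up layers. Everything here is hypothetical: the 4 x 4 layer contents, the lake code of 1, the function names, and the choice to let the spread include diagonal neighbours. The slides do not show the actual data or say how the spread treats diagonals.

```python
def recode(layer, rule):
    """Return a new layer with rule(value) applied to every cell."""
    return [[rule(v) for v in row] for row in layer]

def spread(layer, target, cells=1):
    """Mark every cell within `cells` steps of a `target` cell (diagonals included)."""
    rows, cols = len(layer), len(layer[0])
    out = [[False] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            if layer[r][c] != target:
                continue
            for rr in range(max(0, r - cells), min(rows, r + cells + 1)):
                for cc in range(max(0, c - cells), min(cols, c + cells + 1)):
                    out[rr][cc] = True
    return out

def overlay_and(a, b):
    """Cell-by-cell overlay: y (True) only where both layers are y."""
    return [[x and y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]

# Hypothetical layers; each cell is assumed to be 500 m on a side.
lakes    = [[1, 0, 0, 0], [0, 0, 0, 0], [0, 0, 0, 0], [0, 0, 0, 0]]  # layer 1: 1 = lake
species  = [[2, 2, 2, 1], [2, 2, 2, 2], [1, 2, 2, 2], [2, 2, 1, 2]]  # layer 2: 2 = Jackpine
drainage = [[2, 2, 2, 2], [2, 2, 1, 2], [2, 2, 2, 2], [1, 2, 2, 2]]  # layer 3: 2 = good

layer4 = recode(species, lambda v: v == 2)
layer5 = recode(drainage, lambda v: v == 2)
layer6 = spread(lakes, target=1, cells=1)
layer7 = recode(layer6, lambda v: not v)
layer8 = overlay_and(layer4, layer5)
layer9 = overlay_and(layer7, layer8)

suitable = ["".join("y" if v else "n" for v in row) for row in layer9]
for row in suitable:
    print(row)
```

Each step produces a new layer and leaves its inputs untouched, which mirrors the numbering of layers 4 to 9 in the procedure.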
18.
Slopes and aspects
• if the values in a layer are elevations, we can compute the steepness of slopes by looking at the difference between a pixel's value and those of its adjacent neighbors
• the direction of steepest slope, or the direction in which the surface is locally "facing", is called its aspect
• aspect can be measured in degrees from North or by compass points - N, NE, E, etc.
• slope and aspect are useful in analyzing vegetation patterns, computing energy balances and modeling erosion or runoff
• aspect determines the direction of runoff
  - this can be used to sketch drainage paths for runoff
E. OPERATIONS ON EXTENDED NEIGHBORHOODS
Distance
• calculate the distance of each cell from a cell or the nearest of several cells
• each pixel's value in the new layer is its distance from the given cell(s)
Buffer zones
• buffers around objects and features are very useful GIS capabilities
  - e.g. build a logging buffer 500 m wide around all lakes and watercourses
• buffer operations can be visualized as spreading the object spatially by a given distance
• the result could be a layer with values:
  - 1 if in original selected object
  - 2 if in buffer
  - 0 if outside object and buffer
• applications include noise buffers around roads, safety buffers around hazardous facilities
19.
• in many programs the buffer operation requires the user to first do a distance operation, then a reclassification of the distance layer
• the rate of spreading may be modified by another layer representing "friction"
  - e.g. the friction layer could represent varying cost of travel
  - this will affect the width of the buffer - narrow in areas of high friction, etc.
Visible area or "viewshed"
• given a layer of elevations, and one or more viewpoints, compute the area visible from at least one viewpoint
  - e.g. value = 1 if visible, 0 if not
• useful for planning locations of unsightly facilities such as smokestacks, or surveillance facilities such as fire towers, or transmission facilities
F. OPERATIONS ON ZONES (GROUPS OF PIXELS)
Identifying zones
• by comparing adjacent pixels, identify all patches or zones having the same value
• give each such patch or zone a unique number
• set each pixel's value to the number of its patch or zone
Areas of zones
• measure the area of each zone and assign this value to each pixel instead of the zone's number
• alternatively, output may be in the form of a summary table sent to the printer or a file
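Identifying zones and measuring their areas can be sketched as a flood fill. This is a minimal illustration, assuming that "adjacent" means sharing an edge (4-neighbour) and using a made-up 3 x 4 layer; real packages may use other rules.

```python
from collections import deque

def identify_zones(layer):
    """Give each contiguous patch of equal values a unique zone number (1, 2, ...)."""
    rows, cols = len(layer), len(layer[0])
    zones = [[0] * cols for _ in range(rows)]
    next_id = 0
    for r in range(rows):
        for c in range(cols):
            if zones[r][c]:
                continue                      # already assigned to a zone
            next_id += 1
            zones[r][c] = next_id
            queue = deque([(r, c)])
            while queue:
                cr, cc = queue.popleft()
                for nr, nc in ((cr - 1, cc), (cr + 1, cc), (cr, cc - 1), (cr, cc + 1)):
                    if (0 <= nr < rows and 0 <= nc < cols
                            and not zones[nr][nc]
                            and layer[nr][nc] == layer[cr][cc]):
                        zones[nr][nc] = next_id
                        queue.append((nr, nc))
    return zones

def zone_areas(zones):
    """Summary table: zone number -> area in cells."""
    areas = {}
    for row in zones:
        for z in row:
            areas[z] = areas.get(z, 0) + 1
    return areas

layer = [["A", "A", "B", "B"],
         ["A", "B", "B", "A"],
         ["A", "A", "B", "A"]]
zones = identify_zones(layer)
areas = zone_areas(zones)
print(zones)   # [[1, 1, 2, 2], [1, 2, 2, 3], [1, 1, 2, 3]]
print(areas)   # {1: 5, 2: 5, 3: 2}
```

Note that value A forms two separate zones (1 and 3) here, which is the distinction between a zone and a class.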
20.
Perimeter of zones
• measure the perimeter of each zone and assign this value to each pixel instead of the zone's number
• alternatively, output may be in the form of a summary table sent to the printer or a file
• length of perimeter is determined by summing the number of exterior cell edges in each zone
• note: the values calculated for both area and perimeter are highly dependent upon the orientation of objects (zones) with respect to the orientation of the grid
  - however, if boundaries in the study area do not have a dominant orientation, such errors may cancel out
Distance from zone boundary
• measure the distance from each pixel to the nearest part of its zone boundary, and assign this value to the pixel
• the boundary is defined as the pixels which are adjacent to pixels of different values
Shape of zone
• measure the shape of the zone and assign this to each pixel in the zone
• one of the most common ways to measure shape is by comparing the perimeter length of a zone to the square root of its area
• by dividing this number by 3.54 we get a measure which ranges from 1 for a circle (the most compact shape possible) to 1.13 for a square to large numbers for long, thin, wiggly zones
• commands like this are important in landscape ecology
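The shape measure described above can be written out directly. The constant 3.54 is the rounded value of 2 times the square root of pi, which is what makes a circle score exactly 1; the function name is illustrative.

```python
import math

def shape_index(perimeter, area):
    """Perimeter divided by the square root of area, scaled so a circle gives 1."""
    return perimeter / (2.0 * math.sqrt(math.pi) * math.sqrt(area))

circle = shape_index(2 * math.pi * 1.0, math.pi * 1.0 ** 2)   # radius 1
square = shape_index(4 * 1.0, 1.0 ** 2)                       # side 1
strip = shape_index(2 * (1 + 100), 1 * 100)                   # 1 x 100 rectangle

print(round(circle, 2), round(square, 2), round(strip, 2))    # 1.0 1.13 5.7
```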
21.
• helpful in studying the effects of geometry and spatial arrangement of habitat
  - e.g. size and shape of woodlots on the animal species they can sustain
  - e.g. value of linear park corridors across urban areas in allowing migration of animal species
G. COMMANDS TO DESCRIBE CONTENTS OF LAYERS
• it is important to have ways of describing a layer's contents
  - particularly new layers created by GIS operations
  - particularly in generating results of analysis
One layer
• generate statistics on a layer, e.g. mean, median, most common value, other statistics
More than one layer
• compare two maps statistically
  - e.g. is the pattern on one map related to the pattern on the other?
  - e.g. chi-square test, regression, analysis of variance
Zones on one layer
• generate statistics for the zones on a layer, e.g. largest, smallest, number, mean area
22.
H. ESSENTIAL HOUSEKEEPING
• list available layers
• input, copy, rename layers
• import and export layers to and from other systems
  - other raster GIS
  - input of images from remote sensing systems
  - other types of GIS
• identify resolution, orientation
• "resample": changing cell size, orientation, portion of raster to analyze
• change colors
• provide help to the user
• exit from the GIS (the most important command of all!)
23.
INTRODUCTION
Why use raster?
• data are acquired in that form: remote sensing, photogrammetry or scanning
• is a common way of structuring digital elevation data
• raster assumes no prior knowledge of the phenomenon; sampling is done uniformly
  - knowledge of variability would allow us to sample more heavily in areas of high variability (rugged terrain) and less heavily in smooth terrain
• data are often converted to raster as a common format for data interchange
  - for merging with remote sensing images or DEMs
• raster algorithms are often simpler and faster
  - e.g. buffer zone generation is simpler in raster
• raster may be appropriate if the solution requires uniform resolution, e.g. in finding optimum routes for linear features such as power lines, or in inferring the locations of stream networks from DEMs
Objectives
• there are many options for storing raster data (many data structures)
  - some are more economical than others in use of storage
  - some are more efficient in access and processing speed
24.
B. STORAGE OPTIONS FOR RASTER DATA
• by convention, raster data is normally stored row by row from the top left
  - this is the European/North American reading order
  - it is also the order of scan of a TV image
• example: the image
    A A A A
    A B B B
    A A B B
    A A A B
  would be stored in 16 memory positions, one for each pixel, in the sequence: A A A A A B B B A A B B A A A B
What if there is more than one layer?
• two options:
  1. store the layers separately
     - this is the normal practice
  2. store all information for each pixel together
     - this requires extra space to be allocated initially within each pixel's storage location for layers which might be created later during analysis
     - this is usually difficult to anticipate
What do raster systems store in each pixel?
• some allow only an integer, in a fixed range, e.g. -127 to +127 (1 byte per pixel) or -32767 to +32767 (2 bytes per pixel)
• some allow integers, real (decimal) numbers and mixed alphabetic letters and numbers in each pixel
  - in this case it helps if the system keeps track of what type of data is stored in each layer and stops the user doing wrong types of analysis on the data
25.
Example:
• vegetation data is recorded as a class (A thru G) in each pixel
• elevation data is recorded as a decimal number (e.g. 100.3 m)
• the system should not allow the user to add the pixel values from the two layers (A + 100.3) or perform any other kind of arithmetic operation on the vegetation data
Raster/Vector combinations
• many raster-based systems allow vector input
• example:
  - a polygon, defined by its vertices, is input
  - convert this to a raster
  - e.g. assign 1 to all pixels inside the polygon, 0 to all outside
• some forms of data are really hybrids of raster and vector:
  - Freeman chain code has finite resolution based on pixels (raster-like) but defines lines and the boundaries of objects (vector-like)
  - a raster can be used to define objects at fixed resolution if every pixel is given an object number instead of a value
  - the object numbers are pointers to an attribute table:

    Raster            Object  Attributes
    23 23 23 24       23      A  100.0
    23 23 24 24       24      B  101.1
    23 23 24 24
    23 23 23 24

• this gives us an object with its attributes, plus a list of pixels associated with the object instead of the object's coordinates
• in this sense, a raster is a finite resolution geometry rather than an alternative way of structuring spatial data
26.
C. RUN ENCODING
• geographical data tends to be "spatially auto-correlated", meaning that objects which are close to each other tend to have similar attributes
  - Tobler expressed it this way: "All things are related, but nearby things are more related than distant things"
• because of this principle, we expect neighboring pixels to have similar values
• so instead of repeating pixel values, we can code the raster as pairs of numbers - (run length, value)
• e.g. instead of 16 pixel values in the original raster matrix, we have: 4A 1A 3B 2A 2B 3A 1B
  - this produces 7 integer/value pairs to be stored
• if a run is not required to break at the end of each line we can compress this further: 5A 3B 2A 2B 3A 1B = 6 pairs
• however, it helps to limit the possible size of the run so that we can use less space to store the run length, as the amount of space allocated must be sufficient for the maximum run length
Problems
• layers now have different lengths depending on the amount of compression (lengths of runs)
• storing all layers together for each pixel now makes no sense
• run encoding would be of little use for DEM data or any other type of data where neighboring pixels almost always have different values
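The two run encodings of the 4 x 4 example image can be checked with a few lines of Python. The helper name `encode_runs` is illustrative; the image is the A/B raster from the storage example.

```python
def encode_runs(values):
    """Collapse a sequence into a list of (run length, value) pairs."""
    runs = []
    for v in values:
        if runs and runs[-1][1] == v:
            runs[-1] = (runs[-1][0] + 1, v)   # extend the current run
        else:
            runs.append((1, v))               # start a new run
    return runs

image = ["AAAA", "ABBB", "AABB", "AAAB"]      # rows from the top left

# Runs forced to break at the end of each row
by_row = [pair for row in image for pair in encode_runs(row)]
# Runs allowed to continue from one row to the next
continuous = encode_runs("".join(image))

print(by_row)       # [(4, 'A'), (1, 'A'), (3, 'B'), (2, 'A'), (2, 'B'), (3, 'A'), (1, 'B')]
print(continuous)   # [(5, 'A'), (3, 'B'), (2, 'A'), (2, 'B'), (3, 'A'), (1, 'B')]
```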
27.
D. SCAN ORDER
1. Row order
• described already
• are there better ways of ordering the raster than row by row from the top left?
• other orders may produce greater compression
2. Row prime order (Boustrophedon)
• suppose we reverse every other row (diagram)
• this has the charming name boustrophedon, from the Greek for "how an ox plows a field"
• avoids a long jump at the end of each row, so perhaps the raster would produce fewer runs and thus greater compression
• this order is used in the Public Land Survey System: the sections in each township are numbered in this way
• on the original raster it results in: 4A 3B 3A 3B 3A = 5 runs
3. Morton order
• Morton order is the basis of many efforts to reduce database volume
• named for Guy Morton, who devised it as a way of ordering data in the Canada Geographic Information System
• however, this way of ordering or scanning a raster was well known long before Morton
• it is associated with the names of several mathematicians and geometers: Hilbert,
28.
Peano, and Koch
• coincidentally, Morton is the name of the lower left corner county in Kansas
• the strategy is to exhaust each area of the map in sequence, whereas row by row order scans from one side to the other
• this minimizes the number of large jumps (diagram)
• this is one of several hierarchical ordering systems
• it is built up level by level, repeating the same pattern at each level, as follows:

    2x2:
     2  3
     0  1

    4x4:
    10 11 14 15
     8  9 12 13
     2  3  6  7
     0  1  4  5

    8x8:
    42 43 46 47 58 59 62 63
    40 41 44 45 56 57 60 61
    34 35 38 39 50 51 54 55
    32 33 36 37 48 49 52 53
    10 11 14 15 26 27 30 31
     8  9 12 13 24 25 28 29
     2  3  6  7 18 19 22 23
     0  1  4  5 16 17 20 21

• it is only valid for square arrays where the numbers of rows and columns are powers of 2, e.g. 2x2, 4x4, 8x8, 16x16, 32x32, 64x64, etc.
• how does it do on our 4x4 array? 5A 3B 1A 1B 2A 2B 2A = 7 runs, which is as long as row by row compression
4. Peano scan (also Pi-order or Hilbert)
• the Peano scan or Pi-order is like boustrophedon in always moving to a neighboring pixel (diagram)
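The run counts quoted for the three scan orders can be compared with a short sketch. The Morton order is generated recursively, level by level, visiting the lower left, lower right, upper left and upper right quadrants in turn; rows are indexed from the top of the image, and all function names are illustrative.

```python
def count_runs(values):
    """Number of runs in a sequence of pixel values."""
    values = list(values)
    return 1 + sum(1 for a, b in zip(values, values[1:]) if a != b)

def row_order(n):
    return [(r, c) for r in range(n) for c in range(n)]

def boustrophedon_order(n):
    order = []
    for r in range(n):
        cols = range(n) if r % 2 == 0 else range(n - 1, -1, -1)  # reverse every other row
        order.extend((r, c) for c in cols)
    return order

def morton_order(n, top=0, left=0):
    """Recursive Morton order for an n x n block; n must be a power of 2."""
    if n == 1:
        return [(top, left)]
    h = n // 2
    return (morton_order(h, top + h, left)          # lower left
            + morton_order(h, top + h, left + h)    # lower right
            + morton_order(h, top, left)            # upper left
            + morton_order(h, top, left + h))       # upper right

image = ["AAAA", "ABBB", "AABB", "AAAB"]            # rows from the top

def scan(order):
    return [image[r][c] for r, c in order]

runs = {
    "row": count_runs(scan(row_order(4))),
    "boustrophedon": count_runs(scan(boustrophedon_order(4))),
    "morton": count_runs(scan(morton_order(4))),
}
print(runs)   # {'row': 6, 'boustrophedon': 5, 'morton': 7}
```

The row-order figure of 6 is the count when runs may continue across row ends; with a forced break at each row end it is 7, the same as Morton order.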
29.
E. DECODING SCAN ORDERS
• since Morton and Peano orders are useful but complex, two types of questions arise when they are used:
  1. What are the row and column numbers for a given pixel?
  2. What is the position in the scan order for a given row and column number?
Method
• start by numbering the rows and columns from 0 up:

    row 3 | 10 11 14 15
    row 2 |  8  9 12 13
    row 1 |  2  3  6  7
    row 0 |  0  1  4  5
    col      0  1  2  3

  - row 2, column 3 is position 13 in the Morton sequence
1. How to go from row 2, column 3 to the Morton sequence?
  a. convert the row and column numbers to binary representations:
       2s 1s
       1  0    row 2
       1  1    column 3
  b. interleave the bits, alternating row and column bits (called bit interleaving):
       1   1   0   1
       row col row col
  c. evaluate this sequence of bits as a binary number: 8 + 4 + 1 = 13
• so to get the Morton position, interleave the bits of the row and column numbers
2. How to find the row and column number from Morton position 9?
  a. convert the position number to a binary number:
       8s 4s 2s 1s
       1  0  0  1    (8 + 1 = 9)
  b. separate the bits:
       1 0    row = 2
       0 1    col = 1
Generalization
• can express the row and column number to any base, not just base 2 (binary), and including mixtures of bases
• example: row 6, column 15, using base 4 instead of base 2
       4s 1s
       1  2    row 6 = 1x4 + 2x1
       3  3    col 15 = 3x4 + 3x1
  - interleaving gives 1 3 2 3, with place values 64s 16s 4s 1s:
    1x64 + 3x16 + 2x4 + 3x1 = 123
  - answer: row 6, column 15 is position 123
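The interleaving method, including the generalization to other bases, can be sketched as two small functions. The names and the `digits` parameter (how many digits each of the row and column numbers has) are assumptions made for the illustration.

```python
def interleave(row, col, digits, base=2):
    """Scan position obtained by interleaving row and column digits, row digit first."""
    position = 0
    for i in range(digits - 1, -1, -1):
        row_digit = (row // base ** i) % base
        col_digit = (col // base ** i) % base
        position = (position * base + row_digit) * base + col_digit
    return position

def deinterleave(position, digits, base=2):
    """Recover (row, col) by separating the interleaved digits again."""
    row = col = 0
    for i in range(digits - 1, -1, -1):
        pair = (position // (base * base) ** i) % (base * base)
        row = row * base + pair // base
        col = col * base + pair % base
    return row, col

print(interleave(2, 3, digits=2))             # 13
print(deinterleave(9, digits=2))              # (2, 1)
print(interleave(6, 15, digits=2, base=4))    # 123
```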
30.
HIERARCHICAL DATA STRUCTURES
A. INTRODUCTION
• different scan orders produce only small differences in compression
• the major reason for interest in Morton and other hierarchical scan orders is faster data access
• the amount of information shown on a map varies enormously from area to area, depending on the local variability
• it would make sense then to use rasters of different sizes depending on the density of information
  - large cells in smooth or unvarying areas, small cells in rugged or rapidly varying areas
• unfortunately, unequal-sized squares won't fit together ("tile the plane") except under unusual circumstances
  - one such circumstance is when small squares nest within large ones
• there are, however, some methods for compressing raster data that do allow for varying information densities
B. INDEXING PIXELS
• consider the 16 by 16 array in which just one cell is different
• notation: row and column numbering starts at 0
• thus the odd cell is at row 4, column 7
31.
Procedure
• begin by dividing the array into four 8x8 quadrants, and numbering them 0, 1, 2 and 3 as in the Morton order
• quads 1, 2 and 3 are homogeneous (all A)
• quad 0 is not homogeneous, so we divide only it into four 4x4 quads
  - these are numbered 00, 01, 02 and 03 because they are partitions of the 8x8 quad 0
• of these, 00, 01 and 02 are homogeneous, but 03 is divided again into 030, 031, 032 and 033
• now only 031 is not homogeneous, so it is divided again into 0310, 0311, 0312 and 0313
• what we have done is to recursively subdivide using a rule of 4 until either:
  - a square is homogeneous, or
  - we reach the highest level of resolution (the pixel size)
• this allows for discretely adaptable resolution where each resolution step is fixed
• this concept is related to the use of Morton order for run encoding
  - if we had coded the raster using Morton order, each homogeneous square would have been a run
  - 8x8 squares are runs of 64 in Morton order, 4x4 are runs of 16, etc.
  - the run encoded Morton order would have been: 16A 16A 16A 4A 1A 1B 1A 1A 4A 4A 64A 64A 64A
  - if we allow runs to continue between blocks we could reduce this to: 53A 1B 202A
  - i.e. a homogeneous block of 2^m by 2^m pixels is equivalent to a Morton run of 2^(2m) pixels
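The recursive subdivision can be sketched in Python for the 16 by 16 example. The function name and the dictionary of leaf codes are illustrative choices, not a prescribed storage format; `grid[row][col]` uses the row and column numbers of the slides.

```python
def build_quadtree(grid, row=0, col=0, size=None, code=""):
    """Return {leaf code: value} by recursive four-way subdivision."""
    if size is None:
        size = len(grid)
    values = {grid[r][c] for r in range(row, row + size)
                         for c in range(col, col + size)}
    if len(values) == 1:                  # homogeneous square (or a single pixel)
        return {code: values.pop()}
    half = size // 2
    leaves = {}
    # Quadrant digit = 2 * (row half) + (column half), as in the Morton order.
    offsets = [(0, 0), (0, half), (half, 0), (half, half)]
    for digit, (dr, dc) in enumerate(offsets):
        leaves.update(build_quadtree(grid, row + dr, col + dc, half, code + str(digit)))
    return leaves

grid = [["A"] * 16 for _ in range(16)]
grid[4][7] = "B"                          # the odd cell at row 4, column 7

leaves = build_quadtree(grid)
print(len(leaves))                        # 13
print([c for c, v in leaves.items() if v == "B"])   # ['0311']

# Each leaf of code length k is a Morton run of 4 ** (4 - k) pixels.
runs = " ".join(f"{4 ** (4 - len(code))}{value}" for code, value in leaves.items())
print(runs)   # 16A 16A 16A 4A 1A 1B 1A 1A 4A 4A 64A 64A 64A
```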
32.
Decoding locations
• the conversion to row and column is the same as for decoding Morton numbers, except that in this case the code is in base 4
• in the example the lone B pixel is assigned code 0311
  1. convert the code to base 2
     - hint: every base 4 digit converts to a pair of base 2 digits
     - thus 0311 becomes 00110101
  2. separate the bits to get: row 0100 = 4, column 0111 = 7
• so the numbering system is just the Morton numbering of blocks, expressed in base 4
• however, sequence and data compression are not the most useful aspects of this concept
C. THE QUADTREE
• can express this sequencing as a tree
  - the top is the entire array
  - at each level there is a four-way branching
  - each branch terminates at a homogeneous block
• the term quadtree is used because it is based on a rule of 4
• each of the terminal branches in the tree (the ones having values) is known as a leaf
  - in this case there are 13 leaves or homogeneous square blocks
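The two decoding steps under "Decoding locations" amount to a few lines of Python. The function names are illustrative.

```python
def code_to_bits(code):
    """Step 1: each base 4 digit becomes a pair of base 2 digits."""
    return "".join(format(int(digit), "02b") for digit in code)

def code_to_row_col(code):
    """Step 2: the first bit of each pair belongs to the row, the second to the column."""
    row = col = 0
    for digit in code:
        d = int(digit)
        row = row * 2 + d // 2
        col = col * 2 + d % 2
    return row, col

print(code_to_bits("0311"))      # 00110101
print(code_to_row_col("0311"))   # (4, 7)
```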
33.
Coding quadtrees
• to store this tree in memory, we need to decide what to store in each memory location
• there are many ways of storing quadtrees, but they all share the same basic ideas
• one way is to store in each memory location EITHER:
  1. the value of the block (e.g. A or B), or
  2. a pointer to the first of the four "daughter" blocks at the next level down
• all four daughter blocks of any parent always occur together
• thus, the quadtree might be stored in memory as:

    Position: 1 2 3 4 5 6 7 8 9  10 11 12 13 14 15 16 17
    Contents: 2 6 A A A A A A 10 A  14 A  A  A  B  A  A
    Level:    0 1 1 1 1 2 2 2 2  3  3  3  3  4  4  4  4

• the content of position 1 is a pointer indicating that the map is subdivided into four blocks whose contents can be found starting at position 2
• position 2 indicates that the four parts of the 0 block can be found beginning at position 6
• positions 3, 4 and 5 indicate that the other three level 1 blocks are all A and are not further subdivided
Accessing data through a quadtree
• consider two ways in which this quadtree may be accessed:
  1. find all parts of the map with a given value
  2. determine the contents of a given pixel
• notation: if the array has 2^n by 2^n pixels there are n possible levels in the tree, or n+1 if we count the top level (level 0)
• use m for the number of leaves
34.
1. to find the parts of a map with a given value we must examine every leaf to see if its value matches the one required
  - this requires m steps as there are m leaves
2. to find the contents of a given pixel, start at the top of the tree
  - if the entire map is homogeneous, stop, as the contents of the pixel are known already
  - if not, follow the branch containing the pixel
  - to know which branch to follow: take the row and column numbers, write them in binary, interleave the bits, and convert to base 4
    e.g. row 4, column 7 converts to 0311
  - at each level, use the appropriate digit to determine which branch to follow
    e.g. for 0311, at level 0 follow branch 0, at level 1 follow branch 3, etc.
  - in the worst case, we may have to go to level n to find the contents of the pixel, so the number of steps will be n
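Both kinds of access can be tried on the 17-position table from the "Coding quadtrees" slide. In this sketch pointers are stored as Python integers and block values as strings, which is purely an assumption of the illustration; a real system would need its own way to tell the two apart.

```python
# Position i of the table is stored at list index i - 1.
CELLS = [2, 6, "A", "A", "A", "A", "A", "A", 10, "A", 14, "A", "A", "A", "B", "A", "A"]

def pixel_value(cells, row, col, n):
    """Follow pointers from the top of the tree to the leaf containing (row, col).

    n is the number of levels (the raster is 2**n by 2**n pixels).
    Returns (value, number of branches followed).
    """
    position = 1                                   # top of the tree
    steps = 0
    for level in range(n - 1, -1, -1):
        content = cells[position - 1]
        if not isinstance(content, int):
            break                                  # reached a leaf early
        # Next base 4 digit: row bit then column bit at this level.
        digit = 2 * ((row >> level) & 1) + ((col >> level) & 1)
        position = content + digit
        steps += 1
    return cells[position - 1], steps

def leaves_with_value(cells, value):
    """Examine every leaf (m steps) and count those holding `value`."""
    leaf_values = [c for c in cells if not isinstance(c, int)]
    return sum(1 for v in leaf_values if v == value), len(leaf_values)

print(pixel_value(CELLS, 4, 7, n=4))     # ('B', 4)  worst case: n steps
print(pixel_value(CELLS, 12, 3, n=4))    # ('A', 1)  a large homogeneous block
print(leaves_with_value(CELLS, "B"))     # (1, 13)
```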
35.
QUADTREE ALGORITHMS AND SPATIAL INDEXES
A. INTRODUCTION
• this unit examines how quadtrees are used in several simple processes, including:
  - measurement of area
  - overlay
  - finding adjacent leaves
  - measuring the area of contiguous patches
• in addition, this unit will look at how quadtrees can be used to provide indexes for faster access to vector-coded objects
• finally, alternative forms of spatial indexing will be reviewed
Definition
• to traverse a quadtree:
  - begin by moving down the leftmost branch to the first leaf
  - after processing each leaf in this branch, move back up to the previous branching point, and turn right
  - this will either lead down to another leaf, or back to a previous branching point (diagram)
• several of the following examples use this simple raster and its associated quadtree
36.
B. AREA ALGORITHM
Procedure
• to measure the area of A on the map: traverse the tree and add those leaves coded A, weighted by the area at the level of the leaf
Example
• in the example quadtree, elements at level 0 have area 16, at level 1 area 4, at level 2 area 1
• thus, the area of A is: 1 (leaf 00) + 1 (leaf 02) + 1 (leaf 03) + 4 (leaf 2) + 1 (leaf 32) = 8 units
C. OVERLAY ALGORITHM
Procedure
• to overlay the two maps: traverse the trees simultaneously, following all branches which exist in either tree
• where one tree lacks branches (has a leaf where the other tree has branches), assign the value of the associated leaf to each of the branches
  - e.g. node 3 is branched on map 1, not on map 2
  - the leaves derived from this node (30, 31, 32 and 33) have values B, B, A and B on map 1, all 2 on map 2
• the new tree has the attributes of both of the maps, e.g. A1, B2
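The area and overlay procedures can be sketched on leaf tables keyed by quadtree code. Map 1 follows the example, with leaf 1 taken to be B (an inference from the area sum, since the diagram is not reproduced here). Map 2 is hypothetical apart from node 3 having value 2.

```python
MAP1 = {"00": "A", "01": "B", "02": "A", "03": "A", "1": "B",
        "2": "A", "30": "B", "31": "B", "32": "A", "33": "B"}
MAP2 = {"0": 1, "1": 2, "2": 1, "3": 2}      # hypothetical except for node 3

def area_of(leaves, value, levels):
    """Sum leaf areas: a leaf with a code of length k covers 4 ** (levels - k) pixels."""
    return sum(4 ** (levels - len(code)) for code, v in leaves.items() if v == value)

def lookup(leaves, code):
    """Value at `code`, taken from the leaf itself or from the larger leaf containing it."""
    while code and code not in leaves:
        code = code[:-1]
    return leaves[code]

def overlay(a, b):
    """Combine two leaf tables, keeping every branch that exists in either tree."""
    codes = set(a) | set(b)
    # Keep the finest blocks only: drop codes that are subdivided in the other map.
    finest = [c for c in codes if not any(o != c and o.startswith(c) for o in codes)]
    return {c: f"{lookup(a, c)}{lookup(b, c)}" for c in sorted(finest)}

print(area_of(MAP1, "A", levels=2))   # 8
combined = overlay(MAP1, MAP2)
print(combined["00"], combined["1"], combined["32"])   # A1 B2 A2
```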
37.
D. ADJACENCY ALGORITHM
Problem
• find if two leaves (e.g. 03 and 2) are adjacent
• corollary: find the leaves adjacent to a given leaf (e.g. 03)
• note that in arc based systems adjacencies are coded in the data structure (R and L polygons), so this operation is simpler with vector based systems
Definition
• here adjacent means sharing a common edge, not just a common point (diagram)
Two cases
• leaf codes are:
  1. the same length (same size blocks, e.g. 01 and 02), or
  2. one is longer than the other (different size blocks, e.g. 03 and 2)
• solving this problem requires the use of:
  1. conversion from base 4 to binary and back
     - base 4 because of the "rule of 4" used in constructing quadtrees
  2. bit interleaving
  3. a new concept called Tesseral Arithmetic
Tesseral Arithmetic
• tesseral arithmetic is an alternate arithmetic useful for working with the peculiarities of quadtree addressing
• to add binary numbers normally, a "carry" works to the position to the left
  - e.g. adding 1 to 0001 gives 0010
• this is the same as decimal arithmetic except that carries occur when the total reaches 2 instead of 10
38.
• in tesseral arithmetic, a "carry" works two positions to the left
  - e.g. adding 1 to 0001 gives 0100
• the reverse happens on subtraction
  - 1000 less 10 is 0010, not 0110, as the subtraction affects only the alternate bits
• in other words, if we number the bits from the left starting at 1:
  - adding or subtracting 1 affects only the even-numbered bits
  - adding or subtracting 2 (binary 10) affects only the odd-numbered bits
Determining adjacency
1. same size blocks:
  • two leaves are adjacent if their binary representations differ by binary 1 or 10 (decimal 1 or 2) in tesseral arithmetic
  • example: 01 and 03 are adjacent because 0001 and 0011 differ by binary 10, or decimal 2
  • example: 033 and 211 are adjacent because in tesseral arithmetic 001111 + 10 = 100101, or 100101 - 10 = 001111
2. different size blocks:
  • taking the longer of the two codes, convert it from base 4 to binary
  • tesseral-add and -subtract 01 and 10 to create four new codes
  • reject any cases where subtracting was not possible (a "negative" code would have resulted), or where a "carry" would have been necessary to the left of the leftmost digit
  • discard the excess rightmost digits in the resulting transformed longer codes
  • convert back to base 4 to get the leaf
39.
  • the two blocks are adjacent if any of the transformed and truncated codes are equal to the shorter code
• example: Are 02 and 2 adjacent?
  - convert 02 to binary = 0010
      0010 + 1  = 0011
      0010 + 10 = 1000
      0010 - 1  (impossible)
      0010 - 10 = 0000
  - truncating gives 00 and 10
  - these are equal to 0 and 2 in base 4
  - therefore, 02 and 2 are adjacent (also 02 and 0 are adjacent)
• example: Are 033 and 2 adjacent?
  - convert 033 to binary = 001111
      001111 + 1  = 011010
      001111 + 10 = 100101
      001111 - 1  = 001110
      001111 - 10 = 001101
  - truncating to two digits gives 01, 10 and 00
  - these are equal to 1, 2 and 0 in base 4
  - therefore, 033 and 2 are adjacent
• example: Find the leaves adjacent to 03 in the first map above
  - method: find the codes of adjacent blocks of the same size, then work down the tree to find the appropriate leaf (note: can only find equal or shorter codes - equal or bigger leaf blocks)
      0011 + 1  = 0110 = 12 : leaf 1
      0011 + 10 = 1001 = 21 : leaf 2
      0011 - 1  = 0010 = 02 : leaf 02
      0011 - 10 = 0001 = 01 : leaf 01
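The worked examples above can be reproduced by treating tesseral addition as ordinary addition on the separated row bits or column bits. This is a sketch under that interpretation: adding 1 moves one column, adding binary 10 moves one row, and all function names are illustrative.

```python
def to_bits(code):
    """Base 4 leaf code -> interleaved bit string (row bit, column bit per digit)."""
    return "".join(format(int(d), "02b") for d in code)

def to_code(bits):
    """Interleaved bit string -> base 4 leaf code."""
    return "".join(str(int(bits[i:i + 2], 2)) for i in range(0, len(bits), 2))

def tesseral_shift(bits, d_row, d_col):
    """Tesseral add or subtract; returns None if the result would leave the raster."""
    row, col = int(bits[0::2], 2) + d_row, int(bits[1::2], 2) + d_col
    half = len(bits) // 2
    if not (0 <= row < 2 ** half and 0 <= col < 2 ** half):
        return None                          # "negative" code or carry off the left end
    r, c = format(row, f"0{half}b"), format(col, f"0{half}b")
    return "".join(a + b for a, b in zip(r, c))

def neighbours(code):
    """Codes of the same-size blocks that share an edge with `code`."""
    bits = to_bits(code)
    moves = [(0, 1), (1, 0), (0, -1), (-1, 0)]       # +1, +10, -1, -10
    shifted = (tesseral_shift(bits, dr, dc) for dr, dc in moves)
    return [to_code(b) for b in shifted if b is not None]

def adjacent(a, b):
    """Shift the longer code, truncate, and compare with the shorter code."""
    longer, shorter = (a, b) if len(a) >= len(b) else (b, a)
    return any(n[:len(shorter)] == shorter for n in neighbours(longer))

print(tesseral_shift("0001", 0, 1))   # 0100
print(neighbours("03"))               # ['12', '21', '02', '01']
print(adjacent("02", "2"), adjacent("033", "2"), adjacent("01", "03"))   # True True True
```

Truncating the four neighbour codes of 03 as far as the tree requires gives leaves 1, 2, 02 and 01, as in the last example.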
40.
Length of common boundary
the length of common boundary between the two blocks is determined by the level of the longer code
can use this to construct an algorithm to determine the perimeter of a patch, e.g. the length of the A/B boundary in the first example map
E. AREA OF A CONTIGUOUS PATCH ALGORITHM
Problem
find the area of a contiguous patch of the same value, e.g. all A
Corollary: How many separate patches of A are there?
note: this is a general method which can be used in both quadtree and vector data structures, i.e. find contiguous sets of quadtree blocks or irregularly shaped polygons, given that adjacencies are known or can be determined
the following example uses the original raster map
note that there are only two contiguous patches; the areas of A and B form only one patch each
Method
create a list of leafs, with their associated codes, by traversing the tree
allow space for a "pointer" for each leaf, and give it an initial value of 0
41.
Algorithm
for each leaf i:
 - find all adjacent leafs j with equal or shorter length codes (4 maximum)
 - if the adjacent leaf j has the same value, determine which of i and j has the higher position in the list, and set its pointer to the lower position (note: if a pointer has already been changed, it may be changed again or left, the result is the same)
this produces the final pointer list
Results
1. the number of contiguous patches will be equal to the number of zeros
 in the example, two pointers are zero, indicating two contiguous patches
2. the value of each patch can be obtained by looking up the values of leafs with 0 pointers
 in the example, leafs 00 and 01 have 0 pointers; these have the values A and B respectively
3. to find the area of each patch, select one of the zeros and sum its area plus the areas of any leafs which point to it directly or indirectly
 the component leafs of each patch can be found by starting with a leaf at the end (or beginning) of the list and following the pointers until a 0 is found
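The pointer method can be sketched as follows. This is an illustrative variant under stated assumptions, not the notes' exact procedure: the names `same_size_neighbours` and `find_patches` are hypothetical, `None` stands in for the 0 pointer, and pointers are set between the current heads of the two chains (union-find style) rather than overwritten directly, which keeps chains intact whatever the processing order. Leaf values are taken from the worked example (A leafs 00 02 03 2 32, B leafs 01 1 30 31 33), and the high bit of each digit is assumed to index rows and the low bit columns.

```python
def same_size_neighbours(code):
    """Codes of the up-to-four equal-size blocks sharing an edge with code."""
    n = len(code)
    row = col = 0
    for d in code:  # de-interleave: high bit -> row, low bit -> column (assumed)
        row = (row << 1) | (int(d) >> 1)
        col = (col << 1) | (int(d) & 1)
    found = []
    for r, c in ((row, col + 1), (row, col - 1), (row + 1, col), (row - 1, col)):
        if 0 <= r < 2 ** n and 0 <= c < 2 ** n:
            found.append("".join(
                str(2 * ((r >> i) & 1) + ((c >> i) & 1))
                for i in range(n - 1, -1, -1)))
    return found


def find_patches(leaves, depth):
    """leaves: dict of leaf code -> value, in tree-traversal order.

    Returns a dict mapping the head leaf of each patch to (value, area),
    with area measured in smallest cells (4 ** (depth - code length)).
    """
    codes = list(leaves)
    position = {code: i for i, code in enumerate(codes)}
    pointer = [None] * len(codes)  # None plays the role of the 0 pointer

    def head(i):
        while pointer[i] is not None:
            i = pointer[i]
        return i

    for code in codes:
        for nb in same_size_neighbours(code):
            # shorten the code until it names a leaf (equal or bigger block)
            while nb and nb not in leaves:
                nb = nb[:-1]
            if nb and leaves[nb] == leaves[code]:
                a, b = head(position[code]), head(position[nb])
                if a != b:
                    pointer[max(a, b)] = min(a, b)  # higher points to lower

    patches = {}
    for i, code in enumerate(codes):
        key = codes[head(i)]
        value, area = patches.get(key, (leaves[code], 0))
        patches[key] = (value, area + 4 ** (depth - len(code)))
    return patches


example = {"00": "A", "01": "B", "02": "A", "03": "A", "1": "B",
           "2": "A", "30": "B", "31": "B", "32": "A", "33": "B"}
print(find_patches(example, depth=2))  # {'00': ('A', 8), '01': ('B', 8)}
```

The number of keys in the result is the number of contiguous patches, matching the count of zero pointers in the method above.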
42.
example: the leaf at position 10 (code 33) points to 8, which points to 7, which points to 5, which points to 2, which has a zero pointer
therefore, leaf position 10 (code 33) is part of the same patch as leaf 2 (code 01) and has the value B
the areas can be found by summing the leaf areas
for the example:
 A leafs: 00 02 03 2 32
 A positions: 1 3 4 6 9
 Area of A: 1 + 1 + 1 + 4 + 1 = 8
 B leafs: 01 1 30 31 33
 B positions: 2 5 7 8 10
 Area of B: 1 + 4 + 1 + 1 + 1 = 8
F. QUADTREE INDEXES
Indexing using quadtrees
indexes are used in vector systems to get fast access to the objects in a particular area of a map
very useful in searching for potentially overlapping or intersecting objects
therefore, they are an essential part of a polygon overlay operation
looked at the usefulness of a simple sort of objects on one axis (e.g. x) in the moving band operation for intersection calculations
now will look at methods which can be thought of as sorting on both axes simultaneously
these use 2D coding systems and a simple one-dimensional sort
Setting up the index
steps are:
1. for each object (point, line, area) in the database, find the smallest quadtree leaf which encloses the object
43.
 some large objects will have to be classified as NULL, as they span more than one of the four leafs in the first branching (0, 1, 2 and 3)
 other smaller objects may be enclosed within a small leaf, e.g. 031
2. sort or index the objects by the enclosing quadtree leafs
Using the index
to find all objects which might intersect an area, line or point of interest:
 - find the quadtree leaf enclosing the object of interest
 - starting at this point, follow up the quadtree through all branching points that contain the original cell, and down the quadtree to all branching points and leafs below the cell
example: the area of interest is enclosed in leaf 31 of the original example quadtree
 the objects which may intersect the area of interest are those in leaf 31 and all leafs above it
 thus, these are 3 and the null leaf
 objects in other (remote) leafs cannot intersect the area of interest, so need not be checked
example: the area of interest is enclosed in leaf 0
 the objects which may intersect the area are in leaf 0, the null leaf and all leafs below 0 - 00, 01, 02, 03
 there may be other leafs below these as well, such as 010, 011, 012, 013, etc.
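A minimal sketch of setting up and using such an index follows. Everything named here is hypothetical (`enclosing_code`, `build_index`, `candidates`, and the sample objects); the map is assumed to be a square with its origin at (0, 0), each object is reduced to an (xmin, ymin, xmax, ymax) box, digits are assumed to be 2 * row + column with rows increasing in y, and the empty string stands for the NULL code.

```python
def enclosing_code(xmin, ymin, xmax, ymax, size=1.0, max_depth=2):
    """Code of the smallest quadtree block wholly enclosing the box.

    Returns '' (the NULL code) when the box spans more than one of the
    four blocks of the first branching.
    """
    code = ""
    x0 = y0 = 0.0
    for _ in range(max_depth):
        half = size / 2
        if xmax < x0 + half:
            col = 0
        elif xmin >= x0 + half:
            col = 1
        else:
            break  # box straddles the vertical midline
        if ymax < y0 + half:
            row = 0
        elif ymin >= y0 + half:
            row = 1
        else:
            break  # box straddles the horizontal midline
        code += str(2 * row + col)
        x0, y0, size = x0 + col * half, y0 + row * half, half
    return code


def build_index(objects):
    """Group object names by the code of their enclosing block."""
    index = {}
    for name, box in objects.items():
        index.setdefault(enclosing_code(*box), []).append(name)
    return index


def candidates(index, query_code):
    """Objects in the query block, in blocks above it, and in blocks below it."""
    return sorted(
        name
        for code, names in index.items()
        if query_code.startswith(code) or code.startswith(query_code)
        for name in names)


objects = {
    "well": (0.8, 0.6, 0.8, 0.6),      # a point, lands in block 31
    "road": (0.1, 0.45, 0.9, 0.55),    # crosses the first branching: NULL
    "park": (0.55, 0.55, 0.95, 0.95),  # fits in block 3 only
    "pond": (0.1, 0.1, 0.2, 0.2),      # lands in block 00
}
index = build_index(objects)
print(candidates(index, "31"))  # ['park', 'road', 'well']
print(candidates(index, "0"))   # ['pond', 'road']
```

Because a block's ancestors are exactly the prefixes of its code, "above" and "below" in the tree reduce to two string prefix tests.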
44.
Generalizations
quadtree indexing is most effective for small objects, particularly points
large objects tend to require large enclosing leafs even though they may not fill much of the space (e.g. highway corridors)
 these objects will always need to be checked for intersection
 it may pay to subdivide objects so that the pieces fall entirely within smaller leafs
indexing in this way is intuitively more efficient than indexing by x or y alone since the quadtree index is effectively two-dimensional
the divisions at each branching need not be equal in size
 it may pay to have some blocks of smaller area and some of larger area, rather than four equal squares at each branching
 however, for general efficiency the blocks should be rectangular
G. R-TREE INDEXES
R-tree indexes are a response to the problem of indexing large areas
R stands for "range", a concept similar to MER (minimum enclosing rectangle)
Method
find two, possibly overlapping, rectangles (aligned with x and y axes) such that:
 - as many objects as possible are wholly within one or the other rectangle
 - there are roughly equal numbers of objects wholly enclosed in each rectangle
 - the overlap between the rectangles is minimum
45.
indexing is determined by the rectangle in which the object is contained
 objects which are wholly within a rectangle are associated with that rectangle
 objects which are not wholly within either of the two rectangles are associated with the undivided map
apply the procedure recursively, finding two new smaller rectangles within each existing rectangle
 this creates a tree structure similar to the quadtree
 every object is associated with some node in the tree
to find the objects which might intersect a given area of interest:
 - find the smallest rectangle used in the indexing procedure which wholly encloses the area of interest
 - the objects are those associated with this rectangle and all nodes above and below it in the tree
Problem
although benchmark tests have shown that R-trees are generally more efficient than quadtrees and simple 1-D sorts, they are computationally intensive to construct
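A rough sketch of a two-rectangle recursive index in this spirit is given below. It departs from the method above in two labelled ways, both assumptions made for brevity: the pair of rectangles is chosen by a simple median split along the wider axis instead of optimising enclosure, balance and overlap, and the search descends into every rectangle that intersects the area of interest (the usual R-tree style search) rather than locating the smallest enclosing rectangle first. The names `mer`, `intersects`, `build` and `search`, and the sample boxes, are hypothetical.

```python
def mer(boxes):
    """Minimum enclosing rectangle of (xmin, ymin, xmax, ymax) boxes."""
    return (min(b[0] for b in boxes), min(b[1] for b in boxes),
            max(b[2] for b in boxes), max(b[3] for b in boxes))


def intersects(a, b):
    """True if two axis-aligned rectangles overlap or touch."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def build(objects, leaf_size=2):
    """objects: list of (name, box). Returns a nested dict of nodes."""
    node = {"rect": mer([box for _, box in objects]),
            "objects": [], "children": []}
    if len(objects) <= leaf_size:
        node["objects"] = objects
        return node
    x0, y0, x1, y1 = node["rect"]
    axis = 0 if (x1 - x0) >= (y1 - y0) else 1  # split along the wider axis
    # order by box centre on that axis (min + max is twice the centre)
    ordered = sorted(objects, key=lambda o: o[1][axis] + o[1][axis + 2])
    mid = len(ordered) // 2
    # each child rectangle is the MER of its half, so the two may overlap
    node["children"] = [build(ordered[:mid], leaf_size),
                        build(ordered[mid:], leaf_size)]
    return node


def search(node, window):
    """Names of objects whose box intersects the window."""
    if not intersects(node["rect"], window):
        return []
    found = [name for name, box in node["objects"] if intersects(box, window)]
    for child in node["children"]:
        found += search(child, window)
    return found


boxes = [("a", (0, 0, 1, 1)), ("b", (2, 0, 3, 1)), ("c", (8, 8, 9, 9)),
         ("d", (6, 7, 7, 9)), ("e", (0, 5, 10, 6))]
tree = build(boxes)
print(sorted(search(tree, (6.5, 7.5, 8.5, 8.5))))  # ['c', 'd']
```

With this split every object falls wholly inside one of the two child rectangles, so nothing is left at the undivided map; an implementation closer to the notes would keep such objects at the parent node.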
47.
Network Analysis
• Much of the economic and social activity of the world is organised into networks.
• The form, capacity and efficiency of these networks have a substantial impact on our standard of living and affect our perception of the world around us.
• Networks also exist in the physical world, e.g. networks of streams and rivers.
What Is a Network?
• rail network (KCR, MTR)
• road and highway network (KMB)
• electricity network (CLP)
• telephone network (CWHKT)
• air transportation network (Cathay Pacific)
• street network (Emergency Services, Police Department, etc.)
48.
Questions That Require Use of a Network
• What is the best route from a location to a given destination?
• Where should I locate a service centre?
• Which centre serves a particular location?
• How accessible is a location to other locations?
• How many trips will be generated between origins and destinations?
• Given street addresses, how can I map the occurrence of given events on a street map?
Network Data Structure
• A network can be represented digitally by nodes and links.
• Nodes represent intersections, interchanges and confluence points.
• Links represent transportation facility segments between nodes.
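As an illustration of the node and link representation, the sketch below stores a small network as an adjacency list and answers the "best route" question with Dijkstra's shortest-path algorithm. The node names, link costs and function names (`build_network`, `best_route`) are invented for the example, links are assumed to be two-way, and "best" is assumed to mean lowest total link cost.

```python
import heapq

# hypothetical links: (from node, to node, cost such as travel time)
LINKS = [("A", "B", 4.0), ("A", "C", 2.0), ("C", "B", 1.0),
         ("B", "D", 5.0), ("C", "D", 8.0)]


def build_network(links):
    """Adjacency list: node -> list of (neighbouring node, link cost)."""
    network = {}
    for a, b, cost in links:
        network.setdefault(a, []).append((b, cost))
        network.setdefault(b, []).append((a, cost))  # assumed two-way links
    return network


def best_route(network, origin, destination):
    """Lowest-cost route as (total cost, list of nodes), or None."""
    queue = [(0.0, origin, [origin])]
    visited = set()
    while queue:
        cost, node, path = heapq.heappop(queue)
        if node == destination:
            return cost, path
        if node in visited:
            continue
        visited.add(node)
        for nxt, link_cost in network.get(node, []):
            if nxt not in visited:
                heapq.heappush(queue, (cost + link_cost, nxt, path + [nxt]))
    return None


network = build_network(LINKS)
print(best_route(network, "A", "D"))  # (8.0, ['A', 'C', 'B', 'D'])
```

One-way streets or turn restrictions would need directed links, which this sketch does not model.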