An Automated Digital Elevation Extraction System

An Automated Digital Elevation Extraction System

Ram Srinivasan

1. Abstract
A digital system is presented that automates the data from a pair of overlapping aerial images for use in digital mapping. The digitized images are registered using the camera middles to eliminate platform induced distortions and obtain a stereo pair. An automated process can then be initiated to correlate between corresponding points in the stereo pair at any desired regularly spaced interval. An intelligent process has been developed to overcome the difficulties in false correlations and to minimize the computational burden by adapting to the terrain. The correlation process provides the disparity, or the parallax information which is then converted to true elevation data and ground location using the absolute orientation parameters. In regions where a more dense grid of elevations are needed the operator identifies them by delineating regions where a more dense grid of elevations are needed, the operator identifies them by delineating regions-of-interest interactively in graphics mode. The automated elevation extraction process can also be used to supplement data obtained by traditional means such as stereo plotters. The extracted data is then interpolated to a higher resolution grid and edited manually or automatically digital filtering operations. Elevation data can be used for many purposes such as producing contour elevation maps, ortho rectified images and perspective scenes for simulation.

2. Background
Obtaining elevation data from multiple aerial images is an important step in many applications such as ortho image generation, map making, geographic information systems, and mission planning. Currently prevalent methods are mostly analog in nature. Flights are planned to acquire overlapping photographs covering areas of interest. Typically there is a 60% overlap between adjacent photographs. The diapositives made from photo negatives are then placed in a stereo plotter, and a human operation of the diapositives until the images are seen in stereo through the viewer. At this stage the necessary camera model, calibration and other orientation information have been obtained that will enable computation of elevation values from a knowledge of the measured parallax or disparity between conjugate points in the two diapositives at various locations. The operator positions a floating mark or dot on the ground viewed in three dimensional space. Through optical/mechanical means the operator can move the dot up or down relative to the surface of the stereo model. The single dot essentially represents a combination of two of them, one for each photograph. When the dots appear to merge, and a single 3D dot is placed on the ground, the different in X locations of the two dots in each respective depictive represents the disparity or the parallax between the conjugate points. As the dots are moved in the vertical dimension to match the surface of the stereo model the plotter then obtains the parallax information at equally spaced intervals, converts the measured parallax values to elevation data and writes it to a file along with the corresponding X/Y ground location. Typically, elevations are obtained along vertical lines called profiles. Elevations can also be obtained as contours. The obtained elevation data or elevation model is then fitted to a regularly spaced grid. Elevation data can be obtained from the U.S. Geological Survey or the Defense Mapping Agency of the US in gridded form. Two types of data are available called the DEM (Digital Elevation Model) and DTED ( digital Terrain Elevation Data), each of them meeting different accuracies. The gridded elevation data is then used for various purposes, important among them being ortho rectified photograph production. Sometimes it sis also important to edit the elevation data, particularly for quality control of elevation data that is widely distributed.

Besides aerial photographs digital satellite stereo images such as from SPOT are being used increasingly. The usage of satellite digital data has been more prevalent in the developing countries, primarily because the developing countries are leap frogging with the technological advances taking place. Though the article is primarily centered around aerial images, the system is equally applicable for satellite images with modifications in the mathematical modeling process.

3 Digital System – Advantages
Proposed here is a digital system for automated elevation extraction from aerial images. The obtained photographs are assumed to be scanned by high resolution scanner into digital images. A digital system offers a variety of advantages over conventional processing using photographs. Images taken under poor lighting, haze, sun illumination angle effects and other similar problems can be easily corrected in a digital system. Integration of images taken at different times, cameras, positions are easily achieved in he digital domain for purposes of comparison and utilization of different sensors to maximum advantage. Archival, reproduction of results, consistency in output product quality, quality control, throughput, information management and updating records are easily done in the digital domain. Automation also becomes an easier task thus minimizing valuable human effort. A extensive wealth of knowledge is available in processing digital images which can be put to use.

4. Initial Registration
The digitalized images are first enhanced if needed. Identifying the fiducially in the images space and making a correspondence with the camera calibration information yields a user coordinate system (UCS). Relates the sample/line in the digital image to the film coordinates. This step also yields the principal point in the two images, the origin in the film coordinate space. Features are displaced in the X dimension between the two images due to elevations of images features. They can also be displaced in the Y dimension primarily because the photographs may not be truly vertical. Differences in the flying height for the two images can also cause Y displacements. This situation is generally rare. In the stereo. Similarly the same can be done in the digital domain for viewing the images in anaglyphic stereo or through the stereo viewer. The Y-displacement of features can be very significant even for very small tilt angles. Though the automated correlation process can account for this problem to a certain extent, the Y-parallax is better removed (may not be completely) either by aligning the base axis of each image with the X-axis or more precisely by suing the camera model position and orientation information to compensate for the platform distortions. The correction of the left and right images to true vertical images can also be dept in just a mathematical form and no vertical images are created as such. For ease of discussions below let us assume that two new images are created as such. For ease of discussions below let us assume that two new images are created to form a stereo pair. The new vertical images will hence be called the right and left images.

5. Automated Correlation – An Introduction
When the operator uses a stereo plotter to look at a pair of photographs and keep a moving floating mark on the ground he/she is performing a manual matching to obtain corresponding pair of points in the stereo pair. The matching however is done with a lot of intelligence and an understanding of the terrain and the features. Without such a knowledge and positioning the stereo. The operator faces difficulties when the dot is moved across a desert or a forest where there are no distinct features to match. Matching becomes relatively easier in such areas when the surrounding terrain is identifiable or when there are unique features within the forest, say a forest outpost.

Mathematical matching processes have been implemented by various researchers in an attempt to replace the human matching process performed through the stereo viewer. The overlapping images are first relatively registered so that a stereo pair results. This eliminates the parallax in a direction perpendicular to the flight line ( the Y-parallax). Now the features are displaced only in the X direction ( along the flight line) as a result of their heights. Consequently epipolar lines are found at the same Y locations in both images of the pair. The relative registration can also be kept in mathematical form instead of performing a correction on the digitized images. In such a case a mathematical equation is used to locate epipolar line.

5.1 1D vs 2D matching
Mathematical matching processes involve finding a best match in the right image for a given point, area or feature in the left image. The best match is obtained by finding a minimum or maximum of a quantitative measure such as a correlation coefficient, mean square error (MSE) and mean absolute error (MAE). For a candidate pixel in the left image the matching pixel is displaced from the corresponding location in the right image, the displacement proportional to the height of the feature images by the pixel. Since individual pixels are hard to match a group of pixels centered around the candidate pixel is taken. This is called the candidate area. The area to be searched in the right image to find the best matching area is called the search area. The size of the search area is dependent on the maximum possible displacement. The candidate area and the search area can be one dimensional since the displacement due to the heights is only along the X direction on the epipolar line. Though processing one dimensional areas are easier in computation and hence the throughput, matching becomes a more difficult task. Since actual features are three dimensional and images in two dimensions the reliability of a one dimensional match is lesser when compared to match between two dimensional areas. Imaging an operator positioning the floating mark seeing only two lines from the stereo pair. In addition, practically it is very hard to have no Y-parallax. The possibility of no reliable match that meets some criteria leads to elevations not available for a given location . A worse situation is the possibility of false matches which lead to erroneous elevation values that can cause bigger problems when the elevation values are interpolated to a finer grid. Though these possibilities still hold for two dimensional matches the complexity of the problem is much lesser. for this reason two dimensional matching is adopted.

5.2. Block Matching Method
Automating the cross correlation process is typically done by a block matching approach. For a candidate block from the left image a best match is found by searching in neighborhood of the corresponding area from the right image. Though intelligence can be added in the search process we will. EXAMINE A BRUTE FORE SEARCH IN TWO DIMENSIONS. a Small candidate area, an NxN block of pixles, is chosen on the left image centered around a specific the corresponding location in the right image a search area is chosen of size MxM where m=N+2* p where p is the maximum possible absolute parallax. Though a square block size is discussed , the blocks can be rectangular to take advantage of the extent of the residual Y-parallax. The candidate block is compared against a same sized block from the search area and a quantitative measure ( correlation coefficient, mean square error, mean of absolute error) of matching obtained. The process is repeated at all possible (2p+1)x(2p+1) locations on which the block from the search area can be centered. A best matching block is regarded as one that has the maximum correlation coefficient or minimum mean square error (MSE). The center of best matching block from the search area ( right image) is considered to be the match point for the center of the candidate block (left image). The parallax is computed from the locations of the candidate point and the match point. The camera models of the two images and the parallax gives the ground location and the elevation at that point. Elevations have to be computed in a similar manner for all desired points. This methods is computationally heavy. A large number ( 2p+1) x(2p+1)) of correlation coefficients have to be computed and maximum among them found for every elevation value.

5.3 The Correlation Surface Method
The proposed method uses the cross correlation surface created using basic ED-FFT processes to obtain the matching point. Once the images are initially registered and a stereo pair obtained, corresponding matching points are only displaced mostly in the X direction due to elevation. However for reliability as explained earlier two dimensional areas are considered. An NxN candidate block is chosen in the left image and another NxN block centered at the same location is picked in the right image. The two block are transformed by two dimensional Fourier transforms (2D-FFT), the transformations multiplied in the frequency domain and the result is inverse transformed to obtain a two dimensional cross correlation surface. The cross correlation surface values indicate the extent of matching to the candidate block at each possible shift location in the right image. The peak of the correlation surface indicates the location of the best match. A zero parallax is indicated by a peak at the center of the cross correlation surface which is the null shift location. The location of the peak from the null shift location indicates the parallax or the displacement vector. If the images are in perfect stereo pair there will be no shift in Y but only in X. From the X-displacement value are obtained. By this method the maximum detected pixel displacement can only be half the size of the block i.e. N/2. It is hence important that the block size be bigger than twice the maximum possible displacement.

The primary differences between the correlation surface method and the block matching (BM) method discussed earlier are as follows. The BM method performs computations in the spatial domain while the correlation surface method performs operations mostly through frequency domain. In the BM method the candidate block can be of any size. The maximum possible X-parallax influences only the size of the search area in the right image. Several correlation coefficients are computed for each possible shift location and the maximum among them is found . In the correlation surface method the candidate block from the left image and the corresponding block from the right image have the same dimensions. Both have to be at least of size 2p in the X dimension to detect the maximum possible parallax, p. We will discuss later how this can be avoided. In the new method only one correlation surface ( not coefficient) is computed from basic principles. Conceptually both methods are very similar.

5.3.1 False Mathces
The use of correlation surfaces or coefficients poses problems. False matches result as high cross correlation values where there is no suitable match at all. This can happen if the area under consideration is over structure less areas such as water, clouds, homogenous features etc. The quality and reliability in results from cross correlation based methods should hence be primarily based on how effective the method is to detect false correlations so that elevations computed from such matches are dropped. Studies have been conducted on a variety of image to develop a set of tests fro analysing the cross correlation surface. Currently about two dozen tests are conducted on each cross correlation surface. These tests are designed to eliminate dark areas like water, bright areas like clouds and other structure less area like a patch over a corn field. The tests are conducted both on the normalized and the normalized cross correlations surfaces. In general a good cross correlation surface is assumed to have a Gaussian shape around the peak. The tests are designed to check for the level of the peak and the fall off rate from the peak. The user decides how many of these tests should pass before the location of the peak is considered to be a valid match point. A rigid criterion drops several possibly matching points while a very relaxed criterion introduces too many unreliable ones. An optimum results when roughly 75% of the tests are required to pass. These tests add a considerably level of intelligence to the cross correlation method. Numerous experiments have been conducted and the method has been found to successfully reduce if not totally eliminated false matches.

5.3.2 Terrain Adaption
It was discussed earlier that the block sizes from the stereo pair need to be big enough such that the maximum possible parallax can be found. This means that if the maximum parallax is 32 pixels then the block size be at least 64x64, preferably 128x128. Computing 128x128 sized cross correlation surfaces for each desired elevation point can be very computationally intensive. Intelligence has also been added at this stage to reduce the computationally intensive. Intelligence has also been added at this stage to reduce the computational burden. The primary assumption and a practically valid one is that the parallax ( or the elevation) at any given point is not significantly different from the parallax for a neighbouring point. Under this assumption the chosen block from the right image is centered around the predicted match location. The predicted location is based on the parallax of the previous neighboring point computed. The distance between the predicted match point and the true match point is primarily based on how rapidly the terrain has changed from the previously computed location to the current location of interest.

The prediction process is designed to adapt the block size to the terrain so that the computational load is minimized substantially. The process works as follows. The block size is allowed to vary within a range. A cross correlation surface is obtained for the minimum desired block size. The tests are conducted and if a desired number of them pass, a valid match point is assumed to have been obtained. If not the block size is increased by a factor of 2 and the process is repeated once more. The block size is made to increase until a valid match point is obtained ( satisfying all desired criteria) or the maximum allowed block size is reached. If a valid match point is not obtained for the current location there is no predicted value for the next location to be computed. In such a case the center location of the block from the left image is also the center location for the block from the right image. If a match is obtained, computation for the next location starts off centered around the predicted location. The process continues as before. In this manner, most of the match points are obtained with very small block sizes as opposed to performing computations always on the biggest required block size. In a ideal case of the terrain being of constant elevation and has some level of uniqueness around each desired location all match points ( except probably the first one ) will be obtained with the minimum block size. For a constant block size, the prediction increases the maximum parallax that can be detected. The prediction process has been found to increase the throughput over a method. with no prediction by about 10 times for a typical image.

The frequency domain approach presented can be improved in several ways. A hierarchical process can be included where matching is performed at different frequency ranges so that different levels of features are correlated, gross ones at low frequencies and sharp ones at higher frequencies. In this manner noisy data leading to false matches can also be handled.

5.3.3 ROI Processing
Besides adapting the block size to the terrain it might also be required to adapt the spacing of obtained match points to the terrain. In other words widely spaced points are needed in flat areas while more frequently spaced ones are needed in rapidly changing portions of the image. The ensures that an optimum number of elevations are available to fit a finer grid for the entire overlapping area. If a constant spacing is chosen to suit the most rapidly changing terrain it results in unnecessary computational effort for the flat area , very few points are obtained in the rapidly changing areas thus making the interpolation process to a finer grid inaccurate. Though determination of spacing also can be automated it is better done with the user defining the special regions of interest. By default elevations are computed at some average spacing for the entire image. In addition, the user delineates areas of particular interest by graphically drawing regions-of interest (ROI) on the left image. For each ROI a desired spacing is also given. The ROIs are identified by the user as polygon. An intelligent method fits an optimum number of square areas that encompass the polygon and obtains elevations within those areas. Processing within ROI at in increased resolution is of greater interest when the automated correlation method supplements data obtained from a stereo plotter so that the quality of the products which use the elevation data can be improve.

6. Editing
The elevation values obtained by the automated process can have defects such as holes where no elevation values could be obtained and peaks due to false matches which were undetected. These defects can be automatically edited by digital filters to remove spikes, fill holes and smooth the data. As an additional step to improve quality, the data can be manually edited. Manual editing is also needed for data obtained manually from a stereo plotter. A manual digital editing system has also been developed. This system can filter s specific line or area suing a user defined filter. It can be used to input a specific elevation value using a map, or set a user specified elevation to a arbitrary location or an area. The editing system also allows the user to obtain secondary products that help in the editing process such as slope, shading aspect, contours and perspective images.

8. Conclusions
The correlation surface method of extracting elevations presented in this paper has been tested to be consistent in it quality. Improvements are being made in the adaptivity, accuracy and throughput. More experiments are being conducted to assess the accuracy and throughput of the elevation values obtained. A higher accuracy standard product is intended to be produced of quality better than currently available DEM ( Digital Elevation Model) or DTED ( Digital Terrain Elevation Data in the U.S.

7 Results
The automated correlation process has been implemented on a high speed floating processor , the BITE (Bulk Image Transformation Engine). The BITE has a local hard disk for embedded applications code and to store data, 16 Mbytes of image memory, and about 2 Mbytes of other memory for use by application programs written in C. The image viewing and user interface for graphics ROI delineation is done on the 1024x1024 IVAS (Interactive Viewing and Analysis Station) display processor. The paper gives preliminary results from a system being developed. Currently it takes about 5 minutes to produce a DTED image (3 arc second resolution in a 1 degree block) with a 10:1 interpolation factor.

Two 9”x9” aerial photographs were scanned using an Optronics scanner at 50 micron resolution ( 20 pixels/mm) and the digital images were obtained. The scale of the photographs were roughly 1:35000. The images had a 60% overlap in the area covered. A simple initial registration process was chosen for quick registration but was not successfully in eliminating the Y-parallax. Fiducial identification in the image space with the camera calibration information provided a scanner UCS to transform sample/line to film coordinates. The principal points were identified. Te conjugate points were first approximately identified by examining the images. The tow images were then rotated by suitable amounts so that the base line ( the line joining the principal point and conjugate point within an image) of each image was aligned with the X-axis. This process reduced the Y-parallax but a significant amount was left at the edges of the image.

Figure 1 shows a center 1024x1024 section of the left image after initial registration. Elevation values were obtained for every 16 pixels in the X and Y directions. Elevation values are not obtained for pixels along the border. Figure 2 shows the 64x64 parallax image created for the section seen in Fig.1. The automated process could not obtain parallax information that satisfied the specified criteria at the locations shown by the black pixels. The parallax is proportional to the elevation values. Viewing the stereo pair through a stereo viewer also confirmed the topography represented in Fig 2. Figure 3 shows the result of automated editing of data seen in Fig.2 The image shown is the result of applying a median filter which successfully removes the spikes and fills in the holes. If needed the data can also be smoothed by appropriate filters at this stage. The low resolution data is then interpolated to obtain a parallax/elevation value for any desired ground resolution. Figure 4 shows a perspective of the area seen in Fig 1. generated suing the elevation data from the automatic process. The filtered parallax/elevation image for almost the entire overlapping area between the two photographs is shown in Fig.5. This again represents and elevation every 16th Pixel. Due to the significant residual Y parallax discussed earlier, the prediction process was not very successful at the edges of this image. This will be corrected if the initial registration is performed using the two camera models. Another perspective image (1024x1024) representing a different area is shown in Fig. 6.

Figure 1 A section (1024*1024) from the left image

Figure 2 Parallax image representing elevations for every 16th pixel of area seen in Fig. 1

Figure 3 Result of digital filtering data shown in Fig. 2

Figure 4 A perspective of area seen in Fig. 1 obtained using elevation data from Fig. 2

Figure 5 Elevation data for entire overlap area obtained every 16th pixel

Figure 6 Perspective of a section using data shown in Fig. 5