GISdevelopment.net ---> AARS ---> ACRS 1991 ---> Mapping from Space

Detecting clouds using Neural Networks and generating cloud free Mosaics

T. Hosomura, P.K.M.M Pallewatta
Computer Science Division
Asian Institute of Technology
Bangkok, Thailand


Abstract
Certain areas of the earth's surface are covered by clouds for most of the year, and obtaining cloud free images of such areas is extremely difficult. Neural networks can be used effectively as classifiers where the data to be classified are non-parametric in nature, as is the case when clouds are treated as one class and all clear sky areas as another. Traditional cloud detection methods such as automatic thresholding give poor accuracy over high albedo surfaces such as snow, because the spectral signatures are similar; incorporating texture features can improve classification accuracy in these situations. These methods are expected to give cloud detection accuracies of over 90% over high albedo surfaces. This paper discusses the application of the above techniques to detecting cloud contamination in 50 m resolution MOS-1 data. Methods of obtaining cloud free mosaic images are also discussed.

Introduction
Obtaining cloud free images of the Earth is of prime importance in remote sensing. Unfortunately, some areas of the earth are constantly covered by clouds, and obtaining cloud free images of such areas is a difficult task. In this study an effort has been made to develop image processing software that produces cloud free mosaic images of such areas from daylight images obtained at different times. To generate cloud free mosaic images, the following work must be accomplished:
  1. Detect clouds and shadows in several images of the same area taken at different times.
  2. Form a cloud mask.
  3. Register the images.
  4. Match the intensities of the segments of the mosaic.
  5. Construct the mosaic.
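As an illustration of how the cloud masks feed the final compositing step, the following is a minimal sketch in Python with NumPy. It is not the authors' software. It assumes the images are already co-registered and intensity matched, and the function name `cloud_free_mosaic`, the "first clear acquisition wins" rule and the `fill_value` parameter are all assumptions made for this example.

```python
import numpy as np

def cloud_free_mosaic(images, cloud_masks, fill_value=0):
    """Composite co-registered images, taking each pixel from the first
    acquisition in which it is not flagged as cloud or shadow.

    images      : array of shape (T, H, W), one band from T acquisitions
    cloud_masks : boolean array of shape (T, H, W), True = cloud or shadow
    Returns (mosaic, still_cloudy); still_cloudy marks pixels that were
    masked in every acquisition and were set to fill_value.
    """
    images = np.asarray(images)
    clear = ~np.asarray(cloud_masks, dtype=bool)
    # Index of the first clear acquisition per pixel (0 if none is clear).
    first_clear = np.argmax(clear, axis=0)
    mosaic = np.take_along_axis(images, first_clear[None, ...], axis=0)[0]
    still_cloudy = ~clear.any(axis=0)
    mosaic = np.where(still_cloudy, fill_value, mosaic)
    return mosaic, still_cloudy


# Two tiny, invented 2 x 2 "acquisitions" of the same area.
images = np.array([[[10, 11], [12, 13]],
                   [[20, 21], [22, 23]]])
masks = np.array([[[False, True], [True, True]],
                  [[False, False], [True, False]]])
mosaic, still_cloudy = cloud_free_mosaic(images, masks)
# mosaic is [[10, 21], [0, 23]]; the lower-left pixel is cloudy in both images.
```

Pixels that remain masked in every acquisition are reported separately, so that further images of the area can be added later.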
Emphasis has been placed on making the system able to process all types of images, including images of snow covered areas, where cloud detection using spectral bands alone is extremely difficult because the spectral signatures are similar in all bands. In such situations texture features should be used for cloud detection. Where clouds can be detected without texture features, they should be dropped, since calculating texture features is computationally expensive. Alternatively, texture features could be combined with spectral features for increased accuracy.

In a series of related works, LEE et al. (2) used a back propagation neural network to classify cloud fields using texture measures derived from a single visible channel of Landsat MSS imagery, and WELCH et al. (4) investigated cloud and surface texture features in polar regions. In that study Grey Level Difference Vector (GLDV) and Sum and Difference Histogram Approach (SDHA) texture features were investigated. These texture features are simpler to compute than the traditional Grey Level Co-occurrence Matrix (GLCM) texture features but result in similar accuracies in detecting cloud fields (4). A comprehensive analysis of GLCM, GLDV and SDHA texture features is given in (1), (2), (4).

Detecting Clouds and Shadows
In this study a neural network classifier was selected because it is capable of forming disjoint and complex decision regions in feature space and makes no assumptions about the distributions of the underlying classes. This is very useful when clouds and shadows are treated as one class and everything else as another, since clouds lie at the bright end of the visible and IR bands and shadows at the other extreme. It is necessary to employ an IR band to distinguish between water and cloud, because they tend to have similar spectral signatures in the visible band, especially where there is sun glint.

From the nature of the classification problem it can be seen that the problem is not linearly separable. It is well known that delta rule perceptrons are unable to solve this type of problem, so it was decided to build the classifier on multilayer perceptrons trained with the generalized delta rule.

Since our original scenes did not include high albedo surfaces such as snow, it was decided to limit the input information to two bands of spectral data. Band 2 and band 4 were chosen, since these bands showed the highest contrast on visual inspection. With these constraints and limitations in mind, the first multilayer perceptron with a sigmoid output function was constructed for cloud/shadow detection. This multilayer perceptron had 2 input nodes, 8 nodes in the first hidden layer, 2 nodes in the second hidden layer and one output node. All the neural network simulations were carried out using the software package provided with (3). Training data were obtained from obviously cloud covered areas, from areas covered by cloud shadows, and from different types of land areas and water bodies. The data thus obtained were fed to the neural network and the total error measured in the network was minimized.
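The 2-8-2-1 architecture described above can be sketched as follows in Python with NumPy. This is only an illustration and not the software package of (3). The weight initialisation range, the scaling of the band values to [0, 1], the sample pixel values and the 0.5 decision threshold are all assumptions made for the example.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Inputs (band 2, band 4), hidden layer 1, hidden layer 2, output: from the text.
LAYER_SIZES = [2, 8, 2, 1]

def init_network(rng, sizes=LAYER_SIZES, scale=0.5):
    """Small random weights and biases per layer (the range is an assumption)."""
    return [(rng.uniform(-scale, scale, size=(n_in, n_out)),
             rng.uniform(-scale, scale, size=n_out))
            for n_in, n_out in zip(sizes[:-1], sizes[1:])]

def forward(network, x):
    """Propagate a batch of pixels, shape (N, 2), through the network."""
    activation = np.asarray(x, dtype=float)
    for weights, bias in network:
        activation = sigmoid(activation @ weights + bias)
    return activation  # shape (N, 1), values in (0, 1)

rng = np.random.default_rng(0)
network = init_network(rng)
# Hypothetical band 2 / band 4 digital numbers scaled to [0, 1].
pixels = np.array([[0.9, 0.8], [0.1, 0.05], [0.4, 0.5]])
scores = forward(network, pixels)
# Assumed decision rule: output above 0.5 means "cloud or shadow".
is_cloud_or_shadow = scores[:, 0] > 0.5
```

With untrained random weights the scores carry no meaning; the sketch only shows the layer shapes and the flow of a two-band pixel through the network.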

The learning rate of the network was initially set to 0.3, but it was observed that with real data this learning rate led to oscillation of the network total error. The learning rate was therefore reduced to 0.1, and for some data, where non-decaying oscillatory behavior was still observed, it was further reduced to 0.01.

The main problem encountered here was that the neural network was unable to train itself when given real data from the training areas. The first step towards solving the problem was to collect the data from the training areas through a 3 x 3 pixel averaging filter. The well known law of large numbers in statistics shows that this reduces the variability of the data and hence the variance. The training data thus collected would therefore not be truly representative and could lead to poor classification accuracies. It must be mentioned that even with the averaging filter the neural network was unable to train itself given real data taken from the images.
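The variance reduction produced by a 3 x 3 averaging filter can be checked numerically. The sketch below is illustrative only. It assumes a homogeneous training area modelled as a constant signal plus independent noise, for which the mean of 9 samples has about one ninth of the original variance. The function name `mean_filter_3x3` and the synthetic data are assumptions, not part of the original study.

```python
import numpy as np

def mean_filter_3x3(band):
    """3 x 3 moving average; border pixels are dropped (valid region only)."""
    band = np.asarray(band, dtype=float)
    h, w = band.shape
    total = np.zeros((h - 2, w - 2))
    # Sum the nine shifted copies of the image that make up each 3 x 3 window.
    for dy in range(3):
        for dx in range(3):
            total += band[dy:dy + h - 2, dx:dx + w - 2]
    return total / 9.0

rng = np.random.default_rng(0)
# Hypothetical homogeneous training area: constant signal plus independent noise.
band = 100.0 + rng.normal(0.0, 10.0, size=(200, 200))
smoothed = mean_filter_3x3(band)
variance_before = band.var()
variance_after = smoothed.var()  # roughly variance_before / 9 for independent noise
```

Real image noise is spatially correlated, so the reduction obtained in practice would normally be smaller than the factor of nine seen with independent noise.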

Another phenomenon observed was that the neural network was able to train itself with a small set of training data. But such arbitrary selection of training data would lead to poor classification accuracies, since the data are hardly representative of the dispersion of the data in feature space. The first neural network model, consisting of 2 input nodes, 8 nodes in the first hidden layer, 2 nodes in the second hidden layer and 1 node in the output layer, was therefore trained with a small set of data obtained by taking the mean vector of each training area to represent each class. As mentioned previously, however, these data carried no information about the dispersion in feature space and resulted in poor accuracies.

Since the neural network classifiers were unable to train themselves when given substantial real data from images, attention was shifted to the capacity of neural networks. It was suspected that the number of training patterns was exceeding the capacity of the network. The statistical capacity of an adaptive linear network (i.e. the average number of patterns that an Adaline can learn) is found to be equal to twice the number of weights in the network. The deterministic capacity Cd of a single layer adaptive linear network (i.e. the number of patterns that can be learned with a solution guaranteed) is found to be equal to the number of weights in the network (5).

Little is known about the number of patterns that layered sigmoid networks can learn to classify correctly. A good approximation of the deterministic capacity of such a network is the number of weights Nw divided by the number of output nodes Ny (5).

The current problem is to find a method of training the network while preserving a substantial amount of training data. We should also try to find the reasons for this behavior when training with real data.

Keeping the above estimates in mind, a subset of training data was selected consisting of 28 pixels covering approximately all classes under consideration. These pixels were selected so that the neural network would be able to form decision regions in feature space using their values (pixels on the class boundaries).
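The capacity estimate Nw / Ny can be evaluated for the 2-8-2-1 network as a quick check. The sketch below is illustrative. Whether bias terms are counted among the weights is not stated in the text, so both cases are computed, and the function names are assumptions.

```python
def count_weights(layer_sizes, include_biases=True):
    """Number of adjustable weights in a fully connected layered network."""
    pairs = list(zip(layer_sizes[:-1], layer_sizes[1:]))
    n_weights = sum(n_in * n_out for n_in, n_out in pairs)
    if include_biases:
        # One bias per non-input node.
        n_weights += sum(n_out for _, n_out in pairs)
    return n_weights

def deterministic_capacity(layer_sizes, include_biases=True):
    """Approximate capacity Nw / Ny as quoted in the text from (5)."""
    return count_weights(layer_sizes, include_biases) / layer_sizes[-1]

layer_sizes = [2, 8, 2, 1]  # architecture given in the text
capacity_without_biases = deterministic_capacity(layer_sizes, include_biases=False)  # 34.0
capacity_with_biases = deterministic_capacity(layer_sizes)                           # 45.0
```

Under either count, the 28 selected pixels fall below the estimated capacity, which is consistent with the network converging on this subset.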

The neural network was trained with this training data and after approximately 13000 iterations over the whole data set the network converged. The critical value of error was set to 0.09 and the learning rate was fixed at 0.1. The cloud / shadow detection accuracy of the network is given in the experimental analysis section.
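A training loop of this kind can be sketched as below. This is a minimal illustration and not the package of (3). It assumes batch (whole data set) weight updates, a total error defined as half the sum of squared output errors, zero initial biases and a cap of 2000 iterations. The four training patterns are invented, with bright and dark pixels in one class and intermediate pixels in the other.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def train(x, t, sizes=(2, 8, 2, 1), learning_rate=0.1,
          critical_error=0.09, max_epochs=2000, seed=0):
    """Batch back propagation; returns weights, biases and total error per epoch."""
    rng = np.random.default_rng(seed)
    weights = [rng.uniform(-0.5, 0.5, size=(a, b))
               for a, b in zip(sizes[:-1], sizes[1:])]
    biases = [np.zeros(b) for b in sizes[1:]]
    history = []
    for _ in range(max_epochs):
        # Forward pass, keeping the activation of every layer.
        activations = [x]
        for w, b in zip(weights, biases):
            activations.append(sigmoid(activations[-1] @ w + b))
        error = activations[-1] - t
        total_error = 0.5 * float(np.sum(error ** 2))
        history.append(total_error)
        if total_error < critical_error:
            break  # converged: total error is below the critical value
        # Backward pass: delta is dE/d(net input) of the current layer.
        delta = error * activations[-1] * (1.0 - activations[-1])
        for layer in reversed(range(len(weights))):
            grad_w = activations[layer].T @ delta
            grad_b = delta.sum(axis=0)
            if layer > 0:
                # Propagate delta back before this layer's weights change.
                delta = (delta @ weights[layer].T) \
                    * activations[layer] * (1.0 - activations[layer])
            weights[layer] -= learning_rate * grad_w
            biases[layer] -= learning_rate * grad_b
    return weights, biases, history

# Invented patterns: cloud (bright) and shadow (dark) = 1, land or water = 0.
x = np.array([[0.9, 0.9], [0.1, 0.1], [0.5, 0.5], [0.4, 0.6]])
t = np.array([[1.0], [1.0], [0.0], [0.0]])
weights, biases, history = train(x, t)
```

The `history` list makes oscillation of the total error easy to spot when the learning rate is too large; reaching the critical error of 0.09 may need far more iterations than the cap used here.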

At this point attention was shifted to extracting more features from the images, and an effort was made to extract texture features. The Grey Level Co-occurrence Matrix (GLCM) method of HARALICK et al. (1973) (1) was chosen for preliminary texture analysis, and three second order statistics were calculated from the co-occurrence matrices: angular second moment (ASM), contrast (CON) and inverse difference moment (IDM). The texture features were calculated for individual areas of known classes. No specific window size was used, and this study was conducted as a preliminary investigation into the possibility of using texture features for cloud and shadow detection. The GLCM texture measures are described in (1). They were investigated because they have been the basis for texture measures such as GLDV and SDHA. The results of the texture analysis are discussed in the experimental analysis section which follows.
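The three GLCM statistics can be computed as in the following sketch, which uses the standard definitions from (1). The displacement of one pixel to the right, the quantisation to 8 grey levels, the symmetric counting of pairs and the test patches are assumptions chosen for illustration; the study does not state the settings it used.

```python
import numpy as np

def glcm(image, dx=1, dy=0, levels=8):
    """Symmetric, normalised co-occurrence matrix for displacement (dx, dy)."""
    image = np.asarray(image, dtype=int)
    h, w = image.shape
    # Pair every pixel with its neighbour at offset (dy, dx); dx, dy >= 0 assumed.
    first = image[:h - dy, :w - dx].ravel()
    second = image[dy:, dx:].ravel()
    counts = np.zeros((levels, levels))
    np.add.at(counts, (first, second), 1)
    counts += counts.T  # count each pair in both directions
    return counts / counts.sum()

def texture_features(p):
    """Return (ASM, CON, IDM) for a normalised co-occurrence matrix p."""
    i, j = np.indices(p.shape)
    asm = float(np.sum(p ** 2))                    # angular second moment
    con = float(np.sum((i - j) ** 2 * p))          # contrast
    idm = float(np.sum(p / (1.0 + (i - j) ** 2)))  # inverse difference moment
    return asm, con, idm

uniform = np.full((4, 4), 3)                 # perfectly homogeneous patch
checker = np.indices((4, 4)).sum(axis=0) % 2  # alternating 0 / 1 pattern
uniform_features = texture_features(glcm(uniform))  # (1.0, 0.0, 1.0)
checker_features = texture_features(glcm(checker))  # (0.5, 1.0, 0.5)
```

A homogeneous patch gives the maximum ASM and IDM and zero contrast, which matches the intuition that uniform areas score high on ASM and IDM.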

Experimental Analysis
An accuracy assessment of the neural network classifier was carried out under the assumption that visual identification is 100% correct. The assessment shows that bright thick clouds and dark shadows can be identified with high accuracy, but thin, light clouds and cumulus cannot. Some water bodies are also misclassified as clouds. All these shortcomings can be attributed to the poor training of the classifier.

Table 3.1 Accuracy analysis of the classifier

Table 3.2 Texture features computed using GLCM

It is observed that using only the above features it is possible to discriminate between all the classes investigated. It is also observed that areas covered by shadows are very homogeneous, as are some bright clouds, although their mean values differ.

Further work to be done
The possibility of employing texture features for cloud detection should be investigated, since employing only spectral features makes the decision boundaries dependent on the shade of the training data, and finding suitable data is a problem. The effect of this is evident as reduced accuracy over thin clouds and light shadows. Also, the spectral signatures of cloud and high albedo surfaces such as snow are quite similar in all visible and infrared bands. Since the GLCM texture features are complex to calculate, simpler texture features (GLDV or SDHA) that are capable of discriminating between surface and cloud should be investigated. Such measures should be incorporated in a suitable neural network classifier. Experiments should be carried out using data of cloud and snow covered areas.
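To indicate why GLDV features are cheaper than GLCM features, the sketch below computes a grey level difference vector and four commonly used statistics from it. Only a vector of length `levels` is accumulated instead of a `levels` x `levels` matrix. The displacement, the number of grey levels, the choice of statistics and the sample row of pixels are assumptions for illustration and are not taken from this study.

```python
import numpy as np

def gldv(image, dx=1, dy=0, levels=8):
    """Normalised histogram of absolute grey level differences at (dx, dy)."""
    image = np.asarray(image, dtype=int)
    h, w = image.shape
    # Absolute difference between each pixel and its neighbour; dx, dy >= 0 assumed.
    diff = np.abs(image[:h - dy, :w - dx] - image[dy:, dx:]).ravel()
    hist = np.bincount(diff, minlength=levels).astype(float)
    return hist / hist.sum()

def gldv_features(p):
    """Mean, contrast, angular second moment and entropy of a difference vector."""
    m = np.arange(p.size)
    nonzero = p > 0  # skip empty bins so that log() is defined
    return {
        "mean": float(np.sum(m * p)),
        "contrast": float(np.sum(m ** 2 * p)),
        "asm": float(np.sum(p ** 2)),
        "entropy": float(-np.sum(p[nonzero] * np.log(p[nonzero]))),
    }

# Invented one-row patch: horizontal differences are 1, 2 and 0.
patch = np.array([[0, 1, 3, 3]])
features = gldv_features(gldv(patch))
# mean = 1, contrast = 5/3, asm = 1/3, entropy = ln 3
```

The resulting feature values could be appended to the spectral inputs of a neural network classifier, in line with the combination of texture and spectral features suggested in the introduction.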

No work has yet been done on registering the images or on generating mosaic images. The possibility of using automatic registration algorithms should be investigated. It is also necessary to develop intensity matching algorithms to improve the quality of the mosaic images.

The authors wish to thank Mr. Kazuo Joko, Director for Co-research and Coordination in Bangkok, National Space Agency of Japan and the National Research Council of Thailand for providing the raw images and for the assistance provided.

References
  1. HARALICK ROBERT M., SHANMUGAM K., DINSTEIN ITS'HAK (1973), Textural features for image classification, IEEE Transactions on Systems, Man and Cybernetics, Vol. SMC-3, pp. 610-621.
  2. LEE J., WEGER R.C., SENGUPTA S.K., WELCH R.M. (1990), A neural network approach to cloud classification, IEEE Transactions on Geoscience and Remote Sensing, Vol. 28, No. 5, pp. 846-855.
  3. McCLELLAND JAMES L., RUMELHART DAVID E. (1988), Explorations in parallel distributed processing, MIT Press, Cambridge, MA.
  4. WELCH R.M., KUO KWO-SEN, SENGUPTA S.K. (1990), Cloud and surface textural features in polar regions, IEEE Transactions on Geoscience and Remote Sensing, Vol. 28, No. 4, pp. 520-528.
  5. WIDROW BERNARD, LEHR MICHAEL A. (1990), 30 years of adaptive neural networks: Perceptron, Madaline, and Backpropagation, Proceedings of the IEEE, Vol. 78, No. 9, pp. 1415-1442.