GISdevelopment.net ---> AARS ---> ACRS 1990 ---> Digital Image Processing

Development of a NOAA image database with feature-based retrieval functions

Changming Zhou and Mikio Takagi
Institute of Industrial Science
University of Tokyo, Tokyo, Japan


Abstract
In this paper, the NOAA-AVHRR image database system being developed in our laboratory is presented. Some new image retrieval approaches based on image features are included in this system. After one scene of a NOAA image is received, an automatic classification method proposed in this paper is applied to a 2048 pixels x 1960 lines region around Japan, which usually arouses users' interest. The classified image is then processed with region-labeling and boundary-following methods. Each labeled region is represented using chain codes and stored in the system. The spatial relations between regions in the same scene are described by a syntactic pattern recognition approach, i.e., an edge-labeled directed node-label controlled graph (edNLC-graph) based on the center-of-mass of each region. In addition to a menu-driven user interface, the system provides its users with two kinds of guide images, and users may use a mouse tool to specify the objective region and conditions on image contents (e.g., with or without clouds, the shapes of clouds, etc.) on a display instrument, and retrieve images in the following two steps: (1) global retrieval based on the spatial relations between regions represented by the edNLC-graph, and (2) similarity retrieval based on the geometric properties of the dominant regions represented by chain codes. In addition, in order to improve the retrieval speed, and since information about cloud-covered regions is the main retrieval clue usually given by users, bit-based operation functions of a general-purpose image processor are applied to an index image generated from the original images to pre-select candidate images before the above two-step retrieval processing.

Introduction
In most of the remotely sensed image database systems developed up to now, images are recalled mainly by attribute information attached to the archived images, such as image identity, sensor name, etc. Nowadays, however, because remotely sensed images are integrated into geographic information systems (GIS) together with various maps and other data, more effective retrieval approaches, for instance retrieval methods based on image contents, are required. The development of such retrieval methods has become a key subject of image databases and GIS.

In the case of NOAA images, considerable receiving, archiving and processing systems have been developed at many ground stations [1][3]. There exist a few systems that distribute a kind of abstract image called a quick-look and provide users with visual inspection after images are received [1][4]. In [1], raw images are classified into a few classes such as land, sea, etc., denoted as hatched patterns, with a simple classification method based on some empirically obtained thresholds. In [4], 10-bit original images are reduced to 8-bit images with a size of 512 x 480 pixels; moreover, the reduced images are transformed into dither images with 64 levels using a general-purpose image processor and delivered via facsimile to other universities and research organizations immediately after the images are received. In all the systems mentioned above, images can be recalled by using acquisition time and some geographic parameters (longitude, latitude, etc.); however, retrieval approaches based on image features are not offered.

NOAA-AVHRR (Advanced Very High Resolution Radiometer) images are widely used in many fields. About 4-8 scenes per day can be received from the two NOAA satellites (at present, NOAA-10 and NOAA-11 are available). Like other remotely sensed data, NOAA-AVHRR images possess the characteristics of wide coverage, frequent observation and vast quantities, and are utilized for monitoring and observing the environment of the Earth. Consequently, users usually access those images that are received under some special conditions and possess certain features. This kind of access cannot be realized only with the attribute data of images managed by a conventional database management system, e.g. a relational database management system.

The system presented in this paper is developed mainly to provide users not only with conventional retrieval approaches but also with those based on image features. Because processing such as geometric distortion correction and sensor calibration of NOAA-AVHRR images is very time-consuming and space-occupying, only raw NOAA data are handled in this system. Structural and syntactic pattern recognition methods, iconic indexing and similarity retrieval approaches are introduced into this system for feature extraction, description, representation and feature-based retrieval of NOAA-AVHRR images.

Feature Extraction of NOAA-AVHRR Images
In order to extract features of NOAA-AVHRR images, first of all we must classify the raw images. The classification algorithm applied in this system must be fast, and absolute accuracy is less crucial because, essentially, the classified images are only required to support global feature description. An automatic classification method has been proposed in [1][2], but it mainly utilizes empirically obtained absolute brightness temperature thresholds and cannot be applied to this system, because we use uncalibrated data and the temperature thresholds are inapplicable to our case due to regional differences. Because of the huge quantities of NOAA-AVHRR images, the classification techniques proposed so far that are fundamentally based on pixel-by-pixel processing are not suitable in this case. We therefore select histogram-based approaches to classify NOAA-AVHRR images. Many histogram-based approaches have been proposed for image binarization, and even though most of them can be extended to multivalue quantization, the class number must be specified in advance. For NOAA-AVHRR images, however, the class number is not invariant because of the variation of images under different weather conditions. Consequently, a peak detection technique is selected to determine the class number and the thresholds.
  1. Peak Detection Method Based on the Histograms
    The peak detection technique proposed in [5][6] uses the image cumulative distribution function (cdf) to locate the peaks of the histogram. The peaks are located using the zero-crossings and local extrema of a peak detection signal generated from the cdf. For an image represented by M gray levels, the cdf c(n) can be derived from the gray-level histogram, and from c(n) a new function cN(n) can be obtained by (1).


    cN(n) = c(n) * wN(n) .......................(1)

    where * denotes the convolution operation, and the uniform rectangular window wN is defined as in (2).

    wN(m) = 1/N, -(N-1)/2 ≤ m ≤ (N-1)/2 .......................(2)

    Then a peak detection signal rN(n) can be defined as in (3).

    rN(n) = c(n) - cN(n) .......................(3)

    The following principles are applied to the detection signal rN to estimate the start, maximum and end points of the peaks: (1) A zero-crossing of the detection signal to negative values indicates the start of a peak, denoted by si for the ith one. A zero-crossing of the detection signal to positive values following a negative crossover estimates the gray level at which the peak attains its maximum, and this gray level is denoted by mi. Similarly, si+1 and mi+1 can be obtained. (2) The gray level between two successive negative crossovers at which the detection signal attains its local maximum is defined to be the end point of the peak. For the ith peak, this gray level is denoted by ei. One peak will be represented by these three parameters (si, mi, ei) later in this paper.

    Obviously, the sensitivity of the above peak detection signal depends on the parameter N in (2), which is referred to as the peak-detection parameter. When this technique is applied to real image histograms, e.g. NOAA-AVHRR image histograms, it is very difficult to determine the value of N because the detected peaks vary with different N. We therefore propose an adjustment method to derive the optimal number of peaks regardless of the specified parameter N. The adjustment method utilizes the square of the Fisher distance (FD2) shown in (4) and the Mahalanobis generalized distance (MD) to check two successive peaks under the hypothesis of a Gaussian-like distribution (i.e. a bimodality check).

    FD2 = n(m1 - m2)2 / (n1s12 + n2s22) .......................(4)

    where n1, n2 are the sample numbers of the two distributions, respectively, and correspondingly m1, m2 and s1, s2 are their means and standard deviations.

    As pointed out in [7], FD2 attains its maximum at the point farthest from the center of a Gaussian distribution. We calculate FD2 and MD for two successive peaks denoted by (si, mi, ei) and (si+1, mi+1, ei+1), respectively. If FD2 attains its maximum outside (mi, mi+1), then we combine the two peaks into a cluster denoted by (si, max(mi, mi+1), ei+1) and repeat this process until no peaks can be combined. MD is used to calculate the percentages of the two peaks as another adjustment criterion to avoid noise and exclude very small peaks.
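    The bimodality check can be sketched as follows. This is an illustrative reconstruction under the printed form of eq. (4): FD2 is swept over candidate split points t of the combined gray-level range, and two peaks are merged when the maximizing t falls outside (mi, mi+1). The MD-based percentage criterion is omitted here.

```python
import numpy as np

def merge_peaks(hist, peaks):
    # FD^2 of eq. (4) for splitting hist[lo..hi] at gray level t.
    def fd2(lo, hi, t):
        x = np.arange(lo, hi + 1)
        h = hist[lo:hi + 1].astype(float)
        hl, xl = h[:t - lo + 1], x[:t - lo + 1]
        hr, xr = h[t - lo + 1:], x[t - lo + 1:]
        n1, n2 = hl.sum(), hr.sum()
        if n1 == 0 or n2 == 0:
            return 0.0
        m1, m2 = (hl * xl).sum() / n1, (hr * xr).sum() / n2
        v1 = (hl * (xl - m1) ** 2).sum() / n1
        v2 = (hr * (xr - m2) ** 2).sum() / n2
        denom = n1 * v1 + n2 * v2
        return (n1 + n2) * (m1 - m2) ** 2 / denom if denom > 0 else 0.0

    merged = list(peaks)
    i = 0
    while i + 1 < len(merged):
        (s1, m1, e1), (s2, m2, e2) = merged[i], merged[i + 1]
        # Gray level at which FD^2 over the combined range is maximal.
        best_t = max(range(s1 + 1, e2), key=lambda t: fd2(s1, e2, t))
        if m1 < best_t < m2:
            i += 1                    # genuinely bimodal: keep both peaks
        else:
            # FD^2 maximal outside (m_i, m_{i+1}): combine into one
            # cluster (s_i, max(m_i, m_{i+1}), e_{i+1}).
            merged[i:i + 2] = [(s1, max(m1, m2), e2)]
    return merged
```

    For two well-separated modes the maximizing split point lies in the valley between them, so the peaks are kept apart; when the two mode estimates sit away from that valley, the peaks are combined.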

  2. Classification of NOAA-AVHRR Images

    Fig.1 The flowchart of classification

    The flowchart of the classification processing is shown in Fig. 1. First, gray-level histograms are generated after NOAA-AVHRR images are received. Second, in order to obtain the optimal class number on the basis of the histograms, we apply the above-mentioned peak detection method and adjustment method to the NOAA-AVHRR image histograms. Then thresholds are determined from the parameters (si, mi, ei) of the detected peaks by equation (5).

    ti = int[m·ei + (1 - m)·si+1] ........................(5)

    where ti is the threshold, 0 ≤ m ≤ 1, and int[·] denotes the nearest-integer truncation operation. In this paper, we use m = 0.2 after studying the gray-level histograms of many NOAA-AVHRR images received in different seasons. Finally the raw images are classified with the thresholds, using a rule base formed by studying many examples, and categories determined in the following way.
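    Eq. (5) places each threshold between the end of one peak and the start of the next; a minimal sketch, with nearest-integer truncation approximated by round():

```python
def thresholds(peaks, m=0.2):
    # t_i = int[ m*e_i + (1 - m)*s_{i+1} ], eq. (5), for each pair of
    # successive peaks (s_i, m_i, e_i), (s_{i+1}, m_{i+1}, e_{i+1}).
    return [int(round(m * e_i + (1 - m) * peaks[i + 1][0]))
            for i, (s_i, m_i, e_i) in enumerate(peaks[:-1])]
```

    With the three adjusted peaks of the example discussed later (the mode of the third peak is not recoverable from the text, so 900 below is a placeholder), thresholds([(177, 389, 406), (615, 702, 766), (767, 900, 1024)]) gives [573, 767].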

    Categories such as land, cloud, sea and sunglint used in [1][2] are applicable to this system. After checking many image histograms, we find that two peaks corresponding to clouds usually appear in the histogram. Correspondingly, we classify clouds into thick and thin ones, if possible. A few rules are applied to the finally obtained peaks to determine the corresponding relations between peaks and categories. Moreover, we utilize coastline pictures with the same geometric distortion as the received images, which are overlaid on the quick-look images in [4], to locate land areas even when they are covered with clouds. Therefore, we can represent regions such as land areas occluded by thick or thin clouds with the description and representation methods described later in this paper. The classified results are iconic images with a size of 128 x 120 pixels. Classification is done in two ways according to the acquisition time of the objective NOAA-AVHRR image. Images are grouped into day-time and night-time ones according to whether the data of the visible and near-infrared channels are informative or not.

    In the case of day-time images, the data of channels 1, 2 and 4 are used for classification. The peak detection method described above is applied to the data of channel 4, and in order to locate cloud-free continent areas, which are difficult to extract only by gray-level histograms, the normalized difference vegetation index determined by (6)

    VI = (ch.2 - ch.1) / (ch.2 + ch.1) ...............(6)

    is calculated to distinguish cloud-free land from the others (sea, cloud) with a threshold of 0.4 as in [2]. In the case of night-time images, only the data of channel 4 are used, and the classification based on the detected peaks is applicable. Fig.2 shows an original NOAA-AVHRR image (ch.4) received at 14:00 on August 5, 1990, and its histogram is shown in Fig.3. The peak detection signal expressed by (3) is shown in Fig.4. In this case, the peak detection parameter N equals 77, the number of peaks is 8, and it is reduced to 3 after adjustment. The first two of these peaks are (177, 389, 406) and (615, 702, 766), and the third one begins at gray level 767 and ends at 1024. The classified iconic image of Fig.2, obtained using the method described above, is illustrated in Fig.5.
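    The day-time land test of eq. (6) amounts to a per-pixel NDVI threshold. A minimal sketch, assuming the channels arrive as non-negative count arrays and using the 0.4 threshold of [2]:

```python
import numpy as np

def cloud_free_land(ch1, ch2, t=0.4):
    # VI = (ch.2 - ch.1) / (ch.2 + ch.1), eq. (6); pixels with VI > t
    # are taken as cloud-free land, the rest as sea or cloud.
    ch1 = ch1.astype(float)
    ch2 = ch2.astype(float)
    s = ch1 + ch2
    vi = np.divide(ch2 - ch1, s, out=np.zeros_like(s), where=s > 0)
    return vi > t
```

    Vegetated land reflects much more strongly in the near-infrared (ch.2) than in the visible (ch.1), which is why a large positive VI separates it from sea and cloud.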



Fig.2: An original NOAA-AVHRR image(ch.4)


Fig.3: The histogram of fig.2


Fig.4: The peak detection signal for fig.3


Fig.5: The classified iconic image of Fig.2


Description and representation methods for image features
In order to realize the desired retrieval approaches based on image features, it is necessary to describe and represent image features effectively. Some research on content-based image retrieval approaches, such as [8][9], has been carried out so far. In this paper, since the shapes of regions and the spatial relations between regions are important clues for recalling images on the basis of features, two kinds of description methods (chain codes and the edNLC-graph), corresponding to shapes and spatial relations, respectively, are applied to representing NOAA-AVHRR image features.
  1. Description with Chain-Codes
    Although many descriptors for the shapes of patterns (e.g. the chain-code, the Fourier descriptor, the Walsh descriptor, etc.) have been proposed up to now, the chain-code is one of the most effective descriptions of shape. As pointed out in [11], the chain-code is, in a database environment, a better representation than transform-based descriptors such as the Fourier or Walsh descriptors.

    The labeling and boundary-following approaches for binary images are applied to the iconic images obtained from the original images, and the results are utilized to describe and represent the image features. First, we perform propagation labeling on the basis of the marks assigned to the different classes, using eight-direction codes. Second, a recursive method for boundary following is applied to the labeled images. Finally, the regions resulting from the above labeling and boundary-following processing are represented by their boundary chain codes; the number of pixels within each region and the corresponding category of each region are also derived. In addition, the adjacency matrix of all regions described by (7) is generated for the later graph description.

    A = [aij], (i, j = 1, 2, ....., m) ..........................(7)

    where m denotes the number of regions, and aij equals the number of pixels located on the common boundary between regions i and j. If the two regions are not neighboring, aij is set to zero.
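    Under this definition, A can be built by counting 4-neighbour label transitions in the labeled iconic image; a sketch, assuming the labels run from 1 to m:

```python
import numpy as np

def adjacency_matrix(labels):
    # Adjacency matrix A = [a_ij] of eq. (7): a_ij counts the pixel
    # pairs on the common boundary of regions i and j (labels 1..m),
    # and stays 0 for non-neighboring regions.
    m = int(labels.max())
    A = np.zeros((m, m), dtype=int)
    # Scan horizontal and vertical neighbour pairs with different labels.
    for a, b in [(labels[:, :-1], labels[:, 1:]),
                 (labels[:-1, :], labels[1:, :])]:
        diff = a != b
        for i, j in zip(a[diff].ravel(), b[diff].ravel()):
            A[i - 1, j - 1] += 1
            A[j - 1, i - 1] += 1
    return A
```

    The matrix is symmetric by construction, which is what the later graph-indexing step needs when it looks up the neighbours of a node.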

  2. Description with edNLC-graph
    Spatial relations between regions are represented by the edNLC-graph (edge-labeled directed node-label controlled graph). Each region is considered as a node of the edNLC-graph, and the position of each node is represented by the coordinates of its center-of-mass, which are obtained by averaging the x and y coordinates of all boundary points.

    An extended form of the edNLC-graph can be defined by (8) as in [10].

    G = (V, E, Σ, Γ, Φ) ..................................(8)

    where V is a finite, non-empty set of nodes, Σ is a finite, non-empty set of node labels, Γ is a finite, non-empty set of edge labels, E is a set of edges of the form (n, l, w), where n, w ∈ V and l ∈ Γ, and Φ: V → Σ is a node labeling function.

    The set Γ may be considered as a family of non-symmetric binary relations. It means that there exists an edge label l⁻¹ for each edge label l, such that the edges (n, l, w) and (w, l⁻¹, n) describe the same spatial relation between the regions represented by nodes n and w.

    Now, we introduce a relation of simple ordering ≤ on the set of edge labels Γ = {γ1, ....., γn | γ1 ≤ ...... ≤ γn} as in [10], so as to construct an unambiguous string representation of a graph. To represent NOAA-AVHRR images according to the coordinates of the center-of-mass of each region, we use the set of edge labels describing spatial relations in a two-dimensional space which is illustrated in Fig. 6 and ordered:

    p ≤ r ≤ s ≤ t ≤ u ≤ v ≤ x ≤ y

    In this paper, a characteristic description of a node nk is defined as a sevenfold set (nk, c, p, (i1, ..., ir), (o1, ..., oq), (ir1, ..., irr), (or1, ..., orq)), where c is the category and p is the pixel number of this node, r and q are the numbers of edges coming into and going out from this node, respectively, and (i1, ..., ir), (o1, ..., oq) and (ir1, ..., irr), (or1, ..., orq) are the indices and relation strings of the nodes coming into and going out from this node, respectively.


    Fig.6: The ordered set of edge labels


    Fig.7: The graph representation of fig.5


    An algorithm of transformation of edNLC-graph into a form of characteristic description is briefly described as follows.

    (1) Let an image consisting of n regions k1, ..., kn, described by coordinates (x1, y1), ..., (xn, yn), be represented by an edNLC-graph G. A node va ∈ G corresponding to a region ka is called an S-node if:


    Eq.(9)

    (2) We start from the S-node n0 and index it with 1. (3) We index all the nodes adjacent to n0, with the help of the relation ≤ in the set of labels of the edges connecting the node n0 with the adjacent nodes, by referring to the above adjacency matrix, in increasing order: i = 2, ..., k. (4) Next, we successively choose the nodes indexed i = 2, ..., k, index all the nodes which are adjacent to them and which have not been indexed up to this moment, and repeat this step for all the nodes.
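    Steps (2)-(4) are essentially a breadth-first traversal that visits neighbours in increasing edge-label order. A sketch, with the adjacency given as label/neighbour lists rather than as the matrix (the dictionary layout is our assumption):

```python
from collections import deque

# Ordered edge labels of Fig. 6: p <= r <= s <= t <= u <= v <= x <= y.
ORDER = {label: rank for rank, label in enumerate("prstuvxy")}

def index_nodes(start, edges):
    # Index the nodes of an edNLC-graph, starting from the S-node and
    # visiting neighbours in increasing order of their edge labels.
    # `edges` maps a node to a list of (edge_label, neighbour) pairs.
    index = {start: 1}
    queue = deque([start])
    while queue:
        n = queue.popleft()
        for label, w in sorted(edges.get(n, []), key=lambda e: ORDER[e[0]]):
            if w not in index:
                index[w] = len(index) + 1
                queue.append(w)
    return index
```

    Because the neighbour lists are sorted by edge label before indexing, the resulting numbering, and hence the string description derived from it, is unambiguous for a given graph.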

    The image shown in Fig.5 can be represented by the edNLC-graph shown in Fig.7, and the characteristic descriptions of the nodes with over 20 pixels in the classified image are illustrated in Table 1.

  3. Indexing of Cloud-Covered Regions
    In order to speed up retrieval processing, and since information about cloud-covered regions is one of the main clues usually given by users, bit-based operation functions of a general-purpose image processor are applied to an index image, generated according to the positions of the regions covered with clouds, to pre-select candidate images. Each image is divided into 16 (4 x 4) blocks with a size of 512 x 480 pixels. One 2-byte (16-bit) pixel of the index image is used to describe the cloud information of one raw indexed image, and each bit of the pixel corresponds to one of the 16 blocks. If a block is fully covered with cloud, the value of the corresponding bit is set to 1, otherwise to 0.
Table.1: The characteristic description of Fig.7
Node no. Class # of pixels In_node Out_node In_relation Out_relation
1 2 38   2,3   rs
2 0 3332 1 3,4,5,6,7,8 v urttuu
3 4 3924 1,2 6,7,9,10,11,12 xr rvssss
4 2 1741 2 5,11,13,14,15,16 v vtpstt
5 1 212 2,4 6,8,11 yr txs
6 2 256 2,3,5 11 yvy s
7 2 25 2,3   pr  
8 2 23 2,5   ps  
9 2 87 3 11 x p
10 0 31 3 12 x p
11 0 2566 3,4,5,6,9 12,14,15,16,17,18,19,20 xyxxu sypyprru
12 2 50 3,10,11   xux  
13 0 30 4   u  
14 1 1906 4,11 17,21 xt st
15 1 24 4,11   yu  
16 4 64 4,11   yt  
17 2 58 11,14   ux  
18 2 26 11   v  
19 4 41 11   v  
20 2 84 11   p  
21 2 42 14   y  

Note: classes 0, 1, 2 and 4 denote sea, thick clouds, thin clouds and land, respectively.

Image Retrieval
  1. User Interface
    Besides the popular menu-driven user interface, a kind of pictorial user interface is also offered in this system; namely, users can interactively retrieve images in the form of query-by-pictorial-example. The system provides users with two kinds of guide images, which are composed of a global map, as shown in Fig.8, and a local one. The global one indicates the entire coverage within which data from the NOAA satellites can be received at our ground station, and the local one covers the regions around the islands of Japan and is included in the global map, although it is not shown separately here. Users may use one of the guide maps, or both of them if necessary, to compose a sketch image by using the mouse tool of a display instrument. In addition, pictorial examples from the image database or a received image can also be utilized for retrieving similar images.


    Fig.8 The global guide map


  2. Pre-Selection of Images Based on the Index Image
    When users want to retrieve images based on information about cloud-covered regions, the system puts the index image into the image memory of a general-purpose image processor and utilizes its bit-based operations to process the index image and select candidate images according to the global cloud-covering information. This kind of processing is very fast; as the capacity of the image memory in this system is 512 x 512 x 4 bytes, the global cloud information of 512 x 512 images can be processed simultaneously. In addition, the maximum elevation and azimuth of the NOAA satellites at the acquisition time are also used to estimate approximately which image includes the desired region and which block in the image is near the desired region.
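    The same pre-selection can be expressed with ordinary integer bit operations. A sketch, assuming one 16-bit word per image with bit k set when block k is fully cloud-covered; the query-mask semantics (required-clear and required-cloudy blocks) are our illustrative assumption:

```python
def block_index(cloud_blocks):
    # Pack the 16 per-block cloud flags of one image into a 16-bit word.
    word = 0
    for k, covered in enumerate(cloud_blocks):
        if covered:
            word |= 1 << k
    return word

def preselect(indices, wanted_clear, wanted_cloudy):
    # Keep image i when all blocks in `wanted_clear` are cloud-free and
    # all blocks in `wanted_cloudy` are covered, using only bitwise tests
    # of the kind an image processor applies to the index image.
    return [i for i, w in enumerate(indices)
            if (w & wanted_clear) == 0 and (w & wanted_cloudy) == wanted_cloudy]
```

    Because each image contributes a single machine word, a whole archive's cloud layout can be screened with a handful of AND/compare operations per image, leaving the expensive graph and chain-code matching for the few survivors.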

  3. Retrieval Model
    A new model for image retrieval proposed in this paper is applied to our system. This retrieval model is hierarchical and is composed of two parts, which are based on the structural and syntactic methods widely used in the field of pattern recognition. The first part is based on the edNLC-graph representation of images and is utilized to retrieve images from the viewpoint of the spatial relations between regions. The second part is based on the chain-code description of regions and is utilized to recall those images in which the dominant regions or desired regions are similar in geometric properties.

  4. Retrieval based on the spatial relations
    As the spatial relations between the regions within one scene are represented by the edNLC-graph, graph grammars and parsing algorithms must be utilized to analyze the similarity of such graphs for image retrieval. The similarity is a kind of distance measure between graphs. We use the string description of each graph, as shown in Table 1, to calculate the distances between graphs, and the distances can be used to evaluate the candidates and determine the image(s) satisfying the request from users by performing inexact matching.

  5. Retrieval Based on Pattern Matching
    The shapes of some dominant patterns within an image are also an important retrieval clue for image databases. In the case of remotely sensed images, much information can be derived from the shapes of clouds; for instance, precipitation can be estimated from the shapes of clouds, and we can analyze the weather and atmospheric conditions, because they are mainly dominated by clouds.

    Instead of chain-codes, we use a curvature chain whose elements ci are defined from the chain code elements di by (10) as in [12].

    ci = [(di - di-1 + 11) mod 8] - 3 .............................(10)

    Compared with chain-codes, this curvature chain is rotation invariant and independent of the start and end points. A similarity evaluation method based on the geometric properties of the corresponding regions is applied in the system. This method consists of finding the longest common subsequence (LCS) of the above curvature chain strings and evaluating the similarity of regions based on the LCSs. The method is only applied to the dominant regions or desired regions, because it is computationally expensive. It is similar to the one used in [11], in which the longest common subsequence method of string matching is applied to the chain-code representation of images.
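    As an illustration, eq. (10) and an LCS-based similarity score can be sketched as follows; the normalization 2·LCS/(|a|+|b|) is our choice for the example, not necessarily the system's:

```python
def curvature_chain(chain):
    # c_i = ((d_i - d_{i-1} + 11) mod 8) - 3, eq. (10): differences of
    # successive 8-direction codes, so the result is rotation invariant.
    return [((chain[i] - chain[i - 1] + 11) % 8) - 3
            for i in range(1, len(chain))]

def lcs_length(a, b):
    # Longest common subsequence by dynamic programming.
    dp = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i, u in enumerate(a, 1):
        for j, v in enumerate(b, 1):
            dp[i][j] = dp[i - 1][j - 1] + 1 if u == v else max(dp[i - 1][j], dp[i][j - 1])
    return dp[len(a)][len(b)]

def similarity(chain_a, chain_b):
    # Score in [0, 1]: 1.0 when the curvature chains match completely.
    ca, cb = curvature_chain(chain_a), curvature_chain(chain_b)
    return 2.0 * lcs_length(ca, cb) / (len(ca) + len(cb)) if ca and cb else 0.0
```

    A square traced as [0, 6, 4, 2] and the same square rotated, [2, 0, 6, 4], produce the identical curvature chain [-2, -2, -2], so their similarity is 1.0; the O(|a|·|b|) cost of the LCS step is why the paper restricts this matching to the dominant regions.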
Concluding Remarks
In this paper we have described the configuration of a NOAA image database system being developed at our laboratory, and a new image retrieval model in which structural and syntactic methods of pattern recognition are integrated. Owing to the implementation of this new image retrieval model, images stored in the system can be recalled not only by attribute information in alpha-numerical form but also by sketch images based on image features. In this sense, the development of this system is very significant for the construction of multi-media databases, which attract many database researchers' attention at present, as well as for its practical use in the remote sensing field.

The system configuration and the proposed approaches of classification, feature extraction, feature description and representation, and image retrieval have been presented. In the future, we will investigate the structural and syntactic similarity between images using the proposed image retrieval model from the viewpoints of NOAA-AVHRR image features and human perception.

References
  • L.K. Fusco, et al., Earthnet's coordination scheme for AVHRR data, Int. J. Remote Sensing, Vol. 10, pp. 625-636, 1989.
  • K. Muirhead, O. Malkawi, Automatic classification of AVHRR images, Proceedings of the 4th AVHRR Data Users' Meeting, pp. 31-34.
  • H. Murota, et al., Receiving and Processing System for the Meteorological Satellite (NOAA), Proceedings of the 8th Asian Conference on Remote Sensing, 1987.
  • M. Nakayama, et al., Quicklook Images Distribution System for the Meteorological Satellite (NOAA), IE87-89 (in Japanese).
  • M.I. Sezan, A Peak Detection Algorithm and Its Application to Histogram-Based Image Data Reduction, Computer Vision, Graphics, and Image Processing, pp. 36-51, 1990.
  • M.I. Sezan, et al., Automatic Anatomically Selective Image Enhancement in Digital Chest Radiography, IEEE Transactions on Medical Imaging, Vol. 8, No. 2, pp. 154-162, June 1989.
  • T.Y. Phillips, et al., An O(log n) Bimodality Analysis, Pattern Recognition, pp. 741-746, 1989.
  • A. Yamamoto, M. Takagi, Extraction of Object Features and Its Application to Image Retrieval, Transactions of the IEICE, Vol. E72, No. 6, pp. 771-781, 1989.
  • S.K. Chang, et al., Iconic Indexing by 2-D Strings, IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. PAMI-9, No. 3, pp. 413-428, May 1987.
  • M. Flasinski, Parsing of edNLC-graph grammars for scene analysis, Pattern Recognition, Vol. 21, No. 6, pp. 623-629, 1988.
  • W.I. Grosky, Y. Lu, Iconic Indexing Using Generalized Pattern Matching Techniques, Computer Vision, Graphics, and Image Processing, Vol. 35, pp. 383-403, 1986.
  • M.J. Eccles, et al., Analysis of the digitized boundaries of planar objects, Pattern Recognition, Vol. 9, pp. 31-41, 1977.