GISdevelopment.net ---> AARS ---> ACRS 1989 ---> Digital Image Processing

The development of an interactive decision boundary determination method in the feature space of remotely sensed data

Minoru Akiyama, Yasushi Shimoyama,
Tamio Sekiguchi, Takashi Hayashi, Takio Mizuno

Photogrammetric Research and Development Office
Geographical Survey Institute, Kitazato-1
Tsukuba City, Ibaraki Pref., Japan


Abstract
In ordinary supervised classification methods, a decision boundary is determined automatically from the statistical likelihood of each class, whose probability function is calculated from training samples. Therefore, when attempting to improve classification accuracy, we can do nothing but recollect training samples. However, recollection is not always effective, even though it is quite time- and cost-consuming.

On the other hand, a displayed histogram of the data shows visually how the data are distributed in the feature space, and may enable us to draw a decision boundary by referring to the displayed data clusters. Moreover, if an analyst has some knowledge of the land use pattern of the study area, he could adjust the decision boundary by referring to the distribution of training samples, the classified result from the initial decision boundary, and so on.

In this study, a new method has been developed to determine a decision boundary directly in the feature space without relying on the features of training samples. This method consists of the following sub-processes.
  1. Data compression by principal component transformation.

  2. Displaying the two dimensional data histogram of the first two principal component data in gray level.

  3. Collection of training samples and displaying their distribution over the data histogram when necessary.

  4. Manual determination of a decision boundary on the principal component plane referring to the data histogram and the distribution of the training samples.

  5. Classification of image data by looking up the table created from the decision boundary mentioned above.

  6. Evaluation of a classified result.

  7. Repetition of the process from 3 through 6 if necessary.
The characteristic of this method is that human intelligence, experience, and knowledge can be taken into account when setting the decision boundary in the feature space. This method enables us not only to reduce the time and cost of classification but also to steer the classified result in the direction we want. In addition, the case study confirmed that this method has enough accuracy and efficiency for practical use.

Methodology
The flow of this method is shown in Fig. 1. The original multi-channel image data are compressed into two-dimensional image data by principal component analysis so that their histogram can be expressed on a plane. To display the histogram on the plane, the scale and range of each component axis are determined interactively by referring to the minimum and maximum values of the first two principal components. Training samples are collected, if necessary, on the color composite image of the original channel data. The distribution of the training samples is then displayed over the histogram, which may help a person understand which cluster corresponds to which class. The decision boundary is determined interactively by referring to the two-dimensional histogram and the training sample distribution. If the histogram is not well separated into the desired classes, a rough segmentation into a few combined classes is executed first. Fine segmentation is then executed by using the third or fourth principal component plane, recalculated from the data belonging to the combined class.
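The compression and histogram display steps can be sketched as follows. This is a minimal illustration in Python with NumPy, not the authors' implementation: the function names, the 256 x 256 histogram size, the logarithmic gray scaling, and the synthetic six-channel image are all assumptions made for the example.

```python
import numpy as np

def principal_components(pixels):
    """pixels: (n_pixels, n_channels). Returns scores, eigenvectors, contributing ratios."""
    centered = pixels - pixels.mean(axis=0)
    cov = np.cov(centered, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(cov)       # eigh returns ascending eigenvalues
    order = np.argsort(eigvals)[::-1]            # reorder so P.C.1 comes first
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    scores = centered @ eigvecs                  # principal component values per pixel
    ratios = eigvals / eigvals.sum()             # contributing ratio of each component
    return scores, eigvecs, ratios

def pc_plane_histogram(scores, bins=256):
    """Two-dimensional histogram of the first two components as a gray level image."""
    pc1, pc2 = scores[:, 0], scores[:, 1]
    # Axis ranges are taken from the minimum and maximum of each component.
    extent = [[pc1.min(), pc1.max()], [pc2.min(), pc2.max()]]
    hist, pc1_edges, pc2_edges = np.histogram2d(pc1, pc2, bins=bins, range=extent)
    # Assumption: log scaling, so that sparse clusters stay visible on screen.
    gray = np.log1p(hist)
    gray = (255 * gray / gray.max()).astype(np.uint8)
    return gray, pc1_edges, pc2_edges

# Synthetic stand-in for a six-channel image (hypothetical data, not the TM scene).
rng = np.random.default_rng(0)
image = rng.normal(size=(100, 100, 6)) @ rng.normal(size=(6, 6))
pixels = image.reshape(-1, 6)

scores, eigvecs, ratios = principal_components(pixels)
gray, pc1_edges, pc2_edges = pc_plane_histogram(scores)
```

The `gray` array is what an operator would view when drawing the decision boundary, and `ratios` corresponds to the contributing ratio of each principal component.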

Since each segment divided by the determined decision boundary represents the area in the feature plane corresponding to one class, classification of each pixel is done by the table look-up method, where the look-up table is the aforementioned class-divided principal component plane.
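A table look-up classification of this kind might be sketched as below. The 4 x 4 table, the class codes, and the helper names are hypothetical and chosen only to keep the example small; a real table would have the same size as the displayed histogram.

```python
import numpy as np

def to_cell_index(values, vmin, vmax, bins):
    """Quantize component values into table cell indices 0 .. bins-1."""
    scaled = (values - vmin) / (vmax - vmin) * bins
    return np.clip(scaled.astype(int), 0, bins - 1)

def classify_by_lookup(pc1, pc2, table, pc1_range, pc2_range):
    """Assign each pixel the class stored in its cell of the principal component plane."""
    rows = to_cell_index(pc1, pc1_range[0], pc1_range[1], table.shape[0])
    cols = to_cell_index(pc2, pc2_range[0], pc2_range[1], table.shape[1])
    return table[rows, cols]

# Hypothetical class-divided plane: 0 = unclassified, 1 to 3 = class codes.
table = np.array([
    [1, 1, 0, 3],
    [1, 1, 2, 3],
    [0, 2, 2, 3],
    [0, 2, 2, 3],
])

pc1 = np.array([0.5, 3.5, 1.2])   # first principal component of three pixels
pc2 = np.array([0.5, 3.9, 2.5])   # second principal component of the same pixels
labels = classify_by_lookup(pc1, pc2, table, (0.0, 4.0), (0.0, 4.0))
```

Because classification is a single indexing operation, editing the boundary only requires rewriting cells of `table`; no class statistics need to be recomputed.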

The classified result is visually inspected and evaluated. If any pixel is obviously wrong, it can be corrected by resetting the decision boundary.


Fig.1 Flow of the interactive Decision Boundary Determination Method

The final classification is made after several trial-and-error rounds of decision boundary editing.

Case Study
The test site was set in Fukuoka City and its vicinity. Fukuoka is the largest city on Kyushu Island and the core city of southwestern Japan, with a population of one million, facing the sea. Fig. 2 shows the original image. The specifications of the test data are as follows.

Sensor : Landsat TM
Scene ID : Path 112, Row 37
Date : May 12, 1986
Resampling : approx. 20 m x 20 m, 1/50 National Standard Grid
Data size : 1000 x 1000 pixels
  1. Principal component transformation

    Principal components were calculated from six of the original TM bands, with one band excluded. Table 1 shows the principal components and their contributing ratios. Fig. 3 shows the two-dimensional histogram on the plane of the first two principal components. This histogram obviously consists of three big clusters, so one may easily draw decision boundaries between three classes on this image.

  2. Collection of training samples and decision boundary setting

    Fig. 4 shows some examples of training sample collection.






    Fig.2 Original TM Image of Test Site


    Fig.3 Two-dimensional Histogram on the first two Principal Component Plane



    Fig.4 Training Sample collection


    Table.1 Principal Components and their contributing ratio
                           P.C.1    P.C.2    P.C.3    P.C.4    P.C.5    P.C.6
    Eigen vector (rows: the six channels used, from channels 1 to 7, in order)
                           .4090   -.4090   -.2848    .6774    .3061   -.1783
                           .4343   -.2185   -.3164   -.2838   -.0976    .7573
                           .4337   -.2330   -.1711   -.5635   -.1639   -.6197
                           .2955    .7986    .1794    .174     .1891   -.0831
                           .4227    .307     .4398    .1919    .7042    .0481
                           .4359   -.0036    .6293    .2722   -.5817    .0390
    Contributing ratio     .8328    .1411    .0189    .0044    .0019    .0011
    Accumulated            .8328    .9728    .9927    .9971    .9989   1.0000

    The selected classes are high-density urban area, low-density urban area, agricultural fields, grass fields, forestry, and water. Figs. 5 to 10 show the distribution of the training samples of each class over the histogram. By referring to these distributions, the aforementioned three clusters were found to correspond to the water, forestry, and urban classes, while agricultural fields and grass fields lie in between forestry and urban. Fig. 11 shows the rough setting of the decision boundary made with reference to the training sample distribution, Fig. 12 shows the result classified into those six classes, and Fig. 13 shows the result classified by the ordinary maximum likelihood (ML) method using the same training samples.

  3. Editing of the decision boundary

    Comparing Fig. 12 and Fig. 13, we can see several differences. As far as the water class is concerned, the ML method seems to under-classify, as unexpected islands appeared in the sea. On the contrary, this method is likely to over-classify, as a breakwater disappeared. The reason why the ML result is under-classified is supposed to be that the training samples are too homogeneous to represent the whole class, which is evident in their distribution. In this method, we set the decision boundary for the water class too large, as seen in Fig. 11. Looking at the forestry class, the result of this method is less noisy than the result of the ML method. This smoothing effect of this method appears in other classes as well.


    Fig.5 Training samples of H-Urban Area

    Fig.6 Training samples of L-Urban Area

    Fig.7 Training samples of agricultural fields

    Fig.8 Training Samples of Grass Fields

    Fig.9 Training Samples of forestry

    Fig.10 Training samples of water

    In order to salvage the sunken breakwater, training samples were taken from the pixels of the breakwater and displayed on the histogram, as shown in Fig. 14. The decision boundary between water and urban was then corrected so that the area corresponding to these training samples was moved from the water class to the urban class. Fig. 15 and Fig. 16 show the corrected decision boundary and the associated classified result.
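In terms of the look-up table, such a correction amounts to relabeling the cells of the principal component plane where the new training samples fall. The sketch below is a simplification under stated assumptions: in the paper the boundary is redrawn by the operator, whereas here the cells hit by the samples are relabeled automatically. The function name, the class codes, and the 4 x 4 table are hypothetical.

```python
import numpy as np

def reassign_cells(table, sample_rows, sample_cols, from_class, to_class):
    """Move table cells hit by the given training samples from one class to another."""
    edited = table.copy()                        # keep the previous boundary for comparison
    hit = np.zeros(table.shape, dtype=bool)
    hit[sample_rows, sample_cols] = True         # cells occupied by the training samples
    edited[hit & (table == from_class)] = to_class
    return edited

URBAN, WATER = 1, 6                              # hypothetical class codes

# Hypothetical plane: everything is water except the last column, which is urban.
table = np.full((4, 4), WATER)
table[:, 3] = URBAN

# Cell indices of three breakwater training samples (two in water, one already urban).
sample_rows = np.array([1, 2, 2])
sample_cols = np.array([2, 2, 3])

edited = reassign_cells(table, sample_rows, sample_cols, WATER, URBAN)
```

Only the two cells previously labeled water change class; reclassifying the image with `edited` then moves the corresponding pixels from water to urban.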

  4. Classification Accuracy

    Table 2 shows the confusion matrix for the training samples in the case of this method, and Tables 3 and 4 show those of the ML method with six original channels and with two principal component channels, respectively.

    The total performance of this method is worse than that of the ML method with six channels but better than that of the ML method with two channels. Therefore, it is presumed that this method may provide a better result than the ML method when using the same number of channels.


    Fig.11 Rough Setting of Decision Boundary

    Fig.12 Classified Result by the interactive method

    Fig.13 Classified result by the maximum likelihood method

    Fig.14 Training samples of breakwater

    Fig.15 Corrected Decision Boundary

    Fig.16 Corrective Decision Boundary Determination Method

    Table.2 Confusion matrix of the interactive decision boundary determination method
                        1     2     3     4     5     6  Unclass  Total
    1 H-Urban        1531   100     2     0     0     0        4   1637
    2 L-Urban         124   726   133     2     1     0        5    994
    3 Agriculture       4   119   607    55     0    16        9    810
    4 Grass             0    36   121   425     0     3        1    589
    5 Forestry          0     0     0     0   972     0        0    972
    6 Water             0     0     0     0     0  1045        0   1045
    Performance (1531+726+607+425+972+1045)/6047 = 87.7%

    Table.3 Confusion matrix of the maximum likelihood method (six original channels)
                        1     2     3     4     5     6  Unclass  Total
    1 H-Urban        1502   135     0     0     0     0        0   1637
    2 L-Urban          90   780   109    15     0     0        0    994
    3 Agriculture       1    94   634    81     0     0        0    810
    4 Grass             0    32    48   507     0     2        0    589
    5 Forestry          0     0     0     0   972     0        0    972
    6 Water             0     3     4     2     0  1036        0   1045
    Performance (1502+780+634+507+972+1036)/6047 = 89.8%

    Table.4 Confusion matrix of the maximum likelihood method (two principal component channels)
                        1     2     3     4     5     6  Unclass  Total
    1 H-Urban        1514   123     0     0     0     0        0   1637
    2 L-Urban         106   745   129    14     0     0        0    994
    3 Agriculture       3   146   533   128     0     0        0    810
    4 Grass             0    36    57   494     0     2        0    589
    5 Forestry          0     0     0     0   972     0        0    972
    6 Water             0     0     5    15     0  1025        0   1045
    Performance (1514+745+533+494+972+1025)/6047 = 87.4%
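
The performance figures are the number of correctly classified training pixels (the diagonal of the confusion matrix) divided by the total number of training pixels. The short Python check below recomputes this for the interactive method and for the ML method with two principal component channels. The function and variable names are illustrative; the numbers are the diagonal entries and class totals given in the tables.

```python
# Training pixels per class: H-Urban, L-Urban, Agriculture, Grass, Forestry, Water.
class_totals = [1637, 994, 810, 589, 972, 1045]

# Correctly classified pixels per class (diagonal of each confusion matrix).
diag_interactive = [1531, 726, 607, 425, 972, 1045]   # interactive method
diag_ml_two_pc = [1514, 745, 533, 494, 972, 1025]     # ML, two principal components

def overall_accuracy(correct, totals):
    """Share of all training pixels assigned to their own class."""
    return sum(correct) / sum(totals)

def per_class_accuracy(correct, totals):
    """Share of each class's training pixels assigned to that class."""
    return [c / t for c, t in zip(correct, totals)]

acc_interactive = overall_accuracy(diag_interactive, class_totals)
acc_ml_two_pc = overall_accuracy(diag_ml_two_pc, class_totals)

print(f"interactive: {acc_interactive:.1%}")
print(f"ML, two PCs: {acc_ml_two_pc:.1%}")
print([round(a, 3) for a in per_class_accuracy(diag_interactive, class_totals)])
```

The per-class values make it easier to see where the two methods differ, which the single overall figure hides.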
Conclusions
This method showed an even better result than the ML method using the same channels and the same training samples, in spite of a quite rough setting of the decision boundary. Therefore, it is quite likely that a much better result can be obtained through fine editing of the decision boundary, or by using the third or fourth principal component information to classify classes that are mixed in the plane of the first and second principal components.

Decision boundary editing imposes additional work on the operator. However, this effort is rewarded by a better result. In my opinion, this is preferable to most automatic methods, which may fall into an endless loop of training sample recollection and classification.

It has already been proved that the human ability of image interpretation is quite good. Therefore, we had better use human ability more directly and actively in classification.