
Study on Quality Evaluation of Compressed Remote Sensing Images

Li Youping, Xu Qingfen, Bian Guoliang
Beijing Remote Sensing Information Institute

Introduction
With the development of remote sensing technology, higher resolution, wider swaths and more wavebands have greatly increased the volume of data produced by sensors. Image compression should be performed onboard to lighten the burden on the communication system, so remote sensing image compression has become one of the most pressing tasks. There are many image compression approaches based on different theories. In judging whether an approach is good, attention is mainly focused on three aspects: the compression ratio, the difficulty of implementing the approach, and the quality of the reconstructed image. Among the three, image quality is the most fundamental yet the hardest to operationalize, because so far no criterion of image quality evaluation has been generally accepted. Significant compression is achievable only by lossy algorithms. To find out to what degree a reconstructed image can be accepted by users, we have made an approach to the measurement of high-resolution remote sensing image quality and obtained some initial results.

General methods for image quality evaluation
Generally speaking, image quality evaluation depends on the purpose of the image. Image quality has two aspects: fidelity and intelligibility. The former describes how the reconstructed image differs from the original one, with mean-square-error as a typical example; the latter describes the ability of the image to offer information to people, with classification accuracy as a typical example. Both are fundamental in measuring image quality.

It must be pointed out that fidelity is not always objective and intelligibility is not always subjective. Whether an objective measure of image quality is effective depends strongly on its agreement with subjective measures, but this consistency is difficult to establish. The reasons why objective measures disagree with users' subjective judgments in many cases are, firstly, that our knowledge of visual characteristics is not sufficient to establish an accurate visual model and, secondly, that we have no better ways to formulate the objective measures. For example, interpreters usually pay attention to the parts of an image where distortion is greatest or in which they are most interested, such as edges or texture, but objective measures cannot describe this accurately.

Research on consistence between subjective and objective measures
Methods for image quality evaluation can be classified into objective and subjective measures. In objective measures, parameters are calculated to indicate the quality of the reconstructed image; in subjective measures, viewers inspect the images directly to judge their quality. The ultimate goal of research on image quality evaluation is a quantitative measure consistent enough with subjective judgment to be used as a substitute for it. To that end we conducted a study covering the following aspects.

Establishment of a standard-image base
Until now, researchers studying image compression have usually evaluated their methods on a few typical images, such as the Lena or Girl portraits, and calculated the peak signal-to-noise-ratio to indicate the quality of the reconstructed image. Since this kind of test image differs from remote sensing imagery, the conclusions are not valid for all kinds of images. We therefore established a standard-image base tailored to remote sensing needs; a compression method should be chosen only if it is suitable for most of the images in the database. On the other hand, the ultimate goal of image quality evaluation is to find the consistency between subjective and objective measures, which can be obtained only from large quantities of tests. The standard test images should therefore be as comprehensive as possible.

What kind of image should be chosen as a standard image? We consider it in two ways. The images must, firstly, be of different scene types and several resolution levels, so that they offer various kinds of information, and secondly cover a range of statistical characteristics, such as histogram and entropy, so that the standard-image base is statistically comprehensive.
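As an illustration of the second criterion, the following minimal Python sketch (our own illustration, not part of any standard package) computes the grey-level histogram and first-order entropy that can be used to check how widely a candidate set of standard images spreads over these statistical characteristics.

    import numpy as np

    def grey_histogram(img):
        """Normalized grey-level histogram of an 8-bit image."""
        counts = np.bincount(img.ravel(), minlength=256)
        return counts / counts.sum()

    def entropy(img):
        """First-order entropy of an 8-bit image, in bits per pixel."""
        p = grey_histogram(img)
        p = p[p > 0]                     # ignore empty histogram bins
        return float(-np.sum(p * np.log2(p)))

    # Two extreme candidates: a flat scene carries almost no information,
    # a noise-like scene approaches the 8 bit/pixel maximum.
    flat = np.full((256, 256), 128, dtype=np.uint8)
    noisy = np.random.randint(0, 256, (256, 256)).astype(np.uint8)
    print(entropy(flat), entropy(noisy))   # roughly 0.0 and about 8.0

A standard-image base that covers the range between such extremes, for each image type and resolution level, is what the second criterion asks for.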

Objective measure
Parameters of objective measure are considered from two aspects. One is the difference between the reconstructed image and the original, such as Mean-Square-Error (MSE) and Signal-to-Noise-Ratio (SNR); the other is the closeness between them, such as fidelity (BZD) and similarity (XSD). The calculation formulas are as follows, with an illustrative implementation after them.

Mean-Square-Error

$$MSE = \frac{1}{MN}\sum_{i=1}^{M}\sum_{j=1}^{N}\left[f(i,j)-\hat{f}(i,j)\right]^{2}$$

Signal-to-Noise-Ratio

$$SNR = 10\log_{10}\frac{\sum_{i,j}f(i,j)^{2}}{\sum_{i,j}\left[f(i,j)-\hat{f}(i,j)\right]^{2}}$$

Fidelity

$$BZD = 1-\frac{\sum_{i,j}\left[f(i,j)-\hat{f}(i,j)\right]^{2}}{\sum_{i,j}f(i,j)^{2}}$$

Similarity

$$XSD = \frac{\sum_{i,j}f(i,j)\,\hat{f}(i,j)}{\sqrt{\sum_{i,j}f(i,j)^{2}\,\sum_{i,j}\hat{f}(i,j)^{2}}}$$

where $f(i,j)$ is the original image, $\hat{f}(i,j)$ is the reconstructed image, and $M \times N$ is the image size.

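A minimal Python sketch of these objective measures, written here from the definitions above, with f the original and g the reconstructed image as floating-point NumPy arrays of equal size:

    import numpy as np

    def mse(f, g):
        """Mean-square-error between original f and reconstruction g."""
        return float(np.mean((f - g) ** 2))

    def snr(f, g):
        """Signal-to-noise-ratio in dB."""
        return float(10.0 * np.log10(np.sum(f ** 2) / np.sum((f - g) ** 2)))

    def psnr(f, g, peak=255.0):
        """Peak signal-to-noise-ratio in dB, assuming 8-bit data."""
        return float(10.0 * np.log10(peak ** 2 / mse(f, g)))

    def fidelity(f, g):
        """Fidelity (BZD): 1 minus the normalized squared error."""
        return float(1.0 - np.sum((f - g) ** 2) / np.sum(f ** 2))

    def similarity(f, g):
        """Similarity (XSD): normalized cross-correlation of f and g."""
        return float(np.sum(f * g) / np.sqrt(np.sum(f ** 2) * np.sum(g ** 2)))

For a perfect reconstruction MSE is 0 and both fidelity and similarity equal 1; SNR and PSNR are undefined in that case because the error energy is zero.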
There is another parameter, named Perceptual Error (PE), that combines both subjective and objective considerations.

The perceptual image-quality measurement scheme makes use of characteristics of the Human Visual System (HVS); refer to reference [2] for details. In this scheme two visual characteristics, luminance masking and spatial-frequency masking, are used to improve on the plain MSE. The mean luminance and the activity of the whole image determine two weighting factors that are important in the PE calculation, so the PE depends on both subjective and objective factors.
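The exact weighting functions belong to the scheme described in [2]; the sketch below is only a schematic Python illustration, with invented masking weights (alpha, beta), of how luminance masking and activity (spatial-frequency) masking can be folded into an MSE-like measure.

    import numpy as np

    def perceptual_error(f, g, alpha=0.02, beta=0.01):
        """Schematic perceptually weighted error (not the exact formula of [2]).

        Squared errors are down-weighted where the original image is bright
        (luminance masking) and where it is busy (activity masking), since
        distortion is less visible in such regions. f and g are the original
        and reconstructed images as float arrays; alpha and beta are
        illustrative masking strengths.
        """
        err = (f - g) ** 2
        gy, gx = np.gradient(f)                  # local activity of the original
        activity = np.abs(gx) + np.abs(gy)
        weight = 1.0 / (1.0 + alpha * f + beta * activity)
        return float(np.mean(weight * err))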

Subjective measure
Up to now, subjective evaluation by viewers has remained the most commonly used method of measuring image quality. The subjective test primarily examines fidelity while also taking intelligibility into account. That is, during the test viewers focus on the differences between the reconstructed image and the original and, while grading, pay attention to details where the loss of information cannot be accepted.

The representative subjective method is the Mean Opinion Score (MOS). It has two kinds of rating rules, an absolute rule and a relative rule; both are shown below. In our experiment we use the absolute rule, since our aim is to find the consistency between subjective and objective measures.

Absolute rule
  5 - Excellent
  4 - Good
  3 - Fair
  2 - Bad
  1 - Very bad

Relative rule
  5 - The best in the group
  4 - Better than the average
  3 - The average of the group
  2 - Worse than the average
  1 - The worst in the group


The standard for the quality levels should be established before grading. Each viewer then compares the reconstructed image with the original, decides which level it belongs to and gives a score. The final score is the average of all viewers' scores, and the number of viewers should be greater than 20.

In order to make the scores more precise, we use a hundred-point scale. The scoring rule is as follows; a short sketch of the scoring computation is given after the list.

90-100: almost no distortion
80-89: slight distortion, which can be ignored
60-79: noticeable distortion that is only reluctantly acceptable
40-59: heavy distortion, which cannot be accepted
0-39: intolerable distortion
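A small sketch of the scoring computation (assuming each viewer returns one score on the hundred-point scale; the grade boundaries follow the rule above):

    def mean_opinion_score(scores):
        """Average the viewers' hundred-point scores into a single MOS."""
        if len(scores) <= 20:
            raise ValueError("more than 20 viewers are required")
        return sum(scores) / len(scores)

    def quality_level(mos):
        """Map a hundred-point MOS onto the descriptive levels above."""
        if mos >= 90:
            return "almost no distortion"
        if mos >= 80:
            return "slight distortion, negligible"
        if mos >= 60:
            return "noticeable but reluctantly acceptable"
        if mos >= 40:
            return "heavy distortion, not acceptable"
        return "intolerable distortion"

    # Example: 21 viewers grading one reconstructed image
    scores = [92, 88, 85, 90, 87, 84, 91, 86, 89, 83, 88,
              90, 85, 87, 86, 92, 84, 88, 89, 90, 87]
    mos = mean_opinion_score(scores)
    print(mos, quality_level(mos))   # about 87.7 -> "slight distortion, negligible"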

Both professional and amateur viewers are required to complete the subjective test. The former have experience in extracting details, so their evaluation leans more toward image intelligibility; the latter have no training, so their evaluation represents the ordinary visual perception of image quality. For images intended for a specific application, the conclusions drawn by the professionals carry more weight than those of the amateurs.

Test results
Ten typical images with different scenes were processed by nine compression methods, yielding ninety reconstructed images. Some test results, obtained with both the objective and subjective measures described above, are shown below.

Figure 1 shows the results of the subjective evaluation test. Data points marked with the same symbol are connected into a line representing one method applied to the different images; the connecting lines themselves have no physical meaning. The figure shows that when a compression method is good enough (MOS greater than 90), its MOS does not change appreciably across different images and remains relatively steady. For methods whose reconstructed images are not very good, usually those with high compression ratios or relatively simple algorithms, the quality of the reconstructed image varies with the image content. This demonstrates the necessity of evaluating a compression algorithm on different types of standard images.

Figure 2 shows a comparison between subjective and objective measures on one of the ten images; the lines are interpreted as in Figure 1. Based on the tests on the ten images, we draw the following conclusions.


Fig. 1  Results of subjective measure

Fig. 2  Contrast between subjective and objective measures


  • PSNR reflects the quality of reconstructed images approximately. Generally speaking, PSNR must be above a certain value before a reconstructed image can reach the level of "good".
  • For a family of compression algorithms based on the same principle, PSNR agrees with the subjective evaluation and can be used to assess the quality of the reconstructed images. For algorithms based on different principles, PSNR does not always reflect image quality correctly; in other words, PSNR alone is not sufficient to judge whether an algorithm is good.
  • Perceptual Error reflects the quality of reconstructed images fairly well. Generally speaking, PEm must be below a certain value before a reconstructed image can reach the level of "good".
  • Fidelity and similarity are too insensitive to reflect quality changes.
Conclusion
The study of criteria for image quality evaluation is a meaningful but complicated task. Such criteria can be used both to evaluate compression algorithms and to guide their design. We have made an approach to the measurement of image quality and drawn some preliminary conclusions, which indicate that the research method is feasible.

To reach an accurate and reliable conclusion eventually, subjective tests on a large number of standard images are needed, covering samples of different resolutions and image types, various compression algorithms, and different compression ratios. At the same time, the objective measures should be studied further and improved, so that a quantitative measure can finally serve as a consistent substitute for subjective evaluation.

References
  • [1] A. M. Eskicioglu and P. S. Fisher, "Image Quality Measures and Their Performance", IEEE Transactions on Communications, Vol. 43, No. 12, Dec. 1995.
  • [2] P. C. Cosman, R. M. Gray and R. A. Olshen, "Evaluating Quality of Compressed Medical Images: SNR, Subjective Rating, and Diagnostic Accuracy", Proceedings of the IEEE, Vol. 82, No. 6, June 1994.
  • [3] S. A. Karunasekera and N. G. Kingsbury, "A Distortion Measure for Blocking Artifacts in Images Based on Human Visual Sensitivity", IEEE Transactions on Image Processing, Vol. 4, No. 6, June 1995.