GISdevelopment.net ---> AARS ---> ACRS 1997 ---> Education/Training

Why is GIS Difficult?

Sally E. Goldin and Kurt T. Rudahl
Goldin-Rudahl Systems, Inc.
University Drive, #213, Amherst, MA 01002, USA
FAX: +1-413-549-6401
E-mail: seg@goldin-rudahl.com

Abstract
This project used knowledge-engineering techniques to identify obstacles to more successful and less effortful GIS use. We conducted in-depth, structured interviews with sixteen GIS users working in a variety of environments. We then analyzed the content to identify fundamental knowledge elements, common problem-solving strategies, and areas of perceived difficulty. Results indicated that institutional and environmental constraints, management of details, and a mismatch between GIS software structure and users' conceptual models are significant factors that interfere with effective GIS use.

Introduction
The use of computer-based Geographic Information Systems (GIS) technology in the government, business, and non-profit sectors has expanded tremendously in the last decade. GIS has become a pivotal component of decision making and planning in government agencies and in business, and has already had a significant impact in applications ranging from facilities management through marketing analysis to the monitoring of global change and environmental degradation. The audience for GIS technology continues to diversify, and it is expected that GIS will be adopted by millions of new users in the years ahead. While many of these new users will be seeking simplified tools custom-tailored to their applications (Jordan, 1993), a significant fraction will need the rich repertoire of data development, data management, and spatial analysis functions provided by a full-scale GIS software environment.

Despite this growth, obstacles to the widespread use of GIS technology remain. One of the most serious is the apparent complexity of GIS analysis techniques and systems. Because GIS commonly uses graphical presentation methods, almost anyone can understand and appreciate its results. However, highly trained personnel are usually required to produce these results. GIS systems are actually becoming more complex due to the increasing integration of remote sensing and other functions (Dobson, 1993). Their complexity remains an obstacle preventing many organizations from reaping the benefits of GIS (Gordon and Soubra, 1992).

Our objective in this research was to identify barriers to rapid learning and effective use of GIS, with the long-term goal of creating knowledge-based software tools to assist experienced GIS users in being more productive. The results were also expected to provide guidelines for GIS education and training. The research used knowledge-engineering methodologies originally developed within the cognitive science and artificial intelligence research communities. These techniques have been applied primarily in the development of expert systems, but have broad potential for more general task analysis and modeling.

Knowledge Engineering
Knowledge engineering (Waterman, 1986) is the process of creating a formal description of human knowledge or expertise and implementing a computer system that incorporates that knowledge in a usable form. The content of the knowledge derives from interviews with and/or observations of human experts in the problem domain. Techniques exist for analyzing records of experts' behavior and inferring rules and other information structures. This activity is known as protocol analysis (Newell and Simon, 1972).

Knowledge engineering traditionally consists of four stages:
  1. Protocol gathering: Experts in the domain of interest (e.g. the use of GIS) are observed while engaged in a task relevant to the domain. The subjects are asked to verbalize their intentions, strategies, and thought processes as they work on the problem or task.
  2. Protocol analysis: The transcripts of experts' verbalizations are coded into content categories; results are subjected to frequency analysis and other forms of statistical comparison.
  3. Knowledge Modelling: Representational structures are inferred or synthesized based on the raw data from the content coding. These may include conceptual primitives, relations, and/or rules.
  4. Implementation: The representational structures resulting from the third step are incorporated into a software model of the task and domain. The knowledge, instantiated in working software, can be used to execute similar tasks in the domain, or assist human users in doing so.
This prototypical methodology encounters some problems when applied to the domain of GIS. A realistic GIS analysis task has a time span of weeks, months, or even years. Observing a user for a few hours or days would provide only incomplete and fragmentary data. Furthermore, GIS experts typically cannot afford to devote days or weeks to a research undertaking. We modified the typical knowledge engineering procedure to mitigate these problems. First, we increased the number of subjects, to compensate for spending only a few hours with a subject. Second, we used structured interviews rather than unfocused protocol gathering, to obtain a broader sampling of data about task strategies and conceptualizations.

Methodology
Sixteen subjects participated as experienced GIS users in the data gathering phase of this research. We attempted to include individuals from a variety of application areas and institutional categories (Table 1). GIS experience ranged from two years to more than twelve years; the majority of subjects indicated that they had been working with GIS for at least four years.

Although the subject population was moderately diverse, the subjects shared one common characteristic: nearly all of them had worked primarily with ESRI's ARC/INFO™ GIS products. This reflects the dominant position of this software in the current GIS market. Six subjects indicated that they had worked with other GIS packages or systems. Only one subject did his primary work using a different package.

Another common feature was that all the subjects were involved in natural resources or environmental applications, in the broadest sense. However, given their varying institutional affiliations, their perspectives on environmental issues differed considerably.

Table 1 Summary of Subject Population Characteristics
Institutional Affiliation   No.   Geographic Area   No.   Gender   No.
State government            6     Massachusetts     10    Male     8
Educational                 5     Connecticut       1     Female   8
Non-Profit                  3     Vermont           2
Private Industry            2     Idaho             3

Each subject was interviewed for a period of 1.5 to 3.0 hours. With one exception, the interviews took place in the subject's office, work area, or place of business. A number of subjects referred to and explained maps and other analysis products as part of their interviews, while others produced laboratory notebooks and other aids they use in working with the GIS. Interviews were tape-recorded.

The structure of the interviews was somewhat flexible, in order to encourage subjects to follow associative connections and "think out loud". The interviewer had a checklist of topics to be explored, and attempted to address each of the following questions:
  1. What is the scope and type of applications you work on?
  2. What is the most difficult GIS problem you recall having worked on? How did you solve it?
  3. How do you approach a new GIS problem? What techniques do you use for planning or design? Do you work out the whole analysis ahead of time?
  4. What if anything do you write down when you are working with the GIS? How do you organize your notes, if any?
  5. Do you think of GIS problems in terms of visual images? If so, can you describe them?
  6. What kinds of problems have you encountered or mistakes have you made?
  7. What kind of additional tools or assistance would help you solve problems or do your job better?
Subjects varied in their ability or propensity to introspect about their design and problem-solving processes. However, all subjects talked at length about the specific GIS work in which they had been involved. In many cases, strategies and approaches can be inferred from these descriptions.

Approximately 34 hours of interview data were collected. The taped interviews were transcribed for further analysis. The transcribed interviews represent nearly 600 pages of single-spaced text. Hardcopy transcripts were then coded by hand, using a system of colors and line-styles. The coding process identified words, phrases, or sentences in the following categories:

Data items
  • Data layer or data object
  • Data attributes
  • Relations between data objects or layers
Action items
  • Operations or actions
  • Intentions or objectives
  • Evaluation criteria and problem statements
Context items
  • Strategies and analysis templates
  • Problems or needs
  • GIS application

These coding categories were designed to capture the content of users' discussions of GIS, and represented our initial theory concerning important primitive elements in the GIS knowledge base. After each interview had been coded, the coding results were transferred to several summary files. These files listed each distinct item within a data coding category (e.g. each GIS operation or action mentioned), with a frequency count. References to obviously equivalent concepts were grouped together, retaining the users' wording.
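
The tallying step can be illustrated with a short sketch. The Python fragment below is not the procedure used in the study (the transcripts were coded by hand on hardcopy); it only shows, under assumed names, how coded items might be counted per category and how equivalent wordings might be grouped while retaining the users' wording. The function `build_summary`, the `equivalents` table, and the sample items are all hypothetical.

```python
from collections import Counter, defaultdict

def build_summary(coded_items, equivalents=None):
    """Tally coded items per category.

    coded_items: iterable of (category, wording) pairs from the coding pass.
    equivalents: optional dict mapping a wording to the wording it is
                 grouped under (hypothetical, supplied by the analyst).
    Returns {category: {group_label: count}}, where group_label joins the
    distinct user wordings with "/" so the original terms are retained.
    """
    equivalents = equivalents or {}
    counts = defaultdict(Counter)   # category -> canonical wording -> count
    wordings = defaultdict(dict)    # (category, canonical) -> ordered set of wordings
    for category, wording in coded_items:
        canonical = equivalents.get(wording, wording)
        counts[category][canonical] += 1
        wordings[(category, canonical)].setdefault(wording, None)
    summary = {}
    for category, counter in counts.items():
        summary[category] = {
            "/".join(wordings[(category, canonical)]): n
            for canonical, n in counter.items()
        }
    return summary

# Illustrative sample only; these are not data from the study.
sample = [
    ("data layer", "forest"),
    ("data layer", "woodlands"),
    ("data layer", "forest"),
    ("data layer", "soils"),
    ("operation", "overlay"),
]
summary = build_summary(sample, equivalents={"woodlands": "forest"})
print(summary)
```

Keeping the grouped wordings in the label, rather than replacing them with a single canonical term, mirrors the slash-separated entries that appear in the summary files.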

Table 2 presents a section of the summary file for data layers. This is intended to provide some idea of the types of items coded, as well as the frequencies and the variation in user terminology. (Note that the full summary file for data layers and data objects included about 380 separate items.)

Table 2 Extract from Data Layer Summary File
Data layer                           Freq.   Data layer                    Freq.
Zoning                               7       wetlands                      20
Agricultural                         1       salt marsh                    1
Soils                                35      non-forested wetlands         1
Soil patterns/soil characteristic    2       non-wetland                   1
Agricultural potential from soils    4       vegetation                    17
Land use/land cover                  26      forest/woodlands/oak woods    4
Agricultural land use/agriculture    4       forest types                  1
Irrigated agriculture/irrigation     5       marshes                       1
Types of irrigation/flood acres      2       swamps                        1
Center pivot irrigation system       1       cranberry bogs                1
                                             Fields                        2

The coding phase produced nine summary files, one for each of the coding categories listed above, which served as input to the knowledge modelling phase of our research. Knowledge modelling is an inductive process. Working from a set of examples, the knowledge engineer attempts to derive a small set of primitive knowledge categories that systematically account for the example data. We approached our large but diffuse set of sample data in several steps. First, we worked with each content category independently, to devise a taxonomy, hierarchy, or set of organizational principles that reduced the data set volume while retaining the data content. Second, we examined the interview transcripts from the perspective of these categories, looking for interactions or relationships among them. In the process, we concluded that some of the content categories were more important than others in our users' conceptualization of the GIS domain.

Results
The study reported here produced an enormous amount of data, which has been only partially analyzed, and which can be applied to a variety of research questions. In this paper, we address only the primary issue in the title, namely, why GIS is difficult.

Openness of the GIS Domain
GIS is difficult because institutional and environmental constraints restrict the analyst's activities or interfere with his/her objectives. In the real world, GIS is not distinguished from other activities that strictly might be considered to be unrelated to GIS. In particular, the general hardware and software environment that supports the GIS becomes part of the domain. Many of the problems that people cited as "GIS problems" actually stemmed from difficulties with hardware configurations, operating systems, and so on. To the users that we interviewed, the GIS, the hardware, the operating system, the network, etc., are all one environment. Thus, strategies related to working with the GIS are intertwined with strategies related to other aspects of the environment.

Similarly, activities related to assembling or compiling information (e.g. reading reports, contacting town planning boards, tracing out boundaries on paper maps, doing ground checks) also appear to be viewed as part of the GIS domain. Problems of source data accessibility or quality, as well as organizational barriers related to data ownership, were all identified by our subjects as "GIS problems".

This finding is not terribly surprising. The subjects are professionals focusing on the problems that they need to solve. They use the GIS as a tool in the context of these problems; they derive no advantage from distinguishing GIS operations from other actions they may take in moving toward a solution. However, this observation does pose a problem for someone trying to model the knowledge used in GIS analysis. The domain of "strictly" GIS concepts and knowledge is large and complex in its own right. These interviews suggest that the real knowledge base needed for GIS tasks is significantly larger, and is open-ended.

Management of details
GIS is difficult because it requires the recording, recall, and use of many details that lie outside the GIS database proper. Practically every subject reported using some kind of external aid for maintaining information about his or her GIS work. Many subjects keep notebooks or logs of their project work, recording a wide variety of items such as coordinates, tolerances, lists of data layers or maps, special cases or problems to be considered, etc. They also use sketches and other graphics to record problems, procedures, or plans.

In addition to these notes on project work, several subjects indicated that they keep notebooks on topics related to the GIS software itself. These notebooks are used to record problems, workarounds, detailed procedures, shortcuts, and so on. In general, subjects view this information as an (essential) supplement to the GIS product documentation.

Subjects noted that many of the errors they made were due to the failure to record, or inability to recall, this type of detailed information. They had learned, through experience, the importance of maintaining and managing the specific items of information that they, personally, tended to forget.

These findings refute an initial hypothesis. We had expected to find that one of the most difficult aspects of using GIS is generating and expressing an analysis plan, that is, an outline of the steps that will lead from the initial statement of objectives to the desired GIS products. This hypothesis was based on observations made during GIS training courses taught by the authors, as well as other sources (Marble, personal communication).

While high-level analysis planning may present problems to novice GIS users, the results do not suggest that this is true of experienced users. To quote one subject:

…I don't bother to write down, because I'm not going to forget, the big picture … I write down what I'm afraid I'll forget, I guess.

Data versus Operations as the Focus of Work
GIS is difficult because users focus on data, while GIS software focuses on operations.

GIS is typically described as a set of operations applied to data: overlaying polygons, creating buffers, calculating viewsheds. There have even been attempts to specify a formal "algebra" of spatial operations (e.g. Tomlin, 1990). Most commonly used GIS software is also organized around operations. Originally, each operation was associated with a command verb. Although many GIS software packages now use a graphical, menu-based interface, the action-oriented organization has not changed significantly. First, the user selects an action from the menu; second, he or she supplies information on the data sets to be processed, as operation parameters.

The present research suggests that this organization does not match the way that experienced GIS users work. Users conceptualize GIS work in terms of data objects and relations, not in terms of operations or actions. This conclusion is supported by the frequency analysis of content categories as well as by subjects' direct statements.

The number of distinct data items coded is more than twice the number of actions (380 versus 165); similarly, the total frequency of data items is roughly 1.6 times the frequency for actions (1588 versus 973). In short, subjects talked more about data than about operations. Looking just at the actions, 41% of the items and 45% of the frequency are accounted for by references to actions involved in creating or editing data. (Other categories of actions include presenting or visualizing data, manipulating data, and software procedures.)
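
The ratios behind these comparisons follow directly from the four counts reported in the text. The short Python check below uses only those figures; the variable names are ours and carry no meaning beyond this illustration.

```python
# Counts reported in the text.
distinct_data_items = 380
distinct_actions = 165
data_frequency = 1588
action_frequency = 973

# Ratio of data items to actions, by distinct items and by total mentions.
item_ratio = distinct_data_items / distinct_actions
frequency_ratio = data_frequency / action_frequency

print(f"distinct items, data : actions = {item_ratio:.2f}")
print(f"total frequency, data : actions = {frequency_ratio:.2f}")
```

Both ratios are well above one, which is the point of the comparison: by either measure, subjects referred to data more often than to operations.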

One possible explanation for these statistics is that the subjects were more involved in building databases than in manipulating and analyzing the data. Further examination does not support this argument, however. The coded results include 85 examples of non-trivial GIS applications described by the subjects. Nearly all these applications required some operations beyond simple data capture or editing, and some involved quite sophisticated modelling techniques. Similarly, there were 55 statements of selection or allocation criteria used in modelling. The subjects clearly are doing data manipulation and modelling, but they tend to think about it and talk about it in terms of building data layers or making maps, rather than in terms of applying particular GIS operations.

Direct statements by the subjects indicate that the semantic attributes of specific data layers guide analysis strategy. To the GIS software, a wetlands dataset, a land use dataset, and a parcel-boundaries dataset are all collections of polygons that can be operated upon in defined ways. The GIS analyst, though, considers the physical, logical, and social characteristics of these datasets, as well as their spatial structure. He or she knows that a lake "feature" is likely to have at least one stream "feature" physically connected with it, or that an agricultural polygon is more likely to be adjacent to a forest polygon than to a high-density urban polygon. In addition, the subjects recognize and use relations between datasets that depend on the real-world meaning of data items; they know that a stream may serve as part of a parcel boundary, and may also delimit farmland from residential land.

This rich knowledge about data appears to be central in the strategies that experienced GIS users bring to bear on analysis problems. However, the GIS software that they use does not support these strategies. At best, the software is neutral; it simply provides no way to capture or utilize the meaning-based relationships inherent in the data. At worst, the software organization may interfere with the analysis process, requiring a translation from a data-centered paradigm to an operation-centered one, a translation that loses information and introduces errors.
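
The mismatch can be made concrete with a small sketch. The Python fragment below is purely illustrative and does not correspond to the interface of any GIS package: `overlay` stands in for an operation-centered command that treats every layer alike, while the hypothetical `SemanticLayer` and `Relation` types show one way the meaning-based relationships described above (a stream serving as part of a parcel boundary, for example) might be recorded alongside the data. The relation names are invented for the example.

```python
from dataclasses import dataclass, field

@dataclass
class Layer:
    """Operation-centered view: a named collection of features."""
    name: str
    features: list = field(default_factory=list)

def overlay(a: Layer, b: Layer) -> Layer:
    """Stand-in for a generic overlay command (hypothetical).

    It treats both inputs identically, whatever they represent;
    the actual geometric intersection is omitted.
    """
    return Layer(name=f"{a.name}_x_{b.name}")

@dataclass(frozen=True)
class Relation:
    """A meaning-based link between two layers (hypothetical)."""
    subject: str
    predicate: str
    obj: str

@dataclass
class SemanticLayer(Layer):
    """Data-centered view: a layer plus what the analyst knows about it."""
    relations: list = field(default_factory=list)

    def related(self, predicate: str) -> list:
        """Names of layers linked to this one by the given predicate."""
        return [r.obj for r in self.relations if r.predicate == predicate]

streams = SemanticLayer(
    name="streams",
    relations=[
        Relation("streams", "may_bound", "parcels"),
        Relation("streams", "may_separate", "land_use"),
        Relation("streams", "connects_to", "lakes"),
    ],
)
parcels = Layer(name="parcels")

# Operation-centered call: the result is a plain Layer with no relations.
result = overlay(streams, parcels)
print(result.name)

# Data-centered query: ask what the streams layer is known to bound.
print(streams.related("may_bound"))
```

The output of `overlay` carries no trace of what the analyst knew about streams, which is the kind of information loss the translation between paradigms can introduce.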

Implications for GIS Training and Education
This research, although conducted with experienced GIS users, has relevance to novices as well, and to the design of training programs to turn novices into experts. Assuming that the goal of GIS education is to produce professionals who can use existing GIS tools successfully, to solve actual problems, this research suggests the following recommendations.
  1. Teach GIS in a real world context
    Many GIS curricula, particularly those used in academic settings, focus on the formal prerequisites for Geographic Information Systems: projections, coordinate systems, raster versus vector data models, relational database concepts, and so on. Furthermore, there is often a strong emphasis on the analysis component of GIS, with datasets assumed as a "given".

    Formal topics are critically important. However, to prepare students for the scope of real-world GIS activity, training programs need to incorporate a broader set of activities. Students should work on database development, beginning with the identification of possible source documents (and the evaluation of their quality or utility). They should be exposed to the idiosyncrasies of computer hardware and software. Even as they learn the prescriptive methods for particular tasks, they should also be reminded (or shown) that in reality things are less clear-cut: that data availability, data quality, economics, or politics may sometimes force tradeoffs and compromises.

    Most importantly, they should work on real problems, grounded in real places they can visit and come to know. This will begin to build the rich base of spatial and non-spatial knowledge that experts appear to use in their day-to-day work.

  2. Teach organizational skills
    To work in GIS, an individual needs fundamental geographic concepts, mathematical/analytical skills, graphical design principles, and a thorough knowledge of GIS software tools. However, this is not enough. The present research indicates that a typical GIS project requires the user to manage volumes of ancillary details. Realizing this, we can prepare students by explicitly introducing them to techniques for organizing information.

    Computer-based "on-line" notebooks may be a direction to explore. These could be implemented using inexpensive, off-the-shelf software packages. None of the subjects in this study used computers for recording details, but all of them responded enthusiastically to the idea of computerized tools to aid in detail management.

  3. Teach both operation-centered and data-centered perspectives
    The present research suggests that data-centered models are a common, and perhaps natural, way to conceptualize GIS problems. Knowledge about the patterns, meaning, and relationships within and between data sets can effectively drive the process of GIS analysis. This appears to be true even though the structure of existing GIS software does not support a data-centered approach very well.

    To enable students to take advantage of the benefits of a data-centered approach, while still working effectively with current software tools, we should explicitly foster both approaches. We can encourage students to visualize data and relationships, and to use real-world "common sense" in thinking about different datasets. At the same time, we can teach flow-charting, top-down design, and the algebra of spatial analysis, where the focus is on the operations that transform data. Such a hybrid strategy will hopefully result in GIS users who are comfortable with the current software tools and can use them well, while at the same time going beyond their limitations.
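
As one illustration of the computer-based notebook suggested under the second recommendation, the following Python sketch keeps dated, categorized project notes in a plain JSON-lines file and retrieves them by project or category. The file layout, the function names `add_note` and `find_notes`, and the category labels (drawn from the kinds of details subjects reported recording, such as tolerances and workarounds) are assumptions made for illustration; this is not a tool used in the study, and the note contents are invented.

```python
import json
import tempfile
from datetime import date
from pathlib import Path

def add_note(notebook, project, category, text, when=None):
    """Append one note to the notebook file as a single JSON line."""
    entry = {
        "date": (when or date.today()).isoformat(),
        "project": project,
        "category": category,
        "text": text,
    }
    with notebook.open("a", encoding="utf-8") as f:
        f.write(json.dumps(entry) + "\n")

def find_notes(notebook, project=None, category=None):
    """Return notes matching the given project and/or category."""
    if not notebook.exists():
        return []
    notes = []
    with notebook.open(encoding="utf-8") as f:
        for line in f:
            entry = json.loads(line)
            if project is not None and entry["project"] != project:
                continue
            if category is not None and entry["category"] != category:
                continue
            notes.append(entry)
    return notes

# Example use with a temporary file; the note contents are invented.
notebook = Path(tempfile.mkdtemp()) / "gis_notebook.jsonl"
add_note(notebook, "wetlands-study", "tolerance",
         "Used a snapping tolerance of 2 m when cleaning the parcel layer")
add_note(notebook, "wetlands-study", "workaround",
         "Re-export the soils layer before overlay to avoid a label error")
for note in find_notes(notebook, category="tolerance"):
    print(note["date"], note["text"])
```

An append-only text file was chosen here because it is easy to inspect and hard to corrupt; a spreadsheet or small database would serve the same purpose.
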

Conclusion
The present research indicates that GIS is difficult even for experienced users because: 1) it subsumes a wide range of institutional and environmental issues outside the formal definition of the domain; 2) it requires management of a large amount of detailed information, distinct from the GIS data; and 3) the task structure enforced by current GIS software does not match the way experts approach GIS problems. Given these findings, we should be able to augment GIS educational programmes to better prepare practitioners to use the powerful capabilities of GIS effectively, thus enhancing the benefits of this technology for society.

References
  • Dobson, J.E. Commentary: A conceptual framework for integrating Remote Sensing, GIS, and geography. Photogrammetric Engineering & Remote Sensing, Vol. 59, No. 10, October 1993, pp. 1491-1496.
  • Gordon, W.R. and Soubra, N.M. Geographical information systems and planning in the U.S.A.: selected municipal adoption trends and educational concerns. International Journal of Geographic Information Systems, Vol. 6, No. 4, 1992, pp. 267-278.
  • Jordan, L.E., J. GIM interviews the GIS industry. Geodetical Information Magazine, Vol. 7, No. 10, October 1993, pp. 35-39.
  • Marble, D.F., Peuquet, D.J., Boyle, A.R., Bryant, N., Calkins, H.W. and Johnson, T. Geographic Information Systems and Remote Sensing. Manual of Remote Sensing (2nd ed.). Falls Church, VA: American Society of Photogrammetry, 1983.
  • Newell, A. and Simon, H.A. Human Problem Solving. Englewood Cliffs, NJ: Prentice-Hall, Inc., 1972.
  • Tomlin, C.D. Geographic Information Systems and Cartographic Modelling. Englewood Cliffs, NJ: Prentice-Hall, Inc., 1990.
  • Waterman, D.A. A Guide to Expert Systems. Reading, MA: Addison-Wesley Publishing, 1986.