Monday, August 29, 2011

Smallworld Technical Paper No. 3 - An Object-Oriented GIS - Issues and Solutions

by Arthur Chance, Richard G. Newell & David G. Theriault

Abstract

The current generation of software tools is inadequate to satisfy the wide diversity of GIS requirements in a seamless manner. In addition, the tools provided to develop user applications or to customise current GIS offerings exacerbate the problem. Object oriented programming systems (OOPS) are now recognised as a key component in building powerful applications which are robust and maintainable and which can also be seamlessly extended. Unfortunately, many myths surround OOPS: that they are difficult, simply fashionable, or inherently slow. This paper will propose that an OOPS coupled with an interactive programming environment can be highly effective when applied to the demanding requirements of GIS. It describes a rich, but user-friendly, polymorphic, exemplar-based environment which supports today's emerging standards and which has proven highly appropriate to the development and implementation of GIS.

Introduction

The largest costs incurred for any organisation embarking on the implementation of a GIS are data conversion, hardware and software, and system implementation, particularly customisation. It is now recognised that no GIS on the market today does it all, thus implying that the missing functionality has to be added to the base system in order to satisfy any particular customer. Yet rarely do you see, in the many papers describing how to choose a GIS, the need for assessing the ability of a system to be customised. It is common for the next most expensive item after data conversion to be customisation.

Most checklists for choosing a GIS concentrate on long lists of superficial functionality, and little is said of an assessment of the quality of basic system fundamentals and foundations. The most important fundamentals to get right in a technology as complex as GIS are the basic database architecture, and the data models implemented within that architecture together with the software architecture needed to extend the system at low cost.

During the last ten years or so, enormous strides have been made in the development of hardware. Processors are at least 20 times faster, memory is 500 times bigger, disks are hundreds of times bigger. The advance in hardware has fuelled a parallel advance in fundamental database software. The relational model has advanced from the era when there were doubts as to whether it could be made to work at all to now being the dominant technology.

In contrast, the advance in the development of tools to develop and adapt software systems is nothing like so dramatic. We all know that most of the world's commercial software is still written in Cobol, most of the technical software is still written in Fortran. Even "modern" languages such as C do not give a whole lot more in productivity and maintainability when compared to the huge advances in other technologies.

The advent of CASE tools and 4GLs makes some attempt to address the issue. It is our belief that the fundamental system architecture itself has to be rethought if large strides are to be made. This paper proposes an interactive hybrid object-oriented, procedural language as a key component of such an architecture. We are well aware that the term "object-oriented" is frequently bandied about with very few people understanding what it means. We also attempt in this paper to explain what the technology means, and why it should be of significance to the implementation of GIS.

Conventional System Structure

Traditionally, large interactive systems were developed in a non-interactive procedural language, such as Fortran or C. In order that an end user could drive such a system, an interactive command language was provided so that the user could type in his commands. Many command languages evolved into programming languages, sometimes by borrowing the programming concepts of Basic. In more modern user interfaces, this command language may well be hidden behind a system of screen menus, tablet menus or other input devices. Large modular systems were glued together using operating system commands or scripts (see Figure 1). Within the system, developers and customisers had a number of other languages available to them to define such things as syntax, menus, data descriptions, graphic codes, etc, all running as different processes communicating by files.

The structure of such systems was commonly organized at the highest level around the command syntax and complex commands were structured in a top down approach from there. If one examines systems that have been put together in this manner over the last five to ten years they all suffer a number of difficulties:

  • Development is slow. Users' requests for enhancement have to wait for the next release, which is usually over a year away.
  • They are difficult and expensive to maintain. During the life cycle of the system, probably 90% of development goes into maintenance.
  • Major restructuring in the light of five or ten years of hindsight is unthinkable.
  • Customisation is arbitrarily done in one or more of the many languages used to put the system together, typically Fortran or the command language.
  • Integration with other systems is nearly impossible.

[ Figure 1 not available ]

Much effort has gone into toolkits for developing and customising systems including standard graphics libraries, user interface managers, data managers, windowing systems, etc. However, if one wishes to get in and program or customise any of these systems, one is confronted with operating system commands, Fortran, C, SQL, embedded SQL, some online command language, domain specific 4GLs or a combination of these; not to mention auxiliary "languages" to define user syntax, menus, data definitions, data dictionaries, etc. With these kinds of programming tools it can take many man-months of skilled programmer effort to achieve even modest system customisation.

Development Languages

It is a common problem with systems that contain parts that are front ended by different languages that it is not possible to integrate them properly. For example, a graphics system for mapping, which is "hooked into" a database, typically does not allow the full power of the database to be accessed from within the graphics command language, nor can the power of the graphics system be invoked from within the database query language. What is really needed is a system such that all data and functions can be accessed and manipulated in one seamless programming environment (Butler 1988).

What has been shown by a number of organisations is that the same development carried out with an on-line object-oriented programming language can cut such development times by a very large factor (e.g. 20). Object orientation does not just mean that there is a database with objects in it, but that the system is organised around the concept of objects which have behaviour (methods). Objects belong to classes which are arranged in a hierarchy (preferably a heterarchy). Subclasses inherit behaviour, and communication between objects is via a system of message passing, thereby providing very robust interfaces.

Firstly, such a language should be able to provide facilities covering the normal requirements of an operating system, customisation, applications programming and most systems programming. Secondly, the language should have a friendly syntax that allows casual users to write simple programs for quick online customisation. Larger programs should be in a form that is easily readable and debuggable. Thirdly, the language must be usable in an interactive mode.

There are several languages around which satisfy some of these requirements: Basic is all right for simple programs (large impressive systems have been implemented using some of the more advanced modern Basics); Lisp has much of the power and speed, but is hardly readable (however, much of the success of Autocad may be attributed to Autolisp). Smalltalk has both speed and object orientation, but with the total exclusion of any other programming construct. Postscript is very widely used and has a number of the desired features, but is another write-only language (i.e. "unreadable" by anyone, including the original programmer). Hypertalk is wonderful, but you would not write a large system in it. C++ has much of the required syntax and semantics, but it is not available as a front end language and can therefore only be accessed by a select few system builders, normally employed by the system vendor.

Having dismissed most of the well known languages developed during the last 30 years, then what is required? It is an on line programming language, with a friendly Algol-like control structure and the powerful object oriented semantics of languages like Smalltalk.

What Is Object Orientation?

The dominant style of programming has always been procedural, the structure of the program being organised around the functions being performed, usually in a top down hierarchy of procedure calls. An object oriented programming language is one where the program is organised around the objects being processed, usually in a hierarchy of objects which can share (inherit) the procedures (methods) belonging to other objects.

Object-orientation is not an easy concept to explain; however, its importance is not in doubt. Object orientation will become the dominant approach to structuring and building large complex systems in the future.

Efficiency in the development of computer systems depends on how easily they are modified and enhanced. Changes in an evolving system are either concerned with changes in function or changes in data structure. Procedural programming does a reasonable job in localising changes to function by means of such devices as routine libraries, but changes to data structures usually mean a cascade of side effects as the data structures are referred to in many parts of the system. Object-oriented programming goes a long way to localising these changes also.

An object comprises two things, its own state (manifested as a set of instance variables) which no other part of the system can access directly and a set of procedures (called methods) which describe its behaviour. Everything about an object is encapsulated within it and the only way of getting data out of it, or changing it, or getting it to do something is by sending messages to it. An object is a rather sophisticated extension of the concept of variable in other languages.
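The idea of encapsulated state reached only by messages can be illustrated with a minimal sketch. It is written in Python rather than in the language described later in this paper, and the class `Valve`, its methods and the helper `send` are hypothetical names chosen purely for illustration. Python marks private state by convention only (the leading underscore), so the isolation shown here is a discipline rather than something the language enforces.

```python
class Valve:
    """Hypothetical object whose state is reached only through its methods."""

    def __init__(self):
        # Instance variable; the leading underscore marks it as private by convention.
        self._is_open = False

    def open(self):
        # Responding to the "open" message changes the internal state.
        self._is_open = True

    def close(self):
        self._is_open = False

    def is_open(self):
        # The only sanctioned way of reading the state from outside.
        return self._is_open


def send(receiver, message, *args):
    """Look up the method named by the message on the receiver and invoke it."""
    return getattr(receiver, message)(*args)


valve = Valve()
send(valve, "open")
print(send(valve, "is_open"))  # prints True
```

The caller never touches `_is_open` directly, so the representation of the state could change (for example to a percentage opening) without affecting any code that only sends messages.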

Quite frequently, objects of different classes might be similar, in that one class might exhibit all the behaviour of another plus some additional behaviour. In other words, one class may share a number of other classes' methods, but also has a different version of some methods and some new methods of its own. In these cases one class can be defined as a subclass of another so that it can inherit the shared behaviour.

Object orientation, which embodies the above concepts of encapsulation and inheritance brings the following benefits:

  • Code sharing is greatly increased, thereby increasing programmer productivity.
  • System modules communicate via well defined interfaces, which means it is easier to find bugs.
  • It is easier to maintain software: both minor enhancements and sometimes major restructuring. Maintenance costs are reduced.
  • The environment is not only extremely good for prototyping, but also can be used as a base for a production system.
  • User interfaces can be iterated to their optimum much more easily.
  • Object orientation is very suited to the manipulation of heavily structured data.

One of the commonest reservations echoed about object orientation is whether it can be made to work with acceptable performance. One recalls the same remarks being made about relational databases nearly 15 years ago.

It is true, object oriented languages do run slower than procedural ones. This is mainly because the message expression may take more time to evaluate and also programming style tends to lead to large numbers of messages being sent (procedures being called) to rather small methods (procedures). This performance issue is now considerably offset by improved compiler techniques, faster hardware, and using a procedural approach where appropriate.

In any case, much customisation today is carried out by writing macros in the system command language. As these are run via an interpreter, they are far slower to execute than a properly implemented object-oriented language.

At the end of the day, the benefits to the system customiser are that he can perform major system enhancements in a fraction of the time taken with a conventional system, and for the end user, a much richer functionality in his system and far fewer delays in waiting for enhancements.

Object Oriented Databases

One often hears the term object-oriented applied (sometimes wrongly) to many kinds of systems. So what does the term object-oriented database mean? At first sight it seems strange for a term which was originally used to describe a programming language, usually in comparison to procedural languages.

Now when one comes to databases, all conventional databases are (sort of) object-based. In a relational database, the objects around which the data are organised are tables, records and fields. There are some higher level objects, such as views, but there is no explicit representation of the highest level objects such as real world things. Much of the more recent development has been how to embody high level semantics in the model, but still, the database itself does not embody any of the behaviour definitions of the objects it contains (Oxborrow 1989).

Now, one doesn't often hear the term procedural applied to databases, although one recalls the work by Martin Newell (Newell 1975) on procedure models, in which he built a modelling system out of procedures, and all operations on the model (such as rendering) made calls to the procedure models. These then made the right responses and did the required things when asked. The behaviour of the objects to be rendered was encapsulated within the procedure models and not within the rendering algorithms. This is similar to the encapsulation concept of object-oriented programming languages.
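As a rough illustration of the procedure model idea, and not a reconstruction of Newell's system, the following Python sketch represents each model as a procedure that answers requests. The names `make_box`, `make_segment`, `render` and the request strings are assumptions introduced for this example only.

```python
def make_box(width, height):
    """Return a procedure model: a closure that answers requests about a box."""
    def model(request):
        if request == "outline":
            return [(0, 0), (width, 0), (width, height), (0, height)]
        if request == "bounds":
            return (0, 0, width, height)
        raise ValueError(f"unknown request: {request}")
    return model


def make_segment(length):
    """Return a procedure model for a horizontal line segment."""
    def model(request):
        if request == "outline":
            return [(0, 0), (length, 0)]
        if request == "bounds":
            return (0, 0, length, 0)
        raise ValueError(f"unknown request: {request}")
    return model


def render(models):
    """The renderer knows nothing about shapes; it only asks each model for its outline."""
    return [model("outline") for model in models]


print(render([make_box(2, 1), make_segment(3)]))
```

Adding a new kind of model requires no change to `render`, which is the sense in which the behaviour lives in the models rather than in the rendering algorithm.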

The term object-oriented database is commonly used to mean that the unit of communication to the database is an object: you put an object in, and you get an object out. But this is a facile use of the term, since the crucial thing about object-orientation is that the objects contain their own behaviour and therefore the database needs to manage the procedures (methods) that define that behaviour. Further, communication with the database should be by a system of message passing where the user is isolated from the actual internal representations of the objects.

One view of an object-oriented database is that it is an extension of an object-oriented language to handle persistence, queries, concurrency (multi-user), security and integrity. Another view is that it is an extension of a conventional database to handle procedures, inheritance and message passing. It is a moot point as to whether the conventional divide between the ephemeral working data of the programming environment and the persistent data in the database should be maintained or not.

Commercial object-oriented database systems are only just beginning to appear in the market place. Relational databases suffer a number of serious short-comings for applications in GIS (Egenhofer 1989, Frank 1988, Oxborrow 1989). Some of these problems stem from the particular features of the implementation of current commercial relational databases, not using the facilities of those databases in the most appropriate manner, and the awful syntax and current limitations of SQL (Herring et al 1988). However, the absence of real world semantics in the relational model itself means that the tools provided are at a very low level.

All implementations of the relational model compromise to some extent Codd's rules, from those which are no more than tabular representations to the ones that satisfy most. In particular, the semantics of range queries and versions are missing. However, tabular representations are a good way to make an efficient engine for managing persistent data. It is therefore our belief at the moment, that a practical way to implement an object-oriented database across many platforms is to combine relational (or tabular) technology with an object-oriented language. This allows higher level semantics to be embodied in the object-oriented environment of the language.

The Magik Object-Oriented Programming Environment

Magik is an extremely powerful language for the implementation of large interactive systems. The language is a hybrid of the procedural and object oriented approaches and program development is carried out in an interactive environment. The interactive environment allows changes to the system to be immediately tested, without a prolonged linking process and regardless of the size of the system.

We have implemented Magik in order to build an open, seamless development environment. The way this is achieved is by embodying the following features in the language and its development environment:

  • There is but one language for system, application and customisation development.
  • Both object orientation and procedural methodologies are supported.
  • Development is in an interactive environment.
  • The language is expressive and very readable.
  • There is an extensive library of standard object classes, methods and procedures.
  • The language is built as a platform suitable for delivering commercial systems.
  • Applications can be transferred with a minimum of effort between hardware platforms.

It is our belief that the presence of all these features is essential if commercial systems are to be developed, maintained and customised with a minimum of programmer effort. It is the lack of a viable language with a sufficient subset of these facilities that has stimulated us to produce our own which embodies all of them.

[ Figure 2 not available ]

Magik allows programs to be developed in one seamless environment, meaning that systems programming, applications development, system integration, and customisation are all written in one environment in the same language. Thus, end users who wish to customise the system can be confident in the quality of the tools provided because they are identical to the development tools used by the core and application system developers. Further, existing systems, such as most database management systems, can be fully integrated so that to the user they appear as part of one homogeneous system.

The Virtual Database

It has been said that GIS could be regarded as an integrating technology providing a window into many disparate distributed databases. If this goal is to be achieved then an architecture is needed in which the databases to be integrated need to be set up as servers to the single client GIS.

There are a number of shortcomings in existing available database technology for the building of a GIS. Nevertheless, there are now many organisations which have committed in a big way to one of the emerging de facto standard database systems. It is not acceptable for a GIS vendor to try to displace such a database with something else tuned for GIS applications. It is necessary to engineer a solution which preserves the user's investment while at the same time doing as good a job as possible in providing a GIS capability. As mentioned above, if one tries to handle all data in the commercial DBMS, then it is highly likely that a serious performance problem will result. If one runs a geometric DBMS alongside, then serious problems of backout, recovery and integrity may result. It would seem that what is needed is some "virtual DBMS" which can act as a front end to two or more physical DBMSs, and that this should handle versioning (Easterfield et al 1990, Newell et al 1990) and have a knowledge of all aspects to do with data dictionary, integrity, and access to the various data. Data modelling of objects allows the user's models to be built with full recognition of their semantic content, a key feature not provided in the relational world (Worboys et al 1989).

We have built a low level interface between the object-oriented and tabular worlds in which a table maps onto an object class, a record maps onto an instance and a field maps onto a slot. Higher level abstractions are then modelled wholly in the object-oriented world. Such a representation is ideal for many of the navigational style queries that one undertakes in a GIS.
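The mapping of table to class, record to instance and field to slot can be sketched as follows. This is an illustration only, written in Python against an in-memory SQLite database rather than the interface described in this paper; the function `class_for_table`, the `house` table and its columns are hypothetical.

```python
import sqlite3


def class_for_table(connection, table):
    """Build an object class whose slots are the fields of the given table.

    Assumption for this sketch: the table name is trusted, since it is
    interpolated directly into the SQL text.
    """
    # PRAGMA table_info returns one row per column; index 1 holds the column name.
    columns = [row[1] for row in connection.execute(f"PRAGMA table_info({table})")]

    def initialise(self, *values):
        # One record becomes one instance: each field value fills one slot.
        for name, value in zip(columns, values):
            setattr(self, name, value)

    @classmethod
    def instances(cls):
        # Every record in the table is returned as an instance of the class.
        rows = connection.execute(f"SELECT * FROM {table}")
        return [cls(*row) for row in rows]

    return type(table.capitalize(), (), {
        "__slots__": tuple(columns),
        "__init__": initialise,
        "instances": instances,
    })


connection = sqlite3.connect(":memory:")
connection.execute("CREATE TABLE house (id INTEGER, occupants INTEGER)")
connection.execute("INSERT INTO house VALUES (1, 4)")

House = class_for_table(connection, "house")
houses = House.instances()
print(houses[0].occupants)  # prints 4
```

Higher level behaviour would then be added by subclassing the generated class, leaving the tabular store unaware of it.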

The User's View Of An Object-Oriented GIS

Any system built using an object-oriented environment could also be built by other methods, and the end user may well be hard pressed to tell the difference. Sometimes, systems claim to be object-oriented, because they are built out of an object-oriented language which is non-interactive. In such systems, end users are denied the major merit of the approach, which is the ability to modify and extend the system on line. The claim to object-orientation may be valid, but, so what, if the rest of the world cannot use it.

Icon driven user interfaces are sometimes called object-oriented. This description might be justified if the icons represent data objects and when clicked they know what to do. For example on the Macintosh, clicking on a document icon results in the appropriate word processor (drawing program, spread sheet, etc.) being started. This differs from function icons to which you must later supply the data.

For a GIS to deserve the term object-oriented it needs rather more than an object-oriented systems language and user interface.

The customiser of a system built with an interactive object-oriented front end language is provided with an extremely open architecture in which he can access and use many existing classes and their methods. He is provided with browsers to explore this rich environment of existing functionality in order that he can utilise it and modify it to make his own extensions to the system. The analogy has been made to the hardware designer who is given an extensive library of standard components of which he knows how they perform to given inputs, even though it is unnecessary to know how they work internally. However, in the object oriented world he can also make his own components (classes) which can borrow behaviour from one or more existing classes (multiple inheritance).

The GIS applications programmer perceives all items as objects which have their own behaviour. Although the data and behaviour may eventually be stored in separate locations (in our case, in a Magik object library and an underlying database), from the user's point of view the objects are self contained items.

As a simple example, consider the following fragment from a GIS shown in Figure 3.

The object type BUILDING understands messages, which are relevant for all types of building, such as foot print (square metres) and volume (cubic metres), which are then automatically inherited by HOUSE and OFFICE.

HOUSE extends the behaviour of BUILDING with, for example, approximate gas consumption according to the rules used by the gas board knowing the volume of the house and the number of occupants.
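The BUILDING, HOUSE and OFFICE example might be sketched as below. The sketch is in Python rather than Magik, and several details are assumptions made for illustration: the constructor arguments, the treatment of a building as a flat-roofed prism, and in particular the gas consumption formula, which is a placeholder and not the gas board's actual rule.

```python
class Building:
    """Behaviour relevant to all types of building."""

    def __init__(self, footprint_m2, height_m):
        self._footprint_m2 = footprint_m2
        self._height_m = height_m

    def footprint(self):
        # Foot print in square metres.
        return self._footprint_m2

    def volume(self):
        # Volume in cubic metres; assumes a simple flat-roofed prism.
        return self._footprint_m2 * self._height_m


class Office(Building):
    """Inherits footprint and volume unchanged."""


class House(Building):
    """Extends Building with behaviour that only makes sense for houses."""

    def __init__(self, footprint_m2, height_m, occupants):
        super().__init__(footprint_m2, height_m)
        self._occupants = occupants

    def approximate_gas_consumption(self):
        # Placeholder rule based on volume and occupants; coefficients are invented.
        return 0.5 * self.volume() + 10.0 * self._occupants


house = House(80.0, 6.0, 4)
print(house.volume())                       # prints 480.0
print(house.approximate_gas_consumption())  # prints 280.0
```

`House` and `Office` respond to `footprint` and `volume` without defining them, while only `House` understands `approximate_gas_consumption`.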

[ Figure 3 not available ]

From the application programmer's point of view, he can retrieve HOUSEs from the database using queries on stored or calculated values.

Within a GIS context, for all these objects, spatial properties such as AREA may be inherited from SPATIAL_BEHAVIOUR. BUILDING and its sub-classes could understand and respond to messages like ADJACENT_TO, CONTAINED_IN, NEAR_TO, etc. Should a user wish to enforce his own definition of say, NEAR_TO for a HOUSE he could do so.
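A user-specific definition of NEAR_TO could look something like the following Python sketch. The class `SpatialBehaviour`, the use of a single centre point per object and the distance thresholds of 100 and 25 metres are all assumptions made for the purpose of the example.

```python
import math


class SpatialBehaviour:
    """Hypothetical mixin supplying default spatial messages."""

    near_distance = 100.0  # assumed default threshold, in metres

    def centre(self):
        raise NotImplementedError("subclasses must supply a centre point")

    def near_to(self, other):
        # Default definition: centres within near_distance of each other.
        return math.dist(self.centre(), other.centre()) <= self.near_distance


class Building(SpatialBehaviour):
    def __init__(self, x, y):
        self._centre = (x, y)

    def centre(self):
        return self._centre


class House(Building):
    def near_to(self, other):
        # The user's own, stricter definition for houses: within 25 metres.
        return math.dist(self.centre(), other.centre()) <= 25.0


office = Building(0.0, 0.0)
house = House(30.0, 40.0)      # 50 metres from the office
print(office.near_to(house))   # prints True
print(house.near_to(office))   # prints False
```

The same message gives different answers depending on the receiver, because each object carries its own definition of the behaviour.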

One step further is that geometry (one of the spatial attributes) is also treated as an attribute of an object. That is, the geometry of a HOUSE can be retrieved or changed by using methods which operate on the object HOUSE; the fact that the actual geometry may be stored in many separate tables in an underlying database is irrelevant for the GIS applications programmer.

Conclusions

We have been engaged in developing a new kind of software architecture for building and maintaining large, interactive, databased applications such as GIS. The main issue that we have tried to address is the large costs involved in developing such systems and particularly the costs of implementation and customisation. It is our conclusion, that object-oriented programming technology is sufficiently well understood and gives such astounding benefits that now is the time to apply it to real commercial systems. We strongly advocate that such an environment itself should be interactive and that access to all objects in the system should be available in the user interface, and not hidden deep in the bowels of the system where only the vendor's system programmers have access.

We advocate implementing an object-oriented database capability by front ending a version managed tabular datastore with an object-oriented language. Existing databases should also be accommodated in the same way, as large amounts of data are already committed to these databases.

We observe that GIS is particularly well suited to object-orientation, and so the benefits of the approach are considerable.

References

Butler, R. (1988). The Use of Artificial Intelligence in GIS. Mapping Awareness and Integrated Spatial Information Systems, Vol. 2, No. 3.

Easterfield, M. E., Newell, R. G. and Theriault, D. G. (1990). Version Management in GIS Applications and Techniques. Conference Proceedings, EGIS, Amsterdam, April 1990.

Egenhofer, M. J. and Frank, A. U. (1989). Object-Oriented Modelling in GIS: Inheritance and Propagation. Auto-Carto 9, Baltimore, April 1989.

Frank, A. U. (1988). Requirements for a Database Management System for a GIS. PE & RS Vol. LIV, No. 11, November 1988.

Herring, J. R., Larsen, R. C. and Shivakumar, J. (1988). Extensions to the SQL Query Language to Support Spatial Analysis in a Topological Database. Proceedings of GIS/LIS '88, Vol. 2, San Antonio, Nov 1988.

Newell, M. E. (1975). The Utilization of Procedure Models in Digital Image Synthesis. Doctoral Dissertation, Dept. of Comp. Science, University of Utah, Summer 1975.

Newell, R. G., Theriault, D. G. and Easterfield, M. E. (1990). Temporal GIS Modelling the Evolution of Spatial Data in Time. Conference Proceedings, GIS Design Models and Functionality, Leicester, March 1990.

Oxborrow, E. and Kemp, Z. (1989). An Object-Oriented Approach to the Management of Geographical Data. Conference Proceedings: Managing Geographical Information Systems and Databases, Lancaster University, September 1989.

Worboys, M., Hearnshaw, H. and Maguire, D. (1989). The IFO Object-Oriented Data Model. Conference Proceedings: Managing Geographical Information Systems and Databases, Lancaster University, September 1989.
