Monday, October 10, 2011

Exporting Smallworld Data to KML

There was a recent question on the sw-gis Yahoo group asking how to export a Smallworld trail to KML for use in other applications. It turns out this can be done easily using the open source Magik Components Library (MCLIB). The following video demonstrates how this is done. Important information:
- Magik Components Library link is here
- Thanks to Brad Sileo (iFactor Consulting) for contributing this code to the MCLIB project.



Give it a try and let me know what kind of cool export applications you are using. And if you have any improvements to make to the functionality, please feel free to contribute to MCLIB.

Monday, August 29, 2011

Smallworld Technical Paper No. 14 - GIS in the Cable Market

by John Rand MSCTE, Design Manager, Cambridge Cable Ltd, Cambridge U.K.

John Rand has considerable experience in the cable TV and telecommunications industry, having originally specialised in the Local Network with British Telecom. He joined Cambridge Cable on its formation and was instrumental in developing the integrated cable TV and telecommunications network design. He was responsible for selecting and implementing GIS, including advising on its development. He studied telecommunications at Southgate College, London and cable television in Atlanta, Georgia, and is currently studying Management at Anglia University, Cambridge, specialising in Operations and Project Management.

Abstract

This paper will look at the cable industry's requirements for a Geographical Information System (GIS) and the contribution a GIS will make to the cable TV and telecommunications business. It analyses existing CAD systems, the GIS capabilities applicable to the unique UK cable industry and the reasons for changing systems. The industry is expanding rapidly, and I illustrate how GIS can be used to meet the business plan goals. The areas of implementation, finance and marketing are also discussed.

The paper concludes that to reap the full business opportunities presented to this unique industry a GIS is the only credible system available for utilising the company's resources to their maximum benefit.

Introduction

GIS is only just starting to emerge as a useful tool within the cable industry. It has been known for many years, and although a few vendors have tried to develop a successful product, none appears to have succeeded until recently. That scene now appears to be changing, and before us lies a seemingly endless vista of possibilities for the use of the system within the industry. Though the system reaches into every aspect of our business, implementing and developing our requirements is by no means straightforward and is proving to be quite complex. This paper will look at how Cambridge Cable intends to use the benefits of this "state of the art" technology to reach our vision of becoming the premier provider of entertainment, information and communications services for the benefit of the community and our customers, employees and shareholders, and at how this technology can benefit the whole industry. GIS will eventually be present in just about every department of the company, providing the core information to drive the company forward. It will become as commonplace as any other information system, only more crucial.

The leading cable companies in the UK are beginning to implement GIS. This paper will look at the benefits of GIS to the cable industry, the newest of the utility companies.

The UK Cable Market

To appreciate fully the importance of GIS within the cable market, it is essential to understand the unique nature of the UK cable industry and its implications. A brief illustration of the industry follows.

UK Cable Market

Within the UK, 139 cable television franchise areas have been created by the Department of Trade and Industry (DTI). Franchise areas cover densely populated areas, so only 70% of the UK is covered; however, expansion and the creation of other franchise areas are possible. The first 12 franchises were awarded in the mid 80s and the remaining 127 franchise areas in the last five years. Cable television companies holding franchise area licences can also apply for "Public Telecommunication Operator" (PTO) licences for their areas, creating a dual service industry: cable TV and telecommunications. The regulatory bodies for cable television and telecommunications are the Independent Television Commission (ITC) and the Office of Telecommunications (OFTEL) respectively. The provision of two major service products by one company makes us unlike any other utility company.

[ Figure 1 not available ]

Cambridge Cable's Position within the UK Market

Cambridge Cable Limited (CCL) was formed in July 1988. It was awarded the Cambridge franchise in June 1990 and started constructing the network in June of 1991. The Anglia franchise was acquired in December 1992 thus making a total of approximately 200,000 homes covered by our operation. CCL is jointly owned by Comcast Communications of Philadelphia, USA and Singapore Telecom International.

How the Industry Works

The size of a franchise or company is measured by how many homes fall within the franchise boundary; the penetration of our services into this number is one of the core statistics to watch, because it represents the expected revenue with which to repay investment. The income from the two services differs subtly: cable TV brings a flat monthly rate depending on the chosen package, whereas telecommunications revenue depends on usage. As owners of a cable TV franchise we face no competition for broadband services (British Telecom cannot operate cable television services on its network until 1997); however, British Telecom is our main competitor for "Local Loop" services. This is where the cable companies must use all their resources to succeed and gain the upper hand. The leading UK cable companies pride themselves on using "state of the art" technologies and practices to achieve this, and thus the importance of GIS becomes abundantly clear.

Economics of the Industry

The basic economics of the industry are similar to those of any other; finance is raised and used to construct an infrastructure network over which our services can be carried. Both services will be constructed as one network. The incremental costs for the second network are minimal as the greatest investment lies within the civil construction costs.

Cambridge Cable has four main strategic goals. With the assistance of GIS all these objectives can be achieved and maintained efficiently.

1. To create and develop profitable market opportunities. Through geographical market analysis, correct products and services can be determined and potential opportunities exploited.

2. To provide a wide range of differentiated quality services and products at competitive prices. Again GIS will be invaluable as a tool for market analysis.

3. To ensure the network is "future-proof", user-friendly and cost-effective. GIS will be used to simulate different architectural models and assess new technologies, giving us the required information to build an economical and reliable network.

4. To hire, develop, and retain the right people at the right time. The implementation of GIS as "state of the art" equipment demonstrates commitment by Cambridge Cable to new technology and to providing people with the right tools and information to develop careers.

[ Figure 2 not available ]

GIS will be a significant force in achieving these goals and contributing to the company's success.

The obstacle to this is that the existing situation relies on manual interaction between the different departments. Currently, designs are drafted on a CAD system and are then printed and used in paper format. Other systems exist within the cable TV operation (the Subscriber Management System, the Network Management Systems and the Telecom Network Circuit Assignment Systems), but none of these interact, leaving numerous opportunities for miscommunication and "information-error".

The supra-system of the business and the requirement for return on investment and instant current information on network and customers is causing stress on the sub-systems of:-

Subscriber Management System

Network Management System

Telecom Circuit Assignment System

plus various manual systems. This creates the need for a global system. The answer is the implementation of a GIS which can facilitate the interaction of all these sub-systems.

The combination of return on investment, instant access to current information on the network and customer information is essential; the supra-system is causing stress on all three of these sub-systems, creating the need for a global system. The answer lies in implementing GIS which can facilitate the interaction of these sub-systems.

The Cable Industry and GIS

Cable Requirements of a GIS

The process of obtaining customers to bring in revenue begins with constructing the network; therefore, design is required. The principal requirement is for a system able to produce comprehensive designs and to hold information on the areas already constructed and on the network status, readily available to anyone who requires it at any time. An ideal example of how a complete system would work is as follows:

1. Survey information would be collated from the field on a portable PC and input directly on to the digital Ordnance Survey map. This information would then be downloaded into the GIS and the design created thereon.

2. On completion of the design, customer addresses would be transferred to the Subscriber Management System and automatically "populated". Telecom assignment information would be transferred to the Circuit Assignment System, and purchasing would automatically receive Bill of Materials (BOM) information.

3. The GIS information would be linked to the cable television and Telecommunication Network Management Systems. Should there be a network performance problem or outage, instant geographical information/reports can be generated alongside instantaneous information for customer services.

4. Black spot analysis for maintenance purposes becomes effortless and marketing can identify meaningful information.

[ Figure 3 not available ]

History of GIS in the Cable Industry

As mentioned earlier, GIS has never quite found its feet within the cable industry. The majority of cable companies have used either pen and paper or CAD systems, which are extremely stylised (Newell and Sancha 1990). A few GIS vendors have tried to develop GIS systems for the cable market. Cambridge Cable purchased a CAD system in early 1991 and had been using it successfully until recently. However, many functions available in a GIS were not available in CAD, and this was compounded by the lack of support given to the product. The CAD system imposes severe limitations on effective use within this rapidly growing industry. As Newell and Sancha (1990) commented, "Several of the established CAD vendors tried to adapt their CAD systems for GIS applications. This resulted in most unsatisfactory compromises"; "CAD vendors continued to try to convince the industry that they had a viable product by integrating their databases with the CAD function"; they also referred to "marrying together two inadequate systems" (Newell and Sancha 1990). The two technologies of database and CAD do not integrate easily. In late 1992 the industry started to talk more about GIS, but no single GIS proved to be totally reliable and no single vendor stood out. Smallworld, a company well established in GIS and based locally in Cambridge, approached Cambridge Cable to work together on developing the combined cable TV and telecommunications model and to establish an unrivalled product for the industry. Fundamental requirements included data capture, performance, customisation and integration (Newell and Theriault 1989), all of which were severely limited or non-existent in Cambridge Cable's current system.

[ Figure 4 not available ]

Implementation of GIS

It is clear now that no improvements to the CAD system could have provided our business with its requirements. The GIS is now installed within the design department and is already proving beneficial through its ability to calculate system performance thus helping to obviate the need to use design contractors.

The next stage of implementation is to integrate the Network Management Systems and Subscriber Management System etc.

The final, and more idealistic, stage of implementation is to integrate GIS throughout the company to provide full, up-to-date network information to everyone. "As the number of users sharing information in this way increases, the system will constitute a continually improving Geographic Information System for the benefit of all" (Bernhardsen and Tveitdal 1986). This will also improve the "work conditions for the specific personnel groups" (Kubik et al. 1987), since having relevant current information immediately at hand will be of enormous benefit for optimum performance.

Why Invest in GIS

Should a cable company invest in GIS? Considering the magnitude of the initial investment, should it stay with manual paper or CAD methods? And when is the best time to invest? The analogy I put forward is to compare GIS with computing in the 60s. The first generation of computers had high purchase costs, high maintenance and few benefits, but as things progressed no business would be without one. GIS is now sufficiently developed to be useful to the cable industry, and eventually, as happened with computing, no cable operation will exist without a GIS. With networks growing at a tremendous rate, any wise company would invest in GIS now: the thought of transferring enormous amounts of data at an advanced stage of the build is horrifying and expensive.

The investment in a GIS is only a small fraction of the total investment in constructing the network. Considering that the GIS will control the network assets and provide the benefits described below, it can be seen as an essential long-term investment.

What are the Benefits to the Cable Business

  • Improve the quality of network design by enforcing engineering rules and standards which can be preset and fine tuned.
  • Facilitate the achievement of cost reduction and quality improvements in passing addresses to the Subscriber Management System with implementation of an automated interface.
  • Improve repairs to the network through the provision of visual aids on fault investigation.
  • Simulation of different architectural network models to evaluate the most cost-effective solution to design scenarios, e.g. "Fibre to the Feeder" architecture versus "Fibre to the Kerb".
  • A GIS is a very useful tool for evaluating new technologies and their impact on existing networks, e.g. PDH versus SDH.
  • Analysis of financial comparisons of percentage turnover ploughed in against speculation.
  • Having advanced equipment attracts the right calibre of staff and enables them to advance their careers with current technology.
  • Attraction of investment within the company by being seen as innovative and conscious of the need to have accessibility to vital information.
  • Accurate inventory of assets and asset management for capital accounts. Also analysis of potential acquisitions including identification of existing or potential plant within those areas.
  • Interactive queries for precise retrieval of information concerning the network.
  • Quicker response times to customer orders, due to readily available information, especially regarding telecom enquiries and indication of likely installation dates.
  • Substantial savings can also be made through integrating GIS with purchasing, warehousing, and the construction programme. Bill of Materials (BOM) created by GIS can be transferred to the purchasing computer system where they can be ordered on minimum lead time in relation to the construction schedule and received in the warehouse for "just in time" materials management.
  • Geographic survey information captured on GIS can be sold commercially to any other parties interested in such data.
  • It can also query information without the need to survey.
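The just-in-time ordering described in the Bill of Materials point above reduces to date arithmetic: each item on a GIS-generated BOM is ordered its supplier lead time (plus a small safety buffer) before construction starts. The following is a minimal sketch only; the `BomLine` structure, the item names, the lead times and the two-day buffer are all hypothetical, chosen for illustration.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass
class BomLine:
    """One line of a hypothetical GIS-generated Bill of Materials."""
    item: str
    quantity: int
    lead_time_days: int  # supplier lead time, assumed known for each item


def latest_order_dates(bom: list[BomLine], construction_start: date,
                       buffer_days: int = 2) -> dict[str, date]:
    """Latest order date per item so stock lands in the warehouse just before work starts."""
    return {
        line.item: construction_start - timedelta(days=line.lead_time_days + buffer_days)
        for line in bom
    }


bom = [BomLine("coaxial cable (m)", 1200, 10), BomLine("duct (m)", 500, 21)]
orders = latest_order_dates(bom, date(1994, 3, 14))
for item, when in orders.items():
    print(f"order {item} by {when.isoformat()}")
```

Ordering as late as the lead time allows keeps stock out of the warehouse until shortly before it is needed, which is the point of tying purchasing to the construction schedule.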

Economic Benefits

Many of the above points represent very tangible cost savings, and this can be proved through collation of data. However, GIS also offers considerable intangible benefits that only a GIS can give, as opposed to improving existing systems. For example, after the initial investment in GIS, staff savings can be made without the usual element of human error.

Intangible benefits:-

  • More information (marketing, customer service, fault locates etc)
  • Better analysis with less labour time (marketing, new technologies)
  • Ability to do analysis not possible before (new RF and telecom technologies)
  • Better decisions (build areas, new technologies)
  • Better planning (network design, business plan)
  • Better understanding and analysis of highly complicated systems

[ Figure 5 not available ]

Return on Investment

Deciding where to build currently concentrates on areas of highest density. This is desirable because eventually we want to provide service to every home in the franchise area. The denser areas are typically those with the best demographics although, currently, no marketing analysis is done to determine if any of these areas are better than others. Through using GIS as a sophisticated marketing tool and analysing areas, the best potential dense areas can be built first. Early high penetration will be achieved and high revenue will be received, yielding a high return on investment.
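One simple way to turn such an analysis into a build order is to rank candidate areas by expected subscribers per pound of construction cost rather than by density alone. The sketch below assumes that scoring rule; the areas, penetration estimates and costs are hypothetical, and in practice the penetration figures would come from the demographic and lifestyle data held in the GIS.

```python
from dataclasses import dataclass


@dataclass
class Area:
    """A hypothetical candidate build area."""
    name: str
    homes: int                   # homes the build would pass
    expected_penetration: float  # fraction expected to subscribe (from market analysis)
    build_cost: float            # estimated civil construction cost, in pounds

    def subscribers_per_pound(self) -> float:
        return self.homes * self.expected_penetration / self.build_cost


def build_order(areas: list[Area]) -> list[Area]:
    """Highest expected subscribers per pound of construction first."""
    return sorted(areas, key=lambda a: a.subscribers_per_pound(), reverse=True)


candidates = [
    Area("A", 2000, 0.25, 500_000),
    Area("B", 3000, 0.15, 400_000),
    Area("C", 1000, 0.40, 300_000),
]
print([a.name for a in build_order(candidates)])  # ['C', 'B', 'A']
```

Ranked by homes alone, area B would be built first; weighting by expected penetration moves the smaller but more receptive area C ahead of it.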

Cable Marketing and GIS

Marketing

It is essential to hit the right potential customer base with the right services. Traditionally, lower socio-economic groups have been better target groups for cable TV, while the higher groups are more likely to be interested in telecommunication services. However, it is now being found that socio-economic group matters slightly less than customer "lifestyle", which concentrates on the use of disposable income. If lifestyle data is used, data set analysis can be done prior to design in order to identify the correct market and to build in that area first. Without GIS this would be a very lengthy and laborious task.

Since telecommunications is regarded as an essential service, churn is not experienced to the same extent as with cable TV customers. If market identification and "right sizing" can be advised prior to a sale, enormous savings can be made in the areas of abortive sales calls, installation and equipment retrieval. Sophisticated marketing analysis can be done on the remaining potential customer base to determine the required product.

Conclusion

Clear benefits can be seen in implementing a GIS system within a cable business and the advantages are clearly defined. There is also confidence within the industry that there are credible vendors with a tremendously useful product of enormous value to a company's operation.

The near future for GIS looks exciting, and in the long term there will be far-reaching effects on our business. It is an essential tool for effective competition. A culture change in the working environment will be required to make this proliferation of invaluable information acceptable. Precise marketing is the key and, by using GIS to interact with and analyse all available information, cable companies will be able to achieve greater success within their market.

References

Bernhardsen, T. and Tveitdal, S., 1986. Community Benefit of Digital Spatial Information. VIAK A/S. Auto Carto London, Vol. 2.

Dickinson, H.J. and Calkins, H.W., 1988. The Economic Evaluation of Implementing a GIS. International Journal of Geographical Information Systems, Vol. 2, No. 4, pp. 307-327.

Joint Nordic Project, 1987. Digital Map Data Bases, Economics and User Experiences in North America. Helsinki, Finland: Publications Division of the National Board of Survey, Finland.

Kubik, K., Merchant, D. and Schenk, A., 1987. Design Considerations for Urban Information Systems. ASPRS-ACSM, Vol. 5.

Marble, D.F. and Peuquet, D.J., 1983. Geographic Information Systems and Remote Sensing. Manual of Remote Sensing, 2nd ed., American Society of Photogrammetry, Vol. 1.

Newell, R.G. and Sancha, T.L., 1990. The Difference Between CAD and GIS. Computer Aided Design magazine, April 1990.

Newell, R.G. and Theriault, D.G., 1989. Ten Difficult Problems in Building a GIS. Presented at the British Cartographic Society Symposium, Cambridge, September 1989.

Theriault, D.G., 1989. An Overview of Geographical Information: the Technology and its Users. Presented at the conference Geographic Information Systems, April 1989.

Acknowledgements

David Theriault, Smallworld Systems Ltd. Keith New, Cambridge Cable Ltd.

Glossary

CHURN - Turnover of customers; disconnections after connection.
OUTAGE - Complete loss of service.
PDH - Plesiochronous Digital Hierarchy.
SDH - Synchronous Digital Hierarchy.

Smallworld Technical Paper No. 13 - The Wide Area Connection

by John Rowland, Grampian Regional Council

Abstract

GIS data is voluminous and demanding of bandwidth, and therefore normally requires high speed network links. This has constrained "real time" wide area distribution of GIS data. In conjunction with British Telecom, Gandalf Digital Communications Ltd and Smallworld Systems Ltd, Grampian Regional Council believes it has been able to implement a realistic solution to this problem using Smallworld's recently developed "Persistent Cache" functionality running over British Telecom "Kilostream" links.

This paper:

  • briefly explains Grampian Regional Council's requirement for wide area GIS;
  • overviews wide area communication options;
  • explains the basic concept of intelligent bridging;
  • describes the key features of the Smallworld System which have been used to implement wide area connections;
  • reviews experience to date;
  • briefly considers what the future may hold.

Grampian Regional Council & its Corporate GIS

Grampian Regional Council administers a land area of approximately 8,000km² which is home to a population of 530,000, half of whom live in the City of Aberdeen. As with other Scottish Regional Councils, its responsibilities include the provision of water, drainage, roads, economic development, strategic planning, fire, police, education and social services.

The Council's main headquarters is Woodhill House in Aberdeen; some departments also operate from a number of divisional and other offices located throughout the Region.

In 1992 the Council commenced implementation of a Corporate GIS which was installed in Woodhill House for use by four departments (Economic Development and Planning, Property, Roads and Water Services). The Council selected Smallworld GIS running under UNIX as its core system. At present all departments share a single corporate GIS database which is managed by a Sun MP630 file server. With ongoing data capture this database continues to increase in size; at the time of writing it held 4GB (gigabytes) of GIS data.

Responsibility for maintaining this database and for the ongoing implementation of the system on behalf of user departments is vested in a six-person team called the "GIS Unit". To date the Council has acquired a total of twenty-nine GIS "seats" from Smallworld, with more on the way. Ten of these seats have recently been acquired by the Department of Water Services for installation at six different office locations remote from Woodhill House (see figure 1).

Until recently it had not been viable for the Council to operate their Corporate GIS over a wide area network. However, Smallworld's recently developed Persistent Cache database management software combined with "state of the art" network bridge technology has enabled the Council to implement wide area connections using leased British Telecom Kilostream lines. At the time of writing two of the Department of Water Services' remote offices have been connected to the main file server in Woodhill House.

[ Figure 1 not available ]

Wide Area Connection Components

The wide area connection has four key components: a physical communication link (British Telecom 64Kbps Kilostream in the first instance), intelligent bridging (Gandalf LANLine), Smallworld version managed GIS database and Smallworld Persistent Cache software (2).

Physical communication links

A 500m x 500m Ordnance Survey vector tile of an urban area typically contains around 250Kbytes of uncompressed data. To pass such a tile over a network and display it in a total elapsed time of less than 45 seconds, the network has to carry data at more than 44Kbps (kilobits per second). To view 1km² of similar data in the same time, the speed would have to increase to more than 180Kbps.

This should not be a problem over a local area network with a bandwidth of 10Mbps (megabits per second). However, if all there is between office locations is a public telephone network, a couple of high speed modems operating at 14.4Kbps and the inherent "dial up" delay of analogue communications, then there clearly is a problem.
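The figures in the last two paragraphs can be checked with a little arithmetic. The sketch below assumes, as is usual, that file sizes use binary kilobytes (1 Kbyte = 1,024 bytes) while line speeds use decimal kilobits (1 Kbps = 1,000 bit/s); on that assumption a 250-Kbyte tile needs about 45.5Kbps and a 1km² block (four tiles) about 182Kbps, consistent with the thresholds quoted. Protocol overhead and drawing time are ignored, and the function names are illustrative only.

```python
# Assumption: Kbytes are binary (1,024 bytes) and line speeds are decimal
# (1 Kbps = 1,000 bits per second), as is usual in telecoms.
BITS_PER_KBYTE = 1024 * 8


def required_kbps(size_kbytes: float, seconds: float) -> float:
    """Line speed needed to move size_kbytes within the given time."""
    return size_kbytes * BITS_PER_KBYTE / seconds / 1000


def transfer_seconds(size_kbytes: float, line_kbps: float) -> float:
    """Time to move size_kbytes over a line, ignoring overhead and dial-up delay."""
    return size_kbytes * BITS_PER_KBYTE / (line_kbps * 1000)


TILE_KBYTES = 250  # one 500m x 500m urban Ordnance Survey vector tile

print(f"one tile in 45 s: {required_kbps(TILE_KBYTES, 45):.1f} Kbps")             # 45.5
print(f"1 km2 (4 tiles) in 45 s: {required_kbps(4 * TILE_KBYTES, 45):.1f} Kbps")  # 182.0
for name, kbps in [("14.4Kbps modem", 14.4), ("64Kbps Kilostream", 64), ("10Mbps LAN", 10_000)]:
    print(f"{name}: {transfer_seconds(TILE_KBYTES, kbps):.1f} s per tile")
```

On these assumptions a single tile takes about 142 seconds over a 14.4Kbps modem, 32 seconds over one 64Kbps Kilostream channel and a fraction of a second on a 10Mbps LAN.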

There is no alternative but to seek a digital communications link. Depending on what you are prepared to pay, digital links can provide effective line speeds from 64Kbps to in excess of 8Mbps with minimal "dial up" delay. They can be either ISDN ("pay when you use") dial-up links or dedicated "Kilostream" or "Megastream" leased lines.

ISDN

In the United Kingdom, ISDN (Integrated Services Digital Network) is available either as ISDN2, providing an effective 128Kbps line speed using two 64Kbps channels, or as ISDN30, providing an effective 1.92Mbps using thirty 64Kbps channels. At the time of writing British Telecom were charging approximately £400 per site for ISDN2 socket installation, £84 per quarter line rent, and normal telephone call rates for transmission.

Leased Lines

Leased lines normally incur an initial installation charge and a subsequent annual rental charge. At the time of writing British Telecom were charging £900 per site to install 64Kbps "Kilostream". The annual line rent for a link between two sites varies according to the distance between them and their proximity to BT digital exchanges; some indicative figures are quoted in the ISDN2 v Kilostream comparison below.

In contrast, 2Mbps "Megastream2" currently costs £6,200 per site plus £750 per link for a first installation, and 8Mbps "Megastream8" £9,734 per site plus £2,625 per link. Line rents vary according to the distance between BT exchanges; for example, if two exchanges were 50km apart, Megastream2 would currently cost £15,740 per annum to rent and Megastream8 £55,108 per annum. Even the most optimistic GIS cost benefit analysis may have difficulty justifying expenditure of this magnitude!
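A rough first-year comparison can be built from the charges quoted above for two sites whose exchanges are 50km apart. The helper `first_year_cost` is hypothetical; it assumes one link between two sites and one year's rent, and for ISDN2 it excludes call charges, which depend on usage. Kilostream is left out because its rent is not quoted in this section.

```python
def first_year_cost(per_site_install: float, per_link_install: float,
                    annual_rent: float, sites: int = 2) -> float:
    """Installation at every site, one link charge, plus one year's rent (pounds)."""
    return sites * per_site_install + per_link_install + annual_rent


# Megastream figures quoted above, for exchanges 50km apart.
megastream2 = first_year_cost(6_200, 750, 15_740)
megastream8 = first_year_cost(9_734, 2_625, 55_108)
# ISDN2: £400 installation and £84/quarter line rent at each of two sites;
# call charges are excluded because they depend on usage.
isdn2_fixed = first_year_cost(400, 0, 2 * 84 * 4)

print(megastream2, megastream8, isdn2_fixed)  # 28890 77201 1472
```

Even before call charges, the gap between roughly £1,500 for ISDN2 and nearly £29,000 for Megastream2 illustrates why the higher-capacity lines are hard to justify.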

Despite current talk of information superhighways, it is little surprise that many multi-site GIS installations still rely on tapes, discs and couriers to transfer data between individual sites.

The wide area connections to Grampian Regional Council's six Water Service remote offices are being implemented using a single 64Kbps Kilostream channel to each office.

[ Figure 2 not available ]

Intelligent Bridging

Bridge or gateway devices are needed to connect the physical wide area communication link between two remote sites to the local area networks (LANs) at those sites.

A bridge is effectively a filter which joins two network segments such that data will only pass through the bridge to a second segment if it is destined for a device connected to it. Bridges are commonly used to segment local area ethernets so that unwanted data packets are not allowed to flow along segments where they are not needed.

In a UNIX environment bridging is achieved using the IP (Internet Protocol) part of the TCP/IP protocol (1). Every device connected to an ethernet has its own unique IP address. A data packet being transmitted from one device to another always carries with it the IP address of the device to which it is being sent. In the case of a data packet which is broadcast to all devices on a network the IP address is coded so as to indicate that it needs to be delivered to every device.
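The filtering behaviour described in the last two paragraphs can be sketched as a small lookup table. This is an illustration under stated assumptions, not the Gandalf LANLine's logic: it keys on IP addresses as the text does, although strictly a transparent Ethernet bridge learns and filters on hardware (MAC) addresses, and forwarding by IP address is the job of a router. The class, method names and addresses are hypothetical.

```python
from dataclasses import dataclass, field

BROADCAST = "255.255.255.255"  # IPv4 limited-broadcast address


@dataclass
class FilteringBridge:
    """Hypothetical two-port bridge joining LAN segment "A" to segment "B"."""
    # Address -> segment the device has been seen on.
    table: dict[str, str] = field(default_factory=dict)

    def learn(self, address: str, segment: str) -> None:
        self.table[address] = segment

    def should_forward(self, src: str, dst: str, arrived_on: str) -> bool:
        """True if a packet from src to dst, arriving on one side, must cross the bridge."""
        self.learn(src, arrived_on)  # the sender is evidently on this side
        if dst == BROADCAST:
            return True  # broadcasts are delivered to every device
        seen_on = self.table.get(dst)
        if seen_on is None:
            return True  # unknown destination: pass it on rather than lose it
        return seen_on != arrived_on  # filter out purely local traffic


bridge = FilteringBridge()
bridge.learn("10.0.1.7", "B")  # e.g. a file server on the far side
bridge.learn("10.0.0.9", "A")  # e.g. a plotter on the near side
print(bridge.should_forward("10.0.0.5", "10.0.1.7", "A"))  # True: crosses the link
print(bridge.should_forward("10.0.0.5", "10.0.0.9", "A"))  # False: stays local
```

Only traffic for the far side, broadcasts and unknown destinations cross the bridge, which is what keeps local chatter off a slow wide area link.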

Gateways are special devices for transferring data between two different networks which adhere to different network protocols. As such they actually have to restructure the data packets which pass through them and are therefore inherently slower than bridges.

Wide area physical communication links between sites are nearly always slower than the local area networks they connect together, hence bridge or gateway devices are needed to prevent unwanted local area traffic from escaping to and causing congestion on the physical wide area link. Bridges supplied by Gandalf and other vendors for this purpose incorporate a number of intelligent features to enhance their performance.

Data Compression

Data compression algorithms are used to compress transferred data so as to achieve an actual throughput which exceeds the quoted bandwidth of the physical wide area communication link. The degree of compression depends on the extent to which the data is already compressed. For example, tests at Grampian Regional Council indicate that their Gandalf "LANLine" bridges operating over 64Kbps Kilostream are able to compress raw NTF files by ratios in excess of 3:1 and already compressed TIFF files by ratios of around 2:1, thus achieving effective throughput in excess of 192Kbps for raw NTF and around 128Kbps for TIFF. Even higher compression ratios, of up to 8:1, can be achieved with these devices.

[ Figure 3 not available ]

Transparent Automatic Dial Up

Bridges built specifically for connecting local area networks to "dial up" links such as ISDN embody an "automatic dial up" facility whereby (for UNIX networking) the bridge is configured with a table which maps different network IP addresses to the phone numbers to which they are connected. Thus packets emanating from a "departure" site will cause their interconnecting bridge to automatically dial up the phone number of the "destination" site.
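The dial table can be pictured as a mapping from remote networks to phone numbers, consulted for each outgoing packet. A minimal sketch using Python's standard `ipaddress` module; the networks and phone numbers are placeholders, not Grampian's real configuration, and the function name is hypothetical.

```python
import ipaddress
from typing import Optional

# Hypothetical dial table for a head-office bridge: remote network -> ISDN number.
DIAL_TABLE = {
    ipaddress.ip_network("192.168.10.0/24"): "0000 000001",
    ipaddress.ip_network("192.168.20.0/24"): "0000 000002",
}


def number_to_dial(destination: str, table=DIAL_TABLE) -> Optional[str]:
    """Phone number the bridge must call to deliver a packet, or None if no call is needed."""
    addr = ipaddress.ip_address(destination)
    for network, number in table.items():
        if addr in network:
            return number
    return None  # not a remote network in the table: nothing to dial


print(number_to_dial("192.168.20.7"))  # 0000 000002
print(number_to_dial("10.1.1.1"))      # None
```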

ISDN bridges will normally also have a configurable "time out" period which specifies how long an ISDN connection should remain open after a packet has been transmitted. For example, if the time out were set to 30 seconds, the connection would close whenever there was a break of 30 seconds between transmitted packets. Given that an ISDN connection can be dialled up in as little as 5 seconds, it is quite feasible to make several very short connections during the course of the working day and only incur a relatively small phone bill.
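How the time-out turns a day's traffic into a handful of short calls can be sketched as follows. The function is hypothetical and assumes that a call opens at the first packet, stays up while packets keep arriving, and closes once the line has been idle for the full time-out period; the packet times are invented.

```python
def isdn_calls(packet_times: list[float], timeout: float = 30.0) -> tuple[int, float]:
    """Return (number of dial-ups, total seconds connected) for given packet send times."""
    calls: list[list[float]] = []  # each call is [first packet, last packet]
    for t in sorted(packet_times):
        if calls and t - calls[-1][1] < timeout:
            calls[-1][1] = t  # line still open: extend the current call
        else:
            calls.append([t, t])  # idle for >= timeout, so the bridge dials again
    # Each call stays up until `timeout` seconds after its last packet.
    connected = sum(last - first + timeout for first, last in calls)
    return len(calls), connected


# Hypothetical bursts of GIS traffic (seconds since the start of the morning).
print(isdn_calls([0, 10, 20, 100, 105, 400]))  # (3, 115.0)
```

With the five-second dial-up quoted above, those three calls would add about 15 seconds of dialling, and the line is otherwise idle and incurring no call charges.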

Automatic dial up and subsequent timed out disconnection is totally transparent to the user thus the ISDN bridge provides a virtual permanent connection.

Bandwidth On Demand

ISDN2 incorporates two individual 64Kbps channels which can either be used in parallel to achieve an effective 128Kbps bandwidth (with compression, actual throughput will be even higher) or separately to send data to two different destinations at the same time. Similarly, Kilostream can be installed in multiples of 64Kbps channels and used in much the same way.

"Bandwidth on demand" characteristics of local to wide area bridges enable individual ISDN and Kilostream channels to be automatically opened and closed to different destinations according to actual traffic volumes. With the Gandalf "LANLine" bridges it is also possible to mix and match Kilostream and ISDN together such that an ISDN connection can be opened when a single permanent Kilostream channel becomes overloaded.

Virtual Extended Local Area Networks

The net effect of state-of-the-art intelligent bridging used in conjunction with digital wide area communication links such as ISDN and Kilostream is to create a virtual extended local area network. In a UNIX environment, client workstations located at one site can access server devices at another site several kilometres away as if both devices were connected to the same local area network, albeit with degraded performance if the volume of data being transferred between sites exceeds the available bandwidth of the physical wide area link.

This not only permits remote offices to access main office data, but also allows them to send output to peripheral devices located in the main office, such as expensive large format electrostatic plotters.

Database Version Management

In order to understand how Persistent Cache is being used to provide Grampian Regional Council's "wide area connection", it is first necessary to give a brief explanation of their implementation of Smallworld's version managed database.

Smallworld Version Management permits several versions of the database to exist simultaneously. In Grampian's case these versions are organised hierarchically, as illustrated by figure 2. There is a single definitive "top alternative", which is normally never written to directly. Each department is then provided with its own version of the "top alternative", which again is normally never written to directly; instead, all users who are required to write to the database are each provided with their own "personal writable alternative".

For routine data capture work, users are usually asked to update their departmental alternative on a daily basis by "posting up" their own personal alternative to it. This has to be preceded by a "merge down" of all changes which have already been posted to their departmental alternative. Once all personal versions have been "merged and posted", a departmental administrator ensures that the department's own alternative is in turn "merged and posted" to the "top" definitive alternative, thereby inheriting changes and updates made by other departments.

Grampian Regional Council's GIS Unit is responsible for maintaining the Ordnance Survey map base and other shared corporate datasets, such as a number of different gazetteers. Within the alternative structure the GIS Unit is treated as another department, so departments, and in turn end users, have their map base maintained for them by virtue of the "post and merge" procedures.
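The merge-then-post discipline described in the last two paragraphs can be illustrated with a toy model in which each alternative holds a snapshot of its parent (taken at the last merge) plus its own local changes. This is an illustrative sketch only, not Smallworld's API: the `Alternative` class and its method names are hypothetical, deletions are not modelled, and conflicts are resolved simply by letting local changes win.

```python
class Alternative:
    """Toy model of a version managed alternative (not Smallworld's API)."""

    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent  # None for the top alternative
        self.base = dict(parent.view()) if parent else {}  # parent snapshot
        self.changes = {}  # records written here since the last post

    def view(self):
        # Records visible here: the parent snapshot overlaid by local changes.
        return {**self.base, **self.changes}

    def write(self, key, value):
        self.changes[key] = value

    def merge(self):
        """'Merge down': pick up everything posted to the parent since last merge."""
        self.base = dict(self.parent.view())

    def post(self):
        """'Post up': push local changes into the parent. Must merge first."""
        if self.base != self.parent.view():
            raise RuntimeError(f"{self.name} must merge before posting")
        self.parent.changes.update(self.changes)
        self.changes = {}
        self.base = dict(self.parent.view())
```

Because `post` refuses to run until an alternative has merged, a map base change posted by the GIS Unit to the top alternative reaches a personal alternative only after the departmental and personal alternatives have each merged down, mirroring the procedure described above.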

Within the UNIX file system, the GIS database is held in a set of files storing different types of data (e.g. geometrical points, lines, areas and associated attributes). Database alternatives can be created either to reside entirely within a file set held in a single directory, or to reside in a separate subdirectory with the same file structure. The UNIX file system can therefore, if desired, be configured to mirror the database alternative structure totally or partially (figure 3). This in turn implies that different alternative versions of the database can be stored on different storage devices on the same network.

Persistent Cache

Smallworld's Persistent Cache software (2) enables all or a subset of a GIS database to be cached to a local disc attached to a client workstation, which is in turn configured to act as a local cache server both to itself and to other clients. Because it maintains a copy of frequently accessed data in the local cache, Persistent Cache provides an elegant and transparent way of giving large systems high performance over low speed communication links.

In figure 4, workstation A is a local cache server located at a remote site along with client workstation B. GIS read transactions generated by workstations A and B look first to the local cache to retrieve data. If the requested data has not been cached it is retrieved from the main file server via the wide area connection and then cached.

The local cache has a configurable operating capacity; once this capacity has been filled, old cached data is deleted from the cache on a "least recently used" basis. The cache capacity can be set to be large or small depending upon the size of the required database subset. If need be (local disc space permitting), it could be set large enough to replicate the original database.
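The read path and eviction policy described in the last two paragraphs amount to a read-through cache with least-recently-used replacement. The sketch below assumes capacity is counted in items rather than disc space, and uses a caller-supplied `fetch_remote` function as a stand-in for retrieval from the main file server over the wide area connection; `PersistentCacheSketch` is a hypothetical name, not part of Smallworld.

```python
from collections import OrderedDict


class PersistentCacheSketch:
    """Read-through cache with least-recently-used eviction."""

    def __init__(self, fetch_remote, capacity):
        self.fetch_remote = fetch_remote  # slow path: haul over the WAN
        self.capacity = capacity          # maximum number of cached items
        self.entries = OrderedDict()      # oldest use first, newest last
        self.remote_fetches = 0

    def read(self, key):
        if key in self.entries:
            self.entries.move_to_end(key)  # mark as most recently used
            return self.entries[key]
        value = self.fetch_remote(key)     # not cached: fetch from main server
        self.remote_fetches += 1
        self.entries[key] = value
        if len(self.entries) > self.capacity:
            self.entries.popitem(last=False)  # evict least recently used
        return value
```

Reading "a", "b", "a" and then "c" through a two-item cache evicts "b" rather than "a", because "a" was used more recently.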

When using Persistent Cache, remote site users are able to retrieve cached data very quickly and uncached data at the speed of the wide area connection. Hence, if only a subset of the main database is cached, read transactions may occasionally appear to slow down suddenly as data is retrieved over the wide area connection.
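The effect can be quantified with a simple weighted average: if a fraction h of reads is served from the cache, the mean read time is h * t_cache + (1 - h) * t_wan. The sketch below assumes a raw 64Kbps line rate with no protocol overhead or compression, and the figures used with it are hypothetical, not measurements from Grampian.

```python
def transfer_seconds(num_bytes, line_bps=64_000):
    """Time to move num_bytes at the raw line rate (no overhead, no compression)."""
    return num_bytes * 8 / line_bps


def expected_read_seconds(hit_ratio, cache_seconds, wan_seconds):
    """Mean read time when a fraction hit_ratio of reads is served from cache."""
    if not 0.0 <= hit_ratio <= 1.0:
        raise ValueError("hit_ratio must be between 0 and 1")
    return hit_ratio * cache_seconds + (1.0 - hit_ratio) * wan_seconds
```

For example, 400,000 bytes hauled over a 64Kbps line take 50 seconds at the raw line rate, so even with a high hit ratio the occasional uncached read is far slower than the average suggests.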

Write transactions go directly to the user's alternative every time a database record is inserted, updated or deleted, and the changed data is then copied back to the local cache.

At appropriate intervals, remote site users initiate merging and posting of their changed data with higher order alternative versions. The merge and post processes run on whichever machine holds the alternatives concerned. The local cache is updated wherever newly "merged down" change data falls within a geographical area that is already held in the cache.

Because alternative versions of the database can be mapped onto different UNIX directories (see figures 2 and 3), users' alternatives can be held either on the main server back at headquarters or locally at the remote site. This gives organisations a high degree of flexibility in how they operate over wide area connections.

Holding Remote Site Alternatives on Main Server

If users' alternatives are located at headquarters then all write data is passed over the wide area connection whenever a database record is inserted or updated. In a data capture environment this implies that relatively small amounts of data are passed frequently over the wide area connection.

Database commits and alternative version posting are processed back on the main server, so no data is passed over the wide area connection. Similarly, the merge process (merging down of changed data from higher order alternatives) is also undertaken on the main server; however, the amount of changed data passed back across the wide area connection will depend upon how much of the merged down change data maps onto currently cached "geography". Because all remote site alternative change data is held on the main server, remote site users do not need to be concerned with data backup and other routine system administration tasks, which can all be undertaken back at headquarters.

[ Figure (diagram) not available ]

Holding Remote Site Alternatives Locally

If users' alternative change data is held locally, no write data is passed over the wide area connection until the locally held alternative versions are merged with, and posted to, higher order versions located back on the main file server. If posting and merging are undertaken daily, this implies a daily transfer of a larger volume of change data over the wide area connection.

The volume of changed data merged back down to the locally held alternatives is entirely dependent upon the amount of data which has recently been posted to the top (definitive) version of the database by other users. This could be considerable if, say, a new batch of Ordnance Survey maps had recently been loaded.

Populating the Local Cache

The local cache is essentially an extended reflection of the data which a local client workstation holds in memory. It is therefore composed of a subset of object class layers for "blocks" of geographical extent. For example Grampian's Water Service divisional offices cache background map and water supply object class layers for all or part of their divisional areas of operation.

Upon initial creation the local cache is "empty" and must be populated. Users can be left to do this during the course of normal use: upon first access, all data is "hauled" over the wide area connection and then cached. This could be a little tedious if two or more users at the local site are simultaneously hauling data over a 64Kbps line. Instead, they could arrange to "zoom out" to a large extent of geography as they leave for home, so that the area in which they wish to work the following day has been cached by the time they return the next morning.
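Pre-populating a cache for a working area amounts to enumerating the (layer, block) pairs that cover it. The sketch below assumes square blocks on a fixed grid measured in map units and uses hypothetical layer names; the paper does not describe how Smallworld actually divides geography into cache blocks.

```python
import math


def blocks_covering(xmin, ymin, xmax, ymax, block_size):
    """(column, row) indices of grid blocks that overlap a rectangular extent."""
    cols = range(math.floor(xmin / block_size), math.ceil(xmax / block_size))
    rows = range(math.floor(ymin / block_size), math.ceil(ymax / block_size))
    return [(c, r) for c in cols for r in rows]


def prepopulation_keys(layers, extent, block_size):
    """Cache keys (layer, block) to haul before users start work in an area."""
    return [(layer, block)
            for layer in layers
            for block in blocks_covering(*extent, block_size)]
```

Zooming out over a large extent in the evening is, in these terms, a way of requesting every key for that extent so that the cache is warm by morning.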

Alternatively initial cache data can be written to tape by staff back at headquarters and then copied into the local cache in order to "kick start" it.

Grampian Regional Council's Wide Area Connection

Kilostream v. ISDN2

Although the Council already had some operational wide area communication links, it was decided that the Corporate GIS would have its own dedicated links because of the difficulty of extending heavily subscribed existing facilities to sites where GIS was required. The lowest cost option able to provide acceptable performance was therefore sought. This turned out to be a choice between ISDN2 and single channel Kilostream. Capital installation costs were very similar for both (approximately £2,500 per site); however, the ongoing running costs of ISDN2 varied considerably according to the degree of use.

For total daily connection times of less than about four hours per working day, ISDN2 is cheaper to operate than fixed fee Kilostream, as illustrated below for a notional 247 working days per year at current British Telecom day rate call charges:

[ Figure (cost notes) not available ]

The above costings indicate that the most cost effective option depends upon the nature of GIS use at the remote site. If there is a low level of write transactions at a site where a significant proportion of the database is held in the local cache, then ISDN2 provides a very flexible and potentially inexpensive wide area link. However, if there is a high level of regular write transactions or considerable regular "hauling" of uncached data throughout the working day, then Kilostream is likely to be the more viable option.
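The break-even point in these costings can be derived directly: ISDN2 is cheaper while its annual line rental plus call charges stay below the fixed Kilostream rental, so the break-even daily connection time is (Kilostream rental - ISDN2 rental) / (working days * 60 * call charge per minute). All tariff parameters below are placeholders to be filled in from current price lists; no actual British Telecom charges are assumed, and only one ISDN channel is assumed to be in use.

```python
def break_even_hours(kilostream_annual, isdn_line_annual, call_charge_per_minute,
                     working_days=247):
    """Daily connection hours at which ISDN2 and Kilostream cost the same."""
    return (kilostream_annual - isdn_line_annual) / (
        working_days * 60 * call_charge_per_minute)


def cheaper_option(hours_per_day, kilostream_annual, isdn_line_annual,
                   call_charge_per_minute, working_days=247):
    """Compare annual costs for a given average daily connection time."""
    isdn_annual = isdn_line_annual + (
        hours_per_day * 60 * call_charge_per_minute * working_days)
    return "ISDN2" if isdn_annual < kilostream_annual else "Kilostream"
```

With placeholder tariffs chosen so that the break-even falls at four hours, a site connected for three hours a day comes out cheaper on ISDN2 and one connected for five hours comes out cheaper on Kilostream, which is the pattern the costings describe.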

Because the first two Water Service offices to be connected were known to be "heavy" GIS users (they had previously been using GIS in a standalone capacity), and because there still appeared to be technical problems handling broadcast messages over ISDN, it was decided to adopt Kilostream for the first wide area connections.

Experience to date

Initial use indicates that the successful operation of the wide area links depends more upon operational management than upon technical factors. The two remote sites connected to date comprise two locally networked GIS workstations currently used for data capture work. By its very nature, data capture work does not involve frequent extended panning across the map base, so, with a relatively large capacity cache that was pre-populated prior to installation, "hauling" of uncached data has not been a problem.

Data transfer across the wide area connection behaves rather like traffic through a motorway contraflow: if there is very little traffic on the motorway then, ignoring speed limits, traffic flows virtually as quickly as if there were no contraflow. However, as the volume of traffic increases, the actual throughput speed decreases almost exponentially.
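The contraflow analogy matches the behaviour of a standard M/M/1 queueing model, used here as a textbook stand-in rather than anything measured in the paper. In that model the mean time a packet spends in the system is 1/(μ - λ), where μ is the link's service rate and λ the arrival rate, so delay relative to an idle link grows as 1/(1 - ρ) with utilisation ρ = λ/μ: it doubles at 50% load and is ten times higher at 90%. Strictly the growth is hyperbolic rather than exponential, but it shows the same sharp deterioration near saturation. The 1,024-byte packet size is an assumption.

```python
def mm1_time_in_system(arrival_rate, service_rate):
    """Mean time a packet spends queueing plus being sent (M/M/1 model)."""
    if arrival_rate >= service_rate:
        return float("inf")  # the queue grows without bound
    return 1.0 / (service_rate - arrival_rate)


# Assumed 1,024-byte packets on a 64Kbps line: packets the line can send per second.
SERVICE_RATE = 64_000 / (1_024 * 8)

for load in (0.1, 0.5, 0.9, 0.99):
    # Multiplying by the service rate expresses delay relative to an idle link.
    slowdown = mm1_time_in_system(load * SERVICE_RATE, SERVICE_RATE) * SERVICE_RATE
    print(f"utilisation {load:.0%}: {slowdown:.1f}x the idle-link time")
```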

1 km² of inner city water data takes only slightly longer to display when retrieved over the wide area connection than when retrieved straight from the cache. However, 1 km² of inner city water data plus all OS Landline data takes significantly longer to display.

Grampian's two Water Service offices have been configured so that local users' alternative versions are stored back on the main server; consequently, data is passed over the Kilostream every time a record is inserted or updated. Users have noticed a degradation in write transaction time when they both write simultaneously. The degree of degradation is acceptable, but it does indicate that sites with a number of writing users may need either to store their alternative versions locally or to be provided with additional communication channels over the wide area link.

The conclusion to date is that the nature of GIS usage needs to be understood in order to specify and configure a wide area connection for optimum performance.

[ Figure 4 not available ]

What Of The Future

Grampian Regional Council believes that it has been able to implement wide area networked GIS at realistic cost using technology which is available today. It has been proven that a single channel Kilostream link operating at 64Kbps is adequate for the scale of present implementation. Furthermore this has been achieved with a great deal of "behind the scenes" activity which is totally transparent to the user.

The computer press makes great play of cheap, high speed local and wide area ATM (Asynchronous Transfer Mode) networks being the way of the future (3); however, the technology is not yet available, and until it is, it is difficult to see how GIS data can viably be transferred between different systems in anything like real time.

In the longer term the Council is keen to reduce the cost of providing wide area connections to more marginal GIS users by using ISDN2 instead of Kilostream. It is also keen to exploit the potential for transfer of data between different organisations using ISDN. The cost of operating ISDN2 between locations over 35 miles apart is the same no matter whether they are 36 or 500 miles apart. Unlike "fixed" Kilostream links, ISDN connections can be made between any two locations which can dial to one another.

Persistent Cache has also been seen as a way of relieving congestion on heavily used local area networks. The Council is currently planning a six-seat GIS subnetwork in its headquarters, which will use Persistent Cache to reduce the volume of GIS data carried over the building's main backbone LAN.

Acknowledgements

The authors wish to thank British Telecom, Gandalf Digital Communications Limited, Grampian Regional Council and Smallworld Systems Limited for their support and assistance in compiling this paper. Particular thanks go to Alistair Reid, Andrew Swanson and George Wallace of Grampian Regional Council for their part in installing wide area connection components and Andrew Reid of Gandalf for his enthusiastic support, also to the staff of the Department of Water Services for acting as "test drivers".

References

1. Southerton, A. Modern UNIX, Chapter 4. Wiley, 1992.

2. Newell, R.G. and Batty, P.M. GIS databases are different. Proceedings of the AGI 93 Conference, Part 3.

3. ATM is the wave of the future. UNIX News, No. 56, October 1993, pp. 63-65.

Copyright © 1996 Smallworld Systems, Inc. All rights reserved

Smallworld Technical Paper No. 12 - AM/FM Data Modelling For Utilities

by Peter M. Batty

Abstract

Data modelling for AM/FM is more complex than for traditional applications for a variety of reasons, in particular because of the need to model spatial and topological relationships. This paper examines the general differences between data modelling for AM/FM and other applications, and looks at a range of common modelling issues in utility applications, including the efficient handling of various types of tracing and network analysis, outage management, and generation and maintenance of schematics.

Introduction

There are some significant differences between data modelling for traditional applications and GIS or AM/FM, which arise from the need to model spatial and topological relationships. Utility applications have some additional complications compared to many GIS applications, due to the need to model complex networks, and these complications are not well handled by traditional GIS data models (the term GIS data model is used in two different contexts in this paper: one is the system data model used by the GIS vendor to model spatial and topological aspects of data stored in the system, and the other is the user-specific data model developed on top of the core system for a specific application).

This paper starts by discussing the general differences between data modelling for GIS and other applications, from a user application perspective. It goes on to look at a variety of modelling issues relevant to utility applications, primarily from a system data model point of view. In particular a number of non-traditional modelling techniques which offer benefits for utility applications are examined. A more detailed look is then taken at a fairly complex application which is common to most utilities, outage management, considering how the system data model features which have been discussed can be applied in practice, and what sort of trade-offs need to be considered in the design. Finally, the creation and maintenance of the user-specific data model is discussed, and in particular the use of CASE tools is examined.

Why Is GIS Data Modelling Different?

A traditional data modelling exercise goes through two major stages: the production of a logical model, and then a physical model. The logical model represents a model of the real world and is completely independent of how the system is physically implemented. The physical model represents the actual data structures (typically tables) which are used to implement the logical model. By far the most common way of representing a data model is using some form of entity-relationship diagram in which the objects (or entities) which need to be modelled are displayed as boxes, and the relationships between them are represented by lines.

Dangers in Trying to Represent Spatial Relationships

A data model for a traditional application explicitly records all relationships between entities which may be of interest to the application. This provides a big trap for the unwary person producing a GIS logical data model, because in a typical GIS application there are many relationships which will be derived implicitly from the spatial characteristics of objects. For example, in a utility application you might want to know which facilities are in a given work area, which are in a given tax area, which cables are underneath roads, and so on. In a traditional logical model this would result in defining relationships between work area and all facilities, between tax area and all facilities, between cable and road, and so on. It is possible to very quickly end up with a logical model in which almost everything is related to everything, like the following:

[ Figure not available ]

It should be fairly obvious that this is not a very helpful model, although the author has seen a number of logical models like this. The second variant of the not terribly useful logical model is the following, which again the author has seen in real situations:

[ Figure not available ]

In this case the designer has realised that the previous logical model is not useful, and has tried to show that objects are related by their location. While this model represents a little better what we are trying to model, it is still not very helpful. Firstly, objects are in the majority of cases not related because they have exactly the same location, but because their locations have some more complex relationship between them, such as one crosses the other, one is inside the other, or one is within 100 feet of the other. This could in theory be modelled by defining suitable relationships between locations, but in practice it is totally impractical to maintain explicit relationships between all locations. Secondly, since all objects are shown as having a relationship to location on this diagram, this again clutters things up (if non-spatial relationships were also shown on this diagram it would be very confusing). If we understand that any object may have a spatial component to it, and that through this it can be related to any other object with a spatial component, then it is generally clearer not to try to show this on our logical model.
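To illustrate deriving a spatial relationship at query time instead of storing it, the sketch below answers "which facilities are in this work area?" with a point-in-polygon test. It uses the standard even-odd (ray casting) rule; the facility names and coordinates are hypothetical, and a production GIS would use its own spatial query functions backed by a spatial index.

```python
def point_in_polygon(x, y, polygon):
    """Even-odd rule: count crossings of a ray running from (x, y) towards +x."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        if (y1 > y) != (y2 > y):  # this edge straddles the ray's height
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def facilities_in_area(facilities, area):
    """Derive 'facility is in work area' from geometry rather than a stored link."""
    return [name for name, (x, y) in facilities.items()
            if point_in_polygon(x, y, area)]
```

Because the relationship is computed from the geometry whenever it is needed, nothing has to be maintained when a work area boundary is redrawn, which is why such relationships can be left off the logical model.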

Although the previous two models were somewhat simplified examples, as a general rule it makes sense to omit spatial relationships from a logical model. This is not an absolute rule though. In particular there are some relationships which could be derived spatially but which could also be regarded as more explicit. A common example is a hierarchical aggregation of areas. For example, in a utility there is usually a hierarchy of organizational areas, such as local office areas, which are aggregated into districts, which are aggregated into regions (use of the terms district, division and region, and their position in the hierarchy, seems to vary from one utility to another). Since a district is always the aggregation of a number of local office areas, this is quite a strong relationship and it may be useful to show this on the logical model (whether this relationship is modelled explicitly in the GIS is a separate implementation decision which will be discussed later).

Representation of Topological Relationships

Similar issues apply to the representation of topological relationships: which objects can be connected to which other objects? As with spatial relationships, topological relationships can be quite extensive, especially in a utility application, and showing them explicitly may clutter up the data model diagram. However, there is a better case for explicitly showing topological relationships than spatial relationships. The main reason is that in a utility application there are usually definite rules associated with topological relationships; for example, a cable can be connected to a transformer but not to a gas main. This is not generally the case with spatial relationships; for example, you would typically not prohibit a cable from crossing a gas main (there may be some such spatial constraints, but they are normally far fewer, and too complex to represent on a data model diagram).

It is very important to make sure that topological relationships (or connectivity rules) are documented in some form, but this does not necessarily need to be done by showing them on the data model diagram. There are various other options, such as creating a separate entity-relationship type diagram just showing topological relationships, using a matrix or table to show which pairs of objects can be connected, or just listing the valid connections for an object with the rest of the documentation about that object.
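One lightweight way to record and enforce high level connectivity rules is as a table of permitted type pairs, which is equivalent to the matrix option mentioned above and can be printed in that form for documentation. The object types and pairs below are hypothetical examples, not a complete or authoritative rule set.

```python
# Hypothetical high level connectivity rules: unordered pairs of object
# types that may be connected (a single-type set means same-type joins).
VALID_CONNECTIONS = {
    frozenset({"cable", "transformer"}),
    frozenset({"cable", "switch"}),
    frozenset({"cable"}),               # cable to cable
    frozenset({"gas_main"}),            # gas main to gas main
    frozenset({"gas_main", "valve"}),
}


def may_connect(type_a, type_b):
    """True if the table permits connecting these two object types."""
    return frozenset({type_a, type_b}) in VALID_CONNECTIONS


def print_matrix(types):
    """Print the rules as the kind of matrix that could go in documentation."""
    width = max(len(t) for t in types) + 2
    print(" " * width + "".join(t.ljust(width) for t in types))
    for a in types:
        marks = "".join(("yes" if may_connect(a, b) else "-").ljust(width) for b in types)
        print(a.ljust(width) + marks)
```

Because the pairs are unordered, one entry covers both "cable to transformer" and "transformer to cable", so the printed matrix is symmetric.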

A diagrammatic or tabular representation of connectivity rules can generally only convey high level information about the rules. Rules can be quite complex, so it is often necessary to include some additional description in addition to a diagram. For example, a rule might be that it is only valid to connect two primary electrical conductors if the phases of one are a subset of the other - so a cable with phase BC could be connected to a cable with phase B, but not to a cable with phase AB. Another rule might be that it is not possible to connect two gas mains with different diameters unless they have a suitable fitting in between them.

Some people might debate whether defining these sorts of complex rules is part of the data model or part of the application. There is a strong argument for including them as part of the data model, since enforcing them is fundamental to the integrity of the database. Also, when using an object-oriented approach to design and development, which is widely recognized as being of benefit in complex applications like GIS, the behaviour of an object is an important part of its definition; when taking this approach, connectivity rules should definitely be regarded as part of the data model.
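Taking the object-oriented view, the more detailed rules can live on the objects themselves as a `can_connect_to` method. The sketch below encodes the two example rules given above: the phase subset rule for primary conductors, and the rule that gas mains of different diameters need a suitable fitting between them. The class names, the `Reducer` fitting and the method name are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PrimaryConductor:
    phases: frozenset  # e.g. frozenset("BC") for a conductor on phases B and C

    def can_connect_to(self, other):
        # Valid only if one conductor's phases are a subset of the other's.
        if not isinstance(other, PrimaryConductor):
            return False
        return self.phases <= other.phases or other.phases <= self.phases


@dataclass
class Reducer:
    """Hypothetical fitting joining mains of two different diameters."""
    end_a_mm: int
    end_b_mm: int


@dataclass
class GasMain:
    diameter_mm: int

    def can_connect_to(self, other, fitting: Optional[Reducer] = None):
        if not isinstance(other, GasMain):
            return False
        if self.diameter_mm == other.diameter_mm:
            return True
        # Different diameters need a fitting whose two ends match both mains.
        return fitting is not None and {fitting.end_a_mm, fitting.end_b_mm} == {
            self.diameter_mm, other.diameter_mm}
```

As in the text, a BC conductor may be joined to a B conductor but not to an AB conductor, and a 150mm main may be joined to a 100mm main only through a matching reducer.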

In summary, it is difficult to give a hard and fast rule for how to document connectivity rules. In many cases it may be appropriate to display the high level rules on the data model diagram (ignoring complex constraints), if this can be done without detracting from the clarity of the diagram. Whether this is practical depends on the number of topological relationships and the number of explicit relationships which need to be shown on the diagram. In any case, it is likely that additional documentation will be required for each object to explain more complex aspects of the rules.

Constraints Imposed by the System Data Model

We have already mentioned that we are really discussing two separate types of data model in this paper: the application-specific or user data model, which has been the main topic of discussion so far, and the system data model which is provided by the GIS vendor.

It is the system data model which handles fundamental issues such as how spatial data and topological relationships are modelled. When implementing a GIS, you typically cannot change the system data model. This puts constraints on the way that you do data modelling, which at times may hinder you but at other times (hopefully most of the time!) should help you. It helps you because the GIS vendor has implemented a model and functionality which handles the most complex fundamental aspects of the system. It also gives you a frame of reference in which to think about how to specify spatial and topological relationships, which, as we have already seen, can be difficult to do if you start with a blank sheet of paper. For example, most systems use some variant of the traditional GIS model in which an object can be a point, line or area. As you define the objects in your data model, you categorise each one as a point, line or area, which makes the modelling process much simpler than if you did not have this framework to work within.

However, while this simplification generally helps you, there is a danger that the system data model may not be sufficiently rich to handle the complexity of what you want to model. The experience of this author is that the traditional point-line-area model has a number of shortcomings, especially for modelling utility networks. The next section of this paper looks at various aspects of the system data model, and in particular looks at some examples of non-traditional modelling approaches which offer benefits for utility applications.

System Data Model Issues

This section considers various aspects of the system data model provided by the GIS vendor which are important in being able to model utility networks effectively.

Sheet-based Versus Continuous Models

At a fundamental level, it is very important that the database is seamless, so that an object like a cable is always stored as a single object regardless of how long it is, and it does not have to be split into multiple objects because it crosses arbitrary map sheet or tile boundaries. This significantly simplifies application development and data maintenance.

Most systems now provide this capability at a basic level. However, it is important to consider how you can access these objects as a user or application developer. Because of the very large data volumes in typical GIS databases, most systems only allow you to work on a subset of the database at one time for analysis or update purposes. Being able to access the whole database at once without constraints offers significant advantages for many applications, such as network analysis and outage management.

Providing this seamless access to a database which may be tens or even hundreds of gigabytes in size, and obtaining good performance, is obviously not a simple task. There are two key issues in achieving this. The first is a good spatial indexing mechanism. The most common approach is to use some form of quadtree index. There is extensive literature on this topic - for example see Samet, 1990. The second key issue is the underlying DBMS architecture. The server-oriented architecture used by standard commercial DBMSs like Oracle, Ingres and Sybase is fundamentally unsuitable for providing the required performance in a networked environment with many users. A client-oriented DBMS architecture can provide an order of magnitude better performance for this sort of application. This DBMS architecture is actually far more important in achieving good performance in multi-user networked environments than the spatial indexing mechanism used, but it has received far less attention in the literature. See Newell and Batty, 1993, for more details on this topic.
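As a concrete illustration of the quadtree idea, the sketch below implements a simple point region quadtree: each node covers a rectangle and splits into four quadrants when it holds more than `capacity` points, and range queries skip any node whose rectangle misses the query box. It is a generic textbook structure (Samet, 1990, surveys the many variants), not the index used by any particular GIS; the class name and parameters are assumptions, and `max_depth` simply stops endless splitting when many points share one location.

```python
class QuadTree:
    """Point region quadtree supporting rectangular range queries."""

    def __init__(self, xmin, ymin, xmax, ymax, capacity=4, depth=0, max_depth=16):
        self.bounds = (xmin, ymin, xmax, ymax)
        self.capacity = capacity
        self.depth = depth
        self.max_depth = max_depth
        self.points = []       # (x, y, item) tuples held by a leaf node
        self.children = None   # four child quadrants once split

    def _contains(self, x, y):
        xmin, ymin, xmax, ymax = self.bounds
        return xmin <= x < xmax and ymin <= y < ymax

    def insert(self, x, y, item):
        """Add a point; returns False if it lies outside this node."""
        if not self._contains(x, y):
            return False
        if self.children is not None:
            return any(child.insert(x, y, item) for child in self.children)
        self.points.append((x, y, item))
        if len(self.points) > self.capacity and self.depth < self.max_depth:
            self._split()
        return True

    def _split(self):
        xmin, ymin, xmax, ymax = self.bounds
        xmid, ymid = (xmin + xmax) / 2, (ymin + ymax) / 2
        args = (self.capacity, self.depth + 1, self.max_depth)
        self.children = [
            QuadTree(xmin, ymin, xmid, ymid, *args),
            QuadTree(xmid, ymin, xmax, ymid, *args),
            QuadTree(xmin, ymid, xmid, ymax, *args),
            QuadTree(xmid, ymid, xmax, ymax, *args),
        ]
        points, self.points = self.points, []
        for x, y, item in points:
            # Each point lies in exactly one quadrant; any() stops there.
            any(child.insert(x, y, item) for child in self.children)

    def query(self, qxmin, qymin, qxmax, qymax):
        """Items whose points lie inside the query rectangle (inclusive)."""
        xmin, ymin, xmax, ymax = self.bounds
        if qxmax < xmin or qxmin >= xmax or qymax < ymin or qymin >= ymax:
            return []  # query box cannot overlap this node
        found = [item for x, y, item in self.points
                 if qxmin <= x <= qxmax and qymin <= y <= qymax]
        for child in self.children or []:
            found.extend(child.query(qxmin, qymin, qxmax, qymax))
        return found
```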

In summary, the important things to look for in a system data model in this area are that it is seamless, that it allows unconstrained access to the whole database for update or analysis, and that it delivers good performance in a production environment.

Spatial Object Versus Spatial Attribute Model

As we mentioned earlier, most system data models are based on some variant of the point-line-area model, in which each object belongs to one of these categories. Some systems have extended this model, for example by adding a "control point feature" which is like a point, but has two connection nodes. This is suitable for modelling certain electrical objects like transformers and simple switches. However, the basic approach is still that each object has a (single) spatial type. Each object will then have a number of alpha-numeric attributes defined for it (size, material, equipment number, etc.).

A much more flexible model, especially for utility applications, can be obtained by looking at the spatial aspects of an object in a different way. Instead of insisting that an object has a single spatial type, we can allow an object to have multiple spatial attributes, each of which has a spatial type such as point, line or area. This simple step of moving spatial information from an object level to an attribute level gives lots of new modelling possibilities. We will look at some examples to illustrate this.

Depending on the application, you may wish to regard a road either as a line or as an area. If doing some kind of route analysis, you will be interested in tracing along its centreline. If looking at access to properties along the road, you will be interested in the right of way area associated with the road. With the traditional spatial object model you would need to model these two things as separate objects, and typically you would need to write some specific code to create and maintain the relationship between these two objects. With the spatial attribute model, you can simply give the road two spatial attributes: a centreline, which is linear, and a right of way, which is an area.
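The road example can be expressed directly as one object with two spatial attributes of different spatial types. The classes below are a hypothetical sketch of the spatial attribute model, with simple length and area calculations to show that each attribute keeps its own geometric behaviour while the road keeps a single identity.

```python
import math
from dataclasses import dataclass


@dataclass
class Line:
    coords: list  # [(x, y), ...] vertices in order

    def length(self):
        return sum(math.hypot(x2 - x1, y2 - y1)
                   for (x1, y1), (x2, y2) in zip(self.coords, self.coords[1:]))


@dataclass
class Area:
    ring: list  # boundary vertices [(x, y), ...], first vertex not repeated

    def area(self):
        # Shoelace formula over consecutive vertex pairs, wrapping round.
        pairs = zip(self.ring, self.ring[1:] + self.ring[:1])
        return abs(sum(x1 * y2 - x2 * y1 for (x1, y1), (x2, y2) in pairs)) / 2


@dataclass
class Road:
    """One road object carrying two spatial attributes of different types."""
    name: str
    centreline: Line    # used when tracing routes along the road
    right_of_way: Area  # used when checking access to adjacent properties
```

A route analysis would use `road.centreline` and a property access query would use `road.right_of_way`, with no separate objects or relationship to keep in step.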

A very common requirement in utility applications is to be able to display an object at a location which is offset from the location where it really exists. For example, many electric utilities display transformers offset from the cable to which they are attached. This can be very simply handled by the spatial attribute model: one spatial attribute can be used to store the actual location, and another spatial attribute can store the location where its picture is to be displayed. This applies to many situations, for example where multiple conductors are running through the same duct, and you may want to display each conductor offset by a different amount.
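Storing a separate display location still leaves the question of how to compute it. One simple approach, assumed here rather than taken from the paper, is to move the actual location perpendicular to the cable by a fixed distance; calling the same hypothetical helper with different distances gives each conductor in a shared duct its own offset.

```python
import math


def offset_display_point(p1, p2, t, distance):
    """Display location for an object attached to the segment p1 -> p2.

    The actual location is the point a fraction t along the segment; the
    display location is moved `distance` units perpendicular to the segment
    (positive values move it to the left of the p1 -> p2 direction).
    """
    (x1, y1), (x2, y2) = p1, p2
    dx, dy = x2 - x1, y2 - y1
    length = math.hypot(dx, dy)
    if length == 0:
        raise ValueError("segment has zero length")
    ax, ay = x1 + t * dx, y1 + t * dy       # actual location on the cable
    nx, ny = -dy / length, dx / length      # unit normal pointing left
    return (ax + distance * nx, ay + distance * ny)
```

The actual location (ax, ay) would go in one spatial attribute and the returned point in the other, so moving the cable and recomputing the offset never disturbs the record of where the transformer really is.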

Another area in which the spatial attribute model greatly simplifies modelling is in the handling of multiple representations of the same object. It is particularly common in utility applications for the same object to appear on multiple different types of map, including various schematics. Again this can be handled simply by defining multiple spatial attributes on an object: one which represents the actual location, and additional ones which represent the position of the object in each type of schematic representation. There are additional modelling techniques which can be useful in handling schematics, such as the use of multiple worlds: we will return to this subject later.

Basic Network Topology Modelling

The way in which network topology is modelled is obviously of fundamental importance to utility applications. With the traditional model, a linear object typically has two nodes, one at each end. Connected objects are defined as those sharing a node, so other objects can only be connected to the end of a linear object. If a new service line needs to be connected in the middle of a cable, for example, the cable must be split into two separate cables to model the connectivity correctly. This can lead to having to split something which is really a single object into many different objects, typically replicating the attributes on every instance, which causes problems in terms of data storage, data maintenance and performance. The following diagram shows the sort of situation we are talking about:

[ Figure (drawing) not available ]

These issues can be overcome by using a two level linear network model. With this approach, every high level linear object - we call this a chain - is made up of one or more (continuous) low level linear objects - we call these links. In the above drawing, the main (secondary) cable geometry would be a single chain consisting of nine links, and each service cable geometry would be a chain with just a single link. The links define the connectivity, but the chains are the spatial attributes associated with an object - so we can just have a single secondary cable object with one set of attributes, rather than having to create nine secondary cable objects. This approach obviously allows points to be connected in the middle of a chain too, for example a single primary conductor could have many transformer connection points along its length.
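The chain/link structure can be sketched in Python as follows, assuming a chain is simply an ordered list of node ids and each consecutive pair of nodes is one link. The names `Chain` and `build_node_index` are hypothetical. The point of the sketch is that the secondary cable stays one object with one set of attributes, while service cables still connect at its interior nodes.

```python
from dataclasses import dataclass
from typing import Dict, List, Set, Tuple


@dataclass
class Chain:
    """High level linear geometry: an ordered run of continuous links."""
    owner: str        # the single object this chain is a spatial attribute of
    nodes: List[str]  # node ids along the chain, in order

    def links(self) -> List[Tuple[str, str]]:
        # Consecutive node pairs are the low level links.
        return list(zip(self.nodes, self.nodes[1:]))


def build_node_index(chains: List[Chain]) -> Dict[str, Set[str]]:
    """Map node id -> owners whose chains pass through that node."""
    index: Dict[str, Set[str]] = {}
    for chain in chains:
        for node in chain.nodes:
            index.setdefault(node, set()).add(chain.owner)
    return index


# One secondary cable object whose chain has nine links (ten nodes).
secondary = Chain("secondary_cable", [f"n{i}" for i in range(10)])
# Each service cable is a single-link chain attached part way along the secondary.
services = [Chain("service_a", ["n3", "house_a"]), Chain("service_b", ["n6", "house_b"])]
index = build_node_index([secondary] + services)
```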

Modelling Complex Network Topology

Tracing through a linear network is a fundamental requirement for many utility applications. Common requirements for controlling a trace include stopping at specified objects, possibly qualified by attribute, for example stopping at all open valves or switches. Another important requirement for electrical networks in particular is being able to do directional tracing - upstream or downstream.
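A minimal sketch of an attribute-qualified stopping trace over a simple link-node network, written in Python. The network is reduced to node adjacency, and the switch names and the `switch_state` lookup are hypothetical. The sketch records a stopping object as reached but does not trace beyond it; directional (upstream or downstream) tracing would additionally need a directed adjacency, which is not shown.

```python
from collections import deque
from typing import Callable, Dict, List, Set


def trace(adjacency: Dict[str, List[str]], start: str,
          stop_at: Callable[[str], bool]) -> Set[str]:
    """Breadth-first trace from `start` that does not pass through stopping nodes."""
    reached = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node != start and stop_at(node):
            continue  # stopping object: it is reached, but the trace goes no further
        for nxt in adjacency.get(node, []):
            if nxt not in reached:
                reached.add(nxt)
                queue.append(nxt)
    return reached


# Hypothetical network: feeder -> sw1 (open) -> load1, feeder -> sw2 (closed) -> load2
adjacency = {
    "feeder": ["sw1", "sw2"],
    "sw1": ["feeder", "load1"],
    "sw2": ["feeder", "load2"],
    "load1": ["sw1"],
    "load2": ["sw2"],
}
switch_state = {"sw1": "open", "sw2": "closed"}
result = trace(adjacency, "feeder", lambda n: switch_state.get(n) == "open")
```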

For simple networks, these requirements can be met by the traditional linear network model consisting of links and nodes, where a link runs between two nodes, and two links are connected if they share a common node. However, utility networks often include objects whose connectivity cannot be easily modelled using this simple model, in particular various types of switching or control facilities. For example, consider the following diagram of a transfer switch:

[ Figure (diagram) not available ]

This switch has three connections, one input and two outputs. The switch can be in one of two positions. In position 1, shown in the drawing, current from the input goes out on output 1, and in position 2 it goes out on output 2.

There are several things which can help us produce an elegant solution to the problem of modelling these complex objects. The first is the spatial attribute model: we could model our transfer switch with three point geometries (spatial attributes), to represent the input connection and the two output connections. The second is that we need some way of defining the behaviour of this object within a trace - the trace needs to know that if it reaches the input connection point, then if the position attribute of the transfer switch is equal to "Position 1" then it should continue tracing from the output 1 connection point, and otherwise it should continue tracing from the output 2 connection point.

This is a situation where object-oriented programming is very useful. With conventional procedural programming, we would need to modify our tracing code each time we wanted to handle a special case object like this, which makes it impossible to create a general purpose trace routine, and causes support and maintenance problems. In an object-oriented system, we define the trace behaviour on each object, so that the general trace code does not need to be modified and new object behaviour can be introduced very easily. For example, our trace code could be set up to check whether any object it hit had a special method defined called trace_outputs, and if it did then it would call this method to get a list of nodes from which to continue the trace (a method is similar to a function in a procedural programming language - see Batty, 1993, for more information on object-oriented programming in GIS). The following is an example of what this method would look like for the transfer switch:

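    method transfer_switch.trace_outputs(trace_input)
        # Returns the connection points from which the trace should continue
        if trace_input = input_connection
        then
            # Entering on the input: leave by whichever output is selected
            if self.position = 1
            then
                return {output_connection1}
            else
                return {output_connection2}
            endif
        elif trace_input = output_connection1 and self.position = 1
        then
            # Tracing back through the selected output to the input
            return {input_connection}
        elif trace_input = output_connection2 and self.position = 2
        then
            return {input_connection}
        else
            # Arrived on the output that is not selected: the trace stops here
            return {}
        endif
    endmethod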

This method can contain any kind of programming logic, so the behaviour can be extremely sophisticated if necessary. This gives us an elegant way of handling the transfer switch.
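To show how a general trace can stay unchanged while deferring to object behaviour, here is a hedged Python rendering of the idea. The generic `general_trace` follows ordinary connectivity and, when a node belongs to an object that defines `trace_outputs`, also follows whatever that method returns. The connection point names and the dictionaries used to represent the network are assumptions made for illustration.

```python
from collections import deque


class TransferSwitch:
    """Python rendering of the transfer switch behaviour described above."""

    def __init__(self, position: int):
        self.position = position
        self.input_connection = "ts_in"
        self.output_connection1 = "ts_out1"
        self.output_connection2 = "ts_out2"

    def connection_points(self):
        return [self.input_connection, self.output_connection1, self.output_connection2]

    def trace_outputs(self, trace_input):
        if trace_input == self.input_connection:
            return [self.output_connection1 if self.position == 1 else self.output_connection2]
        if trace_input == self.output_connection1 and self.position == 1:
            return [self.input_connection]
        if trace_input == self.output_connection2 and self.position == 2:
            return [self.input_connection]
        return []  # unselected output: nowhere further to go


def general_trace(adjacency, owners, start):
    """Generic trace: never needs to know about transfer switches specifically."""
    reached = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        next_nodes = list(adjacency.get(node, []))
        owner = owners.get(node)
        if owner is not None and hasattr(owner, "trace_outputs"):
            next_nodes += owner.trace_outputs(node)  # object-defined behaviour
        for nxt in next_nodes:
            if nxt not in reached:
                reached.add(nxt)
                queue.append(nxt)
    return reached


switch = TransferSwitch(position=1)
adjacency = {
    "source": ["ts_in"], "ts_in": ["source"],
    "ts_out1": ["load1"], "load1": ["ts_out1"],
    "ts_out2": ["load2"], "load2": ["ts_out2"],
}
owners = {point: switch for point in switch.connection_points()}
```

Changing `switch.position` to 2 redirects the same trace to `load2` without touching `general_trace`.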

The transfer switch is a fairly simple example - we also need to model more complex devices such as the following switch cabinet (this particular example is an S&C model PMH9):

This is displayed as a single object on the map, but internally it contains four switches which can be operated independently. Each switch controls current on three phases (A, B and C). The left hand diagram shows all three phases combined, and the right hand diagram shows the phases separately. The two switches on the left hand side are group operated switches, which means that they are either open on all three phases or closed on all three phases, and the two switches on the right hand side are fuse switches, which can be independently open or closed on each phase. The trace behaviour we want needs to recognise the positions of all these switches and derive the correct output for a given input appropriately.
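The difference between the two kinds of switch in the cabinet can be sketched in Python, assuming each switch exposes a hypothetical `is_closed(phase)` query. A group operated switch answers the same way for every phase, whereas a fuse switch keeps a separate state per phase; a trace through the cabinet internals would apply such a check at each switch it meets.

```python
PHASES = ("A", "B", "C")


class GroupOperatedSwitch:
    """Open or closed on all three phases together (hypothetical class)."""

    def __init__(self, closed: bool):
        self.closed = closed

    def is_closed(self, phase: str) -> bool:
        return self.closed  # one state shared by all three phases


class FuseSwitch:
    """Each phase can be open or closed independently (hypothetical class)."""

    def __init__(self, closed_phases):
        self.closed_phases = set(closed_phases)

    def is_closed(self, phase: str) -> bool:
        return phase in self.closed_phases


def phases_passed(switch, incoming_phases):
    """Phases energised on the output of `switch`, given those on its input."""
    return {p for p in incoming_phases if switch.is_closed(p)}
```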

The behaviour of the PMH9 switch cabinet is significantly more complex than that of the transfer switch, so we really need something more to help us model it. For this we will introduce two new concepts: multiple worlds and hypernodes.

[ Figure (diagram) not available ]

A good approach to this problem is to model the internal structure of the switch cabinet as a separate set of GIS objects with their own attributes and topology - in this case we need to model busbars, fuse switches and group operated switches. We will lay these out in a schematic representation as in the diagram above (the simpler left hand representation is sufficient provided that we store separate attributes for the switch position on each phase, and that the tracing function used can stop based on complex predicates involving these attributes). An issue we need to resolve is where to place these objects - they provide more detail than we really want in our main geographic database. This is one area where the concept of multiple worlds is useful. A world is an independent coordinate system within the same database. Many different worlds can be created, and it is possible for a world to be related to an object. In this case we create a new world which we think of as being owned by the switch cabinet. We then create the objects representing the internals of the switch cabinet in its internals world. These can be created from a list of standard templates, or objects can be created and edited individually. These objects are connected using ordinary topological rules, and normal trace constraints will apply, like not going through open switches.

The one thing which is still missing is a link between the cables which are connected to the switch cabinet, which exist in the main GIS world, and the internal switches and busbars, which exist in a separate world belonging to the switch cabinet. This is where the hypernode comes in. A hypernode is an object which has two point geometry attributes, and special tracing behaviour defined on it, similar to that which we defined on the transfer switch. In this case the special behaviour is quite simple - it just says that if the trace hits a point belonging to a hypernode, then it should continue tracing from the other point belonging to the hypernode. In this way a hypernode can be used to make a trace jump ("through hyperspace" - hence the name!) from one point to another. The two points (or "ends") of a hypernode can be in different worlds. Hence we can add hypernodes which connect the cables coming into the switch cabinet to the appropriate connection points in the internal model. The nice thing about this approach is that we do not have to define any special trace behaviour on any objects, even though we are modelling some very complex behaviour. We simply specify that the switch cabinet is an object which has internals, and all the special behaviour we need is already defined on the hypernode, which is a standard system object. For a more detailed discussion of the use of multiple worlds, see Newell and Doe, 1994.
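A minimal Python sketch of hypernode tracing, assuming each point is identified by a (world, point id) pair and ordinary connectivity only exists within a world. The world names, point ids and the `is_open` predicate are hypothetical. Hypernode ends are simply paired, so reaching one end makes the trace continue from the other, even when it lies in the cabinet's internals world.

```python
from collections import deque
from typing import Callable, Dict, List, Tuple

Node = Tuple[str, str]  # (world name, point id)


def trace_with_hypernodes(adjacency: Dict[Node, List[Node]],
                          hypernodes: List[Tuple[Node, Node]],
                          start: Node,
                          is_open: Callable[[Node], bool] = lambda n: False):
    # Each hypernode end jumps to the other end, possibly in another world.
    jump: Dict[Node, Node] = {}
    for a, b in hypernodes:
        jump[a] = b
        jump[b] = a
    reached = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node != start and is_open(node):
            continue  # normal trace constraint: do not pass through open switches
        next_nodes = list(adjacency.get(node, []))
        if node in jump:
            next_nodes.append(jump[node])  # "through hyperspace" to the other end
        for nxt in next_nodes:
            if nxt not in reached:
                reached.add(nxt)
                queue.append(nxt)
    return reached


G, I = "geographic", "cabinet_1_internals"
adjacency = {
    (G, "cable_in"): [(G, "cab_port_1")], (G, "cab_port_1"): [(G, "cable_in")],
    (I, "bus_1"): [(I, "switch_1")],
    (I, "switch_1"): [(I, "bus_1"), (I, "bus_2")],
    (I, "bus_2"): [(I, "switch_1")],
    (G, "cab_port_2"): [(G, "cable_out")], (G, "cable_out"): [(G, "cab_port_2")],
}
hypernodes = [((G, "cab_port_1"), (I, "bus_1")), ((I, "bus_2"), (G, "cab_port_2"))]

closed = trace_with_hypernodes(adjacency, hypernodes, (G, "cable_in"))
opened = trace_with_hypernodes(adjacency, hypernodes, (G, "cable_in"),
                               is_open=lambda n: n == (I, "switch_1"))
```

With the internal switch closed, the trace passes from the incoming cable through the cabinet internals to the outgoing cable; opening the switch stops it inside the internals world.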

Schematics

We have already mentioned that it is a common requirement in utility applications for an object to have multiple representations, appearing not only on one or more types of geographical map, but also potentially on various types of schematic diagram. Several of the modelling techniques which have already been discussed are very useful in handling schematics. Multiple spatial attributes can be used to store the location of the object in each schematic. The geometry for each different schematic can be stored in a different world, to provide a clean separation between each schematic and the geographic representation.

In some types of schematic there may not be a one to one correspondence between objects in the geographic world and objects in a schematic. For example, many cables shown in the geographic world may be combined into a single line section in a schematic. The spatial attribute model is not sufficient to handle this case: we need to be able to handle (explicit) relationships between objects. The ability to define and maintain explicit relationships of various types (such as one to many and many to many) is important for many aspects of data modelling. The system should be able to automatically enforce rules relating to relationships, like referential integrity, or more complex rules. For example, in the case of a schematic line section which is related to multiple cables there must be a mechanism for ensuring that the schematic is updated appropriately if any of the cables are modified. These data modelling requirements can be met using a DBMS feature known as a trigger, which is discussed in the next section.

Maintaining a Complex Model

In a GIS there are often complex relationships which need to be maintained, and complex rules which need to be validated whenever certain objects are updated. It is difficult to consistently validate rules via specific code in the application, because we need to ensure that validation is always done, whichever mechanism is used to update the record. For example, whether the record is created or updated by a data translator, or via one of a number of interactive menus, we always want to make sure that the same validation is done. This can be implemented by the use of a DBMS which supports triggers. A trigger is a function (or method in object-oriented terms) which is invoked whenever a specified object or attribute is inserted, updated or deleted. Ideally, it should be possible to invoke the full range of GIS functions from within a trigger, and it should be possible to cause the current transaction to be rolled back if an invalid condition is found within a trigger.

Triggers can be used for a wide variety of functions. At a very simple level, for example, a trigger could be defined to create a given annotation at a standard offset from a certain type of point whenever the point was inserted or updated. A trigger could also be used to implement complex connectivity rules, for example checking that the phases of two connected cables are compatible, and returning an error condition which will roll back the current transaction if they are not. A trigger could also implement more complex functionality such as updating an associated schematic geometry when the geographical representation of a record is updated.
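A toy Python sketch of triggers with rollback, assuming an in-memory table rather than a real DBMS. Every insert or update runs the registered triggers, and a `TriggerError` raised by any of them restores the state from before the transaction. The phase compatibility rule used here (connected cables must carry exactly the same phases) is a deliberate simplification, and all names are hypothetical.

```python
class TriggerError(Exception):
    """Raised by a trigger to veto the current transaction."""


class Table:
    """Toy in-memory table with triggers and all-or-nothing transactions."""

    def __init__(self):
        self.records = {}
        self.triggers = []  # each trigger is called as trigger(table, key, record)

    def add_trigger(self, trigger):
        self.triggers.append(trigger)

    def transaction(self, changes):
        """Insert or update each {key: record}; roll everything back on failure."""
        snapshot = dict(self.records)
        try:
            for key, record in changes.items():
                self.records[key] = record
                for trigger in self.triggers:
                    trigger(self, key, record)  # same validation for every update path
        except TriggerError:
            self.records = snapshot  # restore the state before the transaction
            raise


def check_phases(table, key, record):
    """Simplified connectivity rule: connected cables must carry the same phases."""
    for other_key in record.get("connected_to", []):
        other = table.records.get(other_key)
        if other is not None and other["phases"] != record["phases"]:
            raise TriggerError(f"{key} ({record['phases']}) incompatible with "
                               f"{other_key} ({other['phases']})")


cables = Table()
cables.add_trigger(check_phases)
cables.transaction({"c1": {"phases": "ABC", "connected_to": []}})

try:
    cables.transaction({
        "c2": {"phases": "ABC", "connected_to": ["c1"]},
        "c3": {"phases": "A", "connected_to": ["c2"]},
    })
    rolled_back = False
except TriggerError:
    rolled_back = True
```

The second transaction is rolled back as a whole: cable c2 was valid on its own, but it is discarded together with the incompatible c3.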

An Application Example

This section looks briefly at some design issues involved in a common utility application, outage management, as an example of the sort of trade-offs which need to be considered when designing a data model.

Outage Management

We will consider the design of an outage management system for a radial electricity network. The basic idea of this application is to record calls from customers whose power is out and from this information predict which device is most probably causing the outage. The customer calls may be entered into a separate system by the telephone operators, and then passed on to the GIS for outage analysis. In a hierarchical network, several customers will be fed from a single transformer. A number of transformers will typically be on a section of network which is isolated by a fuse switch (i.e. if that fuse switch is out, all customers served from all transformers downstream of that fuse switch will be out). Further up the hierarchy there will be other devices such as reclosers which are possible causes of an outage.

In order to predict which device or devices are the likely cause of an outage, we need to look at the pattern of calls. If we receive several calls from customers served by the same transformer, then we would predict that the transformer was the probable cause of the outage. However, if we had predicted several transformers beneath the same fuse switch, then we would change our prediction to say that it was most likely that the fuse switch was out rather than all of the individual transformers. Exactly how many devices need to be predicted before a device further upstream is predicted depends on the type of device upstream and a range of other factors. A detailed discussion of all the design requirements is beyond the scope of this paper. However, it is sufficient to know that for any predictable device on the network, we need to be able to efficiently identify the switchable devices immediately downstream of that device, the transformers directly fed from this device, and if the device is predicted as being out we need to be able to calculate the number of customers and the total load (kVA) downstream of that device.
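The roll-up rule can be sketched in Python, assuming a radial hierarchy given as a child-to-parent map. The single `threshold` is a hypothetical stand-in: as the text notes, the real rule depends on the type of the upstream device and a range of other factors.

```python
from collections import Counter


def predict(parent_of, called_transformers, threshold):
    """Roll outage predictions up a radial hierarchy.

    parent_of: device -> upstream predictable device (None at the source)
    called_transformers: transformers with at least one customer call
    threshold: hypothetical rule; predict the parent once this many of its
               children are predicted
    """
    predicted = set(called_transformers)
    changed = True
    while changed:
        changed = False
        counts = Counter(parent_of.get(d) for d in predicted)
        for parent, n in counts.items():
            if parent is not None and n >= threshold:
                # Replace the individual child predictions by their parent.
                predicted = {d for d in predicted if parent_of.get(d) != parent}
                predicted.add(parent)
                changed = True
                break
    return predicted


parent_of = {"T1": "F1", "T2": "F1", "T3": "F1", "T4": "F2",
             "F1": "R1", "F2": "R1", "R1": None}
```

With calls behind T1, T2 and T4 and a threshold of two, the two transformers under fuse switch F1 are replaced by a prediction of F1 itself, while T4 remains predicted on its own.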

To meet the requirement of being able to efficiently identify the immediately downstream predictable devices and transformers from a given device, we have several options. We could dynamically trace downstream each time we needed this information. This could potentially involve tracing downstream for some distance, which could be a performance issue. A second possibility is that we could construct a separate network, using a similar approach to that which was discussed for schematics, which contained only the devices we were interested in for outage management, connected by linear objects which could be formed from an aggregation of several cables in the detailed geographic network. We would still have to do a trace each time we needed the downstream devices, using the "outage network", but performance should be better as the network is simpler. However, creating a separate network has a data storage overhead, and we need to write some application code (probably a set of triggers) which creates and maintains the second network automatically. A third option is that we could maintain a set of explicit relationships which models the hierarchy of predictable devices. This would again have some storage overhead, and would require triggers to be written to maintain the hierarchy, but would probably give the best performance of any of the approaches.
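The first option, a dynamic downstream trace, can be sketched in Python for a radial network, assuming a hypothetical `children` map from each node to the nodes immediately downstream of it. The same traversal collects the nearest predictable devices on each path and totals customers and kVA for everything below the starting device. For simplicity, transformers are treated as predictable devices here.

```python
def downstream_summary(children, device, predictable, customers, kva):
    """Trace downstream of `device` in a radial network (the first option).

    Returns (immediately downstream predictable devices, total customers,
    total kVA) for everything below `device`.
    """
    next_devices, total_customers, total_kva = [], 0, 0.0
    # Each entry records whether a predictable device lies between it and `device`.
    stack = [(child, False) for child in children.get(device, [])]
    while stack:
        node, beyond_predictable = stack.pop()
        total_customers += customers.get(node, 0)
        total_kva += kva.get(node, 0.0)
        if node in predictable and not beyond_predictable:
            next_devices.append(node)  # nearest predictable device on this path
        blocked = beyond_predictable or node in predictable
        stack.extend((child, blocked) for child in children.get(node, []))
    return sorted(next_devices), total_customers, total_kva


children = {"R1": ["F1", "F2"], "F1": ["T1", "T2"], "F2": ["T3"]}
predictable = {"F1", "F2", "T1", "T2", "T3"}
customers = {"T1": 10, "T2": 5, "T3": 8}
kva = {"T1": 50.0, "T2": 25.0, "T3": 50.0}
```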

So we have (at least) three possible approaches to this problem. The simplest one in terms of the data model and application development requirements is likely to be the least efficient in terms of performance. This is a case where prototyping is very useful to try the different approaches. This application was recently implemented by the author, and the first approach which was tried as a prototype was the first option described, which was most attractive because of its simplicity. The performance obtained with this approach was tested and found to be good, so it was decided that it was not worth prototyping the other options. This sort of situation occurs quite frequently, where you have a choice of deriving complex relationships between objects on the fly, or of creating more explicit relationships which will give you better performance when querying the relationship, but which has overheads in terms of data storage, application development and performance of updates. On a case by case basis you need to evaluate the pros and cons of the different options, and this may often involve prototyping some of the options.

Managing The Data Model

This section briefly discusses technology which can help in the development and maintenance of a data model, in particular the use of CASE technology.

Problems in Designing and Maintaining a Data Model

One of the largest costs in most large GIS implementations is the cost of customising the system to meet an organisation's specific requirements. In turn, designing and maintaining the data model for the GIS is typically one of the most significant elements of this customisation. Data modelling is a complex task for most applications, but this is particularly true for GIS as we have seen already. GIS projects typically have quite long life cycles, and the technology is relatively new to most users, which means that at the outset of the project they typically do not realise the full capabilities of the system. Both of these things contribute to the fact that requirements are very likely to change during the course of the project, and these often require the data model to change. Also as new applications are developed and added to the system, there may well be requirements for further changes to the data model. Since GIS projects involve capturing large amounts of data, it is critical that these changes to the data model can be made without losing any data which is already stored in the system.

With most traditional GIS software, it has been very difficult to address these issues. Typically the design of the data model takes a long time and is difficult to subsequently change. This usually means that a very long period of time is spent at the beginning of the project doing requirements analysis and data model design to try to make sure that it is exactly right (which of course it never will be) before any other work begins, as it is so difficult to make changes subsequently. This section briefly discusses how a CASE tool can be used to address these issues, and how this in turn radically changes the way in which one can approach the problem of customising a GIS.

What is a CASE Tool?

The acronym CASE stands for Computer Aided Software Engineering, and it is used to describe a variety of computer-based tools which can be used to assist in the design and development of computer programs. Such tools have been developed for various purposes, including the analysis and documentation of procedures and data flows, and the design and documentation of a data model. It is the latter function which we consider here: the use of a tool which can be used to define a graphical representation of a data model, in the form of an entity-relationship diagram which we discussed earlier. By clicking on individual objects it is possible to define more detailed information about them, such as what attributes they have, what the types of these attributes are, and so on. With some CASE tools, it is possible to automatically generate code which will create the data structures which have been designed.

Compatibility Between the CASE Tool and the DBMS

For doing a high level conceptual design, you do not necessarily need a close correspondence between the CASE tool being used and the DBMS which will eventually be used to implement the actual system. However, if the CASE tool is to be used to do the physical database design and to actually generate code, then clearly a much closer link is required between the CASE tool and the DBMS, and application development environment, used to implement the actual system.

This requirement can really be split into two. The first is that the CASE tool supports all the data modelling concepts supported by the DBMS: all its datatypes, all the types of relationships it supports, and other concepts which we have discussed such as triggers. In particular for GIS the CASE tool needs to understand the way in which spatial information and relationships are stored in the DBMS. The CASE tool may extend to covering aspects of the user interface of the application, such as which fields are visible to the user and what sort of interface is used to edit objects and their individual fields.

The second requirement is that the CASE tool must be able to generate code in an appropriate format which will implement the data model which has been designed. Providing that common data modelling concepts are supported, it is obviously technically possible for one CASE tool to output code in multiple formats suitable for different DBMSs.
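As an illustration of generating code from one logical model for more than one DBMS, the Python sketch below emits CREATE TABLE statements using a per-target type mapping and checks the result against an in-memory SQLite database. The class `ObjectClass`, the logical type names and both mappings are hypothetical; a GIS-aware tool would also need to map spatial types, which plain SQL does not have.

```python
import sqlite3
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class ObjectClass:
    name: str
    fields: List[Tuple[str, str]]  # (field name, logical type)


# One logical model, several output formats (hypothetical type mappings).
TYPE_MAPS: Dict[str, Dict[str, str]] = {
    "sqlite": {"string": "TEXT", "real": "REAL", "integer": "INTEGER"},
    "generic_sql": {"string": "VARCHAR(255)", "real": "DOUBLE PRECISION", "integer": "INTEGER"},
}


def generate_ddl(classes: List[ObjectClass], dialect: str) -> str:
    """Emit one CREATE TABLE statement per class in the model."""
    types = TYPE_MAPS[dialect]
    statements = []
    for cls in classes:
        columns = ["id INTEGER PRIMARY KEY"]
        columns += [f"{name} {types[logical]}" for name, logical in cls.fields]
        statements.append(f"CREATE TABLE {cls.name} ({', '.join(columns)});")
    return "\n".join(statements)


model = [
    ObjectClass("valve", [("status", "string"), ("diameter_mm", "real")]),
    ObjectClass("pipe", [("material", "string")]),
]

# Check that the generated code really creates the designed structures.
conn = sqlite3.connect(":memory:")
conn.executescript(generate_ddl(model, "sqlite"))
tables = sorted(row[0] for row in conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table'"))
```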

These requirements are to a certain extent independent of each other. If the first one is met but not the second then it is at least possible to use the CASE tool to do the physical design for the system, but the code to implement this then has to be written manually. On the other hand, it is possible to have a CASE tool which supports a subset of the concepts supported by the DBMS (with GIS, for example, a CASE tool which supports alphanumeric datatypes but not spatial datatypes), which could be made to output code in an appropriate format for the DBMS, but further development work would then be required in the DBMS environment (outside the CASE tool) to incorporate any of these additional concepts in the application. In this situation it is likely to be much more difficult to usefully maintain the data model using the CASE tool after its initial creation, since the CASE tool does not know about any of the changes which have been made to the data model within the DBMS environment. Clearly, the CASE tool is much more useful if it meets both of these requirements in full.

Maintenance of the Data Model

While the requirements in the previous section demand a very close link between the CASE tool and the DBMS, the biggest benefits the author has found from the use of a CASE tool come from taking this integration one stage further and providing the ability to update the data model of an existing DBMS which is already populated with data, without losing any existing data. This is particularly important in GIS projects, since as we mentioned, they tend to last for a long time, requirements are particularly prone to change during the course of the project, and typically the database will contain large amounts of data when these changes have to be made.
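The following Python sketch illustrates the principle of changing a populated schema in place, using SQLite's `ALTER TABLE ... ADD COLUMN` as a stand-in for whatever mechanism a GIS DBMS provides; the table, field names and the `add_field` helper are hypothetical. The existing rows are preserved and read back with the new field's default value.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE valve (id INTEGER PRIMARY KEY, status TEXT)")
conn.executemany("INSERT INTO valve (status) VALUES (?)", [("open",), ("closed",)])


def add_field(conn, table: str, field: str, sql_type: str, default_sql: str) -> None:
    """Add a field to a populated table without rebuilding it.

    Existing rows are kept and read back with the default value. Names are
    interpolated directly, so they must come from a trusted source such as
    the design tool itself, never from end users.
    """
    conn.execute(f"ALTER TABLE {table} ADD COLUMN {field} {sql_type} DEFAULT {default_sql}")


add_field(conn, "valve", "material", "TEXT", "'unknown'")
rows = conn.execute("SELECT id, status, material FROM valve ORDER BY id").fetchall()
```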

It is highly desirable to be able to make data model changes on a test version of the database so that they can be tested before applying them to the production version. Ideally one would like to be able to do this without replicating data in the master database.
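One way to picture a test version that does not replicate the master data is an overlay in which the test version stores only its own changes and falls back to master for everything else. The Python sketch below uses `collections.ChainMap` for this; the record layout and the `promote` helper are hypothetical, and deletions are not handled.

```python
from collections import ChainMap

# Production ("master") records, keyed by object id.
master = {"valve:1": {"status": "open"}, "valve:2": {"status": "closed"}}

# The test version stores only its own changes; unchanged records are read
# straight from master rather than copied.
test_changes: dict = {}
test_version = ChainMap(test_changes, master)

# Writes through a ChainMap go to its first mapping, so master is untouched.
test_version["valve:2"] = {"status": "open", "material": "steel"}


def promote(changes: dict, production: dict) -> None:
    """Apply tested changes to production (deletions are not modelled here)."""
    production.update(changes)
```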

Use of an Incremental Development Methodology

If suitable tools are available to allow the data model of a populated database to be easily changed, this can significantly change the approach which is taken to an application development project. Instead of taking the traditional approach of trying to completely design the data model at the beginning of the project, which typically takes a very long time, it is possible to start with a much simpler core data model and develop it over time in parallel with the development of application prototypes. This allows benefits to be delivered to users much more quickly. For a more detailed discussion of the use of CASE tools with GIS, see Kendrick and Batty, 1994.

Conclusion

This paper has covered a range of issues relating to GIS and AM/FM data modelling. We started by considering how best to represent spatial and topological relationships when designing an application data model, which is not specifically covered by any of the common approaches to data model design. We then considered how new features in the GIS system data model could help the user model certain things more easily. From this perspective it is important that GIS vendors continue to look at enhancing their system data models rather than just continuing to use the simple point-line-area spatial object model which is still the most commonly used approach. It is also important for users to consider the system data model of any system which they are evaluating. Finally, we discussed CASE technology which can simplify the creation and maintenance of a data model, and which in doing so can radically change the approach which is taken to a GIS development project, by allowing the data model to be developed incrementally rather than having to completely design it at the beginning of the project.

References

Batty, P.M., 1993: Object-Orientation - some objectivity please!: Proceedings of GIS 93 Conference, Birmingham, UK.

Kendrick, G., and Batty, P.M., 1994: Use of an Integrated Case Tool for GIS Customisation: Proceedings of EGIS 94.

Newell, R.G., and Batty, P.M., 1994: GIS databases are different: Proceedings of AM/FM Conference XVII, pp 279-288.

Newell, R.G., and Doe, M., 1994: Discrete Geometry with Seamless Topology in a GIS.

Samet, H., 1990: The design and analysis of spatial data structures: Addison-Wesley, Reading, Massachusetts, 493 p.

Smallworld Technical Paper No. 11 - Use of an Integrated CASE Tool for GIS Customization

by Gillian Kendrick and Peter Batty

Abstract

The implementation of large corporate GIS systems places heavy demands on the customisability of GIS products. This paper examines the use of an integrated CASE tool in GIS customisation. The authors start by discussing some of the common problems which designers of GIS systems face in developing new applications. The paper describes the features which a CASE tool should include in order to address these issues. The similarities between the technology which can be used to underpin both a CASE tool and a GIS will be mentioned.

The paper describes the practical use of a CASE tool in application development. It also discusses the management of multi-user application development and the benefits gained from utilising a CASE tool in the development process. It is shown that the use of an integrated tool facilitates the adoption of a new incremental design methodology.

Introduction

One of the greatest costs in large GIS implementations is that of customising the basic system to meet an organisation's specific requirements. There are many aspects to this customisation. The user interface of the system may be tuned to speed up data capture or else to streamline the execution of particular queries. The way in which objects' spatial attributes are displayed must be specified. However, one of the most significant parts of any customisation involves the design and maintenance of the application's data model.

Data modelling is a complex task for most computer applications, but this is particularly true in GIS for several reasons. The first of these is the large number of classes of objects which are involved in these systems. Each implementation will typically involve hundreds of different object classes. The second factor is the number and variety of relationships between these objects. Relationships are of three main types: aggregation and association, e.g. a building is 'part of' a school; spatial, e.g. a house is 'near to' a lake; and topological, e.g. a valve is 'connected to' a pipe. Of these types, only the first may be explicitly represented in terms of traditional relational joins. The others will be derived indirectly from spatial attributes of the objects or else through the interaction between these spatial attributes.

For many organisations, the GIS will only form a part of the corporate computing system. There will be existing data bases holding data such as customer information. Some of this data may be relevant to the GIS, and therefore part of the customisation will involve integrating these existing data models with that of the GIS (Bundock & Theriault 1992). This integration will again increase the size and complexity of the overall design.

Another aspect of GIS implementations, which has an impact on the maintenance of the data model, is that user requirements change. GIS projects typically have quite long life cycles, and the technology is relatively new to most users. This means that at the outset of the project they may not realise the full capabilities of the system. As new applications are developed and added to the system, changes are required in the data model.

Since GIS projects involve capturing large amounts of data, it is critical that these changes to the data model can be made without losing or compromising any data which is already stored in the system.

With most traditional GIS software, data model evolution is a difficult issue. This has meant that a long period of time has been spent at the beginning of the project on requirements analysis and data model design to try to make sure that it is exactly right, which of course is rarely achieved. This has had to happen before any other work could begin, meaning that there has been a very long delay for users between the purchasing of a GIS and the start of productive use of the customised system.

This paper describes how an integrated CASE tool can be used to address all of these data modelling issues. This in turn radically changes the way in which one can approach the problem of customising a GIS, using an interactive design methodology.

What is a CASE Tool?

The acronym CASE stands for Computer Aided Software Engineering. It is used to describe a variety of computer-based tools which can be used to assist in the design and development of computer programs. Such tools have been developed for various purposes including the analysis and documentation of procedures and data flows, and the design and documentation of a data model. It is the latter function which we consider in this paper as this is the most appropriate kind of tool to help with GIS customisation.

A CASE tool which is to be used for data model design must display a graphical representation of the model. This is usually in the form of an entity-relationship diagram which shows the entities, or classes of objects, and the relationships between them. Such a facility gives the designer a good picture of a large design. The tool allows plots of the diagram to be made. These can be included as part of the documentation of the overall design.

The tool allows the designer to interact with the spatial representations of object classes and relationships. A graphical user interface (GUI) provides an environment in which the attributes and behaviour of the objects and relationships can be edited and queried. The tool produces automatic documentation on selected parts of the design.

One of the ways in which a CASE tool can reduce the time taken to implement a design is by automatically generating the code which creates the data model. This facility has two main benefits. Firstly, the designer no longer has to worry about implementation details; he can concentrate on the task of data modelling. Secondly, the automatic code generation speeds up the process of creating data models, and avoids programming errors.

The tool should also provide facilities which enforce the correctness of the data model. It should validate that the design is consistent. Checks made by the tool at the design stage will reduce the amount of time spent in tracing bugs by the designers at a later date. This will again speed up the delivery of finished applications.
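A minimal Python sketch of the kind of consistency checks such a tool might run at design time, assuming the model is described as a map of class names to field names plus a list of relationships. The function name and the two checks (duplicate fields, relationships to undefined classes) are illustrative choices, not a description of any particular product.

```python
def validate_model(classes, relationships):
    """Return a list of problems found in a data model design.

    classes: class name -> list of field names
    relationships: list of (relationship name, from class, to class)
    """
    problems = []
    for name, fields in classes.items():
        duplicates = {f for f in fields if fields.count(f) > 1}
        for f in sorted(duplicates):
            problems.append(f"class {name}: duplicate field {f}")
    for rel, src, dst in relationships:
        for cls in (src, dst):
            if cls not in classes:
                problems.append(f"relationship {rel}: unknown class {cls}")
    return problems


classes = {"school": ["name", "name"], "building": ["address"]}
relationships = [("part_of", "building", "school"),
                 ("fed_by", "customer", "transformer")]
problems = validate_model(classes, relationships)
```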

This section has described general requirements for a CASE tool which is to be used for data model design. The next two sections describe in more detail some of the requirements for CASE tools which are to be used in GIS. The first section covers general requirements for tools for GIS data model design. The second considers the extra requirements for an integrated tool and covers some of the benefits provided by such a tool.

CASE Tools for GIS

Multi-User Design

It was mentioned in the introduction that one of the reasons for the complexity of GIS data models was that they involved a large number of classes of objects. It will usually be the case that more than one designer will be involved in the development of these objects, their relationships and their behaviour. This means that it is important that the tool used to develop the data model is multi-user.

Each designer should be able to work on additions or modifications to the data model without being affected by changes which others are making to different parts of the design. These changes will also be made over long periods: hours, days or weeks. The designer may wish to try out several alternative versions of the data model in order to settle on the best design. When a new part of the model has been completed or re-engineered, the tool must offer support for 'merging' the new work with the rest of the design.
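Merging one designer's work with another's is commonly done with a three-way merge against their common ancestor. The sketch below shows that generic technique. It is not the tool's own algorithm, and both the flat schema format and `merge_schemas` are hypothetical. If both sides change the same item in different ways, the item is reported as a conflict for the designer to reconcile.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical flat schema: (class name, field name) -> field type.
Schema = Dict[Tuple[str, str], str]


def merge_schemas(
    base: Schema, ours: Schema, theirs: Schema
) -> Tuple[Schema, List[Tuple[str, str]]]:
    """Three-way merge: keep each side's change unless both changed the same item differently."""
    merged: Schema = {}
    conflicts: List[Tuple[str, str]] = []
    for key in set(base) | set(ours) | set(theirs):
        b: Optional[str] = base.get(key)
        o, t = ours.get(key), theirs.get(key)
        if o == t:
            result = o              # both agree (including both deleted)
        elif o == b:
            result = t              # only 'theirs' changed this item
        elif t == b:
            result = o              # only 'ours' changed this item
        else:
            conflicts.append(key)   # both changed it differently
            result = o              # keep ours provisionally until reconciled
        if result is not None:
            merged[key] = result
    return merged, sorted(conflicts)
```

Here a missing key means the field does not exist, so adding or deleting a field is treated in the same way as changing its type.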

The requirements of CASE tool database technology in this area, involving long transactions and version management, are very similar to those of GIS (Easterfield et al. 1990).

Modelling of Spatial Data and Topological Relationships

GIS data models are special because they require the user to model the spatial attributes of objects. New data types such as point, area, chain, raster and grid must be supported. The tool must also model the topological relationships which exist between some objects. For example, a valve location 'connects to' a pipe centreline, and a land parcel 'shares' its boundary with another land parcel.

These aspects of GIS data models should be supported in a CASE tool which is to be used in GIS customisation.

Integrated CASE Tools for GIS

For high level conceptual design, it is not necessary to have a close coupling between the CASE tool and the DBMS used to implement the system. However, if the CASE tool is to be used for the physical database design and to generate code, then a much closer link is required between the tool, the DBMS, and the application development environment used to implement the actual system. The requirements for the tool if it is to achieve this higher level of integration can be separated into two parts.

First, the CASE tool must support all of the data modelling concepts supported by the GIS. These include things such as the datatypes, relationships and other system specific concepts such as triggers, validators, and enumerators. For object oriented systems, the CASE tool should also manage the behaviour for the object classes (Booch 1991). In particular for GIS the CASE tool needs to manage spatial information and topological relationships. The CASE tool may extend to covering aspects of the user interface of the application, such as which fields are visible to the user and what sort of interface is used to edit objects and their individual fields.

Second, the CASE tool must generate code, in an appropriate format, which implements the data model that has been designed. In the case of the work reported here, the tool produces a script in the Smallworld Magik language.

These requirements are to a certain extent independent of each other. If the first is met but not the second, the CASE tool can at least be used for the logical design of the system, but the code which implements that design then has to be written manually. On the other hand, a CASE tool may support only a subset of the concepts supported by the DBMS (with GIS, for example, a tool which supports alphanumeric datatypes but not spatial datatypes). Such a tool could output code in an appropriate format for the DBMS, but further development work would then be needed in the DBMS environment, outside the CASE tool, to add the missing concepts to the application. In this situation it is likely to be much harder to maintain the data model usefully with the CASE tool after its initial creation, since the tool knows nothing of the changes made to the data model within the DBMS environment. Clearly, the CASE tool is much more useful if it meets both of these requirements in full.

A CASE tool which can generate the data model for a GIS application automatically is a very useful part of a development tool kit. However, two further facilities are needed to make the tool truly useful for corporate GIS design.

Integration of Existing Data Models in the GIS Data Model

As noted in the introduction to this paper, a common requirement in GIS implementations is that existing (legacy) databases, and therefore their data models, be integrated into the system. There will often be relationships between objects created as part of the GIS and objects which belong to these other databases.

As an example, consider a large utility company that has a customer information database. It is undesirable to move this data into the GIS, as many other applications will already be designed to run on it. Yet part of the requirement for the GIS application is to associate spatial information with those customer records.

To facilitate this integration, the CASE tool can 'reverse engineer' the data models from these existing databases. The objects can then be integrated into the whole design within the domain of one tool.
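At its simplest, reverse engineering means reading an existing database's catalogue back into the design tool's model. As an illustration only, the Python sketch below uses an SQLite database as a stand-in for the legacy customer database and reads its tables through `sqlite_master` and `PRAGMA table_info`. The `reverse_engineer` function and its output format are assumptions.

```python
import sqlite3
from typing import Dict, List, Tuple


def reverse_engineer(conn: sqlite3.Connection) -> Dict[str, List[Tuple[str, str]]]:
    """Read each user table's columns and declared types from an existing SQLite database."""
    model: Dict[str, List[Tuple[str, str]]] = {}
    tables = conn.execute(
        "SELECT name FROM sqlite_master "
        "WHERE type = 'table' AND name NOT LIKE 'sqlite_%' ORDER BY name"
    ).fetchall()
    for (table,) in tables:
        # PRAGMA table_info yields (cid, name, type, notnull, dflt_value, pk) per column.
        columns = conn.execute(f'PRAGMA table_info("{table}")').fetchall()
        model[table] = [(col[1], col[2]) for col in columns]
    return model
```

Once the `customer` table is in this form, a designer could use the tool to add a spatial attribute such as a location point, while the customer data itself stays in the external database.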

Maintenance of the data model

GIS applications involve large amounts of data and, as mentioned before, they are also particularly prone to changes in user requirements during the life cycle of the project. If such a change means that the data model must be modified, then it is essential that facilities are provided to 'forward engineer' the populated database so that it is consistent with the new data model.

A common feature of CASE tools is that they generate the code which creates a design. It is less common to generate the code which will 'forward engineer' an existing design when the data model definition held in the tool has been changed. A CASE tool for GIS should offer support to evolve a database from one schema to another.
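At the heart of forward engineering is a migration step that brings existing records into line with the changed definition. The Python sketch below is a simplified, hypothetical illustration. Records are plain dictionaries, new fields receive a default value and removed fields are dropped. Renames, type changes and spatial data are not handled.

```python
from typing import Any, Dict, List


def forward_engineer(
    records: List[Dict[str, Any]],
    old_fields: List[str],
    new_schema: Dict[str, Any],
) -> List[Dict[str, Any]]:
    """Evolve records to a new schema.

    new_schema maps every field of the new definition to the default value
    used when that field did not exist before (ignored for surviving fields).
    """
    added = [name for name in new_schema if name not in old_fields]
    removed = [name for name in old_fields if name not in new_schema]
    migrated = []
    for record in records:
        # Copy surviving fields so the original records are left untouched.
        updated = {k: v for k, v in record.items() if k not in removed}
        for name in added:
            updated[name] = new_schema[name]
        migrated.append(updated)
    return migrated
```

The schema difference is computed once and then applied to every record. This is what distinguishes evolving a populated database from simply regenerating an empty one.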

Features of an Integrated CASE Tool for GIS

Based on the previous sections, we can summarise that an integrated CASE tool should have the following features:

  • Support all of the data modelling concepts which are supported by the GIS DBMS.
  • Generate the code which will implement the data model in the GIS DBMS.
  • 'Reverse engineer' existing data models so that their schema can be integrated into the overall design for the system.
  • 'Forward engineer' populated databases when the design of the objects and relationships is updated.

Such a tool provides significant benefits in the speed of GIS customisation and also in the maintenance of the applications throughout their life cycles.

Similarities Between a GIS Application and a CASE Tool

Several of the capabilities provided by a CASE tool are similar to those which are found in a GIS.

Firstly, in both systems the user creates objects which combine spatial and alphanumeric attributes. In the GIS, these objects are things such as houses, pipes or rivers. The alphanumeric attributes of a house are things such as its address and owner, and its spatial attributes are its position and footprint. In the CASE tool, the objects the user works with are the definitions of the object classes to be used in the GIS application. For example, a CASE tool object has an alphanumeric attribute holding the name of an object class, such as 'House'. The spatial attributes of these objects are used to position them in the entity-relationship diagrams. Similarly, the CASE tool stores relationship objects, whose attributes include the type of the relationship, e.g. 'part of' or 'connected to'.

In both the CASE tool and the GIS, users can query and edit the properties of the objects through a GUI. This interface provides facilities for plotting and reporting.

One of the areas in which a GIS and CASE tool are most similar is in their need to provide support for multi-user working in a long transaction database. This should not be surprising as both systems are used as design tools as well as information systems.

The Smallworld CASE Tool

After a consideration of the benefits of using an integrated CASE tool, and a survey of the tools currently available on the market, Smallworld chose to implement its own. Because of the similarities outlined above between GIS and CASE, it was possible to implement the tool as a specific GIS application with its own data model.

Working with the CASE Tool

The tool is activated during a normal GIS session. The designer can have both the GIS application and the CASE tool running at the same time.

Figure 1. Integration of GIS and CASE in one environment [figure not available]

Application Development with the CASE Tool

Smallworld GIS contains a powerful data dictionary which supports many advanced data modelling features. These include the definition of new data types, triggers and validators. The data dictionary is capable of storing the definition of object classes, their attributes, behaviour and also the details of the topological and associative relationships between them. The data dictionary can manage many different versions of a data model and allows the designer to look at these different versions within one GIS session.

The CASE tool operates directly on the GIS application's data dictionary. If the GIS application involves integration with external databases, the data models held in these databases are reverse engineered into the tool. Once present in the tool, the user can add spatial attributes to the objects, and can also define relationships between them and the objects which live in the main GIS database. The user can also add behaviour to these external objects.

The CASE tool supports a steady progression from a logical to a physical definition of the required data model. It does not demand that the data model is completely defined but will advise on areas which are not yet complete. Facilities are provided which allow the designers to document the model extensively as they are working.

Developing the Data Model

As already mentioned, the CASE tool is a multi-user facility. We now describe how several users can work on the same model.

Figure 2. Version management of both schema and GIS data [figure not available]

In Figure 2, the boxes to the left of the dotted line represent the different alternatives in the CASE tool's database. Beneath the Top alternative there are two others, in which the designers Phil and Betty can work independently. Phil also has two further sub-alternatives in which he can try out different designs. The area to the right of the dotted line shows the alternative tree of the main GIS application. Beneath its Top there are two alternatives, 'Live Data' and 'Test'. The alternatives beneath 'Live Data' are those in which the GIS application is in use, which may include data capture, analysis of existing data and design. The part of the tree under 'Test' is used by CASE designers to try out their developments.

When Phil decides that he has completed part of a design in the alternative 'Design 1', he can make the CASE tool 'apply' the new data model to the alternative 'Test 1' and immediately start testing it in the GIS application. If testing shows the new part of the design to be a success, his changes can be 'posted' up to the Top of the CASE database. If the design is unsatisfactory, he can improve it and then 'apply' the new changes.

When changes to the design have been thoroughly tested, they can be applied to the Top alternative of the GIS application's database, whose data model is then 'forward engineered'. The data model changes then spread through the alternative tree as users in lower alternatives request to see them.
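The post-and-refresh behaviour of an alternative tree can be modelled in a few lines. This Python sketch is a hypothetical and much simplified illustration, not Smallworld's version-management implementation. Each `Alternative` overlays its own changes on a snapshot of its parent, and `post` pushes those changes up one level. Lower alternatives see posted changes only after they `refresh`. Conflict detection and deletions are omitted.

```python
from typing import Dict, Optional


class Alternative:
    """Hypothetical model of a version-managed alternative holding key -> value data."""

    def __init__(self, name: str, parent: Optional["Alternative"] = None):
        self.name = name
        self.parent = parent
        self.changes: Dict[str, str] = {}
        # Snapshot of the parent's view, updated only on refresh or post.
        self.base: Dict[str, str] = parent.view() if parent else {}

    def view(self) -> Dict[str, str]:
        """What a user in this alternative sees: the base snapshot overlaid with local changes."""
        return {**self.base, **self.changes}

    def write(self, key: str, value: str) -> None:
        self.changes[key] = value

    def refresh(self) -> None:
        """Pick up changes posted to the parent since the last refresh."""
        if self.parent is not None:
            self.base = self.parent.view()

    def post(self) -> None:
        """Push local changes up one level, then resynchronise with the parent."""
        if self.parent is None:
            raise ValueError("the top alternative has no parent to post to")
        self.parent.changes.update(self.changes)
        self.changes.clear()
        self.refresh()
```

In this sketch, once Phil posts, his changes appear in Top immediately. Betty keeps working against her older snapshot until she chooses to refresh, which mirrors the idea of changes spreading only as users request to see them.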

Figure 3. Part of a GIS application's data model [figure not available]

The use of an integrated CASE tool in GIS allows designers to develop data models iteratively. There no longer needs to be such a great reliance on getting the design right first time. This in turn means that the end users can rapidly be presented with a prototype system which allows them to more easily understand what the system can offer. They are therefore able to offer much more feedback in the design of the system, leading to the development of systems which are much better suited to their requirements.

Data Model Re-use

Another feature of the CASE tool is that it facilitates the re-use of pieces of design. It can archive complete sections of a data model in a format which can be read by another tool. This has enabled a library of designs to be set up, which greatly speeds up the development of new applications, as they rarely have to be designed from scratch. An existing 'template' data model can be loaded into an application's CASE tool and treated as the first prototype.
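Archiving a section of a design for re-use can be sketched as serialising the chosen classes, together with the relationships among them, to a portable text format. The Python below assumes JSON as the archive format. The functions `archive_section` and `load_template` and the model layout are hypothetical.

```python
import json
from typing import Any, Dict, List


def archive_section(model: Dict[str, Any], class_names: List[str]) -> str:
    """Serialise the selected classes, and the relationships among them, to a JSON string."""
    chosen = set(class_names)
    section = {
        "classes": {name: model["classes"][name] for name in class_names},
        # Keep only relationships whose both ends are inside the archived section.
        "relationships": [
            rel for rel in model["relationships"]
            if rel["from"] in chosen and rel["to"] in chosen
        ],
    }
    return json.dumps(section, indent=2, sort_keys=True)


def load_template(archive: str, target: Dict[str, Any]) -> None:
    """Merge an archived section into another model, refusing to overwrite existing classes."""
    section = json.loads(archive)
    clashes = set(section["classes"]) & set(target["classes"])
    if clashes:
        raise ValueError(f"classes already defined: {sorted(clashes)}")
    target["classes"].update(section["classes"])
    target["relationships"].extend(section["relationships"])
```

A new application's model would start empty, load one or more archived templates and then be refined from there. This corresponds to using the template as the first prototype.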

Conclusions

Since Smallworld began using an integrated CASE tool for application development, the following benefits of this approach have been found:

  • Customisation of the system has been made more accessible. The tool has lowered the knowledge barrier which designers have to overcome, as they no longer have to be trained in how to talk to the underlying DBMS directly. They can instead spend more of their time on design considerations. This means that more people are able to customise the system with less initial training.
  • The tool has enabled the creation of applications which better meet the customer's requirements through the interactive development approach. Users are given the chance to interact with a version of the application at an early stage. They can then refine their requirements, the developers can evolve the design, and the tool will evolve the database.
  • Designs are more easily re-used. The archiving facility in the CASE tool allows parts of designs to be stored in a form which can be loaded into another tool. This means that a library of commonly used parts of designs can be set up, and when a new application is being developed, much of the initial data model can be created from these standard components. This has two main benefits: firstly, the development time for new applications is greatly reduced; secondly, the quality of developed applications is improved, because time invested in improving the quality of the basic data model components improves the overall system quality.
  • Multi-user working is easier to manage. When many designers are working on the same project, it is always difficult to avoid duplication of effort and conflicts between their different designs. The CASE tool helps with these problems by incorporating a number of validation and completeness checks which prevent the design from becoming inconsistent. Users can develop parts of the data model independently in separate alternatives. When they incorporate these into the overall design by 'merging' their changes, the DBMS automatically spots any conflicts the designer has introduced and allows him or her to reconcile them.
  • An incremental development methodology means that designers can get production systems in place much more quickly.

References

Booch, G. (1991): Object Oriented Design with Applications. Benjamin/Cummings, Redwood City, California.

Bundock, M. & Theriault, D. (1992): Integration of CASE Technology into the GIS Environment. In AGI92 Conference Papers, Birmingham, November 1992.

Easterfield, M., Newell, D.G. & Theriault, D. (1990): Version Management in GIS: Applications and Techniques. In EGIS 90 Conference Proceedings, Amsterdam, April 1990.