Abstract: Teaching in an Internet-based learning environment presents unique challenges. Of particular interest is the problem of maintaining link integrity. Although several modern hypermedia systems can guarantee that links within their own systems are maintained, it is more difficult to ensure that links to external Web servers remain intact. The system we have developed maintains the integrity of links to external Web sites for all essential documents (as specified by the courseware authors).
A further problem is how the administrator of one server can know when a document on another Web site has been moved or removed. In the current climate, the answer is that they rarely will.
Hyperwave [Andrews et al., 1995, Maurer, 1996], which was developed to address these problems, has been used for the management of all electronic material in our Internet-based learning environment.
Figure 1: The dangling link problem.
Online guides and lists of links also suffer from this problem, although to a lesser extent. It is, of course, much easier to alter a Web page to reflect a change in a URL than to reprint an entire book. However, Web site changes still require manual alteration of the document in question and, more importantly, someone must first notice that a particular link is out of date.
What is needed is a method of specifying links in relatively stable Web documents and printed material in a manner which allows automatic notification of broken links and provides a simple method of correcting or re-directing these links.
Before moving on to the problem of links to documents stored in external Web servers, we first highlight a few key built-in features of Hyperwave.
Hyperwave was developed to store high-volume hypermedia resources, which can be spread over multiple servers [Kappe, 1995]. The long-term maintenance and development of databases becomes crucial when databases grow beyond a certain size. To expedite management, particularly of links, a more structured approach needs to be pursued right from the beginning [Flohr, 1995, Maurer et al., 1994].
Hyperwave follows a more sophisticated concept than most available Web server systems. The project's aims are [Andrews et al., 1994]:
The core of a Hyperwave server is formed by a set of object-oriented databases. It supports customary database features such as the assigning of attributes to objects, indexing, and searching. The server also maintains a hierarchy of user and group information. The following attributes can be assigned to any document:
Hyperwave also addresses this problem on a 'site' basis. The details of a link within a document stored in a Hyperwave server are kept separate from the actual document itself. The server is always aware of exactly which document a particular link refers to and, more importantly, which documents make use of the link.
When a document is moved on a Hyperwave server, all links referring to that document are automatically updated. Similarly, when a document is deleted, all links referring to it are 'deactivated'. As soon as the document referred to is restored or a new one created, the link becomes active once more. This automatic maintenance of local links also works across multiple Hyperwave servers.
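The behaviour described above can be illustrated with a minimal Python sketch. It assumes a hypothetical in-memory `LinkStore` that keeps link records apart from document bodies; the class, its method names and the example paths are illustrative assumptions and do not reflect Hyperwave's actual data model or API.

```python
from dataclasses import dataclass, field


@dataclass
class Link:
    source_id: str      # document that contains the link
    target_path: str    # current location of the destination document
    active: bool = True


@dataclass
class LinkStore:
    """Hypothetical link table kept apart from document bodies."""
    documents: set = field(default_factory=set)
    links: list = field(default_factory=list)

    def add_document(self, path):
        self.documents.add(path)
        # Restoring or re-creating a document reactivates links that point to it.
        for link in self.links:
            if link.target_path == path:
                link.active = True

    def add_link(self, source_id, target_path):
        link = Link(source_id, target_path, active=target_path in self.documents)
        self.links.append(link)
        return link

    def move_document(self, old_path, new_path):
        self.documents.discard(old_path)
        self.documents.add(new_path)
        # Every link that referred to the old location follows the document.
        for link in self.links:
            if link.target_path == old_path:
                link.target_path = new_path

    def delete_document(self, path):
        self.documents.discard(path)
        # Links are deactivated rather than removed, so they can be revived.
        for link in self.links:
            if link.target_path == path:
                link.active = False

    def referrers(self, path):
        """Documents that currently link to `path`."""
        return [link.source_id for link in self.links if link.target_path == path]
```

Because the link records live outside the documents, a move or deletion touches only the table; no document body has to be edited.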
A full description of Hyperwave, including all technical details, can be found in [Maurer, 1996].
However, Hyperwave does not check whether links from Hyperwave documents to documents on remote Web servers remain intact. This problem is the topic of the rest of this paper.
Hyperwave allows links to remote sites to be loaded as entities in their own right. When a user follows a link in a local document, the request goes not directly to the remote document but to the corresponding remote link object held on the local server. The remote link object then refers the user on to the remote document, and this redirection is virtually transparent to the user.
Figure 2: De-referencing a link.
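As a rough illustration of this indirection, the following Python sketch models remote link objects as entries in a small lookup table. The `RemoteLinkCache` class, the `/rlc/...` address scheme and the example URLs are assumptions made for the example, not part of Hyperwave.

```python
from dataclasses import dataclass


@dataclass
class RemoteLink:
    link_id: str
    remote_url: str


class RemoteLinkCache:
    """Hypothetical store of remote link objects, keyed by a local id."""

    def __init__(self):
        self._links = {}

    def register(self, remote_url):
        link_id = f"rl{len(self._links) + 1}"
        self._links[link_id] = RemoteLink(link_id, remote_url)
        return f"/rlc/{link_id}"          # local address used inside documents

    def dereference(self, local_path):
        """Return the remote URL that a local link address stands for."""
        link_id = local_path.rsplit("/", 1)[-1]
        return self._links[link_id].remote_url

    def retarget(self, local_path, new_url):
        """Correct a link once; every document using it sees the change."""
        link_id = local_path.rsplit("/", 1)[-1]
        self._links[link_id].remote_url = new_url
```

Documents only ever contain the local address, so a changed remote URL is corrected in one place rather than in every document that uses it.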
By using and maintaining the remote link objects, several features can be implemented:
Another complicating factor is that the content of a document may change to the point where it is no longer relevant in its original context. The current relevance of a document must ultimately lie in the hands of the author who decided to use the link in the first place. It would therefore be sensible for the link cache to report to the author that a remote document has been altered.
The HTTP protocol used by Web servers allows document information to be retrieved without actually downloading the entire document [http://www.hmu.com/web_links/http2]. The remote document's modification time is ideal for noting changes in a remote document. This modification time can then be added to the remote link object as an attribute for future reference.
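One way to obtain the modification time without transferring the document body is an HTTP HEAD request followed by a read of the `Last-Modified` response header. The sketch below uses only the Python standard library; the function names are illustrative, and it assumes the remote server supplies a `Last-Modified` header at all, which not every server does.

```python
import urllib.request
from email.utils import parsedate_to_datetime


def parse_last_modified(headers):
    """Return the Last-Modified header as a datetime, or None if absent or malformed."""
    value = headers.get("Last-Modified")
    if not value:
        return None
    try:
        return parsedate_to_datetime(value)
    except (TypeError, ValueError):
        return None


def remote_last_modified(url, timeout=10):
    """Issue a HEAD request so that only headers, not the body, are transferred."""
    request = urllib.request.Request(url, method="HEAD")
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return parse_last_modified(response.headers)
```

The value returned by `remote_last_modified` is what would be stored as the attribute on the remote link object; a result of `None` means that the timestamp cannot be used to detect changes for that document.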
After creating the original remote link object, and checking the existence of the remote document, the text part of the document can be cached (ignoring images and other embedded items). The remote link object can then use the cached document in the event that the remote document becomes unavailable. When supplying the temporary cached document, the viewer should be informed that the document they are seeing is not the original and they should be allowed the option of trying the original remote document.
Figure 3: Supplying the cached document.
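The caching step can be sketched as follows, assuming the remote document is HTML. The sketch reduces a page to its text, dropping images, scripts and style sheets, and prefixes the cached copy with a notice that points back to the original. The helper names and the wording of the notice are assumptions for illustration only.

```python
from html.parser import HTMLParser


class TextOnly(HTMLParser):
    """Collect visible text, skipping script/style content and all tags."""

    def __init__(self):
        super().__init__()
        self._skip = 0
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        if not self._skip and data.strip():
            self.parts.append(data.strip())


def text_version(html):
    """Return the text content of an HTML page, one fragment per line."""
    parser = TextOnly()
    parser.feed(html)
    parser.close()
    return "\n".join(parser.parts)


def serve_cached(cached_text, original_url):
    """Prefix the cached copy with a notice and the original address."""
    notice = (
        "NOTE: this is a locally cached, text-only copy. "
        f"The original may be available at {original_url}"
    )
    return notice + "\n\n" + cached_text
```

Including the original address in the notice gives the viewer the option of trying the remote document again.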
When a user of the Hyperwave system decides that the external links in a document should be maintained, they add a custom attribute "CacheURLs" to the document. This flags the document to have all of its external links de-referenced by the link_setup program, which runs as a cron job once a day. The user can also run link_setup immediately, via an HTML form and CGI.
Once it executes, link_setup processes each document on the server with a new CacheURLs attribute. It removes all URLs linking off the local site (with the exception of those accessing CGIs) and creates remote link objects in the Remote Link Cache (RLC). The HTML of the source document is altered to use the links in the RLC (see figure 3). As each remote link is created, a text-only version of the remote document is downloaded and stored in the Remote Document Cache (RDC).
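The rewriting step might look roughly like the Python sketch below. It is not the link_setup program itself: the regular expression, the `/rlc/N` address scheme, the use of a plain dictionary as the RLC, and the heuristic for recognising CGI links (a `cgi-bin` path or a query string) are all assumptions made for illustration.

```python
import re
from urllib.parse import urlparse

HREF = re.compile(r'href="([^"]+)"', re.IGNORECASE)


def is_cgi(url):
    # Assumed heuristic: a cgi-bin path or a query string marks a CGI call.
    parsed = urlparse(url)
    return "/cgi-bin/" in parsed.path or bool(parsed.query)


def is_external(url, local_host):
    # Relative links have no host part and therefore count as local.
    host = urlparse(url).netloc
    return bool(host) and host.lower() != local_host.lower()


def dereference_links(html, local_host, rlc):
    """Rewrite external hrefs to local RLC addresses; `rlc` maps URL -> local path."""

    def replace(match):
        url = match.group(1)
        if not is_external(url, local_host) or is_cgi(url):
            return match.group(0)              # leave local and CGI links alone
        local = rlc.setdefault(url, f"/rlc/{len(rlc) + 1}")
        return f'href="{local}"'

    return HREF.sub(replace, html)
```

A URL that occurs more than once receives a single RLC entry, so all of its occurrences share one remote link object.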
The CacheURLs attribute is altered to indicate a successful load and an attribute is added to the remote link object, indicating the last modification time (and date) of the remote document. Finally, the user is emailed the results of the load, stating which links were successfully loaded and reporting any problems.
The link_maintenance program also runs as a cron job, but on a more frequent basis. Under normal circumstances (i.e., when there are currently no broken links) it will attempt to verify all links in the RLC every 6 hours. If there are links which have previously been found broken and are using temporary documents from the RDC, then link_maintenance will check these links once an hour until they are either corrected or deemed deleted.
Upon execution, link_maintenance checks each remote link in the RLC. First it attempts to get the last modification time of the remote document referenced and compares this to the modification time attribute of the remote link object. If these time stamps are identical, then no further action is required for this link. Otherwise, the remote document must have been changed, and the owner of the remote link is informed via email.
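The two checking intervals can be expressed as a small selection function. This is a sketch only: the tuple layout and the function name are assumptions, and the real program is driven by cron rather than by stored per-link timestamps.

```python
from datetime import datetime, timedelta

NORMAL_INTERVAL = timedelta(hours=6)   # all links, when nothing is broken
BROKEN_INTERVAL = timedelta(hours=1)   # links currently served from the RDC


def links_due(links, now):
    """Return ids of links whose last check is older than their interval.

    `links` is an iterable of (link_id, last_checked, broken) tuples.
    """
    due = []
    for link_id, last_checked, broken in links:
        interval = BROKEN_INTERVAL if broken else NORMAL_INTERVAL
        if now - last_checked >= interval:
            due.append(link_id)
    return due
```

Keeping the intervals as named constants makes it easy to experiment with different checking periods.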
In the case of a deleted remote document, or a non-responding site, the remote link object will be re-directed to the cached version of the document. The link owner is then informed of this action. Links that are still not resolved after a week are deemed to be lost and the link owners are emailed once again, with the advice that they should remove the link permanently.
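The decisions that link_maintenance makes for a single link can be summarised in a short Python sketch. The `Action` names and the function signature are hypothetical; in particular, the sketch assumes that a missing document and a non-responding site are both reported to it as a modification time of `None`.

```python
from datetime import datetime, timedelta
from enum import Enum

LOST_AFTER = timedelta(weeks=1)


class Action(Enum):
    NONE = "none"                    # time stamps match, nothing to do
    NOTIFY_CHANGED = "changed"       # remote document was modified
    USE_CACHE = "use_cache"          # serve the cached copy, inform the owner
    ADVISE_REMOVAL = "lost"          # unresolved for a week


def check_link(stored_mtime, remote_mtime, broken_since, now):
    """Decide what to do for one remote link.

    `remote_mtime` is None when the document is gone or the site is silent;
    `broken_since` is None unless the link is already known to be broken.
    """
    if remote_mtime is None:
        if broken_since is not None and now - broken_since >= LOST_AFTER:
            return Action.ADVISE_REMOVAL
        return Action.USE_CACHE
    if remote_mtime != stored_mtime:
        return Action.NOTIFY_CHANGED
    return Action.NONE
```

A caller would typically email the link owner only when the returned action differs from the previous one, so that a link which stays broken does not generate a message on every hourly check.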
The actual work involved in creating the raw functionality was quite simple compared to the complications encountered in making the system as convenient and transparent to the user as possible. Other than improving the user interface, most of the future work required will be in determining the optimum periods for checking links (both working and broken).
[Andrews et al., 1995] Andrews, K., Kappe, F., Maurer, H., and Schmaranz, K., (1995). On second generation hypermedia systems. In Proc. ED-MEDIA 95 - World Conference on Educational Multimedia and Hypermedia, pages 75-80. AACE.
[Flohr, 1995] Flohr, U., (1995). Hyper-G Organizes the Web. Byte, November, pages 59-64.
[Harley Halm & Rick Stout] Halm, H., and Stout, R. The Internet Yellow Pages. Osborne, McGraw-Hill.
[Jones and Pitkow, 1995] Jones, R., and Pitkow, J., (1995). Atlas: Supporting the Web with a Distributed Hyperlink Database, http://www.cc.gatech.edu/grads/j/Kipp.Jones/atlas/boston/atlas.html
[Kappe et al., 1993] Kappe, F., Maurer, H., and Scherbakov, N., (1993). Hyper-G: A universal hypermedia system. Journal of Educational Multimedia and Hypermedia, 2(1):39-66.
[Lennon and Maurer, 1994] Lennon, J., and Maurer, H., (1994). Lecturing technology: A future with hypermedia. Educational Technology, 34:5-14.
[Kappe, 1995] Kappe, F., (1995). A scalable architecture for maintaining referential integrity in distributed information systems. Journal of Universal Computer Science, 1(2):84-104.
[Maurer et al., 1994] Maurer, H., Philpott, A., and Scherbakov, N., (1994). Hypermedia systems without links. Journal of Microcomputer Applications, 17:321-332.
[Maurer, 1996] Maurer, H., (1996). HyperWave: The Next Generation Web Solution. Addison-Wesley, UK.