Opportunities for electronic document delivery in libraries

Hans Roes, Tilburg University Library


Text of a paper presented at the third series of the Telephassa seminars in June and July 1992. These seminars were jointly organised by Tilburg University, the Autonomous University of Barcelona (Spain) and the University of Patras (Greece). The text has been published in the proceedings of these seminars (see the list of publications). A slightly edited version of this paper was also published in:

Hans Geleijnse and Carrie Grootaers (eds.), Developing the Library of the future: the Tilburg Experience. Tilburg University Press. Tilburg 1994.

As with all of the HTML versions here: the text may slightly differ from what appeared in print, but the content remains the same.


The problem

Now that current awareness services are emerging on a larger scale, and more cheaply than the traditional abstracting and indexing services, the disclosure of articles published in journals is improving rapidly. This applies not only to local serials collections but also to serials collections in other libraries, nationally and abroad. However, finding references is one thing; getting hold of the primary information they point to is quite another. It took me, for instance, over four weeks to obtain photocopies of an article needed in the preparation of this lecture. Of course this was an Inter Library Loan request. Fortunately I could find a lot of material in my own library using the Excerpta Informatica database and the Online Contents database. Still, it took a lot of time to gather all the primary information. In fact I am still waiting for material requested abroad, so you cannot expect me to give a comprehensive view of the subject of electronic document delivery. I will, however, try to point out the main issues in electronic document delivery at the moment, and the sometimes difficult choices facing libraries in this area.

This lecture will focus first on the concept of electronic document delivery; this might seem an unambiguous term, but it certainly is not. From this the point of standards will flow rather naturally; of course, which standards matter depends on which concept of electronic document delivery is chosen. Next, the issues facing libraries in their relationship vis-à-vis publishers, who have some plans of their own in the same field, are dealt with. If there is enough time left I will say something about the development of a document server at Tilburg University as an illustration of what has been said before. Finally, I will conclude with a number of strategic issues facing libraries in the field of electronic document delivery.

The concept of electronic document delivery

The idea behind electronic document delivery probably dates back to the days of the first computers. Cawkell (1) reports on projects in this field going on since the mid-1960s, using early facsimile technology. Cawkell states that

"The phrase 'electronic document delivery systems' self-evidently implies the supply and reproduction electronically of the kind of information usually provided in the form of print on paper".

This is a very wide definition, which encompasses nearly everything in the field of document delivery as long as something in the process is electronic. This may vary from the already mentioned facsimile technology to satellite transmission, and the documents can be represented in many forms, from ASCII to various kinds of bitmaps or combinations of both, and perhaps stored in all kinds of ways on all kinds of media.

On the other hand, Cawkell seems to restrict his definition to documents which are available in printed form. With the advent of electronic publishing it is possible that a document will never be printed at all. At the risk of causing even greater confusion, one could also state that the whole concept of what a document is, is becoming rather vague with the advent of multimedia.

In this lecture I will follow Cawkell and restrict myself to the delivery of copies of articles appearing in printed journals. That means documents are copies of articles. The term electronic will also be left as vague as it is with Cawkell, so that we are able to deal with different kinds of electronic delivery.

I will be even more pragmatic, in the sense that the main issue is the question implicitly raised in the introduction: how can an individual researcher get hold of primary information as easily, as quickly and as efficiently as possible? Obviously this implies the need for information technology, and so we come to the term electronic document delivery.

Obviously, if speed, ease and efficiency in document delivery are the issue, the natural starting point for any document delivery service is the reference databases themselves. They contain all the bibliographic information that is needed to identify the request for an article in an unambiguous way. All the researcher needs to do is to make the request by adding his personal data, which to a large extent may already be present in administrative databases. An application connected to the reference databases could produce a worklist for document delivery personnel, and the electronic world could stop here if document delivery personnel were to make photocopies and send them by snail mail to the applicants. We might term this phase I in the development of an electronic document delivery system. The bottleneck to speed in such a system will be the amount of money we are willing to spend on personnel. The process is very labour intensive, but easy to implement, and no standards are called for; in fact all we have done is make online ordering possible. Such a system is implemented in the Excerpta Informatica database in Tilburg. Another example is Uncover2 of the Colorado Association of Research Libraries (CARL).
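
To make the phase I idea concrete, here is a minimal sketch, in Python, of online ordering on top of a reference database: a request record carries the bibliographic and personal data, and the application simply queues these records as a worklist for document delivery personnel. All class and field names are illustrative assumptions, not a description of the actual Tilburg or CARL software.

    # Minimal phase I sketch: only the ordering is electronic; fulfilment
    # stays manual. All names and fields are illustrative assumptions.
    from dataclasses import dataclass, field
    from typing import List

    @dataclass
    class Request:
        journal: str        # bibliographic data copied from the reference database
        volume: str
        pages: str
        applicant: str      # personal data, largely present in administrative databases
        address: str

    @dataclass
    class Worklist:
        pending: List[Request] = field(default_factory=list)

        def order(self, request: Request) -> None:
            # Called from the reference database interface.
            self.pending.append(request)

        def next_job(self) -> Request:
            # Personnel photocopy the article at the shelves and post it.
            return self.pending.pop(0)

    worklist = Worklist()
    worklist.order(Request("Journal of Documentation", "47", "41-73",
                           "H. Roes", "Tilburg University Library"))
    print(worklist.next_job().journal)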

However, one can easily see that this approach has a number of drawbacks. The first is that each time an article is requested, library personnel have to go to the shelves and make a photocopy, even if it is a request for an article that has been requested before. The second is scheduling, and recognising duplicate requests if there is more than one reference database. The third is that Inter Library Loan requests are not easily dealt with. If we ignore for a moment the latter two drawbacks and concentrate on the first, it is obvious that the only solution is to have some sort of storage and retrieval mechanism which can reproduce copies of frequently requested articles. If we could solve this problem, and at the same time had a mechanism for unique article identification for the case in which there are several reference databases, the application could be built in such a way as to automatically produce a hardcopy together with a header page containing the address of the applicant, which could again be sent by snail mail. The question is how to store the copies. Now that scanning equipment is maturing to the extent that scanning becomes an act much like making a photocopy, the most efficient way is to bitmap the articles' pages. If we could also send these bitmaps or images, together with appropriate header information, over the network, we could speed up both internal delivery and Inter Library Loan delivery. This is what we might label phase II in the development of an electronic document delivery system. In this phase standards for the storage, retrieval and sending of image files become of crucial importance. The ADONIS initiative resembles a phase II system, but not in all aspects.
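
The heart of phase II is scan-once storage keyed on a unique article identifier. A small sketch of that mechanism might look as follows; the identifier scheme and directory layout are invented for the illustration, and the header page and network transmission are left out.

    # Phase II sketch: page images are stored under a unique article
    # identifier, so repeated requests never send staff back to the
    # shelves. Identifier and storage layout are assumptions.
    from pathlib import Path
    from typing import Callable, Iterable, List

    STORE = Path("image-store")

    def deliver(article_id: str, scan: Callable[[], Iterable[bytes]]) -> List[Path]:
        folder = STORE / article_id
        if not folder.exists():
            # First request: scan at the shelves, once.
            folder.mkdir(parents=True)
            for number, image in enumerate(scan(), start=1):
                (folder / f"page{number:03d}.tif").write_bytes(image)
        # Every later request is served from the store; the pages can be
        # printed with a header page, or sent over the network for ILL.
        return sorted(folder.glob("page*.tif"))

    pages = deliver("NL-TIL-000001", lambda: [b"...bitmap of page 1..."])
    print(pages)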

To make a long story short, phase III would be to have the full text of the article available in machine readable form in the reference database itself, which would then no longer be a reference database but a full text database. Of course this requires the content of the article to be present in a form other than bitmaps. A phase somewhere in between II and III would be to have the images available from so-called hyperlinks in the reference databases, and in this way accessible from the researcher's workstation. Again, standards will be necessary. Perhaps we could think of a phase IV where there would no longer be a printed journal at the basis of the document delivery system, that is, truly electronic publishing. Lynch (2) mentions that this would require not only technical standards, to which we will come in a minute, but also standards in the way of how to deal with electronic publications in the scientific and library processes, for instance how to describe an electronic journal article bibliographically and how to cite it. The first fully electronic journal has, however, already appeared: "Current Clinical Trials".

Standards in electronic document delivery

As in any area of information technology, standards are important in electronic document delivery systems: they guarantee the portability of the system, so that advantage can be taken of hardware developments (3), and therefore protect one's investments in systems development. But standards are also important to make cooperation with other organisations possible. While the first argument is true for all IT developments, the second is of special importance in the library world, where the exchange of documents is everyday business (4). The main standards in the field of document imaging have to do with storage, retrieval and communications (5). An important development for libraries in this respect is the work of the Group on Electronic Document Interchange, the GEDI (4).

The GEDI is a group of library organisations from several European countries and the United States which has set itself the task of defining a framework for electronic document delivery, providing standards and protocols based on existing OSI standards. The report of the group describes the service model, the file format and the file transfer mechanism. The GEDI proposals fit in what I have termed a phase II system, although storage (other than temporary) is not mentioned.

  • The service model advocated by the GEDI makes a distinction between GEDI domains and private domains. Agreement is only necessary in GEDI domains, not between private domains. The private domains are connected through GEDI domains, where the relay functions between private domains and GEDI domains are specified. Via the GEDI domain another private domain can be reached.
    Figure 1: GEDI service model (4)

  • The GEDI file format separates electronic documents into two parts: the document information and the document image, or more generally, the representation of the document itself. The cover information contains:
    • information on the document interchange format itself,
    • information on supplier and consumer domain,
    • information on the applicant and
    • bibliographic information.
    For the document image GEDI chooses TIFF Class B. TIFF stands for Tagged Image File Format and is, in the opinion of the GEDI, the most widely supported image format. This implies support of the CCITT Group III and IV compression algorithms. The GEDI states explicitly that, since document information and document image (or, more generally, the representation of the document itself) are clearly separated, this document interchange format can easily accommodate additional formats for the representation of documents, such as SGML and ODA, which will be discussed in a minute. A sketch of such a two-part file follows this list.
  • The GEDI file transfer mechanism presupposes, at the lower network level of the OSI model (layer 3), the X.25 protocol. At the application level (layer 7) the GEDI chooses File Transfer, Access and Management (FTAM, ISO 8571). FTAM supports the transport of large binary files and is commercially available on different hardware platforms.
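
As a rough illustration of the separation the GEDI format insists on, the sketch below packages cover information and a document body into one file. The field names paraphrase the four categories of cover information listed above; the concrete layout is my own assumption, not the GEDI interchange format itself.

    # Sketch of a GEDI-style two-part file: ASCII cover information in
    # front, document representation behind it. The layout is an assumed
    # simplification, not the actual GEDI specification.
    from dataclasses import dataclass

    @dataclass
    class Cover:
        interchange_format: str   # information on the format itself
        supplier_domain: str      # supplier and consumer domains
        consumer_domain: str
        applicant: str            # who requested the document
        bibliographic: str        # citation identifying the article

    def package(cover: Cover, body: bytes, body_format: str = "TIFF-B") -> bytes:
        # Because cover and body are separate, the body could just as
        # well be SGML or ODA instead of a TIFF Class B image.
        header = "".join(f"{name}: {value}\n" for name, value in vars(cover).items())
        header += f"body-format: {body_format}\n\n"
        return header.encode("ascii") + body

    gedi_file = package(
        Cover("GEDI", "nl.kub.bib", "uk.bldsc", "H. Roes",
              "Journal of Documentation 47(1), pp. 41-73"),
        body=b"II*\x00 ... bytes of a TIFF Class B image ...")

A file built along these lines would then be moved between domains with FTAM, over an X.25 network.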

The GEDI proposals seem to be in line with developments in IT (3) and are backed by important library organisations like the British Library Document Supply Centre, OCLC and RLG. We can therefore conclude that it is safe to adhere to these standards when developing electronic document delivery applications.

As stated before, the GEDI proposals make a clear distinction between the cover or header information and the document image itself, and leave open the possibility of additional formats like SGML and ODA. This opens the way to what I have labelled phase III in electronic document delivery, since SGML and ODA deal with standards for representing full text documents.

SGML, the Standard Generalized Markup Language, is an ISO standard (8879) in the area of Information Processing - Text and Office Systems. SGML is a language in which it is possible to make a clear distinction between the structure and the appearance of a text. The language does not describe how a document should be structured but offers the opportunity to make such a description. In this way SGML is rather a meta language, which tells us how to specify markup rather than telling us what markup is or means (6). This is achieved through rules for describing so-called Document Type Definitions (DTDs). The DTD is the actual description of the structure of the document. A DTD together with a document can easily be transferred from one computer system to another, since both use plain ASCII, while by means of coding any structure, however complex, and any character can be represented. In fact, the interchangeability of electronic documents is one of the main reasons for SGML. Moreover, SGML makes the storage of text on electronic media independent of the hardware and software used.
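
A toy example may make the meta language idea clearer. The sketch below holds, as plain ASCII strings, an invented DTD for a journal article and a document conforming to it; the element names are mine for the example, not those of any published DTD.

    # SGML in miniature: the DTD fixes structure (an article is a title,
    # one or more authors, then sections); nothing is said about
    # appearance. Element names are invented for this example.
    ARTICLE_DTD = """
    <!DOCTYPE article [
      <!ELEMENT article  - - (title, author+, section+)>
      <!ELEMENT title    - - (#PCDATA)>
      <!ELEMENT author   - - (#PCDATA)>
      <!ELEMENT section  - - (#PCDATA)>
    ]>
    """

    DOCUMENT = """
    <article>
      <title>Opportunities for electronic document delivery</title>
      <author>Hans Roes</author>
      <section>Now that current awareness services ...</section>
    </article>
    """

    # Both parts are plain ASCII, so DTD and document travel together
    # between otherwise incompatible systems; how a title is rendered is
    # left entirely to the receiving application.
    print(ARTICLE_DTD + DOCUMENT)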

Once a document is represented in SGML it also becomes possible to make different uses of it. One important use is of course as input for the printing process; another is to store parts of documents, or whole documents, in databases for retrieval purposes (7). SGML is gaining ground in the publishing world: the Association of American Publishers has developed general DTDs (8), and Elsevier Science Publishers has an operational system for the production of so-called heads (bibliographic information, including abstracts, on articles published in their journals) and is investigating the possibility of SGML coding of full text articles (9).

Whereas SGML is becoming a standard in the publishing world, ODA, Office Document Architecture (or Open Document Architecture, as it is also referred to), seems to be emerging as an important standard in office automation. ODA uses a hierarchical and object oriented document model with a clear separation between structure and content, and this clears the way for multimedia documents (10). ODA has two standards for the representation of documents: ODIF (Office Document Interchange Format) and ODL (Office Document Language). In ODIF format, ODA documents can be exchanged via electronic mail or via file transfer. ODL is the more interesting of the two, since it is a standardised representation of an ODA document in SGML (11).

The similarity between ODA and SGML is that both are standards for the interchange of electronic documents, or the electronic interchange of documents, and both are based on the same idea of distinguishing between structure and content. The main difference is that SGML hardly uses any semantics, being rather a language to specify markup via DTDs, whereas ODA has its semantics clearly defined. SGML developed in the American publishing world, whereas ODA developed in the European world of computer manufacturers (12).

From the viewpoint of libraries, with their close relation to publishers, SGML is probably the standard to be reckoned with in the near future, especially when phase III in electronic document delivery is about to begin. However, standards have a long way to go, and even in the diversified publishing world SGML is only slowly making its way. Publishers are still a long way from fully automating their production processes. I think it is wise for librarians to follow SGML closely in the coming years. The demand for fast document delivery has, however, already reached publishers, and so they have started their own phase II electronic document delivery system: ADONIS.

Publishers initiatives in electronic document delivery

Of course, all this talk about electronic document servers is a rather frightening thing for publishers; at least, I would be concerned if I were a publisher. Just imagine a world where a small number of well equipped libraries would operate document servers and be willing to offer their services to other libraries as well, in a fast and efficient way. The sheer possibility of close cooperation between libraries could make serials management an issue quite different from what we have today. Taken further, the possibilities could even become detrimental to the whole scientific process or information chain, but I will get to that later, when discussing the strategic issues.

Of course, publishers have been worrying for longer about the issue of the multiplication of their articles in libraries. In their excellent overview of the ADONIS project, Stern and Campbell (13) state that the original idea, dating from 1979, is

"that publishers can gain copyright revenue by supplying their journals in machine readable form for document delivery centres to print out individual journal articles on demand at lower cost than photocopying from back runs stored on shelves".

Other factors mentioned by Stern and Campbell have to do with the problem of the so-called fair use clause in much legislation, which allows for unlimited photocopying, and, interestingly enough since Stern and Campbell are from the publishing world, the pressure on the serials budgets facing libraries. It should be noted, however, that the ADONIS project has from the beginning sought close cooperation with the library world. The British Library Document Supply Centre, for instance, has participated all the way. Important participating publishers are Elsevier, Springer and Blackwell. I will give a short description of what ADONIS is; for those who are interested, I have brought a demonstration disc.

  • ADONIS restricts itself to about 400 journals of about 30 publishers in the biomedical sphere. These restrictions have to do with market studies and with storage issues; the pharmaceutical industry offers the best commercial perspective,
  • the articles are scanned, indexed and stored on CD-ROM, where it should be noted that the index is a rough bibliographical one, with no subject indexing; subject access will become available through specialised abstracting services, which can be linked to ADONIS via the ADONIS unique article identification number,
  • participating libraries receive one CD-ROM a week, containing about 10,000 bitmapped pages, which can be retrieved and previewed on low cost workstations. A jukebox is available,
  • the workstations generate a statistics file, which gives important market feedback but also forms the basis for billing. Libraries pay a yearly flat fee of about 22,000 Dutch guilders plus a copyright fee per article. The amount of the copyright fee depends on whether or not the library also has a hard copy of the journal: if not, 12 Dutch guilders have to be paid for each article printed; if the library has a subscription to a hard copy of the journal, a fee of 6 Dutch guilders per article is due (14). A sketch of this billing rule follows this list.
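
Expressed as a small calculation, the billing rule reads as follows. This is a sketch using only the figures just quoted; the function and the example counts are mine, not part of ADONIS.

    # ADONIS billing rule as described above; amounts in Dutch guilders.
    FLAT_FEE = 22_000           # yearly flat fee
    FEE_WITHOUT_HARDCOPY = 12   # per article printed, no subscription held
    FEE_WITH_HARDCOPY = 6       # per article printed, subscription held

    def yearly_bill(printed_with_sub: int, printed_without_sub: int) -> int:
        # The two counts come from the statistics file the workstation keeps.
        return (FLAT_FEE
                + printed_with_sub * FEE_WITH_HARDCOPY
                + printed_without_sub * FEE_WITHOUT_HARDCOPY)

    print(yearly_bill(1_500, 500))   # 22,000 + 9,000 + 6,000 = 37,000 guilders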

From a study of the ADONIS trial period, Phil Barden of the British Library Document Supply Centre concluded that the ADONIS service was competitive with conventional supply via photocopy. Barden mentions, however, that the substantial throughput at BLDSC provides economies of scale which are unlikely to be available at other document supply centres (15).

ADONIS seems to me a commercially viable service, and obviously the publishers have thought the same, since ADONIS has been a commercial enterprise since the beginning of 1991. ADONIS is an example of a typical phase II electronic document delivery system. The cost, however, seems to me prohibitive at the moment, since we can safely assume that a manual document delivery service of one full time equivalent can handle 10,000 articles a year at an average cost of about 5 Dutch guilders per article, if we ignore the copyright issue, which we can in this case because of the fair use clause. This is not enough to take care of the substantial flat fee, the cost of equipment (which can be roughly estimated at about 5,000 guilders a year, with an initial investment of 30,000 guilders, not including the jukebox) and, of course, the fact that even an ADONIS system needs to be operated. This would indeed require a substantial throughput to reach the economies of scale that Barden mentions. It is not surprising that the pharmaceutical industry was chosen as the primary market target. What we saw in the history of abstracting and indexing services is happening again in document delivery: the first parts of society to benefit from new developments will be the richer parts. Libraries, however, have a bigger market to serve, which was one of the basic reasons for developing current awareness services of our own, and the very existence of current awareness services calls for fast document delivery systems, as I stated in my introduction.
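
The back-of-the-envelope comparison behind this judgement can be laid out as follows; this is a sketch using only the figures quoted above, with freely chosen illustrative volumes, all amounts in Dutch guilders.

    # Manual service versus ADONIS, with the figures from the text:
    # manual copying at about 5 guilders per article (fair use, so no
    # copyright fee), ADONIS at a 22,000 flat fee plus roughly 5,000 a
    # year of equipment plus 6 guilders per article (hardcopy held).
    MANUAL_PER_ARTICLE = 5
    ADONIS_FIXED = 22_000 + 5_000
    ADONIS_PER_ARTICLE = 6

    for volume in (5_000, 10_000, 25_000):
        manual = volume * MANUAL_PER_ARTICLE
        adonis = ADONIS_FIXED + volume * ADONIS_PER_ARTICLE
        print(f"{volume:6d} articles/year: manual {manual:7d}, ADONIS {adonis:7d}")

    # With these figures the per-article fee alone already exceeds the
    # manual cost, so the fixed costs are never recovered; only a centre
    # with a very large throughput, such as BLDSC, reaches the economies
    # of scale Barden mentions.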

The main problem with a project like ADONIS, and there are comparable projects in France originating in the library world, is that they choose a supply driven approach: select a couple of hundred journals in fields where a lot of purchasing power can be expected, scan and store them all, and try to sell as many articles as possible. This causes a tremendous overhead and leads to neglecting those parts of libraries' serials collections which are in less frequent use or which serve a market with relatively little purchasing power. It also creates a dual system for document delivery, an electronic one alongside a manual one. Not only are journals missed, but there is also a chance that some journals are delivered for which there is no demand in a specific library. Stated another way, one might say that an ADONIS-like system is not very well integrated with other library systems and procedures. Publishers have, however, a strong position in the proliferation of ADONIS-like systems, since the electronic storage of articles is a violation of existing copyright law. But it seems to me that arrangements can be made with publishers to cope with the copyright problem. More general considerations will be saved for the conclusion.

The Tilburg document server

The document server project under development in Tilburg tries to solve some of the problems mentioned with ADONIS-like systems.

First of all, we believe that a document server should be integrated with the reference databases we operate to disclose our own serials collection. The reference databases in Tilburg are of course the Online Contents Service, but also the Excerpta Informatica Online Database. From both services users should be able to apply for copies of articles.

Secondly, an article is only scanned when there is a demand for it and only those articles are stored. When a request comes in for an article requested before, the document server should be able to handle the request nearly automatically.

Thirdly, the document server must be able to communicate with other document servers to handle Inter Library Loan requests from other libraries and those originating in our own library.

We expect very little overhead with an approach like this since all we do is upgrade existing manual document delivery systems while at the same time maintaining and even increasing integration with our existing library systems.

Crucial to this development, apart from the already mentioned GEDI standards, is a standard for unique article identification, especially since requests can come from different, possibly overlapping, reference databases; something like the ADONIS identification number. We have not solved this issue yet, but think that BIBLID (ISO 9115, Documentation - Bibliographic identification (BIBLID) of contributions in serials and books) could be the solution.
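
Putting the three requirements and the identification issue together, the request flow of the document server might be sketched as follows. The BIBLID-like key and the function names are my own assumptions for illustration; ISO 9115 defines the real identifier syntax.

    # Tilburg document server sketch: one unique article key lets
    # requests from overlapping reference databases share one stored
    # scan; articles we do not hold go out as GEDI ILL requests.
    from pathlib import Path
    from typing import Callable, Iterable, List

    STORE = Path("document-server")

    def biblid(issn: str, year: str, volume: str, first_page: str) -> str:
        # Assumed shape of a BIBLID-like key (the real syntax is in ISO 9115).
        return f"{issn}({year}){volume}p{first_page}"

    def handle_request(key: str, held_locally: bool,
                       scan: Callable[[], Iterable[bytes]],
                       send_gedi_request: Callable[[str], List[Path]]) -> List[Path]:
        folder = STORE / key
        if folder.exists():                      # requested before: reuse the scan
            return sorted(folder.glob("*.tif"))
        if held_locally:                         # first demand: scan now, store
            folder.mkdir(parents=True)
            for number, image in enumerate(scan(), start=1):
                (folder / f"{number:03d}.tif").write_bytes(image)
            return sorted(folder.glob("*.tif"))
        return send_gedi_request(key)            # not held: Inter Library Loan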

The way the document server works can be explained by the following drawing.

Figure 2: General model of the document server

Conclusion: strategic issues for libraries

In my outline I mentioned four strategic issues facing libraries in the field of electronic document delivery.

- user orientation: productivity of faculty and students

Mary Berger from Engineering Information states that:

"... engineers need answers, not pointers to answers. They want these answers in the form of full text documents, quickly and reliably" (16)

Surely this will apply to other scientists as well. Undoubtedly our users will be very pleased with the possibility of ordering copies of articles of interest online. Even better would be if they could view the images on their own workstations and decide to print them on local laser printers. If the scanning process can work fast enough, the goal of information at your fingertips comes closer and closer. This would mean an improvement in the productivity of our faculty and students. An important issue, though, will be whether we are able to deliver such a service at a reasonable cost.

- serials management and the predictability of acquisitions costs

Electronic document delivery offers great opportunities for serials management, since by its very nature it offers the ability to monitor the use of our collections. This is another reason for integrating systems like these with our existing library systems and procedures. The predictability of acquisitions costs can, however, be impaired in two ways. First of all, there is the widespread and justified fear that costs will rise in absolute terms, because hardcopy subscriptions will exist side by side with the cost of maintaining and expanding the electronic collection in the form of an image database. I think this category of costs can be predicted and is a necessary investment if we want to improve access to primary information, full text documents. Of more concern is the way in which copyright issues will be cleared. The most probable solution will be one where a fee per article is paid to copyright clearance centres. These costs depend on the number of articles demanded, and the easiest solution would be to pass them on to users. I think it is important here to make a distinction between our commercial activities and our primary activity, which is serving our faculty and students, but I am not sure publishers will appreciate this distinction.

- opportunities for cooperation between libraries

Electronic document delivery can greatly improve the efficiency of Inter Library Loans. This requires, however, that libraries agree on standards, not only in the technical field, which is well covered by the GEDI proposals, but also on procedures (17). The opportunities are worthwhile, and cooperation offers benefits for all parties.

- relation with publishers

Publishers are very well aware of the threats that electronic document delivery poses to their business. Karen Hunter, vice president of Elsevier, states that "the days of doing business as usual are over" and has listed ten issues for every publisher, of which the last one is "Decide to defend your rights" (18). The problem for publishers is the reverse of the one facing libraries with the predictability of acquisitions costs: the predictability of turnover will be impaired once the focus shifts away from subscriptions to journals towards income from copyright on individual articles. I think solutions for this copyright problem have to be found, but they will not be easy. Brownrigg and Lynch proposed an analogy with performance rights (music) since:

"The distinctions between display, performance and replication may ultimately prove to be of little use with electronic media and the new kinds of work that inhabit these media." (19)

With electronic document delivery systems of phase II and beyond, libraries will take over part of the publishing function. Indeed, they will be publishing on demand, as there is no real distinction between document delivery and publishing on demand except for which party performs the document supply function. However, if we bear in mind the advance we seek for our faculty and students, I think we can find common ground for libraries and publishers. It is my belief that in the end the convergent forces between libraries and publishers will be the stronger, because these forces have to do with long term interests. These long term interests are:

  • first of all, university libraries and scientific publishers work for the same group, faculty and students. Both parties gain from a flourishing research climate and, conversely, both parties would suffer if a competitive struggle were to lead to an unproductive climate for doing research and for publishing or sharing the results of this research,
  • secondly, I believe it is of great importance to have the, what I will call, authorisation function carried out by the market. Universities need a third party to decide whether or not the results of their research will be published. This guarantees an independence which is best governed by market forces. Perhaps not the best alternative, but I do not know of any better,
  • the third point is that, regardless of which party will be performing the document supply function, libraries will always have a function in the disclosure of information. First of all there is an economic reason: only non profit institutions are able to assist their users in the retrieval of information. Secondly, there is a more philosophical reason: it is hard to imagine a world in which publishers would control a significant part of the information chain, except in a very monopolistic market. This implies at least a gateway function for libraries. On the other hand, it is my belief that, for economic reasons, cooperation between libraries and publishers would be best in both the short and the long term. Libraries could make agreements with publishers to change their subscriptions in a way which would allow for a price proportional to use; publishers would then be rid of the problem of dealing with many individual users. This would better fit a world in which the focus is shifting from journal to article, but it would introduce uncertainties for both libraries and publishers. Only by cooperation between libraries and publishers can something be done about these uncertainties.

References

  1. A.E. Cawkell, Electronic document delivery systems, Journal of Documentation, vol. 47, no. 1, March 1991, pp. 41 - 73.
  2. C.A. Lynch, The development of electronic publishing and digital library collections on the NREN, Electronic Networking, vol. 1, no. 2, Winter 1991, pp. 6 - 22.
  3. N.D. Natraj, Architectures and standards: considerations in document image systems, Document Image Automation, November - December 1991, pp. 333 - 336.
  4. Electronic Document Delivery. Towards further standardization of international interchange. Agreements of the Group on Electronic Document Interchange (GEDI), October 1991.
  5. R.E. Wiggins, Document Image Processing - new light on an old problem, International Journal of Information Management, 1990, pp. 297 - 318.
  6. D. Barron, Why use SGML ?, Electronic Publishing, vol. 2, no. 1, April 1989, pp. 3 - 24.
  7. J.G. Kircz, J. Bleeker, The use of relational databases for electronic and conventional publishing, Journal of Information Science, 1987, pp. 75 - 89.
  8. J. Grootenhuis, Standard Generalized Markup Language: tekst en uitleg, Informatie Management, 1990.
  9. J. Bleeker, Standard Generalized Markup Language. Een gestandaardiseerd coderingssysteem voor betere informatieoverdracht, Open, vol. 21, no. 5, 1989, pp. 180 - 185.
  10. H. Brown, Standards for structured documents, The Computer Journal, vol. 32, no. 6, 1989, pp. 505 - 514.
  11. L. Ottes, A.J.G. van Rijen, Office Document Architecture, De standaard voor elektronische documenten, Informatie, vol. 33, no. 7/8, pp. 465 - 552.
  12. W. Appelt, Normen im Bereich der Datenverarbeitung, Informatik-Spektrum, no. 12, 1989, pp. 321 - 330.
  13. B.T. Stern, R.M. Campbell, ADONIS, Publishing journal articles on CD-ROM, Advances in serials management, vol. 3, 1989, pp. 1 - 60.
  14. J.A.W. Brak, Elektronische documentleverantie via het ADONIS project, Open, vol. 20, no. 10, 1988, pp. 346 - 348.
    B.T. Stern, H.J. Compier, ADONIS - Document delivery in the CD-ROM age, Interlending and Document Supply, vol. 18, no. 3, 1990, pp. 79 - 87.
    ADONIS News, vol. 2, no. 2, November 1991.
  15. P. Barden, ADONIS, The British Library experience, Interlending and Document Supply, vol. 18, no. 3, 1990, pp. 88 - 91.
  16. M.C. Berger, Document delivery by database producers - closing the loop, Information Services & Use, vol. 8, 1988, pp. 195 - 200.
  17. J.J. Branin, Delivering on promises: the intersection of print and electronic information systems in libraries, Information Technology and Libraries, vol. 10, no. 4, December 1991, pp. 321 - 331.
  18. K. Hunter, Document Delivery: Issues for Publishers, paper presented at the Annual Meeting of the AAP Professional and Scholarly Publishing Division, Washington, 14 February 1992.
  19. E.B. Brownrigg, C.A. Lynch, Electrons, Electronic Publishing, and Electronic Display, Information Technology and Libraries, vol. 4, no. 3, September 1985, pp. 201 - 207.