Current Awareness Services at Tilburg University
Hans Roes [1]
Tilburg University Library
PO Box 90153
5000 LE Tilburg
The Netherlands
This is an HTML version of an article which originally appeared in The Electronic Library 11(2), April 1993. The text may differ slightly here and there, but the content remains the same.
Introduction
Library automation in the past twenty years has concentrated on the use of computers in traditional library services. First the administrative processes were automated, next the card catalogue was transformed into an Online Public Access Catalogue (OPAC). These traditional processes were very much book oriented, while in the same period the importance of journals in the scientific process has grown dramatically. To cope with the growing demand for disclosure of articles, so-called abstracting and indexing services (AISs) emerged, mainly outside the library world. Some characteristics of AISs, however, make them less well used by library users than the OPAC. This was one of the reasons for Tilburg University to start the so-called Online Contents project in 1989. The aim of this project was to realise, from the user's point of view, a disclosure of the journals collection in terms of articles similar to that of books: a service comparable to ISI's well-known Current Contents, but mapped to the journals collection of Tilburg University.
This article describes the development of current awareness services at Tilburg University; a comparison with similar projects is beyond its scope. First, a short description is given of the context in which the Online Contents service was developed, a more encompassing program towards a modern library concept. Next, the relative merits of AISs are discussed from a user's point of view, providing considerations for developing a current awareness service for the library's journals collection. Then special attention is paid to the retrieval software used for the services, Verity's Topic. Finally, plans for the future are discussed.
The Tilburg HT DIC program
The Online Contents project is part of a more ambitious program to improve library services to users. This is the so-called High Tech Documentation Information and Communication program [2], worked out in cooperation between the library and the computer centre of Tilburg University. The two central aims of this program are knowledge navigation and desktop integration.
Knowledge navigation means guiding users at any level of experience through a world of rapidly growing possibilities for heterogeneous online information services. Besides the OPAC, Tilburg University library offers its users access to fifteen other services, varying from a Campus Wide Information System to AISs in the field of computer science (the Excerpta Informatica Online Databases), to the Online Contents service described below. Recently access to the Internet has been established, and soon the OPACs of other university libraries in the Netherlands will be added with the realisation of the Open Library Network. To give users a hand with this mass of information the KUBguide was developed, a central user interface to all these services. This interface is being developed further to incorporate knowledge about the services that can be reached via it, eventually including knowledge about the peculiarities of the retrieval engines of these services. [3]
Desktop integration, the second keyword, led to a user desktop from which information can be retrieved, processed and edited to create new products of scientific information. This goal was reached with networked 386SX DOS PCs with an MS Windows graphical user interface. The KUBguide as well as other university (VAX) minicomputers can be reached from the integrated desktop, which at the same time offers a still growing range of facilities for file transfer, database management, statistical packages, spreadsheets, word processing and graphics software as well as more general tools. With extensive funding from the Dutch department for education, 250 of these integrated desktops for students were installed in the new library building, which opened its doors in March 1992. At the same time faculty equipment was updated, involving another 1200 PCs.
AISs compared to the Online Contents service from a user's point of view
As stated in the introduction, abstracting and indexing services are very important tools in the disclosure of literature published in periodicals. From a user's point of view, though, there are some characteristics which make them less well used. This is undesirable because of the importance articles nowadays have relative to books in the scientific process. These characteristics are:
- Most AISs are expensive and usually do not fit well into library budgets. As a result, the sometimes highly unpredictable costs of searches are very often passed on to users. Since the OPAC is usually free, this raises a first barrier for users.
  Of course, developing and maintaining a similar service for a library's journals collection is also costly. However, these costs are predictable and do not depend on the use made of the service. In September 1992, for instance, there was a grand total of 779 hours of searching in the Online Contents service. Part of this time was probably idle, but compared to the $102 per hour which it costs to search the Current Contents file on DIALOG (roughly $79,000 for that month's searching at that rate), this gives room for developing and maintaining a service.
- AISs do not match the library collection the user is familiar with at his own institution. This causes two kinds of problems. One is finding references to articles in journals not present in the institution's library, which means going through the sometimes time-consuming effort of obtaining these articles. For exhaustive literature searches, consulting one or more AISs will of course remain important. The second problem is that journals present in the institution's collection may not be covered in the particular AIS consulted, leading to a possible underutilisation of the collection. This is not so much a problem for users, but it certainly is a problem for a library trying to optimise its journals collection with scarce resources. The opportunity for obtaining management information on the use of journals was an important consideration in developing the Online Contents service.
- There is a proliferation of AISs in commercially attractive fields like biomedicine and computer science, but 'less important' subject fields, for instance philosophy, are badly covered. In the first case the user faces an ample choice which often calls for the help of information specialists; in the second case there is hardly a choice.
- Many AISs suffer from severe time lags, which typically range from three to six months. This time lag has to do with the labour-intensive production process, which usually implies manual data entry of self-written abstracts and assigned subject headings for later retrieval purposes.
Despite these considerations AISs perform an important task in the disclosure of articles in periodicals and will continue to do so for exhaustive literature searches. From the user's point of view, however, AISs are much less friendly than the OPAC. A service which discloses a library's periodicals collection in terms of its articles, in a way similar to the one in which the OPAC discloses books, adds important functionality for users and should lead to better collection management tuned to the needs of users.
The Online Contents service
As stated above, the goal of the Online Contents project was to realise a service for the disclosure of articles in the journals collection of Tilburg University. The total number of journals is about 3000, of which 1600 were selected for coverage in the service. Of course, the service could not cost too much, so an approach like the one used by AISs was impossible. The goal can, however, be reached if cheap data entry is combined with strong retrieval tools. The main costs of AISs derive from the laborious process of writing abstracts and assigning subject headings. Therefore the choice was made to concentrate on the contents pages of journals only. These contain at least article title and author information. In the simplest case the service can be used for so-called current awareness searches: a user is able to view from his desktop the contents of the latest issue of his favourite journals. But since most article titles in scientific journals are quite expressive about the content, the idea was that a strong text retrieval package would enable users to conduct subject searches as well, even without assigned keywords. As the database grew, it was thought that within two or three years the service would gain enough substance to support meaningful, though not exhaustive, subject searches.
The cheap data entry problem was solved by using scanning and optical character recognition (OCR), which at the time was maturing into a technologically and economically feasible solution. This method also allows for speed in the data entry process. As new issues arrive at the library they are administered before going to the scanning department. During the day the journals' contents pages are scanned, while overnight the OCR process is carried out automatically. The next day correction, editing and tagging take place, and the contents are entered in the database by the time the issues reach the library's shelves. Depending on the timeliness with which journal issues arrive at the library, this means a considerable gain in speed compared to AISs.
In this context it is interesting to note that a rough comparison of the scanning/OCR method with manual data entry of contents pages showed that both methods are almost equally time consuming! The first method is faster when, for instance, relatively exotic names and terms have to be entered; the second is faster if the contents are in the native language of the data entry personnel. It should be noted, however, that no quality comparison was carried out [4]. There are, though, typical OCR errors, for instance an 'm' recognized as 'ni' or 'in'.
A partial view of a contents page in the database is given below:
  Help  Search  Topics  Filters  Documents  Exit
__ Document Viewer_____________________________________________________________
  The electronic library : the international journal for minicomputer,
  microcomputer, and software applications in libraries

  T-nummer : 6911 . Jaar : 1992 . Volume : 10 . Nummer : 2 .
  UDC : 69(05) / 029(05) / 02(05) . Invoer : 17-jun-92 .

  Inhoudsopgave
  ---------------------------------------------------------------
  The Electronic Library Manager's Guide to the Silk Road            83
  Small can be beautiful: Automation efforts at the Nigerian
  Institute of International Affairs Library
  Adeniran,O.R.                                                      87
_______________________________________________________________________________
  Document 1 of 5    Line 1 of 30    Online Contents Database
  Topic Copyright(c) 1988, 1990 Verity, Inc.
Figure 1.
Contents of The Electronic Library, vol. 10, no. 2
At the time these procedures were developed it was realised that scanning and OCR are inefficient from a broader point of view. In fact, printed material is turned into machine-readable form where, in a considerable number of cases, it probably existed in machine-readable form before it got into print. Since most publishers have automated their production processes, why not try to obtain the machine-readable contents directly from the publishers? This idea has been put to the test in a second current awareness service, which was developed in cooperation with Elsevier Science Publishers (ESP). Of course this possibility had been foreseen, and one of the criteria for choosing retrieval software was that it should be able to cope with texts in several native formats, ranging from ASCII to popular word processing formats and SGML and ODA. Elsevier had developed a database containing article heads in SGML format and was willing to deliver these heads on a monthly basis for experimental purposes. This implies that the record format of the Online Contents database differs from that of the ESP database. Where in the first the record is the contents page, in the second the record is the article head. The article head in fact offers a record comparable to the one offered by most AISs, containing, besides the journal information, article title and author, an abstract and keywords assigned by the editors. The difference with an AIS is, of course, that article heads are prepared before journals go into print. Coverage of the ESP database is restricted to those journals to which Tilburg University subscribes, maintaining the mapping between the service and the journals collection.
The retrieval software used [5]
As stated above, cheap data entry has to be combined with strong retrieval software to build a service which, besides current awareness searches, offers the possibility of more general subject searches in an otherwise bibliographically poor environment. If the assumption is made that titles of articles in scientific journals are expressive about the content of the articles (an assumption which of course does not hold in all cases), then title words alone give the retrieval software something to work with.
The current awareness services at Tilburg University use Topic, a content based retrieval package of Verity Inc. Topic was developed mainly for full text retrieval but has advantages which make it useful for the applications developed in Tilburg for current awareness.
First of all, Topic offers a good set of operators. Besides the traditional Boolean ones it has the proximity or adjacency operators PARAGRAPH, SENTENCE and PHRASE, and operators which can work with weights assigned to the search terms: AND, OR and ACCRUE. The traditional Boolean operator AND is called ALL, and the traditional Boolean OR is called ANY or WORDGROUP, the difference between the latter two being efficiency in searching. ACCRUE requires some explanation. It can be thought of as something in between AND and OR. Where an AND query scores only when all the search terms are present, and an OR query scores if at least one of the search terms is present, an ACCRUE query scores a document or record higher the more search terms are present. For instance a query of the form
ACCRUE ("apple" "bull" "commodore" "digital")
with weights of 0.5 attached to each of the four search terms would score 0.5 if only one of the search terms is present, 0.75 if two of the search terms are present, 0.87 if three of the search terms are present and 0.93 if all four of the search terms are present. Records are ranked by their score in Topic's 'Results Browser'. Verity has labeled this mechanism 'relevance ranking'.
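The scores quoted above are consistent with a simple rule in which every additional matching term closes a fraction of the remaining gap to 1.0. The following Python sketch illustrates that rule. It is an assumption inferred from the four numbers in the example, not Verity's documented formula, and the function names are hypothetical.

```python
def accrue_score(matched_weights):
    """Combine the weights of the matched terms.

    Assumed rule: score = 1 - product(1 - w) over the matched terms.
    This reproduces the figures in the text but is not taken from
    Verity's documentation.
    """
    remaining = 1.0
    for weight in matched_weights:
        remaining *= 1.0 - weight
    return 1.0 - remaining


def score_record(text, weighted_terms):
    """Score a record against a dict mapping search term -> weight."""
    words = set(text.lower().split())
    matched = [w for term, w in weighted_terms.items() if term in words]
    return accrue_score(matched)


query = {"apple": 0.5, "bull": 0.5, "commodore": 0.5, "digital": 0.5}

for n in range(1, 5):
    print(n, "term(s) present:", accrue_score([0.5] * n))
# 1 term(s) present: 0.5
# 2 term(s) present: 0.75
# 3 term(s) present: 0.875
# 4 term(s) present: 0.9375
```

Under this rule three and four matching terms give 0.875 and 0.9375, which correspond to the 0.87 and 0.93 quoted above when cut off after two decimals.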
Secondly, queries, or topics as they are confusingly called by Verity, can be edited with a graphical-like user interface in which the query or topic is represented as a tree: root and branches are queries, ending in leaves which are the literal search terms. At the nodes of the tree the operators are specified, and some operators allow weights to be specified for search terms. An example of a simple topic is given in figure 2. In this way simple queries can be combined to build more complex queries without having to worry too much about Boolean syntax (forgotten parentheses, inside-out evaluation of a Boolean query).
  Help  Search  Topics  Filters  Documents  Exit
__Topic Editor_________________________________________________________________
  PARENTS             CURRENT              CHILDREN
                                        /- 0.80 information-retrieval-uf--
--information-scie----information-retr--+- 0.80 information-retrieval-rt--
                      Accrue            |- 0.50 bibliographic-retrieval --
                                        |- 0.50 fact-retrieval --
                                        |- 0.50 legal-information-retrie--
                                        |- 0.50 retrieval-system --
                                        |- 0.50 sdi --
                                        \- 0.50 text-retrieval --
________________________________________________________________________________
  information-retrieval    KUB/ESP Current Awareness Service
  Topic Copyright(c) 1988, 1990 Verity, Inc.
Figure 2.
Example of a topic-query
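To make the tree representation concrete, the sketch below models a topic as a small recursive data structure and evaluates it against a piece of text. Everything in it is illustrative: the `Topic` class, the `evaluate` function, the plain substring matching and the way weights are combined are assumptions, not Topic's actual internals. The children in figure 2 appear to be subtopics in their own right; here a few of them are simplified to literal phrases.

```python
from dataclasses import dataclass, field
from typing import List, Tuple, Union


@dataclass
class Topic:
    """A named query node: an operator applied to weighted children."""
    name: str
    operator: str  # "accrue", "any" or "all" (hypothetical spellings)
    children: List[Tuple[float, Union["Topic", str]]] = field(default_factory=list)


def evaluate(node, text):
    """Return a score between 0 and 1 for `text` under `node`."""
    if isinstance(node, str):
        # Leaf: a literal search phrase, matched as a substring.
        return 1.0 if node in text.lower() else 0.0
    scores = [weight * evaluate(child, text) for weight, child in node.children]
    if not scores:
        return 0.0
    if node.operator == "all":
        return min(scores)
    if node.operator == "any":
        return max(scores)
    if node.operator == "accrue":
        # Assumed combination rule: each hit closes part of the gap to 1.0.
        remaining = 1.0
        for score in scores:
            remaining *= 1.0 - score
        return 1.0 - remaining
    raise ValueError(f"unknown operator: {node.operator}")


information_retrieval = Topic("information-retrieval", "accrue", [
    (0.8, "information retrieval"),
    (0.5, "bibliographic retrieval"),
    (0.5, "fact retrieval"),
    (0.5, "text retrieval"),
])
information_science = Topic("information-science", "any", [
    (1.0, information_retrieval),
])

print(evaluate(information_retrieval, "Text retrieval software compared"))
```

Because a topic can be a child of another topic, a stored and named query such as `information_retrieval` can be reused inside larger queries without rewriting any Boolean expression.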
From figure 2 a third advantage is clear. Topics can be named and, as of Topic version 3.0, annotated. Combine this with the fourth advantage, that topics can be saved in the database, and in effect one could state that topics are in a way a system for knowledge representation and automatic keywording. A good query can be stored in the database and given the name of a classical subject heading. Less experienced users can search for these 'subject headings' in the 'Topic Browser' and start a search with them. The topic will not only score documents/records which were present in the database at the time the topic was built, but will also score documents newly added to the database. The topic in effect incorporates the knowledge of the information specialist who built it and can be seen as a keyword searching for the documents it discloses. The world has turned upside down. Instead of assigning subject headings to each record added to a database, as is done in traditional AISs, information specialists now build queries which act as guided-missile-subject-headings.
Some final aspects to be mentioned here are that topics are indexed for fast retrieval performance; users may edit existing topics and save these edits without affecting the 'system topics'; and that a Topic database is organised in partitions which are searched incrementally. Once a partition is done, results are put on the screen, while searching goes on in other partitions. The partitioned architecture not only gives fast -intermediate- results but also opens up the possibility for distributed processing by spreading partitions over several CPUs.
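The incremental behaviour of a partitioned database can be pictured with a Python generator that hands back the hits of each partition as soon as that partition has been searched. This is only a sketch of the idea: the partition names, the sample records and the substring matching are made up for illustration and say nothing about how Topic stores or indexes its partitions.

```python
from typing import Dict, Iterator, List, Tuple


def search_partitions(
    partitions: Dict[str, List[str]], term: str
) -> Iterator[Tuple[str, List[str]]]:
    """Yield (partition name, hits) as soon as each partition is done."""
    needle = term.lower()
    for name, records in partitions.items():
        hits = [record for record in records if needle in record.lower()]
        yield name, hits


# Hypothetical partitions, here simply split by year of entry.
partitions = {
    "1991": [
        "Software for information storage and retrieval tested, evaluated and compared",
    ],
    "1992": [
        "The Electronic Library Manager's Guide to the Silk Road",
        "Small can be beautiful: Automation efforts at the Nigerian Institute",
    ],
}

# Intermediate results can be shown while later partitions are still pending.
for name, hits in search_partitions(partitions, "retrieval"):
    print(f"partition {name}: {len(hits)} hit(s)")
```

Since each partition is searched independently of the others, the loop body is also the natural unit of work to hand to separate processors.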
Topics in practice
While Topic offers great possibilities for full text retrieval in theory, it is hard to make full use of these possibilities in practice. Just imagine the work needed to translate a traditional decimal classification system into a system topic set. There is also something of a culture shock among information specialists when they are first confronted with topics as a retrieval tool. A third problem at the moment, not specific to the software, is that the record structure of the Online Contents database is not very well suited for making full use of topics. Since the record is the contents page of a journal, careful use of operators is called for to prevent false hits where one search term is found in one article title and another search term is found in another article title. This problem can be solved only by using the proximity operators. But using proximity operators implies that weights cannot be used in the queries, which means loss of relevance ranking. For this reason, but also because a project is underway to create a shared database for article data in the Netherlands, a new Online Contents service is being developed at the moment in which the record is no longer the contents page but rather a set of data on one article. In figure 3 an example of such a new record is given. This record structure also allows future extensions of the service with abstracts. Further aspects of the Dutch initiative for a shared Online Contents are discussed below.
  Help  Search  Topics  Filters  Documents  Exit
__Document Viewer______________________________________________________________
  TITEL: Software for information storage and retrieval tested, evaluated
         and compared, Part II - Classical retrieval systems
  DOOR:  Sieverts,E.G.; Hofstede,M.; Haak,,Ph.H.; Nieuwenhuysen,P.;
         Scheepsma,G.A.M.; Veeger,L.; Vis,G.C.
  IN:    The electronic library : the international journal for minicomputer,
         microcomputer, and software applications in libraries
  Jaar : 1991   Volume : 9   Maand :   Nummer : 6   Pagina : 301
________________________________________________________________________________
  Document 7 of 62    Line 1 of 19    OLC2 test database
  Topic Copyright(c) 1988, 1990 Verity, Inc.
Figure 3.
Example of a record in the new Online Contents database
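The false-hit problem with contents-page records, and the way article-level records avoid it, can be shown with a few lines of Python. The sketch uses the two titles visible in figure 1. The function names and the plain substring matching are assumptions made for illustration and are not how Topic evaluates its operators.

```python
def page_level_match(titles, terms):
    """Old record structure: the whole contents page is one record."""
    page_text = " ".join(titles).lower()
    return all(term in page_text for term in terms)


def article_level_matches(titles, terms):
    """New record structure: every article is its own record."""
    return [
        title for title in titles
        if all(term in title.lower() for term in terms)
    ]


contents_page = [
    "The Electronic Library Manager's Guide to the Silk Road",
    "Small can be beautiful: Automation efforts at the Nigerian "
    "Institute of International Affairs Library",
]

terms = ["guide", "automation"]  # each term occurs in a different title

print(page_level_match(contents_page, terms))       # True: a false hit
print(article_level_matches(contents_page, terms))  # []: no article has both
```

With one article per record a plain conjunction is already confined to a single title, so proximity operators are no longer needed for this purpose and weighted operators remain available for ranking.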
For the ESP database mentioned above a rather different approach to the generation of system topics has been chosen. An important difference between the ESP database and the Online Contents database is that the former is rather homogeneous. The journals covered are mainly in the field of computer science and they are all in the English language. The Online Contents database, by contrast, covers all the subject fields of a library in a social-sciences-oriented ('gamma') university, ranging from economics and law to linguistics and philosophy, with a lot of social sciences in between. On top of that, journals in different languages are covered in this database.
Following an idea of Verity's Cliff Reid, the Excerpta Informatica thesaurus, a thesaurus in the English language and in the field of computer science, was programmatically converted into a system topic set for the ESP database. This generated a topic set which works rather well.
For the Online Contents database a simple topic set for quickly retrieving the issues of a particular journal was also generated programmatically. When the new-style Online Contents service is operational these topics will prove to be very helpful, since with them it is possible to regenerate the contents page of an issue of a particular journal, thus maintaining the current awareness possibilities.
Another approach which should gradually lead to a well equipped topic library is described below.
Future developments
At the moment three goals are important in the further development of current awareness services. The first concerns the implementation of an SDI (selective dissemination of information) module, the second national cooperation in the Netherlands, and the third the development of a document server. This section briefly describes these developments.
For SDI purposes Topic's Batch Profiler is scheduled for launch in autumn. With the Batch Profiler users will be alerted by electronic mail if new records of interest to them are added to the database. The Batch Profiler uses topics to determine whether or not records are of interest to a particular user. In this way a nearly automated SDI service will be implemented. The trick is to have information specialists of the library build these topics for faculty and staff, and in this way to harvest, little by little, a set of system topics in a demand-driven fashion.
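The profiling step can be sketched as follows: each user has a stored profile of weighted terms, every newly added record is scored against each profile, and records that score at or above a threshold are collected for that user's alert. This is a hypothetical illustration and not the Batch Profiler's interface. The profile format, the threshold, the scoring rule and the address are all assumptions, and no mail is actually sent.

```python
def accrue(weights):
    """Assumed ACCRUE-style combination: 1 - product(1 - w)."""
    remaining = 1.0
    for weight in weights:
        remaining *= 1.0 - weight
    return 1.0 - remaining


def profile_alerts(profiles, new_records, threshold=0.5):
    """Return {user: [(score, record), ...]} for records worth an alert."""
    alerts = {}
    for user, topic in profiles.items():
        for record in new_records:
            text = record.lower()
            # Plain substring matching, for illustration only.
            matched = [w for term, w in topic.items() if term in text]
            score = accrue(matched)
            if score >= threshold:
                alerts.setdefault(user, []).append((round(score, 2), record))
    return alerts


# Hypothetical profile built by an information specialist for one reader.
profiles = {
    "reader@example.org": {"text retrieval": 0.8, "scanning": 0.5},
}
new_records = [
    "Text retrieval software for microcomputers and beyond",
    "Automation efforts at the Nigerian Institute of International Affairs Library",
]

for user, hits in profile_alerts(profiles, new_records).items():
    print(user, hits)
```

Run once per batch of new records, such a loop leaves only the construction and upkeep of the profiles as manual work.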
The project in Tilburg has drawn national attention, and with a grant from the Dutch government a team from Tilburg University, the Dutch Royal Library and PICA, the Dutch organisation for library automation, has developed a format for a shared Online Contents service along with procedures for the exchange of Online Contents records. This work was finished in June 1992, with a similar service becoming operational at the Royal Library [4]. Via PICA, records will be exchanged between Tilburg and the Royal Library, and the model can easily be extended to incorporate other libraries. Production of records will not only take place at the two libraries mentioned, since from 1 January 1993 Swets subscription service will produce records for 7000 journals which are in frequent demand in Inter Library Loan (ILL) traffic. The current awareness services will form the basis upon which a fast document delivery system between Dutch university libraries will be built. Cooperation with other publishers is also sought, to obtain records from their sources as well. Wolters Kluwer Academic Publishers has stated its intention to cooperate in a way similar to the cooperation between Tilburg University and Elsevier.
The third development is in line with the above-mentioned fast document delivery system for ILL, but takes the idea one step further. Why not give the same service to our faculty and staff and let them order online copies of articles to which they found references in the current awareness services? If a request for an article comes in, the article will be scanned and stored for later reuse. The challenge is to get the images over the network and display them on end users' workstations. Of course a service of this kind needs to be worked out in close cooperation with publishers because of the copyright issues involved.
Author information
Hans Roes (1957) studied economics and has been librarian for economics and computer science at Tilburg University Library since 1990. He is manager of the current awareness services project and the document server project and was a member of the Dutch project team for a national Online Contents service.
References
[1] I would like to thank the two anonymous referees for their comments on an earlier draft of this article.
[2] This program is described in more detail in two publications:
    - The New Library and the Development of Innovative Information Services at Tilburg University. - Tilburg : Tilburg University Press, 1989.
    - Documentation, Information and Communication at Tilburg University : Plan of Action - Research - Services / ed. by L. Wieers. - Tilburg : Tilburg University Library, 1990.
[3] KUBguide can be reached via the Internet: log on to kublib.kub.nl with username KUBGIDS. Public access is restricted though.
[4] Producing a bibliographic database through scanning and OCR: the Online Contents Project in the Royal Library of the Netherlands / Mathilde Ongering and Michel Wesseling. Program, vol. 26, no. 4, October 1992, pp. 393 - 399.
[5] This section does not attempt to give a full description of the software. Some reviews are:
    - Text Retrieval Software for Microcomputers and Beyond : An Overview and a Review of Four Packages / G.W. Lundeen and C. Tenopir. Database (Weston, CT) vol. 15, no. 4, 1992, pp. 51 - 63.
    - Smart Document Retrieval / E.L. Appleton. Datamation (New York, NY) vol. 38, no. 2, 1992, pp. 20 - 23.