Inputting CD-ROM records -- a comparison of bibliographic software
Thomas J. Walker and P.M. Choate. 1994. BioScience 44:269-271

By permission: Copyright 1994 by the American Institute of Biological Sciences


Compiling a computerized bibliography for biological research was once an onerous task that could be somewhat eased by expensive on-line searching. Now the parts are in place to make such a compilation relatively easy and inexpensive: research libraries purchase or lease computer-searchable versions of the major literature indexes; the researcher uses keywords, category codes, and Boolean operators to find pertinent records, often with abstracts, and downloads the records; and bibliographic software imports the records into the researcher's growing bibliographic database.

How well and how easily downloaded records are imported should be critical in a biologist's choice of a bibliographic program. Such programs are generally good at selecting, sorting, formatting, and exporting records once they are imported or keyed in correctly. No other feature is likely to be as important for saving the researcher's time as is ease of importation. Once records are in any bibliographic database program and parsed into appropriate fields, they can be transferred to other, more feature-rich database programs (bibliographic or otherwise) that lack multiple, literature-index-specific import templates.

While teaching a course in information techniques in research to incoming graduate students in the biological sciences, we discovered major mismatches between the records that students downloaded and the bibliographic software that biologists use. To better advise students what to expect, and in hope of stimulating software producers to improve their products, we undertook to quantify importation problems and to rank the bibliographic programs that biologists are most likely to use.

Working with the databases

Seven literature-indexing databases were selected for their importance and availability to biologists. All were on CD-ROM except Current Contents, which had been transferred from weekly floppy disks to the hard drive of one of our library's work stations. In the list below, CD-ROM disks that were searched are identified by date. Searching software, supplied with the database, is in parentheses.

We downloaded 100 records from each of the indexes, evenly distributed among the listed disks. For each disk, we first searched for records with mosquito and polymorphism. If that yielded fewer records than needed, we searched for insect and polymorphism and then for insect# (words beginning with insect) and #polymorphism. Duplicate records among searches were eliminated by combining search results. When more records were found than needed, we stopped downloading when we had filled our quota.
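The download procedure above (run the narrowest search first, broaden only if the quota is not met, discard duplicates, stop at the quota) can be sketched in a few lines. This is a minimal illustration, not the searching software we used: the function `fill_quota`, the record IDs, and the three stand-in searches are all hypothetical.

```python
def fill_quota(searches, quota):
    """Run searches in order, merging results without duplicates, until quota is met.

    `searches` is a list of callables, each returning an iterable of record IDs.
    They are hypothetical stand-ins for search sessions on one CD-ROM disk.
    """
    selected = []
    seen = set()
    for run_search in searches:
        if len(selected) >= quota:
            break  # an earlier search already supplied enough records
        for record_id in run_search():
            if len(selected) >= quota:
                break  # stop downloading once the quota is filled
            if record_id in seen:
                continue  # duplicate of a record found by an earlier search
            seen.add(record_id)
            selected.append(record_id)
    return selected


# Hypothetical results for three progressively broader searches on one disk.
def narrow():
    return ["r1", "r2"]

def broader():
    return ["r2", "r3", "r4"]

def broadest():
    return ["r1", "r4", "r5", "r6"]

print(fill_quota([narrow, broader, broadest], quota=5))
# ['r1', 'r2', 'r3', 'r4', 'r5']
```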

Evaluating the bibliographic database programs

As one exercise in our course, students find out how researchers cope with scientific literature. In interviews of more than 90 researchers in the past five years, students reported six bibliographic programs. Excluding Sci-Mate, which is no longer sold, we evaluated the five remaining programs and the supplementary programs required by three to import downloaded records:

The versions we tested were for the PC except ProCite, for which an updated program was available only for the Macintosh.

Following the directions that came with each program, we attempted to import the 700 downloaded records. To make sure that any difficulties we encountered were not merely the result of our inability to understand the directions, we sent a draft of our manuscript and the records we had downloaded to each software company and gave them two weeks to phone or fax better instructions. All five companies sent more recent versions of their software (the versions listed above). In four cases, the newer software performed better, and we report here only the improved results. To quantify how well the programs imported the records, we set up the following four categories:

To produce a score that summarized how well a program handled the 100 records from a database, we gave one point for each record in category 1 and half a point for each record in category 2. We gave no points for records in the other two categories because the program had failed to import all the information needed. We also evaluated two other features of the import function: how records that duplicated existing records were handled and what information was provided to the user concerning rejected records and incomplete records.
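As a worked illustration of the scoring rule, the sketch below tallies a score from a list of per-record category numbers (1 through 4). The function name `import_score` and the sample counts are hypothetical and are not results from this study.

```python
from collections import Counter

def import_score(categories):
    """Score one program on one database.

    `categories` holds one category number (1-4) per downloaded record.
    Category 1 earns a full point, category 2 half a point, and
    categories 3 and 4 earn nothing.
    """
    counts = Counter(categories)
    return counts[1] * 1.0 + counts[2] * 0.5


# Hypothetical outcome for 100 records: 80 in category 1, 10 in category 2,
# 6 in category 3, and 4 in category 4.
outcomes = [1] * 80 + [2] * 10 + [3] * 6 + [4] * 4
print(import_score(outcomes))  # 85.0
```

Because each database contributed 100 records, a score computed this way can be read as a percentage.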

Only Reference Manager accepted records from all databases, and in only 7 out of 35 attempts did a program correctly import all records from a database it accepted (Table 1). Reference Manager achieved the highest overall mean score (91), followed by Papyrus (79), Endnote (75), ProCite (64), and Ref-11 (0).

The programs differed greatly in how they handled duplicate records. Papyrus asked at the beginning of the import session whether the user preferred to be prompted at each duplicate or whether the program should automatically import or reject duplicates or automatically substitute the imported record for the duplicated record in the database. Reference Manager afforded the options of importing duplicates or placing suspected duplicates into a file for later perusal. ProCite paused at duplicates and asked whether or not to import. Endnote did not identify duplicates.
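The range of duplicate-handling behaviors described above can be expressed as a single import routine with a selectable policy. The sketch below is an assumption-laden illustration, not the logic of any program we tested: the `DuplicatePolicy` names, the `record_key` matching rule (author, year, and title, ignoring case and surrounding spaces), and the field names are all hypothetical.

```python
from enum import Enum

class DuplicatePolicy(Enum):
    PROMPT = "prompt"    # ask the user at each duplicate
    IMPORT = "import"    # import the duplicate anyway
    REJECT = "reject"    # set the duplicate aside for later perusal
    REPLACE = "replace"  # substitute the incoming record for the stored one

def record_key(record):
    """Hypothetical matching rule: same author, year, and title."""
    return (record["author"].strip().lower(),
            record["year"],
            record["title"].strip().lower())

def import_records(database, incoming, policy, ask=None):
    """Import `incoming` into `database` (a list, modified in place).

    Returns the duplicates that were set aside. `ask` is a caller-supplied
    callback used only with PROMPT; it receives the record and returns
    IMPORT, REJECT, or REPLACE.
    """
    index = {record_key(r): i for i, r in enumerate(database)}
    set_aside = []
    for record in incoming:
        key = record_key(record)
        if key not in index:
            index[key] = len(database)
            database.append(record)
            continue
        action = ask(record) if policy is DuplicatePolicy.PROMPT else policy
        if action is DuplicatePolicy.IMPORT:
            database.append(record)
        elif action is DuplicatePolicy.REPLACE:
            database[index[key]] = record
        else:
            set_aside.append(record)
    return set_aside
```

A program that does not identify duplicates at all behaves like the `IMPORT` policy applied silently.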

If a program fails to import records, the researcher must find the rejected records and manually key them in or modify them to make them acceptable. The programs had important differences in how much help they gave the user in identifying rejected records. Reference Manager and Papyrus provided the user with a file of rejected records. The other programs gave no help, forcing the user to manually identify rejects by comparing the records imported with the records tendered.

Similarly, the user benefits if the program identifies records that lack information in one or more of the fields needed to produce a bibliographic entry. Papyrus, but none of the others, flagged incomplete records and gave a count of such records at the end of each import session.
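The two kinds of feedback discussed above, a file of rejected records and a count of incomplete ones, are simple to produce at import time. The following sketch shows one way an importer might do so. The `tag: value` record layout, the `REQUIRED_FIELDS` list, and both function names are assumptions made for illustration and do not describe the format of any index or program in this study.

```python
# Assumed minimum needed to produce a bibliographic entry (hypothetical).
REQUIRED_FIELDS = ("author", "year", "title", "source")

def parse_tagged(raw):
    """Parse 'tag: value' lines into a dict; raise ValueError if none parse."""
    fields = {}
    for line in raw.splitlines():
        tag, sep, value = line.partition(":")
        if sep and tag.strip():
            fields[tag.strip().lower()] = value.strip()
    if not fields:
        raise ValueError("no recognizable fields")
    return fields

def import_with_report(raw_records):
    """Return (imported, rejected, incomplete_count).

    Rejected records are kept verbatim so they can be written to a file;
    incomplete records are imported but flagged with their missing fields.
    """
    imported, rejected, incomplete_count = [], [], 0
    for raw in raw_records:
        try:
            record = parse_tagged(raw)
        except ValueError:
            rejected.append(raw)
            continue
        record["missing"] = [f for f in REQUIRED_FIELDS if not record.get(f)]
        if record["missing"]:
            incomplete_count += 1
        imported.append(record)
    return imported, rejected, incomplete_count


raws = [
    "author: Smith\nyear: 1990\ntitle: A study\nsource: J. Example 1:1-2",
    "author: Jones\ntitle: No year or source",
    "text with no tags at all",
]
imported, rejected, incomplete = import_with_report(raws)
print(len(imported), len(rejected), incomplete)  # 2 1 1
```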

Combining the results from all aspects of importing records that we evaluated, we ranked the programs. We found Reference Manager to be better than Papyrus, which was much better than Endnote, which was better than ProCite, which was much better than Ref-11.

Improving performance

Even though importing records is of great importance to the user, the most competent program we tested imported and parsed correctly only 90% of the test records. All five programs allowed the user to create import templates that should eliminate the zero scores in Table 1 and perhaps improve some of the others. In fact, the maker of Ref-11 reported that keeping up with changes in record formats was not worthwhile and that user-created templates should be the preferred solution. However, most researchers beginning their computer bibliographies would probably prefer to avoid spending time learning to program within the program they have just purchased to make computerization easy.

We did not find creating import templates within bibliographic programs to be easy; therefore, we resorted to a different solution. We used search-and-replace macros in a word-processing program to make downloaded records conform to an existing import template. Papyrus, in addition to instructing users how to make templates, includes a cleanup utility that makes some categories of refractory records compatible with standard templates. When this utility was used on Biological Abstracts records, Papyrus' score on Biological Abstracts went from 0 to 92, and its overall mean score changed from 79 to 92.
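The search-and-replace approach amounts to rewriting the field tags of a downloaded record so that they match what an existing import template expects. We did this with word-processor macros; the sketch below shows the same idea with a regular expression. The two-letter tags, the separator characters, and the target tag names in `TAG_MAP` are hypothetical and do not reproduce the actual format of any index or template.

```python
import re

# Hypothetical mapping: tag as downloaded -> tag the import template expects.
TAG_MAP = {"AU": "Author", "TI": "Title", "SO": "Source", "PY": "Year"}

# A two-letter tag at the start of a line, followed by ':' or '-'.
# [ \t]* is used instead of \s* so that a match never spans two lines.
TAG_LINE = re.compile(r"^(?P<tag>[A-Z]{2})[ \t]*[:\-][ \t]*(?P<value>.*)$",
                      re.MULTILINE)

def conform(text):
    """Rewrite known tags to the template's tags; leave other lines alone."""
    def rewrite(match):
        new_tag = TAG_MAP.get(match.group("tag"))
        if new_tag is None:
            return match.group(0)  # unknown tag: keep the line unchanged
        return f"{new_tag}: {match.group('value')}"
    return TAG_LINE.sub(rewrite, text)


sample = "AU- Walker, T.J.\nTI- Inputting CD-ROM records\nXX- untouched"
print(conform(sample))
# Author: Walker, T.J.
# Title: Inputting CD-ROM records
# XX- untouched
```

A mapping of this kind must be revised whenever an index producer changes its record format, which is the same maintenance burden that makes built-in templates go stale.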

An important reason that all programs had imperfect import templates is that the formats of downloadable bibliographic records are unpredictable. Producers of literature indexes and CD-ROM versions thereof have not standardized their formats but change their formats from database to database and from time to time. Records of different forms of literature (e.g., journal article, book chapter, and symposium paper) are especially troublesome because there is no standard tag that indicates the literature category. Overall, journal articles were the most stable in format and were handled best by the programs we tested.

Limitations

We would be remiss if we did not point out three important limitations to compiling a research bibliographic database solely from downloaded records. First, machine-readable indexes generally go back only as far as the early 1970s, and the CD-ROM versions often omit the earlier years; in many fields of biology, much of the pertinent literature predates 1980 or 1970. Second, no single database, nor all major databases combined, completely indexes current primary biological literature. Third, items that are in databases may be difficult or nearly impossible to locate because they are inadequately indexed or because keyboard errors were made when the record was entered.

The experience of Deitz and Osegueda (1989) is sobering. They searched the eight most relevant computerized databases for references published during 1981-1987 on an easily searched subject: treehoppers. Their computer searches retrieved a total of 194 relevant references, with the greatest number from any one index being 112. They found an additional 172 relevant references by other means, chiefly by "snowballing" (checking all references cited in each publication added to the bibliography) and by requesting reprints from treehopper workers. More succinctly, they found no more than 31% of the 366 relevant references in any database and only 53% in the eight databases combined.
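The percentages follow directly from the counts reported above; the short check below reproduces them. Only the variable and function names are ours.

```python
def percent(part, whole):
    """Percentage of `whole` represented by `part`, rounded to a whole number."""
    return round(100 * part / whole)


found_by_search = 194    # relevant references retrieved from all eight databases
found_otherwise = 172    # additional references found by other means
best_single_index = 112  # greatest number retrieved from any one index

total = found_by_search + found_otherwise
print(total)                              # 366
print(percent(best_single_index, total))  # 31
print(percent(found_by_search, total))    # 53
```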

Another caution is that literature databases are copyrighted, and downloaded records may be used only as permitted by the copyright holders. Any use other than in the downloader's personal bibliography may incur a fee or be denied.

Those interested in reviews of other bibliographic programs and of other features of the programs reviewed here should consult Neal (1993), who extensively reviewed the 5 programs we used plus 7 others, and Stigleman (1993), who gave information on 53 programs. Ours is the first critical evaluation of import capabilities.

Acknowledgments

This is Florida Agricultural Experiment Station Journal Series R-03405.

References cited

Deitz, L.L., and L.M. Osegueda. 1989. Effectiveness of bibliographic databases for retrieving entomological literature: a lesson based on the Membracoidea (Homoptera). Bull. Entomol. Soc. Am. 35:33-39.

Neal, P.R. 1993. Personal bibliographic software programs -- a comparative review. BioScience 43:44-51.

Stigleman, S. 1993. Bibliography formatting software: an update. Database 16:24-37.

Thomas J. Walker is a professor of entomology and P.M. Choate is a senior biological scientist in the Department of Entomology and Nematology, University of Florida, Gainesville, FL 32611-0620. Walker's research interests include ecology, acoustic behavior and systematics of crickets and katydids, and migratory behavior of butterflies. He also teaches courses on information techniques in research. Choate's research centers on ecology and systematics of tiger beetles. He teaches courses in computer use. © 1994 American Institute of Biological Sciences.
