Library Philosophy and Practice Vol. 7, No. 1 (Fall 2004)
ISSN 1522-0222

Studying the Reader/Researcher Without the Artifact: Digital Problems in the Future History of Books

Dorothy Warner
Associate Professor, Librarian

John Buschman
Professor, Librarian

Rider University Libraries
Rider University
Lawrenceville, New Jersey 08648
(This is an edited and updated version of a paper given at the Society for the History of Authorship, Reading, and Publishing {SHARP} 2002 Conference, University of London, July 11th)

Introduction

It is salient to begin this article with some examples of fertile and groundbreaking study emanating from the history of the book, reading, and publishing:
What do these examples have in common? They represent important and interesting work that could be accomplished because the documents and the publications exist, and they exist primarily because they were printed and reprinted, simply kept somewhere, preserved and archived. The study of reading, books, book production, editing, and the research process posits a very simple assumption: that which has been read, edited, absorbed, used and studied will still exist as an artifact. As Ronald Schuchard wrote, “what interests the scholar ... in the archive [is] the preservation and accessibility of the materials of the creative imagination, the physical materials, including all the detritus, debris, and ephemera of art, biography, history. And the archival preservation of these materials is crucial for the minor as for the major figures of a literary generation” [6] - the very authors, as Michael Winship [7] points out, that most people read the most, after all.

However, the trend toward digitization, promoted by those who want information available instantly and in a “more accessible” format, poses a very fundamental challenge to the essential assumption that those items will exist in future. The dramatic move to exclusive web-distribution of federal and state government information and data in the United States is a good case study of this problem. Essentially, this project has been undertaken without planning or budgeting for archived, permanent and secure (that is, unaltered) access. A front page story in the New York Times detailed the digitization project in the US Patent Office of 18th and 19th century patents - and the discarding of the original documents. One person did some dumpster diving outside the Office and came up with four original application copies of some of Thomas Edison’s patents. [8] Much of the newly-digitized data is the raw material for scholars in such far-flung subjects as law, the environment, education, demography, and of course economics and business. Data and documents are in danger not only from governmental sources, but in private databases as well. Significant numbers of novels, scientific journals, and publishing records - economic and editorial - to give only a few examples, are now extant largely or exclusively in digital form.

Problems

Our profession’s policies note specifically the “threat to information posed by technical obsolescence, the long-term retention of information resident in commercial databases, and the security of library and commercial databases.” [9] However, in the haste to make information available electronically there are few agreed-upon plans for the preservation of digital information and much has already been lost. For example:
Problems Beyond Government Information

Nor are these problems limited to government information. The preservation of electronic journals is also a concern for libraries. Wiggins notes the irony of the demise of the Committee on Institutional Cooperation’s CICNet Journal Archive due to lack of funding. For six years, from 1991 to 1997, the group attempted to archive electronic journals. The archive has vanished. “Ironic, indeed, to lose not a mere collection but an archive whose purpose was to prevent loss of electronic content. How many pioneering e-journals, many of them hosted on now defunct Gopher servers, were lost for eternity?” [20] In a related issue, an attempt to obtain an article beginning on page 415 of a scientific journal revealed that the online version, available via Science Direct, only shows articles in that volume up to page 389. The response to a query to Science Direct was that at least 2% of its electronic journal content is missing. [21]

Winship observed over a decade ago a need to identify, locate, and interpret the primary sources for publishing history.
Given the subsequent media monopolies which control global publishing that Schiffrin [23] and Miller [24] have identified, the preservation of current electronic publishing files, e-mails, and electronic editing, and in some cases digital publishing seems very much in doubt for future scholars of our current literature. It is clear that we are rushing ahead before we are ready. A Senior Vice President at Elsevier who is an original member of the Task Force on Archiving of Digital Information convened by the Research Libraries Group and the Commission on Preservation and Access in 1994 states that “there is no magic bullet in electronic archiving. Those of us who are spending large chunks of our professional time on the topic know that it will require a lot of trust and good-faith effort to continue to move things forward. It is too important and too expensive to be left to chance.” [25] Another expert is troubled by the suggestion that a magic bullet solution (“a simple, universally applicable, one-time fix”) has even been proposed. [26] Moreover, there is no overall plan for archiving federal government documents that exist only in digital format. Instead each agency determines its own preservation policy. A representative from the Bureau of Labor Statistics (BLS) recently promised a conference audience that all digital information at the BLS would be preserved forever, but will Congress adequately fund BLS to be able to follow through on this guarantee? The Government Printing Office (GPO) has had significant budget cuts at the same time that Congress has given GPO the mandate to cut printing costs by making information available digitally. This, of course, does offer wider access to the information today, but what about tomorrow? 
The rush to make information available quickly and widely, often for “future planning” purposes, has overshadowed the need to ensure that the very same information will continue to be available for planners, literary scholars, and historians of the future. The cart is again before the horse in several areas which we will now discuss in brief: standards, costs, digital preservation strategies, reading mechanisms, and the context of digitally preserved information.

Standards

There is a vigorous debate over technological and software standards since “no computer technical standards have yet shown any likelihood of lasting forever.” [27] This is an important area since standards “can assist by facilitating the transfer of information between hardware and software platforms as technologies evolve” and “resources which are encoded using open standards have a greater chance of remaining accessible after an extended period than resources encoded with proprietary standards.” Descriptive metadata has no agreed-upon standard. Metadata is defined as: “data about data or information known about the image in order to provide access to the image. This usually includes information about the intellectual content of the image, digital representation data, and security or rights management information.” [28] Typical metadata standards are US MARC and the emerging scheme, Dublin Core. Research is under way to develop a uniform standard, which must exist for any of the electronic preservation models to succeed. [29]

Costs

Cost considerations are substantial.
The Yale University Libraries’ Project Open Book studied the costs of converting to digital images the printed text and accompanying materials in 10,000 brittle books.
Digital Preservation Strategies

In international discussions regarding archiving issues there is a presumption that for online journals, migration will be the digital preservation methodology of choice. Migration is defined as the “periodic transfer of digital materials from one hardware/software configuration to another, or from one generation of computer technology to a subsequent generation.” [32] For example, the information on a floppy disk may be transferred to a CD-ROM format, offering only temporary preservation since the CD-ROM format must then be migrated when the technology changes again. However, a great number of questions still need to be answered and “until those questions are resolved, libraries will be understandably reluctant to make a permanent switch from paper to electronic collections. What should be archived and in what format? How many copies of the archive are needed? Who holds those copies? What is the access to the archive and who controls that access? How does licensing affect archive building? What can the scholarly community afford?” [33]

The digital information must be refreshed without changing it, yet in a new operating environment the copy is not exactly the same as the original, which requires decisions about which aspects need to be preserved. Metadata can assist here in providing information about migrations and the effect on the digital object. In some cases, software that is “backwards compatible” can simplify the migration process (the most recent version of the software having the capability of decoding the files created in the earlier version). However, there is no guarantee of compatibility over time as technological developments become increasingly complex and/or it is no longer financially worthwhile for a software manufacturer to support such compatibilities. Some question the practicality of migration while some point out that each new format will require a unique solution.
The most extreme (and ironic) version of this is preservation on paper or preservation-quality microfilm. It is worth noting the obvious again: an archival-quality paper or microfilm record can last up to 500 years. [34] However, the disadvantage of preserving a digital record on print or microfilm is that the record may not be able to adequately represent the original object, since the digital functionality of the resource can be destroyed: computation capabilities, graphic display or indexing, and equations embedded in a spreadsheet are lost, and it is impossible to print out an interactive full-motion video or to preserve a multimedia document as a “flat file”. Data loss and the loss of functionality or of the “look and feel” of the original platform also remain concerns with the migration method.

Reading Mechanisms

Clifford Stoll has described one of the other primary problems previously alluded to: “electronic media aren't archival [and] the physical medium isn't the problem. It's the reading mechanism.” He goes on to give many examples of the now-extinct formats and the machines that read them: 78-rpm records, 8-track tapes, 100-column punch cards, and 5-inch glass lantern slides. Further, there is an equally impressive list of soon-to-disappear formats and readers like Betamax tapes, and single-side, single density diskettes. As Stoll notes, the information contained in these formats may be perfectly good and workable, “but they become increasingly expensive to read, as equipment becomes expensive to maintain or simply cannot be repaired.” [35] Libraries and archives all over are slipping and sliding toward exactly this problem: the replication of the information into a more current format is very expensive and this promises to further strain library budgets - exactly what the National Archives faced in converting UNIVAC-stored Census information.
Because of concern over potential technological obsolescence, a substantial amount of printing of electronic government documents (both state and federal), some as lengthy as 500 pages, is taking place both in libraries and among end-users. Under such a regime, further costs are transferred to libraries and archives.

Context of Digitally Preserved Information

Kenneth Thibodeau of the National Archives expresses concern on behalf of future researchers about current digital preservation methods. The Archives’ responsibility is to “preserve and deliver authentic records to subsequent generations of users.” A connection needs to exist between historical records and the activities in which they are made and received. If this link is broken, corrupted, or even obscured, the information in the record may be preserved, but the record itself is lost. This fundamental difference between records and documents can be readily illustrated empirically. For example, a map of Sarajevo is a document, but a map of Sarajevo known to have been used in making a targeting decision that led to the bombing of the Chinese Embassy is an essential record of that action. The key difference between the document and the record is the specification of the context of action in which the record was involved. To preserve authentic records entails preserving the documents themselves and also their connections to the activities in which they were used. [36]

Conclusion

To conclude, our profession expresses bedrock principles that have become fundamental to our concept of reading and research:
As Wiegand notes, our admittedly biased and flawed classification schemes devised over centuries still “constitute one of the few bridges available to all who use them to help link the separate islands of discourse.... What we do constitutes [an inherent] challenge to that power when we facilitate access by organizing information.... Capitalism doesn’t necessarily appreciate this; democracy does.” [38] It is not enough to collect and save this output; we must make it available to people, to researchers, and to the future. That legacy is in some danger. A chilling report from a division of the American Library Association in 1977 stated that
Perhaps most famously, Nicholson Baker has blown the whistle on wholesale dumping of collections in the building of the new San Francisco Public Library, the disregard for the valuable and irreplaceable information (like usage, provenance if the item was a gift, and notations) contained in the discarded Harvard University Library (and other research library) catalog cards, and of course the dumping of the last copies of original 19th and early 20th century American newspapers. Baker has charged - credibly - that U.S. libraries have “abandoned their duty” to preservation. [40] Our profession’s uncritical, unthinking enthusiasm for technologies has led us to overlook significant problems with electronic resources with regard to preservation. The problem was stated by O’Mahony, whose specific concern was about electronic government information, but that concern certainly relates to other forms of digital information:
There are fundamental issues at stake for libraries and digitized archives. A true archive “shouldn't depend on duplication for preservation.” [42] While expressing gratitude to libraries for digital and microfilming preservation efforts, the Modern Language Association states that “the advantages of the new forms . . . cannot fully substitute for the actual physical objects in which those earlier texts were embodied at particular times in the past . . . . All objects purporting to present the same text . . . all carry different information, even if the words and punctuation are identical....”[43] Eugene Provenzo writes that “anyone who has used a word-processing system . . . knows how easy it is to transform information in a digital context. One word can be automatically substituted for another, a name changed, a date altered, an idea corrupted without any record of what the original source said. [This] represents a major problem in terms of the integrity of historical documents, and the extent to which we can trust the information from such sources in the future.” [44] One of the great ironies of the information age is that, while the late twentieth century will undoubtedly record more data than have been recorded at any other time in history, it will also almost certainly lose more information than has been lost in any previous era. A study done in 1996 by the Archives concluded that at current staff levels it would take approximately a hundred and twenty years to transfer the backlog of nontextual material (photographs, videos, film, audiotape, and microfilm) onto a more stable format.... There also appears to be a direct relationship between the newness of a technology and its fragility.... A librarian at Yale University has created a graph going back to ancient Mesopotamia which shows that, while the quantity of information being saved has increased exponentially, the durability of media has decreased almost as rapidly. 
[45] Consider once more the example of researching the American gay experience noted at the beginning of this paper. Personal communication and footnotes pointed toward both private and library archival collections, but if they existed originally in electronic form, where would they be today? Would an individual, organization, library or archive have taken the time to archive them, given the costs of constantly upgrading the archive to the newest digital format? And, even if this had been attempted, how would a researcher discover them? As researchers today persist in leafing through often disorganized boxes of print collections in an archive searching for clues, where would a researcher locate something perhaps considered to be ephemera at the time of its inception, yet an invaluable clue for a later historian? A colleague notes that mid-20th century hymn collections are less likely to be found in library collections than 17th century volumes. “In a century known for the ‘information explosion,’ when new technologies revolutionized printing, perhaps ephemera can only be valued in hindsight.” [46] Likely, no indexing to ephemera would exist, and most likely this particular documentation of gay or sacred music history would be invisible to the researcher if it did exist in electronic form. It may even have been deleted from electronic existence many years before. If the researcher is willing to take the time to locate information stored in digital form and access it in the particular electronic state that it is in, at what cost of time is the researcher missing the “opportunities for study and careful concentration” of the information discovered? One scholar suggests that “time devoted to finding comes at the expense of time for reading.” [47]

We are nearing a time when we will bequeath a scholarly record that will be akin to the study of art history only through the descriptions of the critical literature, but without the original artifact.
Neil Postman has argued that we have “embarked on a great uncontrolled experiment which involves submitting all of our institutions to the sovereignty of these new media [and they are] winning the competition with typography for the time, attention, and cognitive predispositions” of people. [48] This process of redefinition - driven in large part by electronic resources - is not without serious problems for research, archives, libraries and our concept of research and reading. In the immediate sense, we are gravely concerned that the excitement of mere technical possibility and convenience is undermining the existence of important documentation in the future.

Endnotes

1. Darnton, R. (1984) The Great Cat Massacre and Other Episodes in French Cultural History (New York: Basic Books).
2. Whitman, W. (1982) Complete Poetry and Collected Prose (New York: Library of America), pp. 1352-54.
3. Wiegand, W. (4-5 December, 1998) “Print Culture History in Modern America: Needs” paper delivered at the Book Studies Curriculum Development Seminars, University of Iowa.
4. Katz, J. (1985) Gay American History (New York: Harper Colophon, reprint of 1976 ed.).
5. Duberman, M. “Reclaiming the Gay Past,” Reviews in American History, V16 #4, Dec. 1988, pp. 515-525; and Duberman, M. (1994) Stonewall (New York: Plume).
6. Schuchard, R. (Winter 2002) “Excavating the Imagination: Archival Research and the Digital Revolution,” Libraries & Culture V. 37 #1, p. 59.
7. Winship, M. (1987) “Publishing in America: Needs and Opportunities for Research” in Hall, D. and Hench, J. (eds) Needs and Opportunities in the History of the Book: America, 1639-1876 (Worcester, Mass: American Antiquarian Society) 61-102.
8. Mitchell, A. (Dec. 30, 2001) “Ingenuity’s Blueprints, Into History’s Dustbin,” New York Times pp. A1, A22.
9. ALA Handbook of Organization 1996-1997 (1996) (Chicago: American Library Association) p. 42.
10. Lyons, S., ed. (2001). Staying Digital: Recommendations on Preserving New Jersey Government Information in the Digital Age. Report of the State Documents Interest Group of the Documents Association of New Jersey. Available: www.danj.org/DANJ, p. 1.
11. Lyons, S., ed. (2001). Staying Digital: Recommendations on Preserving New Jersey Government Information in the Digital Age. Report of the State Documents Interest Group of the Documents Association of New Jersey. Available: www.danj.org/DANJ.
12. Lyons, S., ed. (2001). Staying Digital: Recommendations on Preserving New Jersey Government Information in the Digital Age. Report of the State Documents Interest Group of the Documents Association of New Jersey. Available: www.danj.org/DANJ.
13. Lyons, S., ed. (2001). Staying Digital: Recommendations on Preserving New Jersey Government Information in the Digital Age. Report of the State Documents Interest Group of the Documents Association of New Jersey. Available: www.danj.org/DANJ.
14. Wiggins, R. (2001, Spring). Digital preservation paradox & promise [Online]. Library Journal 126 (7), p. 12, 4 pp. Available: Academic Search Premier. EBSCOhost. Rider University Libraries. 18 October 2001 <search.epnet.com>.
15. Strategic Plan to Improve the Preservation, Collection, and Use of New Jersey Historical Records (October 2001) (Trenton: New Jersey State Historical Records Advisory Board/New Jersey Dept. of State).
16. Radcliffe, J. (December 8, 2001) “Orders to Purge Records Have Librarians Worried,” Fort Worth Star-Telegram [online edition].
17. Stille, A. (March 8, 1999) “Overload” New Yorker, p. 42.
18. Stille, A. (March 8, 1999) “Overload” New Yorker, p. 42.
19. Stille, A. (March 8, 1999) “Overload” New Yorker, p. 42.
20. Wiggins, R. (2001, Spring). Digital preservation paradox & promise [Online]. Library Journal 126 (7), p. 12, 4 pp. Available: Academic Search Premier. EBSCOhost. Rider University Libraries. 18 October 2001 <search.epnet.com>.
21. E-mail communication on colldv-l@usc.edu, October 5, 2001.
22. Winship, M. (1987) “Publishing in America: Needs and Opportunities for Research” in Hall, D. and Hench, J. (eds) Needs and Opportunities in the History of the Book: America, 1639-1876 (Worcester, Mass: American Antiquarian Society) 61-102, pp. 93-94.
23. Schiffrin, A. (2000) The Business of Books (New York: Verso).
24. Miller, M. C. (January 7/14, 2002) “What’s Wrong With This Picture?” Nation, pp. 18-21; and Miller, M. C. (Summer 2001) “Reading in the Age of Global Media” Progressive Librarian.
25. Hunter, K. (2000). Digital archiving [Online]. Serials Review 26 (3), 3 pp. Available: Academic Search Premier. EBSCOhost. Rider University Libraries. 13 June 2001 <search.epnet.com>.
26. Bearman, D. (1999). Reality and chimeras in the preservation of electronic records [Online]. D-Lib Magazine 5 (4). Available: www.dlib.org/dlib/april99/bearman/04bearman.html [2001].
27. Bearman, D. (1999). Reality and chimeras in the preservation of electronic records [Online]. D-Lib Magazine 5 (4). Available: www.dlib.org/dlib/april99/bearman/04bearman.html [2001].
28. PADI (Preserving Access to Digital Information) (2001). Standards [Online]. Available: www.nla.gov.au/padi/topics/43.html [2001].
29. OCLC/RLG Working Group on Preservation Metadata (date n.a.). Preservation Metadata for Digital Objects: A Review of the State of the Art. [Online]. Available: www.oclc.org/digitalpreservation/presmeta_wp.pdf [2001]; and Bearman.
30. Feeney, M. (1999). Towards a national strategy for archiving digital materials. Alexandria 11 (2), 107-122.
31. Butler, M. (1997). Issues and challenges of archiving and storing digital information: Preserving the past for future scholars. Journal of Library Administration 24 (4), 61-79.
32. PADI (Preserving Access to Digital Information) (2001). Migration [Online]. Available: http://www.nla.gov.au/padi/topics/21.html [2001].
33. Hunter, K. (2000). Digital archiving [Online]. Serials Review 26 (3), 3 pp. Available: Academic Search Premier. EBSCOhost. Rider University Libraries. 13 June 2001 <search.epnet.com>.
34. Lyons, S., ed. (2001). Staying Digital: Recommendations on Preserving New Jersey Government Information in the Digital Age. Report of the State Documents Interest Group of the Documents Association of New Jersey. Available: www.danj.org/DANJ.
35. Stoll, C. (1995) Silicon Snake Oil: Second Thoughts on the Information Highway (New York: Doubleday), pp. 180-181. See also Stearns, D. P. (10 March 2004). Along for the ride, or random road kill? Philadelphia Inquirer, F1, 9; and Tenner, E. (1 April 2002). Taking bytes from oblivion: can we turn fragile digital information into an enduring record? U.S. News & World Report. 66-67.
36. Thibodeau, K. (2001, February). Building the archives of the future: Advances in preserving electronic records at the National Archives and Records Administration [Online]. D-Lib Magazine 7 (2). Available: www.dlib.org/dlib/february01/thibodeau/02thibodeau.html [2001].
37. Library Bill of Rights and Freedom to Read reprinted in Gates, J. (1976) Introduction to Librarianship 2nd ed. (New York: McGraw-Hill).
38. Wiegand, W. (1999) “The Structure of Librarianship: Essay on an Information Profession”, Canadian Journal of Information and Library Science (24:1) 17-37.
39. Harris, M. and Hannah, S. (1993) Into the Future: The Foundations of Library and Information Services in the Post-Industrial Era (Norwood, NJ: Ablex).
40. Baker, N. (2001) Double Fold: Libraries and the Assault on Paper (New York: Random House); “The Collector” (April 15, 2001, interview) New York Times Book Review, p. 9.
41. O’Mahony, D. P. (1998). Here today, gone tomorrow: What can be done to assure permanent public access to electronic information? Advances in Librarianship 22, 107-21.
42. Stoll, C. (1995) Silicon Snake Oil: Second Thoughts on the Information Highway (New York: Doubleday), p. 181.
43. Modern Language Association (1995) “Statement on the Significance of Primary Records”, Profession 95-96.
44. Provenzo, E. (1992) “The Electronic Panopticon: Censorship, Control and Indoctrination in a Post-Typographic Culture” in Tuman, M. (ed) Literacy Online (Pittsburgh: University of Pittsburgh Press) 171-180.
45. Stille, A. (March 8, 1999) “Overload” New Yorker, p. 42.
46. Wicklund, N. (forthcoming) “The Erik Routley Collection of Books and Hymnals at Talbott Library, Westminster Choir College of Rider University” The Hymn.
47. Marchionini, G. (1995) Information Seeking in Electronic Environments (New York: Cambridge University Press), p. 172.
48. Postman, N. (1988) “The Contradictions of Freedom of Information” in Berman, S. and Danky, J. (eds) Alternative Library Literature, 1986/1987 (Jefferson, N.C.: McFarland) 37-49.