Library Philosophy and Practice 2011
Status of Institutional Repositories in Asian Countries: A Quantitative Study
Dr. Bhaskar Mukherjee
With the development of information and communication technologies, a number of alternative strategies to the traditional scholarly publishing system have evolved. Among these is the Open Access (OA) model, which promises to be extremely advantageous to peers everywhere, especially to those who face an acute shortage of resources for purchasing scholarly literature. The impetus for OA was boosted by the Open Society Institute (OSI) at a small meeting convened in Budapest on December 1-2, 2001. The purpose of the meeting was to accelerate progress in the international effort to make research literature in all academic fields freely available on the Internet (OAIS, 2002; Hirtle, 2001). The first major international statement on OA, which includes a definition, background information and a list of signatories, is the Budapest Open Access Initiative. The other two leading statements are the Bethesda Statement on Open Access Publishing and the Berlin Declaration on Open Access to Knowledge in the Sciences and Humanities. The conception of open access in these three statements, which is often called the BBB (Budapest, Bethesda and Berlin) definition, launched, inspired, and continues to guide the open access movement.
Although institution-based, or more typically departmental, 'archives' were known before this, especially in areas such as computer science and economics that were served by NCSTRL and RePEc, respectively, the Open Archives Initiative (OAI) introduced the Protocol for Metadata Harvesting (OAI-PMH) to provide common services that could operate over more general, independent sites (Lynch, 2001). Institutional repositories (IRs) adopt the same open access and interoperable framework as e-print archives, but rather than being discipline-based, they represent the wide range of research output of a given university or research organization. The term was coined by the Scholarly Publishing and Academic Resources Coalition (SPARC), which defines IRs as “digital collections capturing and preserving the intellectual output of a single or multi-university community” (Crow, 2002). Crow argues that institutional digital repositories will lead to significant increases in the prestige of the institutions that build them (Crow, 2002). Stevan Harnad also cites institutional prestige: “Distributed, institution-based self-archiving benefits research institutions in three ways. First, it maximizes the visibility and impact of its own refereed research output. Second, by symmetry, it maximizes their researchers’ access to the full refereed research output of all other institutions. Third, institutions themselves can hasten the transition to self-archiving and so more quickly reduce their library’s annual serials expenditures to 10% (paid to journal publishers for refereeing their submissions)” (Harnad, 2002). Pinfield, Gardner, and MacColl also argue that an e-print archive can raise the profile of an institution (Pinfield, Gardner, & MacColl, 2002).
Defining Institutional Repositories
An IR may be defined as an online locus for collecting and preserving, in digital form, the intellectual output of an institution, particularly a research institution (Wikipedia). According to Lynch (2003), an institutional repository is a “set of services that a university offers to the members of its community for the management and dissemination of digital materials created by the institution and its community members. It is most essentially an organizational commitment to the stewardship of these digital materials, including long-term preservation where appropriate, as well as organization and access or distribution.”
For a university this includes materials such as research journal articles before (preprints) and after (postprints) undergoing peer review, and digital versions of theses and dissertations, but it might also include other digital assets generated by normal academic life, such as administrative documents, course notes or learning objects. An IR is a collection of digital research documents such as articles, book chapters, conference papers and data sets. E-prints are the digital texts of peer-reviewed research articles, before and after refereeing. Before refereeing and publication, the draft is called a "preprint". The refereed, accepted, final draft is called a "postprint". The term e-prints includes both preprints and postprints.
With the increasing use of ICTs and the availability of open source software packages, most institutions now maintain such a repository or archive to collect, preserve, and make accessible the entire intellectual product created by their scholarly communities. The main objectives for having an IR are:
IRs have now become an important new player in the field of academic information management and publishing. The development and growth of IRs arose in response to major changes in scholarly communication. The new form of scholarship, that which is born digital, constitutes an important source for present and future research and teaching. With the emergence of the World Wide Web as an effective vehicle for publishing and distribution, the born-digital form of scholarly objects has become more popular. Additionally, the rapid rise in the cost of commercial scholarly journals was another major impetus for developing new models of scholarly publishing. IRs benefit scholars by providing free access to scholarly works that are published or likely to be published in the near future. They reduce the publication backlog by providing timely access, and they increase visibility through the freely accessible Web.
Growth in the number of IRs has accelerated since 2002. Despite some time lag, there has been corresponding growth in the volume of digital content in IRs, as revealed by the Registry of Open Access Repositories (ROAR, http://roar.eprints.org/). Among repository directories, OAIster (launched in 2002) listed 726 OA, OAI-compliant repositories worldwide on December 31, 2006, an increase of 25% over the previous year, and in 2006 it listed a total of 6,255,599 records from the repositories it covered (Suber, 2007). ROAR is one of the authoritative sources that identify repositories worldwide. With the increasing popularity of open access materials worldwide, the number of IRs is increasing continuously. At the end of November 2010, there were more than 1800 IRs worldwide listed in the Directory of Open Access Repositories (OpenDOAR, www.opendoar.org). Of these, more than 50% are in the USA, the UK, Germany and Spain.
There has also been extensive investigation of the role of various types of repositories in the scholarly communication process, particularly in the context of e-prints and author self-archiving, and, more recently, with respect to institutional policies on author self-archiving; however, these studies do not illuminate the full range of developments surrounding IR planning and deployment (Lynch & Lippincott, 2005). To the best of our knowledge, there has been relatively little systematic examination of the actual state of deployment of IRs in Asia. It is, therefore, important that a study be undertaken with the sole purpose of identifying the present status of IRs in the countries of Asia.
Objectives of the Study
The major objectives of the present paper are:
Since the study was planned to analyse the growth and present status of IRs, the survey method was found suitable. Our investigation began with one of the most authoritative online directories, the Registry of Open Access Repositories (ROAR). Additionally, we consulted the Directory of Open Access Repositories (OpenDOAR, www.opendoar.org) and OAIster to identify other IRs not covered by ROAR. The access policy of each directory was checked to determine whether its materials were available entirely free or only partially free. The factual data on number, country of origin, document types, subjects, software used, language, host domain and policy of individual IRs were noted for further analysis.
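The step of combining several directories so that each repository is counted once can be sketched as follows. This is only an illustration under stated assumptions: the study does not describe how duplicates were resolved, the URL-based matching rule is our own, and every repository address below is a hypothetical placeholder.

```python
from urllib.parse import urlsplit


def normalise(url: str) -> str:
    """Reduce a repository URL to a comparable key (assumed matching rule)."""
    if "://" not in url:
        url = "http://" + url
    parts = urlsplit(url)
    host = parts.netloc.lower()
    if host.startswith("www."):
        host = host[4:]
    # Ignore a trailing slash so "host/" and "host" compare equal.
    return host + parts.path.rstrip("/")


def merge_directories(listings):
    """Map each distinct repository key to the set of directories listing it."""
    merged = {}
    for directory, urls in listings.items():
        for url in urls:
            merged.setdefault(normalise(url), set()).add(directory)
    return merged


# Hypothetical listings; these are not real repositories.
listings = {
    "ROAR": ["http://repo.example-univ.ac.jp/", "http://eprints.example.edu.in"],
    "OpenDOAR": ["http://www.repo.example-univ.ac.jp", "http://dspace.example.tw/"],
    "OAIster": ["eprints.example.edu.in/"],
}

merged = merge_directories(listings)
for key, sources in sorted(merged.items()):
    print(key, sorted(sources))
```

Keeping the set of source directories for each key makes it easy to see which repositories were found only in OpenDOAR or OAIster and not in ROAR.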
Growth of IRs
First, we tried to determine the growth in the number of IRs over the years. These data are based on the date of registration in the aforesaid directory. Since the directory was created in 2006, growth data for earlier years were not available.
Table 1: Growth of IRs: 2006-15th December, 2010
Table 1 shows the growth of IRs during the last five years. In 2006 there were 40 IRs, a number that rose to 96 in 2007, 142 in 2008 and 183 in 2009, with additions of 56, 46 and 41 IRs in 2006-2007, 2007-2008 and 2008-2009 respectively. At the time of writing, this number had reached 296, with an addition of 113 IRs between 2009 and 15th December 2010. Further growth is therefore expected by the end of 2010.
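The yearly additions follow from the cumulative totals by simple subtraction. The sketch below uses only the cumulative counts quoted in the text (the body of Table 1 is not reproduced here); the variable and function names are our own.

```python
# Cumulative number of registered IRs, as quoted in the text.
cumulative = {
    "2006": 40,
    "2007": 96,
    "2008": 142,
    "2009": 183,
    "2010 (to 15 Dec)": 296,
}


def yearly_additions(counts):
    """Difference between each pair of consecutive cumulative totals."""
    labels = list(counts)
    return {
        f"{prev} to {curr}": counts[curr] - counts[prev]
        for prev, curr in zip(labels, labels[1:])
    }


additions = yearly_additions(cumulative)
for period, added in additions.items():
    print(f"{period}: +{added}")

# Ratio of the latest total to the 2006 baseline.
overall_factor = cumulative["2010 (to 15 Dec)"] / cumulative["2006"]
print(f"Overall growth factor: {overall_factor:.1f}x")
```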
Country-wise distribution of IRs and records
In the next step, we attempted to identify the number of IRs and the number of objects held by repositories in each country. As shown in Table 2, the highest number of IRs is now in Japan (129), followed by Taiwan (50) and India (42). A total of 296 repositories were identified in the Asian region, distributed among 25 countries. Here we list and present the status of only the top 10 countries, each having three or more IRs. Among the remaining 15 countries, Bangladesh, Iran, Israel, Kyrgyzstan, Pakistan, Philippines and Singapore have two IRs each, whereas Afghanistan, Azerbaijan, Georgia, Kazakhstan, Nepal, Qatar, Sri Lanka and Vietnam have one IR each.
Table 2: Status of IRs in Asian countries-December 2010
Looking at the size of IRs in detail, we also see strong differences between and within countries in the number of objects, which ranges from a few to hundreds of thousands of records per repository. Overall, there are as many as 1,414,960 records available in all these IRs, an average of 4780 records per IR. The countries differ widely both in number of IRs and in average number of records per IR. As Table 2 shows, Japan leads in total number of IRs but drops to 2nd position in total number of records and 7th position in average number of records. Similarly, India holds 3rd position in number of IRs but 9th position in number of records per IR. On the other hand, it is important to note that the Republic of Korea and Saudi Arabia hold 9th and 10th position in number of IRs, yet their average numbers of records per IR are the highest among the ten countries listed in the table, in 2nd and 1st position respectively. The rest of the countries in the continent have 22 IRs, with 3.14% of the total number of records and 8th position in average number of records.
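The averages quoted above can be checked with a few lines of arithmetic. This sketch uses only the totals stated in the text; the record count for the "rest of the countries" group is derived from the rounded 3.14% share and is therefore approximate.

```python
TOTAL_RECORDS = 1_414_960   # records in all Asian IRs (from the text)
TOTAL_IRS = 296             # IRs identified in Asia (from the text)
REST_IRS = 22               # IRs outside the top 10 countries (from the text)
REST_SHARE_PCT = 3.14       # their share of all records, in percent (from the text)

# Average number of records per IR over the whole region.
average_per_ir = TOTAL_RECORDS / TOTAL_IRS

# IRs located in the ten countries listed individually.
top10_irs = TOTAL_IRS - REST_IRS

# Approximate figures for the remaining 15 countries (derived, not reported).
rest_records = TOTAL_RECORDS * REST_SHARE_PCT / 100
rest_average = rest_records / REST_IRS

print(f"Average records per IR: {average_per_ir:.0f}")
print(f"IRs in the top 10 countries: {top10_irs}")
print(f"Approx. records per IR in the remaining countries: {rest_average:.0f}")
```

The gap between the regional average and the derived average for the smaller group illustrates how a handful of very large repositories can dominate a mean.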
Types of objects
Table 3 shows the types of objects currently stored in IRs. It may be observed from Table 3 that although various categories of objects are archived in IRs, the main focus of the holdings is on journal articles (37%), followed by conference and workshop papers (19%), unpublished reports and working papers (13%) and books or chapters/sections of books (11%). The unpublished records include electronic theses and dissertations, digitized special collections materials, course materials, etc.
Table 3: Type of objects
*The number of institutional repositories exceeds the actual number (296) because most archives hold several types of objects.
The country-wise coverage of IRs by type of object (the number of repositories devoted to each type of object) is shown in Table 4. What comes across clearly from the table is that, in the countries listed, the main focus of the holdings of current IRs is on journal articles. However, we see strong differences between countries: in Malaysia, for example, most IRs (10) currently hold conference and workshop papers, while in Japan most IRs (117) hold journal articles. It is also worth noting that in China almost all the repositories hold records in the ‘patent’ category, whereas other countries (except Taiwan and Saudi Arabia) have no IR for this category of records.
Table 4: Country-wise coverage of IRs related to type of objects (Number of IRs devoted to each type of objects)
Subject Coverage of IRs
In the next step we tried to identify the disciplinary coverage of the IRs. We identified 26 broad subject categories, which are shown in Table 5. Most large institutions effectively hold all subjects in their repositories and are therefore categorised as ‘multidisciplinary’. Specialist institutions and disciplinary repositories, on the other hand, cover only a few subjects, and these have been indexed individually. As indicated in Table 5, the most prominent single subject under which records were archived was ‘health and medicine’ (5.47% of the total), followed by ‘technology’ (4.22% of the total). Although the number of IRs under the heading ‘multidisciplinary’ is quite high, no conclusion can be drawn from this result, because ‘multidisciplinary’ is a combination of a number of subjects, and calculating the proportion of each individual subject within the total was a complex task. It is interesting to note that the number of IRs in the arts and social sciences, such as ‘history and archaeology’ (8 IRs), ‘social science general’ (4 IRs) and ‘law and politics’ (5 IRs), was quite low, whereas the number of IRs in science, medicine and technology is considerably higher. This is a clear indication that the movement for open access to scholarly literature started with the scientific disciplines, and that scholars in other fields are slowly taking an interest.
Table 5: Disciplinary coverage of IRs
Note: Some IRs are placed in more than one subject category; as a result, the total in Table 5 exceeds the actual number (296).
As shown in Table 6, DSpace is the most widely used software package (200 IRs). It is followed by EPrints (34 IRs), XooNIps (10 IRs) and HiTOS (6 IRs). A large number of IRs (41) did not mention the name of the software used for their archives. Besides the software packages listed in Table 6, there are a few institutions that use locally developed systems or content management systems to set up an IR.
Table 6: Software packages used for IRs
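To put the software counts in proportion, the share of each package can be computed from the figures quoted in the text. In this sketch the "Other" row is not a reported value: it is the remainder left after subtracting the quoted counts from the 296 IRs, and is labelled as derived.

```python
TOTAL_IRS = 296

# Counts quoted in the text for Table 6.
reported = {
    "DSpace": 200,
    "EPrints": 34,
    "XooNIps": 10,
    "HiTOS": 6,
    "Not stated": 41,
}

# Remainder not accounted for by the quoted counts (derived, not reported).
other = TOTAL_IRS - sum(reported.values())
counts = {**reported, "Other (derived remainder)": other}

# Percentage share of each package among all 296 IRs.
shares = {name: 100 * n / TOTAL_IRS for name, n in counts.items()}

for name, share in sorted(shares.items(), key=lambda item: -item[1]):
    print(f"{name:<28}{counts[name]:>4}{share:>7.1f}%")
```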
In examining the software packages used to support IRs, we found considerable variation in software diversity from one nation to the next. Looking across nations, only a few packages were used in many different countries, most notably DSpace, which is used in all the countries listed, and EPrints, which is used in 6 of the 10 countries listed for the study. Besides the software packages shown in Table 7, there are many institutions that use locally developed systems or content management systems to set up an IR. It is worth noting that in Turkey the HiTOS software has a large base, with 6 of the country's 10 IRs, and that in Japan the XooNIps package was used in 10 institutions.
Table 7: Software packages used for IRs in different countries
IRs by Language
The study then identified the interface language of the IRs. It may be observed from Table 8 that IR interfaces have been built in various languages to support users of the respective languages. However, English is the most prominent interface language of all: of the 296 IRs, 229 offered an English-language interface. English is followed by Japanese, Chinese and Turkish, with 128, 64 and 10 IRs respectively.
Table 8: IRs by language
Note: The total exceeds the actual number (296) because some IRs offer their interface in more than one language.
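The reason the language totals exceed 296 is that each IR can be counted once per interface language. The sketch below first illustrates this with a small set of hypothetical repositories, then applies the same reasoning to the four counts quoted in the text; the repository names are placeholders, not real IRs.

```python
from collections import Counter

# Hypothetical toy data: each repository offers one or more interface languages.
repos = {
    "repo_a": {"English", "Japanese"},
    "repo_b": {"English"},
    "repo_c": {"Chinese", "English"},
}

# Count every (repository, language) pair, so multilingual IRs count more than once.
per_language = Counter(lang for langs in repos.values() for lang in langs)
print(dict(per_language), "pairs:", sum(per_language.values()), "repos:", len(repos))

# The four language counts quoted in the text.
TOTAL_IRS = 296
quoted = {"English": 229, "Japanese": 128, "Chinese": 64, "Turkish": 10}

listed_total = sum(quoted.values())
english_share = 100 * quoted["English"] / TOTAL_IRS

print(f"Sum over the four languages: {listed_total} (more than {TOTAL_IRS})")
print(f"IRs with an English interface: {english_share:.1f}%")
```

The same multi-valued counting explains the notes under the tables on object types and subject categories.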
IRs according to Host Domain
This study also distinguished IRs by the nature of their host organization. All IRs were grouped into four categories: Aggregating, i.e. an archive aggregating data from several subsidiary repositories; Disciplinary, i.e. a cross-institutional subject repository; Governmental, i.e. a repository for governmental data; and University-based Institutional, i.e. an institutional or departmental archive. It was observed that the largest number of IRs (276) were university-based institutional, followed by disciplinary (12) and aggregating (6).
Table 9: Types of IRs
There is no significant difference in the types of IRs across countries, as shown in Table 10. In all the nations of Asia, most IRs are hosted by institutions.
Table 10: Country wise distribution of IRs by types of Host Domain
An IR is driven and directed by its policies, which determine its identity, quality and direction. It is not sufficient to create a repository merely by putting software on a machine. An archive's organisational model is the sum of its policies, and an archive without policies is like a library without a librarian (Robbio & Coll, 2005). The principal policy concerns of an IR, which are important to know, are its:
We tried to identify the above-mentioned policies for every IR in terms of content policy, submission policy and preservation policy. The following parameters were used to determine the status of IR policies:
It may be observed from Table 11A that 91.89% of IRs do not have a well-defined policy for the types of records to be deposited. Only 2.72% of IRs have a defined policy regarding the types of material to be submitted, whereas for around 3.04% the policy was unstated. A similar situation was found in all the countries.
Table 11A: Policy for type of objects
Similarly, it is important for an IR to make clear who is authorized to submit material and what the terms and conditions of submission are. Again, it can be seen from Table 11B that around 90% of IRs do not have a defined policy for the submission of documents. Only 7% of IRs have a defined policy for this. The policy is unstated for 3% and unknown for 1%.
Table 11B: Records submission policy
As Table 11C shows, only 2.36% of IRs have a defined policy for the preservation of documents, whereas 83% of IRs have no clear preservation policy. A further 14.19% of IRs give no information regarding preservation policy, and their status is unstated.
Table 11C: Preservation policy
It may be observed from the above results that no more than 10% of IRs have made clear policies for type of content, submission and preservation. Still, this is a good start, and the gap may be expected to narrow in the near future.
Although comparing the size of IRs between institutions of various countries in Asia is clearly a very complex problem, probably intractable in the short term, it would be relatively easy to collect estimated rates of repository growth, and this would be helpful in understanding the landscape. From the growth of IRs over the last five years, one may see professionals' growing eagerness to make their scholarly research openly accessible. In a span of only a few years, the volume of literature has increased manifold, and this explosion continues. It is therefore a great challenge for e-publishers to archive this huge volume of electronic data for the future. At the same time, based on the number of institutional repositories established over the past few years, the IR service appears to be quite attractive and compelling to institutions. IRs provide an institution with a mechanism to showcase its scholarly output, centralize and introduce efficiencies to the stewardship of digital documents of value, and respond proactively to the escalating crisis in scholarly communication (Gibbons, 2004).
The phrase "if you build it, they will come" does not yet apply to developing countries in the context of establishing IRs. While the benefits of IRs seem very persuasive for developing countries, most IRs are still in developed countries. The overwhelming share of items from developed countries calls for critical insight into the ways in which various nations are thinking about the role of institutional repositories. In fact, the ‘resource crunch’ problem is more acute in developing countries than in developed countries. Nevertheless, the efforts of developed countries have been more substantial than those of other countries.
When we analysed these IRs by the types of materials they include, our findings suggest that institutional repositories currently house mostly traditional (print-oriented) scholarly publications and grey literature: journal and conference articles, books, theses and dissertations, and research reports. From this we can at least speculate that open access issues in scholarly publishing may well be the key drivers of institutional repository deployment, at least in the very short term, rather than the new demands of scholarly communication related to e-science and e-research. From the distribution of subjects in these repositories, it may be concluded that institutes in fields such as health and medicine, chemistry and chemical technology, and biology and biochemistry are more interested in disseminating their findings to a wider audience. As a result, a large number of materials in these fields have become openly accessible through IRs. It is quite evident that science changes much faster than other disciplines and that the obsolescence of concepts is more prevalent there. Additionally, the traditional journal system is heavily affected by the problem of backlog, so submitting materials to IRs before actual publication helps authors disseminate their findings at a faster rate. The relatively low number of IRs and documents in the social sciences and humanities may indicate that awareness of depositing scholarly texts in open access archives is insufficient among humanities and social science academics, or that they do not find it worthwhile to submit their scholarly texts to IRs. However, they do perceive many advantages in depositing their work in institutional repositories, though mainly for the reader rather than for themselves.
The use of DSpace as the leading software in these IRs may be due to the fact that DSpace already supports self-publishing and self-archiving features. One can rely heavily on DSpace for preservation, metadata, persistent URLs, etc. (DSpace, 2007). Similarly, the predominance of English in these IRs is a clear indication that English is the major language of scholarly communication.
When IRs were analysed by host domain, it was found that university-based institutions are the leading type of host. This finding supports the vision of Stevan Harnad that ‘Universities need to mandate the self-archiving of all peer-reviewed research output in order to maximise its research impact for exactly the same reasons as they currently mandate publishing it.’ (Harnad, 2003). He also argued that OA self-archiving should be mandated by research funders and institutions so that the self-archiving of published, peer-reviewed journal articles (Green OA) can be fast-forwarded to 100% OA. On the other hand, the analysis of IR policies made clear that policies on content inclusion, submission and preservation are still not well defined. There is a need to establish standard policies so that these IRs can be used for information exchange worldwide.
Institutional repositories are being recognized as an essential vehicle for scholarship in the digital world. This is evident from the continuous growth of IRs around the world. However, this growth is more prevalent in developed and western countries, as more than fifty percent of IRs exist in only four countries (USA, UK, Germany and Spain). The contribution of all Asian countries combined is smaller than that of the USA alone, which contributed about 400 IRs. Even within Asia, Japan and Taiwan together hold more than fifty percent of the total number of IRs. Of all the Asian countries, only 25 have created IRs, and only five of these have reached double figures. This is a clear indication that the green road to open access through institutional repositories is still in its infancy in the Asian region. It is now time for the universities and institutes of Asian countries, particularly developing countries, to consider establishing such repositories, both to make all the digital collections of an institution permanently available and to overcome access barriers within a particular language community. At the same time, researchers, academics and practitioners within institutions need most of all to become convinced of the value and immense potential of IRs. It may be expected that the next few years will see growing connections between institutional repositories as infrastructure and the broader issues that are emerging about the strategies and infrastructure necessary to support the management, dissemination and preservation of research data (at the national, disciplinary and institutional levels).
References
Crow, R. (2002). The case for institutional repositories: A SPARC position paper. Washington, D.C.: The Scholarly Publishing & Academic Resources Coalition, August 2002. Available at: http://www.arl.org/sparc/IR/ir.html.
Foster, N.F., & Gibbons, S. (2005). Understanding faculty to improve content recruitment for institutional repositories. D-Lib Magazine, 11(1). Available at: http://www.dlib.org/dlib/january05/foster/01foster.html.
Gibbons, S. (2004). Establishing an institutional repository. Library Technology Reports, 40(4), 11-14.
Harnad, S. (2002). The self-archiving initiative. Nature: Web debates. Available at: http://www.nature.com/nature/debates/e-access/Articles/harnad.html.
Harnad, S. (2003). Open access to peer-reviewed research through author/institution self-archiving: Maximising research impact by maximising online access. Journal of Postgraduate Medicine [online], 49(4). Available at: http://www.jpgmonline.com.
Hirtle, P. (2001). OAI and OAIS: What's in a name? D-Lib Magazine, 7(4), April 2001. Available at: http://www.dlib.org/dlib/april01/04editorial.html.
Hitchcock, S., Brody, T., Hey, J.M.N., & Carr, L. (2007). Digital preservation service provider models for institutional repositories: Towards distributed services. Available at:
Lynch, C. (2001). Metadata harvesting and the Open Archives Initiative. ARL Bimonthly Report, No. 217. Available at: http://www.arl.org/newsltr/217/mhp.html.
Lynch, C., & Lippincott, J.K. (2005). Institutional repository deployment in the United States as of early 2005. D-Lib Magazine, 11(9).
OAIS (2002). Reference model for an Open Archival Information System (OAIS). Consultative Committee for Space Data Systems, CCSDS 650.0-B-1, Blue Book, Issue 1, January; adopted as ISO 14721:2003. Available at: http://ssdoo.gsfc.nasa.gov/nost/wwwclassic/documents/pdf/CCSDS-650.0-B-1.pdf.
Pinfield, S., Gardner, M., & MacColl, J. (2002). Setting up an institutional e-print archive. Ariadne, 31. Available at: www.ariadne.ac.uk/issue31/eprint-archives/intro.
Wikipedia. Institutional repositories. Available at: http://en.wikipedia.org/wiki/Institutional_repositories. (Last accessed on 15.11.2010).