The Active Repository Pattern

(This is the first of two posts forming my contribution to Open Access Week 2015.)

Context

Institutional repositories

It is easy to overlook, or take for granted, the way in which the drive towards open-access (over the last decade or more) has succeeded not only in creating several viable "institutional-repository" software packages, but also in encouraging libraries and IT departments in universities to deploy them. It should be recognised that individual universities have shown, and continue to show, commitment to maintaining their repositories in spite of shrinking budgets.

While these repository systems are various, they mostly adhere to certain standard protocols, common metadata formats and conventions, allowing for a degree of potential interoperability. It is this potential for interoperation which elevates the institutional repository from a local system to a networked system.

This achievement should be celebrated!

Repositories as infrastructure

Since institutional repositories have generally been developed with a degree of interoperability, we can consider their potential role in a wider infrastructure. Currently, the interoperability of institutional repositories is most clearly realised in the way in which they expose metadata records and (sometimes) content in a standard way, so that this information can be 'harvested' by an external process. The use of a standard protocol, OAI-PMH, as well as standardised metadata-profiles such as OpenAIRE and RIOXX, allows institutional repositories to be first-class components in a distributed infrastructure. Since institutional repositories are where open-access metadata and content are created and managed, their role in this distributed infrastructure is both vital and fundamental. And because institutional repositories are controlled by their host institutions, they are collectively less vulnerable to political or business decisions made by any single organisation.
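To make the idea of 'harvesting' a little more concrete, here is a minimal sketch in Python of what an external harvester does: it builds an OAI-PMH ListRecords request and pulls identifiers and titles out of the XML that comes back. The repository address, the sample response and the function names are all invented for illustration, and the sketch parses a canned response rather than calling a real repository. A real harvester would also need to handle resumption tokens, deleted records and errors.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

# XML namespaces used by OAI-PMH responses carrying Dublin Core (oai_dc) metadata.
NAMESPACES = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "dc": "http://purl.org/dc/elements/1.1/",
}


def list_records_url(base_url, metadata_prefix="oai_dc", from_date=None):
    """Build a ListRecords request URL (base_url is a hypothetical endpoint)."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if from_date:
        # 'from' restricts the harvest to records changed since that date.
        params["from"] = from_date
    return base_url + "?" + urlencode(params)


def parse_records(xml_text):
    """Extract the identifier and first title of each record in a response."""
    root = ET.fromstring(xml_text)
    records = []
    for record in root.iterfind(".//oai:record", NAMESPACES):
        records.append({
            "identifier": record.findtext(
                "oai:header/oai:identifier", namespaces=NAMESPACES),
            "title": record.findtext(".//dc:title", namespaces=NAMESPACES),
        })
    return records


# Invented sample response, trimmed to the elements the sketch reads.
SAMPLE_RESPONSE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header>
        <identifier>oai:repository.example.org:1234</identifier>
        <datestamp>2015-10-19</datestamp>
      </header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>An example open-access paper</dc:title>
        </oai_dc:dc>
      </metadata>
    </record>
  </ListRecords>
</OAI-PMH>"""

if __name__ == "__main__":
    print(list_records_url("https://repository.example.org/oai",
                           from_date="2015-10-01"))
    print(parse_records(SAMPLE_RESPONSE))
```

Note that in this model the repository does nothing until it is asked: the harvester decides when to call, and the repository simply answers.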

Centrally-provided services supporting open-access

In parallel with the rise of the institutional repository, there has been significant investment in centralised services which support open-access by interacting with institutional repositories. There can be valid technical reasons for providing such services from a centralised platform. For example, it has for some time been generally accepted that in order to search for open-access papers, the metadata records of institutional repositories first need to be harvested and aggregated into one database, which can then serve a centralised search-portal or similar online service. However, searching is not the only way in which open-access papers can be discovered (as is discussed later).

Many such services have operated at a national or regional level, as they have been paid for with public money. This creates a paradox: academic research (and therefore open-access to scholarly publications) is not an activity which is comfortably bounded by geo-politics. While services created in this way are often deployed openly, allowing global use (for example Sherpa RoMEO), such global access is vulnerable to being withdrawn, since the service provider bears no commitment to users beyond its own context.

Services provided by private corporations are an alternative to nationally-funded ones. These can, on the face of it, appear more sustainable: after all, if there is a profit to be made (even indirectly) that is increased by the provision of such services, then support and investment are likely to continue. Of course this comes with its own risks, not least of which is that the corporations most likely to develop and support such services might be ambivalent about the goal of ubiquitous, global open access.

So, while the centralised provision of services, whether publicly or privately financed, might prove to be effective in some circumstances, it incurs the risk of dependence on a single organisation, the service-provider. Moreover, in a de-centralised infrastructure based heavily on the presence of institutional repositories, this centralised model of service-provision might not be the best fit, architecturally.

Institutional repositories as active participants

One curious side-effect of the architecture of infrastructure that has evolved to support open-access is that institutional repositories currently play a largely passive role. Essentially, institutional repositories act as databases of metadata and papers, and are not even especially Web-friendly.

This need not be the case. Other approaches to distributed online infrastructure have started to mature in recent years. In particular, strategies which depend on active notification are increasingly interesting in this space. We can conceive of repositories as active components in an open-access infrastructure, rather than passive data-silos. With a modest amount of development (in many cases the deployment of a 'plugin' or similar would be sufficient) institutional repositories could become systems which actively send notifications triggered by events such as, for example, the addition or modification of a metadata record or paper. Standard protocols (for example PubSubHubbub) to send notifications are already in mainstream use. And when it comes to conveying the detail of the repository event, mechanisms more sophisticated than OAI-PMH exist already: indeed ResourceSync would serve as a successor to OAI-PMH in this respect.
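As a rough illustration of what 'active' might mean in practice, the Python sketch below models a repository that raises an event whenever a record is added or modified, and passes that event to whatever listeners a plugin has registered. The class, method and URL names are hypothetical and do not belong to any existing repository package. The last function builds the form-encoded 'publish' ping (hub.mode and hub.url) that PubSubHubbub hubs conventionally accept from publishers; I am assuming that convention here, and the sketch only constructs the message body without sending it.

```python
from dataclasses import dataclass, field
from typing import Callable, List
from urllib.parse import urlencode


@dataclass
class RepositoryEvent:
    kind: str        # "created" or "modified"
    record_url: str  # where the affected record can be found


@dataclass
class ActiveRepository:
    """Hypothetical repository that announces changes instead of waiting to be polled."""
    topic_url: str  # feed or change list describing this repository's updates
    listeners: List[Callable[[RepositoryEvent], None]] = field(default_factory=list)

    def on_event(self, listener):
        """Register a callable to be invoked for every repository event."""
        self.listeners.append(listener)

    def save_record(self, record_url, is_new):
        """Stand-in for the deposit workflow: build the event and notify listeners."""
        event = RepositoryEvent("created" if is_new else "modified", record_url)
        for listener in self.listeners:
            listener(event)
        return event


def hub_ping_body(topic_url):
    """Form-encoded body telling a hub that the topic has new content.

    Assumes the conventional 'publish' ping; check the hub's own documentation.
    """
    return urlencode({"hub.mode": "publish", "hub.url": topic_url})


if __name__ == "__main__":
    repo = ActiveRepository("https://repository.example.org/changes")
    # A notification plugin would POST this body to a hub; here we just print it.
    repo.on_event(lambda event: print(event.kind, event.record_url))
    repo.on_event(lambda event: print(hub_ping_body(repo.topic_url)))
    repo.save_record("https://repository.example.org/record/1234", is_new=True)
```

The point of the design is that the repository itself initiates the exchange. Subscribers learn of a change when it happens and can then fetch the detail, for example through a ResourceSync change list, instead of polling on a schedule.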

Open-access infrastructure would likely retain the need for some centralised services. Even many types of peer-to-peer systems retain the need for a central directory of participating peers, for example. However, the idea is to reduce the dependence on central services, by moving more of the responsibility (and therefore functionality) out to the distributed institutional repositories. The centralised services required to support an infrastructure of distributed, active repository components could be modest, inexpensive and easily replaced.

Peer-to-peer systems work when the peers have a vested interest in participating, and when enough of them are sustained. Our institutional repositories fit this model. Increasingly, higher education institutions are committed to providing open-access to scholarly publications. They also have a vested interest in gathering papers authored by each of their researchers, even when that researcher was not the lead author. This means that institutional repositories have an incentive to actively share papers, rather than simply making available what they already have. A distributed, peer-to-peer architecture of events and notifications would serve this purpose well.

I believe we need to do more to exploit the latent value in our institutional repositories. From the point of view of the network, they can and need to be much more than passive databases, and with a very modest technical investment they can start to be active components in the global infrastructure.

I call this the Active Repository Pattern. In my next post, Cooperative Open Access eXchange (COAX), I offer a proposal setting out an approach to this in more concrete terms.

Please feel free to leave any thoughts or comments below!