During an interesting session called the 'Great Global Graph' at the CETIS conference this week I formed the opinion that, in the recent rush of enthusiasm for 'linked data', three 'memes' were being conflated. These next three bullets outline my understanding of how these terms have been used in recent discussions, including the CETIS session:
- Open data: I see this as something expressed as a philosophy or, in more concrete terms, as a policy, such as that espoused by the UK Government. There are aspects of public ownership in this, but also a philosophical approach based on 'openness' and a rejection of the economic idea of value in scarcity of information. I don't think any specific technology really comes into this: for example, one concrete realisation of this policy in the UK is the Freedom of Information Act, under which it is perfectly permissible for a data owner to supply data in any reasonable format and medium. Essentially, I take 'open' to mean accessible to all, notwithstanding conditions of use.
- Linked data: This one is trickier, as the term is used in quite a precise way by some proponents, based on the principles of linked data from the W3C. There are others who prefer a looser definition. There have been some well-rehearsed arguments about this, which generally come down to whether or not RDF is a prerequisite of linked data. I've become inclined to use the term in its more precisely defined sense, in recognition of the efforts going on in this space.
- Semantic Web: This term introduces 'semantics' into the mix, by layering on ontologies allowing inferences to be made from the data itself.
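To make the difference between the last two terms a little more concrete, here is a minimal sketch in plain Python. It is only an illustration under stated assumptions: the `example.org` URIs and the `infer_types` helper are hypothetical, and the single rule applied (a member of a subclass is also a member of its superclass) is just one simple example of the kind of inference an ontology layer allows. The `rdf:type` and `rdfs:subClassOf` URIs are the standard W3C ones.

```python
# Linked data, roughly: statements are (subject, predicate, object) triples
# whose terms are URIs. The example.org URIs below are hypothetical.
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
RDFS_SUBCLASS_OF = "http://www.w3.org/2000/01/rdf-schema#subClassOf"
EX = "http://example.org/"

triples = {
    (EX + "cetis-conference", RDF_TYPE, EX + "Conference"),
    (EX + "Conference", RDFS_SUBCLASS_OF, EX + "Event"),
}


def infer_types(statements):
    """Return the statements plus any rdf:type triples implied by subClassOf.

    Hypothetical helper: it repeatedly applies one rule until nothing new
    is added. If X has type C and C is a subclass of D, then X has type D.
    """
    inferred = set(statements)
    changed = True
    while changed:
        changed = False
        subclass_pairs = [(s, o) for s, p, o in inferred if p == RDFS_SUBCLASS_OF]
        for subject, predicate, obj in list(inferred):
            if predicate != RDF_TYPE:
                continue
            for sub, sup in subclass_pairs:
                new_triple = (subject, RDF_TYPE, sup)
                if obj == sub and new_triple not in inferred:
                    inferred.add(new_triple)
                    changed = True
    return inferred


# The "semantic" layer adds a statement that was never published explicitly.
derived = infer_types(triples) - triples
```

In this sketch the `triples` set alone stands in for the linked data; the one extra statement in `derived` (that the conference is also an `Event`) is what the semantic layer contributes.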
It seems that these terms are often used together in the same discussions, and I suspect I could benefit from some separation of concerns in some of these discussions. It seems to me that the following are true:
1. data can be open, while not being linked
2. data can be linked, while not being open
3. data which is both open and linked is increasingly viable
4. the Semantic Web can only function with data which is both open and linked
Option 1 satisfies, in part at least, the drive to make available to the public data which has been paid for by the public and which might be useful to it. There are those (and I count myself among them) who generally believe that at present, for example, it would be better to quickly make the data open in some useable form than to delay this unduly while it is processed into RDF. However, there is a reasonable case to be made for not polluting information spaces with poorly prepared datasets.
Option 2 is an approach for organisations which want to take a more resource-oriented approach to managing and exploiting internal information assets. In the CETIS session an interesting idea was floated around how such an approach might go a long way to helping organisations address data-quality issues.
Option 3 seems increasingly viable. There is value in the 'linked' aspect, regardless of whether or not semantic layers are introduced. This is how the Web works after all, and much of the impetus behind Web 2.0 seems, to me, to have come from a healthy mixture of addressable and accessible information and human-mediated convention (e.g. 'hackable' URLs). Perhaps this is the 'Great Global Graph' and it's just a matter of scale?
I'm very open to comment and argument on any of this. Perhaps I'm worrying unduly about these things being mixed up, but I do sense that this space could benefit from some clarity to match the excitement and endeavour.