During an interesting session called the 'Great Global Graph' at the CETIS conference this week I formed the opinion that, in the recent rush of enthusiasm for 'linked data', three 'memes' were being conflated. These next three bullets outline my understanding of how these terms have been used in recent discussions, including the CETIS session:
These terms are often used together in the same discussions, and I suspect we would all benefit from some separation of concerns. It seems to me that the following are true:
Option 1 satisfies, in part at least, the drive to make available to the public data which has been paid for by the public and which might be useful to it. There are those (and I count myself among them) who generally believe that at present, for example, it would be better to quickly make the data open in some useable form than to delay this unduly while it is processed into RDF. However, there is a reasonable case to be made for not polluting information spaces with poorly prepared datasets.
Option 2 is an approach for organisations which want to take a more resource-oriented approach to managing and exploiting internal information assets. In the CETIS session an interesting idea was floated around how such an approach might go a long way to helping organisations address data-quality issues.
Option 3 seems increasingly viable. There is value in the 'linked' aspect, regardless of whether or not semantic layers are introduced. This is how the Web works after all, and much of the impetus behind Web 2.0 seems, to me, to have come from a healthy mixture of addressable and accessible information and human-mediated convention (e.g. 'hackable' URLs). Perhaps this is the 'Great Global Graph' and it's just a matter of scale?
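To make the 'hackable URLs' convention a little more concrete, here is a minimal Python sketch. It assumes a hypothetical URL scheme on example.org in which each path segment names a progressively narrower collection, so trimming segments from the end (as a person might do by editing the address bar) yields addresses for broader collections. Nothing in the post prescribes such a scheme; it is purely illustrative.

```python
from urllib.parse import urlsplit, urlunsplit


def parent_urls(url):
    """Return the 'hacked' parent URLs obtained by trimming trailing
    path segments, most specific first. Query strings and fragments
    are dropped. Assumes a hypothetical hierarchical URL scheme."""
    parts = urlsplit(url)
    segments = [s for s in parts.path.split("/") if s]
    parents = []
    # Drop one trailing segment at a time, down to the site root.
    for i in range(len(segments) - 1, -1, -1):
        path = "/" + "/".join(segments[:i])
        if i:
            path += "/"
        parents.append(urlunsplit((parts.scheme, parts.netloc, path, "", "")))
    return parents


for parent in parent_urls("http://example.org/datasets/2009/november/"):
    print(parent)
```

The point is only that addressable, predictably structured URLs let people (and simple scripts) navigate between related resources without any semantic layer at all.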
I'm very open to comment and argument on any of this. Perhaps I'm worrying unduly about these things being mixed up, but I do sense that this space could benefit from some clarity to match the excitement and endeavour.
[…] in an unhelpful way. Paul Walk, a colleague of mine at UKOLN, recently wrote a blog post, ‘Linked, Open, Semantic?’ that helps to clarify the […]
[…] posts which I will cite (and in doing so hope to mitigate the effects of link rot). Paul Walk[ref 4] provides this […]
[…] Linked, Open, Semantic? (source: Paul Walk, […]
[…] usefully, ensured that his thinking was not just trapped in the space and time of the session but published on his blog (with the benefit of subsequent […]
[…] http://bit.ly/2NrNQn – My understanding: #tdwg ontology should help option 1 (open) data become option 3 (open-linked) data – option 4 later […]
Clear thinking and good definitions. I have sent this around broadly in my data center to help people understand what I've been rambling about the last year or two. Thanks. The Polar Information Commons also seeks to adopt this approach http://polarcommons.org
Adrian, thanks for the comment - valid and useful. I take your point, although I was trying to make the point that open!=linked. Perhaps it is counter-productive to use 'open' to mean (as I did) 'not completely closed', as this kind of 'open' can conceal all kinds of potential problem.
You say: "Essentially, I generally take ‘open’ to mean accessible to all, notwithstanding conditions of use." I think this is too broad a conception of "open". Conditions of use and reuse play a major role in the concept of open data. Otherwise, all data (e.g. amazon or google-books) which is accessible and searchable via an interface would be open even if you cannot reuse this data.
Do you know http://opendefinition.org by the Open Knowledge Foundation (http://okfn.org/)? They put some effort into defining "open". The short version goes: "A piece of knowledge is open if you are free to use, reuse, and redistribute it". The OKFN also created CKAN (http://www.ckan.net/), a registry of "open" data projects and packets and their alignment with the open definition.
Best topics in altc2009 for 2009-11-13…
Best topics in altc2009 for 2009-11-12…
So long as the rest of the world can see this stuff, it probably isn't succeeding ;-)
So what do you think the rest of the world will have to say about this stuff once it starts to get used by more than a small group of Web geeks? <– proud to be one of course.
Hi Paul - nice post and I like your four-point synopsis. So, cueing off point 4, allow me to introduce a point 5.
Also, here's a nice reference that I'm sure most have read: PwC's report on the importance of ontologies and semantic information via linked data, in Technology Forecast: A Quarterly Journal, PricewaterhouseCoopers, Spring 2009.
[…] Walk seems to be the first of the session participants to blog. He captures one of the axes of discussion well and provides an attractive set of distinctions. It […]
Hmmmm - scenarios arising from a directed-linked-data Web…. interesting to speculate. Your latter scenario smacks to me of a kind of environment where 'market-forces' will force the issue. Chris also alludes to the possibility of more sophisticated protocols/technologies in future which might, I suppose, negotiate the directed-linked-data-graph in a more sophisticated way?
Of course, the take-away from both yours and Chris's comments is that 'open' is not a simple, precise concept in all this!
ooh me again….
Just wondering: in "The Semantic Web", is there a danger with using linked but not open data (where you link to other people's resources but don't expose your own) that third parties will provide a proxy for your resource (research/dataset/ideas) for others to link to and say things about? Just wondering if it's useful to expand upon each of the scenarios and see what might happen in each…
I'll shut up now :)
As usual... last on the bandwagon :)
Really good stuff Paul, I think they are useful definitions… about the linked but not open bit tho: I think this is mostly true, but in terms of the old AAA slogan "Anyone can Say Anything about Any topic", in order for someone to express an opinion about your resources they have to be at least partially open. I guess this is the distinction you are drawing - that of a directed openness (linking out but not allowing linking in) - but I'm not entirely convinced that directed linked data is really linked data. I've confused myself already. In the end tho, this sounds like exactly your conclusion, that the semantic web needs open linked data. I'd be tempted to try and clearly define the "Open" that Chris talks about, as I think that might be subtly different from the open that you're talking about.
Good stuff tho!
From the (much appreciated) comments above I conclude that my distinction between Linked Data might be useful, but is probably a distinction based on nuance rather than principle.
I accept that my assertion that the Semantic Web can only function with data which is both open and linked is too strident (in terms of the 'open'). That is to say, all of the benefits which can come from links and semantics can be derived in a closed or restricted environment.
I do however think it fair to say that the Semantic Web has more chance of growing and being successful the more data is open!
I just couldn't agree with your last conclusion: "the Semantic Web can only function with data which is both open and linked", especially since the only additional condition you imposed above linked data was bringing in ontologies. I would agree that it is much preferable in terms of the whole Semantic Web for the data to be open, and I suspect that the technologies are likely not available for making the semantic web work when parts of it are limited access. But data can be linked, re-usable, semantically structured, supported by ontologies, but necessarily (legally and ethically) of restricted access.
If we keep insisting that open is the only option, we'll leave this important group (which includes a LOT of research data) out in the cold!
I pretty much accept your distinctions and agree with the conclusions. I think my approach which is "data should be open. Open data should be linked" is compatible with your comments about Option 1.
There's a further distinction in your Semantic Web definition, perhaps, between "that which is linked and open" and "that which is linked, open and consistent with some ontology".
I think your tri-part 'open', 'linked', 'semantic' is useful but in the actual labels you've used you draw a firmer distinction between 'Linked Data' and the 'Semantic Web' than would be accepted by everyone.
Bare RDF carries some 'semantics' (and is firmly part of Linked Data, as you acknowledge) so I think you are probably trying to draw a distinction between the use of OWL-based ontologies (your view of the Semantic Web?) in order to support inferencing vs. use of raw RDF (your view of Linked Data?) where graphs can be merged but inferencing is not possible?
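The difference between merging graphs and inferencing over them can be illustrated with a minimal Python sketch. It uses plain (subject, predicate, object) tuples rather than an RDF library, hypothetical ex: identifiers, and two RDFS entailment rules (rdfs9 and rdfs11) as a much simpler stand-in for OWL reasoning. Merging only pools the triples; inference derives triples that neither source stated.

```python
# Triples are plain (subject, predicate, object) tuples; the prefixed
# names stand in for full URIs. All ex: names are hypothetical.
RDF_TYPE = "rdf:type"
SUBCLASS = "rdfs:subClassOf"


def merge(*graphs):
    """Merging graphs that share URIs is just the union of their triples."""
    merged = set()
    for graph in graphs:
        merged |= graph
    return merged


def infer_types(graph):
    """Apply two RDFS rules repeatedly until no new triples appear:
    rdfs11 (subClassOf is transitive) and rdfs9 (types propagate
    up the class hierarchy)."""
    closed = set(graph)
    while True:
        subclass = {(s, o) for s, p, o in closed if p == SUBCLASS}
        new = set()
        # rdfs11: A subClassOf B, B subClassOf C => A subClassOf C
        for a, b in subclass:
            for b2, c in subclass:
                if b == b2:
                    new.add((a, SUBCLASS, c))
        # rdfs9: x type A, A subClassOf B => x type B
        for s, p, o in closed:
            if p == RDF_TYPE:
                for a, b in subclass:
                    if o == a:
                        new.add((s, RDF_TYPE, b))
        if new <= closed:
            return closed
        closed |= new


catalogue = {("ex:thesis42", RDF_TYPE, "ex:Thesis")}
schema = {
    ("ex:Thesis", SUBCLASS, "ex:ResearchOutput"),
    ("ex:ResearchOutput", SUBCLASS, "ex:Document"),
}
merged = merge(catalogue, schema)
print(("ex:thesis42", RDF_TYPE, "ex:Document") in merged)               # False
print(("ex:thesis42", RDF_TYPE, "ex:Document") in infer_types(merged))  # True
```

In this reading, 'Linked Data' would stop at the merged graph, while the 'Semantic Web' would add the reasoning step, although, as the comment notes, the boundary between the two is not sharp.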
Somewhat tangentially… I've done a couple of talks recently where I use 'open', 'social' and 'linked' and in which I have tended to roll 'linked' and 'semantic' into one. On balance I think you are right to caution against this… but in practice I think the divide between the two is perhaps more fuzzy than we might like.
I like this separation :) Just like to add that I do feel there is a lot of merit to using the notion of 'Linkable Data' internally - i.e. from "2. data can be linked, while not being open".
Giving things you are trying to curate or research a label, something more visible and more 'global' than just a database primary key, is a very powerful thing. You can have stores of data that describe the item, without that store having to hold anything more than the label. You can then have stores that talk about the stores' data - 'This set of information consists of comments and trackbacks from the outside world' and 'This data was added by researcher X and is based on the following stores of evidence….'
And you never have to alter the original item when using that pattern, which (speaking from a preservation/versioning point of view) is a Good Thing.
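The named-graph pattern described above can be sketched in a few lines of Python. This is a minimal illustration, assuming hypothetical URIs, graph names and predicates, and representing each named graph simply as a fourth element attached to a triple (a quad). It shows statements about an item kept in separate graphs, statements about the graphs themselves, and new annotations being added without touching the original record.

```python
from collections import namedtuple

# A quad is a triple plus the name of the graph (store) that asserts it.
Quad = namedtuple("Quad", "graph subject predicate obj")

# Hypothetical stable label for the curated item.
ITEM = "http://example.org/id/item/1234"

quads = [
    # The item's own record holds nothing beyond its basic description.
    Quad("ex:graph/catalogue", ITEM, "dc:title", "Survey notebook"),
    # Comments from the outside world live in their own named graph.
    Quad("ex:graph/comments", ITEM, "ex:comment", "Page 3 is water-damaged"),
    # A researcher's annotations live in yet another graph.
    Quad("ex:graph/researcherX", ITEM, "ex:datedTo", "1912"),
    # Statements about the graphs themselves (provenance).
    Quad("ex:graph/meta", "ex:graph/comments", "dc:description",
         "Comments and trackbacks from the outside world"),
    Quad("ex:graph/meta", "ex:graph/researcherX", "dc:creator", "Researcher X"),
    Quad("ex:graph/meta", "ex:graph/researcherX", "ex:basedOn", "ex:graph/catalogue"),
]


def about(subject, quads):
    """Collect every statement about a subject, grouped by asserting graph."""
    grouped = {}
    for q in quads:
        if q.subject == subject:
            grouped.setdefault(q.graph, []).append((q.predicate, q.obj))
    return grouped


def add_annotation(quads, graph, subject, predicate, obj):
    """Return a new list with one extra statement; existing quads are untouched."""
    return quads + [Quad(graph, subject, predicate, obj)]


for graph, statements in about(ITEM, quads).items():
    print(graph, statements)
```

Because each source of statements sits in its own graph, provenance and versioning can be handled at the graph level, which is the preservation benefit the comment points to.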
/me steps off the 'I <3 named graphs' soapbox
Excellent and very useful disambiguation of the terminology Paul. Your comments are highly relevant to the discussions we had in the Find and Seek session #cetis09find where there was considerable enthusiasm for using linked data approaches to manage open educational resources. Something we hope to explore further at our Semantic Technology Working Group in December.
Continuing debate re linked open data…
Making linked open data sound more complex than it needs to…
Yeah. Downloaded the entire Semantic Web to it yesterday, all 89kb of it! :-)
Thanks Pete. There was some interest in the CETIS 09 session about using these techniques in closed environments - it would be interesting I think to get some of these ideas and motives recorded. Will have a ponder….
Let us know once your laptop is completely ready and we'll all come over and build the Semantic Web there. Hope you've crammed plenty of memory into it! ;-)
I think you are right - there does need to be some separation made to clarify things. Time to take stock. I like the four premises ("options"):
Option 1: I agree that "Open" data now is preferable to "Open" "Linked" data later - mostly because "Open" data can be manipulated to behave like "Linked" data and eventually converted to whatever format linking will require - probably RDF.
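The idea that 'open' data can later be converted into whatever format linking requires can be illustrated with a minimal Python sketch that turns a CSV file into N-Triples, one common RDF serialisation. The base URI, the use of one column as the identifier, and the practice of minting one predicate per column are all hypothetical choices made for illustration; a real conversion would more likely reuse established vocabularies, and this sketch assumes the column headers are safe to embed in URIs.

```python
import csv
import io

# Hypothetical base URI for minted identifiers and predicates.
BASE = "http://example.org/id/"


def escape_literal(value):
    """Escape characters that may not appear unescaped in an N-Triples literal."""
    return (value.replace("\\", "\\\\").replace('"', '\\"')
                 .replace("\n", "\\n").replace("\r", "\\r"))


def csv_to_ntriples(text, id_column):
    """Turn each CSV row into N-Triples lines: the id column mints the
    subject URI and every other non-empty column becomes a predicate
    with a plain literal value."""
    lines = []
    for row in csv.DictReader(io.StringIO(text)):
        subject = f"<{BASE}{row[id_column]}>"
        for column, value in row.items():
            # Skip the id column, overflow fields (key None) and empty cells.
            if column is None or column == id_column or not value:
                continue
            predicate = f"<{BASE}property/{column}>"
            lines.append(f'{subject} {predicate} "{escape_literal(value)}" .')
    return lines


data = "id,name,year\nds1,Sea ice extent,2008\n"
print("\n".join(csv_to_ntriples(data, "id")))
```

The conversion is mechanical once the data is available in a usable form, which supports the argument for releasing open data first and linking it later.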
Option 2: Is the approach we're looking into here at BEAM - "Open" is not really an option, at least not for now (confidentiality, protecting the innocent, those kinds of things) but I do not see why a "Linked" and "Semantic (Micro) Web" approach to processing and using it cannot work even when the data is not "Open".
Option 3: I hope that is true. People have been talking about it long enough that if it doesn't happen soon you have to ask if people are voting with their feet!
Option 4: Agreed that the "Semantic Web" requires "Open" "Linked" data, but that does not mean we cannot learn from and use those ideas in closed environments. For reasons too political to dwell on, the Bodleian makes digital resources available offline on a laptop. I've configured this laptop to replicate the Web - it has DNS, DHCP, a Web server and a "Viewer" (all virtual machines) - and the next step is to use "Semantic Web"-like tools and techniques to provide interfaces and queries over the collections.
That is probably a way off yet, but I just wanted to highlight (and you don't suggest that you think the opposite) that "Semantic Web"-like ideas can be used in non-"Open" (in the sense you outline) ways.
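As a rough illustration of "Semantic Web"-like querying that needs no network at all, here is a minimal Python sketch of basic graph pattern matching, the core operation behind a SPARQL query. The collection identifiers are hypothetical, and a real offline setup would more likely run an RDF store with a SPARQL endpoint on the laptop's virtual network; this only shows that the technique does not depend on the data being open.

```python
def match(pattern, triple, bindings):
    """Try to extend bindings so that pattern (whose elements may be
    ?variables) equals triple; return None if they conflict."""
    result = dict(bindings)
    for p, t in zip(pattern, triple):
        if p.startswith("?"):
            # An unbound variable takes the value; a bound one must agree.
            if result.get(p, t) != t:
                return None
            result[p] = t
        elif p != t:
            return None
    return result


def query(triples, patterns):
    """Return every set of bindings satisfying all patterns at once."""
    solutions = [{}]
    for pattern in patterns:
        extended = []
        for bindings in solutions:
            for triple in triples:
                result = match(pattern, triple, bindings)
                if result is not None:
                    extended.append(result)
        solutions = extended
    return solutions


# Hypothetical local collection, held entirely on the offline machine.
collection = [
    ("ex:ms1", "dc:title", "Letters, 1850-1860"),
    ("ex:ms1", "dc:creator", "ex:personA"),
    ("ex:personA", "foaf:name", "A. Writer"),
    ("ex:ms2", "dc:creator", "ex:personB"),
]

# "Which items have a creator whose name we know?"
print(query(collection, [("?item", "dc:creator", "?who"),
                         ("?who", "foaf:name", "?name")]))
```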