Any any any old data

Over on ZDNet, Paul Miller has blogged some thoughts about what he calls the 'Data Cloud'. He points out that in the evolution of the 'cloud computing' paradigm, the:

...emphasis for much of this wider discussion remains firmly rooted in the realm of computation and storage. On many levels it’s about offloading the costs of scaling and maintaining local infrastructure, and ‘data’ doesn’t really enter the conversation at all. Something is ‘stored,’ but it’s a nameless, faceless, shapeless something that merely exists in order to be stored or computed upon.

Initially, Paul posted the germ of this idea to Twitter, where I responded with a degree of scepticism. Having given it a little thought, I remain sceptical. However, I have realised that my own internal ideas of what the 'Cloud' entails have informed my scepticism, so I figure it might be worthwhile externalising these ideas. (Note that Paul has helpfully included in his post a variety of definitions from good sources, so I won't revisit these here.) Like such celebrated memes as 'Web 2.0', the meaning of 'cloud' in this context is delineated by broad consensus, rather than strict definition. Also, I suggest that the term is highly connotative - depending on the exact context within which it is used, it can imply a great deal.

The word itself must surely have come from all those network diagrams which included a cloud to denote the 'great outdoors' - i.e. the stuff beyond the local area network. (I actually remember seeing such a diagram years ago with "here be dragons" written inside the cloud).

Anyway, for what it's worth, here are some of the characteristics which I think are important, and why I disagree (perhaps not very strongly) with Paul.

Remotely hosted

In a literal, basic sense, if services or data are in the cloud, then they are hosted remotely, on someone else's infrastructure. The immediate implication might be that the user also doesn't particularly care, or even know about the details of this arrangement. At one level, this is nothing new - and if the data cloud is just meant to signify data out there, then OK - but this notion is almost as old as computer networking itself, and was certainly present at the birth of the Web. However, the reason that the cloud meme has gained such traction over the last two years lies in the new possibilities for moving not just data, but applications, services and even infrastructure onto remote servers. Closely aligned with the Cloud in this context is Software as a Service (SaaS), which in contemporary terms means the delivery of application-specific functionality from a remote source, typically to a modern browser.


If it's in the Cloud, then it is available anywhere. There are many examples of where this statement could be challenged but there is, nonetheless, an expectation that if an application is delivered to me from the Cloud then I ought to be able to access and use it from any connected device with the requisite software. There is a weaker assumption that the requisite software might be simply a modern web browser.


One of the really interesting developments of recent years has been the introduction of infrastructure services to the Cloud. This moves an important aspect of computing services closer to the 'utility' model. I know which company 'supplies' my electricity because they take large amounts of money off me and regularly send me 'advice' on how to reduce my bill (in case you're wondering, the best advice is to "switch off things which are powered by electricity when you're not using them"). However, I don't know where that electricity is being generated, and frankly, one lot of electricity is much like another, regardless of who supplies it (in the UK at least!). So, I suggest that commodification works best where the commodity is undifferentiated.

The history of computing is filled with examples of evolution towards undifferentiated supply of functionality - abstraction is the method used to achieve this. For example, if I want to run Linux on my servers, then I can use a variety of hardware without having to worry much about which. If I pay someone else to provide me with Linux servers in the Cloud (this blog is running on one such), then I can get away with not even knowing the specifics of the hardware which hosts my system. To an extent, in trusting your infrastructure to a third party, you are saying, "I trust you, look after this lot for me please and don't bother me with the details". In fact, we have now reached the point, with services such as Amazon's EC2, where we can say, "I'd like some computing power please - any old cycles will do".

And right here is why I think I disagree with Paul. If you believe, as I do, that the Cloud implies a move towards undifferentiated, commodified hardware and services, then I don't see how to include data, or at least most data. How often do you hear a user say, "I'd like some data please - any old data will do"? The value of data is often measured in terms of scarcity, provenance, authority and quality.

When Paul describes data as a:

nameless, faceless, shapeless something that merely exists in order to be stored or computed upon.

I think he's right - this is how data is represented in the Cloud. Where we differ, I guess, is that I think that this is a reasonable and useful way for the Cloud to treat data - it allows the Cloud to become ubiquitous and undifferentiated, freeing up our time to concentrate on what we really care about - our data. I'll end with a song... Any old iron, any old iron, any any any old iron...