professional reflections


BL Labs and AHRC Digital Transformations

Last week I attended an event at the British Library's Centre for Conservation, organised by BL Labs working with the Arts and Humanities Research Council. Billed as a showcase for British Library Labs and AHRC Digital Transformations, it consisted of a packed series of presentations - I won't describe them all (and I missed Bill Thompson's talk anyway) but will, instead, pull out some snippets which particularly interested me.

BL Labs

In introducing the event, Caroline Brazier (Director, Scholarship and Collections at the British Library) described the purpose of BL Labs as being an investigation into what happens when the British Library's large digital collections are brought together with enquiring minds in pursuit of digital scholarship.

Later on, Mahendra Mahey (a former colleague of mine), who manages the BL Labs project, added some details: BL Labs is a two-year, Mellon-funded project to conduct research and development "both with and across" the British Library's digital collections. It does no digitisation itself, working only with existing digital collections. BL Labs has already worked with a variety of people, from winners of a competition designed to solicit new ideas to others who have simply turned up with an interest in working with the data. BL Labs is very interested in figuring out how the British Library can work with digital scholars and, in pursuing this, also aims to discover new approaches to digital scholarship.

An interesting aspect to BL Labs has been their approach to selecting which of the 600+ digital collections they would work with. Limited resources meant that it would have been impractical to try to support people working with any and all of these. The criteria used to filter these to a more manageable subset were, in order:

  • whether or not the collection was copyright-cleared
  • whether or not the collection was or had been curated (the project team felt the need to understand how the collection had been formed/developed)
  • the state of available metadata for the collection
  • the accessibility of the collection

Mahendra emphasised two important lessons that the BL Labs team have learned as a result of their activities:

  • the importance of getting the curators of the collections involved in the BL Labs competition and events in order to properly understand and present the data
  • the importance of releasing the data as early as possible since, as researchers start to engage with the data, their research questions tend to change.

Later on Adam Farquhar (Head of Digital Scholarship at the BL) emphasised how the BL now has Digital Curators.

An aspiration of the BL Labs team is to develop a 'top-level' gateway to BL digital collections at http://data.bl.uk, but this does not yet exist.

Mahendra's colleague, Ben O'Steen (Technical Lead at BL Labs) briefly demonstrated his innovative Mechanical Curator. Some years ago, Microsoft partnered with the BL to digitise 65,000 books. The majority of this collection is from the late 19th century, and the data is now public domain. The Mechanical Curator extracts "small illustrations and ornamentations" from these books and publishes them on a blog, creating a collection of often overlooked resources. You can read more about how this works on the BL's Digital Scholarship blog.
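
Out of interest, the sketch below gives a flavour of how that kind of extraction can be approached in general - finding large, dense regions on a scanned page image and cropping them out. This is only my guess at the general technique, not a description of how the Mechanical Curator actually works; it assumes OpenCV (4.x) and a hypothetical scanned page image.

```python
# A rough sketch of extracting candidate illustrations from a scanned page by
# finding large connected regions of 'ink'. This is a generic approach, not
# the Mechanical Curator's actual method. Assumes OpenCV 4.x; "page.png" is a
# hypothetical scan.
import cv2

page = cv2.imread("page.png", cv2.IMREAD_GRAYSCALE)

# Invert and threshold so that printed marks become white blobs on black
_, binary = cv2.threshold(page, 0, 255, cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)

# Dilate to join nearby marks, so an illustration is picked up as one region
kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (15, 15))
joined = cv2.dilate(binary, kernel)

contours, _ = cv2.findContours(joined, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)

for i, contour in enumerate(contours):
    x, y, w, h = cv2.boundingRect(contour)
    # Keep only regions big enough to be illustrations rather than lines of text
    if w > 200 and h > 200:
        cv2.imwrite(f"illustration_{i}.png", page[y:y + h, x:x + w])
```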

Scholarship driving the development of new tools

Professor Andrew Prescott gave an interesting talk on how the arts and humanities are being transformed through digital scholarship. He used the example of a "biblical concordance" from the fourteenth century - essentially an early index allowing one to look up terms and find their occurrences throughout the text (apparently this innovation helped drive the adoption of numbered 'verses' in the Bible). The point Andrew illustrated is that this represented an enormous advance in scholarship. He invited us to consider how we ought to characterise such intellectual output, describing it as:

an enormous scholarly achievement in itself, but seen as a tool - wide ranging in its impact, but difficult to pin down.

Andrew went on to suggest that as the humanities and arts embrace digital scholarship, we should anticipate and encourage the development of new tools and approaches. He contrasted the sort of approaches taken to studying a letter from Gladstone to Disraeli with the available archive of email messages - some 200 million of them - in the George W Bush Presidential Library. According to Andrew, there is no longer an easily identifiable set of methods that one might apply to scholarship in the digital humanities and arts: there is an increasing variety of formats, other disciplines are being introduced, and techniques are now ad hoc. He also pointed to a stronger connection to "practice-led" research, especially in the arts.

Andrew's presentation was illustrated with plenty of interesting projects. I'll briefly note a few (those which piqued my interest) here:

Andrew's slides are available on Slideshare.

The Digital Panopticon

Sharon Howard (University of Sheffield) introduced this new four-year international project, which will examine the global impact of London punishments between 1780 and 1925. Using Jeremy Bentham's Panopticon as a neat metaphor for the project, Sharon described how two systems of punishment ran concurrently in this period: criminals could expect to be sentenced to transportation for seven years to life, or they could face a shorter incarceration in a London gaol. The Digital Panopticon will create a digital laboratory for investigating "power and human response", using criminal records.

Sharon quoted Michel Foucault, who said:

The Panopticon may even provide an apparatus for supervising its own mechanisms…

I'm not quite sure what this will mean in practice, but I am intrigued by the idea of applying the all-seeing, monitoring eye of the Panopticon to human behaviour in this way, using historic data rather than real-time surveillance.

Conclusion

It was interesting to see how the British Library is experimenting with digital curation. I have been aware of the growing interest and momentum in the so-called digital humanities for some time. The projects described at the showcase event illustrated the range, extent and potential of the exploitation of structured data in digital scholarship in the humanities and arts. The significant message for me is that data changes scholarship very directly: as data becomes available, the research questions themselves change.

Thanks to Mahendra, Ben and the rest of the BL staff who helped put on an interesting event.


Joining EDINA

I'm delighted to announce that I will be joining EDINA in August as Head of Technology Strategy and Planning.

I have long admired EDINA, having had several opportunities to collaborate with them in the past few years. EDINA is a powerhouse of technical service delivery and innovation, and has carved out an enviable national and international reputation in several fields. I'm excited to be joining this successful and innovative organisation, and am looking forward to what I have no doubt will be a challenging role.

Rather than moving to Edinburgh, I'll be based in a Bristol office, but I expect to be visiting the mothership on a frequent basis.

The past year at UKOLN has been a difficult one for all of us there. Like me, many of my colleagues have found new positions, while others are actively searching for opportunities. Although UKOLN is greatly diminished by the loss of many talented and committed staff, the deep expertise and knowledge remains with them individually, and I hope that I may be able to collaborate with some in future. I have benefitted enormously from six years of association and collaboration with such colleagues. I would like to thank them for their support, and to wish them well for the future, wherever that may be.


Call for feedback to the ResourceSync specification for synchronisation of web resources

I have been slightly involved (through Jisc funding) with the ResourceSync specification project, led by Herbert Van de Sompel of the Los Alamos National Laboratory. The project has just released a draft specification, which is available at http://www.openarchives.org/rs/.

The draft will be available for public comment until March 15th 2013 - you are invited to comment via the ResourceSync Google Group. Group discussions are openly accessible; posting requires group membership.

In Herbert's words:

The ResourceSync specification describes a synchronisation framework for the web that consists of various capabilities that allow third party systems to remain synchronised with a server's evolving resources. The capabilities may be combined in a modular manner to meet local or community requirements. The specification also describes how a server can advertise the synchronisation capabilities it supports and how third party systems can discover this information. The document formats used in the synchronisation framework are based on the widely adopted Sitemap protocol.

ResourceSync is a collaboration between the National Information Standards Organization (NISO) and the Open Archives Initiative (OAI). It is funded by the Alfred P. Sloan Foundation and Jisc.
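
To give a flavour of the general idea, here is a minimal sketch (Python, standard library only) of a client polling a Sitemap-style resource list and noting which resources are new or have changed since its last visit. The resource list URL and the local state are hypothetical, and the draft specification layers richer capabilities (such as change lists) on the same Sitemap-based formats.

```python
# A minimal sketch of a client staying in sync with a server's resources by
# polling a Sitemap-style resource list. The URL and local state are
# hypothetical; the draft specification defines richer capabilities than this.
import urllib.request
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
RESOURCE_LIST_URL = "http://example.org/dataset1/resourcelist.xml"  # hypothetical

# What we already hold locally: resource URI -> lastmod value from the last visit
local_state = {"http://example.org/dataset1/res1": "2013-01-02T17:00:00Z"}

with urllib.request.urlopen(RESOURCE_LIST_URL) as response:
    tree = ET.parse(response)

for url in tree.getroot().findall(f"{{{SITEMAP_NS}}}url"):
    loc = url.findtext(f"{{{SITEMAP_NS}}}loc")
    lastmod = url.findtext(f"{{{SITEMAP_NS}}}lastmod")
    if loc not in local_state:
        print("new resource:", loc)
    elif lastmod and lastmod > local_state[loc]:
        print("changed resource:", loc)
```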


RIOXX application profile - draft 1

Together with Sheridan Brown, I have been tasked with developing some guidelines and a metadata ‘application’ profile for institutional repositories (IRs) in the UK. We are calling this work RIOXX. This post focusses on the application profile more than the guidelines, and describes phase 1 of the project, which aims to deploy this application profile across IRs in the UK by the first quarter of 2013.

Objectives

  • to develop an application profile which enables open access repositories to expose metadata more consistently and which, in particular, conveys information about how the item being described in the metadata was funded
  • to develop general guidelines for repositories which support the use of the application profile
  • to support such technical development as is necessary to implement these recommendations and the application profile in common repository platforms
  • to develop these such that they pave the way for a likely CERIF-based solution in the medium-long term.

Scope and approach

Funder policy regarding Open Access (OA) is being actively developed and the OA landscape is shifting. The emphasis in this phase of RIOXX is to do something adequate which can be implemented quickly. This work will provide an application profile and guidelines which are inherently an interim solution. Broadly speaking, the approach we are taking is as follows:

Develop the simplest possible application profile, based on Dublin Core (DC).

Pretty much all repositories support DC: OAI-DC, another application profile of DC, is the mandated minimum metadata format for OAI-PMH, the ubiquitous protocol for harvesting metadata from repositories. If all goes well, the development work needed for repository systems should be minimal.
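
For anyone not familiar with OAI-PMH, the following minimal sketch shows what harvesting simple DC (oai_dc) records looks like from the client side. The repository base URL is hypothetical; the verb, parameters and namespaces are those of standard OAI-PMH and OAI-DC.

```python
# A minimal sketch of harvesting simple Dublin Core (oai_dc) records over
# OAI-PMH. The base URL is hypothetical; the verb and namespaces are standard.
import urllib.request
import xml.etree.ElementTree as ET

BASE_URL = "http://repository.example.ac.uk/oai"  # hypothetical repository endpoint
NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "dc": "http://purl.org/dc/elements/1.1/",
}

request_url = BASE_URL + "?verb=ListRecords&metadataPrefix=oai_dc"
with urllib.request.urlopen(request_url) as response:
    root = ET.parse(response).getroot()

for record in root.findall(".//oai:record", NS):
    title = record.findtext(".//dc:title", default="(no title)", namespaces=NS)
    identifier = record.findtext(".//dc:identifier", default="", namespaces=NS)
    print(title, "->", identifier)
```

A real harvester would also handle OAI-PMH resumption tokens for large result sets, but the shape of the exchange is as simple as this.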

We have examined two related initiatives: the OpenAIRE guidelines (and the Driver guidelines which preceded these), and the EThOS Toolkit which developed an application profile of DC for eTheses.

Consider a CERIF-XML expression of this application profile

The interest in CERIF as the de facto standard format for exchanging this kind of information between systems is growing steadily. We are liaising with the CERIF Support Project and ensuring that a transition towards a CERIF-based approach remains viable.

Develop a modelled, expressive application profile

In later phases of RIOXX, we hope to develop the application profile more fully. This will take into account such things as:

  • greater use of controlled vocabularies
  • a move away from DC and towards CERIF
  • greater involvement of systems other than repositories - notably Current Research Information Systems (CRIS)
  • modelling of ‘access-level semantics’ - i.e. describing how, where and under what license or conditions the resource might be accessed and used

Rationale for some decisions in phase 1

Keeping things very simple

Timescales are very, very tight. From a pragmatic, technical point of view we have restricted ourselves in this phase to developing an approach which allows the repository to emit RIOXX records based on information properties already catered for in the repository system (that is, the placeholders for Sponsor and ProjectID already being there, even if the actual data has not yet been entered). We have deferred a more complete and complex approach to a later phase because the capacity to deliver this kind of information from institutional systems is developing rapidly.

The ProjectID property

We found ourselves unable simply to adopt the OpenAIRE guidelines, as these mandate a particular syntax for the ProjectID (designed for EC-funded projects) which would preclude certain UK funders. In any case, we consider it a mistake to embed semantics into this property and believe it is best provided as a globally-unique, opaque identifier. To this end, we are actively looking at the possibility of funders minting DOIs for the ProjectID. In the meantime, we will require that the ProjectID be whatever identifier is provided by the funder of the output being described in the record. We have chosen the term ProjectID rather than, for example, GrantID, as we have been advised that the former is the more widely used term in the UK.

The Sponsor property

For phase 1 we are mandating this property, but specifying only that a recognised form of identifier for the funder/sponsor be used. This will mean a free-text string for now. We are actively exploring possibilities for identifying and then mandating a particular authority list of funder names, such that this property becomes underpinned by a controlled vocabulary. However, this will not make it into phase 1. This property, while essential in the short term, might become more of a convenience than a necessity, as the ProjectID becomes more reliably ‘actionable’. In the medium-term, we would anticipate being able to reliably derive the sponsor/funder from the ProjectID. For this reason, we have not modelled the relationship between these two properties closely - except insofar as they exist in a particular record. This means that some records may contain more than one Sponsor and more than one ProjectID with no direct way to relate a given ProjectID to a given Sponsor. While it would be possible to model this relationship, we have chosen not to do so in this phase, because:

  • it is not the common case that a record would have more than one Sponsor
  • it is more likely that a record might have more than one ProjectID, but only one Sponsor. This happens where a project has multiple versions - such as when the PI moves institution during the project.
  • it is unlikely that current repository systems will be able to provide more richly modelled relationships between these properties without further development
  • it is the common case that a record will have one Sponsor and one ProjectID.

We anticipate that this will need to be modelled more thoroughly in future phases.

Deferring the ‘access-level-semantics’ question

In order to convey the precise nature of the open-access ‘state’ of a resource, RIOXX will need to develop a richer way of describing such concepts as ‘green’ or ‘gold’ open access, embargoes, licenses etc. The use-cases and operations which will depend on such information are not yet clear and, while the time has now come to model these, this should not be done in a hurry.

The following is a table of proposed elements and recommended formats. We propose to extend the Dublin Core elements with two new elements under the rioxxterms namespace.

  • M: Mandated
  • R: Recommended
  • O: Optional
Element | Inclusion (M/R/O) | Format | Format (M/R/O)
dc:title | M | Free text. It is recommended to use the form Title:Subtitle | R
dc:creator | M | Free text. Recommended practice is to either use the form Last Name, First Name(s) or a unique identifier from a recognised system. Each creator should be given a separate dc:creator element | R
dc:identifier | M | A globally unique identifier. It is strongly recommended to use a URI which can be de-referenced (i.e. is 'actionable') where this is appropriate | R
dc:source | M | Journal title, reference or ISSN | M
dc:language | M | Use ISO 639-3 language codes | M
rioxxterms.projectid | M | Use the identifier provided by the funder to indicate the project within which this output has been created | M
dc:coverage | O | The extent or scope of the content of the resource. Coverage will typically include spatial location (a place name or geographic co-ordinates), temporal period (a period label, date or date range) or jurisdiction (such as a named administrative entity). |
dc:rights | O | No agreed vocabulary or semantics exist for this in the context of Open Access papers, and it is common practice for this to be ignored by repositories currently. Some work is being funded to look at this area for the next phase of RIOXX. For now, this element has to be optional. |
dc:audience | O | Free text. |
dc:format | R | It is recommended to use the IANA registered list of Internet Media Types (MIME types) | M
dc:date | M | One date using ISO 8601. Published date is the default and recommended interpretation. | M
dc:type | O | This is currently free text and an optional element. However, RIOXX phase 1 will be recommending that a vocabulary be adopted or developed for this element. | O
dc:contributor | O | (as for dc:creator) |
rioxxterms.sponsor | M | Free text - Funder name using the funder's preferred format | O
dc:publisher | R | Free text indicating the name of the publisher (commercial or non-commercial) | O
dc:description | R | Best practice is to use an English language abstract. | O
dc:subject | R | Recommended best practice is to select a value from a controlled vocabulary or formal classification scheme. E.g. LOC, MESH. | O
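
As a rough illustration of how the table above might be used in practice, the sketch below encodes the inclusion column as data and checks a candidate record against it. The element names are taken from the table; the record representation, example values and function are purely illustrative.

```python
# A rough sketch of checking a candidate RIOXX record against the inclusion
# rules in the table above (M = mandated, R = recommended, O = optional).
# The record representation, example values and function are illustrative only.
INCLUSION = {
    "dc:title": "M", "dc:creator": "M", "dc:identifier": "M", "dc:source": "M",
    "dc:language": "M", "rioxxterms.projectid": "M", "rioxxterms.sponsor": "M",
    "dc:date": "M",
    "dc:format": "R", "dc:publisher": "R", "dc:description": "R", "dc:subject": "R",
    "dc:coverage": "O", "dc:rights": "O", "dc:audience": "O", "dc:type": "O",
    "dc:contributor": "O",
}

def check_record(record):
    """Return lists of missing mandated and missing recommended elements."""
    missing_mandated = [e for e, rule in INCLUSION.items()
                        if rule == "M" and not record.get(e)]
    missing_recommended = [e for e, rule in INCLUSION.items()
                           if rule == "R" and not record.get(e)]
    return missing_mandated, missing_recommended

example = {
    "dc:title": "A study of metadata application profiles",
    "dc:creator": ["Walk, Paul"],
    "dc:identifier": "http://repository.example.ac.uk/id/eprint/1234",
    "dc:source": "Journal of Hypothetical Studies",
    "dc:language": "eng",
    "rioxxterms.projectid": "EP/X012345/1",
    "rioxxterms.sponsor": "Engineering and Physical Sciences Research Council",
    "dc:date": "2012-11-01",
}

print(check_record(example))  # -> ([], ['dc:format', 'dc:publisher', ...])
```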

I would appreciate any comments people might have about the technical aspects of this.


Web preservation - a minor anecdote

I have recently resurrected a domain I used to use actively - sockdrawer.org. I started blogging on this site in about 2003 and stopped using it in about 2007. I only started using it again this week because I needed a spare domain name and discovered I was still paying for this one…. Anyway, having installed a new web server which listens on www.sockdrawer.org, I had cause to examine the server logs. I was surprised to find this line:

[Sun May 20 09:06:21 2012] [error] File does not exist: /opt/web/sockdrawer.org/public/blog, referer: http://www.jroller.com/rickard/entry/word_to_html_in_java

As my server had only been up for 24 hours after a five-year hiatus, this is immediate evidence of some interest, however limited, in some of the content that was once here. And I don't have an archive :-( I have actually found this missing resource - a blog post - archived on the Internet Archive, and there are a few other resources there from the same blog/website (the majority are gone for good, I imagine). I could, if I thought it worth the effort, rebuild the original resources from the Internet Archive versions. All of this has, at least, made me think a little about web preservation at a personal level.
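
For what it's worth, checking whether the Internet Archive holds a copy of a given URL can be automated. The sketch below uses the Wayback Machine's availability endpoint; if I ever did decide to rebuild the old content, something along these lines would be the starting point.

```python
# A small sketch: ask the Wayback Machine whether it holds a snapshot of a URL.
# Uses the archive.org availability endpoint; the URL being checked is the old
# blog address referred to in the log line above.
import json
import urllib.parse
import urllib.request

def closest_snapshot(url):
    query = urllib.parse.urlencode({"url": url})
    with urllib.request.urlopen("https://archive.org/wayback/available?" + query) as resp:
        data = json.load(resp)
    snapshot = data.get("archived_snapshots", {}).get("closest")
    return snapshot["url"] if snapshot else None

print(closest_snapshot("http://www.sockdrawer.org/blog"))
```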


Library systems of the future

Edit: The presentation I gave to accompany this post is available on Slideshare

I was asked by Ben Showers of the JISC to write a ‘challenging and provocative vision’ for library management systems, for a joint JISC / SCONUL workshop. I was given a free hand with this - the only parameters were that the piece should be no more than a side of A4 paper in length, and that it should use 2020 as its target year for prediction. I think I ignored both of these restrictions, but I had fun and it did provoke some discussion….

Dramatis personae:

  • Alby, a young student & researcher in full time employment
  • Charlotte, a venerable librarian
  • Bob, Dan and Eva, semi-autonomous software agents

Following the unprecedented Conservative ‘walk-over’ election victory of 2015 and the subsequent consolidation in 2019, the landscape of higher-education in the UK is all but unrecognisable. The free market dominates the buying and selling of courses, and the provisioning of learning and research resources has, in the end, simply had to follow suit. Copyright has mostly been ‘fixed’ in the virtual world through a combination of an adjustment to more modest expectations of compensation for copyright holders, workable systems to control distribution, and global agreements allowing extradition and prosecution.

The student researcher (1)

Alby works, full time, as a software engineer. As part of his job, he is given some time to pursue research topics of interest to him and to his employer. His firm gives him a small budget to support this. In the evenings he studies part-time for the new Masters++ qualification. He is enrolled at three universities, visiting one of these - the local George Osbourne University (GOU) - every Thursday evening. He finances all of this himself.

On Monday evening, when Alby gets home, he goes straight to his laptop and works through all the notes he has dictated into his smart phone during the day. He has become interested in the evolution of library systems and wants to register this interest on the Research Interest Grid (RIG). While recording notes into his phone, he has also published some of these into StreamingConscious, the latest social network to become popular with researchers, and has gained a few new connections from people with aligned interests, including a promising one with a subject librarian at GOU.

Alby then invokes his Foraging Agent, ‘Bob’. A license for Bob was given to him by a publisher, Coyote, which specialises in resources for software engineers, in return for sending him a steady stream of advertisements. Alby adopted Bob because he liked its interface, but he suspects it has in-built biases towards certain, commercial information sources. He believes that he compensates for this by carefully defining his research questions in Research Question Format (RQF) and filtering the results.

Bob runs constantly on Alby’s ‘slice’ - a portion of Personal Cloud (PC) infrastructure provided by a well-known supermarket chain. After a series of questions and answers, Bob is armed with three carefully defined RQF research questions, and a set of parameters, such as when to report back, and how much of Alby’s research budget to spend on a single transaction before asking him for approval. Bob has learned through observation how Alby likes to work. It knows him in a sense, enough to represent his interests when dealing with other agents. Alby then instructs Bob to begin searching, negotiating and shopping for answers, leads and recommendations, while he gets on with some reading. Alby has grown to trust Bob.

The Librarian

Charlotte is a subject librarian with many years’ experience (she tried to retire three years ago but has been forced to come back to work), specialising in software & systems engineering, and currently working for George Osbourne University. On Tuesday morning she checks the reports from her Listening Agents over breakfast. She controls several agents running on the library’s slice of the GOU cloud.

Bob, an agent representing someone called Alby, has made contact, coincidentally, with two of her agents - one which represents GOU and which reports to her, and her own personal agent, Eva. Only yesterday, BirdSong (a social network monitoring agent) had suggested that she connect with @alby on StreamingConscious based on their mutual interest in the history of LMS systems. Charlotte’s interest in LMS systems is partly fuelled by nostalgia - she has been working with such systems for more than thirty years.

She sees that Dan, the GOU agent, has supplied Bob with material to which Alby is automatically entitled, and has automatically reserved two books from the local GOU collection for him. In so doing, Dan recommends to Charlotte the purchase of a newer edition of one of these textbooks.

Dan has also made a number of offers to Bob of more restricted material which can be supplied at a cost, including 3 inter-library-loans. Bob has accepted one of these paid-for items on Alby’s behalf and Charlotte is happy to see that it has also observed the protocol of explaining why it has not accepted the others. In one case, she sees that Bob was successful in bidding on eBay for a second-hand copy of a book which Dan had offered as an ILL. Bob has also made an offer to Dan for ownership of the book, once Alby has finished with it, in return for one free ILL. Dan needs Charlotte to approve this. However, she declines, knowing the book to be flawed, despite its 4 star popularity rating. Dan registers this decision, quietly blacklists the book against any future recommendation, and reports this decision to Bob.

Dan notes that Bob has also registered a second book on Alby’s personal virtual book-shelf and indicated a willingness to make this available to the GOU circulation agent for loan to other GOU students as part of the ‘Support Your Library’ protocol, in return for one free ILL token. Charlotte accepts this offer.

Charlotte instructs Dan to negotiate with Bob to arrange a meeting over coffee for Alby and herself. She does this partly because Eva has separately registered Alby’s interest on the RIG and it seems worthwhile meeting with Alby in person to discuss his research. She decides to investigate a couple of other suggestions thrown up by Dan in the meantime. She also notes that Dan has suggested a couple of other contacts to Bob - other people who are enrolled at GOU and whom Alby may wish to befriend on StreamingConscious - as part of a strategy to reinforce the local GOU social network of students and researchers.

The student researcher (2)

Later on Tuesday morning, Alby wakes to find an interesting report from Bob waiting for him. He discovers he is the proud owner of a new book on LMS system design and is pleased to note that it has a four star rating - one star above the threshold he has set in Bob’s book-buying decision parameters.

Bob has, inevitably, also turned up a few offers of information and resources from the ‘invisible market’. He knows that if you have the right connections, you can get just about any book in ePub5 format. The penalties for possession of an illegally obtained, copyrighted resource are stiff, however. Although it is not illegal, he is also a little wary of using Turpin, the global federation of Open Access papers and other resources, as he has been culturally conditioned to be suspicious of things which appear to be ‘free’.

He also finds a tentative appointment in his diary for coffee with @charlotte, the subject librarian with whom he connected yesterday on StreamingConscious. As he works close by the university, he accepts the appointment. He can pick up his reservations while he’s there.

Face to face, later that morning

Alby finally puts his pen down, and takes a swig of his coffee. He has been writing furiously for half an hour. Charlotte has just taken him on a whirlwind tour of the evolution of the LMS.

She has described how the library has learned, over the last decade, that client relationship management (CRM) is crucial to its mission. Adjusting to the new realities of social networking and global search, the LMS has become a distributed and loosely-coupled collection of processes, all designed to help connect people with resources and with each other.

Alby learns how the rapid introduction of semi-autonomous software agents into research practice took many by surprise. Although the concepts were not new, and much of the technology existed in one form or another, it took the confluence of a number of factors to finally introduce agent-mediated research:

  • the acceptance of an ‘always online’ culture brought about through the ubiquity of smart phones, the prevalence of global social networks and the move from the desktop to cloud-based processes
  • the utter complexity of negotiating through ‘permission stacks’ to determine whether or not an individual has the rights to access a given resource in a given context
  • the complexity of relationships between individuals and institutions

Charlotte explains how, from having been a destination for local researchers, the LMS has dissolved into the fabric of a vast, distributed network of research interests, library collections, national, private and open resources.

While the curation of local collections remains important, the facilitation of networking, and the handling of transactions, both social and financial, has taken over as the focus of the LMS. She points out that where once it was quite easy to point to the LMS - at least as a line in a budget sheet - it has become somewhat nebulous in recent years. The LMS has become the coffee-shop of cyberspace, where software agents meet to compare notes, register interests, make deals….

Taking a sip of her peppermint tea, Charlotte sighs as she remembers how simple it all once was.


Responsive innovation - change management in a recession

Back in August I gave a short presentation to the JISC Innovation Group about the DevCSI project, introducing some ideas about possible future directions. The DevCSI project is a JISC-funded initiative designed to work directly with (software) developers in Higher Education through the general approach of encouraging them to establish a community of peers, sharing knowledge, experience, code etc. An aspect of this which has emerged during the first year of the project is the potential value in peer-training - where one developer trains a few of their peers. By supporting this kind of activity as an 'add-on' to larger events, we seem to have hit on a way to deliver extremely cost-effective training to (and, importantly, by) the sector's developers (we've done some work to calculate the financial value of this). DevCSI, then, provides a channel through which the sector, represented by JISC and UKOLN, can invest in its developers.

In recent years, JISC has invested in some development programmes based around an approach labelled Rapid Innovation. Rapid Innovation, in this context, described an approach of investment in small, short, cheap development projects designed to 'scratch an itch'. There was more than an echo of the Agile Manifesto in this approach. The Rapid Innovation projects tended to show the following characteristics:

  • they brought developers more to the fore
  • they produced lighter, more frequent documentation
  • they produced working code very early in the process
  • they involved end-users directly, and throughout the project

The early work of DevCSI has been informed by this work - notably in the increased awareness and adoption of agile development methodologies.

So why is this important? The radical changes currently being introduced to the economic and political landscape around higher education in the UK are forcing universities and colleges to re-examine themselves as 'businesses'. With the growing interest in commodified hardware and software and remote software as a service (SaaS) options for service delivery, HEIs need to examine how they can best exploit these opportunities. (The JISC's Flexible Service Delivery Programme has been established to help institutions in this.) While HEIs will have differing levels and types of interest in what are being referred to as cloud services, they are generally going to be searching for efficiency-based savings. The value proposition of financial cost-reduction from using shared services is something which cannot be ignored by HEIs - but it seems to me that there are some things which need to be borne in mind:

  • the biggest saving in cloud-based services is to the supplier, not the customer (although the supplier will pass on some of this saving)
  • this whole approach is not yet well understood - especially how SaaS sits with an 'enterprise' service oriented architecture (SOA) approach which is also of interest to some HEIs
  • some services can be outsourced more easily, or to greater benefit, than others

In The role of the central IT Services organisation in a Web 2.0 world, Joe Nicholls and David I Harrison introduce the useful characterisation of services being either chore or core. Making use of SaaS is a form of outsourcing, and outsourcing is a tricky thing to get right. There are arguments for outsourcing those things you have to do but have no special interest in (e.g. HEIs frequently outsource their catering operation). In the ICT service context such services might include the various administration systems which all HEIs need to operate (e.g. finance). These we might call chore services. However, another reason for outsourcing is a lack of capacity or expertise to deliver a service internally - whether or not that is the preferred option. Services which are core to the HEI's business might fall into this category occasionally - even if this is not ideal. In a recession, with drastically reduced funding, HEIs might see more core services become unsustainable - or indeed need to reconsider what is core in the first place. Normally, business decisions of this sort are not so simply binary, and some complex judgement will need to be made.

Inevitably, the growing opportunity for outsourcing ICT services will be appealing to many HEIs - whether those services are outsourced to generic or specialist commercial suppliers, or to HE-sector-based consortia such as the Kuali Foundation. But outsourcing can introduce hidden costs. A lessening of control is one obvious concern. But a more insidious risk introduced by an enthusiastic embracing of outsourced services is a temptation to start to regard the maintenance of local development expertise as a luxury. After all, if we're going to outsource our ICT, why do we need to retain technical staff and, especially, developers? ICT is just a commodity, right?

Well, no. I think it is a mistake to lose sight of the advantages that come from a local capacity to perform and deal with technical innovation. A local or 'in house' development capacity is a valuable resource in the normal run of things. In a recession, it is vital. The successful organisation will use a recession to examine its business and to change in order to be ready to fully exploit the economic recovery, when it comes. And large organisations are getting better at preparing themselves to be able to innovate internally or locally. Scott Anthony, who has worked with Clayton Christensen, who coined the expression "disruptive innovation", lists some principles which inform an organisation's ability to engage in innovation:

  • Put the customer, and their important, unsatisfied job-to-be-done at the center of the innovation equation
  • Embrace the power of simplicity, convenience, and affordability
  • Create organizational space for disruptive growth businesses
  • Consider innovation levers beyond features and functions
  • Become world class at testing, iterating and adjusting

(I'm not entirely enamoured of the 'disruptive innovation' label - as my colleague Brian Kelly pointed out at the recent CETIS Conference, the HEI sector is receiving plenty of 'disruption' right now from political forces - certainly enough to encourage innovation!)

In Whither Innovation, Adam Cooper of CETIS asks: "Could we leave innovation to the commercial sector and buy it in?". Answering his own question, he quotes Cohen and Levinthal (1990), who introduce the term absorptive capacity, describing:

…a model of firm investment in research and development (R&D), in which R&D contributes to a firm’s absorptive capacity….

I see a direct parallel between outsourcing too much, and losing the absorptive capacity necessary to respond to change and to innovate to meet new challenges.

In my talk to the JISC Innovation Group, I presented a diagram which tries to express the role of the local developer as an agent enabling and supporting change in an HEI. The developer deals with the remote, outsourced ICT system at a technical level, becoming one route through which the HEI ensures it gets the best possible value out of this arrangement. Remote services are, nowadays, guaranteed to offer some sort of application programming interface (API) which allows the more technically capable customer to tailor the service to their needs, rather than simply being obliged to use an undifferentiated, default user-interface for example. Local developers are increasingly networked with their peers in other HEIs (not least because of the efforts of the DevCSI project), so they become quite powerful in being able to exploit commonly used remote services through the free sharing of knowledge, technique and even code. And because local developers are, in some cases, embracing a more agile approach to development, they become the conduit through which the end-user expresses their needs to make the remote, shared service better fit their local, idiosyncratic needs. Developers can become surprisingly aware of 'business' processes and information flows through an HEI, as they have to deal with them at several levels (I wrote about experiences of this sort in a previous post, SOA and reusable knowledge).

I see an opportunity for the DevCSI project to focus its efforts on this aspect of change within our HEIs. Change management is going to be crucial for HEIs as they redefine what is core and what is chore, as they decide what they can do best, and what can be best done for them by others. They are going to need a capable, knowledgeable and above all agile capacity to innovate to meet new business challenges and a changed ICT environment. I've taken to using the label responsive innovation to describe the act of dealing with or instigating technical change in a manner which advances the core mission of the institution. Developers are not the only part of the solution, but they are a vital part. Not only do HEIs need to hang on to their best developers, they need to invest in them, if they are to manage change and not be managed by the changes being imposed on them. Developers are core.


Institutions and the Web done better

Introduction - (warning - old-timer indulgence)

From the mid-nineties through to the end of 2006 I earned my living as a developer of Web applications, or as someone managing Web application development projects. I like to think I was quite good at it, and I certainly have a lot of experience. I worked with CGI, writing in Perl and a little C, moved into ColdFusion and Java (via JServ - anyone remember that?), did the whole Java EE thing, undid it again, did SOAP because it was better than J2EE, undid that when we realised it actually wasn't…. In about 2002/3 I adopted a RESTful approach to building Web-based intranet applications - and some of those applications are, I believe, still being used. The idea that Web applications should be designed such that the functions flowed around the resources being manipulated, rather than the resources being moved about to enrich the functions, made absolute sense to me. I have not deviated from this general approach since then. In 2006, just before I joined UKOLN, I came across Tom Coates's 'Native to a Web of Data' presentation.

A Web of data

One slide in Tom's presentation really appealed. Very recently I had cause to revisit this, and I began to wonder how it stacked up against current thinking. Over the last couple of years there has been a push to get Linked Data accepted by the mainstream, and there have been arguments over the extent to which this does, or does not, represent a tactic in advancing a Semantic Web agenda. I remain very skeptical about the likelihood of us realising a 'semantic' Web through the application of more and more structure, metadata, ontologies etc., and the aspiration toward a 'giant global graph' of data interests me little. However, even leading figures in the Semantic Web can be pragmatic - Tim Berners-Lee's '5 Stars of Open Linked Data', as reported by Ed Summers, is somewhat less ambitious than the nine instructions of Tom Coates's Native to a Web of Data.

I'm also jaded by the notion of the '(Semantic) Web Done Right'. The 'Web done right' is… the Web we have. That's the beauty of the Web - it works where many distributed information systems have not, by taking a 'good enough' implementation of a really good idea and running with it at massive scale. But, as ever, there is room for improvement - we can, and certainly should, aspire to a Web 'done better'.

From documents to data

The Web to date has been largely oriented towards humans manipulating documents through the use of simple desktop tools. Until relatively recently, this was mostly a read-only experience. However, it has been clear for some time that, when content is made available in some sort of machine-readable form, it lends itself to being re-used, especially through being combined with other machine-readable content. This echoes the experience of the document-oriented Web, where it soon became apparent that there was much value to be added by bringing documents together through linking. The data-oriented Web takes this a step further: the linking is still very important, but with machine-readable content in the form of data, the possibility exists to process the content remotely, after it has been published, to merge/change/enhance/annotate/re-format it. Recent years have shown how the Web can function as a platform for building distributed systems through the rise of the 'mashup' as an approach to building simple point-to-point services.

The institutional context

So, what does this mean for the Higher Education Institution (HEI)? The HEI tends to already have a large amount of Web content. An HEI of any size will also maintain significant databases of structured information. In more recent years, HEIs have adopted content management systems (CMS) of one sort or another, to manage loosely structured content. In some cases, such CMS systems are also used to expose structured information from back-end databases. It is still rare, however, for a typical institutional Web Team, using a standard CMS, to pay much attention to the sort of instructions listed above. HEI Web Teams tend to work in terms of 'information architectures', which often follow organisational structures primarily. Their tools, processes, and expectations from senior management make this the sensible approach. However, this tends to mean that, periodically, the institutional Web site will be re-arranged to re-align with organisational changes. This approach to building an institutional Web site is driven by the imperatives of the document-centric Web. It's about trying to turn a large set of often very disparate documents into a coherent, manageable and navigable whole. The data-oriented Web demands a different approach. The following are a set of pointers to the shift in emphasis that is needed to allow HEIs to participate in the Web of data. These pointers are heavily influenced by Tom Coates's instructions, but I have condensed and rearranged them and tried to put them into the context of the needs of an HEI.

How HEIs can engage with a Web of data

1. Recognise the potential value of the Aggregate Web of data and invest/engage accordingly

The cost of making data available on the Web is falling steadily, as technology and skills improve and the fixed costs of infrastructure are also reduced. The act of making useful data available on the Web does carry a cost, but it also introduces potential benefits. On a simple cost/benefit analysis, it is becoming apparent that we will soon need to justify not making data available on the Web. The 'loss-leader' approach, of making data available speculatively, hoping that someone else will find a use for it to mutual advantage, is one which becomes viable as the costs of doing so become vanishingly small. A lesson learned from open-source software, where the practice of exposing software source-code to 'many eyes' is proven to help in identifying and rectifying mistakes or 'bugs', is applicable to data too. As a general principle, exposing data which can be combined with data from elsewhere is a path to creating new partnerships and collaborative opportunities. 'Useful data' can range from the sorts of research outputs or teaching materials which might already be on the Web, to structured contact details for academics in an institution, to data about rooms, equipment and availability.

As an example, some institutions already exploit one of their assets - meeting and teaching spaces - by renting them out to external users, especially during holiday periods of the year. Making data about these assets openly available, in a rich and structured way, opens up possibilities for others to better exploit these assets, and for the HEI to share the benefits of this. In addition to this, we are witnessing a wholesale cultural shift in the public sector towards opening up publicly-funded information and data to the public which paid for it to be produced. The political momentum behind this cannot be ignored and, while it is focussed on central government departments currently, this focus will inevitably widen to include HEIs.

2. Start designing with data, as well as with pages

The typical CMS is geared towards building Web pages. All modern CMS systems allow content to be managed in 'chunks' smaller than a whole page, such that common headers, sidebars etc. can be re-used across many pages. Nonetheless, the average CMS is ultimately designed to produce Web resources which we would recognise as 'pages'. An HEI's web team will continue to be concerned with the site in terms of pages for human consumption. However, simply by exposing the smaller chunks of information in machine-readable ways, the CMS can become a platform for engaging with the Web of data. My colleague Thom Bunting describes such possibilities, having experimented with one popular CMS, Drupal, in Consuming and producing linked data in a content management system.

3. Develop websites for end-users, developers and software processes

This is a very important principle, and one which is frequently overlooked. Sites which are designed to allow humans to navigate pages are not necessarily accessible to software which might be able to re-process the information in new and useful ways. Widely adopted standard re-presentations of content, such as RSS feeds, have gone some way to mitigating this. But the principle of designing for these different types of user up-front is one which is not yet widely accepted. Developers, especially, are not yet generally regarded as important users - yet for the Web of data to deliver value to data publishers we require developers to build new services which exploit that data. If you make your data available for re-use, it makes absolute sense to consider the needs of those developers you hope will try to exploit it.

A perceived problem with this is that it seems expensive to develop web-sites for different classes of user in this way. After all, the HEI's web team will already be considering several different sub-classes of human user (students, staff, prospective students, alumni etc.). Human end-users will continue to be the priority audience. However, there are strategies for developing websites in such a way that developers and software are not 'disenfranchised'. An approach which marries these concerns at the beginning, rather than bolting on extra interfaces (APIs) for developers and systems later, is preferable if a common 'anti-pattern' is to be avoided. The meteoric rise in popularity of Twitter is in no small part due to its developer-friendly website. Twitter has a simple Application Programming Interface (API) which allows developers to build client applications which use the Twitter service but which add value in some way to end-users. This graph at ReadWriteWeb shows how applications built by third-party developers account for more than half of the usage of Twitter. Some important principles which, if followed, will ensure a website is 'friendly' towards end-users, developers and software are points 4, 5, and 6 below:

4. Identify the important entities and make them addressable, using readable, reliable and hackable URLs

This is crucial - it forms the most important foundation for the Web of data. A traditional, well-designed website will be based on some sort of understood 'information architecture', however simple. The idea of starting with important 'entities' and making sure that they have sensible, managed and reliable identifiers is a somewhat newer approach, yet this is vital for the Web of data to function. The Web of data is, at one level, entirely about identifiers and how they link together. The ability to create ('mint') new identifiers and manage them carefully, such that they are as usable as possible, is a capability which HEI Web teams will need to recognise as important. Identifiers for entities about which the HEI cares will become valuable in their own right in the Web of data. It is already understood that ownership of 'domains' in the Web address space can have value. In the world of business, Web domains frequently change ownership for large sums of money. In the UK the value of the HEI's '.ac.uk' domain is largely connected with reputation.

Breaking this down:

  • identify the important entities: e.g. courses, units, departments, staff, papers, rooms, learning objects, lectures…. etc.
  • make them addressable: give them URLs. For example, if it's a course, mint a URL which points to a unique resource representing that course.
  • using readable URLs: make the URL intelligible to an end user. If it's a URL pointing to a course, then a URL which has the word 'course' in it will help.
  • using reliable URLs: manage the URLs you mint, and ensure that they are persistent.
  • using hackable URLs: make the URLs predictable and consistent, such that a developer can figure out the logical structure of the URLs and the underlying information architecture. As with 'readable URLs' above, do not be cryptic in URLs if this can be avoided.
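
To make points 3 and 4 (and point 6, below) a little more concrete, here is a minimal sketch of readable, hackable URLs for one kind of entity - courses - with the same addresses serving pages to people and JSON to software. It uses Flask purely for brevity; the URL patterns and course data are invented for illustration.

```python
# A minimal sketch of readable, hackable URLs for 'course' entities, serving
# HTML to people and JSON to software from the same addresses. Flask is used
# purely for brevity; the URL patterns and data are invented for illustration.
from flask import Flask, jsonify, request

app = Flask(__name__)

COURSES = {
    "hist-1001": {"title": "Introduction to Digital History", "department": "History"},
    "comp-2002": {"title": "Web Architecture", "department": "Computer Science"},
}

def wants_json():
    best = request.accept_mimetypes.best_match(["application/json", "text/html"])
    return best == "application/json"

@app.route("/courses/")                      # a list of entities
def course_list():
    if wants_json():
        return jsonify({"courses": sorted(COURSES.keys())})
    links = "".join(f'<li><a href="/courses/{c}/">{c}</a></li>' for c in sorted(COURSES))
    return f"<ul>{links}</ul>"

@app.route("/courses/<course_id>/")          # an individual entity
def course(course_id):
    data = COURSES.get(course_id)
    if data is None:
        return ("Not found", 404)
    if wants_json():
        return jsonify(dict(data, id=course_id))
    return f"<h1>{data['title']}</h1><p>{data['department']}</p>"

if __name__ == "__main__":
    app.run()
```

A request with Accept: application/json gets the data; an ordinary browser gets the page - the same, predictable URL serving two representations.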

5. Correlate with external identifier schemes

Don't mint your own URLs for things which have been identified elsewhere. Linking to authoritative identifiers is what will create the critical mass in the Web of data - this is diluted every time someone mints a new URL to point to something already identified with a different URL. This aspect of re-using identifiers is explored in Jon Udell's post The joy of webscale identifiers.
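
A small, contrived example of the difference: the first record below mints a new local URL for a subject which already has a widely shared identifier, while the second simply re-uses the existing one. Both URLs are illustrative only.

```python
# A contrived example of re-using an existing identifier rather than minting a
# new, local one for the same thing. URLs are illustrative only.

# Weaker: a locally minted URL for a concept already identified elsewhere
paper_local = {
    "title": "Mining Victorian newspaper archives",
    "subject": "http://www.example.ac.uk/subjects/4711",   # only we know what 4711 means
}

# Better: link to the identifier others already use, so records can be joined up
paper_linked = {
    "title": "Mining Victorian newspaper archives",
    "subject": "http://dbpedia.org/resource/Text_mining",  # widely shared identifier
}
```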

6. Consider individual entities, and lists of entities in Web design

URLs can be for lists of related entities, as well as individual entities. All other guidelines apply to this use of URLs. Lists of things are pretty fundamental to the Web (or just about any information system).

Conclusion

If I wanted to abbreviate this even more into three brief instructions, they would be:

  • think in terms of information entities, identifiers and relationships, as well as pages
  • integrate into the wider Web by re-using existing identifiers and by linking to other information
  • realise that developers are a potentially important stakeholder in any modern website


JISC CNI Meeting, Edinburgh

I've been at the excellent JISC CNI Meeting in Edinburgh these last two days. Lots of interesting work was described, and I met some great new people. Some people have asked me to post my slides, so here they are:

JISC CNI Meeting, Edinburgh 2010 from Paul Walk

OR10 Challenge

In case you missed it, the OR10 Developers Challenge is now live!

Andy McGregor has explained why he thinks you should enter the challenge and, I'm pleased to say, there have been some expressions of intent already. If you do decide to enter, please register your intention on the OR10 Crowdvine forum.

A reminder of the challenge:

Create a functioning repository user-interface, presenting a single metadata record which includes as many automatically created, useful links to related external content as possible.

We had one comment suggesting that the challenge was limited to dealing with Linked Data - this is certainly not the case - we are interested in linking in its broader sense.


Draft OR10 Challenge idea

Please note that what follows is a draft. A few weeks ago I posted some thoughts about a Developer Challenge for OR10, with a plea for ideas for specific challenges. I'm pleased to say that this post got a really good response, with plenty of useful ideas and comments. Thank you to all who responded. I think it fair to say that all of the comments influenced our thinking, but the interest in linking content (most fully expressed by Andy Powell) stood out from several comments, so we have concentrated on trying to create a challenge around this. While linked data was mentioned often (naturally enough), we wanted to stick to our principle of involving non-developers (or users) as much as possible: this can be difficult when dealing with the more esoteric aspects of linked data. So, after some discussion within the DevCSI team, we have worked up the following challenge:

Create a functioning repository user-interface, presenting a single metadata record which includes as many automatically created, useful links to related external content as possible.

Definitions: - "functioning" in this sense means that mockups/screenshots are not sufficient - however a working prototype is OK - "related" in this sense means that the external content is related to this particular metadata record in some way. - "as many useful links" means that marks will be awarded for useful links, so an interface with fifty meaningless links does not beat one with three genuinely useful links! - links must be related to content, not just a system. So, for example, a link to the page at http://www.wikipedia.org is not legitimate, but a link to a specific page in Wikipedia could be. Only one link of each 'type' counts: i.e. having four links to URLs which reference ‘topics’ in a given system is fine but will count as one link for the challenge.

Rules: Entries must come from a team of at least one developer and one person representing users. The entries must be presented, in person, at OR10. If a team is responsible for the entry then not all of the team members need be present at OR10, but at least one team-member must be.

Judging: The entries will be presented/demonstrated at OR10 in a show and tell session in a room dedicated for this. The show and tell will be open to OR10 delegates to come along and see the presentations as they are being made. These presentations/demonstrations will be video-recorded. There will be an opportunity for those delegates present (the 'audience') to ask questions and/or comment on the presentations. There will be a panel of judges who will observe and make notes. The judges will take note of the responses from the audience. Following the show and tell, the judges will privately discuss the entries and draw up a shortlist. The videos of the shortlisted entries will be presented at the conference dinner for the assembled delegates to vote for a winner and a runner-up. The judges will particularly take into account the following:

  • functionality - the links must work and must have been created automatically as part of the repository system
  • usefulness - the usefulness of the links to an end-user of the developed interface must be demonstrated
  • number of links - the number and variety of links will be considered
  • audience reaction - favourable and unfavourable reactions from the audience will be taken into account

General points: The Challenge will be issued well in advance of the conference, giving people plenty of time to develop an entry. We will make facilities available at OR10 - such as a Developers' Lounge area, for further work to be done at the conference itself. We are very interested in any comments people may have about this - we intend to publish the final version of this, and open up the Developer Challenge, at the end of this week.


Ideas for the OR10 Developer Challenge?

Update: I have closed comment on this post now. Thank you very much to all who commented and suggested ideas for a challenge. I have now posted a draft Challenge here and would welcome comments on that post. Thanks again!

Through the JISC-funded DevCSI project, UKOLN has been asked to arrange a 'Developer Challenge' for the Fifth International Conference on Open Repositories (OR10), to be held in Madrid in July of this year. This will be the third consecutive year that the Developer Challenge has been a feature of this conference. Previous challenges have been both competitive and creative. [Photo of the OR09 Developer Challenge by Graham Triggs]

This year we have been considering doing something slightly different. Previously, a general challenge has been issued, inviting developers to submit prototypes for anything which they feel is relevant and useful to the repository community. But now that the community has a better appreciation of the sort of creativity which developers can bring to these events, we wonder if we might try something a little different.

A specific challenge? We have been thinking about the possibility of the repository community issuing a particular challenge to the developers planning to attend OR10. This could be decided on by the community well in advance of the conference. If we managed to 'crowd source' a few ideas, we could organise a simple vote. Something we are trying to do more with the DevCSI project is to get developers together with non-developers from the same 'domain' (repositories in this case) - so we are quite interested in pursuing this approach with OR10. The OR10 organisers have helpfully couched the conference itself in terms of some challenges:

In a world of increasingly dispersed and modularized digital services and content, it remains a grand challenge for the future to cross the borders between diverse poles:

  • the web and the repository
  • knowledge and technology
  • wild and curated content
  • linked and isolated data
  • disciplinary and institutional systems
  • scholars and service providers
  • ad-hoc and long-term access
  • ubiquitous and personalized environments
  • the cloud and the desktop.

Perhaps one or more of these could serve as the inspiration for a more concrete developer challenge? What this boils down to is finding a challenge in the general area of repositories, recognised as important by the community generally, which could only be met by getting developers to work with non-developers at the conference. For it to be fair, the challenge would need to be non-specific with regard to any particular repository software. I would welcome some feedback:

  • is this general approach a good idea?
  • do you have any ideas for a challenge?

Please feel free to comment here if you have any ideas, or alternatively drop me an email at p.walk@ukoln.ac.uk. Thanks!


An agile approach to the development of Dublin Core Application Profiles

I have been asked to provide a position paper for next week's Future of Interoperability Standards meeting hosted by CETIS. This blog post is one I have been meaning to write for ages so I'm offering it as a position paper of sorts.

UKOLN has been charged by JISC with the task of supporting the development of Dublin Core Application Profiles (DCAPs) in a number of areas. While I have not (so far) had much direct involvement in this work I have developed, over the last year or so, a real interest in the process of developing these.

The development of DCAPs is governed through the application of the Singapore Framework for Dublin Core Application Profiles. In this document, the concept of the application profile is explained thus:

The term profile is widely used to refer to a document that describes how standards or specifications are deployed to support the requirements of a particular application, function, community, or context. In the metadata community, the term application profile has been applied to describe the tailoring of standards for specific applications.

The requirements for an application profile to be legitimately termed a Dublin Core Application Profile are defined within the Singapore Framework. In brief, a DCAP is a "packet of documentation" which includes the following elements:

  • Functional requirements (mandatory)
  • Domain model (mandatory)
  • Description Set Profile (DSP) (mandatory)
  • Usage guidelines (optional)
  • Encoding syntax guidelines (optional)

This seems mostly sensible, although I have not been party to much of the discussion around the Singapore Framework and so have never entirely appreciated the purpose of, or need for, the Description Set Profile (DSP). In passing I will note that it seems to me that the DSP could be optional rather than mandatory, and that the Usage Guidelines should be mandatory rather than optional.

According to the Singapore Framework web page, "there are no stable, published examples of full-blown application profiles that conform to these guidelines". With one exception, the Scholarly Works Application Profile (SWAP), it is difficult to find any examples of DCAPs which are close to being realised. SWAP was developed for the most part at UKOLN so I have an interest in seeing it adopted; however, to date we have seen no actual usage of this DCAP.

I come from a background of software and service development, rather than standards development. For this reason, the development of application profiles is more appealing to me than is standards development per se, as I expect to be able to apply my experience and skills more readily to work which is aimed at supporting "specific applications". It is natural for me to measure success in terms of usage. This means that I take usability seriously, and tend to focus on users and their responses.

Early in 2009 I began to notice a few things about how DCAPs such as SWAP were expected to be developed. It seemed to me that usability was not a stated priority. As a consequence of this, I think, little attention is given to testing the usability of DCAPs within a context involving users and applications. It does seem that DCAPs are expected to be tested for conformance to the standard, for internal cohesion and logic in terms of the underlying information model, and even for theoretical satisfaction of functional requirements; but if the DCAP has not been tested for usability before it gets to this point then it is at high risk of failure. It was also apparent to me that users, even experts in the domain for which the DCAP was intended, might struggle to appreciate, test or criticise a DCAP documented according to the Singapore Framework, unless they had relatively rare information-management knowledge and understanding.

At UKOLN, I got together with some colleagues and proposed that we consider a more Agile approach to the development process. I use the term Agile in the sense in which it has been applied to software development in recent years. A key feature of Agile development is that it allows the development of not only the solution, but also the requirements, in a highly iterative process. Agile development tends to favour working solutions over future capabilities and encourages near-continuous engagement with users during the development process, responding to changes in functional requirements as both the developer and the user increase their understanding of the problem space. I wondered if we couldn't devise some tools and techniques which would allow the early stages of DCAP development to be done iteratively, with close engagement from prospective end-users. The following is a description of what we have developed so far.

An Agile approach


In order to focus on usability in the development of DCAPs, we realised that we would need to introduce a methodology which would allow us to frequently test what had been developed so far against user-requirements and understanding. Borrowing again from software development, we decided to adopt a rapid prototyping method, where we would give prospective end-users the means to quickly assemble information models which made sense to them in the context of their requirements. Some of our early experiments were in the domain of scholarly works because we have a particular interest there. Our method therefore relies on being able to assemble small groups of prospective users to participate actively in the development process.

We have observed an issue with users’ engagement with application profiles. Application profiles are, essentially, intangible - users cannot interact with them directly. For many users, this presents a very real barrier to engagement. Even if formal documentation such as a Description Set Profile (DSP) is developed during the development iterations, it takes a certain kind of user with a particular interest to engage with it. Many users need to see the sort of system interface which they will ultimately be using in order to contribute feedback on the development of an AP. We have developed two approaches to making DCAPs tangible: paper-prototyping, and a flexible user-interface tool for information modelling.

In early stages of requirements gathering, a paper-prototyping approach has shown real promise as an accessible method for eliciting requirements from groups of users. This has the advantage of being potentially very free-form, such that the developer’s unconscious influence on users’ contributions is reduced. Users are encouraged, collectively, to develop their own understanding and to model it. You can read about this in more detail in Emma Tonkin's paper: Multilayered Paper Prototyping for User Concept Modelling: Supporting the Development of Application Profiles.

One limitation of paper prototyping comes from this very free-form quality: it is difficult to correlate the outcomes of a free-form modelling exercise with the outcomes of other similar exercises. For this reason, we have developed a second-stage development tool which uses software to structure and, crucially, record users' engagement with the developing application profile.

Our software for allowing users to experiment with modelling their domain is MrVobi. Below you can view a short video of it being demonstrated on an interactive whiteboard:

Users are encouraged to use this tool to create and restructure entities and attributes through a user-friendly and intuitive interface. The user interface is connected to a web service which records every decision, and which can hold and serve up pre-recorded models so that users can start from an advanced position in a given session.

As we move users from the free-formed to the more structured interfaces, we can start to gain an important benefit. By recording the decisions that individuals make about the information model, we can aggregate these so that, theoretically, we can start to assign a level of confidence to the decisions which are eventually made about the application profile. For example, we can say something like "this attribute belongs with this other in this entity, and 71% of our test users from this domain agree with this".
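
As a rough illustration of that idea, the following sketch shows how recorded placement decisions might be aggregated into an agreement figure. The data structure and field names are assumptions made for this example; they are not a description of MrVobi's actual model or API.

```python
# A minimal sketch of aggregating recorded modelling decisions into a
# confidence figure. The data structure is an illustrative assumption only.
from collections import Counter

# Each tuple records one user's decision: (user, attribute, entity it was placed in)
decisions = [
    ("user1", "dateAvailable", "Manifestation"),
    ("user2", "dateAvailable", "Manifestation"),
    ("user3", "dateAvailable", "Expression"),
    ("user4", "dateAvailable", "Manifestation"),
    ("user5", "dateAvailable", "Manifestation"),
    ("user6", "dateAvailable", "Expression"),
    ("user7", "dateAvailable", "Manifestation"),
]

def confidence(decisions, attribute):
    """Return the most popular entity for an attribute and the level of agreement."""
    placements = Counter(entity for _, attr, entity in decisions if attr == attribute)
    entity, votes = placements.most_common(1)[0]
    return entity, votes / sum(placements.values())

entity, agreement = confidence(decisions, "dateAvailable")
print(f"'dateAvailable' belongs in '{entity}' ({agreement:.0%} of test users agree)")
```

Run against a larger set of recorded sessions, the same aggregation could feed exactly the kind of "71% of our test users from this domain agree" statement above.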

As an application profile becomes more developed, it can be presented to users for testing through this same interface. This means, importantly, that an application profile can be treated as something more dynamic. As a domain changes over time, with shifting aspirations, challenges and issues, so the application profile can be re-assessed in terms of its usability in a changing context.

A concern which we identified early in the development of these processes and tools was the fact that the tools influence the testing process: when a user gives feedback, they are to some extent commenting on the artificial interface as well as the application profile. The paper-prototyping partly mitigates this, as does the simple fact that we don't rely on a single interface. Within the very real constraints of users' patience and available time, the general approach is to introduce as many types of interface as the user can bear, so that biases based on the idiosyncrasies of specific tools are gradually cancelled out.

To bring this back to the Singapore Framework: we believe that we are evolving an effective process to develop several parts of the 'package' - the functional requirements, the domain model, and the usage guidelines. We believe that if these are developed with frequent recourse to user-testing, then the resulting DCAP will be more robust, and more likely to be adopted. We think that we can build into the process an aspect of evidence gathering which allows us to make assertions about the resulting DCAP with a measured degree of confidence.

This is very much a work in progress. We have experimented with the paper-prototyping approach with a number of different groups, and in more than one domain, with some very interesting results. We ran an interactive workshop at last year's International Conference on Dublin Core and Metadata Applications using the MrVobi software, which was very well received (this was informed by a presentation which is also a useful overview). We have received strong encouragement from the Dublin Core Metadata Initiative to continue to develop this approach and are now considering how we might take this work forward in 2010. Any comments are welcome.

Note

This work has been the result of collaboration within UKOLN. Special mention should be made of Emma Tonkin's efforts, which have been crucial to a number of aspects of this work. Others at UKOLN who have contributed are Andy Hewson, Talat Chaudhri, Mark Dewey, Stephanie Taylor, Julian Cheal and Tom Richards.


Direction counts!

I took advantage of an offer to upgrade my iPhone 3G to the 3Gs model just before Christmas. I spent some time considering the alternatives, and speculating about what might become available during the next eighteen months of my new contract, but I've been more than happy with the 3G so my decision was quite an easy one. The 3Gs offered three main improvements over the 3G:

  • a faster processor
  • a better camera
  • a 'compass'

At first glance, these improvements seem quite modest. But, as we shall see, they add up to something quite significant.

The feature which attracted me most was the better camera. People talked about the paltry 2MP camera on the iPhone 3G but, to be honest, it wasn't the resolution that was the problem - 2MP is actually adequate for the sorts of pictures I want to take with a pocket camera. The problem with the camera on the 3G was that it was just a rotten camera. I had a better camera in a Sony Clie PDA some five years ago. The camera on the 3Gs is, indeed, better than that on the previous model. It's not great, but it is just about usable.

The surprise for me is the impact of the other two features. The faster processor was firmly in my 'nice to have' category - a welcome improvement but not especially important to me. Once I tried the new model, however, I quickly realised what a difference this has actually made. With the previous model, I had attributed a lack of performance in certain applications to network latency. Essentially, I believed that a few apps were simply a little too advanced for the prevailing networks to serve them well. A good example of this was Evernote, an app which seemed promising but was just too sluggish on the 3G to be very useful to me. On the newest iPhone, however, Evernote really flies, and network latency does not often impinge on its usability. Having a snappier user interface is always nice - but the 3Gs is so much more responsive as a result of its faster processor (and presumably its increased memory).

I had assumed the compass was, effectively, a gimmick. I could see how it would be occasionally useful to orient myself when using the GoogleMap application for example. But over Christmas I started to play with some of the many astronomy apps available for the iPhone. Several of these take advantage of the iPhone's built-in GPS receiver and compass, allowing the screen to show the night-sky exactly as it appears to the user based on their location and the direction they are facing. This allowed me, for instance, to identify and point out Jupiter to my actually-quite-impressed-for-once family. Direction counts!

What the iPhone 3Gs offers to its applications is a sense of location and direction. Combined, these properties can afford a powerful new functionality.

During 2009 there was a little buzz about augmented reality, with apps such as Wikitude appearing for Android and iPhone, superimposing text and images over real-time views of the physical environment. While I try to avoid predictions for the new year, I'm confident that augmented reality apps will continue to develop, and will become more interesting, during 2010. All of the hardware ingredients - a fast processor, a decent camera, GPS and a compass - are present in the iPhone 3Gs. I'm looking forward to what develops as a result.
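
For the curious, here is a back-of-an-envelope sketch of how location and direction combine in this sort of application: given the phone's position and compass heading, is a given point of interest within the camera's field of view? This is illustrative only - the coordinates and field-of-view figure are assumptions, and it is not how any particular app is implemented.

```python
# A back-of-an-envelope sketch of combining location (GPS) with direction
# (compass): is a point of interest within the camera's field of view?
# Coordinates and field-of-view are illustrative assumptions.
import math

def bearing(lat1, lon1, lat2, lon2):
    """Initial great-circle bearing (degrees) from point 1 to point 2."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dlon = math.radians(lon2 - lon1)
    x = math.sin(dlon) * math.cos(phi2)
    y = math.cos(phi1) * math.sin(phi2) - math.sin(phi1) * math.cos(phi2) * math.cos(dlon)
    return math.degrees(math.atan2(x, y)) % 360

def in_view(heading, target_bearing, fov=60):
    """True if the target's bearing falls within the camera's field of view."""
    diff = abs((target_bearing - heading + 180) % 360 - 180)
    return diff <= fov / 2

# Phone at assumed coordinates, pointing roughly south-east;
# the landmark lies a little to the south-east of the phone.
phone_lat, phone_lon, heading = 51.38, -2.36, 135
landmark = (51.37, -2.35)

b = bearing(phone_lat, phone_lon, *landmark)
print(f"bearing to landmark: {b:.0f} degrees, visible: {in_view(heading, b)}")
```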

Coincidentally, my good friend Peter just alerted me to an application called Star Walk. It doesn't do anything that several other apps don't also do, but it does it so beautifully. As with all Apple products, aesthetics count for much with the iPhone. When I fired up Star Walk I had a sudden thought: that reality had just caught up with the aesthetics of mainstream science fiction. If you have an iPhone 3Gs, I recommend you spend the £1.19 for this application, if only to admire the way it looks.

Happy new year!



An infrastructure service anti-pattern

Last week I outlined an idea, that of the service anti-pattern, as part of a presentation I gave to the Resource Discovery Taskforce (organised by JISC in partnership with RLUK). The idea seemed to catch the interest of, and resonate with, several of the taskforce members present at the meeting. My presentation was in a style which does not translate well to being viewed in a standalone context (e.g. on Slideshare), so I have decided to write it up here. I would very much welcome comments on this. (The presentation will be published on the Resource Discovery Taskforce pages and I will ask for this post to be linked to from there when it does appear.)

The following diagram is meant to represent a design 'pattern' which I have seen often proposed, and sometimes implemented, in the JISC Information Environment (IE) as well as in the wider higher education (HE) sector in general:

[Diagram: the infrastructure service anti-pattern]

It is my belief that readers who have been involved with the IE for some time will recognise this, at least in a general sense, if not in specific cases. In this arrangement, an aggregation of data is presented to the end user through the development of a user-facing application or service. The user-facing service will in almost all cases be a web interface, somewhat similar to the ‘portal’ concept of old but in a centralised, single, global deployment. Because it is generally accepted to be desirable to make such data available to other services (in keeping with the larger goal of interoperability through open standards), one or more machine interfaces or so-called APIs, giving access to the 'backend' of the system, will be offered. What this design pattern aspires to is a service implemented to be both a user-facing service and a machine-facing infrastructure component.

However, I contend that this is, in fact, what software engineers might call an anti-pattern. An anti-pattern is a design approach which seems plausible and attractive but which has been shown, in practice, to be non-optimal or even counter-productive. It's a pattern because it keeps coming up, which means it's worth recording and documenting as such. It's anti because, in practice, it's best avoided….

There is much which is implicit in this pattern, so I will attempt to surface what I believe are some hidden assumptions in a new version of this diagram: this is what this design pattern, once implemented, reveals:
[Diagram: the anti-pattern with its hidden assumptions made explicit]

In this second diagram:

  • the orange colouring indicates the parts which actually get built and are supported
  • the yellow colouring indicates the parts which might get built, but which won't really be supported as a service - in a sense, this is stuff which is believed to work but actually doesn't; in the case of the users, the yellow indicates that their demand for this service is believed to exist
  • the components which are neither orange nor yellow are the product of little more than speculation

In the end, the investment in creating a user-facing application based on an expectation of future demand which doesn't materialise is wasted while, at the same time, the investment in providing unused machine interfaces is also wasted.

I believe that this design pattern rests on several assumptions which are actually fallacies, and is, therefore, an anti-pattern.

Fallacy 1: “Build it and they will come”:

While infrastructure services can, indeed should, be developed with future opportunity in mind, it is helpful to have an existing, real demand which the new development addresses. If the service is demonstrably useful to users, and is developed effectively with future opportunity in mind, then there is more chance of the service actually working, and of it being attractive to developers working on future opportunities.

Fallacy 2: Interoperability through additional machine interfaces:

Machine interfaces need as much specification, development, testing and maintenance as user interfaces. Simply making a machine interface available through the adoption of a platform which has a built-in facility offering some standard interface is not enough. A system which proposes to offer three or four APIs is quite likely not going to support any of them adequately. I have argued before that 'interoperability is not enough': in fact, this arrangement does not often lead to interoperability, let alone actual exploitation of the capability to interoperate.

Fallacy 3: People/organisations who can make good infrastructure are also going to be good at building end-user-facing services (and vice versa):

Effective infrastructure supports services which in turn support end-users. The skills and knowledge required to support service-providers are generally quite different from those needed to deliver good user-facing services.

I call this the infrastructure service anti-pattern because the result comes from conflated requirements to deliver both infrastructure (machine-to-machine interfaces) and compelling user-facing services and applications. The result can be something which satisfies neither requirement. The users, requirements and priorities are often completely different between these two problem spaces. I suggest that the following are some possible reasons for this anti-pattern appearing:

  • funding (naturally) tends to follow services, happy users and, importantly, new features
  • funders like to see their investment showcased
  • infrastructure is mostly invisible making it hard to ascertain impact from users

Proposals for alternative design patterns

Here is a suggested alternative design-pattern:
[Diagram: an alternative pattern in which the API is developed first]

In this design pattern, the API is developed before any user-facing application, or at least in parallel. An application is developed to exploit this API based on real user requirements; no service is developed until such requirements can be identified. This means that an API will be developed, and it will be in use in at least one case. Opportunities for third-party integration with the service are, ideally, identified beforehand. The API is properly supported from the start, or else the service fails completely. The value proposition being offered for further, opportunistic third-party developments, whether real or imagined, is now real and, crucially, supported.
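
To make the pattern concrete, here is a minimal sketch (using Flask, purely for illustration) in which the machine interface is the primary, supported artefact and the user-facing page is simply its first client. The endpoints and fields are assumptions, not a reference design.

```python
# A minimal sketch of the API-first pattern (Flask used purely for illustration).
# The machine interface is the primary, supported artefact; the user-facing
# page is simply its first client. Endpoints and fields are assumptions.
from flask import Flask, jsonify, render_template_string

app = Flask(__name__)

# Stand-in for a real aggregation of data
RECORDS = {
    "1": {"title": "An example record", "creator": "A. Author", "year": 2010},
}

@app.route("/api/records/<record_id>")
def api_record(record_id):
    """The supported machine interface: third parties and our own UI both use this."""
    return jsonify(RECORDS[record_id])

@app.route("/records/<record_id>")
def html_record(record_id):
    """User-facing page, built as a client of the same supported interface."""
    record = api_record(record_id).get_json()
    return render_template_string(
        "<h1>{{ r.title }}</h1><p>{{ r.creator }}, {{ r.year }}</p>", r=record)

if __name__ == "__main__":
    app.run(debug=True)
```

The important point is not the framework but the ordering: the machine interface exists, is exercised and is supported before any bet is placed on speculative end-user demand.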

An interesting alternative to this is the approach of combining the user-facing web pages and the machine-actionable API into one interface, through embedded RDFa for example:
[Diagram: a combined user- and machine-facing interface, using embedded RDFa]

It remains to be seen how this approach is going to work out over time, but we have seen hints of simpler approaches to combining user and machine interfaces in the past, such as RSS being styled to give a decent human-readable interface, or earlier attempts to do interesting things with XHTML.
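
As a rough sketch of what that combined approach might look like, the fragment below generates a human-readable snippet carrying RDFa attributes, so that the same markup can be read by people and parsed by RDFa-aware software. The Dublin Core terms and record fields are illustrative assumptions, not a worked-out profile.

```python
# A sketch of the combined approach: the human-readable page carries RDFa
# attributes so the same response is also machine-actionable. The markup and
# the Dublin Core terms used are illustrative, not a complete profile.
from string import Template

RDFA_TEMPLATE = Template("""
<div vocab="http://purl.org/dc/terms/" typeof="BibliographicResource">
  <h1 property="title">$title</h1>
  <p>by <span property="creator">$creator</span>, <span property="issued">$year</span></p>
</div>
""")

record = {"title": "An example record", "creator": "A. Author", "year": "2010"}
print(RDFA_TEMPLATE.substitute(record))
# A person sees an ordinary web page; an RDFa-aware client can extract
# dcterms:title, dcterms:creator and dcterms:issued from the same markup.
```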

I wonder if readers agree that the first diagrams represent an anti-pattern which they recognise. And would the proposed alternatives fare any better?


IDCC09

There is still time to register for this year's International Digital Curation Conference in London, although you will need to be quick - I'm told that registration closes on the 25th November.

This year's conference (the fifth), organised in partnership with the Coalition for Networked Information, has the theme Moving to Multi-Scale Science: Managing Complexity and Diversity. It promises to be an interesting event - see the full programme for more details.

You can keep up with developments in advance of the event itself by reading the Digital Curation Blog (see this particular post for example) and/or following on Twitter etc. using the tag '#idcc09'.


Linked, open, semantic?

During an interesting session called the 'Great Global Graph' at the CETIS conference this week I formed the opinion that, in the recent rush of enthusiasm for 'linked data', three 'memes' were being conflated. These next three bullets outline my understanding of how these terms have been used in recent discussions, including the CETIS session:

  • Open data: I see this as something expressed as a philosophy or, in more concrete terms, as a policy, such as that espoused by the UK Government. There are aspects of public ownership in this, but also a philosophical approach based on 'openness' and a rejection of the economic idea of value in scarcity of information. I think that specific technology does not come into this really: for example one concrete realisation of this policy in the UK is the Freedom of Information Act under which it is perfectly permissible for a data owner to supply data in any reasonable format and medium. Essentially, I generally take 'open' to mean accessible to all, notwithstanding conditions of use.
  • Linked data: This one is trickier, as the term is used in quite a precise way by some proponents, based on the principles of linked data from the W3C. There are others who prefer a looser definition. There have been some well-rehearsed arguments about this, which generally come down to whether or not RDF is a pre-requisite of linked data. I've become inclined to use the term in its more precisely defined sense, in recognition of the efforts going on in this space.
  • Semantic Web: This term introduces 'semantics' into the mix, by layering on ontologies allowing inferences to be made from the data itself.

It seems that these terms are often used together in the same discussions, and I suspect I could benefit from some separation of concerns when following them. It seems to me that the following are true:

  1. data can be open, while not being linked
  2. data can be linked, while not being open
  3. data which is both open and linked is increasingly viable
  4. the Semantic Web can only function with data which is both open and linked

Option 1 satisfies, in part at least, the drive to make available to the public data which has been paid for by the public and which might be useful to it. There are those (and I count myself among them) who generally believe that at present, for example, it would be better to quickly make the data open in some useable form than to delay this unduly while it is processed into RDF. However, there is a reasonable case to be made for not polluting information spaces with poorly prepared datasets.

Option 2 is an approach for organisations which want to take a more resource-oriented approach to managing and exploiting internal information assets. In the CETIS session an interesting idea was floated around how such an approach might go a long way to helping organisations address data-quality issues.

Option 3 seems increasingly viable. There is value in the 'linked' aspect, regardless of whether or not semantic layers are introduced. This is how the Web works after all, and much of the impetus behind Web 2.0 seems, to me, to have come from a healthy mixture of addressable and accessible information and human-mediated convention (e.g. 'hackable' URLs). Perhaps this is the 'Great Global Graph' and it's just a matter of scale?
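
To make the open/linked distinction concrete, here is a small sketch (using the rdflib library) of the same facts published first as plain open data and then as linked data with HTTP URIs. The URIs and property choices are illustrative assumptions only.

```python
# A small sketch contrasting 'open' and 'linked': the same facts as a plain
# CSV string (open, but not linked) and as RDF with HTTP URIs (linked).
# URIs and property choices are illustrative. Requires rdflib (v6+ returns
# a string from serialize()).
from rdflib import Graph, Literal, Namespace, URIRef

# Open but not linked: anyone can read it, but nothing is globally identified
open_csv = "school,pupils\nSt Example Primary,210\n"

# Linked: things are named with HTTP URIs and related to other things
DCTERMS = Namespace("http://purl.org/dc/terms/")
school = URIRef("http://example.gov.uk/id/school/123")        # hypothetical URI
area = URIRef("http://example.gov.uk/id/local-authority/45")  # hypothetical URI

g = Graph()
g.bind("dcterms", DCTERMS)
g.add((school, DCTERMS.title, Literal("St Example Primary")))
g.add((school, DCTERMS.isPartOf, area))  # the link to another 'thing'

print(open_csv)
print(g.serialize(format="turtle"))
```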

I'm very open to comment and argument on any of this. Perhaps I'm worrying unduly about these things being mixed up, but I do sense that this space could benefit from some clarity to match the excitement and endeavour.


Not ready to wave goodbye to email

Last week I posted a remark on Twitter:

Can't help thinking that the idea that Google Wave will replace email rather misses the point….

The first response to this echoed my view, suggesting that the real nature of Wave is rather harder to explain or understand, and implying that people fall back on a frame of reference with which they are comfortable. It certainly looks as though Google have anticipated this and offered some easily digested marketing messages. However, I also saw responses which suggested that some people still seem to be missing the point. One response insisted that Wave would only be successful if it was ‘integrated’ with email. I must confess that I still don’t understand this - I can’t really imagine what impact an integration between Wave and email would really have.

It seems to me that Wave is an ambitious attempt to exploit the idea that one future for the Web lies in social networked activity clustered around shared artefacts. Such artefacts, often what we still call ‘documents’, have been given the useful label social objects. At the centre of a Wave is a social object, with a series of applied and recorded operational transforms. Wave would therefore seem to be primarily about collaboration, as opposed to email or IM which are primarily concerned with messaging. Another way of looking at this would be to suggest that Wave is 'object-centric', as opposed to email which is message-oriented with a facility to attach auxiliary objects.
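
For readers curious about the 'operational transform' part, here is a toy sketch of the general idea: two users' concurrent insertions are each transformed against the other so that both replicas converge on the same text. This is a deliberate simplification for illustration, not Google Wave's actual algorithm.

```python
# A toy illustration of operational transformation, the technique underlying
# Wave-style concurrent editing. Two users edit the same text at once, and
# each operation is transformed against the other so both copies converge.
# A deliberate simplification (no tie-breaking, inserts only).

def apply_insert(text, pos, s):
    return text[:pos] + s + text[pos:]

def transform_insert(pos_a, pos_b, len_b):
    """Shift operation A's position if concurrent operation B inserted at or before it."""
    return pos_a + len_b if pos_b <= pos_a else pos_a

doc = "social object"

# Two concurrent edits made against the same starting state
op_alice = (0, "shared ")    # insert "shared " at position 0
op_bob = (13, " (draft)")    # insert " (draft)" at the end

# Alice's replica: her own op, then Bob's op transformed against hers
alice = apply_insert(doc, *op_alice)
alice = apply_insert(alice, transform_insert(op_bob[0], op_alice[0], len(op_alice[1])), op_bob[1])

# Bob's replica: his own op, then Alice's op transformed against his
bob = apply_insert(doc, *op_bob)
bob = apply_insert(bob, transform_insert(op_alice[0], op_bob[0], len(op_bob[1])), op_alice[1])

print(alice)  # "shared social object (draft)"
print(bob)    # both replicas converge on the same text
assert alice == bob
```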

The idea that Wave would replace email seems to be suggesting that we won’t need apples anymore because now we have oranges. This is not to say that Wave might not better fit some use-cases currently served by email - such as the problematic mode of collaborative editing of documents by sharing copies sent as email attachments. But even as we adopt better software for collaboration, there's not much sign that we're giving up using email. I don't know about you, but my email inbox isn't getting any smaller just because I use Google Docs, IM, Twitter…. Email has been tested quite thoroughly now over a few years, and appears to work quite well for asynchronous messaging!
Wave uses XMPP as its underlying protocol, which is both interesting and important, but it is also slightly misleading as it implies an important connection with 'instant messaging', which I think is illusory and unhelpful.

Wave is possible because the barrier of network latency is gradually being reduced. Real-time collaboration across the global network is now viable for many. Of course Wave is not the only game in town - other interesting approaches to the real-time Web (mostly also using a variation on the pubsub paradigm), such as pubsubhubbub, are being actively developed and experimented with. But Google Wave is important - because it's Google who are doing it. It will gain a lot of publicity, and will likely play its part in driving a culture change allowing real-time collaboration across the global network to 'go mainstream'. It should be remembered that Google's Gmail, the poster-child for Web-based email, is still significantly smaller in terms of users than Yahoo and Hotmail.

Because Wave offers APIs to developers and users out of the box, I think it is going to be difficult to say what shape this new offering from Google will take once a significant number of people are using it. The ability to federate Wave services could be significant in this respect.


JISC Rapid Innovation Event

I have just spent an interesting and inspiring 24 hours at the JISC Rapid Innovation Programme meeting, which was organised by UKOLN (disclaimer: I work for UKOLN) and funded through the JISC-funded IE Demonstrator project. The venue chosen for the event was certainly an unusual one - the City of Manchester Stadium, home of Manchester City Football Club. I thought the venue worked very well for this event and would recommend it. The event was primarily aimed at developers from the JISC Rapid Innovation projects, but with a significant number of other delegates drawn from JISC programme management as well as the wider developer community.

With this event, we decided to address an issue that has become apparent as JISC has started to engage more directly with developers in the HE sector: developers are often untrained in presenting their work, and sometimes not naturally disposed to explaining their projects to others, especially when those others are not themselves developers. So we hit on the (admittedly somewhat artificial) exercise of requiring a representative of each project to deliver a 45-second ‘pitch’ to the assembled audience, which was recorded to video. The project reps were then given a 15-minute consultation with one of a set of panels of three ‘experts’, led by people from media and communications backgrounds, where they reviewed the video of their pitch and discussed ways of improving upon it. These were held in the Stadium's executive boxes overlooking the pitch, which was pretty cool! The reps were warned that they would be required to deliver a new, improved 20-second pitch the following day….

This exercise was something of a gamble, to be honest. We were confident that a significant number of the project reps would appreciate why it is important that they be able to clearly explain what their project is about in a few sentences, in terms that a wide variety of people might be able to understand. We hoped that the majority would be able to benefit from these exercises to the point where they could deliver a compelling, clear pitch for their project. The results, I’m really happy to say, were outstanding! The improvement over 24 hours was remarkable, and JISC now has a portfolio of clear explanations of the 40+ ‘rapid innovation’ projects, not to mention a group of developers better equipped to explain what it is they are working on.

As well as a training exercise, this event delivered a series of ‘lightning talks’, panel sessions and ‘show & tell’ opportunities - a set of features which has become a staple of developer-centric events. Twitter was actively used as a back-channel to the event, so you can get a small sense of what was going on from that stream. I also used this event to ‘officially’ launch the DevCSI project - but managed to cock up my presentation by ‘losing’ my presenter’s display with all my notes. As I’d decided to go for a one-word-per-slide approach for much of the presentation, this was a bit of a disaster for me. Oh well, I gathered some real interest in the project nonetheless, and some opportunities for events and other engagements. If you’re interested, you can read more about this on the project blog. I’d like to extend a big thank-you to everyone who came, many of whom stepped some way out of their ‘comfort zone’ to engage with the ‘pitching’ exercise.
I’d especially like to thank Mahendra Mahey (UKOLN), who did most of the organising together with David Flanders (JISC), as well as the ever-professional UKOLN Events Team (Natasha Bishop and Michelle Smith), who seemed to work non-stop for 24 hours. Our army of professional bloggers was fantastic, offering expert advice on the pitches as well as conducting interviews with a large number of the projects, all of which have already been transcribed to the IE Demonstrator Blog. David Tarrant (Southampton University) and Julian Cheal (UKOLN) provided excellent technical support, maintaining a networked news service which was displayed all over the venue. This event was a pleasure to be involved in - there was a great spirit of cooperation throughout, which bodes really well for future events with the developer community.


No data here - just Linked Concepts

Over the years I've found the 'Semantic Web' to be an interesting though, at times, faintly worrying concept. It has never much impacted on my work directly, despite my having been embroiled in Web development since, well, pretty much when Web development began. Of late I've tried to follow the earnest discussions about how the Semantic Web went all wrong because it was hijacked by the AI enthusiasts, and how it is going to be alright now because a more pragmatic paradigm has gained the upper hand: that of Linked Data.

This post is my tuppence-worth, provoked by an interesting debate on Twitter recently which was kicked off by Andy Powell, who has just blogged about it. It's worth reading Andy's post to get the details, but in essence, Andy asked if there was a term we could use for Linked Data where the RDF part is not required. This provoked a distributed argument between those who believe that the RDF model is integral to Linked Data, those who believe it shouldn't be, and those who Don't Really Care To Be Honest.

I found myself generally in agreement with Paul Miller who made the point:

Despite this undoubted progress, the green shoots of a Linked Data ecology remain delicate. By moving from a message that stresses the value of unambiguous and web-addressable naming (HTTP URIs), providing ‘useful information,’ and enabling people to ‘discover more things’ by linking toward a message that elevates one of the best mechanisms (RDF) for achieving this to become the only permissible approach, we do the broader aims great harm.

It seems to me that there has been progress over the years which a zealous insistence on RDF could jeopardise. I had thought about joining in and blogging about this, and then came across this comment from Dan Brickley via Rob Styles, which, I thought, pretty much said it all. He finishes with:

But we needn’t panic if people put non-RDF data up online…. it’s still better than nothing. And as the LOD scene has shown, it can often easily be processed and republished by others. People worry too much! :)

Quite.

But then I read Andy's post, in which he links to various people including Ian Davis on the Linked Data brand. Right up front, Ian states:

This is not a technical issue and its not one of zealots or pragmatists: its a marketing and branding issue.

The term Linked Data was coined to brand a specific class of practices: namely assigning HTTP URIs to arbitrary things and making those URIs respond with RDF relating the things to other things.

Here very few of the ‘things’ are documents, instead they are people, places, objects and concepts.

That deliberately excludes many other practices of publishing data on the web such as atom feeds, spreadsheets, APIs and even many existing RDF use cases.

Ah - so, it's the label which is important, because it denotes an important movement, led by Tim Berners-Lee himself. Interestingly, it's concerned with a very small part of the general concern of making data available on the Web - actually it's not even about data per se - it's about linking concepts.

Ian goes on to say:

The Semantic Web community has been notorious for its poor marketing over the past decade. Now just when it seems the community has found the right balance between technology and mass appeal it feels like people are trying to rip away that success for their own purposes. That is deliberately emotive language because brands are all about emotion.

I have spent much of my career linking data on the Web - linking eLearning systems to Library OPACs, for example. I have occasionally used RDF in the past and am working with it again now. I have used many other technologies. In the last few years I have seen the dawning of an understanding on the part of the mainstream of Web developers and users that this kind of thing might be useful and worth investing some time and effort in. I would argue that the most significant advance in linking data in recent years has been in the widespread adoption of cottage-industry XML formats in Web 2.0 mashups. I don't think people are trying to appropriate the brand, so much as resisting the idea that a term as generic-sounding as 'Linked Data' could be owned by what is, in the scheme of things, a small group.

So if I decided to use 'Linked Data' to describe linking data in general, it certainly wouldn't be because I was jumping on a band-wagon - I think that the wheels came off that particular band-wagon years ago.

So that leaves us back at Andy's question. I'm happy to avoid winding up the Linked Data people by 'appropriating' their term but, then, what do I call it when I link data on the Web and I don't check Sir Tim's design issues first? Personally, I like 'Web of Data'. I've blogged about this before, but I still believe that this slide from Tom Coates's Native to a Web of Data presentation (which I suggested to Andy as part of the answer to his original question) sums it up best - I've had a print-out of that particular slide stuck up on my office wall for about three years.


