professional reflections

Ideas for the OR10 Developer Challenge?

Update: I have closed comments on this post now. Thank you very much to all who commented and suggested ideas for a challenge. I have now posted a draft Challenge here and would welcome comments on that post. Thanks again!

Through the JISC-funded DevCSI project, UKOLN has been asked to arrange a 'Developer Challenge' for the Fifth International Conference on Open Repositories (OR10), to be held in Madrid in July of this year. This will be the third consecutive year that the Developer Challenge has been a feature of this conference. Previous challenges have been both competitive and creative.

[Photo: OR09 Developer Challenge, by Graham Triggs]

This year we have been considering doing something slightly different. Previously, a general challenge has been issued, inviting developers to submit prototypes for anything which they feel is relevant and useful to the repository community. But now that the community has a better appreciation of the sort of creativity which developers can bring to these events, we wonder if we might try something a little different.

A general challenge?

We have been thinking about the possibility of the repository community issuing a particular challenge to the developers planning to attend OR10. This could be decided on by the community well in advance of the conference. If we managed to 'crowd source' a few ideas, we could organise a simple vote. Something we are trying to do more with the DevCSI project is to get developers together with non-developers from the same 'domain' (repositories in this case) - so we are quite interested in pursuing this approach with OR10.

The OR10 organisers have helpfully couched the conference itself in terms of some challenges:

In a world of increasingly dispersed and modularized digital services and content, it remains a grand challenge for the future to cross the borders between diverse poles:

  • the web and the repository
  • knowledge and technology
  • wild and curated content
  • linked and isolated data
  • disciplinary and institutional systems
  • scholars and service providers
  • ad-hoc and long-term access
  • ubiquitous and personalized environments
  • the cloud and the desktop.

Perhaps one or more of these could serve as the inspiration for a more concrete developer challenge? What this boils down to is finding a challenge in the general area of repositories, recognised as important by the community generally, which could only be met by getting developers to work with non-developers at the conference. For it to be fair, the challenge would need to be non-specific with regard to any particular repository software. I would welcome some feedback:

  • is this general approach a good idea?
  • do you have any ideas for a challenge?

Please feel free to comment here if you have any ideas, or alternatively drop me an email at p.walk@ukoln.ac.uk. Thanks!

Comments

Peter, thanks for this - a nice, concrete challenge. I don't think it quite satisfies our requirement for the close involvement of non-developers for the OR10 challenge - but I have an idea where we might actually address this issue. Will be in touch!

In the meantime, I have posted a draft Challenge - your comments are welcome!


John, an interesting idea - and I too took note of that discussion on DC-ARCHITECTURE and thought it significant. However, I think that this challenge, while interesting, doesn't quite work with the need to closely involve non-developers.

In any case, I have posted a draft Challenge - comments very welcome!


Lorna - thanks for this comment, which we read as a vote for the challenge we have now drafted (draft Challenge).

The OER angle might throw up some interesting aspects….


Boon, thanks for this - intriguing idea. I don't think this will quite work within the logistical constraints of the OR10 conference and challenge. However, the DevCSI project (which is managing the challenge) works with all kinds of other communities, developers and events. If you're interested, get in touch and we can discuss this further.


Dan, thanks for this interesting idea. We have chosen to go down a fairly different path - see draft Challenge, based partly on what appeared to be wider interest from other commenters here.


Peter, I think this is an intriguing challenge. We've gone for something slightly different - see draft Challenge, but I imagine that some solution to the problem you describe could be incorporated to good effect to meet this more general challenge?


Yvonne - I share your sentiment completely! I'm not sure that your challenge quite fits the open repositories agenda though. Perhaps we should look for a different place to issue this challenge!


Thanks Andy - your suggestion is at the core of the draft Challenge I have just posted.

The other ideas suggested here are interesting. We have chosen to avoid getting too specific about Linked Data as it might restrict opportunities for non-developers to get involved - a point Hugh raises in the comments above.

As for the 'pingback' idea - which comes up several times in these comments - if you don't already know of this then you may be interested in the work done in the JISC-funded Claddier project.


[…] few weeks ago I posted some thoughts about a Developer Challenge for OR10, with a plea for ideas for specific challenges. I’m pleased to say that this post got a […]


You are right Andy Powell…I agree with your reply..


Here's a challenge for you:

Counting the number or proportion of full text items (or equivalent) in ANY repository.

It's something people keep on asking for, but the software providers have tended not to provide the necessary tools for generating or harvesting the statistics. OAI-PMH cannot be relied on for this information either. (I can't comment on ORE.) We therefore ideally need a tool or application that is independent of both.

Those pesky metadata-only repositories skew the profile of open access resources, so this tool would be a great help in advocacy and in prioritising harvesting operations.

Entries could be judged on speed (how many repositories they can gather statistics for in a set period, or how long it takes to process a given sample collection), and on accuracy (assuming we can find a list of repositories with known data).

It may not seem very sexy, but it would be damned useful.

Peter
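As a rough illustration of the counting idea in the comment above, here is a minimal Python sketch. It assumes the records have already been gathered by some means and that each one carries a list of attached file links; the `Record` class and its `file_urls` field are hypothetical, not part of any repository software or protocol. It also shows one possible way to score accuracy against a repository with known figures.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Record:
    """Hypothetical record: 'file_urls' is empty for metadata-only items."""
    identifier: str
    file_urls: List[str] = field(default_factory=list)


def full_text_proportion(records: List[Record]) -> Tuple[int, int, float]:
    """Return (full-text count, total count, proportion of full-text items)."""
    total = len(records)
    full = sum(1 for r in records if r.file_urls)
    proportion = full / total if total else 0.0
    return full, total, proportion


def accuracy_error(estimated: float, known: float) -> float:
    """Absolute difference between an estimated and a known proportion."""
    return abs(estimated - known)


sample = [
    Record("a", ["a.pdf"]),
    Record("b"),            # metadata-only
    Record("c", ["c.pdf"]),
    Record("d"),            # metadata-only
]
full, total, proportion = full_text_proportion(sample)
print(f"{full} of {total} items have full text ({proportion:.0%})")
```

The hard part of the proposed challenge is of course obtaining `file_urls` reliably from an arbitrary repository; the sketch only covers the arithmetic once that information is available.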


It was reported on DC-ARCHITECTURE yesterday that purl.org has been having troubles scaling up, rejecting between 20 and 60 requests per second for DC terms with PURL-based URIs. As was observed in that email thread, this particular "incident" shines a light on the critical role that DC terms play in the Linked Data world, but more importantly it is a proverbial "canary in the coal mine" with regard to how schema will be provisioned on the Web of Data moving forward.

So: I'm wondering if a good challenge might be to create some kind of tool that could assess whether a schema provider's cache control methodology is "well-behaved", and especially whether cache-control headers are set such that down-stream web caches will be optimally used?

Daniel Koller (@dakoller) has even speculated on a tool that could compare the performance and availability of web-cached schema with non-cached versions.
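To make the "well-behaved" check in the comment above a little more concrete, here is a minimal Python sketch that inspects a `Cache-Control` header value. The rule it applies (no `no-store`, `no-cache` or `private`, and a freshness lifetime of at least one hour) is an assumption chosen for illustration, as is the `min_max_age` parameter; a real tool would need an agreed definition of "optimal".

```python
def parse_cache_control(header_value):
    """Parse a Cache-Control header value into {directive: value or None}."""
    directives = {}
    for part in header_value.split(","):
        part = part.strip()
        if not part:
            continue
        name, _, value = part.partition("=")
        directives[name.strip().lower()] = value.strip().strip('"') or None
    return directives


def is_cache_friendly(header_value, min_max_age=3600):
    """Return True if shared caches may store the response for long enough.

    'min_max_age' (seconds) is an illustrative threshold, not a standard.
    """
    directives = parse_cache_control(header_value)
    if any(d in directives for d in ("no-store", "no-cache", "private")):
        return False
    # For shared caches, s-maxage takes precedence over max-age.
    lifetime = directives.get("s-maxage") or directives.get("max-age")
    if lifetime is None or not lifetime.isdigit():
        return False
    return int(lifetime) >= min_max_age


for value in ("public, max-age=86400", "no-cache", "max-age=60"):
    print(value, "->", is_cache_friendly(value))
```

A fuller tool would fetch the schema URI, follow any redirects (as PURLs do) and apply a check like this to every response in the chain.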


[…] In anticipation of the next Open Repositories: Ideas for the OR10 Developer Challenge? (source: Paul Walk, […]


Hi. It is quite a challenge to identify links (so good for a challenge!), but not sure where the non-developers come in.

If a resource is required to do it for RDF, then we have a Linked Data site for all the OAI I was able to harvest at oai.rkbexplorer.com (about 24M triples, 22GB of RDF) (as of last Nov). The links we have identified ourselves are in the sameas.org service. For example:

A paper: http://sameas.org/?uri=http://oai.rkbexplorer.com/id/doc.utwente.nl/oai:doc.utwente.nl:48626

A person: http://sameas.org/?uri=http://data.semanticweb.org/person/carl-lagoze

If there was such a challenge that generated good data, I would be delighted to add the data to sameas.org.

Best
Hugh


Following Peter's suggestion, I'd like to suggest development of two-way metadata protocols. OAI-PMH and OAI-ORE posit a one-way flow of data from data provider to harvester. Experience demonstrates that the harvester, which has much more exposure to metadata from many more sources, can often come up with useful metadata corrections – it just can't tell the source about them because the protocols are one-way and have no error-reporting components.

I would love to see this fixed, and I could see it impacting the progress of such initiatives as ORCID.


Kinda in this space, I noticed recently that arxiv.org supports "trackback" (as is used in many blogging platforms).

So, e.g., I wrote a post on eFoundations

http://efoundations.typepad.com/efoundations/2009/11/memento-and-negotiating-on-time.html

which cited & linked to Herbert & friends' paper on Memento in arXiv

http://arxiv.org/abs/0911.1112

Because arXiv supports trackback, the arXiv page now includes a link to my eFoundations post.

(I honestly can't remember asking TypePad to ping arXiv when I wrote the post, but I must have done!)

Anyway, that seems like one mechanism by which a repository can be notified of "related stuff elsewhere", and based on that info it can provide new links for the reader of that arXiv page to follow. In this case, it was me asking my blogging tool to do the pinging, but it could just as easily be an automated process.
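For readers unfamiliar with the mechanism described in the comment above, here is a minimal Python sketch of the two halves of a TrackBack ping: the form-encoded body that the citing site POSTs to the target's TrackBack URL, and the small XML reply it gets back. The example URL and title are placeholders, and the sketch deliberately does not send anything over the network or assume what any particular repository's endpoint looks like.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET


def build_trackback_body(url, title=None, excerpt=None, blog_name=None):
    """Build the form-encoded body of a TrackBack ping.

    'url' (the citing page) is required; the other fields are optional.
    The body is sent by HTTP POST with Content-Type
    application/x-www-form-urlencoded.
    """
    fields = {"url": url}
    if title:
        fields["title"] = title
    if excerpt:
        fields["excerpt"] = excerpt
    if blog_name:
        fields["blog_name"] = blog_name
    return urlencode(fields)


def parse_trackback_response(xml_text):
    """Return (success, message) from a TrackBack XML response."""
    root = ET.fromstring(xml_text)
    success = root.findtext("error") == "0"
    return success, root.findtext("message")


# Placeholder values, for illustration only.
body = build_trackback_body("http://example.org/post", title="A post")
print(body)
print(parse_trackback_response("<response><error>0</error></response>"))
```

An automated process of the kind suggested above would build such a body for each new citing resource and POST it to the TrackBack URL advertised by the cited item.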


I think bringing users and developers together is an excellent idea. I also like Andy's suggestion of focusing on external links from repositories to the open web. This is something that the OER Programme is certainly very interested in. At the end of the programme there will be #ukoer materials, quite intentionally, scattered across the web, and we're very interested in looking at ways to aggregate and use them. I'll talk to Phil about this and see if we can come up with more concrete ideas.


If you're after a general challenge and want to engage non-developer active participation, the challenge should cut across conference topics, and be relevant to both developers and non-developers. The areas which fulfil these criteria could be usability and user-centred design. There is a branch of usability that would just suit a 2-day challenge: Guerilla Usability - see http://bit.ly/bJZKka.

There are many possibilities for such a bash. You should make it fun, e.g. with a badge "Guerilla! Users vs Developers". Starting with existing UI/systems brought forward by developers, you can typically set the following end goals per system:

  1. improved usability
  2. feature enhancement (adding AJAX etc)

The objective isn't to focus on the best systems per se, but rather on the improved/enhanced aspects of the systems. The challenge is about getting users and developers together and seeing how they could meet the end goals together and creatively produce the mandatory evidence of usability evaluation, user recruitment and UCD approach. There is a variety of artefacts from guerilla usability (to evaluate which team or individual best meets the challenge): video diary, focus group/presentation, paper/low-fi/working prototypes, heuristic reviews/blogs.

There is even a "hallway" approach involving intercepting test users. Maybe the conference could warn attendees they may be 'hijacked' (in a game situation!). Social events could double up as user recruitment opportunities. Non-developers/usability experts who participate in the challenge may of course actively deconstruct systems and provide heuristic reviews / guidance.

Not sure if this would work.. but I'm just brainstorming.


I've been thinking a lot about authority in general lately. I believe the posting of pre-prints (articles before the peer-review process has taken place) in repositories can pose a serious long-term problem. People need to be able to recognize peer-review versus not-so-much. But peer-review is not infallible, as I'm sure we all know (see Mankind Quarterly). So we need something extra to help determine authority. So I will just throw this out there: if there were a way to tag all authors as having x amount of expertise in their field (garnered through citation rates, or whatever means), a sort of prestige index (I've even toyed with the idea of an ACADEME stock market of sorts), then this might go a long way toward helping students and other people determine what may be trusted.


Would strongly support the idea of linking the repository's internal data with what is available elsewhere, especially the efforts to make these relations an internal part of the repository data rather than leaving them only to presentation-level and aggregation services.

Another shift in thinking in this way about repositories might be needed: from closed / managed repositories - to collaborative, web-enabled and trustful repositories. Indeed, if repositories are able to exchange relevant related information themselves, they could bring great benefit for quickly extending the network of data (e.g. something like repository-level pingbacks that additionally store the relations).


Also on "linked and isolated data"; one of the big challenges is getting linked-data-style metadata into repositories. We're still typing strings into metadata fields and hoping for the best. The challenge is to show how we can slowly start to bring together your "Journal Article" with my "Article, Journal". I'd love to see what people can do with this challenge - normalizing OAI-PMH proxies that clean up data, crowd-sourced services to say that things are "same-as" etc.

Can the developers show us a repository where every metadata field has not just a string like "Walk, P" but a URI as well, and a way to match the URIs in other repositories?
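A minimal Python sketch of what the comment above is asking for might look like this: each metadata value carries its display string plus an optional URI, and matching prefers the URI when both sides have one. The `FieldValue` class, the `example.org` URIs and the word-sorting fallback are all assumptions made for illustration, not an existing repository feature.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class FieldValue:
    """Hypothetical metadata value: a display string plus an optional URI."""
    label: str
    uri: Optional[str] = None


def normalise_label(label: str) -> str:
    """Crude order-insensitive key: lowercase, drop punctuation, sort words."""
    cleaned = "".join(c if c.isalnum() else " " for c in label.lower())
    return " ".join(sorted(cleaned.split()))


def same_thing(a: FieldValue, b: FieldValue) -> bool:
    """Match on URI when both values have one, else fall back to labels."""
    if a.uri and b.uri:
        return a.uri == b.uri
    return normalise_label(a.label) == normalise_label(b.label)


# String clean-up alone reconciles simple reorderings...
print(same_thing(FieldValue("Journal Article"), FieldValue("Article, Journal")))

# ...but only a shared URI reconciles different forms of a name.
person = "http://example.org/id/person/1"  # placeholder URI
print(same_thing(FieldValue("Walk, P", person), FieldValue("Paul Walk", person)))
print(same_thing(FieldValue("Walk, P"), FieldValue("Paul Walk")))
```

The last two lines are the point of the exercise: string normalisation cannot tell that "Walk, P" and "Paul Walk" are the same person, whereas a URI in each repository can.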


Talking of linked data, there's an interesting attempt at an RDF-like interface here: http://adasdaughters.org/ (apparently it is compatible with RDF but written in MySQL - I asked).

It would be good if someone developed a web interface that allowed people to display and edit RDF data.


FWIW, your suggested way forward sounds reasonable to me… but I guess the proof of that will be in the quality/quantity of challenges that come forward. Here's one suggestion… somewhat off the top of my head.

Starting from the phrase "linked and isolated data" (above) let's think about Linked Data for a moment (very much flavour of the month!). I'm going to focus on the 'linked' word and ignore technology issues around RDF and the like (because I think that is more useful as a general challenge).

Is it fair to say that most repository-based links are internal (i.e. they form links between things within the current repository)? I think so (though I am somewhat out of date on these things so please correct me if I'm wrong).

I think there is a useful challenge around making much greater use of 'external' links from repositories - both links to other repositories and links to other (more general) stuff on the Web. Such links might be

  • there is another copy of this paper in repository X

  • this author has contributed to the following works (held in repositories Y and Z)

  • other works on this topic can be found in repositories A, B, C, and D

  • this paper is related to this topic in Wikipedia

  • this paper is related to these resources in JORUM Open

and so on.

We usually think of these kinds of links being created and surfaced in some kind of centralised aggregator-cum-portal-type service. But I think it would be much more useful if the links could be surfaced directly in each repository. The aim is to make the repository user experience much more Web-like - a follow-your-nose approach to scholarly resource discovery if you like.

??



Designed by Paul Walk