The week before Easter, and we were slipping and sliding on the icy streets of Birmingham, risking a morning in Accident and Emergency, all in the name of research data management. The reason? Attending a JISC Managing Research Data Programme (JISCMRD) event.
Titled “Achievements, Challenges and Recommendations”, this was effectively an end-of-programme get-together for projects funded in the latest round of JISCMRD. We, of course, are not one of them. Yet this event was significant to us because it was a sort of homecoming. This is partly because I grew up some 40 miles (sorry, 64.4km) south of where the event was held, but also because before joining GESIS I worked on a JISCMRD project, the snappily titled DMP-ESRC. It is therefore no exaggeration to say that without JISCMRD I would not be where I am today, and you would not be reading this blog right now. But enough of parallel universes; after all, everything is for the best in this, the best of all possible worlds.
Another reason this event was so interesting to us is that one of my last tasks on DMP-ESRC was to go to Birmingham and present the work we had done to colleagues working on other JISCMRD projects. Consequently, this event was a good chance not only to hear about the work done in the two years since the JISCMRD community last met in Brum, but also to compare progress in the field. Now, I would love to make that comparison here, but in what can only be described as a shameful and catastrophic example of research data management failure, I cannot find my notes from the 2011 event. What follows, then, is a summary of this latest event mixed in with impressions and recollections from the 2011 equivalent (and if you represent one of the projects I mention and feel I have grossly misrepresented your work, do let me know just how wrong I am).
The first thing to say is that JISCMRD itself is different. This latest wave of projects seems to have moved on from the infrastructure and training themes of its predecessors to a concentrated strategy of implementing RDM support tools. The event itself was also smaller. Two years ago I remember a larger international presence; this time, less so. However, I feel this worked to the event’s advantage, as the smaller number of attendees were focused on the JISCMRD programme itself rather than the research data management world at large, with a few of us ‘fellow travellers’ around to add a little flavour. Consequently, less time was spent on big names with grand messages and postcards from abroad (although we offered our own tasteful and informative postcard), and more on what people are actually doing and have done.
Having seen many of the presenters in the support and guidance session at the IDCC13 event in Amsterdam, I skipped this session and sat in on the Institutional RDM Systems presentations. I know this area is important, but it is one of those in which I feel least comfortable, as I come from a research rather than a technology background. Representatives from the RDTK Herts, iridium, KAPTUR, and DataPool projects seemed to be telling us (well, me) laymen (or is the correct term n00bs?) that researchers have a dependency on Dropbox and (my interpretation here…) that this dependency is somewhat akin to the relationship between an addict and a dealer. Researchers know it’s probably not good to be so dependent, and that there is something potentially nasty and dangerous about using Dropbox as a research data storage and collaboration tool (this report suggests as much), but it’s so easy to use that once you’re hooked it’s impossible to wean yourself off. The problem was summarized best by Bill Worthington of RDTK Herts: researchers have a predisposition towards independence (also mentioned by KAPTUR, along with inconsistency in what constitutes good RDM practice), so whatever systems institutions build have to be so good that using them is a “no brainer”. This is, after all, a new area, said Ben Adam from iridium, where the theory is much easier than the practice; on top of that comes the realization that whatever you do will be obsolete in five years. Not what the people with the money want to hear, but then again, when was the last time you used a memory stick?
After a break I took in the Repositories, Portals and Catalogues session featuring RD@Essex, C4D Metadata, DataPool, and PIMMS, which provided useful strategies for metadata collection using DataCite and ePrints as a basis.
For the final session of the day, we found ourselves presenting alongside other projects with a training component. RDMRose started the session by inviting us to think of a movie that best represents research data management, an example of the great little tips you can pick up from others to enhance your own teaching. TraD highlighted an approach many training projects seem to adopt: a blend of introductory meetings, hour-long modules, reinforcing tasks and group meetings, repeated over five modules and topped off with a concluding meeting. However, they found that “enforcers” were needed to push people into taking RDM training seriously. Jo Goodger spoke about RDM training for astronomers at Hertfordshire and just how common their concerns turned out to be: we can all relate to questions of where to get data, where to keep it, how to keep it secure and how to preserve it, even if we can’t relate to the chemical properties of SDSS J141624.08+134826.7. Two librarian approaches finished off the session, as Mariëtte van Selm talked about RDM activities in the University of Amsterdam library and Sarah Jones introduced RDM Training for Librarians on behalf of the University of Edinburgh.
The day ended with a tools demo and poster session introduced by Steve Androulakis from Monash University, who reminded us that data capture is hard from a software perspective (that theory-meets-reality problem again).
Next, to the posters: congratulations to the MiSS Project and Leeds RoaDMaP for sharing the best poster prize with data.bris (pictured), who showed how to use a visual metaphor to get a point across. I’ll also give an honourable mention to RD@Essex for facing down that county’s reputation (although, for all the great work the UK Data Archive does, they’re still a long way from replacing TOWIE with RDM as the first thing people think of when Essex is mentioned).
Day two began with a stimulating intellectual breakfast session from Geoffrey Boulton of the Royal Society. Under the title “Open Data: Why it Matters and What Needs to Be Done”, Professor Boulton cited a statistic that replication was possible for only 11 percent of key cited papers. He argued that this undermines the credibility of science, and that not making the data underpinning a scientific paper available constitutes a form of malpractice. Having posited that “Science is broken”, he outlined a route to fixing the problem through open data and transparent peer review. Openness in itself has no value, he stated; what is required is intelligent openness: data that are accessible, intelligible, assessable and reusable. However, blanket demands for openness are often naïve, as there must be boundaries for legitimate commercial interests and the sensitivity of personal data. For example, administrative data is a rich and untapped data mine, but mining it raises major privacy issues. Furthermore, exactly where these boundaries lie is “fuzzy”, and the situation is potentially made worse by excessive restrictions from “poorly” thought-through EU regulation on data protection. Once again, theory meets practice, and they don’t get along.
A final point of Boulton’s worth noting is that most data are proxies for phenomena, so data must be capable of being reworked if long-term exploitation is to take place. This depends on algorithm integration, file-format transition, software archiving, automated data reading, and audience-sensitive metadata. Not cheap, and a significant investment of care and attention. In concluding, Boulton noted a generation gap in scientists’ behaviour and attitudes. Younger scientists produce more data and recognize the value of sharing. What kind of expectations do they have? A world where:
- Evidence must be open
- Data are no longer seen as private
- RDM is embedded
- Data are easy to remix
- Credit is given for useful forms of data collaboration
Back upstairs to an intriguing session on implementing institutional RDM change. Some aspects of these presentations were off the record, so I will keep my comments general. It seems that many of the problems come down to the semantics of what constitutes “data” and “sharing”, and specifically to whom RDM policies should apply. Then there is the difficulty of striking a compromise between specific requirements and concise, accessible statements. It also seems that implementing RDM depends on finding champions within the institution.
The event closed with a wrap-up featuring the thoughts of six panellists on successes, perceived progress in the last few years, and recommendations for next steps in RDM infrastructure and support. Lee-Anne Coleman from the British Library made what I feel is an important point: we must remember to tell researchers why we are doing this, namely to make research better. I agree. I am conscious of an ‘us and them’ gap, where we can’t understand why researchers don’t just do things like, for example, metadata, and researchers can’t understand why they are being harassed to do these seemingly pointless tasks.
Geoffrey Boulton mentioned something I too had hoped to get across in my presentation: that the field is still shaped by passive compliance with funding requirements. The full exploitation of research data relies on us moving into an active phase, with researchers seeing RDM as integral and so important that they factor in potential exploitations of data beyond and outside the original research project. Essentially, he seemed to advocate leadership on RDM from the bottom (researchers themselves) rather than the top, as this is potentially a more powerful form of advocacy. And with the research councils already persuaded (at least in the UK, one could argue), this advocacy should focus on the next “big target”: journals.
Louise Corti from the UK Data Archive stated that JISCMRD has given researchers a better awareness of issues like file formats and metadata, as well as producing useful training resources. However, there is, in her view, a problem of economies of scale: institutions cannot afford to do everything. Some can specialize, though, and federated groups that specialize and offer support to others should be encouraged.
The DCC’s Sarah Jones noted that communication was integral to the programme’s success and encouraged openness and sharing as a community. This was echoed by Wendy White from RLUK, who believed that things evolve and work where there is trust, and that JISCMRD is a comfortable environment in which to solve complicated issues, whether through discussion or more direct collaborative efforts.
Joss Winn from Orbital noted a sense of coherence in this JISCMRD programme compared with recent ones, but would like to see things joined up a bit more now. For example, while we speak RDM to researchers, they speak VREs back to us.
A final thought. Well done to the programme manager, Simon Hodson, and to JISC for organizing and funding the collective efforts of all the projects that provide the UK with, no exaggeration, a world-leading infrastructure for supporting the reuse and long-term preservation of research data. My personal view is that those of us who reside outside the UK have some catching up to do.
And finally, our presentation: The Archive and Data Management Training Center