GESIS HQ recently hosted a DataCite workshop under the (translated) title of New Opportunities for New Solutions in Research Data Management. Although I was unable to stay for the whole workshop, I was able to sit in on the morning session on Trends in Data Management.
Professor Björn Brembs came out swinging with an engaging polemic on the inadequacy of data infrastructures to support research. To paraphrase (i.e. the mistakes are mine, not his), the three things researchers produce are all in peril because the infrastructure to support them isn’t there. Data is in peril because suitable archives hardly exist for most research data. This is a long-term problem stretching back decades, and even today no end is in sight. Publications are in peril because there is no global search functionality for publications. Doing a literature review is like using the internet in 1995: we are almost two decades behind what the public is used to when it comes to searching for information. Software is in peril because software archiving is non-existent – algorithms and syntax all disappear within a generation because nobody is storing them.
Researchers, then, work in a dystopia where there is no infrastructure support to archive publications, data, or software. Instead, universities are still providing all the stuff around infrastructure, like email, web space, and library access.
There is a utopian vision, however, of no corporate publications – libraries archive everything and make it publicly accessible to a worldwide standard – and of a single semantic, decentralized database of literature, data, and software.
What we have is a system that incentivises scientists to promote their publications and not their science, so it promotes the best salesmen, not the best scientists.
Anyway, pugilistic stuff and very well delivered. Of course, a bottom-up message from the research world to the infrastructure providers at the top is always welcome. I am sympathetic to these arguments.*
*There was also a great line about science inventing the internet, and then doing nothing but sending emails and Word attachments to each other for the next 20 years.
Certainly, when placed in the context of fantastic developments like YouTube, Google, Facebook, and Twitter, which change not only the medium but the ways in which we consume, share, and interact, for science to still be wedded to an essentially analogue model is pretty damning. Personally, I am aware of moves to fill in the infrastructure that is currently lacking and to push open access publications, but despite this great work we are still essentially playing catch-up. Science gave the platform, but capitalism has made off with the good (well, ok, good or impressive depends on your attitude) innovations, and the money.
As a “Workers of the world unite” call it was great. But there’s always the detail. GESIS President York Sure-Vetter made the point that you’d expect a social science archive to make – what is “open data”? Fine, put your experimental psychology data out there for the world to use and verify, that’s good. But we deal with human subjects, who have moral and legal rights to protection and simply allowing open access is not possible without significant anonymisation. Yes, we want data shared to the fullest extent possible, but that may fall well short of “open” data.
With software, the obvious problem is intellectual property. I suppose the algorithms or syntax are fine, but archiving proprietary software programs is almost a non-starter. For publishing, we have seen remarkable progress in moving to open access for publications, but while journals hold all the good professional incentives (i.e. impact), researchers are rather held to ransom, and journals have very little interest in cutting off a very lucrative income stream. Yes, DataCite are doing a great job pushing persistent identifiers as equivalents to publication references, but still there is a long way to go.
Anyway, to be fair, Professor Brembs’s presentation wasn’t looking at the obstacles to the…uhh, obstacles, and I may just be displaying a few of my hobby horses for show. It was instead a good reminder to us infrastructure people of what to keep our focus on – getting data shared responsibly and as easily as possible – and, well, that utopia looks like a nice place to live.