Horizon 2020, Research Data Management, You, and Us.

Horizon 2020[i] is the European Union’s latest research and innovation funding programme, making €80 billion available in the seven years between now and 2020.

Horizon 2020 embraces the global movement amongst research funders to require that the data generated be made available for verification, reuse, repurposing, and innovation. Consequently, Horizon 2020 projects participating in the Open Research Data Pilot are required to produce a Data Management Plan (DMP) [PDF] [ii]. Other projects are encouraged to submit a Data Management Plan if it is relevant to their planned research.

What does this plan cover? At a basic level, it requires a statement on the types of data the project uses, and plans for the long-term availability, discoverability and usability, for others, of the data a Horizon 2020 project creates.

The Archive and Data Management Training Center at GESIS is happy to help with your Horizon 2020 DMP, with both writing it and implementing it, from project conception to data archiving and beyond.

The center specialises in Research Data Management and digital preservation as part of the GESIS Data Archive for Social Science. Our team come from social science research backgrounds, pursue active research careers, and have experience with the range of data social science researchers work with and the issues they face in data management: intellectual property matters, data discovery, producing descriptive metadata, privacy concerns, informed consent, data citation, security and back-up, and depositing data into an archive.

Thinking about data management early in a project is critical to producing good quality, reusable, preservable data. We are available for consultations and project-specific training events, and we invite you to one of our RDM or digital preservation training workshops.

You can learn more about us and our activities from our website, or follow us on Twitter. We are also happy to hear from you directly.

[i] European Commission (2013) What is Horizon 2020? http://ec.europa.eu/programmes/horizon2020/en/what-horizon-2020 accessed 20 February 2014.

[ii] European Commission (2013) Guidelines on Data Management in Horizon 2020 Version 1.0 http://ec.europa.eu/research/participants/data/ref/h2020/grants_manual/hi/oa_pilot/h2020-hi-oa-data-mgt_en.pdf accessed 20 February 2014.

Posted in Data infrastructure, Horizon 2020, RDM policies, Training, Workshops

DOIs and the danger of data “quality”

I’ve just spent a moment looking at guidelines [PDF] from the UK’s Natural Environment Research Council (NERC) on how NERC-funded research can obtain a persistent identifier through the DOI® system.

Just DOI it, just don’t DOI it like that.

NERC have a data sharing policy, and fund data centres for sharing and long-term data preservation. Like us here at GESIS, they have an interest in promoting stable persistent identifiers (in both cases Digital Object Identifier (DOI) names) that allow datasets to be cited as one would a publication. All well and good.

I certainly have no issue with the advice they provide for researchers on obtaining a DOI name. It’s good, clear, and concise. However, I’m going to expand on my reaction to one line in their guidance document. NERC state that “by assigning a DOI the [Environmental Data Centre] are giving it a ‘data center stamp of approval’”. Effectively, they see a DOI name (or, by implication, any other persistent identifier, such as a Persistent Uniform Resource Locator (PURL)) as a quality check-mark in addition to its role as a reference to an object. Except the DOI system isn’t designed to suggest the “quality goes in before the name goes on”. Just to remind myself, I quickly looked at the International DOI Foundation handbook, and it doesn’t mention data quality. Identification, yes. Resolution, yes. Management, yes. Quality, no.

There is no standardized quality symbol for data themselves. Instead we have informal ones that act as proxies: they are not what researchers themselves care about, but they correlate closely with the contestable idea of “quality”. But remember, they remain proxies, not the variable of interest. For example, just because a data set is available from a social science data archive doesn’t mean it is any good. It means the archive think people will use it (or that we are contractually obliged to take it), that it can be understood and isn’t just a set of numbers, that it doesn’t violate data protection laws or intellectual property rights, and that taking it into our collection doesn’t break our will or budget. So, if you order data from a data archive, they are preserved and contextualized and probably of good quality, but they need not be. Indeed, I suspect most archives have a data set or two that somehow ended up accepted into the collection as the result of an impenetrable act of madness or despair. Likewise, receiving a DOI name might be a stamp of approval if it is minted by a NERC data center, but other assigners might not be so fussed about the quality of what’s getting DOIed. As this blog post reminds us, anything can be given a DOI, multiple times.

Now, archives are working towards establishing their own stamps of approval for digital preservation and archiving. The Data Seal of Approval, the nestor Seal for Trustworthy Digital Archives, and ISO 16363 are recognized standards. Yet these are explicit symbols of quality in digital preservation, showing that an archive knows what to do, how to do it, and is doing it. Thus they indicate the quality of curation, not the quality of the data being curated. The best preserved and contextualized data set in the world could still be junk.

So should we as a community be moving towards quality symbols for data themselves? The risk of starting down that route is encountering a host of problems in defining contestable notions of “quality”. Digital preservation is, after all, measurable in the sense that something is either preserved and accessible or it isn’t. However, research data is subject to all kinds of challenges as to its quality, even to the point of dismissing entire research approaches. I have no problem with NERC specifying their own concept of quality (which they effectively do); my objection is to the use of DOI names as a tool to signify it. To that effect, we shouldn’t start using tools designed for one end to serve another.

Posted in Persistent Identifiers

Oh! Vienna… notes from the CESSDA Expert Seminar on Research Data Management

What follows are summaries of presentations and discussions. They are my summaries, so any misrepresentations, mistakes, slanderous accusations, lies, written lies, twisted lies etc. that follow are mine.

All the presentations linked to in this blog post are available under a Creative Commons Attribution-NonCommercial 3.0 Unported License.

What’s the collective collective noun for CESSDA RDM experts, a package?
Photo: Rein Murakas

It did not feel like the end of an era, but it was. For the last time, experts gathered for a CESSDA expert seminar before CESSDA’s rebirth as an established legal data infrastructure called CESSDA AS. The reason it didn’t feel like the end of an era was that the meeting looked forward, towards future work on Research Data Management (RDM) training and RDM costing. In this respect, as the seminar would show, significant work has taken place but there’s still much to be done.

Hosted by WISDOM in Vienna, and organized with GESIS, the day-long meeting was attended by some 20 people from a range of existing CESSDA member archives and interested observers.

The reason for choosing RDM as the topic is the movement towards data sharing policies from research funders, including Horizon 2020 – the next European Commission round of research and infrastructure projects. Funding developments like Horizon 2020 show RDM planning and implementation is increasingly important when securing or complying with funding agreements, so what can we do to best support researchers in promising and realising good intentions?

While we as a community have developed international standards on metadata, other topics remain difficult to address from a cross-national perspective. For example, data protection laws and intellectual property rights vary across, and sometimes within, nations. Therefore, the objective of the meeting was to promote further cooperation between archives, identify experts who could contribute to cooperation, and discuss the possibility for common European RDM support. To address these goals the meeting was structured into two parts. First, incentives and teaching; second, costing.

Incentives

The incentives session started with presentations from Elisabeth Strandhagen and Sara Svensson (SND) on “Working with data management” [PDF] and Sonja Bezjak (ADP) on “Data Management Planning in Slovenia” [PDF]. These presentations outlined situations in their respective countries where funders seem to be moving towards data management planning requirements and it has fallen to SND and ADP to define and provide the infrastructure support to underpin these requirements.

After the presentations, discussion commenced and questions were asked. Elaborating on their presentation, ADP explained that they had found a difference in attitudes to sharing between researchers who used international data and those who did not. Others pointed to disciplinary differences: while ADP found that natural science researchers like to keep data to themselves, the Finnish Social Science Data Archive (FSD) has natural science researchers approaching them to learn about running a data infrastructure, because social scientists have been running them for a long time. GESIS picked up on this point, stating that the RDM challenge in the natural sciences comes from storage; the natural sciences tend to be weak on metadata and data description because they traditionally haven’t needed to be strong in those areas. Social sciences, in contrast, have over 50 years of good work in establishing data description standards.

ADP returned to the policy and recommendations in their presentation by reiterating their strategic approach. Setting a national policy first allows institutions to develop their own policies within the national framework. Their recommendations suggest disciplinary data centres should develop in fields where they are needed; for example, where “islands” of researchers are already informally sharing research data because there is a need for that data to be shared. This is because, in their view, the scope for archives or repositories that exist for things that don’t fit elsewhere is limited. Discipline-specific centres are specialised, with knowledge of both their user and depositor communities.

Discussion then moved into a round table, with representatives from other countries talking briefly about RDM requirements, planning, and data sharing culture in their countries. A pattern emerged whereby most countries have funding bodies with some requirement or encouragement for researchers to produce a data management plan, but less emphasis is placed on requirements to share data. Even less effort is made by funders to implement these requirements.

Alexia Katsanidou (GESIS) [PDF] then talked about the importance of framing RDM incentives in ways that have emotional appeal to researchers. Much of the talk in RDM is about compliance, delivered in a cerebral tone, while much of the resistance is emotional in nature. If we could tap into emotional motivations to practise good RDM techniques, the result could be a more positive and active reaction from the research community.

To conclude the first part of the day, Alexandra Stam (FORS) [PDF] and Laurence Horton (GESIS) [PDF] gave overviews of recent training events at their institutions. FORS adopted a brave and daring scenario approach for a five-day RDM course, weaving RDM themes and lessons into a data kidnapping role play requiring participants to recover a data set. Laurence Horton outlined the work at GESIS on RDM training courses with a cross-national perspective, mentioning the problems of addressing national-level issues in an international course and the need for good, stimulating approaches to delivering and reinforcing RDM training.

The discussion afterwards raised the topic of needs driven training. Researchers are most receptive and open to training when they have a need for it, either from a policy view or working in a collaborative environment. This moved onto consideration of how we can build training activities in a way that integrates with existing research practice and reduces friction between archives and researchers.

Costing

Part two of the seminar began with presentations on costing from Laurence Horton (GESIS) [PDF], Veerle Van den Eynden (UK Data Archive) [PDF], and Heiko Tjalsma (DANS) [PDF]. Laurence argued that focusing our training on researchers generating good metadata and documentation can help us reduce the financial cost of archiving, a cost which is often tied to the archive’s need to add metadata and documentation during ingest. Veerle presented a tool developed at the UK Data Archive that helps researchers identify RDM costs, thereby allowing them to factor realistic, specific RDM costs into funding applications. Heiko spoke about the work at DANS, who applied business costing models to their activities in order to develop costing models fit for an archive. DANS are now implementing their cost model within the organisation, particularly with regard to classifying activities into principal and auxiliary costs. They are also contributing to European projects on cost modelling, APARSEN and 4C.

The post-presentation discussion raised some worthy points. It was noted that although cost data for specific tasks can be captured, the problem is that there is no standard research collection. UK Data Archive and FSD both mentioned their experience with qualitative data and how expensive it was to ingest. DANS noted that their data collection, which contains social science, archaeology, and humanities (history) data, had “huge” differences between domains when it came to ingest cost. This is a result of incorporating three different disciplines, each with its own archiving processes, into one archive, something DANS is busy attempting to standardise. Jared Lyle from ICPSR asked the provocative question of whether well-curated studies are actually getting used, regardless of the quality of their metadata and documentation.

The costing session ended with an open presentation [PDF] and discussion led by Mari Kleemola from FSD. Mari argued that most of the work on costing is storage-based and does not help us when it comes to costing the production of metadata, particularly as a proportion of total archiving costs. Furthermore, promoting self-archiving platforms is all very well, but it is done in the expectation that researchers will provide sufficient metadata and attend to other RDM considerations, which they probably will not. Mari also touched on an issue that seemed to resonate with others in the group, namely the reluctance of funders, institutions, or researchers to claim intellectual property ownership of research data for fear of becoming responsible for the resources required for long-term preservation.

Wrap Up Session

Regarding future directions, the question was asked how we can coordinate RDM training across the CESSDA archives. As CESSDA, we are now in a better position as a community to work with universities and negotiate with archives on training. It was suggested we focus on finding commonalities rather than highlighting differences, and look at ways of delivering training to researchers who can’t travel. Options here include federated training held in several places at once, with one person delivering it from one country and others listening in and providing support. Furthermore, we could possibly offer some advanced qualification or certification in RDM.

Further reading

Bezjak, S. “Ravnanje z raziskovalnimi podatki: spodbude in stroški (CESSDA Expert Seminar 2013)”, Prispevki: blog, 18 November, 2013.
http://www.adp.fdv.uni-lj.si/blog/2013/neuvrsceni/ravnanje-z-raziskovalnimi-podatki-spodbude-in-stroski-cessda-expert-seminar-2013/#ixzz2l0J3NlnQ

Posted in CESSDA, Data infrastructure, Documentation, Presentations, RDM policies, Research data management, Training, Workshops

Self-archiving platforms and data verification

There used to be a comedy show on TV that featured a character who described everything as either “brilliant!” or “fantastic!” Open data, brilliant! Data sharing, brilliant! Expanding ways to facilitate open data and sharing, fantastic! And, you know what, it is! Transparency is brilliant, accountability is fantastic, and advancing scientific knowledge is, yes, fantastic and brilliant!

Automated data sharing platforms, or self-archiving platforms, that make it as easy as possible for researchers to share their data by providing a Dropbox-, YouTube-, or Flickr-like interface are fantastic facilitators for sharing: they provide a platform for discovery and get data to their community quickly and easily, in a world where “discovery”, “quick” and “easy” are essential elements. For the most part this is not a problem and is, well, brilliant, or even well brilliant.

But (and of course there’s a “but” as I have another 800 words for you) the fantastic move to open data and exchange of data is occurring in a time of increasing fear about sharing personal data. What’s personal data? It’s data relating to a living individual who can be identified either from those data or from those data in combination with other available information.

Data protection laws tend to get stronger rather than weaker, the grumbling about the data social networks require and what they do with the data we give them gets louder, and, of course, the revelations as to just how much governments (not even our own government) pry into our lives generate increasing outrage. Not so brilliant. Certainly not fantastic.

So, what happens when our desire to share scientific data quickly and easily comes up against a need to protect personal data?

Here’s an example based, as they say in Hollywood, on a true story. A self-archiving platform is launched, welcoming research outputs from across the sciences. It’s established on the principle of making it as easy as possible for researchers to share their data, through an easy-to-use interface and quick uploading of data in almost any file format. From an IT perspective, it’s brilliant! Anything that makes research data management and sharing less of a chore must be attractive, right? Especially when the alternative is that data is lost forever. However, social science archiving nearly always has a problem with “quick” and “easy”. Compared with other data sources, we are, rightly, neither quick nor easy when it comes to research on human subjects that could contain personal data.

Archives, like self-archiving platforms, take data on a basis of trust. We trust the people offering data are telling the truth when they claim they have the right to offer us data, and we trust researchers when they tell us anonymisation and personal data issues have been addressed to a mutually agreed standard. Occasionally, through naiveté or a genuine mistake, researchers may give us data that violates that trust. However, an archive, unlike most self-archiving platforms, will have a data ingest procedure that includes manual data verification and safeguards to ensure personal data laws are respected and re-identification of participants is prevented, to the point where a “disproportionate amount of time, expense and effort” is required [Germany] or “identification is not likely to take place” [UK]. Now, we often don’t do the work on personal data and anonymisation ourselves, but we do a lot of manual work making sure researchers have done what they said they did, and that any problems are addressed before we make data available to the research community or others.

However, if that ingest and verification process is automated and backed by a loose policy on data acquisition and ingest, potential problems with personal data and identification only get caught once they have become real rather than hypothetical. If you accept any file format, that makes it harder to verify the contents. If you promise instant data availability, that makes it impossible to verify that anonymisation and data checking have taken place. Therefore, if something sneaks through that shouldn’t, it doesn’t matter if you take it offline immediately. Like a misjudged celebrity tweet, the fact it appeared at all is enough to do damage.

This is bad for two reasons. There is the obvious legal issue and potential punishments, which can be severe. But the other reason is trust. Trust is our currency. This archive thing of ours only works, and can only work, on a basis of trust. In fact, the social science archiving community have even adopted standards to support the value of trust. We have to trust what researchers give us; they have to trust we can look after it. Users have to trust the quality and contents of what we give them; we have to trust users to respect the terms of use under which we provide them data. Trust is precious. A violation of that trust, wilful or not, leads to a devaluation in the currency of trust, and this is neither brilliant nor fantastic. It is awful.

Data infrastructures have a lot in common but what this episode suggests is that data archiving and sharing isn’t just an IT issue. Sure, we need platforms that are easy to use and get data to people as quickly as possible. However, we need policies, procedures and expertise underpinning these platforms. We can’t just take anything, take it all, let it sort itself out, and only act when a problem is pointed out by the depositor or user community. Trying to retrospectively moderate data you may not be able to read or understand is not a “policy”; it’s an invitation for trouble. At some point you’re still in need of a human touch.

Posted in Data sharing, file formats, Repositories, Self-archiving

First steps towards an introductory workshop in digital preservation

Last week, we ran our first ever digital preservation workshop here at GESIS in Cologne, entitled “First steps towards digital preservation”. We will assess the workshop and the feedback we received in more depth in the weeks to come, but we would like to share some initial thoughts on the workshop while the impressions are still fresh in our minds.

Planning the workshop

When we began to think about a course in digital preservation, we decided we wanted to start “at the beginning”. Accordingly, the course was intended as a primer in digital preservation, requiring no previous experience with or knowledge of the subject. Despite our background in the social sciences, we also decided to design the course for a wider audience instead of focusing exclusively on the preservation of social science data. We did so because we thought that the basic principles of digital preservation, its general framework, can be taught largely independently of disciplinary specifics, and that, at this stage in particular, discussions from the perspective of different disciplines can be immensely fruitful.

A second principal decision that we took was to allow the course to strongly lean towards the “organizational leg” of the three-legged stool of digital preservation (see http://www.dpworkshop.org/dpm-eng/conclusion.html): the question of what to consider before actually beginning to ingest digital objects into a preservation system. This decision was made because it is our conviction (and we keep good company here!) that while good technology will undoubtedly make the life of the digital preservationist much easier, digital preservation will fail if technology is not embedded in an adequate organizational infrastructure of “policies, procedures, practices, people” (ibid.).

Thus, after a general introduction to the “why and what” of digital preservation and the OAIS Reference Model, we focused the workshop on the following topics:

  • Defining a designated community
  • Defining significant properties
  • Acquisition policies and selection criteria
  • Sustainable digital preservation and cost models
  • Licensing for preservation and re-use
  • Trusted digital repositories and certification.

The last goal that we set ourselves was to create a course that was also “hands on”, which meant that for every topic we presented we created exercises designed to give participants the chance to apply the theory to a real-world digital preservation example.

Digital preservation, analogue presentation.

Lessons taught, lessons learned

Here are some initial thoughts on how the workshop went and where there might be some room for fine-tuning or improvement. The workshop took place from 23 to 25 October at GESIS Cologne. It was delivered to a small group of participants, most of whom had a background in the social sciences or educational research, the notable exception being a participant coming from the world of music libraries.

We are very proud to say that the feedback from our first group of participants indicates that the workshop was a success and that we seem to have achieved the goals that we set ourselves. Overall, the participants’ impression was that the workshop covered the ground of “digital preservation 101” in adequate breadth and depth and that nothing vital was missing. Whew! At the same time the feedback shows an interest in covering certain topics in more depth, possibly in a separate workshop or training course rather than in the introductory workshop. Among these topics were persistent identifiers and ingest as well as digital preservation strategies “in practice”.

While the composition of the group meant that discussions veered off in the direction of the challenges of preserving social science research data from time to time, the feedback indicates that we managed to keep the balance between social science/research data specific questions and the discussion of questions relevant to other disciplines and digital object types.

Overall, we are very happy with how the exercises and discussions of practical examples turned out; we owe a big thanks to our participants here, who did not seem to get tired of all the scenarios and case studies we had in store for them. Yet we have a hunch (and the feedback confirms this) that we should shift the balance between presentations and exercises one notch further towards exercises. Although the time dedicated to exercises was generally deemed sufficient, we could easily have spent more time on these and the related discussion. Moreover, the time required is likely to increase with a larger number of participants, so we will put some thought into how we can make yet more room for the exercises.

Where do we go from here?

We are planning to run the “First steps towards digital preservation” workshop again in spring 2014, so this will leave us time to adapt and modify the workshop as described above.

In addition, we will also start working on the conception of advanced and more specialized workshops which will cover a single topic in more depth. This will involve contacting experts from the respective areas to see how we can bring in their expertise, both in the conceptualization and delivery of the workshops. So exciting times are ahead and we look forward to taking many more steps towards digital preservation!

Posted in Training, Workshops

The five stages to data sharing: Depression

Applying the Kübler-Ross model[1] to researchers and data sharing, based on various attitudes and comments we have encountered over the years. Don’t take the presentation seriously, but take the content seriously. Part four in a series of…uh, five.

4. Depression

Symptomatic statement: “I do all the work and someone else gets the credit, why bother with anything?”

Heaven Knows I’m Miserable Now

During the fourth stage, the researcher begins to understand the certainty of data sharing. Because of this, the individual may become silent, refuse visitors and spend much of the time crying and grieving over the perceived loss of a Nobel Prize. This process allows the researcher to disconnect from things of love and affection. It is not recommended to attempt to cheer up an individual who is in this stage. It is an important time for grieving that must be processed. Depression could be referred to as the dress rehearsal for the ‘aftermath’. It is a kind of acceptance with emotional attachment. It’s natural to feel sadness, regret, fear, and uncertainty when going through this stage. Feeling those emotions shows that the person has begun to accept the situation.

Data reuse isn’t about “stealing” other people’s work. It’s actually a great opportunity for promotion as people mix, reuse, and re-purpose original data.

Using data without attribution is unacceptable; it’s also known as plagiarism. Using someone’s data with attribution but without permission is also wrong.

Provided you have invested a certain level of creativity and originality in collecting data, you have a right to be recognised as the creator of that data set. If the intellectual property of the data is yours (and it’s worth checking with your institution and funder whether it is, because there is a difference between the moral right of recognised authorship, which you will always have, and the intellectual property right of ownership, which you may not), then you can apply a licence to your data that protects and asserts your moral right to be recognised and your legal right to have the data used responsibly.

Part of the problem with data sharing, or non-sharing, is the ambiguity as to who can use data, how, and to what ends. Part of this is a consequence of variations in national laws[2] regarding intellectual property rights. Depending on where you are, a user could claim “fair usage”[3] or a “right to quote”[4], or invoke freedom of information[5] laws to access and use data. However, a licence[6] can specify reasonable (i.e. lawful: no human sacrifice requirement, please) conditions on what other people can do with the data, thereby bringing clarity to data reuse for both parties. The text you see above in italics isn’t (or mostly isn’t) my work, but I have permission to use it in a particular way because the article it came from had a licence[7] allowing me to “Remix—to adapt the work” on condition I attribute the original work in the manner specified by the author or licensor (which I have, but not in any way that suggests that they endorse me or my use of the work, and thank God for that, you may be thinking). So even if you plan to make data available to everybody for any use by waiving your rights, adopt a licence stating that you waive them, so we all know. Alternatively, you may not want commercial companies making money from your data, in which case adopt a licence stating that data can only be used for non-profit research or teaching. Licensing data is complicated, though, so do not try this at home. Template data licences exist that may be suitable for your research data, for example Creative Commons licences[8], which, while simple in concept and action (you pick and choose from a menu of conditions), are not designed for research data; hence the “may” qualification. The Open Knowledge Foundation’s Open Data[9] licences are more data/database oriented, but the clue is in the title: “open data”. As previously discussed[10], “open data” in the social sciences is a problematic concept and, again, such a licence may (it’s that word again) not be suitable[11].

One of the good things about archiving your data in a specialist data archive is that the archive does not claim ownership of the data, so it remains yours or stays with the original owner. Furthermore, the archive has experience of managing the legal and bureaucratic side of data reuse on your behalf (now, that can be depressing): people have to agree to a licence (or user agreement) before they can access the data. Here’s[12] how we do this at GESIS. Bottom line: your data is still yours and should be recognised as yours, even if it is in an archive, even if someone else is using it.

It’s understandable to be depressed if you see sharing your data as having no professional benefit in a world that depends on being published and cited. However, such feelings are unfounded. Investment is taking place to support better ways of making data citable and tracking citations, raising data sets to the status of a research output and data reuse to the level of a citation. In the past the world was much simpler: you cited a publication by its details, namely author, title, journal, volume, and page numbers. These days it is more complicated: how do you cite data[13], variables, or different versions of data sets, and what happens when websites change address or switch off their servers? So what? Don’t think that’s a problem? Well, be thankful[14] you aren’t a legal scholar.

With funders[15] investing in long-term reference systems known as persistent identifiers[16], we can start (and it is a start; there’s still lots[17] to address[18]) bringing stability[19] to data referencing. For example, GESIS uses a form of identifier called the Digital Object Identifier (DOI®)[20], which applies a fixed, persistent reference and provides a standardised way of citing data, documentation, and publications. It not only makes work discoverable and citable but also offers long-term reassurance that this remains the case, and if your data is available, discoverable, and citable, then people are more likely to discover and cite[21] it.

Just, ahem...DOI it.
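To make the idea of a fixed, citable reference a little more concrete, here is a minimal Python sketch that turns a DOI into a resolvable link and assembles a data citation that names the version being cited. It assumes a citation pattern loosely modelled on the elements DataCite recommends (creator, year, title, version, publisher, resource type, identifier); the `DatasetRecord` class, its field names, and the example DOI and archive are hypothetical placeholders, not GESIS metadata or a real data set.

```python
from dataclasses import dataclass

# Hypothetical description of an archived data set. Field names are
# illustrative assumptions, loosely following DataCite's citation elements.
@dataclass
class DatasetRecord:
    creators: list[str]
    year: int
    title: str
    version: str
    publisher: str
    doi: str  # e.g. "10.1234/example" (placeholder, not a registered DOI)

# Prefixes people commonly paste in front of a bare DOI.
_DOI_PREFIXES = (
    "https://doi.org/",
    "http://doi.org/",
    "https://dx.doi.org/",
    "http://dx.doi.org/",
    "doi:",
)

def doi_to_url(doi: str) -> str:
    """Strip any common prefix and return a link through the doi.org resolver."""
    doi = doi.strip()
    for prefix in _DOI_PREFIXES:
        if doi.lower().startswith(prefix):
            doi = doi[len(prefix):]
            break
    return "https://doi.org/" + doi

def format_citation(rec: DatasetRecord) -> str:
    """Build a citation that names the version cited and ends in a persistent link."""
    authors = "; ".join(rec.creators)
    return (
        f"{authors} ({rec.year}): {rec.title}. Version {rec.version}. "
        f"{rec.publisher}. Dataset. {doi_to_url(rec.doi)}"
    )

if __name__ == "__main__":
    example = DatasetRecord(
        creators=["Doe, Jane", "Roe, Richard"],
        year=2013,
        title="Example Survey on Data Sharing",
        version="1.0.0",
        publisher="Example Data Archive",
        doi="doi:10.1234/example",
    )
    print(format_citation(example))
```

The design choice that matters is storing the DOI rather than the current web address of the landing page: if the archive moves its website, the address registered behind the DOI is updated, and the citation itself stays the same.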

Archiving and sharing your data establishes when your research was conducted. Combine that with the normal things a good researcher should be doing anyway to build a professional reputation, such as presenting and publishing work based on that data to make a name for yourself in a field, and you make it much harder for unscrupulous types to pass off your work as theirs.

[1] Adapted from http://en.wikipedia.org/wiki/K%C3%BCbler-Ross_model under a Creative Commons Attribution-ShareAlike 3.0 Unported License.

[2] Centre for Intellectual Property Law (2011) “The Legal Status of Research Data in the Knowledge Exchange Partner Countries” (Accessed October 09, 2013) from http://www.knowledge-exchange.info/default.aspx?id=461

[3] Association of Research Libraries (2012) “Code of Best Practices in Fair Use for Academic and Research Libraries” (Accessed October 09, 2013) from http://www.arl.org/bm~doc/code-of-best-practices-fair-use.pdf

[4] Directive 2001/29/EC of the European Parliament and of the Council of 22 May 2001 on the harmonisation of certain aspects of copyright and related rights in the information society [2001] OJ L167/10 from http://eur-lex.europa.eu/LexUriServ/LexUriServ.do?uri=CELEX:32001L0029:EN:HTML

[5] Whyte, A. (2013) “FOI dataset provisions – what do they mean for RDM?” DCC Blog (Edinburgh: Digital Curation Centre) (Accessed October 10, 2013) from http://www.dcc.ac.uk/blog/foi-dataset-provisions-what-do-they-mean-rdm

[6] Ball, A. (2012) “How to License Research Data” DCC How-to Guides (Edinburgh: Digital Curation Centre) (Accessed October 09, 2013) from http://www.dcc.ac.uk/resources/how-guides/license-research-data

[7] Creative Commons (n.d.) “Wikipedia:Text of Creative Commons Attribution-ShareAlike 3.0 Unported License” (Accessed October 09, 2013) from http://en.wikipedia.org/wiki/Wikipedia:CC-BY-SA

[8] Creative Commons (2013) “About the Licenses” (Accessed October 09, 2013) from http://creativecommons.org/licenses/

[9] Open Data Commons (2013) “Open Data Commons Attribution License” (Accessed October 09, 2013) from http://opendatacommons.org/licenses/by/

[10] Archive and Data Management Training Center (2013) “The Five Stages to Data Sharing: Bargaining” (Accessed October 09, 2013) from https://admtic.wordpress.com/2013/09/30/the-five-stages-to-data-sharing-bargaining/

[11] Korn, N., & Oppenheim, C. (2011) “Licensing Open Data: A Practical Guide” (Accessed October 14, 2013) from http://discovery.ac.uk/files/pdf/Licensing_Open_Data_A_Practical_Guide.pdf

[12] GESIS – Leibniz Institute for the Social Sciences (2007) “Usage regulations – Dept. Data Archive for the Social Sciences” (Accessed October 09, 2013) pp.7-8 from http://www.gesis.org/en/services/data-analysis/data-archive-service/usage-regulations/

[13] Ball, A., & Duke, M. (2012) “How to Cite Datasets and Link to Publications” DCC How-to Guides (Edinburgh: Digital Curation Centre) (Accessed October 09, 2013) from http://www.dcc.ac.uk/webfm_send/525

[14] Zittrain, J., & Albert, K. (2013) “Perma: Scoping and Addressing the Problem of Link and Reference Rot in Legal Citations” Social Science Research Network http://dx.doi.org/10.2139/ssrn.2329161

[15] Economic and Social Research Council (n.d.) “Data citation: What You Need to Know” (Accessed October 09, 2013) from http://www.esrc.ac.uk/funding-and-guidance/grant-holders/data-citation.aspx

[16] Tonkin, E. (2008) “Persistent Identifiers: Considering the Options” Ariadne, 56. (Accessed October 09, 2013) from http://www.ariadne.ac.uk/print/issue56/tonkin?

[17] CODATA-ICSTI Task Group on Data Citation Standards and Practices (2013) “Out of Cite, Out of Mind: The Current State of Practice, Policy, and Technology for the Citation of Data” Data Science Journal, 12, 1–75. pp.33-36 http://dx.doi.org/10.2481/dsj.OSOM13-043

[18] “DOIs unambiguously and persistently identify published, trustworthy, citable online scholarly literature. Right?” (2013) CrossTech (Accessed October 09, 2013) from http://crosstech.crossref.org/2013/09/dois-unambiguously-and-persistently-identify-published-trustworthy-citable-online-scholarly-literature-right.html

[19] Mooney, H., & Newton, M. P. (2012) “The Anatomy of a Data Citation: Discovery, Reuse, and Credit” Journal of Librarianship and Scholarly Communication, 1(1), 1–16. (Accessed October 09, 2013) from http://jlsc-pub.org/jlsc/vol1/iss1/6/

[20] International DOI Foundation (2013) “The DOI® System” (Accessed October 09, 2013) from http://www.doi.org/

[21] Piwowar, H. A., & Vision, T. J. (2013) “Data reuse and the open data citation advantage” PeerJ, 1, e175. http://dx.doi.org/10.7717/peerj.175

Posted in 5 Stages to Data Sharing, Data Citation, Data sharing, Licences, Persistent Identifiers, Research data management