I have spent the last day and a half in the company of linguists (mostly linguists) at the first Innovative Networking in Infrastructure for Endangered Languages (INNET) conference on “Best practices in digital language archiving of language and music data”.
I enjoy going to events outside the social science remit in which our own organization operates. To me it is like going on a data infrastructure holiday (and that is where the holiday metaphor ends, as I never left Cologne).
Like any good holiday that does not involve sex, sand, and excessive consumption of strong alcohol, it is interesting to see the things that are familiar to us as a social science infrastructure, and the things that are different. Furthermore, this infrastructure holiday was not that exotic, as the event included GESIS’s own Natascha Schumann and representatives from CLARIN who work with us in the DASISH project on data infrastructures in the social sciences and humanities.
The familiar things include the need for metadata and user authentication, albeit with different emphases. Of course, the attempt to educate researchers on long-term implications of reuse and the value of data archiving could be taken from any social science research data management event. However, what I want to do is highlight some of the interesting differences I perceived.
One emergent theme was the changing role of language archives. Gary Holton emphasized in his opening presentation that the endangered language archive he is involved with has been transformed over the last decade: not just from a physical to a digital archive, but also from a researcher-only facility (not through access requirements but more through awareness and physical accessibility) to an archive that serves the very communities that generated its collection. I wondered whether such a transformation is occurring, or ever could occur, in the social sciences. In linguistics, there has been a move towards crediting participants in a language corpus, but in social science, the emphasis is strongly on protecting the identity of participants – directly and indirectly. In a sense, we are talking about trust. It is essential that archives – whatever their remit – build a trust relationship between their user communities and depositor communities and are responsive to the demands of both. There is, I sense, a strong sensitivity amongst social scientists towards anonymisation and keeping data, to a greater or lesser extent, closed. Our community would not tolerate what is acceptable in the linguistic community, and both have good reasons for what is acceptable in their communities.
This theme of community was another strong emergent message at INNET. Two presentations (Nick Thieberger, and David Nathan and Kakia Chatsiou) talked about the role of networking (social or otherwise). Nathan and Chatsiou from ELAR outlined how they are importing communication methods that work in social networking into restricted-access data to facilitate relationships between depositor and user. Impressive stuff but, they admit, not something that was culturally possible before Facebook and Twitter plugged us into online interactions; they also admit that “the next big thing” that makes social networks passé may be around the corner. My own worry was this: how do you get depositors who slap restrictive approval conditions on their data to respond to requests to use that data? Or maybe linguists are nicer people than some (and it is some) social scientists. Maybe this approach only works with small, focused disciplinary groups? Nick Thieberger’s presentation, meanwhile, was closer to home, outlining the findings from a “do you have data” survey that revealed the classic retired-professor-with-boxes-of-stuff-and-don’t-know-what-to-do-with-it routine also beloved by those who work in archives. More concerning were his findings of institutions that hold data in analogue form but are unwilling or unable to pursue digitisation.
One final reflection from sitting in this conference is that linguists appear to have (perhaps inadvertently) hit upon the value of giving data back to the communities that generated them, to discover uses and values not originally apparent, or not apparent to a discipline-specific mind. Of course, social science is also aware of the value of re-purposing data, but it was interesting to hear how involving communities in data had produced fresh perspectives on the value of that data to different audiences – bird song, social chat – all things that had been understandably overlooked by a linguistic ingest process. However, description and metadata always work better than attempts to digitise or migrate data, which should – from a long-term preservation view – always come with that often-used TV warning “do not try this at home”.
There were of course other presentations of interest and note, but I wanted to focus in this case on those presentations that made me think about our role in data management and archiving. Finally, these are my interpretations of the presentations, and I am sorry if I have misrepresented any participants. I’d also like to thank the hosts for accommodating me at short notice.