“Variable baskets” or Lost in the Supermarket

For the most part, the worlds of research and infrastructure get along well. We have a mutual relationship of need: with no infrastructure, there is little possibility for research, and with no research, no need for infrastructure.

However, I sometimes feel that as a researcher working in an infrastructure team, there is a communication breakdown.

For example, do you know what “variable baskets” are? They are the data infrastructure world’s great invention for resembling online retailers’ one-stop-shop systems. Imagine: you can go to a virtual supermarket that contains every dataset a social scientist could ever desire. There you can mix and match, from all the available datasets, the variables that could answer your question. Millions of choices! The possibilities are endless! No waiting in line at the few checkouts that remain open! No being stuck behind the person who tries to pay for their shopping in pennies!

Let us think of one possibility. Say you want to investigate the relationship between the monthly amount one spends on clothing and opinions on foreign policy issues. You find relevant items in two different studies, then you go to the checkout à la Amazon and download them.

Isn’t that cool?

The tool is very resource-intensive, built with state-of-the-art software development, and based on the latest metadata structures. It works beautifully. However, let me now put on my researcher hat (it is my floppy PhD cap, if you must know) and ponder how to express what I am thinking. How do we researchers put this diplomatically? How about: it is useless. Completely and utterly useless.

The problem is that no matter how well the infrastructure functions, it disregards the basic principle of comparative research: comparing like with like. The two studies were not conducted at the same time, do not have the same sample, or even the same sample size. There is no way you can match the two variables at the individual level. There is no way you can make any reasonable analysis.

When I pointed the problem out to the developers, I felt I was speaking a different language. They did not seem to understand my concerns. “But the researchers should love that, no?!” I felt them thinking. “They merge datasets all the time, so we make it easier for them!” Except merging is not as simple as mixing it all up and seeing what comes out at the end.
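To make the point concrete, here is a minimal sketch in plain Python. The study names, respondent IDs, and values are entirely invented for illustration; the point is only that two surveys with disjoint respondent pools share no keys on which an individual-level merge could be performed.

```python
# Hypothetical example: two unrelated surveys, each with its own respondent IDs.
# All names and values below are invented for illustration.
clothing_spend = {"A-001": 120, "A-002": 45, "A-003": 80}    # "Study A", monthly spend
foreign_policy = {"B-101": "approve", "B-102": "disapprove"}  # "Study B", policy opinion

# An individual-level merge needs respondents present in both studies.
common_respondents = clothing_spend.keys() & foreign_policy.keys()

print(len(common_respondents))  # prints 0: no shared respondents, no joint analysis
```

With zero overlapping respondents, downloading both variables into one “basket” yields two columns that can never be joined row by row, which is exactly the like-with-like problem described above.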

Let me now put on my infrastructure hat (it is a baseball cap with <DDI> stitched into the side, if you must know). It seemed no researchers were consulted at the conceptualization phase, and I was sad that such a great tool had been developed but would be unusable for any meaningful research.

It is a reminder: for all the great work done by researchers and infrastructure projects, if the two sides don’t recognize each other’s needs, talk to each other, and reach a common understanding as to what one wants and what the other can provide, we end up lost in the supermarket.


About CESSDA Training

CESSDA Training offers and coordinates training activities for CESSDA, the Consortium of European Social Science Data Archives (http://www.cessda.net/). Hosted by the GESIS - Leibniz Institute for Social Sciences, our center promotes awareness throughout the research lifecycle of good research data management practice and emphasizes the importance of long-term data curation.
