On 30 April 2013 I attended the workshop "Linking datasets and articles for publication – cross-linking and workflows" at the Rutherford Appleton Laboratory, in the Oxfordshire countryside.
I used to live in High Wycombe and hadn't seen red kites soaring overhead for years, so I enjoyed the trip to the lab. (Thanks to Chris Frewin for the picture.) A village full of unexplained dragons amused me too.
The event was organised by the Peer REview for Publication & Accreditation of Research Data in the Earth sciences (PREPARDE) project. Sarah Callahan’s cross-linking and workflows report contains a link to the presentations.
This was the first workshop I attended for my new job and it was a helpful one. I met a mixture of people working in research data management and publishing.
There was plenty of detail about workflows: how people actually work with their data and share it. I think this sort of detail is what will make data publication usable and useful to researchers. I expect to spend much of my time learning how our researchers prefer to read, work and share, and offering tools that help them do so.
Big research organisations with a lot of data had developed their own systems. They were sharing what they had learned and thinking about how those systems could interoperate with systems outside their own organisations. It was good to see subject-specific ways of working that had been well tested with bulk data.
A fairly new idea was the data journal: peer-reviewed papers that describe datasets in detail but don't carry out research analysis on them.
The idea of linking research publications has been around for ages, through citations and the systems that help organise them. This workshop was about extending citation to data, and one topic was how to make links in both directions:
This is my paper, it is based on these data. Here is our dataset, it was used in these papers by these people on these projects.
A paper tends to be a discrete thing, with versions such as the preprint, the reviewed and corrected text, and the formatted article. Live documents perhaps extend this: wikis, multiple authors commenting and editing, trails of changes…
A dataset can be complicated to describe: consider the raw data from a digital camera, the formatted pictures in assorted sizes, the pictures filtered to remove red-eye, and the subset showing Aunt Maude. How we describe datasets, and point to their different versions and subsets so they can be found and used, will be interesting.
I look forward to seeing what open access to data brings for researchers.