Unpublished abstract: fMRI Data Center Quality
Not all research findings make their way out of the lab. Sometimes they get snagged on the way out the door, for reasons ranging from funding to politics to simple forgetfulness. Below is an abstract that I have been sitting on for over two years. It details an analysis we conducted of the full fMRI Data Center (fMRIDC) archive. All datasets in the archive come from published manuscripts, so the analysis was an investigation of both fMRIDC archive quality and the quality of data used for publication in the early 2000s. Unfortunately, I don't have the time or resources to do much more with it, so I am releasing it here in the hope that our existing work might be of some utility.
The fMRI Data Center (fMRIDC) was founded as a large-scale repository for functional neuroimaging datasets from around the world. Since its inception the archive has grown to hold 122 fMRI datasets from a diverse array of cognitive domains. For years these datasets have been made available at no cost to any interested party. Within the last 12 months there have been 543 requests for 725 datasets coming from a mix of 60% domestic and 40% international sources. The goal of this project was to investigate data quality across the entire fMRIDC archive by holding each study up to the same stringent examination criteria. We hoped to determine what percent of studies could adequately be reused in a larger meta-analysis of functional imaging data.
We examined each of the 122 datasets contained in the fMRIDC archive. Initial criteria for inclusion required a dataset to contain functional MRI data from normal human volunteers. This eliminated all studies with only anatomical data, all nonhuman data, and all clinical datasets. Further criteria for inclusion required datasets to have whole-brain coverage, no anomalous signal dropouts, no severe MR artifacts, and a minimum group size of 8 subjects. This eliminated all studies with gross data quality problems. It should be noted that only studies with data problems across all subjects were excluded on this basis; a single subject with bad data would not lead to disqualification.
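As a rough sketch, the inclusion rules above can be expressed as a single predicate over per-dataset metadata. The field names here are hypothetical illustrations, not actual fMRIDC archive fields:

```python
def passes_inclusion_criteria(ds):
    """Return True if a dataset record meets the screening criteria
    described above. All dictionary keys are assumed/illustrative."""
    return (
        ds["has_functional_data"]       # anatomical-only studies excluded
        and ds["species"] == "human"    # nonhuman data excluded
        and not ds["clinical"]          # clinical datasets excluded
        and ds["whole_brain_coverage"]  # incomplete coverage excluded
        and not ds["signal_dropout"]    # anomalous signal dropout excluded
        and not ds["severe_artifact"]   # severe MR artifacts excluded
        and ds["n_subjects"] >= 8       # minimum group size of 8
    )
```

Note that, per the abstract, the artifact and dropout checks would apply at the study level (all subjects affected), not to individual subjects.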
Across all datasets we found that 48% of studies in the fMRIDC archive had issues that prevented their reanalysis. The most common reason for exclusion was missing fMRI data (19 studies), followed by missing study metadata (11 studies). These issues have nothing to do with the quality of the data themselves; they reflect problems in acquiring the data as a complete set. Other issues that would prevent the reuse of data included incomplete brain coverage (9 studies), corrupt/blank data (6 studies), data with severe visible artifact (6 studies), experiments with fewer than 8 subjects (5 studies), nonhuman data (2 studies), and experiments with only anatomical data (1 study).
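The exclusion counts above can be tallied to confirm the reported rate. The snippet below simply restates the numbers from the paragraph:

```python
# Exclusion counts as reported in the abstract (122 datasets examined).
exclusions = {
    "missing fMRI data": 19,
    "missing study metadata": 11,
    "incomplete brain coverage": 9,
    "corrupt/blank data": 6,
    "severe visible artifact": 6,
    "fewer than 8 subjects": 5,
    "nonhuman data": 2,
    "anatomical data only": 1,
}

total_excluded = sum(exclusions.values())            # 59 studies
rate = round(100 * total_excluded / 122)             # 48 (percent)
print(f"{total_excluded} of 122 excluded ({rate}%)") # 59 of 122 excluded (48%)
```

This matches the figure in the text: 59 of 122 studies, or about 48%, failed at least one criterion.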
This project represents a first step toward understanding how data quality varies across a large sample of fMRI studies. From this analysis we can conclude that only about half of the studies met our criteria for further reanalysis. Still, the 48% exclusion rate should not be taken as an indicator of quality across all fMRI experiments in the literature: the vast majority of issues had to do with the challenge of acquiring complete datasets and study metadata from the original authors.