The Double-Edged Sword of Open Access to Research Data
In this issue of Circulation: Cardiovascular Quality and Outcomes, 2 articles make a compelling case for an aggressive move to widespread dissemination of study data that can support other scientists' independent analyses and publications. The first article, by Gøtzsche,1 is a revised transcript of a speech made to the European Parliament in Brussels, and the second is a thoughtful commentary by Ross and colleagues.2 Both articles argue that open access to clinical study data would accelerate science, illuminate new opportunities to improve care, and leverage existing resources to generate the most knowledge in the most cost-effective manner, and that such access is an ethical responsibility to the patients who participated in the original studies so that their experiences can help the greatest number of individuals. These are all important points, and I wholeheartedly embrace the vision of these scholars; however, a number of practical issues need to be considered for such an approach to become a reality.
First, there needs to be a means of ensuring that data are appropriately analyzed so that any published findings are, in fact, true. Although the primary analysis of a randomized trial is straightforward, investigators often use these data to examine other treatments and outcomes within a longitudinal study. Although there are methods for comparing treatments or patient characteristics that are not randomly distributed in the study, such as propensity or instrumental variable techniques, a recent review of the application of these methods in cardiovascular studies revealed that <5% were conducted optimally.3 Although the proponents of open science believe that the peer review process is capable of filtering the quality of research, the other 95% of studies reviewed by Austin were published in some of the most prestigious cardiovascular journals, and these journals' review processes failed to identify the deficiencies that he did. Given the paucity of reviewers capable of understanding and critiquing advancing statistical methods, the difficulty such reviewers face in devoting uncompensated time to the peer review process, and the proliferation of journals, I worry that inaccurate conclusions may be drawn from publicly available data and that patients treated on the basis of such fallacious insights will be harmed. Because an important source for such data will be large clinical trials, which have garnered respect from the medical community, incorrect analyses and conclusions may be given added credibility and accelerate the adoption of misinformation.
Second, each study has its limitations, which are best known by the investigators who conducted the study. For example, some data fields may not have been adjudicated or may have been collected in an imprecise way. Although the original investigators would know of these weaknesses, others accessing the data may not. For example, the National Cardiovascular Data Registry collects elements, such as periprocedural myocardial infarction, that would understandably be of great interest to clinical investigators. Not knowing the enormous variability in the surveillance of periprocedural infarcts across institutions might introduce important confounding in analyses of this outcome, and the extensive review process and restricted analyses of National Cardiovascular Data Registry data prevent such mistakes from occurring. Similarly, follow-up data are often missing in clinical studies, and understanding the magnitude and impact of the missing data (and properly analyzing such fields) is challenging. Cursory analyses that are insensitive to these limitations can lead to erroneous conclusions, which could be averted by ensuring that the original investigators are involved in the analyses of their studies.
Finally, the funding of secondary analyses of clinical data is limited. Although the recent decision of Medtronic, Inc, to provide and fund analyses of their entire clinical experience with recombinant bone morphogenetic protein-2 is laudable and precedent setting, it is by no means the norm.4 In fact, the recent decision by the National Heart, Lung, and Blood Institute to eliminate R21 grants means that even fewer resources are available to support such analyses.5 Dr Ross notes that substantial resources are needed to collate and annotate the data from a study so that future investigators are able to analyze them. Without adequate funding or other mandates, however, I fear that this will be difficult to accomplish. Efforts by the National Heart, Lung, and Blood Institute to host conferences to educate researchers as to how best to analyze its large epidemiological databases are excellent examples of how this can be done; however, without investment in such an infrastructure, it is hard to imagine how high-quality science can emerge from the mere availability of more data.
These concerns are not meant to discourage the movement to open science and the sharing of data. I wholeheartedly concur with the sentiment of the authors of this issue's articles calling for open science; however, we need to define how best to invest in the infrastructure of open science. One model is that used by the Cardiovascular Outcomes Research Consortium, which I cofounded with Dr Krumholz. Using a web-based infrastructure (www.cvoutcomes.org), we enable all investigators in the study (and outside investigators who request permission) to post analytic proposals to the website. This enables all involved with the study to become aware of, and to participate in, these research ideas, thus elevating the review and import of the proposals. A Publications Committee then reviews and approves proposals, and all analyses are conducted by the original study statisticians, who are most familiar with the nuances of the dataset. Articles produced from such analyses then undergo extensive peer review within the Consortium before journal submission. Creating such an infrastructure within all clinical studies would be a huge advance, although its funding remains a challenge. Whether the funders of the original research (including governmental agencies, foundations, and industry) would consider such support is an open question, but such support could clearly maximize the intellectual contributions and impact of their original studies. The dialogue in this issue of Circulation: Cardiovascular Quality and Outcomes should serve as an important stimulus to solving the challenges of open science. Clearly, our patients and practicing community will benefit from an innovative solution to this important problem.
Sources of Funding
The opinions expressed in this article are not necessarily those of the American Heart Association.
- © 2012 American Heart Association, Inc.
- Gøtzsche P
- Ross JS, Lehman R, Gross CP
- Austin PC
- US National Heart, Lung, and Blood Institute. NHLBI will no longer participate in the investigator-initiated innovative research grant (R21) program. 2011. http://www.nhlbi.nih.gov/funding/r21.htm. Accessed February 19, 2012.