The Importance of Clinical Trial Data Sharing
Toward More Open Science
In cardiovascular medicine, as in all other medical disciplines, realizing the full value of clinical trial research data requires that the data be accessible to the research community and others who might be able to use them. Traditionally, the dissemination of knowledge derived from clinical research has been limited in scope: Investigators who have designed and conducted clinical trials make the decisions about which statistical analyses to conduct and then publish peer-reviewed articles to disseminate their findings. Clinical trial data are considered the property of the investigators and the entities that sponsored the research, with little or no opportunity for investigators external to the original study team to access the data. This traditional model is based on dissemination via print publication, the origins of which date back to the 17th century.
Because we continue to adhere to this model in the age of electronic knowledge exchange, our understanding of clinical interventions is limited in several ways by our lack of access to comprehensive data from all clinical trials. First, a select number of individuals decide which analyses to conduct, choosing some to the exclusion of others. As a result, an analysis that might have been of great interest to another investigator (and that may bear directly on clinical practice) may never be performed. Second, only some of the findings generated may be included in any peer-reviewed publication, which leaves the research community and clinicians unaware of findings that were generated but not disseminated. In fact, comparisons of published articles with trial protocols have shown that 50% of efficacy outcomes and 65% of harm outcomes per trial are incompletely reported, with reporting biased toward statistically significant findings.1 Third, trial results may be subject to substantial publication delays. This happened with the Ezetimibe and Simvastatin in Hypercholesterolemia Enhances Atherosclerosis Regression (ENHANCE) trial,2 which was completed in April 2006, but whose findings were not released until after substantial coverage in the news media in January 2008.3 Finally, only a limited number of trials are ever published. Studies have examined trials registered with an Institutional Review Board or the publicly available trial registry ClinicalTrials.gov,4 trials submitted to the US Food and Drug Administration as part of new drug applications, and trials presented as research abstracts at national scientific meetings. These studies have estimated that between 25% and 50% of completed trials remain unpublished.5–12
The cumulative effect is that patients, physicians, other healthcare professionals, and the research community must make clinical or research decisions with access to only a fraction of the relevant clinical evidence that might otherwise be available. As Gøtzsche argues in this issue of Circulation: Cardiovascular Quality and Outcomes,13 making clinical research data available beyond individual pharmaceutical companies or clinical research groups has obvious value for the validation, reproduction, and optimization of new knowledge generated from clinical research. Why, then, are data not made more widely available to the scientific community? In this commentary, we review some common concerns about data sharing and describe prominent examples of data sharing currently underway in cardiovascular clinical research. We conclude with our expectations for more open scientific and information exchange through data sharing, which will increase the value of all clinical trial research.
Data Sharing Trials and Tribulations
Data sharing is increasingly common in some areas of medical research, particularly among genomics investigators and research groups conducting systematic reviews and meta-analyses. However, sharing of individual patient-level clinical trial data is less common because of concerns among investigators and the practical challenges of sharing data. The principal concern, voiced primarily by investigators, is that they have invested substantial time and effort to design the trial and collect the data. In return, they feel they deserve ample opportunity to conduct their own analyses and disseminate their findings. Without question, investigators deserve a protected period during which they can prioritize their analyses and publish their work. However, a recent study found that fewer than half of trials funded by the National Institutes of Health are published within 2.5 years of completion.12 Dissemination delays exceeding 2 years inevitably slow and diminish the impact of any research. Although investigators may worry about being “beaten to the punch” with their own data, they should instead focus on a different fact: the time and effort they have invested has not produced data that are being fully used to further scientific understanding and improve patient care.
Other objections to data sharing are also frequently raised.14 One concern is that multiple analyses by independent research groups will yield differing results, either through human error or because external investigators conduct inappropriate analyses. Another is that clinical trials are designed with prespecified study protocols, so that additional analyses amount to “data-dredging.” A third is that data ownership belongs by right to the original investigative team. However, the scientific community is well positioned to review differing results from the same trial data and put them into context. It is equally able to judge whether data have been “dredged” or appropriately analyzed. Regarding ownership, Vickers has posed the rhetorical question “whose data set is it anyway?”15 He argues that although the data legally belong to the investigators, science (particularly medical science) is essentially an enterprise conducted for moral reasons.15
Data sharing is a complex undertaking, and the scientific community must reach a consensus on several critical points before the promise of sharing clinical research data can be realized. First, what are the responsibilities of the original investigative team? To share data effectively, the team must produce a clean, well-described, and accurate data file that others can use and that protects patient confidentiality. Second, who supports the effort to create this data source? Third, who handles subsequent questions and inquiries, and who bears responsibility for the shared data?
Another broad issue is who owns the data and who should be allowed to access them. Should access be unfettered? Or should some minimal application or registration system be used to discourage data-dredging and encourage prespecification of analyses using shared clinical trial data? If the latter, who is responsible for reviewing these applications? Should there be a commitment to publish these analyses or, at least, to report results in a central repository akin to ClinicalTrials.gov?4
Finally, where should the data be placed for others to access? Is it the responsibility of journals to house data from affiliated publications, as is currently done by the journal Trials,16 an open access, peer-reviewed, online journal that publishes on all aspects of the performance and findings of randomized controlled trials? In this and other models, who should support this effort and pay the costs of maintaining Internet accessibility?
Although many of these questions remain unanswered, the scientific community has begun developing standards and solutions to common problems. The Institute of Medicine has set forth recommendations on managing research data in the information age.17 The Wellcome Trust convened a number of research funders to develop a coherent vision, principles, and goals for promoting the sharing of research data to improve public health.18 Signatories include the World Bank, the National Institutes of Health, and the Bill and Melinda Gates Foundation.19 Journal editors have developed guidance for preparing raw clinical data for publication.20
Current Data Sharing Initiatives
Several prominent examples of data sharing are currently underway in cardiovascular clinical research. They illustrate, and can inform our expectations for, more open scientific and data exchange.
National Institutes of Health Data Sharing Requirements and the National Heart, Lung, and Blood Institute
The National Institutes of Health has implemented a policy to share clinical trial data, placing a priority on making the results and accomplishments of the activities it funds publicly available. All investigator-initiated applications with direct costs exceeding $500,000 in any single year are expected to address data sharing in the grant proposal.21 In practice, however, data sharing varies widely among the Institutes, and the National Heart, Lung, and Blood Institute clearly leads the way. Alone among the Institutes, it has established the Biologic Specimen and Data Repository Information Coordinating Center (BioLINCC).22 BioLINCC provides centralized access to data from more than 100 studies originally funded by the National Heart, Lung, and Blood Institute. These include the Antihypertensive and Lipid-Lowering Treatment to Prevent Heart Attack Trial (ALLHAT), the Coronary Artery Risk Development in Young Adults (CARDIA) study, and the Multiple Risk Factor Intervention Trial (MRFIT) for the prevention of coronary heart disease. To access these data, one need only register as a BioLINCC user and submit a simple request form for review by a National Heart, Lung, and Blood Institute official. The form essentially requires a brief overview of the applicant's research needs, a research plan or protocol, proof of ethics committee review, the principal investigator's curriculum vitae, and a research materials distribution agreement.
The International Stroke Trial
The investigators who designed and conducted the International Stroke Trial (IST) have also been leaders in data sharing. The IST, conducted between 1991 and 1996, was a large, prospective, randomized controlled trial of nearly 20 000 individuals. It was designed to determine whether early administration of aspirin, heparin, both, or neither influenced the clinical course of acute ischemic stroke.23 The trial was originally funded by multiple agencies, most prominently the UK Medical Research Council, the UK Stroke Association, and the European Union BIOMED-1 program. These data have now been made available for public use to facilitate the planning of future trials and to permit additional secondary analyses.24 In making these trial data available, the investigators describe how the data were anonymized and how ethics committee review was obtained. This is a critical issue in data sharing, because consent for publication of raw data is not routinely obtained from study subjects.
The Yale University Open Data Access Project
We have engaged in a project to promote and facilitate sharing of industry clinical trial data, led by Principal Investigator Dr. Harlan Krumholz, the Editor of Circulation: Cardiovascular Quality and Outcomes. The Yale University Open Data Access project, although not specifically directed at cardiovascular research, aims to create a model that can be applied to the complete range of medical interventions.25,26 Through a grant from Medtronic, Inc, we have developed a model to facilitate wider availability of clinical trial data and independent analysis by external investigators. As an initial effort, we are coordinating reviews of the safety and effectiveness of Medtronic's recombinant bone morphogenetic protein-2 product by 2 independent research groups and will subsequently disseminate all clinical research data regarding the product provided to us by Medtronic to external investigators. This effort is intended to provide a means to ensure access to comprehensive clinical research data currently owned by industry or sponsored by any other funder.
The Way Forward
Increased open science and information exchange through data sharing will further increase the value of all clinical trial research. Most of the data from clinical trials in cardiovascular medicine are currently unavailable to the scientific and clinical communities. As a result, providers routinely recommend treatment options to patients on the basis of information that is biased and seriously incomplete. This standard of practice is tolerated not because it has any intellectual or ethical justification but because we are accustomed to it. The clinical and research community often becomes aware of its shortcomings only when safety concerns are raised about a drug, device, or other treatment strategy. Past experiences with rofecoxib,27 rosiglitazone,28 and oseltamivir29 have illustrated that it is in the public's interest to have access to comprehensive clinical trial data. Such access is needed to ensure a complete understanding of drug and device safety and effectiveness.
Clinical trial data sharing, however, goes beyond product safety concerns. Science is a community whose members continually build on one another's ideas. In the era of electronic knowledge exchange, when open access to data has become an accepted norm in most of science, we need a better way of working together. Only by making individual patient data available to the whole research community can we derive full benefit from the enormous resources devoted to human clinical trial research and maintain patient trust in the research process.
Sources of Funding
This manuscript was not supported by any external funds. All authors receive support from Medtronic, Inc, to develop methods for clinical trial data sharing. The ideas and opinions expressed are the authors' own. The content of this publication does not necessarily reflect the views or policies of Medtronic, Inc, and was not subject to review or approval prior to submission or publication. Dr Ross is supported by the National Institute on Aging (K08 AG032886) and by the American Federation for Aging Research through the Paul B. Beeson Career Development Award Program.
Drs Ross and Gross are members of a scientific advisory board for FAIR Health, Inc.
The opinions expressed in this article are not necessarily those of the editors or of the American Heart Association.
© 2012 American Heart Association, Inc.
Kastelein JJ, Akdim F, Stroes ES, Zwinderman AH, Bots ML, Stalenhoef AF, Visseren FL, Sijbrands EJ, Trip MD, Stein EA, Gaudet D, Duivenvoorden R, Veltri EP, Marais AD, de Groot E.
US National Institutes of Health. ClinicalTrials.gov. Available at: http://www.clinicaltrials.gov/. Accessed February 5, 2012.
Decullier E, Lheritier V, Chapuis F.
Dwan K, Altman DG, Arnaiz JA, Bloom J, Chan AW, Cronin E, Decullier E, Easterbrook PJ, Von Elm E, Gamble C, Ghersi D, Ioannidis JP, Simes J, Williamson PR.
Gøtzsche PC.
Vickers AJ.
Hrynaszkiewicz I, Altman DG.
Institute of Medicine. Ensuring the Integrity, Accessibility, and Stewardship of Research Data in the Digital Age. Washington, DC: National Academy Press; 2009.
Wellcome Trust. Sharing research data to improve public health: joint statement of purpose, signatories to the joint statement. Available at: http://www.wellcome.ac.uk/About-us/Policy/Spotlight-issues/Data-sharing/Public-health-and-epidemiology/Signatories-to-the-joint-statement/index.htm. Accessed February 6, 2012.
Hrynaszkiewicz I, Norton ML, Vickers AJ, Altman DG.
US Department of Health and Human Services, National Institutes of Health, Office of Extramural Research. NIH Data Sharing Policy. April 17, 2007. Available at: http://grants.nih.gov/grants/policy/data_sharing/. Accessed February 6, 2012.
US Department of Health and Human Services, National Institutes of Health, National Heart, Lung, and Blood Institute. Biologic Specimen and Data Repository Information Coordinating Center. April 17, 2007. Available at: https://biolincc.nhlbi.nih.gov/home/. Accessed February 6, 2012.
International Stroke Trial Collaborative Group. The International Stroke Trial (IST): a randomised trial of aspirin, subcutaneous heparin, both, or neither among 19435 patients with acute ischaemic stroke. International Stroke Trial Collaborative Group. Lancet. 1997;349:1569–1581.
Sandercock PA, Niewada M, Czlonkowska A.
Yale School of Medicine. Yale University Open Data Access (YODA) Project. Available at: http://medicine.yale.edu/core/projects/yodap/index.aspx. Accessed February 6, 2012.
Doshi P.