Collecting and Harmonizing Clinical Data Across an International Initiative

August 1, 2018, by Caitlyn Barrett, Ph.D.

NCI's Human Cancer Models Initiative (HCMI) is an international consortium generating patient-derived next-gen cancer models and case-associated data as a community resource.

The Human Cancer Models Initiative (HCMI) is an international consortium with the goal of producing and distributing up to 1,000 novel human cancer models. These models will be developed from a wide range of patient tumor types using next-generation culturing techniques, such as organoids and conditional reprogramming. Not only will the sequencing data acquired from the models, parent tumors, and normal tissue be accessible to the wider research community, but the models will also be associated with clinical data. Data from the NCI-funded Cancer Model Development Centers (CMDCs) will be available at the National Cancer Institute’s (NCI) Genomic Data Commons (GDC), and data from European consortium members will be available at the European Genome-phenome Archive (EGA). The founders of the HCMI at the NCI, Wellcome Sanger Institute, Cancer Research UK, and the foundation Hubrecht Organoid Technology recognize the importance of clinical data to the utility of HCMI models, as these data provide context regarding the patient’s exposures and treatments, as well as vital diagnostic information. In the process of developing a system to provide these data, participants in the initiative have learned about the trials and rewards of collecting clinical information from different institutions and different countries. During a face-to-face meeting in June 2018, the HCMI consortium members shared their experiences with clinical data collection. This perspective piece shares the challenges they encountered, as well as the resolutions and opportunities that have arisen in response.

The first requirement for providing thorough and accurate clinical data is establishing collaborations in which a clinical team is fully dedicated to the goals of the project. Surgeons, oncologists, nurses, clinical research coordinators, and the institution all need to buy in to the goals of the program to ensure its success. The HCMI has been very fortunate to be working with hospitals around the world that recognize the benefits of these models. When challenges arise (and they do), tissue source sites address them. The first step for the HCMI was to determine the clinical data elements that would be collected for each tumor type. Models are being developed from cancers ranging from pediatric tumors, such as osteosarcoma and Wilms tumor, to common and rare cancers, including breast cancer, lung cancer, gastric cancer, and ampullary cancer. Clinical data working groups have therefore been convened to identify the important clinical data to be collected for 19 distinct tumor types, with more anticipated in the future.

Working groups are composed of oncologists, surgeons, pathologists, and clinical research coordinators from tissue source sites in the US, UK, and Netherlands. Members are asked to identify important clinical data collected at their sites, while keeping in mind the goal of providing information that will be useful to downstream model users. It is critically important that the data are standardized so that they can be compared regardless of their origin. The development of translation tables has been important for guaranteeing consistency, especially for demographic information, such as ethnicity, which is collected using a different standard vocabulary depending on the country in which it is collected. As an added assurance, after the clinical data working groups have met, it is the job of the NCI, in collaboration with the Cancer Data Standards Registry and Repository (CaDSR), to ensure that the metadata and vocabulary of the clinical data elements enable longevity and consistency of the data across all NCI programs. The selected clinical data elements are then used by the NCI Clinical Data Center to produce Clinical Report Forms (CRFs) for each tumor type. The completed CRFs are posted on the HCMI Resources page, with more to be added as new models are developed.
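To make the idea of a translation table concrete, the following is a minimal sketch in Python, not the HCMI's actual tooling. It assumes that each site reports a country code and a locally defined term, and that both map onto one shared vocabulary. The table contents, category names, and the `harmonize` function are hypothetical placeholders chosen for illustration only.

```python
from typing import Optional

# Hypothetical translation table: (country code, local term) -> shared term.
# These entries are placeholders, not real HCMI or national vocabularies.
TRANSLATION_TABLE = {
    ("US", "us category 1"): "Shared Category A",
    ("UK", "uk category 7"): "Shared Category A",
    ("NL", "nl category 3"): "Shared Category B",
}


def harmonize(country: str, local_term: str) -> Optional[str]:
    """Return the shared term for a local term, or None if it is unmapped."""
    # Normalize case and whitespace so minor entry differences still match.
    key = (country.strip().upper(), local_term.strip().lower())
    return TRANSLATION_TABLE.get(key)


if __name__ == "__main__":
    print(harmonize("uk", " UK Category 7 "))
    print(harmonize("US", "a term with no mapping"))
```

Returning `None` for an unmapped term, rather than guessing, lets a curator review the value and extend the table deliberately.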

Once the CRFs have been established, HCMI-associated clinical sites implement them by collecting clinical data. As sites have begun the process of data collection and submission, they have identified mechanisms by which they can address challenges. For instance, collecting data from patients who have switched hospitals is complex, and the complexity is compounded by the fact that each hospital has a different mechanism for data collection and storage, as well as its own restricted vocabulary, which complicates data merging and validation. In addition, merging data from multiple sources often requires significant informatics support. Thus, tissue source sites rely on teams of knowledgeable and experienced data extractors and informaticians to bring disparate data into a format consistent with HCMI requirements. Developing these teams and standard operating procedures for data collection has been essential. Moreover, sites have begun to develop flexible and practical systems for data collection that can be applied at multiple institutions and allow for data entry using assorted mechanisms, from Excel spreadsheets and tab-delimited files to web-based submission systems. Finally, a robust quality control process must be in place, especially for data that are collected and/or entered into databases manually. HCMI contributors have established quality control pipelines that integrate not only the clinical sites but also the model developers and, in the case of the NCI-supported CMDCs, the Clinical Data Center, which passes data through strict validators and checks for accuracy and completeness. Once the data have been collected, they need to be submitted to a unified data repository that has been designed to enable data sharing. HCMI data will be submitted either to the aforementioned GDC (for US data) or to the EGA (for European data). At the conclusion of this process, the data are accessible to the research community.
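The kind of completeness and vocabulary check described above can be sketched as follows. This is an illustrative example only and does not reproduce the Clinical Data Center's validators. The field names (`case_id`, `tumor_type`, `vital_status`), the permitted values, and the function names are all assumptions made for the sketch, and the input is assumed to be a tab-delimited file with a header row.

```python
import csv
import io

# Hypothetical required fields and restricted vocabulary; not the HCMI CRF definitions.
REQUIRED_FIELDS = ("case_id", "tumor_type", "vital_status")
ALLOWED_VALUES = {"vital_status": {"Alive", "Dead", "Unknown"}}


def validate_record(record):
    """Return a list of problems found in one clinical record (a dict)."""
    problems = []
    # Completeness check: every required field must have a non-empty value.
    for field in REQUIRED_FIELDS:
        if not (record.get(field) or "").strip():
            problems.append(f"missing value for {field}")
    # Vocabulary check: restricted fields must use a permitted value.
    for field, allowed in ALLOWED_VALUES.items():
        value = (record.get(field) or "").strip()
        if value and value not in allowed:
            problems.append(f"{field} has unrecognized value {value!r}")
    return problems


def validate_tsv(text):
    """Validate tab-delimited text; return {file line number: [problems]}."""
    reader = csv.DictReader(io.StringIO(text), delimiter="\t")
    report = {}
    # The header is line 1 of the file, so data rows start at line 2.
    for line_number, record in enumerate(reader, start=2):
        problems = validate_record(record)
        if problems:
            report[line_number] = problems
    return report


if __name__ == "__main__":
    sample = (
        "case_id\ttumor_type\tvital_status\n"
        "C1\tWilms tumor\tAlive\n"
        "C2\t\tdeceased\n"
    )
    for line_number, problems in validate_tsv(sample).items():
        print(line_number, problems)
```

Reporting problems by line number, instead of stopping at the first error, lets a data extractor correct a whole submission in one pass.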

The successful implementation of a clinical data collection-to-distribution process requires contributions from a range of people with varying expertise. From the NCI to the tissue source sites, the HCMI is addressing challenges to clinical data collection so that the data ultimately distributed to the research community are of the highest quality. Please visit the HCMI website to see the fruits of the efforts of the many contributors to the initiative.

Many thanks go out to the following for providing tissues and clinical data and contributing their expertise and perspectives on clinical data collection to the HCMI: Office of Cancer Genomics external consultant Martin Ferguson; Dianne Reeves, Associate Director for Clinical Research Programs in CBIIT; Northwell Health and Feinstein Institute for Medical Research; Cambridge University Hospital; University Hospital Southampton NHS; University Hospital Birmingham NHS; Cancer Research UK Glasgow Centre; University Medical Center Utrecht; The Netherlands Cancer Institute; Meander Medisch Centrum; Ziekenhuis St. Antonius; Boston Children’s Hospital; Dana-Farber Cancer Institute; Brigham and Women’s Hospital; Boston Medical Center; and the Rare Cancer Research Foundation.