Collecting Uniform Clinical Data for a Community Resource
by Eva Tonsing-Carter, Ph.D.
Using cancer cell lines to model diverse cancer types presents several challenges. Most of the commonly used cancer cell lines in research lack clinical or therapeutic outcome data for the participant from whom the cell line was derived. The genomic relatedness of the cell line to the parent tumor is unknown, and molecular characterization of these cell lines, including assessment of genomes and transcriptomes, was mostly unavailable until recently. Diverse racial and ethnic groups, as well as rare cancers, are seldom represented in currently available cell lines. Additionally, cancer cell lines often do not represent certain molecular subtypes. To address these challenges, the Human Cancer Models Initiative (HCMI) was formed.
HCMI is an international consortium that focuses on generating novel, next-generation tumor-derived cancer models annotated with clinical, genomic, and molecular data as a community resource. The HCMI consortium founders include the National Cancer Institute (NCI), Wellcome Sanger Institute, Cancer Research UK, and Hubrecht Organoid Technology. The NCI arm of the consortium supports HCMI next-generation model development at four Cancer Model Development Centers (CMDCs).
To advance cancer research and further understand the relationship between in vitro findings and clinical biology, HCMI next-generation cancer models are associated with clinical data as well as molecular characterization data. To collect the associated clinical data, a dedicated Case Report Form (CRF) is developed for each cancer type for which HCMI models are generated. As of June 2019, eighteen different enrollment and follow-up CRFs have been developed for glioblastoma, breast, colorectal, pancreas, pediatric cancers, and others.
The CRFs standardize the clinical data collected from participating Tissue Source Sites (TSSs) and are composed of standardized NCI common data elements (CDEs) with a controlled vocabulary of permissible values (PVs). Clinical Data Working Groups (CDWGs) consist of cancer type-specific clinical experts, including pathologists, oncologists, and surgeons from the United States, United Kingdom, and the Netherlands, who contribute to the content of the CRFs used to collect clinical data. A preliminary list of CDEs is generated according to guidelines from the World Health Organization, International Classification of Diseases for Oncology, and American Joint Committee on Cancer. The working groups use this preliminary list as the base on which they build the final CRF. The clinical data include the participant’s demographics, prognostic factors, and specifics of the tumor, including histological subtype, pathological staging, and grade. Based on feedback from these experts, each cancer type-specific CRF also includes current neoadjuvant and adjuvant therapy information and prognostic, predictive, and lifestyle feature information.
A few patients donated tumor tissues from more than one anatomic site for model generation. Examples include (a) a primary tumor and a metastatic lesion, (b) a primary tumor and a recurrence, (c) multiple metastases, or (d) a pre-malignant lesion and a primary tumor. Once multiple models were successfully generated from a single donor, the CRFs had to be updated to keep all the associated data in a single form. These CRFs use new CDEs to collect clinical data for distinct tissue types, including “primary”, “metastatic/recurrent”, or “other” tissue, as well as information linking a specific model to its originating tissue sample. Multiple-model CRF designs are available for several cancer types, including melanoma, brain, hepatocellular, and rare cancers. It is not known whether all cancer types will have donors with multiple models; the existing multiple-model CRFs are available on the HCMI Resources webpage.
Because HCMI is an international consortium, differences in clinical data collection between countries are also considered during the CDWG meetings. Differences in terminology are discussed and incorporated into the CDEs. The Office of Cancer Genomics (OCG) works closely with the NCI’s Cancer Data Standards Registry and Repository (caDSR) to ensure that the clinical data elements and metadata used in the HCMI CRFs adhere to a uniform vocabulary and meaning. This data standardization will make HCMI clinical data compatible not only across the global HCMI TSS locations but also across different groups and programs. The use of PVs allows the clinical data to be mapped to the data dictionary of NCI’s Genomic Data Commons (GDC), enabling users to search and filter the data.
Once the CDEs have been finalized and registered, updated, or modified in NCI’s caDSR, they are submitted to the Clinical Data Center (CDC) at Information Management Systems. There, the CDEs are compiled into an interactive, web-based electronic CRF (eCRF) through which TSSs may submit the HCMI clinical data. The clinical data submitted by the TSSs undergo a quality check (QC) to ensure that no personal health information (PHI) is inadvertently submitted and that the information conforms to the PVs and to the type of information expected (e.g., numeric values for time interval questions). Once the clinical data have passed QC and any errors have been addressed, the data are approved and mapped to the GDC dictionary. The finalized clinical data are then submitted to and stored at the GDC for access by the scientific community.
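To make the conformance part of this quality check concrete, here is a minimal Python sketch of how a submitted record could be checked against permissible values and expected types. It is an illustration only, not the Clinical Data Center's actual tooling: the field names `tissue_type` and `days_to_follow_up`, the `FieldSpec` class, and the `qc_record` function are all hypothetical. The three tissue values are the ones quoted above for the multiple-model CRFs. Screening for PHI is a separate task and is not attempted here.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class FieldSpec:
    """Hypothetical description of one CRF question (not a real caDSR record)."""
    permissible_values: Optional[frozenset] = None  # controlled vocabulary, if any
    numeric: bool = False                           # True for time-interval questions


# Assumed two-question form; real CDEs and PVs are registered in caDSR.
FORM_SPEC = {
    "tissue_type": FieldSpec(
        permissible_values=frozenset({"primary", "metastatic/recurrent", "other"})
    ),
    "days_to_follow_up": FieldSpec(numeric=True),
}


def qc_record(record, spec=FORM_SPEC):
    """Return a list of QC error messages; an empty list means the record passed."""
    errors = []

    # Flag answers to questions that the form does not define.
    for name in record:
        if name not in spec:
            errors.append(f"{name}: unexpected field")

    # Check every defined question for presence, type, and vocabulary.
    for name, field in spec.items():
        if name not in record:
            errors.append(f"{name}: missing")
            continue
        value = record[name]
        if field.numeric:
            # bool is a subclass of int in Python, so reject it explicitly.
            if isinstance(value, bool) or not isinstance(value, (int, float)):
                errors.append(f"{name}: expected a numeric value, got {value!r}")
        elif field.permissible_values is not None:
            if not isinstance(value, str) or value not in field.permissible_values:
                errors.append(f"{name}: {value!r} is not a permissible value")
    return errors


if __name__ == "__main__":
    submitted = {"tissue_type": "Primary tumor", "days_to_follow_up": "four months"}
    for message in qc_record(submitted):
        print(message)
```

In this sketch, a record with a free-text tissue type and a non-numeric interval yields two error messages, which mirrors the step in which errors are addressed before the data are approved.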
PDF versions of the CRFs are also generated and accessible on the HCMI Resources page: HCMI Case Report Forms. These CRFs can be utilized by anyone interested in collecting clinical data associated with cancer tissues for other projects. Check the HCMI Resources page for updates to the CRFs as new cancer types are added. The HCMI models and associated high-quality clinical data coupled with molecular characterization data provide the scientific community with a valuable resource for cancer research.