Informatics Tools Supported by ITCR
The Informatics Technology for Cancer Research (ITCR) Program funds open-source informatics tools that support research across the cancer continuum. All of the tools are free for use by academic and non-profit researchers. Access to tools, code repositories, and introductory videos is available through the links in the tables below.
Clinical Informatics
Tools to support the use of clinical text in cancer research, including data from medical records.
Tool Name | Tool Description | Tool Resources |
---|---|---|
Apache clinical Text Analysis and Knowledge Extraction System (cTAKES) and Cancer Deep Phenotype (DeepPhe) | Deep Phenotyping for Cancer (DeepPhe) uses natural language processing (extending the Apache clinical Text Analysis and Knowledge Extraction System [cTAKES] platform), visual analytics, and ontology-based summarization to extract cancer-specific information from electronic medical records. It provides a range of output formats and API access. | |
CLAMP-Cancer | CLAMP (Clinical Language Annotation, Modeling, and Processing Toolkit) is a tool for quickly building customized natural language processing pipelines that extract cancer information from pathology reports, through a user-friendly interface that requires minimal programming knowledge. | |
Electronic Medical Record Search Engine (EMERSE) | EMERSE is powerful, enterprise-grade search and text-processing software for unstructured electronic health record documents. It can be used for a variety of clinical research tasks, including cohort identification, eligibility determination, and general data abstraction. EMERSE has been implemented at academic medical centers nationwide. | |
GARDE | GARDE is a clinical decision support platform that (i) executes algorithms on electronic health record data compliant with the FHIR standard to identify patients who meet genetic testing criteria for hereditary cancer; and (ii) uses a scripted chatbot to provide patients with education and access to genetic testing. | |
HemOnc Ontology | OMOP-compatible ontology of anticancer regimens & supporting publications, with >95,000 concepts and >290,000 relationships. A portion is available through OHDSI: athena.ohdsi.org. The full ontology is freely available to academic and non-commercial users (CC-BY-NC-SA 4.0 license); a drug-focused subset is available to any user (CC-BY 4.0 license). | |
mCodeGPT | This Python package uses large language models such as GPT-4 to automatically extract entities from raw text associated with given cancer ontologies and knowledge bases. Given input research text, mCodeGPT outputs structured data in tabular format. | |
OpenInfobutton | Open-source suite of web services that enables integration of online clinical evidence resources with Electronic Health Record (EHR) systems using the Health Level Seven (HL7) Infobutton Standard. We are extending the OpenInfobutton suite to include automatic summarization of clinical evidence resources for cancer prevention. | |
Personalized Cancer Therapy | Curated website cataloguing clinically actionable information for personalized cancer therapy, including clinically significant genetic variants and drug-gene associations. | |
Digital Pathology and Dermatologic Imaging
Tools and resources to support analysis and sharing of digital pathology and dermatologic images.
Tool Name | Tool Description | Tool Resources |
---|---|---|
Cancer Digital Slide Archive | The CDSA is a web-based platform to support the sharing, management and analysis of digital pathology data. The Emory instance currently hosts over 23,000 images from The Cancer Genome Atlas, and the software is being developed under the ITCR grant so that it can be deployed as a digital pathology platform by other labs and cancer institutes. | |
HistoQC | HistoQC is an open-source tool that examines slides for artifacts and computes metrics associated with slide presentation characteristics (e.g., stain intensity, compression levels), helping to quantify ranges of acceptable characteristics for downstream algorithmic evaluation. | |
International Skin Imaging Collaboration Archive | The ISIC Archive is a web-based platform to support the sharing, management and analysis of dermatology imaging data. We currently host over 70,000 public images. ITCR funding supports improved dataset curation for educational and machine learning purposes, as well as the integration of reflectance confocal microscopy data. | |
KiNet | KiNet is an open-source software tool for automated Ki67 labeling index assessment in digital pathology images. | |
Quantitative Imaging in Pathology (QuIP) | QuIP is a digital microscopy software platform for cancer research. It consists of microservices and web applications that support viewing, annotation, and data management of whole slide tissue images and pathomics image analyses. | |
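KiNet automates assessment of the Ki67 labeling index. For reference, the index is conventionally reported as the percentage of tumor nuclei that are Ki67-positive, with non-tumor nuclei excluded from both counts. The symbols below are introduced here for illustration and do not describe KiNet's internals:

```latex
\mathrm{LI}_{\mathrm{Ki67}} = 100 \times \frac{N_{\mathrm{tumor}}^{+}}{N_{\mathrm{tumor}}^{+} + N_{\mathrm{tumor}}^{-}}
```

For example, 120 Ki67-positive and 480 Ki67-negative tumor nuclei give an index of 100 × 120 / 600 = 20%.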
Epigenetics
Tools for the analysis of epigenetic data, including chromatin accessibility and histone modifications.
Tool Name | Tool Description | Tool Resources |
---|---|---|
Cistrome | Curated and processed human and mouse ChIP-seq and DNase-seq datasets from GEO, allowing users to search, browse, and download ChIP-seq signals, peaks, QC metrics, motifs, target genes, and similar datasets. | |
Cistrome Explorer | Cistrome Explorer is a powerful web-based visual analytics tool for exploring chromatin accessibility, histone modifications and transcription factor binding data across >200 samples. Compare patterns between cell types and conditions, and search the Cistrome Data Browser for TFs bound near genes and within genomic intervals of interest. | |
Enhancer Linking by Methylation/Expression Relationships (ELMER) | R tool for analysis of DNA methylation and expression datasets. Integrative analysis allows reconstruction of in vivo transcription factor networks altered in cancer along with identification of the underlying gene regulatory sequences. | |
Integrated Methods for Predicting Enhancer Targets (IM-PET) | IM-PET is an open source software tool for identifying target genes of transcriptional enhancers. It uses four statistical features derived from transcriptomic, genomic, and epigenomic data. IM-PET is also available as a Docker container. | |
Lisa: Epigenetic Landscape In Silico deletion Analysis | Lisa is a web-based tool and Python software package for predicting transcriptional regulators of differentially expressed gene sets. Lisa combines histone modification ChIP-seq and chromatin accessibility profiles to identify cis-regulatory regions, and uses transcription factor ChIP-seq to determine relevant regulatory transcription factors. | |
MethCon5 | The goal of methcon5 is to identify and rank CpG DNA methylation conservation along the human genome. | |
MIRA: Multimodal models for Integrated Regulatory Analysis | MIRA is a Python software package for analyzing single cell RNA-seq, ATAC-seq and multimodal data. MIRA includes effective batch correction methods for both modalities and infers gene regulatory mechanisms and regulatory transcription factors through integrative analyses of joint RNA and chromatin accessibility modalities. | |
Genomics and Variant Interpretation
Tools to study genomic data including variant calling, variant interpretation, extra-chromosomal DNA analysis, and prediction of enhancer targets.
Tool Name | Tool Description | Tool Resources |
---|---|---|
AmpliconArchitect (AA) | AmpliconArchitect (AA) is a tool that reconstructs the structure of focally amplified regions in a cancer sample from short paired-end whole-genome sequencing data. A full description of the methods and a detailed characterization of copy number amplifications and ecDNA in cancer are given in the associated publication. | |
Amplicon Reconstructor (AR) | Reconstructs complex variation and disambiguates AmpliconArchitect-based reconstructions using Bionano optical mapping data and an NGS-derived breakpoint graph. | |
cBioPortal for Cancer Genomics | The cBioPortal for Cancer Genomics provides visualization, analysis and download of large-scale cancer genomics data sets. | |
Clinical Interpretation of Variants in Cancer (CIViC) Knowledgebase | CIViC is an open access, open source, community-driven web resource for Clinical Interpretation of Variants in Cancer. Our goal is to enable precision medicine by providing an educational forum for dissemination of knowledge and active discussion of the clinical significance of cancer genome alterations. | |
CNVnator/CNVpytor | Tools for detecting somatic, subclonal, mosaic, and germline CNVs from sequencing data. | |
ecSeg | ecSeg is a tool for quantifying ecDNA in DAPI-stained images. An extension supports the analysis of FISH probes. | |
FaNDOM | FaNDOM is a fast and open source method for aligning Bionano Saphyr optical map molecules and contigs to a reference, with the goal of identifying structural variation in genomes. FaNDOM utilizes a seed-based filter to speed up the searches. FaNDOM is implemented in C++ and supports multithreading. | |
Gaussian Mixture Model and Proportion Test (GMAP) | The spatial organization of the genome plays a critical role in regulating gene expression. GMAP is an open source software tool for modeling 3-dimensional chromosomal domains using Hi-C data. GMAP is also available as a Docker container. | |
Gosling | Gosling is a grammar-based toolkit for scalable and interactive genomics data visualization. It allows creating a wide range of custom visualizations for genome-mapped data with unique interactive features, such as brushing, linking, and advanced zooming techniques. | |
Integrated Methods for Predicting Enhancer Targets (IM-PET) | IM-PET is an open source software tool for identifying target genes of transcriptional enhancers. It uses four statistical features derived from transcriptomic, genomic, and epigenomic data. IM-PET is also available as a Docker container. | |
Integrative Genomics Robust iDentification of cancer subgroups (InGRiD) | InGRiD (Integrative Genomics Robust iDentification of cancer subgroups) is a statistical approach to improve prediction of cancer subgroups and identification of key genes and pathways by integrating information from biological pathway databases. | |
Lancet | Local-assembly based somatic variant caller for short read sequencing data. Lancet detects somatic SNVs, indels, and more complex mutations by jointly analyzing reads from tumor and matched normal samples using colored de Bruijn graphs. | |
Lancet2 | Next generation of Lancet, a local-assembly based somatic variant caller for short read sequencing data, with improved runtime and genotyping performance. | |
OpenCRAVAT | OpenCRAVAT is a cancer variant annotator that is easy to use, open source, and includes 300+ modular tools. With a professional-quality GUI, easy installation, and local, web, and cloud versions, it makes high-throughput variant annotation accessible to both researchers and clinicians. | |
Personalized Cancer Therapy | Curated website cataloguing clinically actionable information for personalized cancer therapy, including clinically significant genetic variants and drug-gene associations. | Grant Info |
SampleMap | Projects samples onto a Google Maps-style explorer that lets users overlay omics attributes to spot patterns quickly. A TumorMap version was built for TCGA analysis. | |
UCSC Xena | UCSC Xena securely analyzes and visualizes your private functional genomics data set in the context of public and shared genomic/phenotypic data sets such as TCGA, GDC, ICGC, TARGET, and the UCSC RNA-seq recompute compendium (TCGA + TARGET + GTEx). | |
WebMeV | WebMeV (Multiple-experiment Viewer) is a web/cloud-based tool for genomic data analysis. WebMeV is being built to meet the challenge of exploring large public genomic data sets through an intuitive graphical interface that provides access to state-of-the-art analytical tools. | |
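Lancet's entry in the table above mentions colored de Bruijn graphs built jointly from tumor and matched normal reads. The sketch below illustrates only that core idea under simplifying assumptions: each k-mer is labeled ("colored") with the samples it was seen in, edges link consecutive k-mers within each read, and k-mers present in the tumor but absent from the normal are flagged as candidate somatic evidence. It is not Lancet's implementation, which also performs local assembly, error pruning, and variant genotyping; the function names, the choice of k, and the toy reads are hypothetical.

```python
from collections import defaultdict


def build_colored_dbg(reads_by_sample, k):
    """Build a toy colored de Bruijn graph (hypothetical helper, illustrative only).

    reads_by_sample: dict mapping a sample label (the "color") to a list of reads
    k: k-mer length
    Returns (colors, edges): colors maps each k-mer to the set of samples it was
    seen in; edges maps each k-mer to the k-mers that follow it within a read.
    """
    if k < 1:
        raise ValueError("k must be positive")
    colors = defaultdict(set)
    edges = defaultdict(set)
    for sample, reads in reads_by_sample.items():
        for read in reads:
            kmers = [read[i:i + k] for i in range(len(read) - k + 1)]
            for kmer in kmers:
                colors[kmer].add(sample)  # record which sample contributed this k-mer
            for left, right in zip(kmers, kmers[1:]):
                edges[left].add(right)  # consecutive k-mers overlap by k-1 bases
    return colors, edges


def tumor_only_kmers(colors, tumor="tumor", normal="normal"):
    """Return k-mers seen in the tumor sample but never in the matched normal."""
    return sorted(kmer for kmer, seen in colors.items() if tumor in seen and normal not in seen)


if __name__ == "__main__":
    # Toy reads differing by one base (A>T at the fifth position)
    reads = {"normal": ["ACGTACGGA"], "tumor": ["ACGTTCGGA"]}
    colors, _edges = build_colored_dbg(reads, k=4)
    print(tumor_only_kmers(colors))  # ['CGTT', 'GTTC', 'TCGG', 'TTCG']
```

In the toy example, the tumor-only k-mers form a "bubble" that leaves the shared path at ACGT and rejoins it at CGGA, which is how a single-base change appears in this kind of graph.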
Imaging and Radiation Research
Tools and resources for analysis of radiology images and for radiation research.
Tool Name | Tool Description | Tool Resources |
---|---|---|
3D Slicer | 3D Slicer is free, open-source software for medical image visualization and analysis. | |
Cancer Imaging Phenomics Toolkit (CaPTk) | Quantitative image analysis package offering radiomic features and machine learning signatures for oncologic images, as well as predictive tools, especially for breast and brain cancer. Both command-line and GUI-based versions are available. CaPTk is distributed under a BSD-style license. | |
DICOM ToolKit (DCMTK) | DCMTK is an open source DICOM toolkit. | |
Federated Tumor Segmentation (FeTS) | FeTS is an open-source toolkit with a user-friendly graphical user interface that aims to (i) bring pre-trained segmentation models closer to clinical experts and researchers, and (ii) allow secure multi-institutional collaborations via federated learning to improve these pre-trained models without sharing patient data, thereby overcoming legal, privacy, and data-ownership challenges. | |
Laboratory for Individualized Breast Radiodensity Assessment (LIBRA) | LIBRA is a fully automatic breast density estimation software solution based on a published algorithm that works on either raw (i.e., “FOR PROCESSING”) or vendor post-processed (i.e., “FOR PRESENTATION”) digital mammography images. LIBRA has been applied to over 30,000 screening exams and is being increasingly utilized in larger studies. | |
LesionTracker / OHIF Web Image Viewer | Extensible open-source zero-footprint web image viewer for oncology imaging. "LesionTracker" is a web-browser-based platform for viewing and measuring lesion metrics for tracking oncology trials. | |
PyRadiomics | PyRadiomics is an open-source platform for developing informatics methods for radiographic phenotyping using automated engineered-feature and deep learning approaches. With this platform, we aim to establish a reference standard for radiomic analyses, provide a tested and maintained resource, and grow the community of radiomics developers. | |
Quantitative Image Informatics for Cancer Research (QIICR) Tools Catalog | A collection of tools to promote the adoption of quantitative imaging tools and the DICOM standard in medical imaging research. | |
RadxTools | RadxTools is a new image informatics toolkit specifically geared towards developing modules that enable (a) creation of spatially aligned deeply-annotated radiology-pathology datasets, (b) quantitative characterization of specialized tumor response-specific features on post-treatment imaging, and (c) quality control of radiomic features. | |
SlicerDMRI | Diffusion magnetic resonance imaging for neurosurgical planning in 3D Slicer open-source software. | |
The Cancer Imaging Archive (TCIA) | TCIA is NCI’s repository for publicly shared cancer imaging data. TCIA collections include radiology and pathology images, clinical and clinical trial data, image derived annotations and quantitative features and a growing collection of related ‘omics data both from clinical and pre-clinical studies. | |
Tools for Quantitative Analysis of PET Imaging | A collection of extensions for 3D Slicer to enable quantitative analysis of PET imaging data: image normalization, segmentation, and extraction of quantitative indices. | Grant Info |
TOPAS | Monte Carlo simulation of ionizing radiation transport for medical applications with focus on therapy and imaging with x-rays, electrons, protons and all other forms of ionizing radiation. | |
XNAT | XNAT is an open source imaging informatics platform designed to support institutional image repositories, image-based clinical trials, and translational imaging research. | |
Immuno-oncology
Informatics tools to advance immuno-oncology, including neoantigen prediction and characterization and analysis of the immune repertoire.
Tool Name | Tool Description | Tool Resources |
---|---|---|
CEDAR | The Cancer Epitope Database and Analysis Resource (CEDAR) catalogs experimental data on antibody and T cell epitopes in the context of cancer. CEDAR can be queried for epitopes, antigens, assays, and publications linked to specific categories of epitopes. CEDAR also hosts tools for predicting and analyzing cancer epitopes. | |
HLA-Arena | HLA-Arena is a customizable environment for the structural modeling and analysis of peptide-HLA complexes for cancer immunotherapy. It is implemented using Jupyter Notebook and Docker. It integrates sequence-based tools MODELLER and MHCflurry with structure-based tools APE-Gen and DINC. It includes support for structural analysis and visualization. | |
Network Analysis of Immune Repertoire (NAIR) | NAIR, Network Analysis of Immune Repertoire, is an R package using network analysis and machine learning to explore immune receptor (TCR/BCR) sequences. Users can search and visualize disease-associated/shared sequence clusters and perform downstream analysis. It supports amino acid and nucleotide sequences, multiple distance metrics, and bulk and single-cell data. | |
Tools for neoantigen characterization and personalized vaccine design (pVACtools) | Neoantigens are tumor specific antigens arising from somatic mutations that can be targets for cancer immunotherapy. pVACtools enables identification, prioritization, and clinical application of neoantigens and supports all major sources of neoantigens and associated immunogenicity factors. | |
Informatics Platforms
Platforms that support diverse data types and analyses.
Tool Name | Tool Description | Tool Resources |
---|---|---|
Bioconductor | The Bioconductor project offers over 2,000 open-source software and data packages and is essential to modern cancer genomics research. ITCR supports key infrastructure benefiting thousands of researchers from academia, government, and industry who use and develop Bioconductor tools, furthering our knowledge and treatment approaches for cancer. | |
CancerModels.Org | CancerModels.Org is an open global research platform for patient-derived cancer models (PDCMs), including patient-derived xenografts, organoids, and cell line models. It is the largest open catalog of harmonized PDCMs and associated clinical, genomic, and functional data from academic and commercial providers. | |
cBioPortal for Cancer Genomics | The cBioPortal for Cancer Genomics provides visualization, analysis and download of large-scale cancer genomics data sets. | |
Galaxy | Galaxy is a Web-based computational workbench that anyone can use to analyze large biomedical datasets. There are >9500 software tools and visualizations integrated into Galaxy for analysis of single-cell and bulk omics, live cell and molecular imaging, machine learning, microbiome, and much more. Galaxy also includes many ITCR tools and visualizations. | |
Galaxy-P-Multi-omics | A unified platform for integrative genomic-proteomic-metabolomic data analysis and informatics in cancer research. | |
GenePattern | GenePattern is an analysis platform providing hundreds of tools for the analysis of multiple molecular data types. A web interface provides easy access to these tools and allows the creation of multi-step reproducible analysis pipelines. The GenePattern Notebook environment allows users to combine GenePattern analyses with Jupyter notebooks to create reproducible research narratives. | |
Globus | Cancer researchers use Globus for secure data transfer and sharing, task automation, and function execution, across distributed and heterogeneous environments at large scales. Tool developers integrate Globus services into data repositories and biomedical applications, leveraging Globus identity management, single-sign on, authorization, and search capabilities. | |
OncoMX | OncoMX is a cancer biomarker database with mutation and expression metadata related to the study of cancer alongside relevant experimental and functional annotations. | |
OpenChallenges | OpenChallenges.io is a centralized hub for biomedical challenges that empowers participants with the most up-to-date information about relevant challenges, while providing organizers with standardized challenge event templates and intelligence. | |
Overture | Overture is a collection of modular, open-source software components designed to make the management of big-data genomic projects easier and more cost-effective. The Overture system promotes FAIR data sharing of genomic datasets by overcoming the major obstacles in storing, managing, exploring, and distributing massive genome-scale datasets. | |
QIIME 2 | QIIME 2 is an end-to-end microbiome data science platform that is widely used for investigation of the human microbiome, including diverse research across the cancer research continuum. QIIME 2 was designed to ensure reproducible bioinformatics and to facilitate accessibility by users with varying degrees of computational sophistication. | |
UCSC Xena | UCSC Xena securely analyzes and visualizes your private functional genomics data set in the context of public and shared genomic/phenotypic data sets such as TCGA, GDC, ICGC, TARGET, and the UCSC RNA-seq recompute compendium (TCGA + TARGET + GTEx). | |
Network Biology
Tools and resources for network-based analyses.
Tool Name | Tool Description | Tool Resources |
---|---|---|
Gene Set Enrichment Analysis (GSEA) and Molecular Signatures Database (MSigDB) | GSEA is software for identifying sets of genes representing the activation or dysregulation of biological processes or pathways in molecular data. GSEA can distinguish even subtle differences between phenotypes or cellular states and elucidate underlying mechanisms. The MSigDB is a companion resource of annotated gene sets for use with GSEA. | |
NDEx – The Network Data Exchange | NDEx, The Network Data Exchange, is an online data commons where scientists can upload, share, and publicly distribute biological networks and pathway models. It also provides the NDEx IQuery gene set analysis web app based on NDEx networks. Developers can integrate their programs and web applications with NDEx and IQuery. | |
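The GSEA entry describes what the software identifies but not how a gene set is scored. As a hedged illustration of the running-sum statistic that GSEA is built on, the sketch walks down a ranked gene list, stepping up at genes in the set (each step weighted by |score|^p) and down at genes outside it; the enrichment score is the running sum's largest deviation from zero. Setting p=0 gives the classic unweighted, Kolmogorov-Smirnov-like form. The function name and inputs are hypothetical, and the permutation-based significance testing and score normalization performed by the GSEA software are omitted.

```python
def enrichment_score(ranked_genes, scores, gene_set, p=1.0):
    """Running-sum enrichment score (hypothetical helper, illustrative only).

    ranked_genes: gene names ordered from most to least associated with the phenotype
    scores: ranking metric for each gene, in the same order as ranked_genes
    gene_set: collection of gene names to test
    p: weight exponent; p=0 gives the unweighted, KS-like statistic
    """
    gene_set = set(gene_set)
    hits = [g in gene_set for g in ranked_genes]
    n_hits = sum(hits)
    n_miss = len(ranked_genes) - n_hits
    if n_hits == 0 or n_miss == 0:
        raise ValueError("gene set must overlap the ranked list without covering all of it")
    # Total weight of the genes in the set, used to scale each upward step
    hit_weight = sum(abs(s) ** p for s, h in zip(scores, hits) if h)
    if hit_weight == 0:
        raise ValueError("all genes in the set have zero weight")
    running = 0.0
    es = 0.0  # signed value of the largest deviation from zero seen so far
    for s, h in zip(scores, hits):
        if h:
            running += abs(s) ** p / hit_weight  # step up at a gene in the set
        else:
            running -= 1.0 / n_miss  # step down at a gene outside the set
        if abs(running) > abs(es):
            es = running
    return es


if __name__ == "__main__":
    genes = ["A", "B", "C", "D"]
    print(enrichment_score(genes, [3, 2, 1, -1], {"A", "C"}))  # 0.75
```

A positive score means the set is concentrated near the top of the ranking and a negative score means it is concentrated near the bottom; judging whether either is meaningful requires the null distribution that this sketch leaves out.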
Proteomics and Protein Structure
Tools and resources for protein-based analysis including protein-protein interactions, kinase prediction, and protein expression.
Tool Name | Tool Description | Tool Resources |
---|---|---|
Averon Notebook | Averon Notebook provides a computational environment to identify new therapeutically actionable mechanisms of oncogenic signaling enabled by mutant-directed protein-protein interactions to inform target discovery in cancer. | |
DINC 2.0 | DINC 2.0 is a parallelized meta-docking method for the incremental docking of large ligands (currently using AutoDock Vina). | |
Galaxy-P-multi-omics | A unified platform for integrative genomic-proteomic-metabolomic data analysis and informatics in cancer research. | |
KinPred | A unified and sustainable approach for harnessing proteome-level human kinase-substrate predictions. | |
KSTAR | Prediction of kinase activities from phosphoproteomic data. | |
The Cancer Proteome Atlas | The Cancer Proteome Atlas is a comprehensive bioinformatic resource for assessing, visualizing and analyzing the functional proteomics data of patient tumor and cell line samples. | |
Transcriptomics
Tools and resources for the analysis of transcriptome data, including bulk data, single-cell data, and spatial transcriptomics.
Tool Name | Tool Description | Tool Resources |
---|---|---|
ARCHS4 | ARCHS4 provides gene and transcript counts uniformly processed from all human and mouse RNA-seq samples in GEO. | |
cBioPortal for Cancer Genomics | The cBioPortal for Cancer Genomics provides visualization, analysis and download of large-scale cancer genomics data sets. | |
DANA | DANA (DAta-driven Normalization Assessment) guides the selection of a depth normalization method for a microRNA sequencing data set, using biology-motivated and data-driven metrics. For each normalization method, these metrics assess (1) how effectively normalization removes handling effects and (2) how much normalization biases true biological signals. | |
Integrated Methods for Predicting Enhancer Targets (IM-PET) | IM-PET is an open source software tool for identifying target genes of transcriptional enhancers. It uses four statistical features derived from transcriptomic, genomic, and epigenomic data. IM-PET is also available as a Docker container. | |
MIRA: Multimodal models for Integrated Regulatory Analysis | MIRA is a Python software package for analyzing single cell RNA-seq, ATAC-seq and multimodal data. MIRA includes effective batch correction methods for both modalities and infers gene regulatory mechanisms and regulatory transcription factors through integrative analyses of joint RNA and chromatin accessibility modalities. | |
PRECISION.seq | PRECISION.seq enables an objective and systematic evaluation of depth normalization methods for microRNA sequencing in the context of differential expression analysis, using realistically distributed and robustly benchmarked data. Users can assess their chosen methods and compare their performance to nine methods already implemented in the package. | |
spatialGE R package | spatialGE R package is an analytical suite for the visualization and spatial statistics analysis of spatially-resolved transcriptomics data. The spatialGE R package features a data object to store data and results from multiple tissue sections, as well as associated methods to detect spatial patterns in the expression of individual genes and gene sets. | |
spatialGE web application | spatialGE is a user-friendly, point-and-click web application that facilitates spatial transcriptomics data analysis. This application contains a collection of analytical methods for visualization and statistical analysis of the tissue microenvironment. Users can create accounts to manage their data and projects, as well as generate figures and tables to report results. | |
Texomer | Texomer deconvolutes allele-specific copy number and mRNA expression levels from paired exome and transcriptome sequencing data of tumor tissue samples and outputs variants associated with tumor specific transcriptional regulation. | |
Trinity CTAT | Trinity Cancer Transcriptome Analysis Toolkit (CTAT) including de novo transcriptome assembly with downstream support for expression analysis and focused analyses on cancer transcriptomes, incorporating mutation and fusion transcript discovery, and single cell analysis. | |
TumorDecon | TumorDecon is an open source Python package that includes several deconvolution methods for estimating the relative abundance of cell types in a given tumor from its gene expression profile. | |
UCSC Xena | UCSC Xena securely analyzes and visualizes your private functional genomics data set in the context of public and shared genomic/phenotypic data sets such as TCGA, GDC, ICGC, TARGET, and the UCSC RNA-seq recompute compendium (TCGA + TARGET + GTEx). | |
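TumorDecon bundles several deconvolution methods, which are not detailed in the table above. To illustrate only the general idea, one common formulation models a bulk expression profile as a non-negative mixture of cell-type signature profiles, solves for the mixing weights with non-negative least squares (NNLS), and rescales them to sum to one. The sketch assumes NumPy and SciPy (`scipy.optimize.nnls`) are installed and uses a made-up signature matrix; the `deconvolve` helper is hypothetical and is not TumorDecon's API.

```python
import numpy as np
from scipy.optimize import nnls


def deconvolve(signature, bulk):
    """Estimate relative cell-type abundances for one bulk sample (illustrative sketch).

    signature: (n_genes, n_cell_types) array of reference expression per cell type
    bulk: (n_genes,) array of bulk expression for the same genes, in the same order
    Returns non-negative fractions that sum to 1.
    """
    signature = np.asarray(signature, dtype=float)
    bulk = np.asarray(bulk, dtype=float)
    # Solve min ||signature @ w - bulk||_2 subject to w >= 0
    weights, _residual_norm = nnls(signature, bulk)
    total = weights.sum()
    if total == 0:
        raise ValueError("no non-negative combination of signatures fits this sample")
    # Rescale so the weights are relative abundances
    return weights / total


if __name__ == "__main__":
    # Toy signature: 4 genes x 3 cell types (hypothetical values)
    sig = np.array([[10.0, 0.0, 1.0],
                    [0.0, 8.0, 1.0],
                    [2.0, 2.0, 6.0],
                    [5.0, 1.0, 0.0]])
    mixture = sig @ np.array([0.5, 0.3, 0.2])  # simulated bulk profile
    print(np.round(deconvolve(sig, 2.0 * mixture), 3))
```

Because the weights are rescaled at the end, multiplying the bulk profile by a constant (as in the usage above) leaves the estimated fractions unchanged; real methods additionally depend heavily on the choice of signature genes and on normalization.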
Visualization
Tools to visualize high-dimensional datasets.
Tool Name | Tool Description | Tool Resources |
---|---|---|
cBioPortal for Cancer Genomics | The cBioPortal for Cancer Genomics provides visualization, analysis and download of large-scale cancer genomics data sets. | |
Gosling | Gosling is a grammar-based toolkit for scalable and interactive genomics data visualization. It allows creating a wide range of custom visualizations for genome-mapped data with unique interactive features, such as brushing, linking, and advanced zooming techniques. | |
IGV | Integrative Genomics Viewer (IGV) is a high-performance, easy-to-use, interactive tool for the visual exploration of genomic data. It supports flexible integration of common genomic data types, investigator-generated or publicly available, loaded from local or cloud sources. Available in multiple forms: IGV desktop application; igv.js embeddable JavaScript component; IGV-Web in-browser app. | |
JBrowse 2 | JBrowse 2 is the new version of the JBrowse genome browser, available as a desktop application, a web application, and embeddable web components. It expands on JBrowse 1 by adding support for showing multiple visualizations, including synteny, circular, breakpoint detail, and tabular views. It also has integrations with Jupyter notebooks and R, and a growing library of plugins. | |
Next Generation Clustered Heat Maps | Next-Generation (Clustered) Heat Maps are interactive heat maps that enable the user to zoom and pan across the heat map, alter its color scheme, generate production-quality PDFs, and link out from rows, columns, and individual heat map entries to related statistics, databases and other information. | |
The Cancer Dependency Map Portal | The DepMap Portal empowers researchers to make discoveries related to cancer vulnerabilities by providing open access to key datasets, analytical tools, and visualizations. | |
UCSC Xena | UCSC Xena securely analyzes and visualizes your private functional genomics data set in the context of public and shared genomic/phenotypic data sets such as TCGA, GDC, ICGC, TARGET, and the UCSC RNA-seq recompute compendium (TCGA + TARGET + GTEx). | |
ITCR Connectivity Map
The ITCR Connectivity Map shows the interactions between projects funded under the ITCR program and the bioinformatics tools they create. This map is maintained and hosted by the NDEx team at UC San Diego.