Mouse Models of Human Cancer Database User Help Reference

What information can I find in the Mouse Models of Human Cancer Database?
How do I cite MMHCdb?
How can I contribute my data to MMHCdb?

MMHCdb Basics

What information can I find in the Mouse Models of Human Cancer Database (MMHCdb)?

The focus of MMHCdb is on in vivo mouse models, including:
- spontaneous and induced tumors in mice
- genetically engineered mouse models of cancer
- diversity panels (e.g., Diversity Outbred, Collaborative Cross)
- Patient Derived Xenograft (PDX) models
Information about cancer models includes the spectrum of tumor types observed and the frequency of specific tumor types. We emphasize the effect of genetic background on the cancer characteristics of mouse models. For most PDX models, genomic data for the engrafted tumors and treatment response data for cohorts of tumor-bearing mice are available.
The data about mouse models of human cancer in MMHCdb are acquired from the following sources:
- the published scientific literature, and
- the direct submission of model information and pathology images from cancer researchers.
Priority for biocuration activities is determined by the novelty of the mouse model, the quality of the data, and the organ system involved. High priority is given to models of the cancers with the highest reported mortality in the United States population.
MMHCdb reports negative as well as positive data. For example, strains of mice that are reported to have a zero frequency of a particular tumor type are included in the database.
Except for PDX models, the mouse model information accessible from MMHCdb is NOT limited to the strains distributed by The Jackson Laboratory. However, if a strain listed in MMHCdb is distributed by The Jackson Laboratory, a link to the data sheet in the JAX Mice database is provided. MMHCdb collaborates with the European Bioinformatics Institute (EBI) to develop and maintain PDCM Finder, a global catalog of patient-derived cancer models, which can be accessed at https://www.cancermodels.org

How do I cite MMHCdb?

MMHCdb is supported by grant CA89713, entitled "Electronic Access to Mouse Tumor Data", awarded to Carol J. Bult from the National Cancer Institute (NCI) of the National Institutes of Health (NIH).

Please use the following citation when referring to the Mouse Models of Human Cancer Database.

Debra M. Krupke, Dale A. Begley, John P. Sundberg, Joel E. Richardson, Steven B. Neuhauser and Carol J. Bult. The Mouse Tumor Biology Database: A Comprehensive Resource for Mouse Models of Human Cancer. Cancer Res. October 31, 2017; 77(21): e67-e70.

If you wish to cite a specific area of MMHCdb, we suggest a format similar to the following example:

Some tumor data for this paper were retrieved from the Mouse Models of Human Cancer Database (MMHCdb, formerly MTB), Mouse Genome Informatics, The Jackson Laboratory, Bar Harbor, Maine. World Wide Web (URL: http://tumor.informatics.jax.org/). (February 2019; i.e., the date on which you retrieved the cited data).

How can I contribute my data about mouse models of human cancer to MMHCdb?

Submissions of supporting data for new and existing mouse models of human cancer from the research community are welcome. Contact User Support to request a consultation with an MMHCdb Biocuration Scientist about submitting.

Dynamic Tumor Frequency Grid

The dynamic tumor frequency grid presents the same information as the existing tumor frequency grid, but allows the data to be refined by the user.

Individual strain families and organ groups can be selected to generate a customized grid.

In the resulting grid, individual strains and organs can be selected to further refine the grid's contents. Use the check boxes to select the desired strain and organ rows and columns, then click the 'Generate Grid' button.

Advanced Search Results

We have updated our user interface. New documentation is coming soon. If you need help, contact User Support.

Model Details

We have updated our user interface. New documentation is coming soon. If you need help, contact User Support.

Strain Details

We have updated our user interface. New documentation is coming soon. If you need help, contact User Support.

Reference Details

We have updated our user interface. New documentation is coming soon. If you need help, contact User Support.

PDX Search Form

PDX model identifier: A unique identifier assigned by the database management system to unambiguously identify a PDX model.

Primary cancer site: The primary cancer site is the anatomical site where the cancer originated. More than one primary site can be selected for a search.

Cancer type tags: Tags are used to group models that share clinical characteristics.

Diagnosis: Cancer diagnoses are standardized using terms from the Disease Ontology (DO). More than one term can be selected for a search.

PDX Dosing studies: PDXs that have been used in dosing studies can be searched by treatment and/or treatment responses. Treatment responses are based on modified RECIST criteria. Read more on dosing study design and interpretation here.

Tumor mutation burden (TMB): Tumor mutation burden is a measure of the number of mutations carried by tumor cells. TMB is a potential predictive biomarker for identifying tumors that are likely to respond to immunotherapy. In the JAX collection of PDXs, a TMB of 22 mutations/Mb is considered high. Read more about how TMB is calculated here.

Gene fusion: Search for PDX models whose engrafted tumor harbors a gene fusion. Only gene fusions with evidence of an association with drug efficacy or with cancer are reported. Read more about the methods here.

Gene variants: Search for PDX models whose engrafted human tumors harbor specific gene variants. Gene symbols must be official HGNC symbols. Once a gene symbol is specified, the variants/mutations observed in the PDX collection are displayed. More than one variant/mutation per gene can be selected. Genes that can be searched are restricted to those genes on the JAX Cancer Treatment Profile (CTP) gene panel. Read more about the methods and results here.

Gene expression across PDX models: Displays a graphical summary of expression levels across PDX models for a gene. Only genes on the JAX CTP panel can be searched. Gene symbols must be official HGNC symbols. Read more about gene expression data here.

Gene amplification/deletion across PDX models: Displays a graphical summary of a gene's expression across PDX models, with each bar colored according to the amplification/deletion status of the gene in that model. Only genes on the JAX CTP panel can be searched. Gene symbols must be official HGNC symbols. Read more about copy number aberration data here.

PDX Search Results

PDX models matching the search criteria are displayed in a dynamic table.

The results can be sorted by any column. Columns can be resized or hidden.

Click the model ID to go to the model details page to see any additional data.

To send an email requesting additional information on PDX models, select the models using the check boxes and click the 'Request Details' button.

PDX Details

Variant, expression and copy number data may not be available for all models.

Variant data

The variant data (point mutations and indels) are analyzed from next-generation sequencing using various capture panels:

- Truseq (deprecated): the Illumina TruSeq Amplicon Cancer Panel covers 48 cancer-related genes.
- CTP: the JAX Cancer Treatment Profile panel covers 358 cancer-associated genes.
- Whole Exome (limited number of samples assayed): Agilent SureSelect human exon capture.

The analysis of the sequencing output uses the Xenome tool to remove contaminating mouse sequences before alignment and variant calling.

BWA, GATK, and SnpEff are used for alignment (to the GRCh38 human reference), variant calling, and annotation, respectively.

Field	Description
Model	'T' or 'J' number ID of model
Sample	Alphanumeric designation (followed by _model ID number)
Gene	HGNC nomenclature
Platform	Capture panel used (Truseq-JAX, CTP or Whole Exome)
Chromosome	Chromosome number
Seq Position	Chromosomal position of variant start (in reference sequence)
Ref Allele	Nucleotide(s) present in reference sequence
Alt Allele	Nucleotide(s) present in sample
Consequence	Functional annotation of the variant
Amino Acid Change	Protein sequence change from reference
RS variants	Accession numbers for public databases (dbSNP, COSMIC)
Read Depth	Number of reads at variant site
Allele Frequency	Percentage of all alleles at the site that carry the variant
Transcript ID	RefSeq accession for canonical transcript
Filtered Rationale	Filters that the variant failed. Germline-Alt_AF_{percent} or PutativeGermline: the variant is predicted to be germline based on public databases and its alternate allele frequency (percent)
Passage Num	Passage number of PDX tumor sample assayed
Gene ID	Accession number of gene in Entrez or Ensembl
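Suppose variant rows are exported as tab-separated text using the field names above. The export format, the example rows, and the model/sample IDs below are all hypothetical. Under that assumption, variants that failed no filters can be selected by keeping rows whose Filtered Rationale field is empty; treating an empty field this way is also an assumption. The optional `min_depth` cutoff on Read Depth is an illustrative extra, not an MMHCdb filter.

```python
import csv
import io

# Hypothetical tab-separated export using column names from the field table.
# The rows are made-up examples, not real MMHCdb data.
EXAMPLE_TSV = (
    "Model\tSample\tGene\tPlatform\tRead Depth\tAllele Frequency\tFiltered Rationale\n"
    "J000000\tS1_J000000\tTP53\tCTP\t250\t48.0\t\n"
    "J000000\tS1_J000000\tKRAS\tCTP\t310\t51.2\tPutativeGermline\n"
    "J000000\tS1_J000000\tPIK3CA\tCTP\t12\t30.0\t\n"
)


def passing_variants(tsv_text, min_depth=0):
    """Return rows with an empty 'Filtered Rationale' (assumed to mean the
    variant failed no filters) and at least `min_depth` reads."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    kept = []
    for row in reader:
        rationale = (row.get("Filtered Rationale") or "").strip()
        if rationale:
            continue  # the variant failed at least one filter
        if int(row["Read Depth"]) < min_depth:
            continue  # illustrative depth cutoff (hypothetical)
        kept.append(row)
    return kept


if __name__ == "__main__":
    for row in passing_variants(EXAMPLE_TSV, min_depth=20):
        print(row["Gene"], row["Allele Frequency"])
```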

Expression data

The expression data are analyzed from microarray or RNAseq platforms.

Affymetrix microarray HU133 or HG1.0ST (deprecated)

The arrays were processed with the affyPLM R package, using quantile normalization, no background correction, and a simple model that treats the log intensity as the sum of an array effect, a probe effect, and a residual.
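Standard RMA summarizes probe-level data with Tukey's median polish. To illustrate the additive model described above (log intensity = array effect + probe effect + residual), the plain-Python sketch below fits it by median polish on a probes × arrays matrix for one probe set. This is not the affyPLM fitting routine. The function name and toy input are hypothetical, and the sweep order and `eps` convergence rule follow R's `medpolish`.

```python
from statistics import median


def median_polish(y, max_iter=10, eps=0.01):
    """Tukey median polish of a probes x arrays matrix of log intensities.

    Fits y[i][j] ~ overall + probe[i] + array[j] + resid[i][j] and returns
    (overall, probe, array, resid).
    """
    resid = [[float(v) for v in row] for row in y]
    n_probes, n_arrays = len(resid), len(resid[0])
    overall = 0.0
    probe = [0.0] * n_probes
    array = [0.0] * n_arrays
    old_sum = 0.0
    for _ in range(max_iter):
        # Sweep probe (row) medians out of the residuals into the probe effects.
        for i in range(n_probes):
            m = median(resid[i])
            resid[i] = [v - m for v in resid[i]]
            probe[i] += m
        m = median(array)
        array = [a - m for a in array]
        overall += m
        # Sweep array (column) medians out of the residuals into the array effects.
        for j in range(n_arrays):
            m = median(resid[i][j] for i in range(n_probes))
            for i in range(n_probes):
                resid[i][j] -= m
            array[j] += m
        m = median(probe)
        probe = [p - m for p in probe]
        overall += m
        # Stop when the total absolute residual stops changing.
        new_sum = sum(abs(v) for row in resid for v in row)
        if new_sum == 0 or abs(new_sum - old_sum) < eps * new_sum:
            break
        old_sum = new_sum
    return overall, probe, array, resid


if __name__ == "__main__":
    # Toy additive data: probe effects 0, 1, 2 and array effects 5, 7.
    y = [[p + a for a in (5.0, 7.0)] for p in (0.0, 1.0, 2.0)]
    overall, probe, array, resid = median_polish(y)
    print([overall + a for a in array])  # prints [6.0, 8.0]
```

In this sketch, each array's summarized expression is `overall + array[j]`. With the purely additive toy input, all residuals are zero after the first iteration.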

The array effect is the "summarized expression" that is equivalent to the median polished value produced by standard RMA analysis.

RNAseq

RNAseq data is first processed with Xenome to extract human sequences.

The human sequences are aligned to the transcriptome with Bowtie, and then expression levels are estimated by RSEM.

RSEM estimated counts are finally upper quantile normalized.
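The normalization parameters are not specified above. The following is a minimal sketch of upper-quartile normalization under several assumptions: "upper quantile" is taken to mean the 75th percentile of each sample's nonzero RSEM expected counts, and every sample is rescaled so that this value equals a common target. The target of 1000 and the function name are hypothetical.

```python
from statistics import quantiles


def upper_quartile_normalize(counts_by_sample, target=1000.0):
    """Rescale each sample so the 75th percentile of its nonzero counts equals `target`.

    counts_by_sample maps a sample ID to per-gene counts (same gene order in
    every sample). Each sample needs at least two nonzero counts.
    """
    normalized = {}
    for sample, counts in counts_by_sample.items():
        nonzero = [c for c in counts if c > 0]  # genes with zero counts are ignored
        # quantiles(..., n=4) returns the cut points [Q1, Q2, Q3];
        # method="inclusive" interpolates linearly between order statistics.
        uq = quantiles(nonzero, n=4, method="inclusive")[2]
        normalized[sample] = [c / uq * target for c in counts]
    return normalized


if __name__ == "__main__":
    toy = {"A": [0, 10, 20, 30, 40, 50], "B": [0, 20, 40, 60, 80, 100]}
    print(upper_quartile_normalize(toy))
```

Sample B in the toy input is sample A scaled by 2, so both map to the same normalized profile. This is the library-size effect the scaling is meant to remove.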

Gene fusions are detected by first extracting human sequences with Xenome and then running SOAPfuse.

Gene expression of the CTP panel genes is displayed as a chart of percentile rank z-scores, which compare each gene's expression in a given model with that gene's expression in all models assayed on the same platform.

The mean and standard deviation used in the z-score calculation are obtained from a fixed set of PDX samples for each platform.
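Below is a minimal sketch of the z-score step. It assumes the expression values for one gene are already on a log scale and that the sample standard deviation of the fixed reference set is used; both are assumptions. The conversion from z-score to the percentile rank shown in the chart is not described, so it is omitted here. The model IDs are hypothetical.

```python
from statistics import mean, stdev


def reference_zscores(expression, reference):
    """z-score each model's value for one gene against a fixed reference set.

    expression: dict mapping model ID -> expression value for the gene.
    reference:  values for the same gene in the fixed reference PDX samples
                assayed on the same platform (at least two values).
    """
    mu = mean(reference)
    sd = stdev(reference)  # sample standard deviation (an assumption)
    return {model: (value - mu) / sd for model, value in expression.items()}


if __name__ == "__main__":
    reference = [2.0, 4.0, 6.0]  # made-up log-scale reference values
    print(reference_zscores({"MODEL_A": 8.0, "MODEL_B": 4.0}, reference))
```

Fixing the reference set means a model's z-score for a gene does not change when new models are added to the collection.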

Other forms of expression values (e.g. z-score, normalized expression) and expression of other genes not listed on the MMHCdb site can be made available upon request.

Genes flagged with hatched bars in gene expression chart

Some genes display a fair amount of heterogeneity in the normal population, so sequencing reads from these genes may align poorly to the reference genome. The Genome Reference Consortium "provides multiple representations (alternate loci) for regions that are too complex to be represented by a single path." We analyzed the data for several samples using both the primary build alone and the primary build with the alternate loci. Because the extended reference genome complicates the interpretation of gene expression, we use only the primary build.

We flag genes whose alternate loci differ sufficiently from the primary build to caution users that the measured expression of these genes could be artifactually low.

Gene Fusion

For each gene fusion, the gene symbols of the upstream and downstream fusion partners are reported, along with whether the downstream partner is in frame or frame-shifted. To minimize false positives, only fusions with evidence of an association with drug efficacy or with cancer are reported. Other detected fusions and additional information (e.g., breakpoint coordinates) can be made available upon request.

Copy number

The copy number variation is analyzed from the Affymetrix Human SNP 6.0 array. PennCNV-Affy and ASCAT 2.2 are used to predict allele-specific copy number and ploidy. Gene-level copy number is obtained by intersecting copy number segments with genome coordinates of Ensembl genes. In cases where a segment boundary is contained within a gene's coordinates, the most conservative estimate of copy number is used.
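The intersection step can be sketched as follows. Several details are assumptions made for illustration: the coordinate convention (1-based, inclusive), the data layout, and reading "most conservative" as the overlapping segment whose copy number is closest to the sample ploidy. The actual pipeline may resolve boundary cases differently.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Segment:
    chrom: str
    start: int  # 1-based, inclusive (assumption)
    end: int    # 1-based, inclusive (assumption)
    copy_number: float


def gene_copy_number(chrom: str, gene_start: int, gene_end: int,
                     segments: List[Segment], ploidy: float) -> Optional[float]:
    """Gene-level copy number from the segments that overlap the gene."""
    overlapping = [
        s for s in segments
        if s.chrom == chrom and s.start <= gene_end and s.end >= gene_start
    ]
    if not overlapping:
        return None  # no segment covers this gene
    # A gene containing a segment boundary overlaps several segments; keep the
    # value closest to ploidy, i.e. the least extreme call (assumed meaning of
    # "most conservative").
    best = min(overlapping, key=lambda s: abs(s.copy_number - ploidy))
    return best.copy_number


if __name__ == "__main__":
    segs = [
        Segment("1", 1, 1000, 2.0),
        Segment("1", 1001, 2000, 5.0),
        Segment("2", 1, 5000, 1.0),
    ]
    print(gene_copy_number("1", 900, 1100, segs, ploidy=2.0))   # gene spans a boundary
    print(gene_copy_number("1", 1500, 1600, segs, ploidy=2.0))  # gene inside one segment
```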

In Gene CNV, the copy number of the CTP panel genes is displayed as a chart of log2(cn raw / sample ploidy). The CNV Plots present the difference from sample ploidy along the chromosomes (orange) and indicate where loss of heterozygosity occurs (blue).
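Written out, the Gene CNV value for a gene g in a sample with estimated ploidy P is shown below. The worked numbers are illustrative and not taken from a real model. A value of 0 means the gene's raw copy number equals the sample ploidy; positive and negative values indicate copy numbers above and below ploidy, respectively.

```latex
\mathrm{CNV}_g = \log_2\!\left(\frac{\mathrm{cn}_{\mathrm{raw},g}}{P}\right),
\qquad
\text{e.g. } \frac{6}{3} = 2 \;\Rightarrow\; \log_2 2 = 1,
\qquad
\frac{1.5}{3} = 0.5 \;\Rightarrow\; \log_2 0.5 = -1 .
```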

Values for specific genes not listed on the MMHCdb site can be made available upon request.

Tumor mutation burden (TMB) estimation:

TMB was calculated using variants that

(i) met all quality criteria (coverage, mapping quality, etc.),
(ii) were not present in an in-house curated blacklist of false-positive variants from loci that are prone to sequencing and analysis errors and/or are associated with highly polymorphic genes (i.e., MUC4, MUC5B, MUC16, MUC17, and HLA-A),
(iii) were likely somatic mutations, and
(iv) had a high or moderate functional impact (i.e., non-synonymous changes, frameshifts, stop losses/gains, and splice-site acceptor/donor changes).

TMB was estimated by dividing the number of variants that met the criteria listed above by the length (in Mb) of The Jackson Laboratory Cancer Treatment Profile (CTP) targeted gene panel.
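Below is a minimal sketch of the TMB calculation described above. Several parts are assumptions. The `Variant` fields are hypothetical. The SnpEff-style impact classes stand in for criterion (iv). The panel length in the example is a placeholder, because the CTP panel's length in Mb is not given here. The in-house blacklist is locus-based, so the gene set below (the genes named in criterion (ii)) is only a simplified stand-in.

```python
from dataclasses import dataclass
from typing import Iterable

# Simplified stand-in for criterion (ii): the genes named in the text. The real
# in-house blacklist is locus-based and is not reproduced here.
BLACKLIST_GENES = {"MUC4", "MUC5B", "MUC16", "MUC17", "HLA-A"}
# SnpEff-style impact classes assumed to mean "high or moderate" (criterion iv).
COUNTED_IMPACTS = {"HIGH", "MODERATE"}


@dataclass
class Variant:
    gene: str
    passes_qc: bool       # criterion (i): coverage, mapping quality, etc.
    likely_somatic: bool  # criterion (iii)
    impact: str           # criterion (iv)


def tumor_mutation_burden(variants: Iterable[Variant], panel_length_mb: float) -> float:
    """Mutations per Mb: variants meeting criteria (i)-(iv) divided by panel length."""
    counted = [
        v for v in variants
        if v.passes_qc
        and v.gene not in BLACKLIST_GENES
        and v.likely_somatic
        and v.impact in COUNTED_IMPACTS
    ]
    return len(counted) / panel_length_mb


if __name__ == "__main__":
    example = [
        Variant("TP53", True, True, "HIGH"),
        Variant("KRAS", True, True, "MODERATE"),
        Variant("MUC16", True, True, "HIGH"),  # excluded by the blacklist
        Variant("BRAF", True, True, "LOW"),    # excluded: low impact
    ]
    # 1.5 Mb is a placeholder, not the actual CTP panel length.
    print(tumor_mutation_burden(example, panel_length_mb=1.5))
```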

We defined high TMB as 22 mutations/Mb. This threshold was calculated from the TMB distribution of all PDX models analyzed as Q3 (the third quartile of TMB) + 1.5 × the interquartile range (IQR) of TMB.
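The threshold rule can be sketched as below. Quartile conventions differ between tools. This sketch assumes linear interpolation (`statistics.quantiles` with `method="inclusive"`), and other conventions can shift the resulting threshold slightly. The input values are made up.

```python
from statistics import quantiles


def high_tmb_threshold(tmb_values):
    """Return Q3 + 1.5 * IQR of a list of TMB values (mutations/Mb)."""
    # n=4 gives the three quartile cut points [Q1, Q2, Q3];
    # method="inclusive" interpolates linearly between order statistics.
    q1, _, q3 = quantiles(tmb_values, n=4, method="inclusive")
    return q3 + 1.5 * (q3 - q1)


if __name__ == "__main__":
    print(high_tmb_threshold([1, 2, 3, 4, 5, 6, 7, 8, 9]))  # prints 13.0
```

The Q3 + 1.5 × IQR rule is the usual upper fence for outliers in a box plot. Models above it are unusually mutated relative to the rest of the collection.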

Microsatellite Instability (MSI)

The MSIsensor2 algorithm was used to determine the MSI status of JAX samples. Samples with an MSI-Percentage > 20% are considered MSI-High (MSI-H). This threshold separated MSI-H samples from MSI-Stable (MSI-S) samples well, both during MSIsensor2 algorithm development and in our internal benchmarking.
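The cutoff rule is a strict comparison, so a sample at exactly 20% is classed as MSI-S. Below is a minimal sketch, assuming the input is the MSI percentage reported for the sample. The function and constant names are hypothetical.

```python
MSI_HIGH_CUTOFF = 20.0  # percent; samples strictly above this are MSI-High


def msi_status(msi_percentage: float) -> str:
    """Classify a sample as MSI-H or MSI-S from its MSI percentage."""
    return "MSI-H" if msi_percentage > MSI_HIGH_CUTOFF else "MSI-S"


if __name__ == "__main__":
    for pct in (3.5, 20.0, 27.1):
        print(pct, msi_status(pct))
```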

Send questions and comments to User Support.
