Mouse Models of Human Cancer Database User Help Reference
- What information can I find in the Mouse Models of Human Cancer Database?
- How do I cite MMHCdb?
- How can I contribute my data to MMHCdb?
MMHCdb Basics
What information can I find in the Mouse Models of Human Cancer (MMHCdb) Database?
- The focus of MMHCdb is on in vivo mouse models, including:
  - spontaneous and induced tumors in mice
  - genetically engineered mouse models of cancer
  - diversity panels (e.g., Diversity Outbred, Collaborative Cross)
  - Patient Derived Xenograft (PDX) models
- Information about cancer models includes the spectrum of tumor types observed and the frequency of specific tumor types. We emphasize the effect of genetic background on the cancer characteristics of mouse models. For most PDX models, genomics data for engrafted tumors and treatment response data for cohorts of tumor-bearing mice are available.
- The data about mouse models of human cancer in MMHCdb are acquired from the following sources:
  - the published scientific literature, and
  - the direct submission of model information and pathology images from cancer researchers.
- Priority for biocuration activities is determined by the novelty of the mouse model, the quality of the data, and the organ system involved. High priority is given to models associated with the cancers with the highest reported mortality in the United States population.
- MMHCdb reports negative as well as positive data. For example, strains of mice that are reported to have a zero frequency of a particular tumor type are included in the database.
- Except for PDX models, the mouse model information accessible from MMHCdb is NOT limited to the strains distributed by The Jackson Laboratory. However, if a strain listed in MMHCdb is distributed by The Jackson Laboratory, a link to the data sheet in the JAX Mice database is provided. MMHCdb collaborates with the European Bioinformatics Institute (EBI) to develop and maintain PDCM Finder, a global catalog of Patient Derived Cancer models, which can be accessed at https://www.cancermodels.org.
How do I cite MMHCdb?
MMHCdb is supported by grant CA89713, entitled "Electronic Access to Mouse Tumor Data", awarded to Carol J. Bult from the National Cancer Institute (NCI) of the National Institutes of Health (NIH).
Please use the following citation when referring to the Mouse Models of Human Cancer Database.
- Debra M. Krupke, Dale A. Begley, John P. Sundberg, Joel E. Richardson, Steven B. Neuhauser and Carol J. Bult. The Mouse Tumor Biology Database: A Comprehensive Resource for Mouse Models of Human Cancer. Cancer Res October 31 2017 77 (21) e67-e70.
If you wish to cite a specific area of MMHCdb, we suggest a format similar to the following example:
- Some tumor data for this paper were retrieved from the Mouse Models of Human Cancer Database (MMHC, formerly MTB), Mouse Genome Informatics, The Jackson Laboratory, Bar Harbor, Maine. World Wide Web (URL: http://tumor.informatics.jax.org/). (February, 2019; i.e., the date you retrieved the data cited).
How can I contribute my data about mouse models of human cancer to MMHCdb?
Submissions of supporting data for new and existing mouse models of human cancer from the research community are welcome. Contact User Support to request a consultation with an MMHCdb Biocuration Scientist about submitting.
Dynamic Tumor Frequency Grid
The dynamic tumor frequency grid presents the same information as the existing tumor frequency grid, but allows the data to be refined by the user.
Individual strain families and organ groups can be selected to generate a customized grid.
In the resulting grid, individual strains and organs can be selected to further refine the grid's contents. Use the check boxes to select the desired strain and organ rows and columns, then click the 'Generate Grid' button.
Advanced Search Results
We have updated our user interface. New documentation is coming soon. If you need help, contact User Support.
Model Details
We have updated our user interface. New documentation is coming soon. If you need help, contact User Support.
Strain Details
We have updated our user interface. New documentation is coming soon. If you need help, contact User Support.
Reference Details
We have updated our user interface. New documentation is coming soon. If you need help, contact User Support.
PDX Search Form
PDX model identifier: A unique identifier assigned by the database management system to unambiguously identify a PDX model.
Primary cancer site: The primary cancer site is the anatomical site of the cancer origin. More than one primary site can be selected for a search.
Cancer type tags: Tags are used to group models that share clinical characteristics.
Diagnosis: Cancer diagnoses are standardized using terms from the Disease Ontology (DO). More than one term can be selected for a search.
PDX Dosing studies: PDXs that have been used in dosing studies can be searched by treatment and/or treatment responses. Treatment responses are based on modified RECIST criteria. Read more on dosing study design and interpretation here.
Tumor mutation burden (TMB): Tumor mutation burden is a measurement of the number of mutations carried by tumor cells. TMB is potentially a predictive biomarker to identify tumors that are likely to respond to immunotherapy. In the JAX collection of PDXs, a score of 22 is considered high TMB. Read more about how TMB is calculated here.
Gene fusion: Search for PDX models whose engrafted tumor harbors a gene fusion. Only gene fusions associated with drug efficacy or other cancer-related evidence are reported. Read more about the methods here.
Gene variants: Search for PDX models whose engrafted human tumors harbor specific gene variants. Gene symbols must be official HGNC symbols. Once a gene symbol is specified, the variants/mutations observed in the PDX collection are displayed. More than one variant/mutation per gene can be selected. Genes that can be searched are restricted to those genes on the JAX Cancer Treatment Profile (CTP) gene panel. Read more about the methods and results here.
Gene expression across PDX models: Displays a graphical summary of expression levels across PDX models for a gene. Only genes on the JAX CTP panel can be searched. Gene symbols must be official HGNC symbols. Read more about gene expression data here.
Gene amplification/deletion across PDX models: Displays a graphical summary of gene expression across PDX models for a gene with the bars representing expression colored according to amplification/deletion status of the gene. Only genes on the JAX CTP panel can be searched. Gene symbols must be official HGNC symbols. Read more about copy number aberration data here.
PDX Search Results
PDX models matching the search criteria are displayed in a dynamic table.
The results can be sorted by any column. Columns can be resized or hidden.
Click the model ID to go to the model details page to see any additional data.
To send an email requesting additional information on PDX models, select the models using the check boxes and click the 'Request Details' button.
PDX Details
Variant, expression and copy number data may not be available for all models.
Variant data
The variant data (point mutations and indels) are analyzed from next-generation sequencing using various capture panels:
- Truseq (deprecated) - The Illumina TruSeq Amplicon Cancer Panel covers 48 cancer-related genes. Link.
- CTP - The JAX Cancer Treatment Profile panel covers 358 cancer-associated genes. Link.
- Whole Exome (limited number of samples assayed) - Agilent SureSelect human exon capture.
The analysis of the sequencing output uses the Xenome tool to remove contaminating mouse sequences before alignment and variant calling.
BWA, GATK, and SnpEff are utilized for alignment (GRCh38 human reference), variant calling and annotation.
Field | Description |
---|---|
Model | 'T' or 'J' number ID of model |
Sample | Alpha-numeric designation (followed by _model id number) |
Gene | HGNC nomenclature |
Platform | Capture panel used (Truseq-JAX, CTP or Whole Exome) |
Chromosome | Chromosome number |
Seq Position | Chromosomal position of variant start (in reference sequence) |
Ref Allele | Nucleotide(s) present in reference sequence |
Alt Allele | Nucleotide(s) present in sample |
Consequence | Functional annotation of the variant |
Amino Acid Change | Protein sequence change from reference |
RS variants | Accession numbers for public databases (dbSNP, COSMIC) |
Read Depth | Number of reads at variant site |
Allele Frequency | Percentage of variant found as part of total alleles |
Transcript ID | RefSeq accession for canonical transcript |
Filtered Rationale | Indication of filters which the variant failed. Germline-Alt_AF_{percent} or PutativeGermline: A variant is predicted to be germline based on public databases and its alternate allele percentage frequency |
Passage Num | Passage number of PDX tumor sample assayed |
Gene ID | Accession number of gene in Entrez or Ensembl |
The expression data are analyzed from microarray or RNAseq platforms.
Affymetrix microarray HU133 or HG1.0ST (deprecated)
The arrays were processed with the affyPLM R package, using quantile normalization, no background correction, and fitting to a simple model that treats the log intensity as a sum of array effect, probe effect, and residual.
The array effect is the "summarized expression", which is equivalent to the median-polished value produced by standard RMA analysis.
RNAseq
RNAseq data is first processed with Xenome to extract human sequences.
The human sequences are aligned to the transcriptome with Bowtie, and then expression levels are estimated by RSEM.
RSEM estimated counts are finally upper quantile normalized.
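The final normalization step can be illustrated with a minimal sketch. It is not the MMHCdb pipeline code. The function name, the choice of the 75th percentile as the "upper quantile", the restriction to non-zero counts, and the target value of 1000 are all assumptions made for illustration.

```python
def upper_quantile_normalize(counts, q=0.75, target=1000.0):
    """Rescale one sample's estimated counts so that its upper quantile equals `target`.

    counts: dict mapping gene symbol -> RSEM estimated count for a single sample.
    ASSUMPTIONS: q=0.75, non-zero genes only, and target=1000 are illustrative choices.
    """
    nonzero = sorted(c for c in counts.values() if c > 0)
    if not nonzero:
        raise ValueError("sample has no non-zero counts")

    # Linear-interpolation quantile over the sorted non-zero counts.
    pos = q * (len(nonzero) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(nonzero) - 1)
    upper = nonzero[lo] + (pos - lo) * (nonzero[hi] - nonzero[lo])

    scale = target / upper
    return {gene: c * scale for gene, c in counts.items()}
```

Scaling by an upper quantile rather than the total count makes the normalization less sensitive to a handful of very highly expressed genes.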
Gene fusions are detected by first using Xenome to extract human sequences and then running SOAPfuse.
Gene expression of the CTP panel genes is displayed as a chart of percentile rank z-score, which measures each gene's model-specific expression in comparison with that gene in all models assayed by the same platform.
The mean and standard deviation for the z-score calculation are obtained from a fixed set of PDX samples for each platform.
Other forms of expression values (e.g. z-score, normalized expression) and expression of other genes not listed on the MMHCdb site can be made available upon request.
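As a rough illustration of the two quantities involved, the sketch below computes a z-score against a fixed reference set and a percentile rank across models. It shows one possible reading of the description above, not the MMHCdb implementation. The function names, the use of the population standard deviation, and the mid-rank handling of ties are assumptions.

```python
from statistics import mean, pstdev


def z_score(value, reference_values):
    """Z-score of one expression value against a fixed reference set of samples."""
    mu = mean(reference_values)
    sigma = pstdev(reference_values)  # ASSUMPTION: population (not sample) SD
    if sigma == 0:
        raise ValueError("reference set has zero variance")
    return (value - mu) / sigma


def percentile_rank(value, population):
    """Percentage of values in `population` below `value`, counting ties as half."""
    if not population:
        raise ValueError("population is empty")
    below = sum(1 for v in population if v < value)
    ties = sum(1 for v in population if v == value)
    return 100.0 * (below + 0.5 * ties) / len(population)
```

Because the mean and standard deviation come from a fixed reference set, the z-score for an existing model does not change when new models are added to the collection.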
Genes flagged with hatched bars in gene expression chart
Some genes display a fair amount of heterogeneity in the normal population. This means that reads from some of these genes may align poorly to the reference genome. The Genome Reference Consortium "provides multiple representations (alternate loci) for regions that are too complex to be represented by a single path." We have analyzed the data for several samples using both the primary build only and the primary build with the alternate loci. Using the extended reference genome introduces complications in interpreting the gene expression, so we have opted to use only the primary build.
We flag the genes whose alternate loci differ sufficiently from the primary build to caution users that expression of these genes could be artifactually lowered.
Gene Fusion
For each gene fusion, the gene symbols upstream and downstream of the fusion are reported, along with whether the downstream fusion partner is frame-shift or in-frame-shift. To minimize false positives, only fusions associated with drug efficacy or other cancer-related evidence are reported. Other detected fusions and additional information (e.g. breakpoint coordinates) can be made available upon request.
Copy number
The copy number variation is analyzed from the Affymetrix Human SNP 6.0 array. PennCNV-Affy and ASCAT 2.2 are used to predict allele-specific copy number and ploidy. Gene-level copy number is obtained by intersecting copy number segments with genome coordinates of Ensembl genes. In cases where a segment boundary is contained within a gene's coordinates, the most conservative estimate of copy number is used.
In Gene CNV, the copy number of the CTP panel genes is displayed as a chart of log2(cn raw / sample ploidy). The CNV Plots present the difference from sample ploidy along the chromosomes (orange) and indicate where loss of heterozygosity occurs (blue).
Values for specific genes not listed on the MMHCdb site can be made available upon request.
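The gene-level copy number and the log2 ratio shown in the Gene CNV chart can be sketched as follows. This is an illustration under stated assumptions, not the MMHCdb code. The function names are hypothetical, "most conservative estimate" is interpreted here as the overlapping segment value closest to sample ploidy, and a copy number of zero is mapped to negative infinity. None of these choices is confirmed by the description above.

```python
import math


def gene_copy_number(gene_start, gene_end, segments, sample_ploidy):
    """Pick a copy number for a gene from the segments that overlap it.

    segments: list of (start, end, copy_number) tuples on the gene's chromosome.
    Returns None when no segment overlaps the gene.
    """
    overlapping = [
        cn for start, end, cn in segments
        if start <= gene_end and end >= gene_start
    ]
    if not overlapping:
        return None
    # ASSUMPTION: "most conservative" = the value that deviates least from ploidy.
    return min(overlapping, key=lambda cn: abs(cn - sample_ploidy))


def log2_copy_ratio(raw_copy_number, sample_ploidy):
    """log2(cn raw / sample ploidy); 0 means the gene matches sample ploidy."""
    if sample_ploidy <= 0:
        raise ValueError("sample ploidy must be positive")
    if raw_copy_number == 0:
        return float("-inf")  # ASSUMPTION: how a complete deletion is represented
    return math.log2(raw_copy_number / sample_ploidy)
```

Positive values indicate a gain relative to the sample's overall ploidy and negative values indicate a loss.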
Tumor mutation burden (TMB) estimation:
TMB was calculated using variants that
- (i) met all quality criteria (coverage, mapping quality etc.),
- (ii) were not present in an in-house curated blacklist of false positive variants from loci that are prone to sequencing and analysis errors and/or are associated with highly polymorphic genes (i.e., MUC4, MUC5B, MUC16, MUC17, and HLA-A),
- (iii) are likely somatic mutations, and
- (iv) have a high or moderate functional impact (i.e., non-synonymous changes, frame-shifts, stop losses/gains, and splice-site acceptor/donor changes).
TMB was estimated by dividing the number of variants that met the criteria listed above by the length (in Mb) of The Jackson Laboratory Cancer Treatment Profile (CTP) targeted gene panel.
We defined high TMB as 22 mutations/Mb, which was calculated based on the TMB distribution of all PDX models analyzed as follows: Q3 (third quartile of TMB) + 1.5 x inter-quartile range of TMB.
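The calculation described above can be sketched in a few lines. This is a minimal illustration, not the code used to produce the MMHCdb values. The `Variant` fields are hypothetical names for criteria (i) to (iv), the impact labels follow SnpEff's HIGH/MODERATE/LOW/MODIFIER categories, the panel length must be supplied by the caller because it is not stated here, and the quartile method is an assumption.

```python
from dataclasses import dataclass
from statistics import quantiles


@dataclass
class Variant:
    passes_quality: bool   # criterion (i): coverage, mapping quality, etc.
    blacklisted: bool      # criterion (ii): present in the curated blacklist
    likely_somatic: bool   # criterion (iii)
    impact: str            # criterion (iv): SnpEff-style impact label


def counts_toward_tmb(v):
    """True when a variant meets all four criteria."""
    return (
        v.passes_quality
        and not v.blacklisted
        and v.likely_somatic
        and v.impact in {"HIGH", "MODERATE"}
    )


def tumor_mutation_burden(variants, panel_length_mb):
    """Qualifying variants per megabase of the targeted panel."""
    if panel_length_mb <= 0:
        raise ValueError("panel length must be positive")
    return sum(1 for v in variants if counts_toward_tmb(v)) / panel_length_mb


def high_tmb_threshold(tmb_values):
    """Q3 + 1.5 x inter-quartile range over the TMB values of all models."""
    # ASSUMPTION: inclusive quartile method; other methods shift the result slightly.
    q1, _, q3 = quantiles(tmb_values, n=4, method="inclusive")
    return q3 + 1.5 * (q3 - q1)
```

The threshold is the standard upper outlier fence of a box plot, so "high TMB" models are those whose burden is unusually large relative to the rest of the collection.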
Microsatellite Instability (MSI)
The MSIsensor2 algorithm was used to determine the MSI status of JAX samples. Samples with MSI-Percentage > 20% are considered MSI-High. This threshold showed good differentiation between MSI-High (MSI-H) and MSI-Stable (MSI-S) samples during MSIsensor2 algorithm development and in our internal benchmarking.
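The classification rule amounts to a strict threshold comparison, sketched below. The function name and the returned labels are illustrative assumptions. The input is the MSI percentage score reported by MSIsensor2.

```python
def msi_status(msi_percentage, threshold=20.0):
    """Classify a sample as MSI-H when its MSI percentage strictly exceeds the threshold."""
    return "MSI-H" if msi_percentage > threshold else "MSI-S"
```

Note that, as written above, a sample scoring exactly 20% is not considered MSI-High.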
MMHCdb is supported by grant CA89713 from the National Cancer Institute (NCI).