Sources

DoCM, the Database of Curated Mutations, is a highly curated database of known, disease-causing mutations. It provides explorable variant lists with direct links to source citations so that each entry can be verified.

DoCM currently captures variants from 876 unique publications.


How is the literature curated to produce DoCM?

Publications are identified for curation by disease experts and through review of published lists and resources that outline important cancer mutations. Mutations outlined in the literature are included in DoCM if they satisfy the criteria listed below.

Criteria for inclusion into DoCM

Clinical evidence

  • Drug targets associated with a mutation
  • Diagnostic or prognostic markers associated with a mutation

Functional evidence

  • Disease function described in cell lines
  • Disease function described in animal models
  • Extremely recurrent mutation coupled with expert opinion of the significance of the mutation

Using context clues from the citing literature, curators obtain metadata from public datasets to uniquely identify a mutation (such as genomic position and transcript), map the mutation to a disease (using the Disease Ontology), and catalogue other useful information.

The variant BRAF V600E illustrates the curation involved. Typically the literature lists only the gene and amino acid change, so extensive curation is required to uniquely identify the variant. Correct genomic coordinates on a consistent genome build need to be identified, with accompanying nucleotide information. Occasionally, multiple nucleotide changes produce the same amino acid change. A representative transcript that correctly models the mutation described in the literature also needs to be specified. Cancer subtypes are specified using the Disease Ontology nomenclature. Tags can be added to an individual variant to provide useful metadata; examples include "pathogenic", "functional mouse model", "prognostic", and "activating".
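
As a minimal sketch of why one amino acid change can correspond to several nucleotide changes, the Python below enumerates every codon that encodes the alternate residue and lists the bases that differ from the reference codon. It assumes the standard genetic code and, for illustration only, GTG as the reference valine codon at BRAF position 600. The function name and return format are hypothetical and are not part of DoCM.

```python
from itertools import product

# Standard genetic code, written in the conventional T, C, A, G ordering.
BASES = "TCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def codon_changes(ref_codon, alt_aa):
    """Map each codon encoding alt_aa to the bases that differ from ref_codon.

    Each difference is (offset_in_codon, ref_base, alt_base), offsets 0-2.
    """
    changes = {}
    for codon, aa in CODON_TABLE.items():
        if aa != alt_aa:
            continue
        changes[codon] = [
            (offset, ref, alt)
            for offset, (ref, alt) in enumerate(zip(ref_codon, codon))
            if ref != alt
        ]
    return changes


# Assumption for illustration: valine is encoded by GTG in the reference.
v_to_e = codon_changes("GTG", "E")
for alt_codon, diffs in v_to_e.items():
    print(alt_codon, diffs)
```

Under these assumptions both GAG (one base changed) and GAA (two adjacent bases changed) encode glutamate, which is why the curator has to record the specific nucleotide change a publication reports rather than the amino acid change alone.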

Curation

Variants curated directly from the literature should be grouped into batches by commonalities such as disease or mutation type. Batches can also be created from a publicly available listing of variants that is in scope, like My Cancer Genome or the Drug Gene Knowledge Database. Batches can be submitted on the batch submission page, following the batch submission instructions. Curators should annotate their curation process and explain the reasoning for including a batch in DoCM in the batch rationale statement on the submission form. This statement makes the curation process more transparent and helps DoCM users understand why variants were included.
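
The grouping step can be sketched in a few lines of Python. This is only an illustration: the record fields (`gene`, `amino_acid`, `disease`) and the function name are hypothetical and do not reflect DoCM's actual batch submission format, which is defined by the batch submission instructions.

```python
from collections import defaultdict


def group_into_batches(variants, key="disease"):
    """Group variant records into batches that share the value of `key`."""
    batches = defaultdict(list)
    for variant in variants:
        batches[variant[key]].append(variant)
    return dict(batches)


# Illustrative records only; the field names are assumptions.
curated = [
    {"gene": "BRAF", "amino_acid": "V600E", "disease": "melanoma"},
    {"gene": "KRAS", "amino_acid": "G12D", "disease": "colorectal cancer"},
    {"gene": "BRAF", "amino_acid": "V600E", "disease": "colorectal cancer"},
]

batches = group_into_batches(curated)
for disease, members in batches.items():
    print(disease, [f"{m['gene']} {m['amino_acid']}" for m in members])
```

Passing a different `key`, for example a mutation-type field, would batch the same records by that commonality instead.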

Following submission of a batch, the DoCM web app automatically annotates the variants using VEP, validates the publications' PubMed IDs using PubMed, and validates the Disease Ontology IDs using the Disease Ontology API. After annotation and validation, the variants are reviewed by the moderators listed on DoCM’s about page. DoCM moderators ensure that a submitted batch contains no errors in annotation and validation, that the batch is in scope of the resource, and that the variants appear to be referenced in the listed literature. The moderator will start a dialogue with the submitter via email to correct any errors or discrepancies and then accept or reject the variants in the batch. Accepted variants are staged for inclusion in DoCM, and once multiple batches have been submitted the moderator can create a new version of the database.
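
Before submitting, a curator might run a local format check to catch obvious identifier mistakes. The sketch below is not DoCM's validation, which queries PubMed and the Disease Ontology API. It only checks that identifiers are well formed, assuming PubMed IDs are positive integers and Disease Ontology IDs have the form `DOID:` followed by digits. The record fields and function name are hypothetical, and the identifier values are placeholders.

```python
import re

PMID_PATTERN = re.compile(r"[1-9]\d*")  # positive integer, no leading zero
DOID_PATTERN = re.compile(r"DOID:\d+")  # e.g. "DOID:1909"


def format_errors(record):
    """Return a list of human-readable format problems for one record."""
    errors = []
    for pmid in record.get("pubmed_ids", []):
        if not PMID_PATTERN.fullmatch(str(pmid)):
            errors.append(f"malformed PubMed ID: {pmid!r}")
    doid = record.get("doid", "")
    if not DOID_PATTERN.fullmatch(doid):
        errors.append(f"malformed Disease Ontology ID: {doid!r}")
    return errors


# Placeholder identifiers, used only to exercise the checks.
good = {"pubmed_ids": ["12345678"], "doid": "DOID:1909"}
bad = {"pubmed_ids": ["PMID12345678"], "doid": "1909"}

print(format_errors(good))
print(format_errors(bad))
```

A well-formed identifier can still point at the wrong paper or disease, so a check like this complements the online validation and moderator review rather than replacing them.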


Datasets that were curated and included in DoCM


Papers citing http://docm.info