Mice are the premier model organism in human biomedical and mammalian genetic research. The genomes of dozens of inbred mouse strains have been sequenced and compared with that of C57BL/6J, which by convention serves as the reference genome. By comparing this reference genome with those of 36 other sequenced mouse strains, we generated an overview of all protein-coding genes in which the reference deviates from the consensus mouse protein-coding sequence. We provide PROVEAN scores, which reflect the likelihood that these C57BL/6J proteins have lost function. We thus identified numerous abnormal proteins and biological pathways specific to C57BL/6J, which points to important caveats in the use of this reference strain and links candidate genes to some of its best-known phenotypes.
Steven Timmermans, Claude Libert
Schematic overview of the workflow: comparison of the C57BL/6J genome sequence with the newly generated consensus protein sequences, and generation of the lists of (highly) specific C57BL/6J variations.
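
The core comparison in this workflow, a consensus protein sequence built from the other strains and checked against the reference, can be illustrated with a minimal sketch. It is not the authors' pipeline: it assumes the protein sequences are already aligned and of equal length, uses a simple per-position majority rule, and runs on toy sequences invented for the example. The function names `consensus` and `reference_specific_variants` are hypothetical, and functional scoring (for example with PROVEAN) would be a separate step that is not shown.

```python
from collections import Counter


def consensus(aligned):
    """Return the majority residue at each column of pre-aligned sequences.

    Assumption: all sequences have the same length. Ties are resolved in
    favour of the residue seen first in that column.
    """
    if not aligned:
        raise ValueError("at least one sequence is required")
    length = len(aligned[0])
    if any(len(seq) != length for seq in aligned):
        raise ValueError("sequences must be aligned to the same length")
    return "".join(Counter(col).most_common(1)[0][0] for col in zip(*aligned))


def reference_specific_variants(reference, others):
    """List positions where the reference differs from the consensus of others.

    Each entry is (1-based position, consensus residue, reference residue,
    fraction of the other strains that carry the consensus residue).
    """
    cons = consensus(others)
    if len(reference) != len(cons):
        raise ValueError("reference must be aligned to the other sequences")
    variants = []
    for pos, (ref_aa, cons_aa) in enumerate(zip(reference, cons), start=1):
        if ref_aa != cons_aa:
            support = sum(1 for seq in others if seq[pos - 1] == cons_aa)
            variants.append((pos, cons_aa, ref_aa, support / len(others)))
    return variants


# Toy example (invented sequences, not real mouse proteins).
reference = "MKTLV"
other_strains = ["MKALV", "MKALV", "MKALI"]
print(reference_specific_variants(reference, other_strains))
# prints [(3, 'A', 'T', 1.0)]
```

In this sketch, the support fraction gives one possible way to separate variations found only in the reference (support of 1.0) from those that merely disagree with a majority; the thresholds behind the "(highly) specific" lists in the figure are not stated here and are not assumed.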