Gene annotations and nomenclature

Svein-Ole Mikalsen, Marni Tausen and Sunnvør í Kongsstovu have published the paper “Phylogeny of teleost connexins reveals highly inconsistent intra- and interspecies use of nomenclature and misassemblies in recent teleost chromosome assemblies” in BMC Genomics.

This work was done at the University of the Faroe Islands, where Svein-Ole Mikalsen is professor in molecular biology, Sunnvør í Kongsstovu is a PhD student, and Marni Tausen completed his BSc in biology in 2015.

A short popular summary is given below, but the complete paper (with approximately 150 pages of supplementary materials) can be found here.

Summary of the paper

When we go to the library, we are able to find what we are looking for because the library has a good archive system, which includes an overview of the book titles, the authors, when the books were printed, and so on.

Vertebrates, like humans and herring, have large and complex genomes. The genome can be considered the library of the genetic properties of the organism. Making such an overview of the genes in the genome (corresponding to the books in the library) is called annotation.

The recommendations for naming genes are made by international nomenclature committees, and they are reasonably simple. They can be summarized in three points: (i) each gene in a species should have a unique name, (ii) the same gene in different species should have the same name, and (iii) the names of the genes in the human genome should set the standard for the names.
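
For readers who like to see rules written out, the three points can be expressed as a small consistency check. This is only an illustrative sketch and not the method used in the paper. The species names, group labels and gene names are made up, and "ortholog_group" is an assumed label for "the same gene" across species. The sketch also ignores complications such as a gene being present in more than one copy in a species.

```python
from collections import defaultdict

# Hypothetical toy annotations: (species, ortholog_group, assigned_name).
# All values are invented for illustration; none come from the paper.
ANNOTATIONS = [
    ("human", "groupA", "GENE1"),
    ("human", "groupB", "GENE2"),
    ("fishX", "groupA", "GENE1"),  # agrees with the human name
    ("fishX", "groupB", "GENE1"),  # name reused within fishX, differs from human
    ("fishY", "groupB", "GENE9"),  # differs from the human name
]


def check_naming(annotations, reference="human"):
    """Return a list of (rule, description) tuples for naming violations."""
    violations = []

    # Rule (i): within one species, a name should refer to only one gene.
    groups_by_name = defaultdict(set)
    for species, group, name in annotations:
        groups_by_name[(species, name)].add(group)
    for (species, name), groups in sorted(groups_by_name.items()):
        if len(groups) > 1:
            violations.append(
                ("i", f"{species}: name {name} is used for {sorted(groups)}")
            )

    # Rules (ii) and (iii): the same gene should carry the same name in
    # every species, with the reference (human) name as the standard.
    reference_names = {
        group: name for species, group, name in annotations if species == reference
    }
    for species, group, name in annotations:
        if species == reference:
            continue
        expected = reference_names.get(group)
        if expected is not None and name != expected:
            violations.append(
                ("ii/iii", f"{species}: {group} is named {name}, expected {expected}")
            )

    return violations


if __name__ == "__main__":
    for rule, message in check_naming(ANNOTATIONS):
        print(f"rule ({rule}): {message}")
```

In this toy data set, one name is reused for two different genes in the same species, and two genes carry names that differ from the human standard.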

In this work, we checked the status of a group of genes, called connexins, in a number of fish genomes using two of the largest genetic databases in the world, GenBank and Ensembl. The fishes were eel, herring, zebrafish, cod, three-spined stickleback, Japanese pufferfish and green spotted pufferfish. We found that the naming status of the connexin genes in fishes is rather chaotic, and that the naming rules are violated in every conceivable way. We even found a human gene that was wrongly named.

Good practices for annotating and naming genes may seem like a minor detail, but comparing genomes is a very important part of understanding how the human genome works, too. If we compare the wrong genes with each other, we may end up with wrong conclusions, much like believing that we are borrowing a book by Conan Doyle when in reality it is one by Dostoyevsky, because of errors in the library's archive system.