Earlier discussions about genome
evolution in this and other blogs reminded me of a 2009 Trends in Genetics paper by Khalturin
et al. (1) on the subject of ‘orphan’ genes. Two more recent
papers on this topic, Tautz & Domazet-Loso (2) and Ranz & Parsch (3), also seem
worthy of comment here. Orphan genes
are individual genes or small gene families that are confined to
specific taxonomic groups and have no known related genes outside the group. The term is of course a misnomer, and could
be highly misleading, because unless you’re a Creationist the DNA that’s a gene
today had to be in some genome somewhere ancestrally. But in what form, and how did it arise?
An important generalization at
the core of modern molecular evolution is that evolution occurs by duplication
events. The idea is that genes have from early days been so structured that
they cannot arise just by random mutation of single nucleotides. But duplication is clearly only part of the
story: since every gene needs to be regulated, and regulatory elements are
shorter and more fluid in number and location than the genes they regulate, they
can arise more easily by mutation alone than whole genes can. Thus even in modern theory a combination of
duplication and ‘ordinary’ mutation is responsible for genome evolution. But if genes themselves (and/or their exons)
arise by duplication, that creates a family of related sequences. So how can there be orphan genes, without a
trace of relatives?
The above authors point out that, in
every taxonomic group so far studied, up to 20% or even more of its genes have
no recognizable homologues in other species.
Or, more properly, they are ‘Taxonomically Restricted Genes’ (TRGs),
sometimes forming a small gene family that is, however, found only within a specific
taxonomic clade. Given the idea that
genomes evolve by duplication, the prevalence of orphans needs some
explanation, and these authors basically provide three.
First, the authors argue that the
existing evidence suggests that the orphan genes fulfill some restricted
taxon-specific adaptive needs (e.g., specific functional cell-types in
cnidarians). If that is the case, the
relatives in collateral taxa must have lost their function, their trace erased
by mutation or deletion not opposed by selection. Khalturin et al. (1) suggest that “once a
certain evolutionary time has elapsed” sequence similarity to the ancestral
gene will be erased.
Mutational erasure is clearly possible
over time. Marshall et al. (4) tried to quantify the idea in 1994,
concluding that after about 10 million years, genes mutated into pseudogenehood
could no longer be revived by mutation (though they may retain enough sequence to still be
recognized as pseudogenes).
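As a rough illustration of how similarity fades once selection stops opposing mutation, the sketch below uses the standard Jukes-Cantor substitution model to estimate the expected nucleotide identity between two neutrally evolving copies of a sequence. It is not the calculation made by Marshall et al. (4), which concerned the revival of function rather than detectability. The function name and the default rate (0.002 substitutions per site per million years) are assumptions chosen only for illustration.

```python
import math


def expected_identity(my_since_split, subs_per_site_per_my=0.002):
    """Expected fraction of identical sites between two neutrally evolving copies.

    Jukes-Cantor model: each base changes to any other base at the same rate.
    Both lineages accumulate changes, so the divergence is 2 * rate * time.
    The default rate is an illustrative assumption, not a measured value.
    """
    d = 2.0 * subs_per_site_per_my * my_since_split  # expected substitutions per site
    return 0.25 + 0.75 * math.exp(-4.0 * d / 3.0)


# Identity starts at 1.0 and decays toward 0.25, the level expected for
# two unrelated random sequences with equal base frequencies.
for t in (10, 100, 500, 1000):
    print(f"{t:>5} My since split: expected identity {expected_identity(t):.3f}")
```

Under these assumptions identity only approaches the 25% floor of unrelated sequence over long times, so how quickly a homology search loses the trail depends heavily on the real substitution rate and on the search method.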
For this to be the case, we have
to assume that the ‘parental’ gene(s), which must have existed at the time
of the taxonomic split, had been important when the new species branched
off, but were later removed by drift or selection from the descendants of
the parental clade, while serving some strong or new adaptive function in the
new clade. Presumably the parental taxa didn’t include a large gene family
related to the orphan, because if they did, such a family would have been
serving one (or many) important functions at the time and there would likely
still be at least some of its members around today.
We know that gene families can
persist in widespread branches of life without sequence similarity easily recognized by BLAST
or other homology searches, based on work by Kazz Kawasaki (5) in my group on
unusual kinds of proteins, such as the disordered proteins involved in
biomineralization, in which the protein 3D structure is not as important for
function as its ion-binding capacity. At
least one of the genes in the SCPP gene family (Amelogenin, responsible for
capturing Ca2+ ions in forming dental enamel crystals) was considered an orphan
gene until we identified its relatives (6).
Second, the orphan-paper authors
acknowledge that BLAST searching and our genome databases are imperfect, so
that relatives of some of these orphans
may eventually be identified. However,
it seems unlikely that that will account for all of them, and there should be
some gene-age consequences both for the adaptive function (something new in the
clade, for example) and time for the ancestral genes to be erased. Some evidence cited by Tautz (2) is ambiguous
in regard to the estimated age and functions of orphans, so the picture is not
wholly clear. Is it more plausible that in
so many cases the homologies just haven’t been recognized for some reason,
including incomplete genomic data currently available?
Third, the explanation that is
most interesting is that orphan genes really are orphans in the sense of having
arisen de novo, without being copies
of functional genes. Tautz (2) and Ranz (3)
both suggest that regulatory sequences might arise near DNA that has enough
of the structure of genes (start, stop, and splice sequences, proper coding
exons, polyA addition sites) to be transcribed as well as translated, and to serve
some function that outweighed any possible toxicity a new protein might have in
the cell. Regulatory sequences usually
involve many different TF-binding elements, so they may have been put in these
places by translocation of such elements from other genes. Or, in examples cited, the new gene may be in
an exon of an existing (and hence already regulated) gene.
Tautz and Domazet-Loso (2) provide a step-by-step
scenario for de novo gene
creation. These authors recognize the
stretched plausibility of such ideas, given the seemingly minuscule probability
that functional genes, with strongly advantageous effects, could arise this
way. This is certainly a challenge to
Ohno and his case for evolution by gene duplication!
One possibility mentioned rests on
the recent discovery that much or most of DNA, including non-coding DNA, is
transcribed into RNA. The cell obviously
tolerates this RNA litter, which could make it more likely that occasionally
such an RNA has translatable properties.
Of course, one might suggest that any such RNA sequences are actually
the unrecognized fragmentary trash of long-dead genes. If de
novo creation were to happen often, most of the time selection would
perhaps remove it. But over millions of
years maybe it happens enough.
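To put a rough number on one ingredient of ‘translatable properties’, the sketch below computes the chance that a run of random codons contains no stop codon. It assumes equal base frequencies and independent codons under the standard genetic code (3 stop codons out of 64), and it says nothing about promoters, start codons, splicing, or whether the resulting peptide does anything useful. The function name is hypothetical.

```python
def prob_no_stop(n_codons, stop_codons=3, total_codons=64):
    """Chance that n_codons random codons in a row contain no stop codon.

    Assumes every codon is equally likely and codons are independent,
    which real genomes only approximate.
    """
    return ((total_codons - stop_codons) / total_codons) ** n_codons


# Short open stretches are common in random sequence; long ones are rare.
for n in (10, 50, 100, 300):
    print(f"{n:>4} codons without a stop: {prob_no_stop(n):.2e}")
```

On these assumptions a 100-codon open stretch turns up by chance a little less than once in a hundred tries, which is rare for any one site but not for a genome-wide supply of transcribed sequence over millions of years.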
Could it be that the history of
discovery has misled us to become Ohno-ized?
We discovered interrupted genes, which led to theories of gene origin by
exon duplication (and many genes have repeat exons with high
duplication/deletion properties). Then
we discovered gene families, and this led to the obvious conclusion that
duplication was ‘the’ mode of genome evolution.
We excepted enhancers, which can easily come and go by normal
point mutation. But this led to the
discovery of the generality of gene families and focused attention on them, and
the networks in which they participate, and the related but diverging functions
they fill.
Could it be that instead, new
genes often really do arise by de novo
mechanisms, and disappear by deletion before they generate large gene
families? If they are old enough, they
and their paralogs would be less taxonomically restricted than if they are
recent. After all, if you had a
phylogenetic dart board and randomly threw darts, most would hit on some
branch, not at the very top: their descendants would be ‘taxonomically
restricted’. So it’s the lack of
collateral relatives rather than the taxonomic restriction that seems most
curious to me.
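The dart board point can be made concrete with a toy tree. The sketch below assumes a perfectly balanced binary tree with equal-length branches, so that a dart is equally likely to land on any branch, and tabulates how many present-day species sit below the branch that is hit. Real phylogenies are neither balanced nor equal-branched, and the function name is hypothetical.

```python
from fractions import Fraction


def clade_size_distribution(depth):
    """Map clade size (number of tip species) to the share of branches above it.

    Assumes a balanced binary tree with 2**depth tips and equal-length
    branches, so each branch is an equally likely target for a dart.
    """
    total_branches = 2 ** (depth + 1) - 2  # every node except the root has a branch
    dist = {}
    for level in range(1, depth + 1):  # level 1 is just below the root
        branches_here = 2 ** level
        tips_below = 2 ** (depth - level)
        dist[tips_below] = Fraction(branches_here, total_branches)
    return dist


dist = clade_size_distribution(10)  # 1024 tip species
for tips in sorted(dist):
    print(f"clade of {tips:>4} species: {float(dist[tips]):.4f} of darts")
```

In this toy tree about half of all darts land on terminal branches, and only two branches out of 2046 sit directly below the root, so a gene born at a random point would almost always look taxonomically restricted.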
Other authors have suggested that
human orphan genes are often expressed in the brain, but that seems to me to be
yet another kind of forced human exceptionalism, because most genes old or new
are expressed in the brain. Likewise,
Khalturin et al. (1) propose that taxon-specific genes “drive morphological
specification,” as part of “rewiring” of the networks of regulatory genes. But isn’t this always the case? Except in some special circumstances, traits
are usually affected by many interacting genes.
They seem to evolve by gradually diverging functions emerging from
selection acting (again gradually) on the diversity made possible by the
redundancy generated by gene duplication.
It seems rather unlikely that a newcomer of basically random structure
could participate in such a network (or be properly expressed in a relevant
tissue context) to experience strong positive selection.
Orphan genes may simply be
lucky genes in complex systems that happened to survive for us to observe them: different
contributing genes, for different reasons including drift, surviving in
different taxa. The 20% of genes
that are orphans may just be the normal passengers on the train of duplication
and deletion.
If de novo evolution is common, or more common than we thought
relative to gene duplication, we may have to revisit the strong evidence for
gene evolution by exon duplication and the proliferation of
ancient gene families. Have we missed
something?
This could be a startling
realization. I’m sure many Mol. Evol. readers will know more about this than I do, and I’d like to
see what you think.
References
1. Khalturin, K, Hemmrich, G, Fraune, S, Augustin, R, and Bosch, TCG. More than just orphans: are taxonomically-restricted genes important in evolution? Trends in Genetics 25(9): 404-413, 2009.
2. Tautz, D, and Domazet-Loso, T. The evolutionary origin of orphan genes. Nature Reviews Genetics 12(10): 692-702, 2011.
3. Ranz, J, and Parsch, J. Newly evolved genes: Moving from comparative genomics to functional studies in model systems. BioEssays 34: 477-83, 2012.
4. Marshall, CR, Raff, EC, and Raff, RA. Dollo’s law and the death and resurrection of genes. PNAS 91(25): 12283-7, 1994.
5. Kawasaki, K, Buchanan, AV, and Weiss, KM. Biomineralization in humans: making the hard choices in life. Ann. Rev. Genet. 43: 119-142, 2009.
6. Kawasaki, K, and Weiss, K. Mineralized tissue and vertebrate evolution: the secretory calcium-binding phosphoprotein gene cluster. PNAS 100: 4060-65, 2003.