Introduction


The purpose of this forum is to introduce notable papers and books published by you and others. The work can be new or old, but it should be of wide interest and high quality. A brief comment on the significance of the work should be attached. The initial subject categories are (1) protein evolution, (2) gene evolution, (3) genomic evolution, (4) adaptation, (5) symbiosis and evolution, (6) sex determination, (7) speciation, (8) phenotypic evolution, (9) behavioral evolution, (10) molecular phylogeny, (11) human evolution, (12) animal evolution, (13) plant evolution, and (14) microbial evolution. Emphasis will be placed on biological rather than mathematical work. Any person may post a paper by sending it to one of the editors listed below. We also welcome your comments on posted work, but we moderate all comments to control spam. This forum is primarily for scientific discussion and for building a database of good molecular evolution papers.


Tuesday, April 9, 2013

Rapid Evolution of Resistance to Toxic Chemicals in Atlantic Tomcod

Contributed by: Mary Morales, Rice University


The melanic form of the peppered moth Biston betularia in England has been widely used as an example of natural selection in action. Before the industrial revolution in England around 1850, the melanic form was rarely observed in nature and most peppered moths were the light-colored wild type, but by the end of the 19th century the melanic form had become the dominant phenotype as soot pollution darkened the environment. Even though the English peppered moth is a great example of natural selection caused by pollution, the molecular basis of this evolution has not yet been elucidated. Furthermore, the population is not isolated because of constant migration, which makes it difficult to infer accurately the effect of environmental change on the survival of natural moth populations.
As in the case of the peppered moth, the Atlantic tomcod Microgadus tomcod of the Hudson River experienced a rapid evolutionary change caused by pollution. Here, water pollution from the improper disposal of toxic chemicals was the agent of natural selection: the fish had to adapt to an environment heavily polluted by polychlorinated biphenyls (PCBs). Recently, the molecular basis of the evolutionary change was clarified, making the Atlantic tomcod the first well-understood case of evolution driven by environmental change (1).

Figure 1. Atlantic tomcod from the Hudson River


Over a million pounds of PCBs were dumped into the Hudson River by General Electric between 1947 and 1976, before the dumping was banned (1). Because PCBs settle into sediment and resist degradation, they are long-lasting, and their influence on the Hudson River was significant and remains so today (2). In exposed animals they cause serious health problems such as cancer by stimulating the aryl hydrocarbon receptor (AHR), an important transcription factor (2). Two genes, AHR1 and AHR2, encode AHR proteins (1). Although both are expressed in fish, AHR2 is the more active protein (1).


Figure 2. Frequencies of AHR2-1 and AHR2-2 alleles in tomcod populations in and near the Hudson River. n indicates the number of specimens analyzed in a given location. From Wirgin et al. (1).
 

Wirgin et al. (1) studied the Atlantic tomcod populations of the Hudson River and nearby estuaries and found two alleles, AHR2-1 and AHR2-2, at the AHR2 locus. The two variants are distinguished by a mutation with a functional effect: AHR2-1 carries a six-base-pair deletion that makes the protein two amino acids (Phe-Leu) shorter (1).
The AHR2-1 protein has a significantly lower affinity for toxins, so fish expressing it do not turn on genes that should not be activated (1). Water contamination by PCBs created strong selection in favor of the AHR2-1 allele, because it produces a PCB-resistant phenotype that enables the fish to survive and reproduce in a toxic environment (3). The frequency of the AHR2-1 allele in the Hudson River population is extremely high (99%), with the non-mutated allele (AHR2-2) observed only in heterozygotes (1). The frequency of AHR2-1 was also examined in Atlantic tomcod populations from 7 other Atlantic Coast estuaries. As shown in Figure 2, the frequency of AHR2-1 is significantly lower in populations located farther from the Hudson River. Indeed, the AHR2-1 allele is absent from the four populations in distant locations.
The different distributions of the AHR2-1 and AHR2-2 alleles suggest that the wild-type allele AHR2-2 predominates in populations from areas not polluted by PCBs. In the Hudson River, the frequency of the PCB-resistant allele AHR2-1 has apparently increased rapidly from a very low level to 99% in about 60 years.
In addition, the low frequency of AHR2-1 in other populations suggests that the Atlantic tomcod population of the Hudson River is relatively isolated, unlike the English peppered moth. Thus the increase in frequency of the AHR2-1 allele was not hindered by factors such as migration. The case of the Atlantic tomcod is noteworthy because, for the first time, the selection coefficient could be estimated, provided that the initial allele frequency can be obtained in some way. Studies that aim to calculate the selection coefficient in favor of the AHR2-1 allele in the Atlantic tomcod population of the Hudson River are highly encouraged.
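To make the idea of estimating the selection coefficient concrete, here is a minimal sketch in Python. It assumes the simplest possible model, genic (haploid-like) selection, in which the odds p/(1-p) of the favored allele are multiplied by (1+s) every generation. The initial frequency, the generation time, and the function names below are all hypothetical placeholders chosen for illustration; none of them comes from Wirgin et al. (1), and a real analysis would need a diploid model, dominance, and an empirically supported starting frequency.

```python
import math


def selection_coefficient(p0, pt, generations):
    """Per-generation selection coefficient s under genic selection.

    Model assumption: the allele odds p/(1-p) grow by a factor (1+s)
    each generation, so logit(pt) = logit(p0) + generations * ln(1+s).
    """
    if not (0.0 < p0 < 1.0 and 0.0 < pt < 1.0):
        raise ValueError("allele frequencies must lie strictly between 0 and 1")
    if generations <= 0:
        raise ValueError("generations must be positive")
    logit0 = math.log(p0 / (1.0 - p0))
    logit_t = math.log(pt / (1.0 - pt))
    return math.exp((logit_t - logit0) / generations) - 1.0


def frequency_after(p0, s, generations):
    """Allele frequency after a number of generations under the same model."""
    odds = p0 / (1.0 - p0) * (1.0 + s) ** generations
    return odds / (1.0 + odds)


if __name__ == "__main__":
    # Hypothetical inputs: a 1% starting frequency and a 2-year generation
    # time are placeholders, not measured values.
    years = 60
    generation_time_years = 2
    generations = years // generation_time_years
    s = selection_coefficient(p0=0.01, pt=0.99, generations=generations)
    print(f"s per generation under these assumptions: {s:.3f}")
```

The estimate is very sensitive to the assumed starting frequency and generation time, which is exactly why the initial frequency needs to be obtained in some way before a credible value can be reported.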

 
References
 
1.  Wirgin, I., N.K. Roy, M. Loftus, R.C. Chambers, D.G. Franks, M.E. Hahn. 2011. Mechanistic Basis of Resistance to PCBs in Atlantic Tomcod from the Hudson River. Science 331: 1322-1324
2.  “Toxic River means rapid evolution for one fish species.” 2011. Understanding Evolution. Web.  <http://evolution.berkeley.edu/evolibrary/news/110301_pcbresistantcod>.
3.  Roy, N.K., S.C. Courtenay, R.C. Chambers, I.I. Wirgin. 2006. Characterization of the aryl hydrocarbon receptor repressor and a comparison of its expression in Atlantic tomcod from resistant and sensitive populations. Environmental Toxicology and Chemistry 25: 560-571.


Wednesday, February 27, 2013

ENCODE - All about "Functions"

Contributed by: Jianzhi Zhang

In 2012, the 288-million-dollar ENCyclopedia Of DNA Elements (ENCODE) project concluded in its 442-author Nature article that 80% of the human genome is functional. The completion of this massive study was regarded by both Nature and Science as one of the most important scientific accomplishments of the year. But are these 442 scientists correct?
Dan Graur and five colleagues have now analyzed the ENCODE paper. Their conclusion is shocking, to say the least. They found that the ENCODE conclusion was erroneous and was reached mainly “(1) by employing the seldom used “causal role” definition of biological function and then applying it inconsistently to different biochemical properties, (2) by committing a logical fallacy known as “affirming the consequent,” (3) by failing to appreciate the crucial difference between “junk DNA” and “garbage DNA,” (4) by using analytical methods that yield biased errors and inflate estimates of functionality, (5) by favoring statistical sensitivity over specificity, and (6) by emphasizing statistical significance rather than the magnitude of the effect”. The number and seriousness of the errors and inconsistencies they identified in the ENCODE paper are horrifying. One cannot help but wonder where the reviewers of the ENCODE project and paper were, and what the 288 million dollars could have accomplished if used to fund 200 NIH R01 grants.
I recommend Graur et al.’s paper to all biologists, evolutionary or not, because it deals with some of the most fundamental concepts in biology with unusual clarity and wit. It also prompts one to ponder the pros and cons of big science vs. small science. Last but not least, if you are intelligent, you will enjoy reading this 43-page PDF.

References 
The ENCODE Project Consortium. 2012. An integrated encyclopedia of DNA elements in the human genome. Nature 489: 57-74. 
Graur, D., Zheng, Y., Price, N., Azevedo, R. B. R., Zufall, R. A., and Elhaik, E. 2013. On the immortality of television sets: “function” in the human genome according to the evolution-free gospel of ENCODE. Genome Biol. Evol., published February 20, 2013, doi:10.1093/gbe/evt028

Wednesday, December 12, 2012

Ortholog Conjecture Debated


Contributed by: Jianzhi Zhang
 
Most molecular biologists would agree that a gene tends to be more similar in function to its orthologs than to its paralogs.  This fundamental tenet, recently termed the ortholog conjecture, is a cornerstone of phylogenomics and is used by both computational and experimental biologists in predicting, interpreting, and understanding gene functions.  But is this conjecture wishful thinking or empirically founded?
 
Orthologous genes arise via speciation, whereas paralogous genes are generated by gene duplication.
Orthologs: A1 and A2; B1 and B2.
Within-species paralogs: A1 and B1; A2 and B2.
Between-species paralogs: A1 and B2; A2 and B1.

In a pioneering study, Nehrt et al. (3) attempted to test the ortholog conjecture using Gene Ontology (GO) annotations that were based on experimental data.  Contrary to everyone’s expectation, they found that the functional similarity between orthologs is lower than that between paralogs when the level of sequence divergence is controlled.  Based on this and other findings, the authors proposed that protein function evolution is primarily determined by “the cellular context in which proteins act”.  This would explain why within-species paralogs, which are always in the same organism, were found to be functionally more similar than orthologs, which by definition reside in different organisms.

Nehrt et al.’s (3) finding stirred considerable controversy in cyberspace when it was published in the summer of 2011, as evidenced by numerous discussions in various blogs.  The last 10 months have seen three papers that challenged Nehrt et al.’s conclusion from different angles, although the three papers do not completely agree with one another either.

First, Thomas and colleagues, representing the group that annotated GO, claimed that GO annotation differences between homologous genes “do not reflect differences in biological function, but rather complementarity in experimental approaches” (4).  That is, gene function data are so sparse at present that GO annotations reflect ascertainment biases in experiments rather than true functional differences.

Second, Altenhoff et al. (1) identified a number of biases in GO.  After correcting these biases, they found weak but significant evidence for the ortholog conjecture.

Most recently, Chen and Zhang (2) reanalyzed GO annotations and confirmed some of the biases identified by Altenhoff and colleagues.  Most disturbing, however, was the finding of many errors in GO annotation.  Even in so-called experiment-based annotations, across-species functional inferences were frequently made.  For example, an experiment was conducted on a monkey gene, but the function was annotated in GO for its human ortholog, based ironically on the ortholog conjecture.

In one part of their study, Chen and Zhang (2) focused on pairs of orthologs or paralogs that have identical protein sequences and were studied in the same papers.  Surprisingly, while all nine such paralogous pairs have 100% GO-based functional similarity, only nine of 31 such orthologous pairs have 100% functional similarity.  At the extreme, eight of the 31 orthologous pairs show 0% functional similarity, yet none of the papers that studied them explicitly mentioned their functional dissimilarity.  Apparently, these differences reflect ascertainment biases rather than true functional differences.  The authors also noted an upward trend in the functional similarity of orthologs, relative to that of paralogs, when analyzing the time series data of GO in the last five years.
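To get a feel for how lopsided the counts quoted above are (9 of 9 paralogous pairs versus 9 of 31 orthologous pairs with 100% functional similarity), one can run a one-sided Fisher's exact test on the 2x2 table. The sketch below is only an illustration of the arithmetic using the Python standard library; it is not an analysis reported by Chen and Zhang (2), the function name is made up for this post, and a small p-value says nothing about whether the difference is biological or an annotation artifact.

```python
from math import comb


def hypergeom_upper_tail(k, group_size, successes, total):
    """P(X >= k) for a hypergeometric variable X.

    X counts how many of the `successes` items in a pool of `total`
    items fall into a group of `group_size` drawn without replacement.
    This is the one-sided Fisher's exact test p-value for a 2x2 table.
    """
    upper = min(group_size, successes)
    denominator = comb(total, group_size)
    numerator = sum(
        comb(successes, i) * comb(total - successes, group_size - i)
        for i in range(k, upper + 1)
    )
    return numerator / denominator


if __name__ == "__main__":
    paralog_pairs, paralog_full = 9, 9      # counts quoted in the text
    ortholog_pairs, ortholog_full = 31, 9   # counts quoted in the text
    total_pairs = paralog_pairs + ortholog_pairs
    total_full = paralog_full + ortholog_full
    p = hypergeom_upper_tail(paralog_full, paralog_pairs, total_full, total_pairs)
    print(f"one-sided Fisher exact p-value: {p:.2e}")
```

With so few pairs, the test mainly shows that the imbalance is unlikely to be sampling noise, which is consistent with the argument that something systematic, such as ascertainment bias, is at work.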

These and other findings led Chen and Zhang (2) to conclude that the current GO is unsuitable for testing the ortholog conjecture.  They thus turned to RNA-Seq gene expression data, which should be relatively immune to ascertainment bias and annotation error.  They reported that orthologs are more similar to each other than to paralogs in gene expression.  But, regarding gene function, the jury is still out.  The sheer difficulty of proving or rejecting the ortholog conjecture, one of the most widely assumed principles of molecular evolution, was completely unexpected, and it still amazes me to this day.



References

1. Altenhoff AM, Studer RA, Robinson-Rechavi M, Dessimoz C (2012) Resolving the Ortholog Conjecture: Orthologs Tend to Be Weakly, but Significantly, More Similar in Function than Paralogs. PLoS Comput Biol 8(5): e1002514.


3. Nehrt NL, Clark WT, Radivojac P, Hahn MW (2011) Testing the Ortholog Conjecture with Comparative Functional Genomic Data from Mammals. PLoS Comput Biol 7(6): e1002073.

4. Thomas PD, Wood V, Mungall CJ, Lewis SE, Blake JA, et al. (2012) On the Use of Gene Ontology Annotations to Assess Functional Similarity among Orthologs and Paralogs: A Short Report. PLoS Comput Biol 8(2): e1002386.
 

Tuesday, December 4, 2012

Speciation Driven by Divergence in Heterochromatic Repeats



Contributed by: Zhenguo Lin

Darwin used the title "On the origin of species" for his most famous book, published in 1859. In this book he explained how a single species changes over time, but did not provide a proper explanation of how a species splits into two or more different species. The problem of speciation has now become an important subject in evolutionary biology. From Hugo de Vries, Theodosius Dobzhansky, and Ernst Mayr to contemporary workers such as Jerry Coyne and Allen Orr, this problem has been studied extensively.

It now seems crucial to study speciation at the molecular level. In their recent review article, Nei and Nozawa (1) emphasized the importance of mutations in speciation by presenting many molecular case studies.  One of the mechanisms they considered is hybrid incapacity associated with heterochromatin. Specifically, they stated that hybrid sterility or inviability may arise from changes in repeat DNA elements in heterochromatic regions of the genome. Two representative examples were presented in the review article. (1) Differences in the number of 359-bp repeats (zygotic hybrid rescue locus, Zhr) cause hybrid inviability between Drosophila melanogaster males and D. simulans females.  (2) The localization of the Odysseus homeobox (OdsH) protein to the heterochromatic Y chromosome causes hybrid male sterility between D. mauritiana females and D. simulans males.  More recently, in a comparative study of the genomic sequences of two closely related flycatcher species, Ellegren et al. (2) suggested that the divergence of complex genomic repeat structures (centromeres and telomeres) may have generated the two species.


Figure 1 a, Male collared flycatcher. b, Male pied flycatcher. (From Ellegren et al. (2)).

The collared flycatcher Ficedula albicollis and the pied flycatcher Ficedula hypoleuca diverged less than 2 million years ago.  They look very similar except for the presence of a white collar in the former species (Figure 1). The authors, from Uppsala University in Sweden, sequenced ~1.1 Gb of genomic regions from 10 unrelated males of each species. By comparing these genomic regions, the authors identified 50 "divergence islands", which show significantly elevated levels of sequence divergence between the two species. The length of an "island" ranges from 100 kb to 3 Mb, with a mean of 625 kb. Interestingly, these "islands" are over-represented in telomeric and centromeric regions, which are rich in repeat structures (Figure 2).  After detailed analyses of various evolutionary patterns in these "divergence islands", such as local mutation rates, levels of nucleotide diversity, allele-frequency spectra, levels of linkage disequilibrium, and shared polymorphisms, the authors concluded that these islands have experienced parallel selection in each species. Although no direct evidence was provided for how these "divergence islands" contributed to speciation, the authors believed that these observations "raise the possibility that centromeres or other heterochromatic repeats themselves are the driver of speciation" (2).

  

Figure 2. Distribution of divergence measured as the density of fixed differences per bp for 200-kb windows across the genome. Chromosomes are listed in numerical order and are separated by gaps. Red horizontal bars show the approximate location of centromeres in homologous chromosomes of zebra finch. Open red symbols are used to indicate that avian microchromosomes are generally acro- or telocentric. Both ends of these chromosomes are labeled as the orientation is not known. For chromosomes 4, 6 and 8, there is a lack of an in situ mapped marker 5′ of the centromere in zebra finch. (From Ellegren et al. (2)).


References
1. Nei, M. and Nozawa, M. (2011), 'Roles of mutation and selection in speciation: from Hugo de Vries to the modern genomic era', Genome Biol Evol, 3, 812-29.
2. Ellegren, H., et al. (2012), 'The genomic landscape of species divergence in Ficedula flycatchers', Nature. doi:10.1038/nature11584