Wednesday, November 05, 2008

Evolution by Gene Duplication

Chymotrypsin (Monday's Molecule #95), trypsin, and elastase are enzymes that digest proteins in the intestine. All three enzymes have a similar mechanism of hydrolysis characterized by the presence of a catalytic triad of amino acid side chains consisting of aspartate, histidine, and serine residues. The serine side chain is directly involved in catalyzing the cleavage of proteins and that's why these enzymes are called serine proteases.

The three enzymes differ in specificity. Chymotrypsin cleaves foreign proteins primarily at tyrosine (Tyr) residues, trypsin is specific for cleavage at arginine (Arg) or lysine (Lys) residues, and elastase cleaves at alanine (Ala) residues.
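
To make the difference in specificity concrete, here's a rough sketch in Python. It assumes the simplified rules given above (Tyr for chymotrypsin, Arg/Lys for trypsin, Ala for elastase) and cuts on the carboxyl side of each matching residue. The `SPECIFICITY` table, the `cleave` function, and the test peptide are all made up for illustration; real enzymes are less strict and are also affected by neighboring residues, which this ignores.

```python
# Simplified specificity table based on the text; real enzymes are less strict.
SPECIFICITY = {
    "chymotrypsin": {"Y"},      # tyrosine
    "trypsin": {"R", "K"},      # arginine, lysine
    "elastase": {"A"},          # alanine
}

def cleave(sequence, enzyme):
    """Split a one-letter peptide sequence after each residue the enzyme recognizes."""
    targets = SPECIFICITY[enzyme]
    fragments = []
    start = 0
    for i, residue in enumerate(sequence):
        # Cut after a target residue, unless it is the last residue of the chain.
        if residue in targets and i < len(sequence) - 1:
            fragments.append(sequence[start:i + 1])
            start = i + 1
    fragments.append(sequence[start:])
    return fragments

peptide = "GAKYLRSAYG"  # hypothetical sequence, not a real protein
for enzyme in SPECIFICITY:
    print(enzyme, cleave(peptide, enzyme))
```

Each enzyme chops the same peptide into a different set of fragments, which is why a mixture of the three breaks proteins into smaller pieces than any one of them alone.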

The genes for the three enzymes are homologous and the structures of the three enzymes are very similar as shown below (left: chymotrypsin [PDB 5CHA], middle: trypsin [PDB 1TLD], right: elastase [PDB 3EST]).

The active sites of the enzymes are slightly different so that specificity depends on which amino acid side chains of the substrate protein fit into the binding pocket.

It's reasonable to suppose that the primitive enzyme could bind weakly to many different substrates and cleave many different kinds of proteins inefficiently. An ancient gene duplication allowed one copy of the gene to evolve toward a much more active enzyme that cleaved only at certain residues. A second gene duplication gave rise to a third gene, whose enzyme evolved to cleave at another residue. Finally, the remaining gene evolved into a very active enzyme that cut at a third kind of residue.

The end result was a set of three enzymes that could cut up any protein into small peptides that can be taken up by the cells lining the intestine. The original non-specific enzyme was slower and less efficient.

This is an example of evolution by gene duplication and the important point is that the ancestral gene probably encoded a non-specific enzyme that could carry out several different reactions with different substrates. It's not a question of the duplicated copy evolving an entirely new specificity. Instead, the duplicated gene usually "perfects" an already existing minor activity by becoming more specific. Meanwhile, the other copy can also be selected for enhanced specificity for another substrate.

This model also explains the evolution of lactate dehydrogenase and malate dehydrogenase (Evolution and Variation in Folded Proteins) and the pyruvate dehydrogenase family (Pyruvate Dehydrogenase Evolution).


  1. I'd call the genes for chymotrypsin, trypsin and elastase in the same species paralogs, rather than homologs.

    Since we're speculating, I think it more likely that the ancestral enzyme (arbitrarily trypsin, for example) before gene duplication was extraordinarily efficient at one particular reaction only, equivalent to modern-day trypsin. The subsequent gene duplication allowed free divergence of the extra gene and rapid evolution under selection when a different but useful specificity arose by chance. Extra copies that didn't enjoy positive selection became pseudogenes. The original trypsin gene just kept on being trypsin. If anything, now that chymotrypsin and elastase are around, there is less selective pressure on trypsin than when trypsin was the whole show, so modern-day trypsin may be less efficient than the ancestral precursor enzyme.

  2. anonymous says,

    "I'd call the genes for chymotrypsin, trypsin and elastase in the same species paralogs, rather than homologs."

    You are correct to refer to them as paralogous genes and not orthologs. They are also homologous.

    All members of a gene family in all species are homologous because they descend from a common ancestor. The terms paralogous and orthologous are subsets of homologous.