Sunday, March 18, 2018

What is "dark DNA"?

Some DNA sequencing technologies aren't very good at sequencing and assembling DNA that's rich in GC base pairs. What this means is that some sequenced genomes could be missing stretches of GC-rich DNA if they rely exclusively on those techniques. This difficult-to-sequence DNA was called "dark DNA" in a paper published last summer (July 2017).
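To make "GC-rich" concrete, here is a minimal Python sketch that measures the fraction of G and C bases in fixed-size windows along a sequence. The function names, the window size, and the toy sequence are all made up for illustration; none of this comes from the paper, and real analyses would use much larger windows on real assemblies.

```python
def gc_fraction(seq):
    """Return the fraction of G and C bases among the A/C/G/T bases in seq."""
    seq = seq.upper()
    gc = seq.count("G") + seq.count("C")
    at = seq.count("A") + seq.count("T")
    total = gc + at
    # Ambiguous bases such as N are ignored; an empty window scores 0.0.
    return gc / total if total else 0.0


def windowed_gc(seq, window=10, step=10):
    """Return a list of (start, gc_fraction) pairs, one per full window."""
    return [
        (start, gc_fraction(seq[start:start + window]))
        for start in range(0, len(seq) - window + 1, step)
    ]


# Toy sequence (hypothetical): an AT-rich stretch followed by a GC-rich one.
toy = "ATATTAATTA" + "GCGGCCGCGC"
profile = windowed_gc(toy, window=10, step=10)
```

In a profile like this, a region that is hard to sequence with short reads would show up as a run of windows with a GC fraction well above the genome average, provided it made it into the assembly at all.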

The paper looked at some missing genes in the genome of the sand rat Psammomys obesus. The authors initially used a standard shotgun strategy to sequence the sand rat genome. They combined millions of short reads (<200 bp) to assemble a complete genome. A large block of genes seemed to be missing: genes that are conserved and present in the genomes of related species (Hargreaves et al., 2017). They knew the genes were present because they could detect the mRNAs corresponding to those genes.

Hargreaves et al. isolated GC-rich DNA and sequenced it using a different technique. This revealed the missing DNA and the missing genes. As expected, the entire block of DNA, containing 88 genes, had a high percentage of GC base pairs relative to AT base pairs. The authors attribute this to insertion of GC-rich repeats and to a phenomenon known as "gene conversion."

Gene conversion, or more appropriately, biased gene conversion, is associated with recombination. Recombination results in stretches of DNA containing mismatched base pairs such as A:C or G:T. The mismatches must be repaired to restore the normal GC and AT base pairs. There's plenty of evidence showing a bias in the repair process such that the final product favors GC pairs over AT pairs. This is biased gene conversion.1

Biased gene conversion leads to a gradual increase in GC content in regions of the genome that are hotspots for recombination. This is a well-understood and reasonable explanation of the GC-rich region in the sand rat genome.
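The gradual increase can be illustrated with a toy model. The sketch below is my own simplification, not anything from the paper: it follows the frequency of a GC allele at a single site in a randomly mating population, assuming that GC/AT heterozygotes pass on the GC allele with probability transmit_gc. A value of 0.5 means unbiased repair; anything above 0.5 means the repair favors GC. The starting frequency (0.1), the bias (0.51), and the number of generations are arbitrary illustrative values.

```python
def next_gc_allele_freq(p, transmit_gc=0.5):
    """Advance the GC allele frequency p by one generation.

    GC/GC parents (frequency p*p) always transmit GC. GC/AT heterozygotes
    (frequency 2*p*(1-p)) transmit GC with probability transmit_gc, which
    is an assumed parameter standing in for the repair bias.
    """
    return p * p + 2 * p * (1 - p) * transmit_gc


def trajectory(p0, transmit_gc, generations):
    """Return the list of GC allele frequencies from generation 0 onward."""
    freqs = [p0]
    for _ in range(generations):
        freqs.append(next_gc_allele_freq(freqs[-1], transmit_gc))
    return freqs


# Hypothetical values, chosen only to show the qualitative behavior.
unbiased = trajectory(0.1, 0.5, 1000)   # frequency stays put
biased = trajectory(0.1, 0.51, 1000)    # frequency climbs toward 1
```

With no bias the frequency does not change, so GC content stays where it is. Even a slight bias pushes the GC allele toward fixation over many generations, and the effect is strongest where recombination, and therefore mismatch repair, happens most often. The model ignores mutation, drift, and selection, so it shows only the direction of the effect, not its real size.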

Biased gene conversion and GC-rich regions are not new. What's new in the paper is the idea that large regions of the genome may be missing from a genome assembly because of limitations of standard sequencing technology. This is "dark DNA."

A species related to the sand rat also has a high GC content in the same region, suggesting that the shift to high GC content occurred before their last common ancestor. GC-rich genes are also missing from the chicken genome assembly, suggesting that dark DNA may be more common than anyone suspected.

If that were all there was to the story, we probably wouldn't have heard about it. However, the lead author of the paper, Adam Hargreaves, is mainly interested in how changes in the genome can lead to innovation and adaptation. He wrote an article last summer for The Conversation in which he emphasized the possible role of dark DNA in evolution [Introducing ‘dark DNA’ – the phenomenon that could change how we think about evolution].

He said,
Most textbook definitions of evolution state that it occurs in two stages: mutation followed by natural selection. DNA mutation is a common and continuous process, and occurs completely at random. Natural selection then acts to determine whether mutations are kept and passed on or not, usually depending on whether they result in higher reproductive success. In short, mutation creates the variation in an organism’s DNA, natural selection decides whether it stays or if it goes, and so biases the direction of evolution.

But hotspots of high mutation within a genome mean genes in certain locations have a higher chance of mutating than others. This means that such hotspots could be an underappreciated mechanism that could also bias the direction of evolution, meaning natural selection may not be the sole driving force.

So far, dark DNA seems to be present in two very diverse and distinct types of animal. But it’s still not clear how widespread it could be. Could all animal genomes contain dark DNA and, if not, what makes gerbils and birds so unique? The most exciting puzzle to solve will be working out what effect dark DNA has had on animal evolution.
This is pure speculation. There's only a hint of this idea in the original paper. It's true that hotspots of mutation are going to show more variation than other regions of the genome. That's just common sense. The question that's important is whether there's some underlying selection for hotspots in order to shift the species in a certain direction. The other, more likely, possibility is that the formation of hotspots is fortuitous and evolution just has to cope with the problem.

The role of hotspots in evolution is an interesting question but Hargreaves seems to be capitalizing on the sexy term "dark DNA" when, in fact, he's just speculating that hotspots of recombination may play a role in evolution. The hotspot regions may or may not be "dark DNA" depending on how you sequence a genome. If you use the right sequencing methods then the DNA won't be "dark" at all.

Hargreaves followed up on his popularity by publishing an article in a recent issue of New Scientist (March 10-16, 2018). The title is Dark DNA: The missing matter at the heart of nature.2 It made the cover of the magazine.

As you can see, the title conveys the idea that there's a connection between "dark DNA" and "dark matter." The former, like the latter, is supposed to be some mysterious stuff that scientists can't explain. But, as I pointed out above, we have a perfectly good explanation of "dark DNA"—it's GC-rich DNA that can't easily be sequenced using some sequencing technologies.

Here's how Hargreaves hypes his work in the New Scientist article ....
The discovery of dark DNA is so recent that we are still trying to work out how widespread it is and whether it benefits those species that possess it. However, its very existence raises some fundamental questions about genetics and evolution. We may need to look again at how adaptation occurs at the molecular level. Controversially, dark DNA might even be a driving force of evolution.
It's true that we don't know whether extensive GC-rich regions are rare or common. The evidence so far suggests they are not common, judging by the quality of the genomes that have been sequenced. Thus, I believe that Hargreaves is misleading readers on this point.

In the absence of evidence, we assume that the GC-rich regions do not benefit the species. This is the default assumption. I think it's misleading to speculate that hotspots benefit the species.

In my opinion, the existence of large blocks of GC-rich regions of the genome does not raise fundamental questions about genetics and evolution. Hargreaves is wrong about that.

On the other hand, these articles do raise fundamental questions about the quality of science journalism and how we communicate with the public. Is it acceptable to hype your own work to make it seem far more important than it actually is? Do we, as scientists, have a responsibility to speak out against this behavior?

DNA Image Credit: Moran, L.A., Horton, H.R., Scrimgeour, K.G., and Perry, M.D. (2012) Principles of Biochemistry 5th ed., Pearson Education Inc. page 581 © Pearson/Prentice Hall

1. A mismatched A:C base pair can be converted by removing and substituting either base. If gene conversion were unbiased then the repair would yield A:T and G:C pairs at the same frequency. Instead, there is a higher probability of generating the G:C product.

2. This is the title of the online version of the article. The print version title is The hunt for dark DNA.

Hargreaves, A.D., Zhou, L., Christensen, J., Marlétaz, F., Liu, S., Li, F., Jansen, P.G., Spiga, E., Hansen, M.T., Pedersen, S.V.H., Biswas, S., Serikawa, K., Fox, B.A., Taylor, W.R., Mulley, J.F., Zhang, G., Heller, R.S., and Holland, P.W.H. (2017) Genome sequence of a diabetes-prone rodent reveals a mutation hotspot around the ParaHox gene cluster. Proc. Natl. Acad. Sci. (USA) 114:7677-7682. [doi: 10.1073/pnas.1702930114]


  1. And, as usual, they got the structure of DNA on the cover wrong...

    1. Because you cannot distinguish minor and major groove? Or is the number of nucleotides per turn wrong? To me the depicted double helix appears right-handed. Thus, the cover is better than many others.

  2. Under the traditional European system, it was important for a young scientist to show that they supported their Professor's views, and even considered them the last word in that field.

    In the U.S., with its tradition of individualism and advertising, and a different academic structure, it is important for a young scientist to promote their work as totally revolutionary. Since Thomas Kuhn's book, most of us have to spend a lot of time knocking down alleged new paradigms.

    I suppose that means that Canadian science is either the best of both worlds -- or the worst.

  3. Biased gene conversion doesn't only arise from biases in mismatch repair of heteroduplex DNA. It can also arise from other correction biases (e.g. insertions vs deletions) and from biases at other stages of recombination, especially at the strand-breakage and degradation steps that initiate meiotic recombination.

  4. Very interesting post. I can't help but be reminded of the criticism Stephen Jay Gould received for his perceived over-promotion of punctuated equilibrium. Every scientist would like more attention for their work. To what degree is "hype" the fault of the scientist, or the media? On the other side, is criticism of good press fair or does it originate from professional jealousy? All you can do is be truthful, and speak out against this behavior when you see it. Many thanks.