Monday, September 10, 2012

Science Writes Eulogy for Junk DNA

Elizabeth Pennisi is a science writer for Science, the premier American science journal. She's been writing about "dark matter" for years, focusing on how little we know about most of the human genome and ignoring all of the data that says it's mostly junk [see SCIENCE Questions: Why Do Humans Have So Few Genes? ].

It doesn't take much imagination to guess what Elizabeth Pennisi was going to write when she heard about the new ENCODE data. Yep, you guessed it. She says that the ENCODE Project Writes Eulogy for Junk DNA.


Genomes & Junk DNA
Let's look at the opening paragraph in her "eulogy."
When researchers first sequenced the human genome, they were astonished by how few traditional genes encoding proteins were scattered along those 3 billion DNA bases. Instead of the expected 100,000 or more genes, the initial analyses found about 35,000 and that number has since been whittled down to about 21,000. In between were megabases of “junk,” or so it seemed.
This is just a repetition of the Myth Concerning the Historical Estimates of the Number of Genes in the Human Genome. The truth is that knowledgeable scientists knew decades ago that the human genome could only have about 30,000 genes or else the mutation load would be too high. Data collected in the 70s and 80s confirmed that we could only have about 30,000 functional genes because genes were bloated by introns and our genome was full of transposons.

The idea that much of the genome was junk was also a conclusion based on genetic load arguments and on the discovery that most of our genome was littered with defective transposons (pseudogenes). The concept of junk DNA also explained the C-Value "Paradox." None of that evidence has disappeared with the publication of the ENCODE results. The "dark matter" really is junk.
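The genetic load argument can be made concrete with a rough sketch. This is not the calculation from the original literature, only an illustration of the style of reasoning. It assumes the 50-100 new mutations per generation quoted by a commenter on this post, treats every mutation that lands in functional DNA as deleterious unless `deleterious_share` says otherwise (a hypothetical parameter), and uses the textbook approximation that mean relative fitness is about exp(-U) when deleterious effects multiply. The function names are made up for this example.

```python
import math


def deleterious_per_generation(new_mutations, functional_fraction,
                               deleterious_share=1.0):
    """Expected new deleterious mutations per genome per generation (U).

    new_mutations: assumed count of new mutations per generation.
    functional_fraction: assumed share of the genome that is functional.
    deleterious_share: hypothetical share of hits in functional DNA that
        are actually harmful (1.0 is a deliberate upper bound).
    """
    return new_mutations * functional_fraction * deleterious_share


def mean_relative_fitness(u):
    """Textbook load approximation: mean fitness is roughly exp(-U)."""
    return math.exp(-u)


# Assumption: midpoint of the 50-100 mutations per generation figure.
NEW_MUTATIONS = 75

for label, fraction in [("10% functional", 0.10), ("80% functional", 0.80)]:
    u = deleterious_per_generation(NEW_MUTATIONS, fraction)
    print(f"{label}: U = {u:.1f}, mean fitness ~ {mean_relative_fitness(u):.3g}")
```

Whatever values are plugged in, the point is the comparison: the larger the functional fraction, the more deleterious mutations each generation has to absorb, which is why a mostly functional genome is hard to reconcile with the observed mutation rate.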

Pennisi continues ...
This week, 30 research papers, including six in Nature and additional papers published by Science, sound the death knell for the idea that our DNA is mostly littered with useless bases. A decadelong project, the Encyclopedia of DNA Elements (ENCODE), has found that 80% of the human genome serves some purpose, biochemically speaking. “I don't think anyone would have anticipated even close to the amount of sequence that ENCODE has uncovered that looks like it has functional importance,” says John A. Stamatoyannopoulos, an ENCODE researcher at the University of Washington, Seattle.

Beyond defining proteins, the DNA bases highlighted by ENCODE specify landing spots for proteins that influence gene activity, strands of RNA with myriad roles, or simply places where chemical modifications serve to silence stretches of our chromosomes. These results are going “to change the way a lot of [genomics] concepts are written about and presented in textbooks,” Stamatoyannopoulos predicts.
There's nothing new in the ENCODE results. Pennisi should remember the controversy when the pilot project results were published in 2007. Many scientists pointed out, correctly, that a transcribed region is not necessarily indicative of a biological function. They also pointed out that DNA binding proteins are EXPECTED to bind to many non-functional loci, especially in a genome full of junk DNA. A binding site does not equate to biological function.
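A back-of-the-envelope sketch shows why binding at non-functional loci is expected. It assumes a random genome of 3 billion bases (the size given in the quoted passage) with equal base frequencies and independent positions, and a protein that recognizes one exact short motif. Real genomes and real binding proteins are messier, so treat the result as an order-of-magnitude illustration. The function name is hypothetical.

```python
GENOME_SIZE = 3_000_000_000  # "3 billion DNA bases", from the quoted passage


def expected_chance_sites(motif_length, genome_size=GENOME_SIZE,
                          both_strands=True):
    """Expected exact matches to one motif in a random genome.

    Assumes each base is A, C, G or T with probability 1/4, independently.
    Counting both strands doubles the total (this double-counts motifs
    that are their own reverse complement).
    """
    per_position = 0.25 ** motif_length
    positions = genome_size - motif_length + 1
    strands = 2 if both_strands else 1
    return strands * positions * per_position


for length in (6, 8, 10, 12):
    print(f"{length}-bp motif: ~{expected_chance_sites(length):,.0f} chance sites")
```

Under these assumptions a 6-bp motif is expected to occur by chance well over a million times, far more often than any plausible number of functional targets, so finding a protein bound somewhere says little by itself about whether that site does anything.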

The death of junk DNA has been greatly exaggerated but it fits in nicely with a preconceived notion of mysterious dark matter and blinders that prevent you from seeing any evidence supporting junk DNA.

There's something seriously wrong when the two leading science journals openly support a radical change in our understanding of genomes based entirely on an incorrect interpretation of the data. Even when that misinterpretation is promoted by the authors of the papers, that's no reason for prominent science journalists to stop being skeptical.

This is the time for Nature and Science to ask themselves how they could have been taken in by such hype and how they are going to prevent it in the future. They should also ask themselves whether they should retract the articles they wrote on the death of junk DNA. They would do no less for the science papers they publish if they found that the results were misleading.


  1. Yep. I wonder if the journals will at least suck it up enough to publish critical letters?

    1. NickM: I wonder if the journals will at least suck it up enough to publish critical letters

      Well, at least some of them did! I just posted the following comment, on Elizabeth Pennisi’s perspective, in Science (if nothing else, the comment might have the merit of abbreviating ‘junk DNA’ and ‘functional DNA’ to jDNA and fDNA, which should be useful, as I have a hunch that a lot of ink and bits will be flowing about these concepts in the coming weeks):

      Multiple eulogies for junk DNA?

      Before writing a eulogy for junk DNA (jDNA), we need to know more about it. So what is jDNA?

      All genomic sequences that code for proteins and functional RNA, or are involved in regulating gene expression (e.g. promoter elements) are functional DNA (fDNA). However, there are many other sequences that are functional, such as those participating in DNA replication, chromosome organization, etc.

      By definition, jDNA is non-functional. However, by its bare presence in the genome, jDNA gets replicated and can undergo recombination, transcription and transposition, and it can be targeted by diverse DNA binding proteins.

      ENCODE has been a logical follow up of the Human Genome project, which found that less than 2% of our genome codes for proteins and functional RNAs. Even by including generous estimates of regulatory sequences, the fDNA has been considered just a fraction of the genome; the rest, 90% or more, remained jDNA.

      ENCODE has challenged all that, by suggesting that 80% or more of the human genome is fDNA. Accordingly, most of jDNA has evaporated. Whether this interpretation of the data, which involved a change in the definition of fDNA, was a hasty decision that reflects poorly on an otherwise remarkable project remains to be seen.

      Here, I want to point out that a previous eulogy for jDNA was penned more than two decades ago (1), when it was proposed that jDNA functions as a sink for the integration of proviruses, transposons and other inserting elements, thereby protecting fDNA from inactivation or alteration of its expression.

      Considering that at least 50% of the human genome is composed of transposable elements, and that the rate of their transposition is very high, this protective mechanism makes evolutionary sense. The evolution of alternative protective mechanisms against insertion mutagenesis such as specific integration sites in species that have little jDNA, (e.g. Bacteria) is strong evidence for this selective pressure. However, this pressure enters a new dimension in humans and other multicellular species, in which the number of integration events in somatic cells (including those by retroviruses) that would lead to cancer would be enormous without a protective mechanism. This model is fully consistent with the current data, makes evolutionary sense, and, statistically, is a fact.

      1. Bandea CI. A protective function for noncoding, or secondary DNA. Med. Hypoth., 31:33-4. 1990.

    2. Davin: That doesn't make any sense.

      Not surprisingly, I beg to differ! As a matter of fact, I think it makes so much sense that (similar to other common sense issues that are highly inconvenient, such as the Onion Test) the only way to deal with it is to pretend that it doesn’t exist, or to say: “That doesn't make any sense,” or that “it is silly” (see Birney thinks the Onion Test is silly)

    3. See a follow up explanation of my model on the function of junk DNA as a sink for the integration of proviruses, transposons and other inserting elements here:

      Also, see how my model compares to ENCODE theory here:

  2. Is someone keeping a file of URLs of science media reports that the notion of junk DNA is dead? That would be useful in the future when they start claiming that they never said any such thing, or when the ENCODE people start saying that they are innocent, that they did not set off this media frenzy.

    I realize that Larry has been recording major reports one by one in posts here (and all praise to him for that). But some repository would be helpful when we need to make up a slide or two with collages of science media reports, when we say to an audience "Now, some of you may have heard that there isn't any 'junk DNA' ..."

    1. For the record, Joe, where do you stand on the junk DNA issue?

    2. Yes. I have an extensive collection of both primary literature and media reports from 1970-present.

    3. Claudiu -- I have been going tediously on the record on this one in this very blog for a while, for example: here and some earlier comments on Larry's recent posts too. (That one is also about whether it is sensible to conclude from the presence of lots of junk DNA that morphological traits are also not subject to natural selection).

      Ryan -- Thanks, I saw some of your fine posts but missed that. It is a great service to us all. A list of all the science writers who are going to "have egg on their face", in effect.

    4. I note that on Carl Zimmer's "Loom" blog at Discover Magazine's site he says he "would be all over" this story if he weren't overloaded by another one. But he does give links to both sides of the argument. Which means he is one possible writer who could comment on the Emperor's New Jeans (pun intended).

    5. Claudiu -- you will also find me shooting off my mouth on September 5 at Panda's Thumb, in a thread started by Nick Matzke's comment. Nick, Larry, Sean Eddy, and Ryan Gregory deserve most of the honors here but I am happy to chime in.

    6. Thanks for your response, Joe. Although I have a slightly different position on junk DNA (see my comments on Larry’s post: A Tribute to Stephen Jay Gould), which is by no means homologous to ENCODE’s extravagant and unfounded position, I appreciate your evolutionary perspective, Darwinian style, on defining junk DNA as the “DNA whose variation is not constrained by natural selection.”


  3. When ENCODE looks at the next batch of transcription factors I would suggest that they throw in yeast GAL4 and maybe a few others. If GAL4 lights up all over the genome, that would suggest they're not really looking at functionality.
    Another idea: TFs that we know have very few targets would be ideal. Do we see their binding sites scattered evenly over every chromosome?

  4. Ars Technica weighs in (John Timmer) and gets it right:

  5. Yeah, I saw her article a few days ago. I love her writing, but she, like so many others, didn't dig very deep on this one. I'm probably naive in thinking that Science will issue a correction.

  6. "The death of junk DNA has been greatly exaggerated but it fits in nicely with a preconceived notion of mysterious dark matter"

    It also nicely fits in with a heart-warming view of the progression of science in which scientists slowly but steadily discover signal in what was thought to be noise. The reality of science is that sometimes noise is noise.

  7. 10 members of the ENCODE consortium took part in an AskScience AMA over at Reddit:

  8. It seems most (not all) science journalists view their job as simply interpreting new results for the average joe. They figure any skepticism was already done by the peer reviewers. Their job is to make the stuff intelligible, and on deadline. One can understand this, since (unlike, say, politics) the subject matter can be pretty arcane. In cases like this, I blame the scientists and their institutional PR machines.

    1. Most science journalists do, indeed, claim that their job is to INTERPRET science for the general public. Like all reporters, they claim that they can see through lies and distortions and report the news correctly. No science journalist will admit that all they do is paraphrase the conclusions of the paper and the thoughts of the authors.

      While I put a great deal of blame on the authors, science journalists can't claim that they are doing their job correctly when they have so many recent failures to their credit.

  9. 80% of our genome may have some sort of biochemical function, although I am highly skeptical about this claim. But does this mean that 80% of mutations are NOT neutral? Creationists loved this ENCODE project result because their famous claim is this: "most mutations are harmful". If 80% of base pairs are really functionally important to us, this means that most mutations may be deleterious rather than neutral. What I want to learn is this: What percentage of base pairs is under selection? Do the ENCODE results show that most mutations are harmful? I still feel that most mutations are neutral because every human gets 50-100 new mutations from their parents. If most mutations were really deleterious, all of us would be genetically ill. What is your opinion? Are most mutations deleterious or effectively neutral? Somebody please answer.

    1. All of us are genetically ill. The human genome contains so many defunct genes, no wonder we grow old and die. Try to grow an organism from a haploid genome: failure guaranteed. It is because we have genetic backups (found in diploid genomes and genetic redundancy), that we are still around.

  10. If it's poor understanding by science reporters then why should the public and creationists EVER have confidence in science writers or researchers?
    If creationists opposed these conclusions we would be charged with denying science and being dangerous to science research!
    It means there must be a greater liberality about confidence in conclusions from "science" researchers.
    This is why evolution and company are not settled facts just because it's written that way.
    These are slippery subjects relative to hard data, origin issues, and the historic demand that scientists can't be wrong is exploding before our eyes here for somebody.

    There should be a methodology about conclusions in the natural sciences that would make things like this present contention less likely.
    We could call it the scientific methodology.
    This creationist insists evolutionary biology has never been put under the scrutiny of actual standards of the scientific method.
    This is because past and gone events and processes can't be studied in the present.
    It's all lines of reasoning and fossils of data points.

  11. If its poor understanding by science reporters then why should the public and creationists EVER have confidence in science writers or researchers?

    Exactly. You should always be very skeptical about statements made by scientists and science journalists.

    The problem with creationists is that they are skeptical to the point of ridiculous about science that supports evolution but credulous to the point of IDiocy about claims that challenge evolution and support their worldview.

    It's called having your cake and eating it too.

    If creationists opposed these conclusions we would be charged with denying science and dangerous to science research!

    Isn't it strange that creationists oppose 99% of the research in biology but they fall all over themselves glorifying the ENCODE claims?

    1. Well, it's not biology but evolutionary biology.
      You're right about the acceptance of researchers' ideas when it accords with one's own.
      They are making a mistake here.
      If the researchers had said otherwise they wouldn't agree with them.

      IDers would probably say a case like this is about hard data. A discovery that can be repeated in any investigation.
      It's not an interpretation of data but discovered data.
      I think they see it like this.

      Yet the equation of consent to their conclusions when it suits you is something to be intellectually aware of and wary of.

  12. Larry--I see a lot of 20/20 hindsight (or I knew things were this way) in your post. The gene number canard, for example--you write, "The truth is that knowledgeable scientists knew decades ago that the human genome could only have about 30,000 genes or else the mutation load would be too high". Yet your own blog post cited said that the majority of the bets in the 90s were in the 40,000 to 50,000 range (were those all scientists without "knowledge"?), and the current number is under 25,000--half that. Yes, some/a few people got it right but no one at the time had "proof" of what the number was--they had arguments based on a minimal amount of data, but that's far from saying there was a consensus that there were 20-30,000 genes. You offer a revisionist history that you accuse ENCODE and the media of doing, to my eye. For the record, what is your position on whether ENCODE should have been done--much of the backlash seems to stem from those who opposed the project and would rather have seen the money go to PI grants--much like many opposed the human genome being sequenced for similar reasons. The anti-ENCODE faction lost the debate when NIH went ahead, but now is obviously a chance to replay it.

    1. ... were those all scientists without "knowledge"

      Yes. Most of them were graduate students and postdocs whose main focus was on the technology and not on trying to understand genomes. They were heavily influenced by a back-of-the-envelope calculation done by Wally Gilbert in the 1980s. However, the point is that even among this group there was not an obvious bias toward huge numbers of genes as some people would have you believe.

      Look at that chart. How many of those people do you think were really "surprised" at the low number of genes initially reported? Not many, I bet.

      Were you one of the people who was surprised?

      ... that's far from saying there was a consensus that there were 20-30,000 genes

      I did not say that there was such a consensus among molecular biologists. My point is that it's wrong to imply that "everyone" thought there were at least 100,000 genes and they were "surprised" by the publication of the human genome sequence. That's revisionist history.

  13. To summarize this blog and the responses:

    It does not matter what we observe, evolutionism is always true. Why do research, guys?