Sandwalk: Junk RNA or Imaginary RNA?

Thursday, May 20, 2010

Junk RNA or Imaginary RNA?

RNA is very popular these days. It seems as though new varieties of RNA are being discovered just about every month. There have been breathless reports claiming that almost all of our genome is transcribed and that most of this RNA has to be functional even though we don't yet know what the function is. The fervor with which some people advocate a paradigm shift in thinking about RNA approaches that of a cult follower [see Greg Laden Gets Suckered by John Mattick].

We've known for decades that there are many types of RNA besides messenger RNA (mRNA encodes proteins). Besides the standard ribosomal RNAs and transfer RNAs (tRNAs), there are a variety of small RNAs required for splicing and many other functions. There's no doubt that some of the new discoveries are important as well. This is especially true of small regulatory RNAs.

However, the idea that a huge proportion of our genome could be devoted to synthesizing functional RNAs does not fit with the data showing that most of our genome is junk [see Shoddy But Not "Junk"?]. That hasn't stopped RNA cultists from promoting experiments leading to the conclusion that almost all of our genome is transcribed.

Late to the Party

Several people have already written about this paper, including Carl Zimmer and PZ Myers. There are also summaries in Nature News and PLoS Biology.

That may change. A paper just published in PLoS Biology shows that the earlier work was prone to artifacts. Some of those RNAs may not even be there and others are present in tiny amounts.

The work was done by Harm van Bakel in Tim Hughes' lab, right here in Toronto. It's only a few floors, and a bridge, from where I'm sitting right now. The title of their paper tries to put a positive spin on the results: "Most 'Dark Matter' Transcripts Are Associated With Known Genes" [van Bakel et al. (2010)]. Nobody's buying that spin. They all recognize that the important result is not that non-coding RNAs are mostly associated with genes but the fact that they are not found in the rest of the genome. In other words, most of our genome is not transcribed in spite of what was said in earlier papers.

Van Bakel compared two different types of analysis. The first, called "tiling arrays," is a technique where bulk RNA (cDNA, actually) is hybridized to a series of probes on a microchip. The probes are short pieces of DNA corresponding to genomic sequences spaced every few thousand base pairs along each chromosome. When some RNA fragment hybridizes to one of these probes you score that as a "hit." The earlier experiments used this technique and the results indicated that almost every probe could hybridize an RNA fragment. Thus, as you scanned the chip you saw that almost every spot recorded a "hit." The conclusion was that almost all of the genome is transcribed even though only 2% corresponds to known genes.

The second type of analysis is called RNA-Seq and it relies on direct sequencing of RNA fragments. Basically, you copy the RNA into DNA, selecting for small 200 bp fragments. Using new sequencing technology, you then determine the sequence of one (single end) or both ends (paired end) of this cDNA. You may only get 30 bp of good sequence information but that's sufficient to place the transcript on the known genome sequence. By collecting millions of sequence reads, you can determine what parts of the genome are transcribed and you can also determine the frequency of transcription. The technique is much more quantitative than tiling experiments.
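The counting step described above can be sketched in a few lines of Python. This is only a toy illustration, not the pipeline used by van Bakel et al.: the gene coordinates, the read positions, and the function names `classify_read` and `genic_fraction` are all invented for the example, and a real analysis would rely on a proper aligner and genome annotation.

```python
from bisect import bisect_right

# Hypothetical annotation: sorted, non-overlapping gene intervals on one
# chromosome, written as half-open (start, end) pairs. The numbers are invented.
GENES = [(1_000, 5_000), (20_000, 28_000), (60_000, 61_500)]
GENE_STARTS = [start for start, _ in GENES]


def classify_read(position):
    """Return 'genic' if a mapped read position falls inside a gene, else 'intergenic'."""
    # Find the last gene whose start is <= position, then check its end.
    i = bisect_right(GENE_STARTS, position) - 1
    if i >= 0 and GENES[i][0] <= position < GENES[i][1]:
        return "genic"
    return "intergenic"


def genic_fraction(read_positions):
    """Fraction of mapped reads that fall inside annotated genes."""
    if not read_positions:
        return 0.0
    genic = sum(1 for p in read_positions if classify_read(p) == "genic")
    return genic / len(read_positions)


# Invented mapped read positions, for illustration only.
reads = [1_200, 4_999, 5_000, 21_000, 27_500, 40_000, 60_100, 61_499]
print(genic_fraction(reads))
```

Because each read is counted exactly once, the result is a direct tally of how much of the sequenced RNA comes from genes versus the regions between them, which is what makes the method quantitative.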

Van Bakel et al. show that using RNA-Seq they detect very little transcription from the regions between genes. On the other hand, using tiling arrays they detect much more transcription from these regions. They conclude that the tiling arrays are producing spurious results, possibly due to cross-hybridization or possibly due to detection of very low abundance transcripts. In other words, the conclusion that most of our genome is transcribed may be an artifact of the method.

The parts of the genome that are presumed to be transcribed but for which there is no known function are called "dark matter." Here's the important finding in the authors' own words.

To investigate the extent and nature of transcriptional dark matter, we have analyzed a diverse set of human and mouse tissues and cell lines using tiling microarrays and RNA-Seq. A meta-analysis of single- and paired-end read RNA-Seq data reveals that the proportion of transcripts originating from intergenic and intronic regions is much lower than identified by whole-genome tiling arrays, which appear to suffer from high false-positive rates for transcripts expressed at low levels.

Many of us dismissed the earlier results as transcriptional noise or "junk RNA." We thought that much of the genome could be transcribed at a very low level but this was mostly due to accidental transcription from spurious promoters. This low level of "accidental" transcription is perfectly consistent with what we know about RNA polymerase and DNA binding proteins [What is a gene, post-ENCODE?, How RNA Polymerase Binds to DNA]. Although we might have suspected that some of the "transcription" was a true artifact, it was difficult to see how the papers could have failed to consider such a possibility. They had been through peer review and the reviewers seemed to be satisfied with the data and the interpretation.

That's gonna change. I suspect that from now on everybody is going to ignore the tiling array experiments and pretend they don't exist. Not only that, but in light of recent results, I suspect more and more scientists will announce that they never believed the earlier results in the first place. Too bad they never said that in print.

van Bakel, H., Nislow, C., Blencowe, B. and Hughes, T. (2010) Most "Dark Matter" Transcripts Are Associated With Known Genes. PLoS Biology 8: e1000371 [doi:10.1371/journal.pbio.1000371]

17 comments :

Georgi Marinov said...: Just to correct one thing: 30bp of sequence is what people were getting in 2008, now it's 75-100bp and you can sequence both ends of the fragment. The paper in question uses 2x50bp reads, however they only sequenced 23M reads per sample on average. Which is significant (and outdated too, the new HiSeq/SOLID4 instruments get 100M reads in a single lane so massive improvements in read numbers are coming soon), as I will explain in a second.

One of the fundamental differences between tiling arrays and RNA-Seq is that RNA-Seq is a digital measurement, while arrays are an analog one. So with arrays there is the possibility of compressing the dynamic range of the assay and seeing a lot more of the truly rare stuff that you would have to sequence billions of RNA-Seq reads to get to. Which will happen in the not so distant future, but we don't have now, and the paper in question certainly hasn't done either.

While I am no fan of the "The whole genome is transcribed, let's celebrate" spin of the data, the paper in question by no means puts an end to the discussion. The genome may very well be transcribed at relatively low levels, with those transcripts being degraded very quickly so that they become very hard to detect. Which does not mean that those transcripts have any function, or that even the process of transcription itself is important (we know it is for some types of heterochromatin assembly processes, for example), which is a more likely possibility, but still not supported by sufficient evidence.; Thursday, May 20, 2010 11:55:00 AM
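To put rough numbers on the sequencing-depth argument: if a transcript makes up a fraction f of the sequenced fragments and N reads are collected, the expected count is N*f, and under a simple Poisson sampling assumption the chance of seeing at least one read is 1 - exp(-N*f). The sketch below uses the 23M and 100M read figures mentioned above plus a hypothetical billion-read run. The abundance value is an arbitrary illustration, not a measurement from the paper, and the function names are made up for the example.

```python
import math


def expected_reads(total_reads, fraction):
    """Expected number of reads from a transcript at the given fractional abundance."""
    return total_reads * fraction


def detection_probability(total_reads, fraction, min_reads=1):
    """P(at least min_reads reads), assuming Poisson sampling with mean N*f."""
    mean = expected_reads(total_reads, fraction)
    # Sum the Poisson probabilities of seeing fewer than min_reads reads.
    p_below = sum(
        math.exp(-mean) * mean**k / math.factorial(k) for k in range(min_reads)
    )
    return 1.0 - p_below


# Hypothetical rare transcript: one in every 100 million sequenced fragments.
rare = 1e-8
for depth in (23_000_000, 100_000_000, 1_000_000_000):
    print(depth, expected_reads(depth, rare), detection_probability(depth, rare))
```

Under these assumptions a transcript that rare would usually be missed at 23M reads and almost always seen at a billion, which is the sense in which depth limits what RNA-Seq can rule out.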
Sean Eddy said...: Georgi, Figure 1A and 1B in the paper already address your point. The authors show strong evidence that RNAseq is far more sensitive than tiled arrays.; Thursday, May 20, 2010 12:05:00 PM
Georgi Marinov said...: I don't see how it does that.; Thursday, May 20, 2010 12:17:00 PM
Larry Moran said...: Georgi Marinov says,

Just to correct one thing: 30bp of sequence is what people were getting in 2008, now it's 75-100bp and you can sequence both ends of the fragment.

I'm aware of the optimistic claims in the latest papers. However, in this paper the authors were concerned about the stringency of their data so they restricted their hits to the first 25-28 bases allowing for one mismatch.

I agree with the rest of your comment. What this paper claims is that any remaining intergenic RNA must be confined to the occasional transcript every few cell generations. Such rare transcripts are much more compatible with accident than design, don't you think?; Thursday, May 20, 2010 2:24:00 PM
Larry Moran said...: to Sean Eddy,

You were the "academic" editor for this paper. I know you have an interest in the topic so what's your take on the earlier literature?

Papers were being published without any attempt to account for possible artifacts and without any attempt to mention that accidental transcription was a serious possibility. How did those papers get by reviewers and editors?

Why did real scientific papers become indistinguishable from press releases?; Thursday, May 20, 2010 2:30:00 PM
Georgi Marinov said...: Such rare transcripts are much more compatible with accident than design, don't you think?

Where have I mentioned anything about design? And I didn't say that they are extremely rarely transcribed, there is a difference between RNA levels and transcriptional activity. Things may be getting transcribed, because for some reason the process of transcription itself is important or for no reason at all, and then degraded very quickly.; Thursday, May 20, 2010 2:41:00 PM
Georgi Marinov said...: Also, remember that ENCODE is being done genome-wide right now so there will be more on the subject in the near future. Here is some of the data that has been publicly released and you may want to take a look at:

http://genome.ucsc.edu/cgi-bin/hgTrackUi?hgsid=160136038&c=chrX&g=wgEncodeCshlShortRnaSeq

http://genome.ucsc.edu/cgi-bin/hgTrackUi?hgsid=160136038&c=chrX&g=wgEncodeRikenCage; Thursday, May 20, 2010 2:45:00 PM
Alex said...: What amuses me is that Dr. Blencowe is an author on the paper you're praising right now, whereas not two weeks ago you expressed scepticism about his paper on the "splicing code".

Speaking of big biology papers, there's this one, doi:10.1126/science.1176495, which I haven't read yet but looks really neat; Thursday, May 20, 2010 11:17:00 PM
Anonymous said...: It's good to finally see results that make sense in the context of what is already known. "The crazier it sounds the better" attitude of Nature's editors really is a powerful force.; Thursday, May 20, 2010 11:32:00 PM
Larry Moran said...: Dunbar says,

What amuses me is that Dr. Blencowe is an author on the paper you're praising right now, whereas not two weeks ago you expressed scepticism about his paper on the "splicing code".

It amuses me too. Blencowe seems to have never met a splicing alternative that he doesn't believe and yet here he is on a paper that questions the significance of most low level transcripts.

Go figure.

I note that his contribution on the van Bakel paper is minor and he is not listed as one of the people who actually wrote the paper.; Friday, May 21, 2010 10:45:00 AM
jbw said...: Reading the article raised the following question in my mind:

Are the parts of the DNA that are not transcribed capable of being transcribed? Is there some defect in the sequence or are they not transcribed because of the machinery of transcription?

If the sequence is OK, do they code for proteins?; Friday, May 21, 2010 11:36:00 AM
DK said...: jbw:
Are the parts of the DNA that are not transcribed capable of being transcribed? Is there some defect in the sequence or are they not transcribed because of the machinery of transcription?

Any part of DNA has *some* potential to be transcribed. With some low probability, various transcription factors can bind to any piece of DNA and lead to the formation of what is called "transcription initiation complex". Once this happens, there will be some RNA made. Most will be very short but some will look "normal". This is what Larry refers to as "transcription noise".

If the sequence is OK, do they code for proteins?

Some low percentage may even have sequence that's enough to encode a polypeptide (i.e., ATG followed by >50 in-frame codons before hitting TAA/TAG). Majority of those will code for "garbage" proteins that won't fold into anything functional.; Friday, May 21, 2010 1:23:00 PM
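The rule of thumb in the comment above (an ATG followed by more than 50 in-frame codons before a stop) is easy to turn into a small scan. The following is a minimal sketch under stated assumptions: it looks at the forward strand only, the function name `find_orfs` and the `min_codons` parameter are invented for the example, and it also treats TGA as a stop codon, which the comment does not list but which is the third standard stop.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(seq, min_codons=50):
    """Find forward-strand open reading frames.

    Returns (start, end) index pairs where seq[start:end] begins with ATG,
    ends with a stop codon, and has more than min_codons codons in between.
    """
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None  # position of the first ATG seen since the last stop
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None:
                if codon == "ATG":
                    start = i
            elif codon in STOP_CODONS:
                # Codons strictly between the ATG and the stop codon.
                n_codons = (i - start) // 3 - 1
                if n_codons > min_codons:
                    orfs.append((start, i + 3))
                start = None
    return orfs


# Invented sequence: ATG, 51 alanine codons, then a stop.
example = "ATG" + "GCT" * 51 + "TAA"
print(find_orfs(example))
```

Finding such a stretch says nothing about function. It only shows that a random transcript can contain something that looks like a coding sequence.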
jbw said...: Thanks for the answer. Another question. Which evolved first, DNA or proteins?; Friday, May 21, 2010 1:31:00 PM
Georgi Marinov said...: Most likely proteins, if RNA was first; Friday, May 21, 2010 1:32:00 PM
jbw said...: So RNA, then proteins, then DNA. Does this mean the first RNA was junk RNA and that protein coding RNA evolved from junk RNA?; Friday, May 21, 2010 2:37:00 PM
Georgi Marinov said...: Junk RNA probably arose very early; Friday, May 21, 2010 2:42:00 PM
Anonymous said...: I'll sign.

Stephen Anstey
Student, Memorial University
St. John's, NL; Saturday, May 22, 2010 2:50:00 PM
