Friday, November 14, 2008

Making Sense in Biology

When I teach students how to read the scientific literature, I caution them not to believe everything they read. Science is, by its very nature, tentative and exploratory. Much of what is published doesn't get confirmed and is quietly ignored. Many of the ideas and speculations that are published never amount to anything. Some experiments are flawed. Many papers gloss over hidden assumptions that invalidate their conclusions.

How do you separate the wheat from the chaff? Well, for one thing, you ask yourself whether the results "make sense" in light of what you already know. Are there any basic principles of biology that conflict with the conclusions? You always have to be on the lookout for papers that just don't fit in with your current model of how things work.

There are two potential problems with this approach. First, your model may not be correct. Maybe you don't know enough to make a judgment. Second, the approach can prevent you from recognizing truly novel results that may change your idea of what makes sense.

The first problem is curable. The second is more serious. Science is basically conservative in its acceptance of new ideas. This may seem like a bad thing but, in fact, it's the only way to do good science. You simply can't afford to believe in several paradigm shifts every day before breakfast because most of them will turn out to be wrong. Today, when scientists want to convince their colleagues of something new that may not "make sense", they are obliged to present solid evidence that will convince the skeptics. It's an uphill fight. And it should be.

One of my colleagues has been following the discussion about alternative splicing and he directed my attention to a paper he just published in Nature Genetics. He pointed out that, far from overestimating alternative splicing, the EST data actually underestimate its extent.

The paper by Pan et al. (2008) makes two extraordinary claims.
  1. Their data indicate that about 95% of all multiexon human genes undergo alternative splicing.
  2. They estimate that there are, on average, seven (7) alternative splicing events per multiexon human gene.
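The two headline numbers are different summaries of the same kind of table, and it helps to see them side by side. The sketch below assumes a hypothetical table mapping each multiexon gene to the number of alternative splicing events detected for it; the gene names and counts are invented for illustration and are not data from Pan et al. (2008), and the paper's own event definitions may be more involved.

```python
from statistics import mean

# Hypothetical toy input: gene -> number of alternative splicing events
# detected for that multiexon gene. Invented values, NOT Pan et al. data.
events_per_gene = {
    "GENE_A": 0,
    "GENE_B": 3,
    "GENE_C": 12,
    "GENE_D": 1,
    "GENE_E": 9,
}


def fraction_alternatively_spliced(counts: dict[str, int]) -> float:
    """Fraction of genes with at least one detected alternative splicing event."""
    return sum(1 for n in counts.values() if n > 0) / len(counts)


def mean_events_per_gene(counts: dict[str, int]) -> float:
    """Average number of detected events, taken over every gene in the table."""
    return mean(counts.values())


print(f"{fraction_alternatively_spliced(events_per_gene):.0%} of genes")  # 80% of genes
print(f"{mean_events_per_gene(events_per_gene):.1f} events per gene")     # 5.0 events per gene
```

Note that a gene counts toward the first figure as soon as a single event is detected, whatever its abundance, and a few genes with many events can pull the average up. Neither number, by itself, says whether the detected events are functional or splicing noise.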

Neither of these claims makes sense. It's not reasonable to assume that most conserved housekeeping genes produce variants by alternative splicing, yet that's exactly what would have to happen if 95% of all genes undergo alternative splicing. It means that most genes for things like metabolic enzymes, RNA polymerase, ribosomal proteins and transport proteins will have variants due to alternative splicing. This doesn't make sense from an understanding of biochemistry and it doesn't make sense in light of evolution.

That's good reason to be skeptical.

But surely the data must be convincing? Surely the proponents of these extraordinary claims have extraordinary data to back their case?

Frankly, I don't know. I can't evaluate the Pan et al. (2008) paper because I have no idea how they actually do their experiments and whether those experiments are reliable. Part of the problem is that the authors don't tell me enough and part of it is that this is unfamiliar technology (to me).

All I know is that it doesn't make sense. I've asked the author to give me some specific examples of alternative splicing predictions for common genes, like those in the citric acid cycle. By looking at specific, rather than global, data it might be possible to see whether the results make sense.

Pan, Q., Shai, O., Lee, L.J., Frey, B.J. and Blencowe, B.J. (2008) Deep surveying of alternative splicing complexity in the human transcriptome by high-throughput sequencing. Nature Genetics, published online Nov. 2, 2008. [DOI:10.1038/ng.259]


  1. The technology used in these papers actually underestimates the true number of splice junctions because the authors only map their reads against known exon-exon junctions. This excludes unknown exons, possible trans-splicing events and other exotic stuff that might be going on.

    It is pretty clear what the data say; one can't argue against it. Microarrays sucked, but this is direct sequencing. Further advances in sequencing technology will only take us further down the rabbit hole. The question is how biologically relevant these splices are, i.e., whether they represent noise from the splicing machinery or real, functional protein products of extensive alternative splicing.

  2. If (as one of the commenters on the other post suggested) a large proportion of these variant transcripts are in fact partially processed intermediates, this should be testable.

    Intermediate products should be mainly nuclear, while cytoplasmic RNA should contain mainly the final products. It would be very interesting to try the deep-sequencing approach on separated nuclear and cytoplasmic RNAs.

    Of course, if it turns out that most of the variant transcripts never make it out of the nucleus, this doesn't necessarily preclude them having some function within the nucleus (example: Xist / Tsix), but it would pretty much confirm Larry's contention that they don't produce functional alternative protein variants.

  3. Something else that can be very informative about the relevance of these transcripts is their relative frequency. If they are abundant, then either the splicing machinery is very imprecise and that's why they are generated, or they are functional. And the more abundant they are, the less likely the former possibility becomes:

    A quote from the other paper that looks at alternative splicing:

    Wang, E.T., Sandberg, R., Luo, S., Khrebtukova, I., Zhang, L., Mayr, C., Kingsmore, S.F., Schroth, G.P. and Burge, C.B. (2008) Alternative isoform regulation in human tissue transcriptomes. Nature, published online Nov. 2, 2008.

    Analyses in which sequence reads are mapped to exon–exon junctions indicated that 92–94% of human genes undergo alternative splicing, ~86% with a minor isoform frequency of 15% or more.

  4. I can't speak to biology, but the notion that a theory must make sense would exclude quantum mechanics as a theory of sub-microscopic physics. Quantum mechanics is a theory of physics which has immense explanatory power and is almost certainly correct but, unfortunately, makes no sense at all. As Lawrence Krauss put it in an interview with Richard Dawkins, nobody understands quantum mechanics.

  5. SLC says,

    I can't speak to biology but the notion that a theory must make sense would exclude quantum mechanics as a theory of sub-microscopic physics. Quantum mechanics is a theory of physics which has immense explanatory power and is almost certainly correct but, unfortunately, makes no sense at all.

    Excuse me?

    The standard model of physics makes sense, by definition. It is the best way to explain the data.

    When you read a paper that conflicts with the standard model your BS detectors should go off.

  6. Yet again an excellent post, Larry.

    "The standard model of physics makes sense, by definition. It is the best way to explain the data."

    I think what SLC was saying is that although quantum mechanics is definitely very good at describing the quantum world and very successful in making predictions, physicists do not _understand_ why!

    In other words, they know how quanta behave, but they don't know _why_ they behave as they do.

    Think for instance of nonlocality. In our world, a force (gravity, electromagnetism, nuclear) needs to travel some distance in order to have an effect on something, so the effect is not immediate (locality), but in the quantum world, the effect is instantaneous, no matter the distance between the objects that are interacting (nonlocality)! Even if two quanta were separated by light-years, they would affect one another without delay, as though they were in contact! _That_ is just weird! And yet, it is true!

    Robert M.

  7. My BS detectors go off whenever someone uses the term "housekeeping gene".

  8. Anonymous says,

    My BS detectors go off whenever someone uses the term "housekeeping gene".

    I find it to be a good descriptive metaphor for those genes that are expressed and required in all cells. (With some exceptions.)

    Why do you object to the term and what would you suggest in its place?

  9. The metaphor is diminutive. How about "Genes that I don't study and therefore don't know much about." Wouldn't that be more accurate?
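The "minor isoform frequency of 15% or more" in the Wang et al. (2008) quote from comment 3 is one way to separate abundant variants from rare ones. A minimal sketch, assuming a simple two-isoform event (for example, a skipped exon) where reads supporting inclusion and exclusion are counted, and assuming the minor isoform frequency is the smaller of the two fractions. The function names and read counts are hypothetical, and the paper's actual estimator may differ (for instance, by normalizing for how many positions each isoform offers for reads to map).

```python
# Minimal sketch, assuming a two-isoform event with raw read counts.
# Hypothetical helper names; not code from Wang et al. (2008).


def minor_isoform_frequency(inclusion_reads: int, exclusion_reads: int) -> float:
    """Return the share of reads supporting the less common isoform.

    Raw counts are used directly; a real analysis would likely need to
    correct for differences in mappable length between the isoforms.
    """
    total = inclusion_reads + exclusion_reads
    if total == 0:
        raise ValueError("no reads support either isoform")
    inclusion_fraction = inclusion_reads / total
    return min(inclusion_fraction, 1.0 - inclusion_fraction)


def passes_threshold(inclusion_reads: int, exclusion_reads: int,
                     threshold: float = 0.15) -> bool:
    """True if the minor isoform reaches the cutoff (15% in the quoted analysis)."""
    return minor_isoform_frequency(inclusion_reads, exclusion_reads) >= threshold


# Hypothetical read counts for two events.
print(passes_threshold(90, 10))  # False: minor isoform at about 10%
print(passes_threshold(70, 30))  # True: minor isoform at about 30%
```

A cutoff like this filters out rare products, which bears on the abundance argument in comment 3, but passing it shows only that a variant is common, not that it encodes a functional protein.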