Wednesday, November 12, 2008

Two Examples of "Alternative Splicing"

THEME:
Transcription

Last week I bumped into a colleague who teaches in our third-year molecular biology course. I was lamenting the sad state of science these days and we got to talking about alternative splicing. I repeated my complaint that many of the predicted alternative splice variants are artifacts. It makes no sense that conserved genes would be producing alternative protein variants that are species-specific. I am convinced that the EST databases are full of artifacts and that most predicted splice variants do not exist.

My colleague was shocked. He is firmly convinced that most human genes express a number of different protein products that are produced as the result of alternatively spliced mRNA precursors. I asked him if he had ever looked at his favorite genes to see if the predicted variants make any sense. The ones that I've looked at certainly don't. (Join in the fun: see the challenge below.)

My colleague is very knowledgeable about the genes for the major subunits of eukaryotic RNA polymerase since it was his lab that cloned the first one. I suggested that he look at the predicted alternative splice variants of the two human genes and let me know if he is still convinced that these variants make biological sense. I'm not sure he will do it so let's take a look ourselves.

Eukaryotic RNA polymerase is a complex protein machine consisting of ten different subunits. Two of the subunits, Rpb1 and Rpb2, are more commonly known as A and B. In the human genome they are encoded by the genes POLR2A and POLR2B respectively [RNA Polymerase Genes in the Human Genome].

If you click on the Entrez Gene URLs you will end up at a page that summarizes what is known about the gene. Down the right-hand side of the page there are links to several other webpages, including a link to AceView, a database of alternative splice variants. Before following this link to the POLR2A variants, let's note that on the annotated Entrez Gene website there are no alternative splice variants listed. Apparently someone has decided that the predicted variants are probably artifacts.

Go to the AceView page for POLR2A. The first thing you see is a short explanation.
RefSeq annotates one representative transcript (NM included in AceView variant.a), but Homo sapiens cDNA sequences in GenBank, filtered against clone rearrangements, coaligned on the genome and clustered in a minimal non-redundant way by the manually supervised AceView program, support at least 11 spliced variants.

AceView summary
Note that this locus is complex: it appears to produce several proteins with no sequence overlap.
Expression: According to AceView, this gene is expressed at very high level, 4.8 times the average gene in this release. The sequence of this gene is defined by 537 GenBank accessions from 518 cDNA clones, some from breast (seen 40 times), marrow (29), head neck (19), brain (18), eye (18), leukopheresis (18), lung tumor (18) and 132 other tissues. We annotate structural defects or features in 13 cDNA clones.
Alternative mRNA variants and regulation: The gene contains 29 different introns (28 gt-ag, 1 gc-ag). Transcription produces 13 different mRNAs, 11 alternatively spliced variants and 2 unspliced forms. There are 7 probable alternative promotors and 5 non overlapping alternative last exons (see the diagram). The mRNAs appear to differ by truncation of the 5' end, truncation of the 3' end, overlapping exons with different boundaries, alternative splicing or retention of 4 introns. 337 bp of this gene are antisense to spliced gene pluvu, raising the possibility of regulated alternate expression.
Protein coding potential: 10 spliced and the unspliced mRNAs putatively encode good proteins, altogether 11 different isoforms (3 complete, 4 COOH complete, 4 partial), some containing domains RNA polymerase Rpb1, domain 1, RNA polymerase, alpha subunit, RNA polymerase Rpb1, domain 3, RNA polymerase Rpb1, domain 4, RNA polymerase Rpb1, domain 5, RNA polymerase Rpb1, domain 6, RNA polymerase Rpb1, domain 7, Eukaryotic RNA polymerase II heptapeptide repeat [Pfam]. The remaining 2 mRNA variants (1 spliced, 1 unspliced) appear not to encode good proteins.
Here's the figure showing the various predicted alternatively spliced transcripts and the various different proteins.


It's really difficult to imagine that any of these are biologically relevant. How could a small bit of the large RNA polymerase subunit ever be part of the RNA polymerase protein complex? It's not a surprise that the Entrez Gene annotators have ignored these predictions.

If, as I believe, most of the small ESTs on which these predictions are based are artifacts, then the overall pattern makes sense. What you see are examples of splicing errors where an intron has not been correctly removed. These extremely rare splicing errors are copied into cDNA during construction of EST libraries and specifically selected by screening out all the correctly spliced mRNAs. (That's how you make most EST libraries.)

Here's what AceView says about the gene for the other large subunit [AceView: POLR2B].
RefSeq annotates one representative transcript (NM included in AceView variant.a), but Homo sapiens cDNA sequences in GenBank, filtered against clone rearrangements, coaligned on the genome and clustered in a minimal non-redundant way by the manually supervised AceView program, support at least 9 spliced variants.
Once again, AceView notes that the annotated human genome has ignored the predicted alternative splice variants but maintains that there are at least nine of them.

Here's the figure. Decide for yourself whether this is credible.


There are several well-known examples of human genes producing different protein variants due to alternative splicing. The ones I can think of off the top of my head are the genes for class I antigens, α-tropomyosin, and calcitonin. I'm sure there are half a dozen others.

Here's the challenge. See if you can find a human gene for a well-studied protein where the structure of the protein is known and there are multiple protein variants derived by alternative splicing. I bet that readers of Sandwalk can't find very many where the predicted variants make any sense and are likely to be biologically significant.

What does this mean? Whenever you look at your favorite well-studied gene you see that the predictions of alternative splicing are silly. So why should we believe the genome-wide analyses? Is it just a coincidence that the more we learn about a given gene the more we become willing to reject the ESTs as artifacts? Or is it possible that alternative splicing is mostly confined to those genes that have not been well studied?


10 comments :

  1. Sorry for the laundry list of citations.

    This protein is well-known to some of us, at least.

    Some more examples from a very quick spin around the interwebs (guess what one of the Pubmed search handles was):

    Schöning JC, Streitner C, Meyer IM, Gao Y, Staiger D.
    Reciprocal regulation of glycine-rich RNA-binding proteins via an interlocked feedback loop coupling alternative splicing to nonsense-mediated decay in Arabidopsis.
    Nucleic Acids Res. 2008 Nov 4. [Epub ahead of print]

    Dinkins RD, Majee SM, Nayak NR, Martin D, Xu Q, Belcastro MP, Houtz RL, Beach CM, Downie AB.
    Changing transcriptional initiation sites and alternative 5'- and 3'-splice site selection of the first intron deploys Arabidopsis protein isoaspartyl methyltransferase2 variants to different subcellular compartments.
    Plant J. 2008 Jul;55(1):1-13.

    Puyaubert J, Denis L, Alban C.
    Dual targeting of Arabidopsis holocarboxylase synthetase1: a small upstream open reading frame regulates translation initiation and protein targeting.
    Plant Physiol. 2008 Feb;146(2):478-91.

    Bove J, Kim CY, Gibson CA, Assmann SM.
    Characterization of wound-responsive RNA-binding proteins and their splice variants in Arabidopsis.
    Plant Mol Biol. 2008 May;67(1-2):71-88.

    Bocobza S, Adato A, Mandel T, Shapira M, Nudler E, Aharoni A.
    Riboswitch-dependent gene regulation and its evolution in the plant kingdom.
    Genes Dev. 2007 Nov 15;21(22):2874-9.

    Muralla R, Chen E, Sweeney C, Gray JA, Dickerman A, Nikolau BJ, Meinke D.
    A bifunctional locus (BIO3-BIO1) required for biotin biosynthesis in Arabidopsis.
    Plant Physiol. 2008 Jan;146(1):60-73.

    Zhang XC, Gassmann W.
    Alternative splicing and mRNA levels of the disease resistance gene RPS4 are induced during defense responses.
    Plant Physiol. 2007 Dec;145(4):1577-87.

    Rossignol P, Collier S, Bush M, Shaw P, Doonan JH.
    Arabidopsis POT1A interacts with TERT-V(I8), an N-terminal splicing variant of telomerase.
    J Cell Sci. 2007 Oct 15;120(Pt 20):3678-87.

    Castells E, Puigdomènech P, Casacuberta JM.
    Regulation of the kinase activity of the MIK GCK-like MAP4K by alternative splicing.
    Plant Mol Biol. 2006 Jul;61(4-5):747-56.

    Lee JR, Jang HH, Park JH, Jung JH, Lee SS, Park SK, Chi YH, Moon JC, Lee YM, Kim SY, Kim JY, Yun DJ, Cho MJ, Lee KO, Lee SY.
    Cloning of two splice variants of the rice PTS1 receptor, OsPex5pL and OsPex5pS, and their functional characterization using pex5-deficient yeast and Arabidopsis.
    Plant J. 2006 Aug;47(3):457-66.

    de la Fuente van Bentem S, Vossen JH, Vermeer JE, de Vroomen MJ, Gadella TW Jr, Haring MA, Cornelissen BJ.
    The subcellular localization of plant protein phosphatase 5 isoforms is determined by alternative splicing.
    Plant Physiol. 2003 Oct;133(2):702-12.

    Savaldi-Goldstein S, Aviv D, Davydov O, Fluhr R.
    Alternative splicing modulation by a LAMMER kinase impinges on developmental and transcriptome expression.
    Plant Cell. 2003 Apr;15(4):926-38.

    Jasinski S, Perennes C, Bergounioux C, Glab N.
    Comparative molecular and functional analyses of the tobacco cyclin-dependent kinase inhibitor NtKIS1a and its spliced variant NtKIS1b.
    Plant Physiol. 2002 Dec;130(4):1871-82.

    Macknight R, Duroux M, Laurie R, Dijkwel P, Simpson G, Dean C.
    Functional significance of the alternative transcript processing of the Arabidopsis floral promoter FCA.
    Plant Cell. 2002 Apr;14(4):877-88.

    Dinesh-Kumar SP, Baker BJ.
    Alternatively spliced N resistance gene transcripts: their possible role in tobacco mosaic virus resistance.
    Proc Natl Acad Sci U S A. 2000 Feb 15;97(4):1908-13.

    Zhou DX, Kim YJ, Li YF, Carol P, Mache R.
    COP1b, an isoform of COP1 generated by alternative splicing, has a negative effect on COP1 function in regulating light-dependent seedling development in Arabidopsis.
    Mol Gen Genet. 1998 Feb;257(4):387-91.

  2. Well, I can understand the great deal of research going on with alternative splicing, micro RNAs, and small RNAs. However, like you said, I found that according to AceView there are about 34 spliced variants for PFKM, of which only ONE is recognized:

    http://www.ncbi.nlm.nih.gov/IEB/Research/Acembly/av.cgi?c=geneid&org=9606&l=5213

  3. My experience with Western blots (and clean antibodies!) of quite a number of different proteins shows that typically there is one major band, sometimes a couple of isoforms, and the rest are degradation products that can be controlled. So I share your views on alternative splicing in general. IMHO, they make a lot more sense from a mechanistic point of view.

    Still, allow me to play devil's advocate and offer one possibility of the importance of rare alternative splice variants.

    Regulation. Or it might be better called "accidental regulation" or "interference regulation". Dysfunctional fragments of multidomain proteins retaining enough function to compete with the "normal" protein for its target(s), for example.

    If we are to subscribe to the notion of a cell being a chaotic system where some small changes can potentially bring about large consequences, this is certainly a possibility that cannot be discounted offhand without experimental evidence.

    Although it's a lot of work, the experimental resolution seems to be straightforward:
    transgenes that don't have [some] introns. If in the majority of cases such transgenes don't have a phenotype, then there is much better footing for the sensible notion that the majority of detected splice variants are spurious ones without a functional role.

    Hopefully someone is up to this arduous task, even with the odds of it being a "negative", unpublishable result.

  4. dk, I'm sorry I couldn't summarize all of the studies in my list (or any of them, for that matter) - the list itself was too long. The paper by Dinesh-Kumar and Baker is just the sort of transgenic study you suggest, and it shows quite clearly that alternative splicing is required for the biological activity of the N gene.

  5. To art:
    The paper by Dinesh-Kumar and Baker is just the sort of transgenic study you suggest, and it shows quite clearly that alternative splicing is required for the biological activity of the N gene.

    Yes, but Larry's point (and mine) is not that alternative splicing has no role! OF COURSE it has a role. The issue in question is of relative frequency/importance. Is it a norm or is it rather an exception?

    Correct me if I am wrong, but all of the following are well known to occur but involve only a clear minority of genes/proteins in eukaryotes:

    - overlapping genes;
    - polycistronic mRNA;
    - intein splicing;
    - polyproteins.

    My gut feeling is that functionally important alternative splicing belongs to this list. And I don't think I have seen unequivocal evidence either pro or contra.

    P.S. Just looked at AceView annotation for human actins. 20 splice variants for gamma gene and 3 for cardiac alpha. Now, that is just rubbish! Without any of its exons, actin is dead, dead, dead - nothing but unfolded garbage. But I suppose that it can be argued that cleaning up garbage is part of the normal cellular life and thus can be "regulatory" ...

  6. Hi dk,

    Recall that my list was in response to Larry's challenge:

    "Here's the challenge. See if you can find a gene for a well-studied protein where the structure of the protein is known and there are multiple protein variants derived by alternative splicing. I bet that readers of Sandwalk can't find very many where the predicted variants make any sense and are likely to be biologically significant."

    I had no problems finding lots of cases.

    I suspect that some readers are going to get confused by Larry's terminology (I may be one of these!). For me, when Larry claims that the alternatively-spliced RNAs shown here are EST artifacts, he is saying that they are the results of some activity of reverse transcriptase (and/or Taq polymerase) that occurs in the test tube; in other words, these RNA variants do not occur in the cell. I don't buy this claim.

    OTOH, if, by artifact, Larry means non-functional RNA (such as an RNA derived from a splicing error), then I could agree that a sizeable fraction of the RNAs such as shown in the browser displays here are in this class. The relative sizes of the two classes (functional vs error) are open questions. But I don't think the universe of functionally-relevant alternative-spliced RNAs is nearly as tiny as Larry is implying.

    Which brings me to sort of a counter-challenge - how many studies, perhaps analogous to Dinesh-Kumar and Baker, perhaps using other approaches, can readers here cite that show that an alternatively-spliced RNA has no function?

  7. Alternative splicing is, as art pointed out, clearly relevant in many cases. I also agree that many of the predicted splice forms are artifacts that could exist within cells. These may not just be due to incorrect splicing, but could also be splicing intermediates. It appears that many introns, especially large ones, are removed not in one step but via a step-wise ratcheting mechanism that involves "stepping-stone" splicing events. Look up "recursive splicing".
    Anyway, I also think that many theoretical predictions of isoforms get fooled by very real splice sites that may only be used for recursive splicing and that may not contribute to alternative splice-form diversity.

  8. I have a question and I hope someone reads these older posts. Before the human genome was sequenced the number floating around was 100,000 human genes. Now the correct figure is 19,000+ genes but what is the current figure for gene products? If this 100,000 gene products figure is still correct then this becomes a combinational problem. The prediction would be around 50-60% of all genes having at least 1 alternative and some 300 genes with maybe 50 or more alternatives. I would think that looking at a gene family would be a way to go.

  9. Bill asks,

    I have a question and I hope someone reads these older posts. Before the human genome was sequenced the number floating around was 100,000 human genes.

    That number was mere speculation by people who hadn't studied the problem. See Facts and Myths Concerning the Historical Estimates of the Number of Genes in the Human Genome.

    Now the correct figure is 19,000+ genes but what is the current figure for gene products?

    Every gene has to make a product so there must be at least 20,000 protein products (plus a good number of RNA products). It's possible for a single gene to make several different protein products by splicing and dicing the primary transcript or the nascent polypeptide. Nobody knows for sure how many different products are synthesized in this manner. It's still controversial. Personally, I don't think there are more than 25,000 distinct proteins produced in human cells.

    If this 100,000 gene products figure is still correct then this becomes a combinational problem. The prediction would be around 50-60% of all genes having at least 1 alternative and some 300 genes with maybe 50 or more alternatives. I would think that looking at a gene family would be a way to go.

    The hard work is being done. So far, there have only been a few dozen genes that have been shown to produce more than one biologically relevant protein.

  10. Thanks Larry. I have actually seen this 100,000 figure used in calculations in some papers.
