Monday, September 15, 2008

DNA Binding Proteins

Proteins that bind to DNA can be divided into two groups: those that bind to a specific DNA sequence and those that bind non-specifically. Proteins in the latter category include those required for DNA replication, repair, and recombination, as well as packaging proteins like the histones.

Proteins that bind to specific DNA sequences are often activators or repressors involved in regulating gene expression. These are proteins that interact with a short, well-defined nucleotide sequence found near the start site for transcription. The purpose of this posting is to review some of the basic characteristics of such proteins using lac repressor as a well-studied example.

The figure below shows the structure of a lac repressor dimer bound to DNA.

The rate and strength of binding of lac repressor to DNA have been the subject of many papers over the course of several decades. Lac repressor binds tightly to a specific DNA sequence at the beginning of the lac operon. When it is bound to DNA it prevents, or represses, transcription of the operon.

Now, here's the important point: all specific DNA binding proteins also bind DNA non-specifically. In many cases it's part of the search mechanism for the specific binding site. In the case of lac repressor, for example, the protein binds to any old place on the DNA molecule and slides along the DNA searching for a specific binding sequence. After sliding for a second or so it falls off and re-binds to another part of the DNA molecule.

Once the repressor finds its specific binding site it remains bound for about twenty minutes. In biochemical terms we say that its bound half-life is 20 minutes. The strength of binding is described by an equilibrium binding constant (KB) that reflects the ratio of bound repressor to free repressor. For lac repressor the binding constant is one of the highest measured for any DNA binding protein (KB = 10^13 M^-1). What this means is that lac repressor binds very tightly to its specific binding site.
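To make the half-life concrete, here is a minimal sketch that converts it into a first-order dissociation rate constant. It assumes simple exponential (first-order) dissociation of the complex, which is my simplification; the function names are just for illustration.

```python
import math

# Half-life of the specific repressor-operator complex, as quoted in the text.
HALF_LIFE_SECONDS = 20 * 60

def dissociation_rate(half_life_s):
    """First-order off-rate (per second), assuming exponential decay."""
    return math.log(2) / half_life_s

def fraction_still_bound(elapsed_s, half_life_s):
    """Fraction of complexes that have not yet dissociated after elapsed_s."""
    return 0.5 ** (elapsed_s / half_life_s)

k_off = dissociation_rate(HALF_LIFE_SECONDS)
print(f"k_off is about {k_off:.2e} per second")
print(f"fraction bound after one hour: {fraction_still_bound(3600, HALF_LIFE_SECONDS)}")
```

Under that assumption, three half-lives (one hour) leave one eighth of the complexes still intact.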

The equilibrium binding constant for non-specific binding is only 4 × 10^4 M^-1. Thus, repressor binds several hundred million times (about 2.5 × 10^8, approaching 10^9) more strongly to its specific binding site than to any old stretch of DNA. That's very impressive. In fact, it's one of the largest differences known for any DNA binding protein. When bound non-specifically the half-life is measured in seconds. It falls off (dissociates) rapidly.
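The fold difference follows directly from the two constants, and it can also be expressed as a difference in binding free energy. The sketch below does both. The temperature (25 °C) is my assumption, not something taken from the original measurements.

```python
import math

K_SPECIFIC = 1e13      # M^-1, specific binding constant quoted in the text
K_NONSPECIFIC = 4e4    # M^-1, non-specific binding constant quoted in the text

GAS_CONSTANT = 8.314   # J/(mol*K)
TEMPERATURE = 298.15   # K; assumed, not stated in the post

# How many times more tightly the repressor binds its specific site.
specificity_ratio = K_SPECIFIC / K_NONSPECIFIC

# Extra binding free energy gained at the specific site: RT * ln(ratio).
ddg_kj_per_mol = GAS_CONSTANT * TEMPERATURE * math.log(specificity_ratio) / 1000

print(f"specificity ratio: {specificity_ratio:.1e}")
print(f"free energy difference: {ddg_kj_per_mol:.1f} kJ/mol")
```

At the assumed temperature the ratio of 2.5 × 10^8 corresponds to a free energy difference of roughly 48 kJ/mol.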

These measurements have interesting consequences. There are about ten repressor molecules in each cell (E. coli). At any given time one of them will almost certainly be at its binding site near the lac operon but the other nine will be bound to DNA somewhere else. There are millions of places where the repressor can bind non-specifically but only one where it can bind specifically.[1]
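Here is a back-of-the-envelope version of that argument. It is a deliberately crude competition model, not a published calculation: it assumes every base pair of a roughly 4.6 million bp E. coli genome is an equivalent non-specific site, that essentially all repressor is DNA-bound, and that a single operator competes with those sites. The variable names are hypothetical.

```python
K_SPECIFIC = 1e13              # M^-1, from the text
K_NONSPECIFIC = 4e4            # M^-1, from the text
REPRESSORS_PER_CELL = 10       # "about ten" in the text
NONSPECIFIC_SITES = 4.6e6      # assumption: about one site per bp of the genome

# Statistical weight of "operator occupied" relative to "operator empty":
# the affinity advantage of the operator, scaled by how thinly the
# repressors are spread over the non-specific sites.
relative_weight = (K_SPECIFIC / K_NONSPECIFIC) * (
    REPRESSORS_PER_CELL / NONSPECIFIC_SITES
)

operator_occupancy = relative_weight / (1 + relative_weight)

# Only one repressor can sit on the single operator at a time.
fraction_of_repressor_on_operator = operator_occupancy / REPRESSORS_PER_CELL

print(f"operator occupied: {operator_occupancy:.3f} of the time")
print(f"fraction of repressor on operator: {fraction_of_repressor_on_operator:.3f}")
```

With these inputs the operator is occupied more than 99% of the time, yet only about a tenth of the repressor molecules are at the specific site at any moment.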

Some of these non-specific binding sites will, by chance, resemble the sequence of the specific binding site so lac repressor will linger longer at those sites than at sites that are completely unrelated to the specific binding sites. The point is that even for a highly specific DNA binding protein like lac repressor, most of the protein is bound to other sites most of the time.

For lac repressor, this fundamental property doesn't have serious consequences but for activators it's a different story. An activator is a protein that binds near a gene and recruits RNA polymerase to the site where it can begin transcription. Since most activators will be bound to random DNA sequences most of the time, the chances of accidentally recruiting RNA polymerase to begin a spurious transcript are quite high. From what we know about basic biochemistry, we expect that random spurious transcription should be quite common.

Tomorrow we'll look at RNA polymerase binding in the presence and absence of a specific DNA binding activator.

[1] The specific binding sites are called operators. There are actually three different operator sequences to which lac repressor can bind but that doesn't make much difference to the point I'm trying to make.


  1. Neat! I'm enjoying this...

    Lee <- Wonders if there will be a test in a week or so

  2. Fortunately, the initiation of transcription is highly regulated by requiring multiple simultaneous binding events by several different proteins, not just one activator molecule, particularly in eukaryotic cells. The initiation of replication is even more highly regulated, even in relatively genomically non-complex organisms like E. coli. Spurious initiation of DNA replication is generally a major disaster for the cell.

  3. anonymous says,

    Fortunately, the initiation of transcription is highly regulated by requiring multiple simultaneous binding events by several different proteins, not just one activator molecule, particularly in eukaryotic cells.

    True, but I'm not addressing normal "highly" regulated transcription in tomorrow's posting.

    Low levels of spurious transcription will occur whenever RNA polymerase binds to DNA and the probability is increased if there's an activator bound nearby. This will happen even when RNAP isn't at a promoter.

    You seem to think that transcription initiation will only occur when multiple independent binding events occur simultaneously. That's not correct. Some transcription occurs when only one activator is bound and the probability of an initiation event increases when more factors bind.

    It's a cumulative effect not an all-or-none effect.

  4. This is a great post. Larry, do you have references for papers that describe the experiments that were done to measure the binding constants? I'd love to read about them.

  5. ...we expect that random spurious transcription should be quite common.

    Hmm - as a layperson, I'm wondering whether the eventual point will be a counter to the "It is transcribed, therefore it must be meaningful and not 'junk'" crowd.

  6. Why do most of the non-specific DNA-binding proteins interact with DNA primarily through electrostatic interactions, whereas site-specific DNA-binding proteins interact with DNA primarily through the formation of non-covalent bonds between amino acid side-chains in the protein and bases in the major groove of the DNA?