
Transcription (updated: 12/30/99)

Background
All cellular RNA, including rRNA (ribosomal RNA), tRNA (transfer RNA) and mRNA (messenger RNA), is synthesized (transcribed) from DNA templates. Transcription is one step in the process by which information encoded in DNA is used to synthesize proteins: the information in DNA is transcribed to mRNA, which is then translated to protein. Transcription requires a DNA template, a DNA-directed RNA polymerase, the four ribonucleoside 5'-triphosphates (NTPs: ATP, GTP, UTP and CTP) and the magnesium dication (Mg2+).

Eukaryotic Transcription. Transcription of RNA in eukaryotes is carried out by three different polymerases. RNA pol I synthesizes rRNAs, except for the 5S rRNA species. RNA pol II synthesizes mRNAs and some small nuclear RNAs (snRNAs) involved in RNA splicing. RNA pol III synthesizes 5S rRNA and tRNAs.
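
The division of labor among the three polymerases can be written down as a small lookup table. This is only a sketch of the assignments listed above; the dictionary keys and the helper name products_of are illustrative labels, not standard nomenclature.

```python
# Which eukaryotic RNA polymerase makes which class of RNA,
# following the assignments given in the paragraph above.
# The key strings are informal labels chosen for this example.
RNA_TO_POLYMERASE = {
    "rRNA (other than 5S)": "RNA pol I",
    "mRNA": "RNA pol II",
    "snRNA (some)": "RNA pol II",
    "5S rRNA": "RNA pol III",
    "tRNA": "RNA pol III",
}

def products_of(polymerase: str) -> list[str]:
    """Return the RNA classes made by the named polymerase, sorted alphabetically."""
    return sorted(rna for rna, pol in RNA_TO_POLYMERASE.items() if pol == polymerase)

print(products_of("RNA pol III"))   # ['5S rRNA', 'tRNA']
```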

One can conceptually break transcription into four steps: binding, initiation, elongation and termination.

Binding: An RNA polymerase binds to a specific base sequence, known as a promoter. A promoter is located upstream (on the 5' side) of the transcription start site. The RNA polymerase (or, in organisms more complex than a phage, an associated protein) must melt out the template duplex, that is, separate its two strands.

Initiation: The reaction is initiated by the coupling of the first two NTPs, with release of pyrophosphate (PPi):

5' pppN(1) + 5' pppN(2) -> 5' pppN(1)pN(2) + PPi

All transcription reactions are template directed. The identities of pppN(1), pppN(2), etc., are controlled by the DNA template strand.
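
Template direction can be illustrated with a short sketch in Python. It assumes standard Watson-Crick pairing (A in the template specifies U in the RNA) and a template strand written 3' -> 5', so that the RNA reads 5' -> 3' in the same left-to-right order. The function names and the example sequence are hypothetical.

```python
# Base in the DNA template -> base added to the growing RNA chain.
DNA_TO_RNA = {"A": "U", "T": "A", "G": "C", "C": "G"}
# Base in one DNA strand -> base in the complementary DNA strand.
DNA_TO_DNA = {"A": "T", "T": "A", "G": "C", "C": "G"}

def transcribe(template_3to5: str) -> str:
    """Return the RNA (5'->3') specified by a template strand written 3'->5'."""
    return "".join(DNA_TO_RNA[base] for base in template_3to5.upper())

def template_from_coding(coding_5to3: str) -> str:
    """Return the template strand (3'->5') paired with a coding strand (5'->3')."""
    return "".join(DNA_TO_DNA[base] for base in coding_5to3.upper())

template = "TACGGCATT"           # hypothetical template, written 3'->5'
print(transcribe(template))      # AUGCCGUAA
```

The RNA has the same sequence as the coding (non-template) strand, with U in place of T, which is why transcribe(template_from_coding(s)) returns s with every T replaced by U.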

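Elongation: Chain growth occurs in the 5' -> 3' direction. In vivo the rate is 20 to 50 nucleotides per second. As soon as one polymerase has moved away from the promoter, a second polymerase can initiate there. The error rate is low, about 0.01% (roughly one misincorporated nucleotide per 10,000). Elongation is processive: the polymerase stays bound to the template through many successive nucleotide additions.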
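
The rate and error figures above translate into simple arithmetic. The sketch below uses a hypothetical 10,000-nucleotide transcript; the function names are illustrative, and the calculation assumes a constant elongation rate with no pausing.

```python
def transcription_time_s(length_nt: int, rate_nt_per_s: float) -> float:
    """Seconds needed to synthesize a transcript at a constant elongation rate."""
    return length_nt / rate_nt_per_s

def expected_errors(length_nt: int, error_rate: float = 1e-4) -> float:
    """Expected number of misincorporated nucleotides (0.01% = 1e-4 per nucleotide)."""
    return length_nt * error_rate

transcript_length = 10_000                           # hypothetical length, in nucleotides
slow = transcription_time_s(transcript_length, 20)   # at 20 nt/s
fast = transcription_time_s(transcript_length, 50)   # at 50 nt/s
print(f"{fast:.0f} to {slow:.0f} seconds")           # 200 to 500 seconds
print(f"about {expected_errors(transcript_length):.0f} error(s) expected")
```

Under these assumptions a transcript of this length takes a few minutes to complete and carries about one error on average.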

Termination: Transcription is terminated at a terminator sequence in the DNA (not at a stop codon, which ends translation rather than transcription) or by running off the end of the DNA template, as in this exercise.

Gene expression in eukaryotes is regulated primarily at the transcriptional level, by promoters and enhancers. Enhancers are nucleotide sequences from fifty to two hundred base pairs in length that are located either upstream or downstream from a gene. Promoters are nucleotide sequences that are located upstream of a gene and contain some or all of (i) a TATA box, (ii) a CCAAT box, and (iii) a GC box. A given promoter may lack one of these sequence elements. Transcription factors recognize and bind to promoters and facilitate transcription by RNA polymerase. When transcription factors are present, the promoter is recognized by RNA polymerase, and transcription proceeds. Gene regulation is primarily positive in that transcription factors activate transcription.

The CCAAT-box (consensus GG(T/C)CAATCT) is located 50 to 130 residues upstream of the transcriptional start site. Proteins such as C/EBP (for CCAAT-box/Enhancer Binding Protein) bind to the CCAAT-box element. The TATA-box is a promoter element located 20 to 30 residues upstream of the transcription start site. Proteins such as TFIIA, TFIIB, TFIID, etc. (TFII denotes transcription factors for RNA pol II) interact with the TATA-box.
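
A rough way to look for these elements in a sequence is a pattern search upstream of the start site. The sketch below is only illustrative: the TATA pattern TATA[AT]A is a simplified assumption, the CCAAT pattern is the consensus quoted above, and the test sequence and the function name find_elements are made up for the example. Real promoters deviate from consensus, so an exact-match search like this will miss many genuine sites.

```python
import re

# Simplified patterns; the TATA pattern is an assumption made for this sketch.
ELEMENTS = {
    "CCAAT-box": re.compile(r"GG[TC]CAATCT"),
    "TATA-box": re.compile(r"TATA[AT]A"),
}

def find_elements(seq: str, tss: int) -> list[tuple[str, int]]:
    """Return (element name, distance from its first base to the start site)
    for every exact match found 5' of the transcription start site at index tss."""
    upstream = seq[:tss].upper()
    hits = []
    for name, pattern in ELEMENTS.items():
        for match in pattern.finditer(upstream):
            hits.append((name, tss - match.start()))
    return hits

# Made-up sequence: C filler, a CCAAT-box, more filler, a TATA-box, then the gene.
seq = "C" * 20 + "GGTCAATCT" + "C" * 41 + "TATAAA" + "C" * 24 + "ATGGCC"
print(find_elements(seq, tss=100))   # [('CCAAT-box', 80), ('TATA-box', 30)]
```

In this constructed example both distances fall inside the ranges given above (50 to 130 residues for the CCAAT-box, 20 to 30 for the TATA-box).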

The TATA box-binding protein (TBP) is an essential component of the RNA polymerase transcription apparatus. TBP is required by all three eukaryotic RNA polymerases for correct initiation of transcription of ribosomal, messenger, small nuclear, and transfer RNAs. TBP looks like a molecular "saddle" that sits astride the DNA. The DNA-binding surface is a curved, antiparallel beta-sheet. TBP tracks the minor groove and induces a dramatic conformational change in the DNA, producing a sharp kink at each end of the preferred recognition sequence, TATAAAAG. Between the two kinks, the DNA is smoothly curved and partly unwound, presenting a widened minor groove to TBP's concave, antiparallel beta-sheet. Side chain-base interactions are completely restricted to the minor groove, and include hydrogen bonds, van der Waals contacts and phenylalanine-base stacking interactions.