MCB 201 Gene Expression - Spring Semester 2003


Lecture 8 (Molecular Structure of Genes and Chromosomes)

Section 9.1 (Molecular definition of a gene)

1. Definition of a gene. You might expect a term as fundamental to biology as 'gene' to have a generally accepted definition, but it is defined differently in different disciplines and by different scientists. Your text gives a common molecular definition of a gene as the entire nucleic acid sequence that is necessary for the synthesis of a functional polypeptide. However, we have already discussed the fact that rRNA and tRNA are encoded by genes, so gene products are not only proteins. In addition, some DNA is mobile and self-replicating but does not encode a functional product of either RNA or protein. Such units are often called selfish DNA (or selfish genes).

2. Figure 9-1. Comparison of bacterial operons and simple eukaryotic transcription units. This diagram emphasizes the fact that genes are often initially defined by mutations that alter them. The sites of mutation used in these examples are shown as lower case letters and vertical arrows. In the prokaryotic transcription unit shown on top, a mutation at site 'a' in the transcription control region may lower the binding affinity of this sequence for RNA polymerase and result in all of the proteins encoded in this operon being produced in lower amounts or not at all. In contrast, mutation 'b' could result in production of only inactive protein B. A simple eukaryotic transcription unit is shown in the lower part of the figure. Since this region encodes only one protein, the mutations only affect this gene product. Mutations in the control region can reduce or inhibit transcription and consequently reduce or eliminate production of the encoded protein. Mutation 'c' in exon 2 could result in production of a defective protein. Mutation 'd' within an intron may create a new RNA splice site which could result in production of an abnormally spliced mRNA encoding a nonfunctional protein. Note that in the globin mRNA used in this example, the introns, shown by dotted red lines, have been spliced out or removed.

3. Figure 9-2. Two examples of complex eukaryotic transcription units and the effect of mutations on expression of the encoded proteins. Complex transcription units are common in multicellular organisms. The primary RNA transcript can be processed in more than one way. For example, alternative poly(A) sites may be used (shown in Panel A), or different splice sites may be used (shown in Panel B), producing mRNAs containing different exons, as in 'exon skipping'. These alternative ways of expressing the same gene have gained importance in understanding human genome expression: about 30,000 genes are estimated to produce about 100,000 proteins, which indicates that extensive post-transcriptional processing must occur during human gene expression. Mutational analysis of complex transcription units can become quite complicated. In Panel A, mutation 'a', which occurs in exons shared by both mRNAs, affects the proteins encoded by both mRNAs. In contrast, mutations 'b' and 'c', which lie within exons unique to one of the alternatively processed RNAs, affect only the protein encoded by that mRNA.
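The logic of the mutational analysis in Panel A can be sketched in a few lines of Python. This is only an illustration: the exon names, the two isoform layouts, and the placement of mutations 'b' and 'c' in different isoform-specific exons are assumptions made for the example, not details taken from Figure 9-2.

```python
# Hypothetical complex transcription unit: two mRNAs that share exons 1 and 2
# but end in different final exons (exon names are illustrative only).
ISOFORMS = {
    "mRNA1": ["exon1", "exon2", "exon3"],
    "mRNA2": ["exon1", "exon2", "exon4"],
}

# Assumed placement of the mutations: 'a' in a shared exon,
# 'b' and 'c' in exons unique to one mRNA each.
MUTATIONS = {"a": "exon1", "b": "exon3", "c": "exon4"}


def affected_isoforms(mutated_exon, isoforms=ISOFORMS):
    """Return the names of the mRNAs that contain the mutated exon."""
    return sorted(name for name, exons in isoforms.items() if mutated_exon in exons)


for label, exon in MUTATIONS.items():
    print(f"mutation {label} in {exon} affects: {affected_isoforms(exon)}")
```

A mutation in a shared exon is reported for both mRNAs, while a mutation in an isoform-specific exon is reported for only one, which is the pattern described for mutations 'a', 'b', and 'c'.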

Section 9.2 (Chromosomal organization of genes and noncoding DNA)

4. Figure 9-3. Diagrams of an 80 kb region from chromosome III of the yeast S. cerevisiae and the beta-globin gene cluster on human chromosome 11 (also about 80 kb long). This comparison of a region from a yeast chromosome and a region from a human chromosome shows dramatically the difference in the proportion of coding to noncoding sequence. Note that the density of genes in the yeast DNA is much higher than in the human segment. Note also that 'open reading frames' are stretches of DNA that begin with a start codon and continue for a substantial length without an in-frame stop codon, and therefore could encode a gene product. In the globin gene cluster, each blue box contains a similar pattern of introns and exons, suggesting that these regions arose from the duplication of an ancestral gene that already had this pattern. There are also two nonfunctional pseudogenes in this cluster, which are related to the globin genes but are not transcribed. Red arrows mark Alu sites. An Alu sequence is about 300 base pairs of noncoding sequence that is present at about 1 million sites in the human genome (about 10% of the total genomic DNA). This sequence is named for the presence of a cutting site recognized by the restriction endonuclease AluI.
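To make the idea of an open reading frame concrete, here is a minimal Python sketch that scans one strand of a DNA string in all three reading frames for ATG-to-stop stretches. The function name find_orfs, the min_codons parameter, and the test sequence are assumptions made for illustration; real ORF searches also scan the complementary strand and require a much longer minimum length (often around 100 codons).

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(seq, min_codons=3):
    """Find ATG-to-stop open reading frames on the given strand.

    Returns a list of (start, end, frame) tuples, where start is the index
    of the A in ATG and end is the index just past the stop codon.
    min_codons counts the codons before the stop, including the ATG; the
    small default is only suitable for toy sequences.
    """
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None  # position of the ATG that opened the current ORF
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                if (i - start) // 3 >= min_codons:
                    orfs.append((start, i + 3, frame))
                start = None  # look for the next ATG in this frame
    return orfs


# Toy sequence (made up): CC + ATG AAA TTT GGG TAA + CC
print(find_orfs("CCATGAAATTTGGGTAACC"))
```

In the toy sequence the only ORF starts at index 2 and ends after the TAA stop codon. The TGA at positions 3 to 5 is ignored because no ATG precedes it in that reading frame.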

5. Figure 4-1 (Cooper text). The C-value paradox. The total amount of DNA per haploid cell of an organism (in humans, this would be a sperm or egg cell) is called the C value. It turns out that cellular DNA content correlates neither with the apparent complexity of organisms nor with phylogenetic relationships among organisms. This failure to correlate is called the C-value paradox. Even within groups of organisms of the same complexity, and even between organisms as closely related as broad beans and kidney beans, DNA content varies 3-4 fold. It is likely that these organisms carry a considerable amount of DNA, perhaps most of their DNA, that is noncoding.

6. Table 9-1. Classification of eukaryotic DNA. We now know that the C-value paradox arises from the fact that eukaryotic chromosomes contain variable amounts of DNA with no demonstrated function, both between genes (regions called intergenic sequences) and within genes (regions called intervening sequences or introns). Approximately 25-50% of the protein-coding genes are single-copy genes in the haploid genomes of multicellular eukaryotic organisms, i.e. they are solitary genes. The rest of the protein-coding genes are members of families of two or more genes having similar sequences, i.e. gene families. Most gene families are thought to have arisen by gene duplication followed by the accumulation of mutations. Beneficial mutations that conferred some improvement in function were then retained by natural selection. The human genome draft sequence, about 3 billion base pairs, is now published. Only about 1% of the sequence (some estimates are higher, about 4%) is predicted to be coding sequence, corresponding to about 30,000 genes, many fewer than formerly predicted.
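The round numbers quoted in these notes can be checked with some simple arithmetic. The Python sketch below uses only figures stated above (genome size, gene count, coding percentage, and the Alu length and copy number from item 4); the variable and function names are made up for the example, and the results are rough averages, not measured values.

```python
GENOME_BP = 3_000_000_000   # draft human genome size quoted in the notes
N_GENES = 30_000            # predicted gene count quoted in the notes
ALU_LENGTH_BP = 300         # approximate Alu length (item 4)
ALU_COPIES = 1_000_000      # approximate Alu copy number (item 4)


def coding_bp_per_gene(coding_percent):
    """Average predicted coding sequence per gene, in base pairs."""
    return GENOME_BP * coding_percent // (100 * N_GENES)


# Percentage of the genome occupied by Alu sequences.
alu_percent = 100 * ALU_LENGTH_BP * ALU_COPIES // GENOME_BP

# Average amount of genomic DNA available per gene.
bp_per_gene = GENOME_BP // N_GENES

print("Alu share of genome (%):", alu_percent)                       # 10
print("Genomic DNA per gene (bp):", bp_per_gene)                     # 100000
print("Coding bp per gene at 1%:", coding_bp_per_gene(1))            # 1000
print("Coding bp per gene at 4%:", coding_bp_per_gene(4))            # 4000
```

On these figures each gene has on average about 100 kb of genomic DNA but only about 1 to 4 kb of predicted coding sequence, which is consistent with the statement that most of the genome is noncoding.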

7. Figure 9-4. The chicken lysozyme gene and its surrounding regions. Here is an example of a simple transcription unit occupying about 15 kb of DNA. It contains four exons (blue boxes) and three introns (tan boxes). The red arrows mark repeated sequences (Alu-like repeats; true Alu elements are specific to primates) that are also found at many other sites in the chicken genome.

8. Figure 9-5. Gene duplication resulting from unequal crossing over. Here is a mechanism by which gene duplication can arise. The parental chromosomes pair so that L1 repetitive sequences at different positions are aligned, which leaves the chromosomes displaced relative to each other. Homologous recombination between these L1 sequences would produce one recombinant chromosome with two copies of the globin gene and one chromosome with a deletion of the globin gene. Unequal crossing over, as this is called, may also arise from rare recombination events between unrelated sequences.
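The outcome of unequal crossing over can be illustrated with a toy model in Python, in which each chromosome is a list of labelled segments. The segment layout (one globin gene flanked by two L1 repeats) and the function name crossover are assumptions made for this sketch, not a reproduction of Figure 9-5.

```python
def crossover(chrom_a, idx_a, chrom_b, idx_b):
    """Recombine two chromosomes at the paired segments
    chrom_a[idx_a] and chrom_b[idx_b], returning both recombinants."""
    if chrom_a[idx_a] != chrom_b[idx_b]:
        raise ValueError("recombination sites must be homologous")
    rec1 = chrom_a[:idx_a + 1] + chrom_b[idx_b + 1:]
    rec2 = chrom_b[:idx_b + 1] + chrom_a[idx_a + 1:]
    return rec1, rec2


# Hypothetical parental chromosome: a globin gene flanked by L1 repeats.
parent = ["left", "L1", "globin", "L1", "right"]

# Misaligned pairing: the downstream L1 (index 3) of one chromosome
# pairs with the upstream L1 (index 1) of the other.
duplication, deletion = crossover(parent, 3, parent, 1)

print(duplication)  # two copies of the globin gene
print(deletion)     # no copy of the globin gene
```

If the same L1 copy is paired on both chromosomes (for example index 1 with index 1), the two recombinants are identical to the parents, which is the normal, equal crossover.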

