Part I
Foundations of Biochemistry

Facing page: Supernova SN 1987A (the bright “star” at the lower right) resulted
from the explosion of a blue supergiant star in the Large Magellanic Cloud, a galaxy
near the Milky Way. Energy released by nuclear explosions in such supernovae
brought about the fusion of simple atomic nuclei, forming the more complex elements
of which the earth, its atmosphere, and all living things are composed.

Fifteen to twenty billion years ago the universe arose with a cataclysmic explosion that hurled hot, energy-rich subatomic particles into all space. Within seconds, the simplest elements (hydrogen and helium) were formed. As the universe expanded and cooled, galaxies condensed under the influence of gravity. Within these galaxies, enormous stars formed and later exploded as supernovae, releasing the energy needed to fuse simpler atomic nuclei into the more complex elements. Thus were produced, over billions of years, the chemical elements found on earth today. Biochemistry asks how the thousands of different biomolecules formed from these elements interact with each other to confer the remarkable properties of living organisms.

In Part I we will summarize the biological and chemical background to biochemistry. Living organisms operate within the same physical laws that apply to all natural processes, and we begin by discussing those laws and several axioms that flow from them (Chapter 1). These axioms make up the molecular logic of life. They define the means by which cells transform energy to accomplish work, catalyze the chemical transformations that typify them, assemble molecules of great complexity from simpler subunits, form supramolecular complexes that are the machinery of life, and store and pass on the instructions for the assembly of all future generations of organisms from simple, nonliving precursors.

Cells, the units of all living organisms, share certain features; but the cells of different organisms, and the various cell types within a single organism, are remarkably diverse in structure and function. Chapter 2 is a brief description of the common features and the diverse specializations of cells, and of the evolutionary processes that lead to such diversity.

Nearly all of the organic compounds from which living organisms are constructed are products of biological activity. These biomolecules were selected during the course of biological evolution for their fitness in performing specific biochemical and cellular functions. The biomolecules can be characterized and understood in the same terms that apply to the molecules of inanimate matter: the types of bonds between atoms, the factors that contribute to bond formation and bond strength, the three-dimensional structure of molecules, and chemical reactivities. Three-dimensional structure is especially important in biochemistry; the specificity of biological interactions, such as those between enzyme and substrate, antibody and antigen, hormone and receptor, is achieved by close steric complementarity between molecules. Prominent among the forces that stabilize three-dimensional structure are noncovalent interactions, individually weak but with significant cumulative effects on the structure of biological macromolecules. Chapter 3 provides the chemical basis for later discussions of the structure, catalysis, and metabolic interconversions of individual classes of biomolecules.

Water is the medium in which the first cells arose, and the solvent in which most biochemical transformations occur. The properties of water have shaped the course of evolution and exert a decisive influence on the structure of biomolecules in aqueous solution. Many of the weak interactions within and between biomolecules are strongly affected by the solvent properties of water. Even water-insoluble components of cells, such as membrane lipids, interact with each other in ways dictated by the polar properties of water. In Chapter 4 we consider the properties of water, the weak noncovalent interactions that occur in aqueous solutions of biomolecules, and the ionization of water and of solutes in aqueous solution.

These initial chapters are intended to provide a chemical backdrop for the later discussions of biochemical structures and reactions, so that whatever your background in chemistry or biology, you can immediately begin to follow, and to enjoy, the action.

www.bioinfo.org.cn/book/biochemistry/chapt01/bio0.htm
Chapter 1
The Molecular Logic of Life

Living organisms are composed of lifeless molecules. When these molecules are isolated and examined individually, they conform to all the physical and chemical laws that describe the behavior of inanimate matter. Yet living organisms possess extraordinary attributes not shown by any random collection of molecules. In this chapter, we first consider the properties of living organisms that distinguish them from other collections of matter. After arriving at a broad definition of life, we can describe a set of principles that characterize all living organisms. These principles underlie the organization of organisms and the cells that make them up, and they provide the framework for this book. They will help you to keep the larger picture in mind while exploring the illustrative examples presented in the text.

Figure 1–1  Some characteristics of living matter.
(a) Microscopic complexity and organization are apparent in this thin section of vertebrate muscle tissue, viewed with the electron microscope. (b) The lion uses organic compounds obtained by eating other animals to fuel intense bursts of muscular activity. The zebra derives energy from compounds in the plants it consumes; the plants derive their energy from sunlight. (c) Biological reproduction occurs with near-perfect fidelity.

What distinguishes all living organisms from all inanimate objects? First, they are structurally complicated and highly organized. They possess intricate internal structures (Fig. 1–1a) and contain many kinds of complex molecules. By contrast, the inanimate matter in our environment – clay, sand, rocks, seawater – usually consists of mixtures of relatively simple chemical compounds.

Second, living organisms extract, transform, and use energy from their environment (Fig. 1–1b), usually in the form of either chemical nutrients or the radiant energy of sunlight. This energy enables living organisms to build and maintain their own intricate structures and to do mechanical, chemical, osmotic, and other types of work. By contrast, inanimate matter does not use energy in a systematic way to maintain structure or to do work. Inanimate matter tends to decay toward a more disordered state, to come to equilibrium with its surroundings.

The third and most characteristic attribute of living organisms is the capacity for precise self-replication and self-assembly (Fig. 1–1c), a property that can be regarded as the quintessence of the living state. A single bacterial cell placed in a sterile nutrient medium can give rise to a billion identical “daughter” cells in 24 hours. Each of the cells contains thousands of different molecules, some extremely complex; yet each bacterium is a faithful copy of the original, constructed entirely from information contained within the genetic material of the original cell. By contrast, mixtures of inanimate matter show no capacity to grow and reproduce in forms identical in mass, shape, and internal structure, generation after generation.
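The figure of a billion daughter cells in 24 hours can be checked with a little arithmetic. The sketch below assumes uninterrupted exponential growth by binary fission from a single cell, which is an idealization; the function names are illustrative only.

```python
import math


def doublings_needed(final_cells: float, initial_cells: float = 1.0) -> float:
    """Number of successive doublings needed to grow from initial_cells to final_cells."""
    return math.log2(final_cells / initial_cells)


def doubling_time_minutes(final_cells: float, hours: float) -> float:
    """Average time per doubling, assuming growth never slows (an idealization)."""
    return hours * 60.0 / doublings_needed(final_cells)


generations = doublings_needed(1e9)           # roughly 30 doublings
minutes = doubling_time_minutes(1e9, 24.0)    # roughly 48 minutes per doubling
print(f"{generations:.1f} doublings, one about every {minutes:.0f} min")
```

About 30 rounds of faithful copying, each taking somewhat less than an hour, are enough to account for the number quoted above.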

The ability to self-replicate has no true analog in the nonliving world, but there is an instructive analogy in the growth of crystals in saturated solutions. Crystallization produces more material identical in lattice structure with the original "seed" crystal. Crystals are much less complex than the simplest living organisms, and their structure is static, not dynamic as are living cells. Nonetheless, the ability of crystals to "reproduce" themselves led the physicist Erwin Schrödinger to propose in his famous essay "What Is Life?" that the genetic material of cells must have some of the properties of a crystal. Schrödinger’s 1944 notion (years before the modern understanding of gene structure was achieved) describes rather accurately some of the properties of deoxyribonucleic acid, the material of genes.

Erwin Schrödinger
1887–1961

Each component of a living organism has a specific function. This is true not only of macroscopic structures such as leaves and stems or hearts and lungs, but also of microscopic intracellular structures such as the nucleus or chloroplast. Even individual chemical compounds in cells have specific functions. The interplay among the chemical components of a living organism is dynamic; changes in one component cause coordinating or compensating changes in another, with the result that the whole ensemble displays a character beyond that of the individual constituents. The collection of molecules carries out a program, the end result of which is the reproduction of the program and the self-perpetuation of that collection of molecules.

The molecules of which living organisms are composed conform to all the familiar laws of chemistry, but they also interact with each other in accordance with another set of principles, which we shall refer to collectively as the molecular logic of life. These principles do not involve new or as yet undiscovered physical laws or forces. Instead, they are a set of relationships characterizing the nature, function, and interactions of biomolecules.

If living organisms are composed of molecules that are intrinsically inanimate, how do these molecules confer the remarkable combination of characteristics we call life? How is it that a living organism appears to be more than the sum of its inanimate parts? Philosophers once answered that living organisms are endowed with a mysterious and divine life force, but this doctrine (vitalism) has been firmly rejected by modern science. The basic goal of the science of biochemistry is to determine how the collections of inanimate molecules that constitute living organisms interact with each other to maintain and perpetuate life. Although biochemistry yields important insights and practical applications in medicine, agriculture, nutrition, and industry, it is ultimately concerned with the wonder of life itself.

Figure 1–2  Diverse living organisms share common chemical features. The eagle, the oak tree, the soil bacterium, and the human share the same basic structural units (cells), the same kinds of macromolecules (DNA, RNA, proteins) made up of the same kinds of monomeric subunits (nucleotides, amino acids), the same pathways for synthesis of cellular components, and the same genetic code and evolutionary ancestors.

A massive oak tree, an eagle that soars above it, and a soil bacterium that grows among its roots appear superficially to have very little in common. However, a hundred years of biochemical research has revealed that living organisms are remarkably alike at the microscopic and chemical levels (Fig. 1–2). Biochemistry seeks to describe in molecular terms those structures, mechanisms, and chemical processes shared by all organisms and to discover the organizing principles that underlie life in all of its diverse forms.

Although there is a fundamental unity to life, it is important to recognize at the outset that very few generalizations about living organisms are absolutely correct for every organism under every condition. The range of habitats in which organisms live, from hot springs to Arctic tundra, from animal intestines to college dormitories, is matched by a correspondingly wide range of specific biochemical adaptations. These adaptations are integrated within the fundamental chemical framework shared by all organisms. Although generalizations are not perfect, they remain useful. In fact, exceptions often illuminate scientific generalizations.
Figure 1–3  Monomeric subunits in linear sequences can spell infinitely complex messages. The number of different sequences possible (S) depends on the number of different kinds of subunits (N) and the length of the linear sequence (L): S = N^L. For polymers the size of proteins (L ≈ 1,000), S is very large, and for nucleic acids, for which L may be many millions, S is astronomical.

Most of the molecular constituents of living systems are composed of carbon atoms covalently joined with other carbon atoms and with hydrogen, oxygen, or nitrogen. The special bonding properties of carbon permit the formation of a great variety of molecules. Organic compounds of molecular weight (Mr) less than about 500, such as amino acids, nucleotides, and monosaccharides, serve as monomeric subunits of proteins, nucleic acids, and polysaccharides, respectively. A single protein molecule may have 1,000 or more amino acids, and deoxyribonucleic acid has millions of nucleotides.

Each cell of the bacterium Escherichia coli (E. coli) contains more than 6,000 different kinds of organic compounds, including about 3,000 different proteins and a similar number of different nucleic acid molecules. In humans there may be tens of thousands of different kinds of proteins, as well as many types of polysaccharides (chains of simple sugars), a variety of lipids, and many other compounds of lower molecular weight.

To purify and to characterize thoroughly all of these molecules would be an insuperable task were it not for the fact that each class of macromolecules (proteins, nucleic acids, polysaccharides) is composed of a small, common set of monomeric subunits. These monomeric subunits can be covalently linked in a virtually limitless variety of sequences (Fig. 1–3), just as the 26 letters of the English alphabet can be arranged into a limitless number of words, sentences, or books.

Deoxyribonucleic acids (DNA) are constructed from only four different kinds of simple monomeric subunits, the deoxyribonucleotides, and ribonucleic acids (RNA) are composed of just four types of ribonucleotides. Proteins are composed of 20 different kinds of amino acids. The eight kinds of nucleotides from which all nucleic acids are built and the 20 different kinds of amino acids from which all proteins are built are identical in all living organisms.
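To get a feel for how quickly the number of possible sequences grows, the relationship from Figure 1–3 (the number of sequences equals the number of subunit kinds raised to the power of the chain length) can be evaluated directly. This is a minimal sketch; the function name and the example lengths are chosen only for illustration.

```python
def count_sequences(kinds_of_subunits: int, length: int) -> int:
    """Number of distinct linear sequences of the given length: N raised to the power L."""
    return kinds_of_subunits ** length


# Four kinds of nucleotide in a chain of three: 4 * 4 * 4 possibilities.
print(count_sequences(4, 3))   # 64

# Twenty kinds of amino acid in a protein-sized chain of 1,000 residues.
protein_sequences = count_sequences(20, 1000)
print(len(str(protein_sequences)), "decimal digits")
```

Even for a protein-sized chain, the count is a number more than a thousand digits long, which is why a small set of subunits is no limitation on diversity.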

Most of the monomeric subunits from which all macromolecules are constructed serve more than one function in living cells. The nucleotides serve not only as subunits of nucleic acids, but also as energy-carrying molecules. The amino acids are subunits of protein molecules, and also precursors of hormones, neurotransmitters, pigments, and many other kinds of biomolecules.

From these considerations we can now set out some of the principles in the molecular logic of life:

All living organisms have the same kinds of monomeric subunits.

There are underlying patterns in the structure of biological macromolecules.

The identity of each organism is preserved by its possession of distinctive sets of nucleic acids and of proteins.


Energy is a central theme in biochemistry: cells and organisms depend upon a constant supply of energy to oppose the inexorable tendency in nature for decay to the lowest energy state. The synthetic reactions that occur within cells, like the synthetic processes in any factory, require the input of energy. Energy is consumed in the motion of a bacterium or an Olympic sprinter, in the flashing of a firefly or the electrical discharge of an eel. The storage and expression of information cost energy, without which structures rich in information inevitably become disordered and meaningless. Cells have evolved highly efficient mechanisms for capturing the energy of sunlight, or extracting the energy of oxidizable fuels, and coupling the energy thus obtained to the many energy-consuming processes they carry out.

Figure 1–4  Living organisms are not at equilibrium with their surroundings. Death and decay restore the equilibrium. During growth, energy from food is used to build complex molecules and to concentrate ions from the surroundings. When the organism dies, it loses its ability to derive energy from food. Without energy, the dead body cannot maintain concentration gradients; ions leak out. Inexorably, macromolecular components decay to simpler compounds. These simple compounds serve as nutritional sources for phytoplankton, which are then eaten by larger organisms. (By convention, square brackets denote concentration – in this case, of ionic species.)

In the course of biological evolution, one of the first developments must have been an oily membrane that enclosed the water-soluble molecules of the primitive cell, segregating them and allowing them to accumulate to relatively high concentrations. The molecules and ions contained within a living organism differ in kind and in concentration from those in the organism’s surroundings. The cells of a freshwater fish contain certain inorganic ions at concentrations far different from those in the surrounding water (Fig. 1–4). Proteins, nucleic acids, sugars, and fats are present in the fish but essentially absent from the surrounding water, which instead contains carbon, hydrogen, and oxygen atoms only in simpler molecules such as carbon dioxide and water. When the fish dies, its contents eventually come to equilibrium with those of its surroundings.

Figure 1–5  A dynamic steady state results when the rate of appearance of a cellular component is exactly matched by the rate of its disappearance. In (a), a protein (hemoglobin) is synthesized, then degraded. In (b), glucose derived from food (or from carbohydrate stores) enters the bloodstream in some tissues (intestine, liver), then leaves the blood to be consumed by metabolic processes in other tissues (heart, brain, skeletal muscle). In this scheme, r1, r2, etc., represent the rates of the various processes. The dynamic steady-state concentrations of hemoglobin and glucose are maintained by complex mechanisms regulating the relative rates of the processes shown here.

Although the chemical composition of an organism may be almost constant through time, the population of molecules within a cell or organism is far from static. Molecules are synthesized and then broken down by continuous chemical reactions, involving a constant flux of mass and energy through the system. The hemoglobin molecules carrying oxygen from your lungs to your brain at this moment were synthesized within the past month; by next month they will have been degraded and replaced with new molecules. The glucose you ingested with your most recent meal is now circulating in your bloodstream; before the day is over these particular glucose molecules will have been converted into something else, such as carbon dioxide or fat, and will have been replaced with a fresh supply of glucose. The amount of hemoglobin and glucose in the blood remains nearly constant because the rate of synthesis or intake of each just balances the rate of its breakdown, consumption, or conversion into some other product (Fig. 1–5). The constancy of concentration does not, therefore, reflect chemical inertness of the components, but is rather the result of a dynamic steady state.
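The idea of a dynamic steady state can be illustrated with a small numerical sketch. It assumes, purely for illustration, that a component is made at a constant rate and removed at a rate proportional to the amount present (first-order loss); the text does not specify any rate law, and the numbers used are hypothetical.

```python
def simulate_pool(synthesis_rate: float, loss_constant: float,
                  initial_amount: float = 0.0,
                  dt: float = 0.01, steps: int = 100_000) -> float:
    """Follow the amount of a component over time with simple Euler steps.

    Assumed (hypothetical) model: constant synthesis, first-order loss.
    """
    amount = initial_amount
    for _ in range(steps):
        rate_of_appearance = synthesis_rate
        rate_of_disappearance = loss_constant * amount
        amount += (rate_of_appearance - rate_of_disappearance) * dt
    return amount


# Starting empty or starting overfull, the pool settles at the same level,
# the point where appearance and disappearance exactly balance.
print(simulate_pool(5.0, 0.1, initial_amount=0.0))
print(simulate_pool(5.0, 0.1, initial_amount=80.0))
```

In this model the level settles where the two rates are equal (here synthesis_rate / loss_constant), even though individual molecules are continually being made and removed.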

Figure 1–6  (Top) The downward motion of an object releases potential energy that can do work. The potential energy made available by spontaneous downward motion (an exergonic process, represented by the pink box) can be coupled to the upward movement of another object (an endergonic process, represented by the blue box). (Bottom) A spontaneous (exergonic) chemical reaction (B→C) releases free energy, which can pull or drive an endergonic reaction (A→B) when the two reactions share a common intermediate, B. The exergonic reaction B→C has a large, negative free-energy change (ΔG_{B→C}), and the endergonic reaction A→B has a smaller, positive free-energy change (ΔG_{A→B}). The free-energy change for the overall reaction A→C is the arithmetic sum of these two values (ΔG_{A→C}). Because the value of ΔG_{A→C} is negative, the overall reaction is exergonic and proceeds spontaneously.

Living cells and organisms must perform work to stay alive and to reproduce themselves. The continual synthesis of cellular components requires chemical work; the accumulation and retention of salts and various organic compounds against a concentration gradient involves osmotic work; and the contraction of a muscle or the motion of a bacterial flagellum represents mechanical work. Biochemistry examines the processes by which energy is extracted, channeled, and consumed, so it is essential to develop an understanding of the fundamental principles of bioenergetics.

Consider the simple mechanical example shown in Figure 1–6. An object at the top of an inclined plane has a certain amount of potential energy as a result of its elevation. It tends spontaneously to slide down the plane, losing its potential energy of position as it approaches the ground. When an appropriate string-and-pulley device is attached to the object, the spontaneous downward motion can accomplish a certain amount of work, an amount never greater than the change in potential energy of position. The amount of energy actually available to do work (called the free energy) will always be somewhat less than the total change in energy, because some energy is dissipated as the heat of friction. The greater the elevation of the object relative to its final position, the greater the change in energy as it slides downward, and the greater the amount of work that can be accomplished.

In the chemical analog of this mechanical example (Fig. 1–6, bottom), a reactant, B, is converted into a product, C. The compounds B and C each contain a certain amount of potential energy, related to the kind and number of bonds in each type of molecule. This energy is analogous to the potential energy in an elevated object. Some of the energy is available to do work when B is converted into C by a chemical reaction that involves no change in temperature or pressure. This portion of the energy, the free energy, is designated G (for J. Willard Gibbs, who developed much of the theory of chemical energetics), and the change in free energy during the conversion of B to C is ΔG.

We can define a system as all of the reactants and products, the solvent, and the immediate atmosphere – in short, everything within a defined region of space. The system and its surroundings together constitute the universe. If the system exchanges neither matter nor energy with its surroundings, it is said to be closed (in stricter thermodynamic usage such a system is called isolated, and a closed system is one that may exchange energy but not matter). The magnitude of the free-energy change for a process proceeding toward equilibrium depends upon how far from equilibrium the system was in its initial state. In the mechanical example, no spontaneous sliding will occur once the object has reached the ground; the object is then at equilibrium with its surroundings, and the free-energy change for sliding along the horizontal surface is zero.

In chemical reactions in closed systems, the process also proceeds spontaneously until equilibrium is reached. The free-energy change (ΔG) for a chemical reaction is a quantitative expression of how far the system is from chemical equilibrium. Reactions that proceed with the release of free energy are exergonic, and because the products of such reactions have less free energy than the reactants, ΔG is negative. Chemical reactions in which the products have more free energy than the reactants are endergonic, and for these reactions ΔG is positive. When all of the chemical species in the system are at equilibrium, the free-energy change for the reaction is zero, and no further net conversion of reactants into products will occur without the input of energy or matter from outside the system.

As in the mechanical example, some of the energy released in a spontaneous process can accomplish work – chemical work in this case. In living systems, as in mechanical processes, part of the total energy change in the chemical reaction is unavailable to accomplish work. Some is dissipated as heat, and some is lost as entropy, a measure of energy due to randomness, which we will define more rigorously later.

How is free energy from a chemical reaction channeled into energy-requiring processes in living organisms? In the mechanical example in Figure 1–6, it is clear that if one sliding object is coupled to another object on another inclined plane, the energy released by the spontaneous downward sliding of one may be harnessed to produce upward motion of the other, a motion that cannot occur spontaneously. This is a direct analogy to a biochemical process in which the energy released in an exergonic chemical reaction can be used to drive another reaction that is endergonic and would not proceed spontaneously. The reactions in this system are coupled because the product of one (compound B) is a reactant in the other. This coupling of an exergonic reaction with an endergonic one is absolutely central to the free-energy exchanges that occur in all living systems. In biological energy coupling, the simultaneous occurrence of two reactions is not enough. The two reactions must be coupled in the sense of Figure 1–6 (bottom); the two reactions share an intermediate, B.
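The bookkeeping behind coupled reactions is simple addition: the free-energy change of the overall reaction A→C is the sum of the changes for A→B and B→C. The sketch below uses hypothetical values (in kJ/mol) that are not taken from the text; only the signs and the summation rule come from the discussion above.

```python
def classify(delta_g: float) -> str:
    """Label a process by the sign of its free-energy change."""
    if delta_g < 0:
        return "exergonic"
    if delta_g > 0:
        return "endergonic"
    return "at equilibrium"


def overall_delta_g(step_delta_gs: list[float]) -> float:
    """Free-energy change of a sequence of reactions linked by shared intermediates."""
    return sum(step_delta_gs)


# Hypothetical illustrative values, kJ/mol.
dg_a_to_b = 15.0     # endergonic step: would not proceed on its own
dg_b_to_c = -30.0    # exergonic step that consumes the intermediate B
dg_a_to_c = overall_delta_g([dg_a_to_b, dg_b_to_c])

print(classify(dg_a_to_b), classify(dg_b_to_c), classify(dg_a_to_c))
```

Because the exergonic step releases more free energy than the endergonic step requires, the sum is negative and the coupled process as a whole is exergonic.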

A living organism is an open system; it exchanges both matter and energy with its surroundings. Living organisms use either of two strategies to derive free energy from their surroundings: (1) they take up chemical components from the environment (fuels), extract free energy by means of exergonic reactions involving these fuels, and couple these reactions to endergonic reactions; or (2) they use energy absorbed from sunlight to bring about exergonic photochemical reactions, to which they couple endergonic reactions.

Living organisms create and maintain their complex, orderly structures at the expense of free energy from their environment.

Exergonic chemical or photochemical reactions are coupled to endergonic processes through shared chemical intermediates, channeling the free energy to do work.

Figure 1–7  During metabolic transductions, entropy increases as the potential energy of complex nutrient molecules decreases. Living organisms (a) extract energy from their environment, (b) convert some of it into useful forms of energy to produce work, and (c) return some energy to the environment as heat, together with end-product molecules that are less well organized than the starting fuel, increasing the entropy of the universe.

The first law of thermodynamics, developed from physics and chemistry but fully valid for biological systems as well, describes the energy conservation principle:

In any physical or chemical change, the total amount of energy in the universe remains constant, although the form of the energy may change.

Not until the nineteenth century did physicists discover that energy can be transduced (converted from one form to another), yet living cells have been using that principle for eons. Cells are consummate transducers of energy, capable of interconverting chemical, electromagnetic, mechanical, and osmotic energy with great efficiency (Fig. 1–7). Biological energy transducers differ from many familiar machines that depend on temperature or pressure differences. The steam engine, for example, converts the chemical energy of fuel into heat, raising the temperature of water to its boiling point to produce steam pressure that drives a mechanical device. The internal combustion engine, similarly, depends upon changes in temperature and pressure. By contrast, all parts of a living organism must operate at about the same temperature and pressure, and heat flow is therefore not a useful source of energy. Cells are isothermal, or constant-temperature, systems.

Living cells are chemical engines that function at constant temperature.
Figure 1–8  Sunlight is the ultimate source of all biological energy. Thermonuclear reactions in the sun produce energy that is transmitted to the earth as light and converted into chemical energy by plants and certain microorganisms.

Virtually all of the energy transductions in cells can be traced to a flow of electrons from one molecule to another, in the oxidation of fuel or in the trapping of light energy during photosynthesis. This electron flow is “downhill”, from higher to lower electrochemical potential; as such, it is formally analogous to the flow of electrons in an electric circuit driven by an electrical battery. Nearly all living organisms derive their energy, directly or indirectly, from the radiant energy of sunlight, which arises from the thermonuclear fusion reactions that form helium in the sun (Fig. 1–8). Photosynthetic cells absorb the sun’s radiant energy and use it to drive electrons from water to carbon dioxide, forming energy-rich products such as starch and sucrose. In doing so, most photosynthetic organisms release molecular oxygen into the atmosphere. Ultimately, nonphotosynthetic organisms obtain energy for their needs by oxidizing the energy-rich products of photosynthesis, passing electrons to atmospheric oxygen to form water, carbon dioxide, and other end products, which are recycled in the environment. All of these reactions involving electron flow are oxidation-reduction reactions. Thus, other principles of the living state emerge:

The energy needs of virtually all organisms are provided, directly or indirectly, by solar energy.

The flow of electrons in oxidation-reduction reactions underlies energy transduction and energy conservation in living cells.

All living organisms are dependent on each other through exchanges of energy and matter via the environment.

Figure 1–9  The energetic course of a chemical reaction. A high activation barrier, representing the transition state, must be overcome in the conversion of reactants (A) into products (B), even though the products are more stable than the reactants – as indicated by a large, negative free-energy change (ΔG). The energy required to overcome the activation barrier is the activation energy (ΔG‡). Enzymes catalyze reactions by lowering the activation barrier. They bind the transition-state intermediates tightly, and the binding energy of this interaction effectively reduces the activation energy from ΔG‡_uncat to ΔG‡_cat. (Note that the activation energy is unrelated to the free-energy change of the reaction, ΔG.)
Figure 1–10  An enzyme increases the rate of a specific chemical reaction. In the presence of an enzyme specific for the conversion of reactant A into product B, the rate of the reaction may increase a millionfold or more over that of the uncatalyzed reaction. The enzyme is not consumed in the process; one enzyme molecule can act repeatedly to convert many molecules of A to B.
Figure 1–11  An example of a typical synthetic (anabolic) pathway. In the bacterium E. coli, threonine is converted to isoleucine in five steps, each catalyzed by a separate enzyme. (Only the main reactants and products are shown here.) Threonine, in turn, was synthesized from a simpler precursor. Both threonine and isoleucine are precursors of much larger and more complex molecules: the proteins. (The letters A to F correspond to those in Fig. 1–14.)

The fact that a reaction is exergonic does not mean that it will necessarily proceed rapidly. The reaction coordinate diagram in Figure 1–6 (bottom) is actually an oversimplification. The path from reactant to product almost invariably involves an energy barrier, called the activation barrier (Fig. 1–9), that must be surmounted for any reaction to occur. The breaking and joining of bonds generally requires the prior bending or stretching of existing bonds, creating a transition state of higher free energy than either reactant or product. The highest point in the reaction coordinate diagram represents the transition state.

Activation barriers are crucial to the stability of biomolecules in living systems. Although, when isolated from other cellular components, most biomolecules are stable for days or even years, inside cells they often undergo chemical transformations within milliseconds. Without activation barriers, biomolecules within cells would rapidly break down to simple, low-energy forms. The lifetime of complex molecules would be very short, and the extraordinary continuity and organization of life would be impossible.

Virtually every cellular chemical reaction occurs because of enzymes, catalysts that are capable of greatly enhancing the rate of specific chemical reactions without being consumed in the process (Fig. 1–10). Enzymes, as catalysts, act by lowering the energy barrier between reactant and product. The activation energy (ΔG‡; Fig. 1–9) required to overcome this energy barrier could in principle be supplied by heating the reaction mixture, but this option is not available in living cells. Instead, during a reaction, enzymes bind reactant molecules in the transition state, thereby lowering the activation energy and enormously accelerating the rate of the reaction. The relationship between the activation energy and reaction rate is exponential; a small decrease in ΔG‡ results in a very large increase in reaction rate. Enzyme-catalyzed reactions commonly proceed at rates up to 10¹⁰- to 10¹⁴-fold greater than the uncatalyzed rates.
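
The exponential dependence of rate on activation energy can be illustrated with a short calculation. The sketch below assumes the usual transition-state form, in which the rate is proportional to exp(−ΔG‡/RT), and further assumes that the catalyzed and uncatalyzed reactions differ only in ΔG‡ and that the temperature is 25 °C. These assumptions, and the function names, are illustrative and are not taken from the text.

```python
import math

R = 8.314  # gas constant in J/(mol*K)

def rate_enhancement(barrier_drop_kj, temp_k=298.15):
    """Fold increase in rate when the activation energy falls by barrier_drop_kj (kJ/mol).

    Assumes rate is proportional to exp(-dG_act / RT) and nothing else changes.
    """
    return math.exp(barrier_drop_kj * 1000.0 / (R * temp_k))

def barrier_drop_for(fold, temp_k=298.15):
    """Reduction in activation energy (kJ/mol) needed for a given fold enhancement."""
    return R * temp_k * math.log(fold) / 1000.0

# Each extra factor of ten in rate costs the same fixed amount of activation energy.
for fold in (1e1, 1e10, 1e14):
    print(f"{fold:.0e}-fold enhancement: lower the barrier by {barrier_drop_for(fold):.1f} kJ/mol")
```

Under these assumptions each tenfold gain in rate corresponds to the same modest reduction in ΔG‡, which is why enhancements of many orders of magnitude do not require implausibly large changes in the barrier.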

Enzymes are, with a few exceptions we will consider later, proteins. Each enzyme protein is specific for the catalysis of a specific reaction, and each reaction in a cell is catalyzed by a different enzyme. Thousands of different types of enzymes are therefore required by each cell. The multiplicity of enzymes, their high specificity for reactants, and their susceptibility to regulation give cells the capacity to lower activation barriers selectively. This selectivity is crucial in the effective regulation of cellular processes.

The thousands of enzyme-catalyzed chemical reactions in cells are functionally organized into many different sequences of consecutive reactions called pathways, in which the product of one reaction becomes the reactant in the next (Fig. 1–11). Some of these sequences of enzyme-catalyzed reactions degrade organic nutrients into simple end products, in order to extract chemical energy and convert it into a form useful to the cell. Together these degradative, free-energy-yielding reactions are designated catabolism. Other enzyme-catalyzed pathways start from small precursor molecules and convert them to progressively larger and more complex molecules, including proteins and nucleic acids; such synthetic pathways invariably require the input of energy, and taken together represent anabolism. The network of enzyme-catalyzed pathways constitutes cellular metabolism.
Figure 1–12  (a) Structural formula and (b) ball-and-stick model for adenosine triphosphate (ATP). The removal of the terminal phosphate of ATP is highly exergonic, and this reaction is coupled to many endergonic reactions in the cell.
Figure 1–13  ATP is the chemical intermediate linking energy-releasing to energy-requiring cell processes. Its role in the cell is analogous to that of money in an economy: it is "earned/produced" in exergonic reactions and "spent/consumed" in endergonic ones.

Cells capture, store, and transport free energy in a chemical form. Adenosine triphosphate (ATP) (Fig. 1–12) functions as the major carrier of chemical energy in all cells. ATP carries energy between metabolic pathways by serving as the shared intermediate that couples endergonic reactions to exergonic ones. The terminal phosphate group of ATP is transferred to a variety of acceptor molecules, which are thereby activated for further chemical transformation. The adenosine diphosphate (ADP) that remains after the phosphate transfer is recycled to become ATP, at the expense of either chemical energy (during oxidative phosphorylation) or solar energy in photosynthetic cells (by the process of photophosphorylation). ATP is the major connecting link (the shared intermediate) between the catabolic and anabolic networks of enzyme-catalyzed reactions in the cell (Fig. 1–13).

These linked networks of enzyme-catalyzed reactions are virtually identical in all living organisms.
Figure 1–14  Regulation of a biosynthetic pathway by feedback inhibition. In the pathway by which isoleucine is formed in five steps from threonine (Fig. 1–11), the accumulation of the product isoleucine (F) causes inhibition of the first reaction in the pathway by binding to the enzyme catalyzing this reaction and reducing its activity. (The letters A to F represent the corresponding compounds shown in Fig. 1–11.)

Not only can living cells simultaneously synthesize thousands of different kinds of carbohydrate, fat, protein, and nucleic acid molecules and their simpler subunits, they can also do so in the precise proportions required by the cell. For example, when rapid cell growth occurs, the precursors of proteins and nucleic acids must be made in large quantities, whereas in nongrowing cells the requirement for these precursors is much reduced. Key enzymes in each metabolic pathway are regulated so that each type of precursor molecule is produced in a quantity appropriate to the current requirements of the cell. Consider the pathway shown in Figure 1–14 (see also Fig. 1–11), which leads to the synthesis of isoleucine (one of the amino acids, the monomeric subunits of proteins). If a cell begins to produce more isoleucine than is needed for protein synthesis, the unused isoleucine accumulates. High concentrations of isoleucine inhibit the catalytic activity of the first enzyme in the pathway, immediately slowing the production of the amino acid. Such negative feedback keeps the production and utilization of each metabolic intermediate in balance.
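
The balancing effect of feedback inhibition can be sketched numerically. The model below is a deliberate simplification and not the actual kinetics of the isoleucine pathway: it collapses the pathway into a single synthesis step, assumes the first enzyme's rate falls as v_max / (1 + P/K_i) when product P accumulates, and assumes the product is used at a rate proportional to its concentration. All parameter values and names are hypothetical.

```python
def final_product_level(v_max, k_i, k_use, feedback=True, dt=0.01, steps=20000):
    """Integrate dP/dt = synthesis - use with Euler steps and return the final P."""
    p = 0.0
    for _ in range(steps):
        # Assumed inhibition form: the first enzyme slows as product accumulates.
        synthesis = v_max / (1.0 + p / k_i) if feedback else v_max
        use = k_use * p  # demand for the product, e.g. by protein synthesis
        p += dt * (synthesis - use)
    return p

# Compare a cell with high demand for the product to one with low demand.
for k_use in (1.0, 0.1):
    with_fb = final_product_level(10.0, 1.0, k_use, feedback=True)
    without_fb = final_product_level(10.0, 1.0, k_use, feedback=False)
    print(f"demand {k_use}: with feedback {with_fb:.2f}, without feedback {without_fb:.2f}")
```

In this toy model, a drop in demand lets the product pile up in proportion when there is no feedback, whereas with feedback the accumulated product throttles its own synthesis and the rise is much smaller.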

Living cells also regulate the synthesis of their own catalysts, the enzymes. Thus a cell can switch off the synthesis of an enzyme required to make a given product whenever that product is available ready-made in the environment. These self-adjusting and self-regulating properties allow cells to maintain themselves in a dynamic steady state, despite fluctuations in the external environment.

Living cells are self-regulating chemical engines, adjusted for maximum economy.

The continued existence of a biological species requires that its genetic information be maintained in a stable form and, at the same time, expressed with very few errors. Effective storage and accurate expression of the genetic message define individual species, distinguish them from one another, and assure their continuity over successive generations.

Among the seminal discoveries of twentieth-century biology are the chemical nature and the three-dimensional structure of the genetic material, DNA. The sequence of deoxyribonucleotides in this linear polymer encodes the instructions for forming all other cellular components and provides a template for the production of identical DNA molecules to be distributed to progeny when a cell divides.

Figure 1–15  Two ancient scripts. (a) The Prism of Sennacherib, inscribed in about 700 B.C., describes in characters of the Assyrian language some historical events during the reign of King Sennacherib. The Prism contains about 20,000 characters, weighs about 50 kg, and has survived almost intact for about 2,700 years. (b) The single DNA molecule of the bacterium E. coli, seen leaking out of a disrupted cell, is hundreds of times longer than the cell itself and contains all of the encoded information necessary to specify the cell’s structure and functions. The bacterial DNA contains about 10 million characters (nucleotides), weighs less than 10⁻¹⁰ g, and has undergone only relatively minor changes during the past several million years. The black spots and white specks are artifacts of the preparation.

Perhaps the most remarkable of all the properties of living cells and organisms is their ability to reproduce themselves with nearly perfect fidelity for countless generations. This continuity of inherited traits implies constancy, over thousands or millions of years, in the structure of the molecules that contain the genetic information. Very few historical records of civilization, even those etched in copper or carved in stone, have survived for a thousand years (Fig. 1–15). But there is good evidence that the genetic instructions in living organisms have remained nearly unchanged over very much longer periods; many bacteria have nearly the same size, shape, and internal structure and contain the same kinds of precursor molecules and enzymes as those that lived a billion years ago.

Hereditary information is preserved in DNA, a long, thin organic polymer so fragile that it will fragment from the shear forces arising in a solution that is stirred or pipetted. A human sperm or egg, carrying the accumulated hereditary information of millions of years of evolution, transmits these instructions in the form of DNA molecules, in which the linear sequence of covalently linked nucleotide subunits encodes the genetic message.
Figure 1–16  The complementary structure of double-stranded DNA accounts for its accurate replication. DNA is a linear polymer of four subunits, the deoxyribonucleotides deoxyadenylate (A), deoxyguanylate (G), deoxycytidylate (C), and deoxythymidylate (T), joined covalently. Each nucleotide has the intrinsic ability, due to its precise three-dimensional structure, to associate very specifically but noncovalently with one other nucleotide: A always associates with its complement T, and G with its complement C. In the double-stranded DNA molecule, the sequence of nucleotides in one strand is complementary to the sequence in the other; wherever G occurs in strand 1, C occurs in strand 2; wherever A occurs in strand 1, T occurs in strand 2. The two strands of the DNA, held together by a large number of hydrogen bonds (represented here by vertical blue lines) between the pairs of complementary nucleotides, twist about each other to form the DNA double helix. In DNA replication, prior to cell division, the two strands of the original DNA separate and two new strands are synthesized, each with a sequence complementary to one of the original strands. The result is two double-helical DNA molecules, each identical to the original DNA.

The capacity of living cells to preserve their genetic material and to duplicate it for the next generation results from the structural complementarity between the two halves of the DNA molecule (Fig. 1–16). The basic unit of DNA is a linear polymer of four different monomeric subunits, deoxyribonucleotides (see Fig. 1–3), arranged in a precise linear sequence. It is this linear sequence that encodes the genetic information. Two of these polymeric strands are twisted about each other to form the DNA double helix, in which each monomeric subunit in one strand pairs specifically with the complementary subunit in the opposite strand. In the enzymatic replication or repair of DNA, one of the two strands serves as a template for the assembly of another, structurally complementary DNA strand. Before a cell divides, the two DNA strands separate and each serves as a template for the synthesis of a complementary strand, generating two identical double-helical molecules, one for each daughter cell. If one strand is damaged, continuity of information is assured by the information present on the other strand.
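
The logic of template-directed replication and repair can be sketched with strings standing in for strands. This is only an illustration of the pairing rule (A with T, G with C): it pairs the strands position by position, ignores strand direction and the enzymes involved, and uses a hypothetical eight-nucleotide sequence with "?" marking a damaged position.

```python
PAIR = {"A": "T", "T": "A", "G": "C", "C": "G"}

def complement(strand):
    """Return the strand that pairs, position by position, with the given strand."""
    return "".join(PAIR[base] for base in strand)

def replicate(duplex):
    """Separate the two strands and use each as a template for a new partner."""
    strand1, strand2 = duplex
    return (strand1, complement(strand1)), (complement(strand2), strand2)

def repair(damaged, intact_partner):
    """Fill in unreadable positions ('?') from the information in the partner strand."""
    return "".join(PAIR[p] if d == "?" else d for d, p in zip(damaged, intact_partner))

parent = ("ATGCCGTA", complement("ATGCCGTA"))
daughter1, daughter2 = replicate(parent)
print(daughter1 == parent and daughter2 == parent)
print(repair("AT?CC?TA", parent[1]))
```

Because either strand fully determines the other, both daughter duplexes match the parent, and a damaged strand can be restored from its intact partner.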

Genetic information is encoded in the linear sequence of four kinds of subunits of DNA.

The double-helical DNA molecule contains an internal template for its own replication and repair.

Figure 1–17  The gradual accumulation of mutations over long periods of time results in new biological species, each with a unique DNA sequence. At top is shown a short segment of a gene in a hypothetical progenitor organism. With the passage of time, changes in nucleotide sequence (mutations, indicated here by colored boxes) occur, one at a time, resulting in progeny with different DNA sequences. These mutant progeny themselves undergo occasional mutations, yielding their own progeny differing by two or more nucleotides from the original sequence.

Despite the near-perfect fidelity of genetic replication, infrequent, unrepaired mistakes in the replication process produce changes in the nucleotide sequence of DNA, representing a genetic mutation (Fig. 1–17). Incorrectly repaired damage to one of the DNA strands has the same effect. Mutations can change the instructions for producing cellular components. Many mutations are deleterious or even lethal to the organism; they may, for example, cause the synthesis of a defective enzyme that is not able to catalyze an essential metabolic reaction.

Occasionally the mutation better equips an organism or cell to survive in its environment. The mutant enzyme might, for example, have acquired a slightly different specificity, so that it is now able to use as a reactant some compound that the cell was previously unable to metabolize. If a population of cells were to find itself in an environment where that compound was the only available source of fuel, the mutant cell would have an advantage over the other, unmutated (wild-type) cells in the population. The mutant cell and its progeny would survive in the new environment, whereas wild-type cells would starve and be eliminated.

Chance genetic variations in individuals in a population, combined with natural selection (survival of the fittest individuals in a challenging or changing environment), have resulted in the evolution of an enormous variety of organisms, each adapted to life in a particular ecological niche.


Biochemistry has confirmed and greatly extended evolutionary theory. Carolus Linnaeus recognized the anatomic similarities and differences among living organisms and provided a framework for assessing the relatedness of different species. Charles Darwin gave us a unifying hypothesis to explain the phylogeny of modern organisms – the origin of different species from a common ancestor. Biochemistry has begun to reveal the molecular anatomy of cells of different species – the sequences of subunits in nucleic acids and proteins and the three-dimensional structures of individual molecules of nucleic acid and protein. There is a reasonable prospect that when the twenty-first century dawns, we will know the entire nucleotide sequence of all of the genes that make up the biological heritage of a human.

At the molecular level, evolution is the emergence over time of different sequences of nucleotides within genes. With new genetic sequences being experimentally determined almost daily, biochemists have an enormously rich treasury of evidence with which to analyze evolutionary relationships and to refine evolutionary theory. The molecular phylogeny derived from gene sequences is consistent with, but in many cases more precise than, the classical phylogeny based on macroscopic structures.

Molecular structures and mechanisms have been conserved in evolution even though organisms have continuously diverged at the level of gross anatomy. At the molecular level, the basic unity of life is readily apparent; crucial molecular structures and mechanisms are remarkably similar from the simplest to the most complex organisms. Biochemistry makes it possible to discover the unifying features common to all life. This book examines many of these features: the mechanisms for energy conservation, biosynthesis, gene replication, and gene expression.

Figure 1–18  Linear sequences of deoxyribonucleotides in DNA, arranged into units known as genes, are transcribed into ribonucleic acid (RNA) molecules with complementary ribonucleotide sequences. The RNA sequences are then translated into linear protein chains, which fold spontaneously into their native three-dimensional shapes. Individual proteins sometimes associate with other proteins to form supramolecular complexes, stabilized by numerous weak interactions.

The information in DNA is encoded as a linear (one-dimensional) sequence of the nucleotide units of DNA, but the expression of this information results in a three-dimensional cell. This change from one to three dimensions occurs in two phases. A linear sequence of deoxyribonucleotides in DNA codes (through the intermediary, RNA) for the production of a protein with a corresponding linear sequence of amino acids (Fig. 1–18). The protein folds itself into a particular three-dimensional shape, dictated by its amino acid sequence. The precise three-dimensional structure (native conformation) is crucial to the protein’s function as either catalyst or structural element. This principle emerges:

The linear sequence of amino acids in a protein leads to the acquisition of a unique three-dimensional structure by a self-assembly process.

Once a protein has folded into its native conformation, it may associate noncovalently with other proteins, or with nucleic acids or lipids, to form supramolecular complexes such as chromosomes, ribosomes, and membranes (Fig. 1–18). These complexes are in many cases self-assembling. The individual molecules of these complexes have specific, high-affinity binding sites for each other, and within the cell they spontaneously form functional complexes.

Individual macromolecules with specific affinity for other macromolecules self-assemble into supramolecular complexes.

The forces that provide stability and specificity to the three-dimensional structures of macromolecules and supramolecular complexes are mostly noncovalent interactions. These interactions, individually weak but collectively strong, include hydrogen bonds, ionic interactions among charged groups, van der Waals interactions, and hydrophobic interactions among nonpolar groups. These weak interactions are transient; individually they form and break in small fractions of a second. The transient nature of noncovalent interactions confers a flexibility on macromolecules that is critical to their function. Furthermore, the large number of noncovalent interactions in a single macromolecule makes it unlikely that at any given moment all the interactions will be broken; thus macromolecular structures are stable over time.
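
The argument that many weak interactions add up to a stable structure can be made concrete with a small probability estimate. The sketch assumes, purely for illustration, that each interaction is broken at any given instant with the same probability and independently of the others; real interactions in a macromolecule are neither identical nor fully independent, and the value 0.1 is hypothetical.

```python
def prob_all_broken(p_broken, n_interactions):
    """Chance that every one of n independent interactions is broken at the same instant."""
    return p_broken ** n_interactions

# A single interaction is often broken, but many at once almost never are.
for n in (1, 5, 20, 100):
    print(f"{n:3d} interactions: probability all are broken = {prob_all_broken(0.1, n):.1e}")
```

Even when any one interaction is broken a tenth of the time, the chance that all of them are broken together shrinks geometrically with their number, while each individual bond remains free to form and break.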

Three-dimensional biological structures combine the properties of flexibility and stability.

The flexibility and stability of the double-helical structure of DNA are due to the complementarity of its two strands and the many weak interactions between them. The flexibility of these interactions allows strand separation during DNA replication (see Fig. 1–16); the complementarity of the double helix is essential to genetic continuity.

Noncovalent interactions are also central to the specificity and catalytic efficiency of enzymes. Enzymes bind transition-state intermediates through numerous weak but precisely oriented interactions. Because the weak interactions are flexible, the complex survives the structural distortions as the reactant is converted into product.

The formation of noncovalent interactions provides the energy for self-assembly of macromolecules by stabilizing native conformations relative to unfolded, random forms. The native conformation of a protein is that in which the energetic advantages of forming weak interactions counterbalance the tendency of the protein chain to assume random forms. Given a specific linear sequence of amino acids and a specific set of conditions (temperature, ionic conditions, pH), a protein will assume its native conformation spontaneously, without a template or scaffold to direct the folding.

We can now summarize the various principles of the molecular logic of life:

A living cell is a self-contained, self-assembling, self-adjusting, self-perpetuating isothermal system of molecules that extracts free energy and raw materials from its environment.

The cell carries out many consecutive reactions promoted by specific catalysts, called enzymes, which it produces itself.

The cell maintains itself in a dynamic steady state, far from equilibrium with its surroundings. There is great economy of parts and processes, achieved by regulation of the catalytic activity of key enzymes.

Self-replication through many generations is ensured by the self-repairing, linear information-coding system. Genetic information encoded as sequences of nucleotide subunits in DNA and RNA specifies the sequence of amino acids in each distinct protein, which ultimately determines the three-dimensional structure and function of each protein.

Many weak (noncovalent) interactions, acting cooperatively, stabilize the three-dimensional structures of biomolecules and supramolecular complexes.

At no point in our examination of the molecular logic of living cells have we encountered any violation of known physical laws; nor have we needed to define new physical laws. The organic machinery of living cells functions within the same set of laws that governs the operation of inanimate machines, but the chemical reactions and regulatory processes of cells have been highly refined during evolution.

This set of principles has been most thoroughly validated in studies of unicellular organisms (such as the bacterium E. coli), which are exceptionally amenable to biochemical and genetic study. Although multicellular organisms must solve certain problems not encountered by unicellular organisms, such as the differentiation of the fertilized egg into specialized cell types, the same principles have been found to apply. Can such simple and mechanical statements apply to humans as well, with their extraordinary capacity for thought, language, and creativity? The pace of recent biochemical progress toward understanding such processes as gene regulation, cellular differentiation, communication among cells, and neural function has been extraordinarily fast, and is accelerating. The success of biochemical methods in solving and redefining these problems justifies the hope that the most complex functions of the most highly developed organism will eventually be explicable in molecular terms.

The relevant facts of biochemistry are many; the student approaching this subject for the first time may occasionally feel overwhelmed. Perhaps the most encouraging development in twentieth-century biology is the realization that, for all of the enormous diversity in the biological world, there is a fundamental unity and simplicity to life. The organizing principles, the biochemical unity, and the evolutionary perspective of diversity, provided at the molecular level, will serve as helpful frames of reference for the study of biochemistry.

Figure 2–1  The universal features of all living cells: a nucleus or nucleoid, a plasma membrane, and cytoplasm.
The cytosol is that portion of the cytoplasm that remains in the supernatant after centrifugation of a cell extract at 150,000 g for 1 h.
Chapter 2
Cells

Cells are the structural and functional units of all living organisms. The smallest organisms consist of single cells and are microscopic, whereas larger organisms are multicellular. The human body, for example, contains at least 10¹⁴ cells. Unicellular organisms are found in great variety throughout virtually every environment from Antarctica to hot springs to the inner recesses of larger organisms. Multicellular organisms contain many different types of cells, which vary in size, shape, and specialized function. Yet no matter how large and complex the organism, each of its cells retains some individuality and independence.

Despite their many differences, cells of all kinds share certain structural features (Fig. 2–1). The plasma membrane defines the periphery of the cell, separating its contents from the surroundings. It is composed of enormous numbers of lipid and protein molecules, held together primarily by noncovalent hydrophobic interactions (p. 18), forming a thin, tough, pliable, hydrophobic bilayer around the cell. The membrane is a barrier to the free passage of inorganic ions and most other charged or polar compounds; transport proteins in the plasma membrane allow the passage of certain ions and molecules. Other membrane proteins are receptors that transmit signals from the outside to the inside of the cell, or are enzymes that participate in membrane-associated reaction pathways.

Because the individual lipid and protein subunits of the plasma membrane are not covalently linked, the entire structure is remarkably flexible, allowing changes in the shape and size of the cell. As a cell grows, newly made lipid and protein molecules are inserted into its plasma membrane; cell division produces two cells, each with its own membrane. Growth and fission occur without loss of membrane integrity. In a reversal of the fission process, two separate membrane surfaces can fuse, also without loss of integrity. Membrane fusion and fission are central to mechanisms of transport known as endocytosis and exocytosis.

The internal volume bounded by the plasma membrane, the cytoplasm, is composed of an aqueous solution, the cytosol, and a variety of insoluble, suspended particles (Fig. 2–1). The cytosol is not simply a dilute aqueous solution; it has a complex composition and gel-like consistency. Dissolved in the cytosol are many enzymes and the RNA molecules that encode them; the monomeric subunits (amino acids and nucleotides) from which these macromolecules are assembled; hundreds of small organic molecules called metabolites, intermediates in biosynthetic and degradative pathways; coenzymes, compounds of intermediate molecular weight (Mr 200 to 1,000) that are essential participants in many enzyme-catalyzed reactions; and inorganic ions.

Among the particles suspended in the cytosol are supramolecular complexes and, in higher organisms but not in bacteria, a variety of membrane-bounded organelles in which specialized metabolic machinery is localized. Ribosomes, complexes of over 50 different protein and RNA molecules, are small particles, 18 to 22 nm in diameter. Ribosomes are the enzymatic machines on which protein synthesis occurs; they often occur in clusters called polysomes (polyribosomes) held together by a strand of messenger RNA. Also present in the cytoplasm of many cells are granules containing stored nutrients such as starch and fat. Nearly all living cells have either a nucleus or a nucleoid, in which the genome (the complete set of genes, composed of DNA) is stored and replicated. The DNA molecules are always very much longer than the cells themselves, and are tightly folded and packed within the nucleus or nucleoid as supramolecular complexes of DNA with specific proteins. The bacterial nucleoid is not separated from the cytoplasm by a membrane, but in higher organisms, the nuclear material is enclosed within a double membrane, the nuclear envelope. Cells with nuclear envelopes are called eukaryotes (Greek eu, "true," and karyon, "nucleus"); those without nuclear envelopes – bacterial cells – are prokaryotes (Greek pro, "before"). Unlike bacteria, eukaryotes have a variety of other membrane-bounded organelles in their cytoplasm, including mitochondria, lysosomes, endoplasmic reticulum, Golgi complexes, and, in photosynthetic cells, chloroplasts.

In this chapter we review briefly the evolutionary relationships among some commonly studied cells and organisms, and the structural features that distinguish cells of various types. Our main focus is on eukaryotic cells. Also discussed in brief are the cellular parasites known as viruses.

Figure 2–2  Smaller cells have larger ratios of surface area to volume, and their interiors are therefore more accessible to substances diffusing into the cell through the surface. When the large cube (representing a large cell) is subdivided into many smaller cubes (cells), the total surface area increases greatly without a change in the total volume, and the surface-to-volume ratio increases accordingly.
Figure 2–3  Convolutions of the plasma membrane, or long, thin extensions of the cytoplasm, increase the surface-to-volume ratio of cells. (a) Cells of the intestinal mucosa (the inner lining of the small intestine) are covered with microvilli, increasing the area for absorption of nutrients from the intestine. (b) Neurons of the hippocampus of the rat brain are several millimeters long, but the long extensions (axons) are only about 10 nm wide.

Most cells are of microscopic size. Animal and plant cells are typically 10 to 30 μm in diameter, and many bacteria are only 1 to 2 μm long.

What limits the dimensions of a cell? The lower limit is probably set by the minimum number of each of the different biomolecules required by the cell. The smallest complete cells, certain bacteria known collectively as mycoplasma, are 300 nm in diameter and have a volume of about 10⁻¹⁴ mL. A single ribosome is about 20 nm in its longest dimension, so a few ribosomes take up a substantial fraction of the cell’s volume. In a cell of this size, a 1 μM solution of a compound represents only about 6 molecules (10⁻⁶ mol/L × 10⁻¹⁷ L × 6 × 10²³ molecules/mol).
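
The figures in this paragraph can be checked with a few lines of arithmetic. The sketch treats the cell as a smooth sphere, which is an assumption made only for the estimate, and uses Avogadro's number to convert a molar concentration into a count of molecules; the function names are illustrative.

```python
import math

AVOGADRO = 6.022e23  # molecules per mole

def sphere_volume_liters(diameter_nm):
    """Volume in liters of a sphere with the given diameter in nanometers."""
    radius_cm = diameter_nm * 1e-7 / 2.0              # 1 nm = 1e-7 cm
    volume_ml = (4.0 / 3.0) * math.pi * radius_cm ** 3  # 1 cm^3 = 1 mL
    return volume_ml * 1e-3                           # 1 mL = 1e-3 L

def molecule_count(conc_molar, volume_liters):
    """Number of solute molecules in the given volume at the given molar concentration."""
    return conc_molar * volume_liters * AVOGADRO

cell_volume = sphere_volume_liters(300.0)
print(f"cell volume: {cell_volume * 1e3:.2e} mL")
print(f"molecules at 1 uM: {molecule_count(1e-6, cell_volume):.1f}")
print(f"molecules at 1 mM: {molecule_count(1e-3, cell_volume):.0f}")
```

At micromolar concentrations a cell this small holds only a handful of molecules of any one solute, which is the sense in which the minimum complement of biomolecules sets a lower limit on cell size.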

The upper limit of cell size is set by the rate of diffusion of solute molecules in aqueous systems. The availability of fuels and essential nutrients from the surrounding medium is sometimes limited by the rate of their diffusion to all regions of the cell. A bacterial cell that depends upon oxygen-consuming reactions for energy production (an aerobic cell) must obtain molecular oxygen (O2) from the surrounding medium by diffusion through its plasma membrane. The cell is so small, and the ratio of its surface area to its volume is so large, that every part of its cytoplasm is easily reached by O2 diffusing into the cell. As the size of a cell increases, its surface-to-volume ratio decreases (Fig. 2–2), until metabolism consumes O2 faster than diffusion can supply it. Aerobic metabolism thus becomes impossible as cell size increases beyond a certain point, placing a theoretical upper limit on the size of the aerobic cell.
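
The geometric point, that surface area grows more slowly than volume, can be verified with the cube model of Figure 2–2. The sketch below idealizes cells as cubes; the edge lengths are hypothetical values chosen to resemble a large cell and a bacterium.

```python
def cube_surface_to_volume(edge):
    """Surface-to-volume ratio of a cube: 6 * edge^2 / edge^3 = 6 / edge."""
    return 6.0 * edge ** 2 / edge ** 3

def subdivide(edge, n):
    """Cut a cube into n**3 equal smaller cubes; return (total surface area, total volume)."""
    small_edge = edge / n
    count = n ** 3
    return count * 6.0 * small_edge ** 2, count * small_edge ** 3

# Ratio for a 20-unit cube versus a 2-unit cube (units arbitrary, e.g. micrometers).
print(cube_surface_to_volume(20.0), cube_surface_to_volume(2.0))

# Subdividing a large cube leaves the volume unchanged but multiplies the surface area.
area, volume = subdivide(30.0, 10)
print(area / (6.0 * 30.0 ** 2), volume / 30.0 ** 3)
```

Cutting each edge into n pieces multiplies the total surface area by n while the total volume stays fixed, so the surface-to-volume ratio scales inversely with cell dimension.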

There are interesting exceptions to this generalization that cells must be small. The giant alga Nitella has cells several centimeters long. To assure the delivery of nutrients, metabolites, and genetic information (RNA) to all of its parts, each cell is vigorously “stirred” by active cytoplasmic streaming (p. 43). The shape of a cell can also help to compensate for its large size. A smooth sphere has the smallest surface-to-volume ratio possible for a given volume. Many large cells, although roughly spherical, have highly convoluted surfaces (Fig. 2–3a), creating larger surface areas for the same volume and thus facilitating the uptake of fuels and nutrients and release of waste products to the surrounding medium. Other large cells (neurons, for example) have large surface-to-volume ratios because they are long and thin, star-shaped, or highly branched (Fig. 2–3b), rather than spherical.
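The effect of shape can be illustrated by comparing a sphere with a long, thin cylinder of the same volume. The following sketch uses a hypothetical volume of 1,000 μm³ and a hypothetical cylinder radius of 0.5 μm, chosen only for illustration; neither value comes from the text.

```python
import math


def sphere_area(volume: float) -> float:
    """Surface area of a sphere that encloses the given volume."""
    radius = (3.0 * volume / (4.0 * math.pi)) ** (1.0 / 3.0)
    return 4.0 * math.pi * radius ** 2


def cylinder_area(volume: float, radius: float) -> float:
    """Total surface area of a closed cylinder with the given volume and radius."""
    length = volume / (math.pi * radius ** 2)
    side = 2.0 * math.pi * radius * length
    caps = 2.0 * math.pi * radius ** 2
    return side + caps


volume_um3 = 1000.0  # hypothetical cell volume, cubic micrometers
print(f"sphere:        {sphere_area(volume_um3):.0f} um^2")
print(f"thin cylinder: {cylinder_area(volume_um3, 0.5):.0f} um^2")
```

Under these assumptions the thin cylinder exposes several times more surface than the sphere, which is the geometric advantage enjoyed by long, thin cells.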

Because all living cells have evolved from the same progenitors, they share certain fundamental similarities. Careful biochemical study of just a few cells, however different in biochemical details and varied in superficial appearance, ought to yield general principles applicable to all cells and organisms. The burgeoning knowledge in biology in the past 150 years has supported these propositions over and over again. Certain cells, tissues, and organisms have proved more amenable to experimental studies than others. Knowledge in biochemistry, and much of the information in this book, continues to be derived from a few representative tissues and organisms, such as the bacterium Escherichia coli, the yeast Saccharomyces, photosynthetic algae, spinach leaves, the rat liver, and the skeletal muscle of several different vertebrates.

In the isolation of enzymes and other cellular components, it is ideal if the experimenter can begin with a plentiful and homogeneous source of the material. The component of interest (such as an enzyme or nucleic acid) often represents only a minuscule fraction of the total material, and many grams of starting material are needed to obtain a few micrograms of the purified component. Certain types of physical and chemical studies of biomolecules are precluded if only microgram quantities of the pure substance are available. A homogeneous source of an enzyme or nucleic acid, in which all of the cells are genetically and biochemically identical, leaves no doubt about which cell type yielded the purified component, and makes it safer to extrapolate the results of in vitro studies to the situation in vivo. A large culture of bacterial or protistan cells (E. coli, Saccharomyces, or Chlamydomonas, for example), all derived by division from the same parent and therefore genetically identical, meets the requirement for a plentiful and homogeneous source. Individual tissues from laboratory animals (rat liver, pig brain, rabbit muscle) are plentiful sources of similar, though not identical, cells. Some animal and plant cells proliferate in cell culture, producing populations of identical (cloned) cells in quantities suitable for biochemical analysis.

Genetic mutants, in which a defect in a single gene produces a specific functional defect in the cell or organism, are extremely useful in establishing that a certain cellular component is essential to a particular cellular function. Because it is technically much simpler to produce and detect mutants in bacteria and yeast, these organisms (E. coli and Saccharomyces cerevisiae, for example) have been favorite experimental targets for biochemical geneticists.

An organism that is easy to culture in the laboratory, with a short generation time, offers significant advantages to the research biochemist. An organism that requires only a few simple precursor molecules in its growth medium can be cultured in the presence of a radioisotopically labeled precursor, and the metabolic fate of that precursor can then be conveniently traced by following the incorporation of the radioactive atoms into its metabolic products. The short generation time (minutes or hours) of microorganisms allows the investigator to follow a labeled precursor or a genetic defect through many generations in a few days. In higher organisms with generation times of months or years, this is virtually impossible.

Some highly specialized tissues of multicellular organisms are remarkably enriched in some particular component related to their specialized function. Vertebrate skeletal muscle is a rich source of actin and myosin; pancreatic secretory cells contain high concentrations of rough endoplasmic reticulum; sperm cells are rich in DNA and in flagellar proteins; liver (the major biosynthetic organ of vertebrates) contains high concentrations of many enzymes of biosynthetic pathways; spinach leaves contain large numbers of chloroplasts; and so on. For studies on such specific components or processes, biochemists commonly choose a specialized tissue for their experimental systems.

Sometimes simplicity of structure or function makes a particular cell or organism attractive as an experimental system. For studies of plasma membrane structure and function, the mature erythrocyte (red blood cell) has been a favorite; it has no internal membranes to complicate purification of the plasma membrane. Some bacterial viruses (bacteriophages) have few genes. Their DNA molecules are therefore smaller and much simpler than those of humans or corn plants, and it has proved easier to study replication in these viruses than in human or corn chromosomes.

The biochemical description of living cells in this book is a composite, based on studies of many types of cells. The biochemist must always exercise caution in generalizing from results obtained in studies of selected cells, tissues, and organisms, and in relating what is observed in vitro to what happens within the living cell.

Figure 2–4  Organisms can be classified according to their source of energy (shaded red) and the form in which they obtain carbon atoms (shaded blue) for the synthesis of cellular material. Organic compounds are both energy source and carbon source for chemoheterotrophs such as ourselves. Some, but not all, chemoheterotrophs consume O2 and produce CO2, and some photoautotrophs produce O2 (shaded green).
Figure 2–5  Landmarks in the evolution of life on earth.

All of the organisms alive today are believed to have evolved from ancient, unicellular progenitors. Two large groups of extant prokaryotes evolved from these early forms: archaebacteria (Greek, arché, "origin") and eubacteria. Eubacteria inhabit the soil, surface waters, and the tissues of other living or decaying organisms. Most common and well-studied bacteria, including E. coli and the cyanobacteria (formerly called blue–green algae), are eubacteria. The archaebacteria are more recently discovered and less well studied. They inhabit more extreme environments – salt brines, hot acid springs, bogs, and the deep regions of the ocean.

Within each of these two large groups of bacteria are subgroups distinguished by the habitats to which they are best adapted. In some habitats there is a plentiful supply of oxygen, and the resident organisms live by aerobic metabolism; their catabolic processes ultimately result in the transfer of electrons from fuel molecules to oxygen. Other environments are virtually devoid of oxygen, forcing resident organisms to conduct their catabolic business without it. Many of the organisms that have evolved in these anaerobic environments are obligate anaerobes; they die when exposed to oxygen.

All organisms, including bacteria, can be classified as either chemotrophs (those obtaining their energy from a chemical fuel) or phototrophs (those using sunlight as their primary energy source). Certain organisms can synthesize some or all of their monomeric subunits, metabolic intermediates, and macromolecules from very simple starting materials such as CO2 and NH3; these are the autotrophs. Others must acquire some of their nutrients from the environment preformed (by autotrophic organisms, for example); these are heterotrophs. There are therefore four general modes of obtaining fuel and energy, and four general groups of organisms distinguished by these modes: chemoheterotrophs, chemoautotrophs, photoheterotrophs, and photoautotrophs (Fig. 2–4).
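The four groups arise from two independent choices: the energy source and the carbon source. A minimal sketch of that two-by-two scheme follows; the function name and the string labels used as keys are illustrative assumptions, not terminology from the text.

```python
def classify(energy_source: str, carbon_source: str) -> str:
    """Name the nutritional group for a given energy source and carbon source.

    energy_source: "chemical" (chemical fuel) or "light" (sunlight)
    carbon_source: "organic" (preformed compounds) or "co2" (simple precursors)
    """
    prefixes = {"chemical": "chemo", "light": "photo"}
    suffixes = {"organic": "heterotroph", "co2": "autotroph"}
    return prefixes[energy_source] + suffixes[carbon_source]


# Enumerate all four combinations.
for energy in ("chemical", "light"):
    for carbon in ("organic", "co2"):
        print(f"{energy:8s} + {carbon:7s} -> {classify(energy, carbon)}")
```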

As shown in Figure 2–5, the earliest cells probably arose about 3.5 billion (3.5 × 10⁹) years ago in the rich mixture of organic compounds, the “primordial soup”, of prebiotic times; they were almost certainly chemoheterotrophs. The organic compounds were originally synthesized from such components of the early earth’s atmosphere as CO, CO2, N2, and CH4 by the nonbiological actions of volcanic heat and lightning (Chapter 3). Primitive heterotrophs gradually acquired the capability to derive energy from certain compounds in their environment and to use that energy to synthesize more and more of their own precursor molecules, thereby becoming less dependent on outside sources of these molecules – less extremely heterotrophic. A very significant evolutionary event was the development of pigments capable of capturing visible light from the sun and using the energy to reduce or “fix” CO2 into more complex organic compounds. The original electron (hydrogen) donor for these photosynthetic organisms was probably H2S, yielding elemental sulfur as the byproduct, but at some point cells developed the enzymatic capacity to use H2O as the electron donor in photosynthetic reactions, producing O2. The cyanobacteria are the modern descendants of these early photosynthetic O2 producers.

The atmosphere of the earth in the earliest stages of biological evolution was nearly devoid of O2, and the earliest cells were therefore anaerobic. With the rise of O2-producing photosynthetic cells, the earth’s atmosphere became progressively richer in O2, allowing the evolution of aerobic organisms, which obtained energy by passing electrons from fuel molecules to O2 (that is, by oxidizing organic compounds). Because electron transfers involving O2 yield energy (they are very exergonic; see Chapter 1), aerobic organisms enjoyed an energetic advantage over their anaerobic counterparts when both competed in an environment containing O2. This advantage translated into the predominance of aerobic organisms in O2-rich environments.

Modern bacteria inhabit almost every ecological niche in the biosphere, and there are bacterial species capable of using virtually every type of organic compound as a source of carbon and energy. Perhaps three-fourths of all the living matter on the earth consists of microscopic organisms, most of them bacteria.

Bacteria play an important role in the biological exchanges of matter and energy. Photosynthetic bacteria in both fresh and marine waters trap solar energy and use it to generate carbohydrates and other cell materials, which are in turn used as food by other forms of life. Some bacteria can capture molecular nitrogen (N2) from the atmosphere and use it to form biologically useful nitrogenous compounds, a process known as nitrogen fixation. Because animals and most plants cannot do this, bacteria form the starting point of many food chains in the biosphere. They also participate as ultimate consumers, degrading the organic structures of dead plants and animals and recycling the end products to the environment.

Figure 2–6  Common structural features of bacterial cells. Because of differences in cell envelope structure, some eubacteria (gram-positive bacteria) retain Gram’s stain, and others (gram-negative bacteria) do not. E. coli is gram-negative. Cyanobacteria are also eubacteria, but are distinguished by their extensive internal membrane system, in which photosynthetic pigments are localized.

Bacterial cells share certain common structural features, but also show group-specific specializations (Fig. 2–6). E. coli is a usually harmless inhabitant of the intestinal tract of human beings and many other mammals. The E. coli cell is about 2 μm long and a little less than 1 μm in diameter. It has a protective outer membrane and an inner plasma membrane that encloses the cytoplasm and the nucleoid. Between the inner and outer membranes is a thin but strong layer of peptidoglycans (sugar polymers cross-linked by amino acids), which gives the cell its shape and rigidity. The plasma membrane and the layers outside it constitute the cell envelope. Differences in the cell envelope account for the different affinities for the dye Gentian violet, which is the basis for Gram’s stain; gram-positive bacteria retain the dye, and gram-negative bacteria do not. The outer membrane of E. coli, like that of other gram-negative eubacteria, is similar to the plasma membrane in structure but is different in composition. In gram-positive bacteria (Bacillus subtilis and Staphylococcus aureus, for example) there is no outer membrane, and the peptidoglycan layer surrounding the plasma membrane is much thicker than that in gram-negative bacteria. The plasma membranes of eubacteria consist of a thin bilayer of lipid molecules penetrated by proteins. Archaebacterial membranes have a similar architecture, although their lipids differ from those of the eubacteria.

The plasma membrane contains proteins capable of transporting certain ions and compounds into the cell and carrying products and waste out. Also in the plasma membrane of most eubacteria are electron-carrying proteins (cytochromes) essential in the formation of ATP from ADP (Chapter 1). In the photosynthetic bacteria, internal membranes derived from the plasma membrane contain chlorophyll and other light-trapping pigments.

From the outer membrane of E. coli cells and some other eubacteria protrude short, hairlike structures called pili, by which cells adhere to the surfaces of other cells. Strains of E. coli and other motile bacteria have one or more long flagella, which can propel the bacterium through its aqueous surroundings. Bacterial flagella are thin, rigid, helical rods, 10 to 20 nm thick. They are attached to a protein structure that spins in the plane of the cell surface, rotating the flagellum.

The cytoplasm of E. coli contains about 15,000 ribosomes, thousands of copies of each of several thousand different enzymes, numerous metabolites and cofactors, and a variety of inorganic ions. Under some conditions, granules of polysaccharides or droplets of lipid accumulate. The nucleoid contains a single, circular molecule of DNA. Although the DNA molecule of an E. coli cell is almost 1,000 times longer than the cell itself, it is packaged with proteins and tightly folded into the nucleoid, which is less than 1 μm in its longest dimension. As in all bacteria, no membrane surrounds the genetic material. In addition to the DNA in the nucleoid, the cytoplasm of most bacteria contains many smaller, circular segments of DNA called plasmids. These nonessential segments of DNA are especially amenable to experimental manipulation and are extremely useful to the molecular geneticist. In nature, some plasmids confer resistance to toxins and antibiotics in the environment.
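The scale of this packaging problem can be estimated with a rough calculation. The sketch below rests on two assumptions that are not stated in the text: a genome of about 4.6 million base pairs and a rise of 0.34 nm per base pair for double-helical DNA. The 2 μm cell length is the value given earlier for E. coli.

```python
BASE_PAIRS = 4.6e6      # assumed genome size, base pairs (not from the text)
RISE_PER_BP_NM = 0.34   # assumed rise per base pair of the double helix, nm
CELL_LENGTH_UM = 2.0    # approximate E. coli cell length, micrometers


def contour_length_um(base_pairs: float, rise_nm: float = RISE_PER_BP_NM) -> float:
    """Stretched-out length of a DNA molecule, in micrometers."""
    return base_pairs * rise_nm / 1000.0  # 1000 nm = 1 um


dna_length_um = contour_length_um(BASE_PAIRS)
print(f"DNA length: about {dna_length_um / 1000.0:.1f} mm")
print(f"ratio to cell length: about {dna_length_um / CELL_LENGTH_UM:.0f}-fold")
```

With these assumed values the DNA is on the order of a millimeter long, several hundred to a thousand times the length of the cell that contains it.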

There is a primitive division of labor within the bacterial cell. The cell envelope regulates the flow of materials into and out of the cell, and protects the cell from noxious environmental agents. The plasma membrane and the cytoplasm contain a variety of enzymes essential to energy metabolism and the synthesis of precursor molecules; the ribosomes manufacture proteins; and the nucleoid stores and transmits genetic information. Most bacteria lead existences that are nearly independent of other cells, but some bacterial species tend to associate in clusters or filaments, and a few (the myxobacteria, for example) demonstrate primitive social behavior. Only eukaryotic cells, however, form true multicellular organisms with a division of labor among cell types.

Fossils older than 1.5 billion years are limited to those from small and relatively simple organisms, similar in size and shape to modern prokaryotes. Starting about 1.5 billion years ago, the fossil record begins to show evidence of larger and more complex organisms, probably the earliest eukaryotic cells (see Fig. 2–5). Details of the evolutionary path from prokaryotes to eukaryotes cannot be deduced from the fossil record alone, but morphological and biochemical comparison of modern organisms has suggested a reasonable sequence of events consistent with the fossil evidence.

Figure 2–7  One view of how modern plants, animals, fungi, protists, and bacteria share a common evolutionary precursor.

Three major changes must have occurred as prokaryotes gave rise to eukaryotes (Fig. 2–7). First, as cells acquired more DNA (Table 2–1), mechanisms evolved to fold it compactly into discrete complexes with specific proteins and to divide it equally between daughter cells at cell division. These DNA-protein complexes, chromosomes (Greek chroma, "color," and soma, "body"), become especially compact at the time of cell division, when they can be visualized with the light microscope as threads of chromatin. Second, as cells became larger, a system of intracellular membranes developed, including a double membrane surrounding the DNA. This membrane segregated the nuclear process of RNA synthesis using a DNA template from the cytoplasmic process of protein synthesis on ribosomes. Finally, primitive eukaryotic cells, which were incapable of photosynthesis or of aerobic metabolism, pooled their assets with those of aerobic bacteria or photosynthetic bacteria to form symbiotic associations that became permanent. Some aerobic bacteria evolved into the mitochondria of modern eukaryotes, and some photosynthetic cyanobacteria became the chloroplasts of modern plant cells. Prokaryotic and eukaryotic cells are compared in Table 2–2.

With the rise of primitive eukaryotic cells, further evolution led to a tremendous diversity of unicellular eukaryotic organisms (protists). Some of these (those with chloroplasts) resembled modern photosynthetic protists such as Euglena and Chlamydomonas; other, nonphotosynthetic protists were more like Paramecium or Dictyostelium. Unicellular eukaryotes are abundant, and the cells of all multicellular animals, plants, and fungi are eukaryotic; there are only a few thousand prokaryotic species, but millions of species of eukaryotic organisms.

Figure 2–8  Schematic illustration of the two types of eukaryotic cell: a representative animal cell (a) and a representative plant cell (b).

Typical eukaryotic cells (Fig. 2–8) are much larger than prokaryotic cells – commonly 10 to 30 μm in diameter, with cell volumes 1,000 to 10,000 times larger than those of bacteria. The distinguishing characteristic of eukaryotes is the nucleus with a complex internal structure, surrounded by a double membrane. The other striking difference between eukaryotes and prokaryotes is that eukaryotes contain a number of other membrane-bounded organelles. The following sections describe the structures and roles of the components of eukaryotic cells in more detail.

Figure 2–9  Proteins in the plasma membrane serve as transporters, signal receptors, and ion channels. Extracellular signals are amplified by receptors, because binding of a single ligand molecule to the surface receptor causes many molecules of an intracellular signal molecule to be formed, or many ions to flow through the opened channel. Transporters carry substances into and out of the cell, but do not act as signal amplifiers.

The external surface of a cell is in contact with other cells, the extracellular fluid, and the solutes, nutrient molecules, hormones, neurotransmitters, and antigens in that fluid. The plasma membranes of all cells contain a variety of transporters, proteins that span the width of the membrane and carry nutrients into and waste products out of the cell. Cells also have surface membrane proteins (signal receptors) that present highly specific binding sites for extracellular signaling molecules (receptor ligands). When an external ligand binds to its specific receptor, the receptor protein transduces the signal carried by that ligand into an intracellular message (Fig. 2–9). For example, some surface receptors are associated with ion channels that open when the receptor is occupied; others span the membrane and activate or inhibit cellular enzymes on the inner membrane surface. Whatever the mode of signal transduction, surface receptors characteristically act as signal amplifiers – a single ligand molecule bound to a single receptor may cause the flux of thousands of ions through an opened channel, or the synthesis of thousands of molecules of an intracellular messenger molecule by an activated enzyme.

Some surface receptors recognize ligands of low molecular weight, and others recognize macromolecules. For example, binding of acetylcholine (Mr 146) to its receptor begins a cascade of cellular events that underlie the transmission of signals for muscle contraction. Blood proteins (Mr > 20,000) that carry lipids (lipoproteins) are recognized by specific cell surface receptors and then transported into the cells. Antigens (proteins, viruses, or bacteria, recognized by the immune system as foreign) bind to specific receptors and trigger the production of antibodies. During the development of multicellular organisms, neighboring cells influence each other’s developmental paths, as signal molecules from one cell type react with receptors of other cells. Thus the surface membrane of a cell is a complex mosaic of different kinds of highly specific “molecular antennae” through which cells receive, amplify, and react to external signals.

Most cells of higher plants have a cell wall outside the plasma membrane (Fig. 2–8b), which serves as a rigid, protective shell. The cell wall, composed of cellulose and other carbohydrate polymers, is thick but porous. It allows water and small molecules to pass readily, but swelling of the cell due to the accumulation of water is resisted by the rigidity of the wall.

Endocytosis is a mechanism for transporting components of the surrounding medium deep into the cytoplasm. In this process (Fig. 2–10), a region of the plasma membrane invaginates, enclosing a small volume of extracellular fluid within a bud that pinches off inside the cell by membrane fission. The resulting small vesicle (endosome) can move into the interior of the cell, delivering its contents to another organelle bounded by a single membrane (a lysosome, for example; see p. 34) by fusion of the two membranes. The endosome thus serves as an intracellular extension of the plasma membrane, effectively allowing intimate contact between components of the extracellular medium and regions deep within the cytoplasm, which could not be reached by diffusion alone. Phagocytosis is a special case of endocytosis, in which the material carried into the cell (within a phagosome) is particulate, such as a cell fragment or even another, smaller cell. The inverse of endocytosis is exocytosis (Fig. 2–10), in which a vesicle in the cytoplasm moves to the inside surface of the plasma membrane and fuses with it, releasing the vesicular contents outside the membrane. Many proteins destined for secretion into the extracellular space are released by exocytosis after being packaged into secretory vesicles.

Figure 2–10  The endomembrane system includes the nuclear envelope, endoplasmic reticulum, Golgi complex, and several types of small vesicles. This system encloses a compartment (the lumen) distinct from the cytosol. Contents of the lumen move from one region of the endomembrane system to another as small transport vesicles bud from one component and fuse with another. High-magnification electron micrographs of a sectioned cell show rough endoplasmic reticulum, studded with ribosomes, smooth endoplasmic reticulum, and the Golgi complex.
The endomembrane system is dynamic; newly synthesized proteins move into the lumen of the rough endoplasmic reticulum and thus to the smooth endoplasmic reticulum, then to the Golgi complex via transport vesicles. In the Golgi complex, molecular "addresses" are added to specific proteins to direct them to the cell surface, lysosomes, or secretory vesicles. The contents of secretory vesicles are released from the cell by exocytosis. Endocytosis and phagocytosis bring extracellular materials into the cell. Fusion of endosomes (or phagosomes) with lysosomes, which are full of digestive enzymes, results in the degradation of the extracellular materials.

The small transport vesicles moving to and from the plasma membrane in exocytosis and endocytosis are parts of a dynamic system of intracellular membranes (Fig. 2–10), which includes the endoplasmic reticulum, the Golgi complexes, the nuclear envelope, and a variety of small vesicles such as lysosomes and peroxisomes. Although generally represented as discrete and static elements, these structures are in fact in constant flux, with membrane vesicles continually budding from one of the structures and moving to and merging with another.

The endoplasmic reticulum is a highly convoluted, three-dimensional network of membrane-enclosed spaces extending throughout the cytoplasm and enclosing a subcellular compartment (the lumen of the endoplasmic reticulum) separate from the cytoplasm. The many flattened branches (cisternae) of this compartment are continuous with each other and with the nuclear envelope. In cells specialized for the secretion of proteins into the extracellular space, such as the pancreatic cells that secrete the hormone insulin, the endoplasmic reticulum is particularly prominent. The ribosomes that synthesize proteins destined for export attach to the outer (cytoplasmic) surface of the endoplasmic reticulum, and the secretory proteins are passed through the membrane into the lumen as they are synthesized. Proteins destined for sequestration within lysosomes, or for insertion into the nuclear or plasma membranes, are also synthesized on ribosomes attached to the endoplasmic reticulum. By contrast, proteins that will remain and function within the cytosol are synthesized on cytoplasmic ribosomes unassociated with the endoplasmic reticulum.

The attachment of thousands of ribosomes (usually in regions of large cisternae) gives the rough endoplasmic reticulum its granular appearance (Fig. 2–10) and thus its name. In other regions of the cell, the endoplasmic reticulum is free of ribosomes. This smooth endoplasmic reticulum, which is physically continuous with the rough endoplasmic reticulum, is the site of lipid biosynthesis and of a variety of other important processes, including the metabolism of certain drugs and toxic compounds. Smooth endoplasmic reticulum is generally tubular, in contrast to the long, flattened cisternae typical of rough endoplasmic reticulum. In some tissues (skeletal muscle, for example) the endoplasmic reticulum is specialized for the storage and rapid release of calcium ions. Ca2+ release is the trigger for many cellular events, including muscle contraction.

Nearly all eukaryotic cells have characteristic clusters of membrane vesicles called dictyosomes. Several connected dictyosomes constitute a Golgi complex. A Golgi complex (also called Golgi apparatus) is most commonly seen as a stack of flattened membrane vesicles (cisternae) (Fig. 2–10). Near the ends of these cisternae are numerous, much smaller, spherical vesicles (transport vesicles) that bud off the edges of the cisternae.

The Golgi complex is asymmetric, structurally and functionally. The cis side faces the rough endoplasmic reticulum, and the trans side, the plasma membrane; between these are the medial elements. Proteins, during their synthesis on ribosomes bound to the rough endoplasmic reticulum, are inserted into the interior (lumen) of the cisternae. Small membrane vesicles containing the newly synthesized proteins bud from the endoplasmic reticulum and move to the Golgi complex, fusing with the cis side. As the proteins pass through the Golgi complex to the trans side, enzymes in the complex modify the protein molecules by adding sulfate, carbohydrate, or lipid moieties to side chains of certain amino acids. One of the functions of this modification of a newly synthesized protein is to "address" it to its proper destination as it leaves the Golgi complex in a transport vesicle budding from the trans side. Certain proteins are enclosed in secretory vesicles, eventually to be released from the cell by exocytosis. Others are targeted for intracellular organelles such as lysosomes, or for incorporation into the plasma membrane during cell growth.

Lysosomes, found in the cytoplasm of animal cells, are spherical vesicles bounded by a single membrane. They are usually about 1 μm in diameter, about the size of a small bacterium (Fig. 2–10). Lysosomes contain enzymes capable of digesting proteins, polysaccharides, nucleic acids, and lipids. They function as cellular recycling centers for complex molecules brought into the cell by endocytosis, fragments of foreign cells brought in by phagocytosis, or worn-out organelles from the cell’s own cytoplasm. These materials selectively enter the lysosomes by fusion of the lysosomal membrane with endosomes, phagosomes, or defective organelles, and are then degraded to their simple components (amino acids, monosaccharides, fatty acids, etc.), which are released into the cytosol to be recycled into new cellular components or further catabolized.

The degradative enzymes within lysosomes would be harmful if not confined by the lysosomal membrane; they would be free to act on all cellular components. The lysosomal compartment is more acidic (pH ≤ 5) than the cytoplasm (pH ≈ 7); the acidity is due to the action of an ATP-fueled proton pump in the lysosomal membrane. Lysosomal enzymes are much less active at pH 7 than at pH ≤ 5, which provides a second line of defense against destruction of cytosolic macromolecules, should these enzymes escape into the cytosol.
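Because pH is a logarithmic scale, the two-unit difference between lysosome and cytoplasm corresponds to a large difference in hydrogen ion concentration. The sketch below applies the standard definition pH = −log10[H+] to the approximate pH values quoted above; the function name is illustrative.

```python
def proton_concentration(ph: float) -> float:
    """Hydrogen ion concentration (mol/L) corresponding to a given pH."""
    return 10.0 ** (-ph)


lysosome = proton_concentration(5.0)   # lysosomal lumen, pH about 5
cytoplasm = proton_concentration(7.0)  # cytoplasm, pH about 7

print(f"lysosome:  {lysosome:.0e} M")
print(f"cytoplasm: {cytoplasm:.0e} M")
print(f"fold difference: about {lysosome / cytoplasm:.0f}")
```

At these values the proton pump maintains roughly a hundredfold higher hydrogen ion concentration inside the lysosome than in the surrounding cytoplasm.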

Figure 2–11  The vacuole of a plant cell contains high concentrations of a variety of stored compounds and waste products. Water enters the vacuole by osmosis and increases the vacuolar volume. The resulting turgor pressure forces the cytoplasm out against the cell wall. The rigidity of the cell wall prevents expansion and rupture of the plasma membrane.

Plant cells do not have organelles identical to lysosomes, but their vacuoles carry out similar degradative reactions as well as other functions not found in animal cells. Growing plant cells contain several small vacuoles, vesicles bounded by a single membrane, which fuse and become one large vacuole in the center of the mature cell (Fig. 2–11; see also Fig. 2–8b). The surrounding membrane, the tonoplast, regulates the entry into the vacuole of ions, metabolites, and cellular structures destined for degradation. In the mature cell, the vacuole may represent as much as 90% of the total cell volume, pressing the cytoplasm into a thin layer between the tonoplast and the plasma membrane. The liquid within the vacuole, the cell sap, contains digestive enzymes that degrade and recycle macromolecular components no longer useful to the cell. In some plant cells, the vacuole contains high concentrations of pigments (anthocyanins) that give the deep purple and red colors to the flowers of roses and geraniums and the fruits of grapes and plums. Like the contents of lysosomes, the cell sap is generally more acidic than the surrounding cytosol. In addition to its role in storage and degradation of cellular components, the vacuole also provides physical support to the plant cell. Water passes into the vacuole by osmosis because of the high solute concentration of the cell sap, creating outward pressure on the cytosol and the cell wall. This turgor pressure within cells stiffens the plant tissue (Fig. 2–11).

Some of the oxidative reactions in the breakdown of amino acids and fats produce free radicals and hydrogen peroxide (H2O2), very reactive chemical species that could damage cellular machinery. To protect the cell from these destructive byproducts, such reactions are segregated within small membrane-bounded vesicles called peroxisomes. The hydrogen peroxide is degraded by catalase, an enzyme present in large quantities in peroxisomes and glyoxysomes; it catalyzes the reaction 2H2O2 → 2H2O + O2.

Glyoxysomes are specialized peroxisomes found in certain plant cells. They contain high concentrations of the enzymes of the glyoxylate cycle, a metabolic pathway unique to plants that allows the conversion of stored fats into carbohydrates during seed germination. Lysosomes, peroxisomes, and glyoxysomes are sometimes referred to collectively as microbodies.
Figure 2–12  The nucleus and nuclear envelope.
(a) Scanning electron micrograph of the surface of the nuclear envelope, showing numerous nuclear pores.
(b) Electron micrograph of the nucleus of the alga Chlamydomonas. The dark body in the center of the nucleus is the nucleolus, and the granular material that fills the rest of the nucleus is chromatin. The nuclear envelope has paired membranes with nuclear pores; two are shown by arrows.
Figure 2–13  Chromosomes are visible in the electron microscope during mitosis. Shown here is one of the 46 human chromosomes. Every chromosome is composed of two chromatids, each consisting of tightly folded chromatin fibers. Each chromatin fiber is in turn formed by the packaging of a DNA molecule wrapped about histone proteins to form a series of nucleosomes.  (Adapted from Becker, W.M. & Deamer, D.W. (1991) The World of the Cell, 2nd edn, Fig. 13–20, The Benjamin/Cummings Publishing Company, Menlo Park, CA.)
Figure 2–14  Mitosis and cell division in animal cells. In the interphase (nondividing) nucleus (a), the chromosomes are in the form of dispersed chromatin. As mitosis begins (b), chromatin condenses into chromosomes and the mitotic spindle begins to form; centrosomes, which typically contain centriole pairs, dictate the orientation of the spindle. The nuclear envelope disintegrates and the nucleolus disappears (c), and the chromosomes align at the center of the cell (d). The chromatids of each chromosome move to opposite poles of the cell, pulled by spindle fibers attached to their centromeres (e), and a nuclear envelope forms around each new set of chromosomes (f). Finally, two daughter cells form by cell division (cytokinesis) (g). Although the same basic process occurs in all eukaryotes, there are differences in details of mitosis in plants, fungi, and protists.

The eukaryotic nucleus is very complex in both its structure and its biological activity, compared with the relatively simple nucleoid of prokaryotes. The nucleus contains nearly all of the cell’s DNA, typically 1,000 times more than is present in a bacterial cell; a small amount of DNA is also present in mitochondria and chloroplasts. The nucleus is surrounded by a nuclear envelope, composed of two membranes separated by a narrow space and continuous with the rough endoplasmic reticulum (Fig. 2–12; see also Fig. 2–10). At intervals the two nuclear membranes are pinched together around openings (nuclear pores), which have a diameter of about 90 nm. Associated with the pores are protein structures (nuclear pore complexes), specific macromolecule transporters that allow only certain molecules to pass between the cytoplasm and the aqueous phase of the nucleus (the nucleoplasm), such as enzymes synthesized in the cytoplasm and required in the nucleoplasm for DNA replication, transcription, or repair. Messenger RNA precursors and associated proteins also pass out of the nucleus through the nuclear pore complexes, to be translated on ribosomes in the cytoplasm; the nucleoplasm contains no ribosomes.

Inside the nucleus is the nucleolus, which appears dense in electron micrographs (Fig. 2–12b) because of its high content of RNA. The nucleolus is a specific region of the nucleus, in which the DNA contains many copies of the genes encoding ribosomal RNA. To produce the large number of ribosomes needed by the cell, these genes are continually copied into RNA (transcribed). The nucleolus is the visible evidence of the transcriptional machinery and the RNA product. Ribosomal RNA produced in the nucleolus passes into the cytoplasm through the nuclear pores. The rest of the nucleus contains chromatin, so called because early microscopists found that it stained brightly with certain dyes. Chromatin consists of DNA and proteins bound tightly to the DNA, and represents the chromosomes, which are decondensed in the interphase (nondividing) nucleus and not individually visible.

Before division of the cell (cytokinesis), nuclear division (mitosis) occurs. The chromatin condenses into discrete bodies, the chromosomes (Fig. 2–13). Cells of each species have a characteristic number of chromosomes with specific sizes and shapes. The protist Tetrahymena has 4, cabbage has 20, humans have 46, and the plant Ophioglossum has about 1,250! Usually each cell has two copies of each chromosome; such cells are called diploid. Gametes (egg and sperm, for example) produced by meiosis (Chapter 24) have only one copy of each chromosome and are called haploid. During sexual reproduction, two haploid gametes combine to regenerate a diploid cell in which each chromosome pair consists of a maternal and a paternal chromosome.

Chromosomes and chromatin are composed of DNA and a family of positively charged proteins, histones, which associate strongly with DNA by ionic interactions with its many negatively charged phosphate groups. About half of the mass of chromatin is DNA and half is histones. When DNA replicates prior to cell division, large quantities of histones are also synthesized to maintain this 1:1 ratio. The histones and DNA associate in complexes called nucleosomes, in which the DNA strand winds around a core of histone molecules (Fig. 2–13). The DNA of a single human chromosome forms about a million nucleosomes; nucleosomes associate to form very regular and compact supramolecular complexes. The resulting chromatin fibers, about 30 nm in diameter, condense further by forming a series of looped regions, which cluster with adjacent looped regions to form the chromosomes visible during cell division. This tight packing of DNA into nucleosomes achieves a remarkable condensation of the DNA molecules. The DNA in the chromosomes of a single diploid human cell would have a combined length of about 2 m if fully stretched as a DNA double helix, but the combined length of all 46 chromosomes is only about 200 μm.
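
The figure of about a million nucleosomes per chromosome can be checked roughly against the 2 m total length of DNA. The sketch below relies on two values that are not given in the text and are assumptions here: a rise of 0.34 nm per base pair for the double helix, and roughly 200 base pairs of DNA per nucleosome repeat. It also treats all 46 chromosomes as equal in size, which they are not, so the result is an order-of-magnitude estimate only.

```python
TOTAL_DNA_LENGTH_M = 2.0          # from the text: DNA of one diploid human cell
CHROMOSOME_COUNT = 46             # from the text
RISE_PER_BASE_PAIR_M = 0.34e-9    # assumption: B-form double helix
BASE_PAIRS_PER_NUCLEOSOME = 200   # assumption: core DNA plus linker DNA


def nucleosomes_per_chromosome(
    total_length_m: float = TOTAL_DNA_LENGTH_M,
    chromosomes: int = CHROMOSOME_COUNT,
    rise_m: float = RISE_PER_BASE_PAIR_M,
    bp_per_nucleosome: int = BASE_PAIRS_PER_NUCLEOSOME,
) -> float:
    """Estimate the nucleosome count of an average-sized chromosome."""
    total_base_pairs = total_length_m / rise_m
    base_pairs_per_chromosome = total_base_pairs / chromosomes
    return base_pairs_per_chromosome / bp_per_nucleosome


if __name__ == "__main__":
    estimate = nucleosomes_per_chromosome()
    print(f"about {estimate:.1e} nucleosomes per chromosome")
```

With these inputs the estimate comes out at several hundred thousand, the same order of magnitude as the "about a million" quoted above.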

Before the beginning of mitosis, each chromosome is duplicated to form paired, identical chromatids, each of which is a double helix of DNA. During mitosis (Fig. 2–14), the two chromatids move to opposite ends (poles) of the cell, each becoming a new chromosome. Small cylindrical particles called centrioles, composed of the protein tubulin, provide the spatial organization for the migration of chromatids to opposite ends of the dividing cell. To allow the separation of chromatids, the nuclear envelope breaks down, dispersing into membrane vesicles. When the separation of the two sets of chromosomes is complete, a nuclear envelope derived from the endoplasmic reticulum re-forms around each set. Finally, the two halves of the cell are separated by cytokinesis, and each daughter cell has a complete diploid complement of chromosomes. After mitosis is complete the chromosomes decondense to form dispersed chromatin, and the nucleoli, which disappeared early in mitosis, reappear.
Figure 2–15  Structure of a mitochondrion. This electron micrograph of a mitochondrion shows the smooth outer membrane and the numerous infoldings of the inner membrane, called cristae. (Note the extensive rough endoplasmic reticulum surrounding the mitochondrion.)

Mitochondria (singular, mitochondrion) are very conspicuous in the cytoplasm of most eukaryotic cells (Fig. 2–15). These membrane-bounded organelles vary in size, but typically have a diameter of about 1 μm, similar to that of bacterial cells. Mitochondria also vary widely in shape, number, and location, depending on the cell type or tissue function. Most plant and animal cells contain several hundred to a thousand mitochondria. Generally, cells in more metabolically active tissues devote a larger proportion of their volume to mitochondria.

Each mitochondrion has two membranes. The outer membrane is unwrinkled and completely surrounds the organelle. The inner membrane has infoldings called cristae, which give it a large surface area. The inner compartment of mitochondria, the matrix, is a very concentrated aqueous solution of many enzymes and chemical intermediates involved in energy-yielding metabolism. Mitochondria contain many enzymes that together catalyze the oxidation of organic nutrients by molecular oxygen (O2); some of these enzymes are in the matrix and some are embedded in the inner membrane. The chemical energy released in mitochondrial oxidations is used to generate ATP, the major energy-carrying molecule of cells. In aerobic cells, mitochondria are the principal producers of ATP, which diffuses to all parts of the cell and provides the energy for cellular work.

Unlike other membranous structures such as lysosomes, Golgi complexes, and the nuclear envelope, mitochondria are produced only by division of previously existing mitochondria; each mitochondrion contains its own DNA, RNA, and ribosomes. Mitochondrial DNA codes for certain proteins specific to the mitochondrial inner membrane, but other mitochondrial proteins are encoded in nuclear DNA. This and other evidence supports the theory that mitochondria are the descendants of aerobic bacteria that lived symbiotically with early eukaryotic cells.

Figure 2–16  A chloroplast in a photosynthetic cell. The thylakoids are flattened membranous sacs that contain chlorophyll, the light-harvesting pigment.

Plastids are specialized organelles in the cytoplasm of plants; they have two surrounding membranes. Most conspicuous of the plastids and characteristically present in all green plant cells and eukaryotic algae are the chloroplasts (Fig. 2–16). Like mitochondria, the chloroplasts may be considered power plants, with the important difference that chloroplasts use solar energy, whereas mitochondria use the chemical energy of oxidizable molecules. Pigment molecules in chloroplasts absorb the energy of light and use it to make ATP and, ultimately, to reduce carbon dioxide to form carbohydrates such as starch and sucrose. The photosynthetic process in eukaryotes and in cyanobacteria produces O2 as a byproduct of the light-capturing reactions. Photosynthetic plant cells contain both chloroplasts and mitochondria. Chloroplasts transduce energy only in the light, but mitochondria function independently of light, oxidizing carbohydrates generated by photosynthesis during daylight hours.

Chloroplasts are generally larger (diameter 5 μm) than mitochondria and occur in many different shapes. Because chloroplasts contain a high concentration of the pigment chlorophyll, photosynthetic cells are usually green, but their color depends on the relative amounts of other pigments present. These pigment molecules, which together can absorb light energy over much of the visible spectrum, are localized in the internal membranes of the chloroplast, which form stacks of closed cisternae known as thylakoids (Fig. 2–16). Like mitochondria, chloroplasts contain DNA, RNA, and ribosomes. Chloroplasts appear to have had their evolutionary origin in symbiotic ancestors of the cyanobacteria.

Figure 2–17  A plausible theory for the evolutionary origin of mitochondria and chloroplasts. It is based on a number of striking biochemical and genetic similarities between certain aerobic bacteria and mitochondria, and between certain cyanobacteria and chloroplasts. During the evolution of eukaryotic cells, the invading bacteria became symbiotic with the host cell. Ultimately the cytoplasmic bacteria became the mitochondria and chloroplasts of modern cells.

Several independent lines of evidence suggest that the mitochondria and chloroplasts of modern eukaryotes were derived during evolution from aerobic bacteria and cyanobacteria that took up endosymbiotic residence in early eukaryotic cells (Fig. 2–17; see also Fig. 2–7). Mitochondria are always derived from preexisting mitochondria, and chloroplasts from chloroplasts, by simple fission, just as bacteria multiply by fission. Mitochondria and chloroplasts are in fact semiautonomous; they contain DNA, ribosomes, and the enzymatic machinery to synthesize proteins encoded in their DNA. Sequences in mitochondrial DNA are strikingly similar to sequences in certain aerobic bacteria, and chloroplast DNA shows strong sequence homology with the DNA of certain cyanobacteria. The ribosomes found in mitochondria and chloroplasts are more similar in size, overall structure, and ribosomal RNA sequences to those of bacteria than to those in the cytoplasm of the eukaryotic cell. The enzymes that catalyze protein synthesis in these organelles also resemble those of the bacteria more closely.

If mitochondria and chloroplasts are the descendants of early bacterial endosymbionts, some of the genes present in the original free-living bacteria must have been transferred into the nuclear DNA of the host eukaryote over the course of evolution. Neither mitochondria nor chloroplasts contain all of the genes necessary to specify all of their proteins. Most of the proteins of both organelles are encoded in nuclear genes, translated on cytoplasmic ribosomes, and subsequently imported into the organelles.
Figure 2–18  The three types of cytoplasmic filaments. The upper panels show epithelial cells photographed after treatment with antibodies that bind to and specifically stain (a) actin filaments bundled together to form "stress fibers", (b) microtubules radiating from the cell center, and (c) intermediate filaments, extending throughout the cytoplasm. For these experiments, antibodies that specifically recognize actin, tubulin, or intermediate filament proteins are covalently attached to a fluorescent compound. When the cell is viewed with a fluorescence microscope, only the stained structures are visible. The lower panels show each type of filament as visualized by electron microscopy.

Several types of protein filaments visible with the electron microscope crisscross the eukaryotic cell, forming an interlocking three-dimensional meshwork throughout the cytoplasm, the cytoskeleton. There are three general types of cytoplasmic filaments: actin filaments, microtubules, and intermediate filaments (Fig. 2–18). They differ in width (from about 6 to 22 nm), composition, and specific function, but all apparently provide structure and organization to the cytoplasm and shape to the cell. Actin filaments and microtubules also help to produce the motion of organelles or of the whole cell.

Each of the cytoskeletal components is composed of simple protein subunits that polymerize to form filaments of uniform thickness. These filaments are not permanent structures; they undergo constant disassembly into their monomeric subunits and reassembly into filaments. Their locations in cells are not rigidly fixed, but may change dramatically with mitosis, cytokinesis, or changes in cell shape. All types of filaments associate with other proteins that cross-link filaments to themselves or to other filaments, influence assembly or disassembly, or move cytoplasmic organelles along the filaments.

Figure 2–19  Individual subunits of actin polymerize to form actin filaments. The protein filamin holds two filaments together where they cross at right angles. The filaments are cross-linked by another protein, fodrin, to form side-by-side aggregates or bundles.

Actin is a protein present in virtually all eukaryotes, from the protists to the vertebrates. In the presence of ATP, the monomeric protein spontaneously associates into linear, helical polymers, 6 to 7 nm in diameter, called actin filaments or microfilaments (Fig. 2–19).

The importance of actin polymerization and depolymerization is clear from the effects of cytochalasins, compounds that bind to actin and block polymerization. Cells treated with a cytochalasin lose actin filaments and their ability to carry out cytokinesis, phagocytosis, and amoeboid movement. However, chromatid separation at mitosis is not affected, ruling out an essential role for actin in this process. Compounds such as cytochalasins, which are naturally occurring poisons or specific toxins, are often very helpful in experimental studies in pinpointing the important participants in a biological process.

Cells contain proteins that bind to actin monomers or filaments and influence the state of actin aggregation (Fig. 2–19). Filamin and fodrin cross-link actin filaments to each other, stabilizing the meshwork and greatly increasing the viscosity of the medium in which the filaments are suspended; a concentrated solution of actin in the presence of filamin is a gel too viscous to pour. Large numbers of actin filaments bound to specific plasma membrane proteins lie just beneath and more or less parallel to the plasma membrane, conferring shape and rigidity on the cell surface.

Figure 2–20  Myosin molecules move along actin filaments using energy from ATP. Cytoplasmic streaming is produced in the giant green alga Nitella as myosin pulls organelles around a track of actin filaments. The chloroplasts of Nitella are located in the layer of stationary cytoplasm that lies between the actin filaments and the cell membrane.

Actin filaments bind to a family of proteins called myosins, enzymes that use the energy of ATP breakdown to move themselves along the actin filament in one direction. The simplest members of this family, such as myosin I, have a globular head and a short tail (Fig. 2–20). The head binds to and moves along an actin filament, driven by the breakdown of ATP. The tail region binds to the membrane of a cytoplasmic organelle, dragging the organelle behind as the myosin head moves along the actin filament. It appears likely that myosins of this type bind to various organelles, providing specific transport systems to move each type of organelle through the cytoplasm. This motion is readily seen in living cells such as the giant green alga Nitella; endoplasmic reticulum, as well as mitochondria, nucleus, and other membrane-bound organelles and vesicles, move uniformly around the cell at 50 to 75 μm/s in a process called cytoplasmic streaming (Fig. 2–20). This motion has the effect of mixing the cytoplasmic contents of the enormous algal cell much more efficiently than would occur by diffusion alone.
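
To see why streaming matters in a very large cell, one can compare the time needed to carry material a given distance at the streaming speed with the time needed to diffuse the same distance. The sketch below uses the one-dimensional approximation t = x²/(2D) for diffusion. The cell length (1 cm) and the diffusion coefficient are illustrative assumptions, not values from the text; only the streaming speed is taken from the paragraph above.

```python
def streaming_time_s(distance_m: float, speed_m_per_s: float) -> float:
    """Time to cover a distance at constant streaming speed."""
    return distance_m / speed_m_per_s


def diffusion_time_s(distance_m: float, diffusion_coeff_m2_per_s: float) -> float:
    """Approximate one-dimensional diffusion time, t = x**2 / (2 * D)."""
    return distance_m ** 2 / (2.0 * diffusion_coeff_m2_per_s)


if __name__ == "__main__":
    cell_length_m = 1e-2            # assumption: a 1 cm algal cell
    streaming_speed = 50e-6         # from the text: 50 um/s (lower bound)
    diffusion_coeff = 1e-11         # assumption: protein-sized solute in cytoplasm

    t_stream = streaming_time_s(cell_length_m, streaming_speed)
    t_diffuse = diffusion_time_s(cell_length_m, diffusion_coeff)
    print(f"streaming: {t_stream:.0f} s")
    print(f"diffusion: {t_diffuse / 86400:.0f} days")
```

Under these assumptions streaming covers the distance in a few minutes, whereas diffusion alone would take tens of days. Because diffusion time grows with the square of the distance, the gap widens rapidly as cells get larger.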

A larger form of myosin is found in muscle cells, and also in the cytoplasm of many nonmuscle cells. This type of myosin also has a globular head that binds to and moves along actin filaments in an ATP-driven reaction, but it has a longer tail, which permits myosin molecules to associate side by side to form thick filaments (see Fig. 7–31). Contractile systems composed of actin and myosin occur in a wide variety of organisms, from slime molds to humans. Actin–myosin complexes form the contractile ring that squeezes the cytoplasm in two during cytokinesis in all eukaryotes. In multicellular animals, muscle cells are filled with highly organized arrays of actin (thin) filaments and myosin (thick) filaments, which produce a coordinated contractile force by ATP-driven sliding of actin filaments past stationary myosin filaments.

Figure 2–21  Microtubules are formed from dimers of the proteins α- and β-tubulin. Colchicine blocks the assembly of microtubules, and can be used to arrest mitosis in cells.
Figure 2–22  Kinesin and dynein are ATP-driven molecular engines that move along microtubular "rails".

Like actin filaments, microtubules form spontaneously from their monomeric subunits, but the polymeric structure of microtubules is slightly more complex. Dimers of α- and β-tubulin form linear polymers (protofilaments), 13 of which associate side by side to form the hollow microtubule, about 22 nm in diameter (Fig. 2–21). Most microtubules undergo continual polymerization and depolymerization in cells by addition of tubulin subunits primarily at one end and dissociation at the other. Microtubules are present throughout the cytoplasm, but are concentrated in specific regions at certain times. For example, when sister chromatids move to opposite poles of a dividing cell during mitosis, a highly organized array of microtubules (the mitotic spindle; Fig. 2–14) provides the framework and probably the motive force for the separation of chromatids. Colchicine, a poisonous alkaloid from meadow saffron, prevents tubulin polymerization. Colchicine treatment reversibly blocks the movement of chromatids during mitosis, demonstrating that microtubules are required for this process.

Microtubules, like actin filaments, associate with a variety of proteins that move along them, form cross-bridges, or influence their state of polymerization. Kinesin and cytoplasmic dynein, proteins found in the cytoplasm of many cells, bind to microtubules and move along them using the energy of ATP to drive their motion (Fig. 2–22). Each protein is capable of associating with specific organelles and pulling them along the microtubule over long distances at rates of about 1 μm/s. The beating motion of cilia and eukaryotic flagella also involves dynein and microtubules.

Figure 2–23  Cilia and eukaryotic flagella have the same architecture: nine microtubule doublets surround a central pair of microtubules. Cross section of cilia shows the 9 + 2 arrangement of microtubules.

Cilia and flagella, motile structures extending from the surface of many protists and certain cells of animals and plants, are all constructed on the same microtubule-based architectural plan (Fig. 2–23). (Although they bear the same name, the flagella of bacteria (p. 28) are completely different in structure and in action from the flagella of eukaryotes.) Eukaryotic cilia and flagella, which are sheathed in an extension of the plasma membrane, contain nine fused pairs of microtubules arranged around two central microtubules (the 9 + 2 arrangement; Fig. 2–23). Ciliary and flagellar motion results from the coordinated sliding of outer doublet microtubules relative to their neighbors, driven by ATP. The motions of cilia and flagella propel protists through their surrounding medium, in search of food, or light, or some condition essential to their survival. Sperm are also propelled by flagellar beating. Ciliated cells in tissues such as the trachea and oviduct move extracellular fluids past the surface of the ciliated tissue.

The contraction of skeletal muscle, the propelling action of cilia and flagella, and the intracellular transport of organelles all rely on the same fundamental mechanism: the splitting of ATP by proteins such as kinesin, myosin, and dynein drives sliding motion along microfilaments or microtubules.

The third type of cytoplasmic filament is a family of structures with dimensions (diameter 8 to 10 nm) intermediate between actin filaments and microtubules. Several different types of monomeric protein subunits form intermediate filaments. Some cells contain large amounts of one type; some types of intermediate filament are absent from certain cells; and some cell types apparently lack intermediate filaments altogether. As is the case for actin filaments and microtubules, intermediate filament formation is reversible, and the cytoplasmic distribution of these structures is subject to regulated changes.

The function of intermediate filaments is probably to provide internal mechanical support for the cell and to position its organelles. Vimentin (Mr 57,000) is the monomeric subunit of the intermediate filaments found in the endothelial cells that line blood vessels, and in adipocytes (fat cells). Vimentin fibers appear to anchor the nucleus and fat droplets in specific cellular locations. Intermediate filaments composed of desmin (Mr 55,000) hold the Z disks of striated muscle tissue in place. Neurofilaments are constructed of three different protein subunits (Mr 70,000, 150,000, and 210,000), and provide rigidity to the long extensions (axons) of neurons. In the glial cells that surround neurons, intermediate filaments are constructed from glial fibrillary acidic protein (Mr 50,000).

The intermediate filaments composed of keratins, a family of structural proteins, are particularly prominent in certain epidermal cells of vertebrates, and form covalently cross-linked meshworks that persist even after the cell dies. Hair, fingernails, and feathers are among the structures composed primarily of keratins.

The picture that emerges from this brief survey is of a eukaryotic cell with a cytoplasm crisscrossed by a meshwork of structural fibers, throughout which extends a complex system of membrane-bounded compartments (see Fig. 2–8). Both the filaments and the organelles are dynamic: the filaments disassemble and reassemble elsewhere; membranous vesicles bud from one organelle, move to and join another. Transport vesicles, mitochondria, chloroplasts, and other organelles move through the cytoplasm along protein filaments, drawn by kinesin, cytoplasmic dynein, myosin, and perhaps other similar proteins. Exocytosis and endocytosis provide paths between the cell interior and the surrounding medium, allowing for the secretion of proteins and other components produced within the cell and the uptake of extracellular components. The intracellular membrane systems segregate specific metabolic processes, and provide surfaces on which certain enzyme-catalyzed reactions occur.

Although complex, this organization of the cytoplasm is far from random. The motion and positioning of organelles and cytoskeletal elements are under tight regulation, and at certain stages in a eukaryotic cell’s life, dramatic, finely orchestrated reorganizations occur, such as spindle formation, chromatid migration to the poles, and nuclear envelope disintegration and re-formation during mitosis. The interactions between the cytoskeleton and organelles are noncovalent, reversible, and subject to regulation in response to various intracellular and extracellular signals. Cytoskeletal rearrangements are modulated by Ca2+ and by a variety of proteins.

Figure 2–24  A tissue such as liver is mechanically homogenized to break cells and disperse their contents in an aqueous buffer. The large and small particles in this suspension can be separated by centrifugation at different speeds (a), or particles of different density can be separated by isopycnic centrifugation (b). In isopycnic centrifugation, a centrifuge tube is filled with a solution, the density of which increases from top to bottom; some solute such as sucrose is dissolved at different concentrations to produce this density gradient. When a mixture of organelles is layered on top of the density gradient and the tube is centrifuged at high speed, individual organelles sediment until their buoyant density exactly matches that in the gradient. Each layer can be collected separately.

A major advance in the biochemical study of cells was the development of methods for separating organelles from the cytosol and from each other. In a typical cellular fractionation, cells or tissues are disrupted by gentle homogenization in a medium containing sucrose (about 0.2 M). This treatment ruptures the plasma membrane but leaves most of the organelles intact. (The sucrose creates a medium with an osmotic pressure similar to that within organelles; this prevents diffusion of water into the organelles, which would cause them to swell, burst, and spill their contents.)
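
The osmotic pressure of the homogenization medium can be estimated with the van 't Hoff relation, Π = cRT, which holds for dilute, ideal solutions of a non-dissociating solute such as sucrose. The following is a rough sketch; the temperature of 4 °C is an assumption (the text does not state one), and the function name is illustrative.

```python
GAS_CONSTANT_L_ATM_PER_MOL_K = 0.082057


def osmotic_pressure_atm(molarity: float, temperature_k: float) -> float:
    """van 't Hoff estimate, Pi = c * R * T, for a non-dissociating solute."""
    return molarity * GAS_CONSTANT_L_ATM_PER_MOL_K * temperature_k


if __name__ == "__main__":
    sucrose_molar = 0.2       # from the text: about 0.2 M sucrose
    temperature_k = 277.15    # assumption: 4 degrees C
    pressure = osmotic_pressure_atm(sucrose_molar, temperature_k)
    print(f"estimated osmotic pressure: {pressure:.1f} atm")
```

The estimate is several atmospheres, which gives a sense of the force that would drive water into organelles if they were suspended in plain water instead.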

Organelles such as nuclei, mitochondria, and lysosomes differ in size and therefore sediment at different rates during centrifugation. They also differ in specific gravity, and they "float" at different levels in a density gradient (Fig. 2–24). Differential centrifugation results in a rough fractionation of the cytoplasmic contents, which may be further purified by isopycnic centrifugation. In this procedure, organelles of different buoyant densities (the result of different ratios of lipid and protein in each type of organelle) are separated on a density gradient. By carefully removing material from each region of the gradient and observing it with a microscope, the biochemist can establish the position of each organelle and obtain purified organelles for further study. In this way it was established, for example, that lysosomes contain degradative enzymes, mitochondria contain oxidative enzymes, and chloroplasts contain photosynthetic pigments. The isolation of an organelle enriched in a certain enzyme is often the first step in the purification of that enzyme.
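
The principle of isopycnic centrifugation can be sketched numerically: in a gradient whose density rises linearly from top to bottom, a particle comes to rest where the medium's density equals its own buoyant density. The code below assumes a perfectly linear gradient, and the particle densities are hypothetical placeholders chosen only to show the ordering of bands; they are not measured organelle densities.

```python
from typing import Optional


def band_position_cm(
    buoyant_density: float,
    rho_top: float,
    rho_bottom: float,
    tube_length_cm: float,
) -> Optional[float]:
    """Distance from the top of a linear gradient at which a particle bands.

    Returns None when the particle's density lies outside the gradient:
    it would then stay at the top or pellet at the bottom.
    """
    if not (rho_top <= buoyant_density <= rho_bottom):
        return None
    fraction = (buoyant_density - rho_top) / (rho_bottom - rho_top)
    return fraction * tube_length_cm


if __name__ == "__main__":
    # Hypothetical buoyant densities in g/mL, for illustration only.
    particles = {"particle A": 1.12, "particle B": 1.18, "particle C": 1.25}
    for name, density in particles.items():
        position = band_position_cm(density, 1.10, 1.20, 10.0)
        if position is None:
            print(f"{name}: outside the gradient range")
        else:
            print(f"{name}: bands about {position:.1f} cm from the top")
```

Particles of different density thus end up in separate layers regardless of their size, which is what distinguishes this step from the preceding differential centrifugation.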

One of the most effective approaches to understanding a biological process is to study purified individual molecules such as enzymes, nucleic acids, or structural proteins. The purified components are amenable to detailed characterization in vitro; their physical properties and catalytic activities can be studied without "interference" from other molecules present in the intact cell. Although this approach has been remarkably revealing, it must always be remembered that the inside of a cell is quite different from the inside of a test tube. The "interfering" components eliminated by purification may be critical to the biological function or regulation of the molecule purified. In vitro studies of pure enzymes are commonly done at very low enzyme concentrations in thoroughly stirred aqueous solutions. In the cell, an enzyme is dissolved or suspended in a gel-like cytosol with thousands of other proteins, some of which bind to that enzyme and influence its activity. Within cells, some enzymes are parts of multienzyme complexes in which reactants are channeled from one enzyme to another without ever entering the bulk solvent. Diffusion is hindered in the gel-like cytosol, and the cytosolic composition varies in different regions of the cell. In short, a given molecule may function somewhat differently within the cell than it does in vitro. One of the central challenges of biochemistry is to understand the influences of cellular organization and macromolecular associations on the function of individual enzymes – to understand function in vivo as well as in vitro.
Figure 2–25  A gallery of differentiated cells. (a) Secretory cells of the pancreas, with an extensive endoplasmic reticulum. (b) Portion of a skeletal muscle cell, with organized actin and myosin filaments.
(c) Collenchyma cells of a plant stem. (d) Rabbit sperm cells, with long flagella for motility. (e) Human erythrocyte. (f) Human embryo at the two-celled stage.
Figure 2–26  Three types of junctions between cells.
(a) Tight junctions produce a seal between adjacent cells. (b) Desmosomes, typical of animal cells, weld adjacent cells together and are reinforced by various cytoskeletal elements. (c) Gap junctions allow ions and electric currents to flow between adjacent cells.

All modern unicellular eukaryotes – the protists – contain the organelles and mechanisms that we have described, indicating that these organelles and mechanisms must have evolved relatively early. The protists are extraordinarily versatile. The ciliated protist Paramecium, for example, moves rapidly through its aqueous surroundings by beating its cilia; senses mechanical, chemical, and thermal stimuli from its environment, and responds by changing its path; finds, engulfs, and digests a variety of food organisms, and excretes the indigestible fragments; eliminates excess water that leaks through its membrane; and finds and mates with sexual partners. Nonetheless, being unicellular has its disadvantages. Paramecia probably live out their lives in a very small region of the pond in which they began life, because their motility is limited by the small thrust of their microscopic cilia, and their ability to detect a better environment at a distance is limited by the short range of their sensory apparatus.

At some later stage of evolution, unicellular organisms found it advantageous to cluster together, thereby acquiring greater motility, efficiency, or reproductive success than their free-living single-celled competitors. Further evolution of such clustered organisms led to permanent associations among individual cells and eventually to specialization within the colony – to cellular differentiation.

The advantages of cellular specialization led to the evolution of ever more complex and highly differentiated organisms, in which some cells carried out the sensory functions, others the digestive, photosynthetic, or reproductive functions. Many modern multicellular organisms contain hundreds of different cell types, each specialized for some function that supports the entire organism. Fundamental mechanisms that evolved early have been further refined and embellished through evolution. The simple mechanism responsible for the motion of myosin along actin filaments in slime molds has been conserved and elaborated in vertebrate muscle cells, which are literally filled with actin, myosin, and associated proteins that regulate muscle contraction. The same basic structure and mechanism that underlie the beating motion of cilia in Paramecium and flagella in Chlamydomonas are employed by the highly differentiated vertebrate sperm cell. Figure 2–25 illustrates the range of cellular specializations encountered in multicellular organisms.

The individual cells of a multicellular organism remain delimited by their plasma membranes, but they have developed specialized surface structures for attachment to and communication with each other (Fig. 2–26). At tight junctions, the plasma membranes of adjacent cells are closely apposed, with no extracellular fluid separating them. Desmosomes (found in animal cells) hold two cells together; the small extracellular space between them is filled with fibrous, presumably adhesive, material. Gap junctions provide small, reinforced openings between adjacent cells, through which electric currents, ions, and small molecules can pass. In higher plants, plasmodesmata form channels resembling gap junctions; they provide a path through the cell wall for the movement of small molecules between adjacent cells. Each of these junctions is reinforced by membrane proteins or cytoskeletal filaments. The type of junction(s) between neighboring cells varies from tissue to tissue.

Figure 2–27  Infection of a bacterial cell by a bacteriophage (left), and of an animal cell by a virus (right) results in the formation of many copies of the infecting virus.
Figure 2–28 The structures of several viruses, viewed with the electron microscope. Turnip yellow mosaic virus (small, spherical particles), tobacco mosaic virus (long cylinders), and bacteriophage T4 (shaped like a hand mirror).
Figure 2–29 Human immunodeficiency viruses (HIV), the causative agent of AIDS, leaving an infected T lymphocyte of the immune system.

Viruses are supramolecular complexes that can replicate themselves in appropriate host cells. They consist of a nucleic acid (DNA or RNA) molecule surrounded by a protective shell, or capsid, made up of protein molecules and, in some cases, a membranous envelope. Viruses exist in two states. Outside the host cells that formed them, viruses are simply nonliving particles called virions, which are regular in size, shape, and composition and can be crystallized. Once a virus or its nucleic acid component gains entry into a specific host cell, it becomes an intracellular parasite. The viral nucleic acid carries the genetic message specifying the structure of the intact virion. It diverts the host cell’s enzymes and ribosomes from their normal cellular roles to the manufacture of many new daughter viral particles. As a result, hundreds of progeny viruses may arise from the single virion that infected the host cell (Fig. 2–27). In some host–virus systems, the progeny virions escape through the host cell’s plasma membrane. Other viruses cause cell lysis (membrane breakdown and host cell death) as they are released.

A different type of response results from some viral infections, in which viral DNA becomes integrated into the host’s chromosome and is replicated with the host’s own genes. Integrated viral genes may have little or no effect on the host’s survival, but they often cause profound changes in the host cell’s appearance and activity.

Hundreds of different viruses are known, each more or less specific for a host cell (Table 2–3), which may be an animal, plant, or bacterial cell. Viruses specific for bacteria are known as bacteriophages, or simply phages (Greek phagein, "to eat"). Some viruses contain only one kind of protein in their capsid – the tobacco mosaic virus, for example, a simple plant virus and the first to be crystallized. Other viruses contain dozens or hundreds of different kinds of proteins. Even some of these large and complex viruses have been crystallized, and their detailed molecular structures are known (Fig. 2–28). Viruses differ greatly in size. Bacteriophage ΦX174, one of the smallest, has a diameter of 18 nm. Vaccinia virus is one of the largest; its virions are almost as large as the smallest bacteria. Viruses also differ in shape and complexity of structure. The human immunodeficiency virus (HIV) (Fig. 2–29) is relatively simple in structure, but devastating in action; it causes AIDS.

Table 2–3 summarizes the type and size of the nucleic acid components of a number of viruses. Some viruses are highly pathogenic in humans; for example, those causing poliomyelitis, influenza, herpes, hepatitis, AIDS, the common cold, infectious mononucleosis, shingles, and certain types of cancer.

Biochemistry has profited enormously from the study of viruses, which has provided new information about the structure of the genome, the enzymatic mechanisms of nucleic acid synthesis, and the regulation of the flow of genetic information.

Summary

Cells, the structural and functional units of living organisms, are of microscopic dimensions. Their small size, combined with convolutions of their surfaces, results in high surface-to-volume ratios, facilitating the diffusion of fuels, nutrients, and waste products between the cell and its surroundings. All cells share certain features: DNA containing the genetic information, ribosomes, and a plasma membrane that surrounds the cytoplasm. In eukaryotes the genetic material is surrounded by a nuclear envelope; prokaryotes have no such membrane.

The plasma membrane is a tough, flexible permeability barrier, which contains numerous transporters as well as receptors for a variety of extracellular signals. The cytoplasm consists of the cytosol and organelles. The cytosol is a concentrated solution of proteins, RNA, metabolic intermediates and cofactors, and inorganic ions, in which are suspended various particles. Ribosomes are supramolecular complexes on which protein synthesis occurs; bacterial ribosomes are slightly smaller than those of eukaryotic cells, but are similar in structure and function.

Certain organisms, tissues, and cells offer advantages for biochemical studies. E. coli and yeast can be cultured in large quantities, have short generation times, and are especially amenable to genetic manipulation. The specialized functions of liver, muscle, and fat tissue, and of erythrocytes, make them attractive for the study of specific processes.

The first living cells were prokaryotic and anaerobic; they probably arose about 3.5 billion years ago, when the atmosphere was devoid of oxygen. With the passage of time, biological evolution led to cells capable of photosynthesis, with O2 as a byproduct. As O2 accumulated, prokaryotic cells capable of the aerobic oxidation of fuels evolved. The two major groups of bacteria, eubacteria and archaebacteria, diverged early in evolution. The cell envelope of some types of bacteria includes layers outside the plasma membrane that provide rigidity or protection. Some bacteria have flagella for propulsion. The cytoplasm of bacteria contains no membrane-bounded organelles but does contain ribosomes and granules of nutrients, as well as a nucleoid which contains the cell’s DNA. Some photosynthetic bacteria have extensive intracellular membranes that contain light-capturing pigments.

About 1.5 billion years ago, eukaryotic cells emerged. They were larger than bacteria, and their genetic material was more complex. These early cells established symbiotic relationships with prokaryotes that lived in their cytoplasm; modern mitochondria and chloroplasts are derived from these early endosymbionts. Mitochondria and chloroplasts are intracellular organelles surrounded by a double membrane. They are the principal sites of ATP synthesis in eukaryotic, aerobic cells. Chloroplasts are found only in photosynthetic organisms, but mitochondria are ubiquitous among eukaryotes.

Modern eukaryotic cells have a complex system of intracellular membranes. This endomembrane system consists of the nuclear envelope, rough and smooth endoplasmic reticulum, the Golgi complex, transport vesicles, lysosomes, and endosomes. Proteins synthesized on ribosomes bound to the rough endoplasmic reticulum pass into the endomembrane system, traveling through the Golgi complex on their way to organelles or to the cell surface, where they are secreted by exocytosis. Endocytosis brings extracellular materials into the cell, where they can be digested by degradative enzymes in the lysosomes. In plants, the central vacuole is the site of degradative processes; it also serves as a storage depot for a variety of side products of metabolism and maintains cell turgor.

The genetic material in eukaryotic cells is organized into chromosomes, highly ordered complexes of DNA and histone proteins. Before cell division (cytokinesis), each chromosome is replicated, and the duplicate chromosomes are separated by the process of mitosis.

The cytoskeleton is an intracellular meshwork of actin filaments, microtubules, and intermediate filaments of several types. The cytoskeleton confers shape on the cell, and reorganization of cytoskeletal filaments results in the shape changes accompanying amoeboid movement and cell division. Intracellular organelles move along filaments of the cytoskeleton, propelled by proteins such as dynein, kinesin, and myosin, using the energy of ATP. Dynein and tubulin are central to the motion and structure of cilia and flagella, and myosin and actin are responsible for the contractile motion of skeletal muscle. The organelles can be separated by differential centrifugation and by isopycnic centrifugation.

In multicellular organisms, there is a division of labor among several types of cells. Individual cells are joined to each other by tight junctions, gap junctions, or desmosomes, and (in plants) by plasmodesmata. Viruses are parasites of living cells, capable of subverting the cellular machinery for their own replication.

Further Reading

General

Alberts, B., Bray, D., Lewis, J., Raff, M., Roberts, K., & Watson, J.D. (1989) Molecular Biology of the Cell, 2nd edn, Garland Publishing, Inc., New York. 
A superb textbook on cell structure and function, covering the topics considered in this chapter, and a useful reference for many of the following chapters.

Becker, W.M. & Deamer, D.W. (1991) The World of the Cell, 2nd edn, The Benjamin/Cummings Publishing Company, Redwood City, CA. 
An excellent introductory textbook of cell biology.

Curtis, H. & Barnes, N.S. (1989) Biology, 5th edn, Worth Publishers, Inc., New York. 
A beautifully written and illustrated general biology textbook.

Darnell, J., Lodish, H., & Baltimore, D. (1990) Molecular Cell Biology, 2nd edn, Scientific American Books, Inc., New York. 
Like the book by Alberts and coauthors, a superb text useful for this and later chapters.

Prescott, D.M. (1988) Cells, Jones and Bartlett Publishers, Boston, MA. 
A short, well-illustrated introductory textbook on cell structure and function, with emphasis on structure.

Evolution of Cells

Evolution of Catalytic Function. (1987) Cold Spring Harb. Symp. Quant. Biol. 52. 
A collection of excellent papers on many aspects of molecular and cellular evolution.

Knoll, A.H. (1991) End of the proterozoic eon. Sci. Am. 265 (October), 64–73. 
Discussion of the evidence that an increase in atmospheric oxygen led to the development of multi-cellular organisms, including large animals.

Margulis, L. (1992) Symbiosis in Cell Evolution. Microbial Evolution in the Archean and Proterozoic Eons, 2nd edn, W.H. Freeman and Company, New York. 
Clear discussion of the hypothesis that mitochondria and chloroplasts are descendants of bacteria that became symbiotic with primitive eukaryotic cells.

Schopf, J.W. (1978) The evolution of the earliest cells. Sci. Am. 239 (September), 110–139. 

Vidal, G. (1984) The oldest eukaryotic cell. Sci. Am. 250 (February), 48–57. 

Structure of Cells and Organelles

Bloom, W. & Fawcett, D.W. (1986) A Textbook of Histology, 11th edn, W.B. Saunders Company, Philadelphia, PA. 
A standard textbook, containing detailed descriptions of the structures of animal cells, tissues, and organs.

de Duve, C. (1984) A Guided Tour of the Living Cell, Scientific American Books, Inc., New York. 
An easy-to-read, well-illustrated description of the structure and functions of the organelles of the eukaryotic cell.

Margulis, L. & Schwartz, K.V. (1987) Five Kingdoms: An Illustrated Guide to the Phyla of Life on Earth, 2nd edn, W.H. Freeman and Company, New York. 
Description of unicellular and multicellular organisms, beautifully illustrated with electron micrographs and drawings showing the diversity of structure and function.

Rothman, J.E. (1985) The compartmental organization of the Golgi apparatus. Sci. Am. 253 (September), 74–89. 

Cytoskeleton

Gelfand, V. & Bershadsky, A.D. (1991) Microtubule dynamics: mechanism, regulation, and function. Annu. Rev. Cell Biol. 7, (September), 93–116. 

Organization of the Cytoplasm. (1981) Cold Spring Harb. Symp. Quant. Biol. 46. 
More than 90 excellent papers on microtubules, microfilaments, and intermediate filaments and their biological roles.

Schroer, T.A. & Sheetz, M.P. (1991) Functions of microtubule-based motors. Annu. Rev. Physiol. 53, 629–652. 

Steinert, P.M. & Parry, D.A.D. (1985) Intermediate filaments: conformity and diversity of expression and structure. Annu. Rev. Cell Biol. 1, 41–65. 

Stossel, T.P. (1989) From signal to pseudopod: how cells control cytoplasmic actin assembly. J. Biol. Chem. 264, 18261–18264. 

Vale, R.D. (1990) Microtubule-based motor proteins. Curr. Opinion Cell Biol. 2, 15–22. 

Vallee, R.B. & Shpetner, H.S. (1990) Motor proteins of cytoplasmic microtubules. Annu. Rev. Biochem. 59, 909–932. 

Problems

Some problems on the contents of Chapter 2 follow. They involve simple geometrical and numerical relationships concerning cell structure and activities. (For your reference in solving these problems, please see the tables printed on the inside of the back cover.) Each problem has a title for easy reference and discussion.

1. The Size of Cells and Their Components  Given their approximate diameters, calculate the approximate number of (a) hepatocytes (diameter 20 μm), (b) mitochondria (1.5 μm), and (c) actin molecules (3.6 nm) that can be placed in a single layer on the head of a pin (diameter 0.5 mm). Assume each structure is spherical. The area of a circle is πr², where π = 3.14.
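One way to set up an estimate of this kind is to divide the area of the pinhead by the cross-sectional area of a single object. The sketch below assumes that simple area ratio and ignores the gaps left when circles are packed side by side; the function name `count_in_single_layer` is illustrative only.

```python
import math

def count_in_single_layer(surface_diameter_m, object_diameter_m):
    """Approximate count of spheres in one layer: ratio of the two circle areas.

    Assumption: packing gaps between neighboring circles are ignored.
    """
    surface_area = math.pi * (surface_diameter_m / 2) ** 2
    object_area = math.pi * (object_diameter_m / 2) ** 2
    return surface_area / object_area

PIN_DIAMETER_M = 0.5e-3  # 0.5 mm, from the problem statement

# Diameters in meters, taken from the problem statement
objects = {
    "hepatocyte": 20e-6,
    "mitochondrion": 1.5e-6,
    "actin molecule": 3.6e-9,
}

for name, diameter in objects.items():
    n = count_in_single_layer(PIN_DIAMETER_M, diameter)
    print(f"{name}: about {n:.2e} per pinhead")
```

Because both areas contain the same factor of π/4, the ratio reduces to the square of the ratio of diameters.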

2. Number of Solute Molecules in the Smallest Known Cells  Mycoplasmas are the smallest known cells. They are spherical and have a diameter of about 0.33 μm. Because of their small size they readily pass through filters designed to trap larger bacteria. One species, Mycoplasma pneumoniae, is the causative organism of the disease primary atypical pneumonia.

       (a) D-Glucose is the major energy-yielding nutrient of mycoplasma cells. Its concentration within such cells is about 1.0 mM. Calculate the number of glucose molecules in a single mycoplasma cell. Avogadro’s number, the number of molecules in 1 mol of a nonionized substance, is 6.02 × 10²³. The volume of a sphere is 4πr³/3.
       (b) The first enzyme required for the energy-yielding metabolism of glucose is hexokinase (Mr 100,000). Given that the intracellular fluid of mycoplasma cells contains 10 g of hexokinase per liter, calculate the molar concentration of hexokinase.
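The two calculations in this problem can be sketched as follows. This is a minimal setup using only the values given above; the helper names are hypothetical, and the cell is treated as a perfect sphere filled uniformly with solution.

```python
import math

AVOGADRO = 6.02e23  # molecules per mole, as given in the problem

def sphere_volume_liters(diameter_um):
    """Volume of a sphere in liters, from its diameter in micrometers."""
    radius_dm = (diameter_um / 2) * 1e-5  # 1 um = 1e-5 dm, and 1 dm^3 = 1 L
    return 4 * math.pi * radius_dm ** 3 / 3

def molecules_in_cell(diameter_um, conc_molar):
    """Number of solute molecules = concentration x volume x Avogadro's number."""
    return conc_molar * sphere_volume_liters(diameter_um) * AVOGADRO

def molarity_from_mass(grams_per_liter, molecular_weight):
    """Molar concentration from a mass concentration and a molecular weight."""
    return grams_per_liter / molecular_weight

glucose_molecules = molecules_in_cell(0.33, 1.0e-3)  # 0.33 um cell, 1.0 mM
hexokinase_molar = molarity_from_mass(10, 100_000)   # 10 g/L, Mr 100,000

print(f"glucose molecules per cell: about {glucose_molecules:.2e}")
print(f"hexokinase concentration: {hexokinase_molar:.1e} M")
```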

3. Components of E. coli  E. coli cells are rod-shaped, about 2 μm long and 0.8 μm in diameter. The volume of a cylinder is πr²h, where h is the height of the cylinder.

       (a) If the average density of E. coli (mostly water) is 1.1 × 10³ g/L, what is the weight of a single cell?
       (b) The protective cell wall of E. coli is 10 nm thick. What percentage of the total volume of the bacterium does the wall occupy?
       (c) E. coli is capable of growing and multiplying rapidly because of the inclusion of some 15,000 spherical ribosomes (diameter 18 nm) in each cell, which carry out protein synthesis. What percentage of the total cell volume do the ribosomes occupy?
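Parts (a) and (c) can be set up along the following lines. The sketch treats the cell as a simple cylinder and each ribosome as a sphere, as the problem suggests; part (b) is left out because it depends on an extra assumption about whether the wall also covers the ends of the rod. The helper names are illustrative.

```python
import math

UM_TO_DM = 1e-5  # 1 um = 1e-5 dm, and 1 dm^3 = 1 L

def cylinder_volume_liters(length_um, diameter_um):
    """Volume of a cylinder (pi r^2 h) in liters."""
    radius_dm = (diameter_um / 2) * UM_TO_DM
    height_dm = length_um * UM_TO_DM
    return math.pi * radius_dm ** 2 * height_dm

def sphere_volume_liters(diameter_um):
    """Volume of a sphere (4 pi r^3 / 3) in liters."""
    radius_dm = (diameter_um / 2) * UM_TO_DM
    return 4 * math.pi * radius_dm ** 3 / 3

cell_volume_l = cylinder_volume_liters(2.0, 0.8)  # 2 um long, 0.8 um wide
cell_mass_g = 1.1e3 * cell_volume_l               # density 1.1 x 10^3 g/L

# 15,000 ribosomes, each 18 nm (0.018 um) in diameter
ribosome_fraction = 15_000 * sphere_volume_liters(0.018) / cell_volume_l

print(f"cell volume: {cell_volume_l:.2e} L")
print(f"cell mass: {cell_mass_g:.2e} g")
print(f"ribosomes occupy about {100 * ribosome_fraction:.1f}% of the cell volume")
```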

4. Genetic Information in E. coli DNA  The genetic information contained in DNA consists of a linear sequence of successive code words, known as codons. Each codon is a specific sequence of three nucleotides (three nucleotide pairs in double-stranded DNA), and each codon codes for a single amino acid unit in a protein. The molecular weight of an E. coli DNA molecule is about 2.5 × 10⁹. The average molecular weight of a nucleotide pair is 660, and each nucleotide pair contributes 0.34 nm to the length of DNA.

       (a) Calculate the length of an E. coli DNA molecule. Compare the length of the DNA molecule with the actual cell dimensions. How does the DNA molecule fit into the cell?
       (b) Assume that the average protein in E. coli consists of a chain of 400 amino acids. What is the maximum number of proteins that can be coded by an E. coli DNA molecule?
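The numerical parts of this problem follow from the number of nucleotide pairs in the molecule. A minimal sketch, assuming every nucleotide pair is coding and that one codon of three pairs specifies one amino acid (variable names are illustrative):

```python
DNA_MOLECULAR_WEIGHT = 2.5e9  # E. coli DNA molecule, from the problem
PAIR_MOLECULAR_WEIGHT = 660   # average nucleotide pair
LENGTH_PER_PAIR_NM = 0.34     # contribution of each pair to the length
PAIRS_PER_CODON = 3
AMINO_ACIDS_PER_PROTEIN = 400  # average protein assumed in part (b)

nucleotide_pairs = DNA_MOLECULAR_WEIGHT / PAIR_MOLECULAR_WEIGHT

# Part (a): contour length of the DNA, converted from nm to mm
dna_length_mm = nucleotide_pairs * LENGTH_PER_PAIR_NM * 1e-6

# Part (b): upper limit, assuming no noncoding sequence at all
pairs_per_protein = PAIRS_PER_CODON * AMINO_ACIDS_PER_PROTEIN
max_proteins = nucleotide_pairs / pairs_per_protein

print(f"nucleotide pairs: {nucleotide_pairs:.2e}")
print(f"DNA length: {dna_length_mm:.2f} mm")
print(f"maximum number of proteins: about {max_proteins:.0f}")
```

The length comes out several hundred times the 2 μm length of the cell, which is the comparison part (a) asks for.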

5. The High Rate of Bacterial Metabolism  Bacterial cells have a much higher rate of metabolism than animal cells. Under ideal conditions some bacteria will double in size and divide in 20 min, whereas most animal cells require 24 h. The high rate of bacterial metabolism requires a high ratio of surface area to cell volume.

       (a) Why would the surface-to-volume ratio have an effect on the maximum rate of metabolism?
       (b) Calculate the surface-to-volume ratio for the spherical bacterium Neisseria gonorrhoeae (diameter 0.5 μm), responsible for the disease gonorrhea. Compare it with the surface-to-volume ratio for globular amoeba, a large eukaryotic cell of diameter 150 μm. The surface area of a sphere is 4πr².
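For a sphere, dividing the surface area 4πr² by the volume 4πr³/3 leaves 3/r, so the ratio rises as the cell shrinks. The comparison in part (b) can be sketched as below, treating both cells as ideal spheres; the function name is illustrative.

```python
import math

def surface_to_volume_per_um(diameter_um):
    """Surface-to-volume ratio of a sphere, in reciprocal micrometers."""
    r = diameter_um / 2
    surface = 4 * math.pi * r ** 2
    volume = 4 * math.pi * r ** 3 / 3
    return surface / volume  # algebraically equal to 3 / r

bacterium_ratio = surface_to_volume_per_um(0.5)  # Neisseria gonorrhoeae
amoeba_ratio = surface_to_volume_per_um(150)     # globular amoeba

print(f"bacterium: {bacterium_ratio:.2f} per um")
print(f"amoeba: {amoeba_ratio:.3f} per um")
print(f"bacterium/amoeba: {bacterium_ratio / amoeba_ratio:.0f}-fold")
```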

6. A Strategy to Increase the Surface Area of Cells  Certain cells whose function is to absorb nutrients, e.g., the cells lining the small intestine or the root hair cells of a plant, are optimally adapted to their role because their exposed surface area is increased by microvilli. Consider a spherical epithelial cell (diameter 20 μm) lining the small intestine. Since only a part of the cell surface faces the interior of the intestine, assume that a “patch” corresponding to 25% of the cell area is covered with microvilli. Furthermore, assume that the microvilli are cylinders 0.1 μm in diameter, 1.0 μm long, and spaced in a regular grid 0.2 μm on center.

       (a) Calculate the number of microvilli on the patch.
       (b) Calculate the surface area of the patch, assuming it has no microvilli.
       (c) Calculate the surface area of the patch, assuming it does have microvilli.
       (d) What percentage improvement of the absorptive capacity (reflected by the surface-to-volume ratio) does the presence of microvilli provide?

7. Fast Axonal Transport  Some neurons have long, thin extensions (axons) as long as 2 m. Small membrane vesicles carrying materials essential to axonal function move along microtubules from the cell body to the tip of the axon by kinesin-dependent “fast axonal transport”. If the average velocity of a vesicle is 1 μm/s, how long does it take a vesicle to move the 2 m from cell body to axonal tip? What are the possible advantages of this ATP-dependent process over simple diffusion to move materials to the axonal tip?
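The transit time is simply distance divided by velocity. A short sketch of the unit conversion, assuming the vesicle moves at a constant 1 μm/s without pausing:

```python
AXON_LENGTH_M = 2.0             # from the problem statement
VESICLE_VELOCITY_M_PER_S = 1e-6  # 1 um/s

SECONDS_PER_DAY = 24 * 60 * 60

transit_seconds = AXON_LENGTH_M / VESICLE_VELOCITY_M_PER_S
transit_days = transit_seconds / SECONDS_PER_DAY

print(f"transit time: {transit_seconds:.1e} s, or about {transit_days:.0f} days")
```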

8. Toxic Effects of Phalloidin  Phalloidin is a toxin produced by the mushroom Amanita phalloides. It binds specifically to actin microfilaments and blocks their disassembly. Cytochalasin B is another toxin, which blocks microfilament assembly from actin monomers (see p. 42).

       (a) Predict the effect of phalloidin on cytokinesis, phagocytosis, and amoeboid movement, given the effects of cytochalasins on these processes.
       (b) A specific antibody (a protein of Mr ≈ 150,000) binds actin tightly and is found to block microfilament assembly in vitro (in the test tube). Would you expect this antibody to mimic the effects of cytochalasin in vivo (in living cells)?

9. Osmotic Breakage of Organelles  In the isolation of cytosolic enzymes, cells are often broken in the presence of 0.2 M sucrose to prevent osmotic swelling and bursting of the intracellular organelles. If the desired enzymes are in the cytosol, why is it necessary to be concerned about possible damage to particulate organelles?

Chapter 3
Biomolecules
The chemical composition of living material, such as this jellyfish, differs from that of its physical environment, which for this organism is salt water.
Biochemistry aims to explain biological form and function in chemical terms. One of the most fruitful approaches to understanding biological phenomena has been to purify an individual chemical component, such as a protein, from a living organism and to characterize its chemical structure or catalytic activity. As we begin the study of biomolecules and their interactions, some basic questions deserve attention. What chemical elements are found in cells? What kinds of molecules are present in living matter? In what proportions do they occur? How did they come to be there? In what ways are the kinds of molecules found in living cells especially suited to their roles?

We review here some of the chemical principles that govern the properties of biological molecules: the covalent bonding of carbon with itself and with other elements, the functional groups that occur in common biological molecules, the three-dimensional structure and stereochemistry of carbon compounds, and the common classes of chemical reactions that occur in living organisms. Next, we discuss the monomeric units and the contribution of entropy to the free-energy changes of reactions in which these units are polymerized to form macromolecules. Finally, we consider the origin of the monomeric units from simple compounds in the earth’s atmosphere during prebiological times – that is, chemical evolution.

Figure 3–1  Elements essential to animal life and health. Bulk elements (shaded orange) are structural components of cells and tissues and are required in the diet in gram quantities daily. For trace elements (shaded yellow), the requirements are much smaller: for humans, a few milligrams per day of Fe, Cu, and Zn, even less of the others. The elemental requirements for plants and microorganisms are very similar to those shown here.

By the beginning of the nineteenth century, it had become clear to chemists that the composition of living matter is strikingly different from that of the inanimate world. Antoine Lavoisier (1743–1794) noted the relative chemical simplicity of the "mineral world", and contrasted it with the complexity of the "plant and animal worlds"; the latter, he knew, were composed of compounds rich in the elements carbon, oxygen, nitrogen, and phosphorus. The development of organic chemistry preceded, and provided invaluable insights for, the development of biochemistry.

We will briefly review some fundamental concepts of organic chemistry: the nature of bonding between atoms of carbon and of hydrogen, oxygen, and nitrogen; the functional groups that result from these combinations; and the diversity of organic compounds that are derived from these elements.

Figure 3–2  Covalent bonding. Two atoms with unpaired electrons in their outer shells can form covalent bonds with each other by sharing electron pairs. Atoms participating in covalent bonding tend to fill their outer electron shells.

Only about 30 of the more than 90 naturally occurring chemical elements are essential to living organisms. Most of the elements in living matter have relatively low atomic numbers; only five have atomic numbers above that of selenium, 34 (Fig. 3–1). The four most abundant elements in living organisms, in terms of the percentage of the total number of atoms, are hydrogen, oxygen, nitrogen, and carbon, which together make up over 99% of the mass of most cells. They are the lightest elements capable of forming one, two, three, and four bonds, respectively (Fig. 3–2). In general, the lightest elements form the strongest bonds. Six of the eight most abundant elements in the human body are also among the nine most abundant elements in seawater (Table 3–1), and several of the elements abundant in humans are components of the atmosphere and were probably present in the atmosphere before the appearance of life on earth. Primitive seawater was most likely the liquid medium in which living organisms first arose, and the primitive atmosphere was probably a source of methane, ammonia, water, and hydrogen, the starting materials for the evolution of life. The trace elements (Fig. 3–1) represent a minuscule fraction of the weight of the human body, but all are absolutely essential to life, usually because they are essential to the function of specific enzymes (Table 3–2).

Figure 3–3  Versatility of carbon in forming covalent single, double, and triple bonds (in red), particularly between carbon atoms. Triple bonds occur only rarely in biomolecules.

The chemistry of living organisms is organized around the element carbon, which accounts for more than one-half the dry weight of cells. In methane (CH4), a carbon atom shares four electron pairs with four hydrogen atoms; each of the shared electron pairs forms a single bond. Carbon can also form single and double bonds to oxygen and nitrogen atoms (Fig. 3–3). Of greatest significance in biology is the ability of carbon atoms to share electron pairs with each other to form very stable carbon–carbon single bonds. Each carbon atom can form single bonds with one, two, three, or four other carbon atoms. Two carbon atoms also can share two (or three) electron pairs, thus forming carbon–carbon double (or triple) bonds (Fig. 3–3). Covalently linked carbon atoms can form linear chains, branched chains, and cyclic and cagelike structures. To these carbon skeletons are added groups of other atoms, called functional groups, which confer specific chemical properties on the molecule. Molecules containing covalently bonded carbon backbones are called organic compounds; they occur in an almost limitless variety. Most biomolecules are organic compounds; we can therefore infer that the bonding versatility of carbon was a major factor in the selection of carbon compounds for the molecular machinery of cells during the origin and evolution of living organisms.

Figure 3–4  (a) Carbon atoms have a characteristic tetrahedral arrangement of their four single bonds, which are about 0.154 nm long and at an angle of 109.5° to each other. (b) Carbon–carbon single bonds have freedom of rotation, shown for the compound ethane (CH3–CH3). (c) Carbon–carbon double bonds are shorter and do not allow free rotation. The single bonds on each doubly bonded carbon make an angle of 120° with each other. The two doubly bonded carbons and the atoms designated A, B, X, and Y all lie in the same rigid plane.

The four covalent single bonds that can be formed by a carbon atom are arranged tetrahedrally, with an angle of about 109.5° between any two bonds (Fig. 3–4) and an average length of 0.154 nm. There is free rotation around each carbon–carbon single bond unless very large or highly charged groups are attached to both carbon atoms, in which case rotation may be restricted. A carbon–carbon double bond is shorter (about 0.134 nm long) and rigid and allows little rotation about its axis (Fig. 3–4). No other chemical element can form molecules of such widely different sizes and shapes or with such a variety of functional groups.
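The tetrahedral bond angle of about 109.5° follows from geometry alone. As a hedged illustration, the sketch below places the four substituents at alternating corners of a cube centered on the carbon atom (one standard way of constructing a regular tetrahedron) and measures the angle between any two bond directions; the coordinates and function name are chosen here for illustration.

```python
import math
from itertools import combinations

# Alternating corners of a cube centered on the origin form a regular
# tetrahedron; the carbon atom sits at the origin.
BOND_DIRECTIONS = [(1, 1, 1), (1, -1, -1), (-1, 1, -1), (-1, -1, 1)]

def angle_between_deg(u, v):
    """Angle in degrees between two vectors, from their dot product."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return math.degrees(math.acos(dot / (norm_u * norm_v)))

# All six pairs of bonds make the same angle, arccos(-1/3)
angles = [angle_between_deg(u, v) for u, v in combinations(BOND_DIRECTIONS, 2)]

for angle in angles:
    print(f"{angle:.2f} degrees")
```

Each pair of directions has a dot product of -1 and lengths of √3, so the cosine of the angle is -1/3, in agreement with the value quoted above.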

Figure 3–5  Some functional groups frequently encountered in biomolecules. All groups are shown in their uncharged (un-ionized) form.
Figure 3–6  Representative biomolecules with multiple functional groups. Note that secondary (s) and tertiary (t) amino groups have, respectively, one and two of their amino hydrogens replaced by other groups.

Most biomolecules can be regarded as derivatives of hydrocarbons, compounds with a covalently linked carbon backbone to which only hydrogen atoms are bonded. The backbones of hydrocarbons are very stable. The hydrogen atoms may be replaced by a variety of functional groups to yield different families of organic compounds. Typical families of organic compounds are the alcohols, which have one or more hydroxyl groups; amines, which have amino groups; aldehydes and ketones, which have carbonyl groups; and carboxylic acids, which have carboxyl groups (Fig. 3–5).

Many biomolecules are polyfunctional, containing two or more different kinds of functional groups (Fig. 3–6), each with its own chemical characteristics and reactions. Amino acids, an important family of molecules that serve primarily as monomeric subunits of proteins, contain at least two different kinds of functional groups: an amino group and a carboxyl group, as shown for histidine in Figure 3–6. The ability of an amino acid to condense (see Fig. 3–14e) with other amino acids to form proteins is dependent on the chemical properties of these two functional groups.

Although the covalent bonds and functional groups of biomolecules are central to their function, they do not tell the whole story. The arrangement in three-dimensional space of the atoms of a biomolecule is also crucially important. Compounds of carbon can often exist in two or more chemically indistinguishable three-dimensional forms, only one of which is biologically active. This specificity for one particular molecular configuration is a universal feature of biological interactions. All biochemistry is three-dimensional.

Figure 3–7  Models of the structure of the amino acid alanine. (a) Structural formula in perspective form. The symbol ◅ represents a bond in which the atom at the wide end projects out of the plane of the paper, toward the reader; dashes represent a bond extending behind the plane of the paper. (b) Ball-and-stick model, showing relative bond lengths and the bond angles. The balls indicate the approximate size of the atomic nuclei. (c) Space-filling model, in which each atom is shown having its correct van der Waals radius (see Table 3–3).
Figure 3–8  Complementary fit of a substrate molecule to the active or catalytic site on an enzyme molecule. The enzyme shown here is chymotrypsin, an enzyme that acts in the intestine to degrade dietary protein. Its substrate (shown in red) fits into a groove at the active site of the enzyme.

Biomolecules have characteristic sizes and three-dimensional structures, which derive from their backbone structures and their substituent functional groups. Figure 3–7 shows three ways to illustrate the three-dimensional structures of molecules. The perspective diagram specifies unambiguously the three-dimensional structure (stereochemistry) of a compound. Bond angles and center-to-center bond lengths are best represented with ball-and-stick models, whereas the outer contours of molecules are better represented by space-filling models. In space-filling models, the radius of each atom is proportional to its van der Waals radius (Table 3–3), and the contours of the molecule represent the outer limits of the region from which atoms of other molecules are excluded.

The three-dimensional conformation of biomolecules is of the utmost importance in their interactions; for example, in the binding of a substrate (reactant) to the catalytic site of an enzyme (Fig. 3–8), the two molecules must fit each other closely, in a complementary fashion, for biological function. Such complementarity also is required in the binding of a hormone molecule to its receptor on a cell surface, or in the recognition of an antigen by a specific antibody.

The study of the three-dimensional structure of biomolecules with precise physical methods is an important part of modern research on cell structure and biochemical function. The most informative method is x-ray crystallography. If a compound can be crystallized, the diffraction of x rays by the crystals can be used to determine with great precision the position of every atom in the molecule relative to every other atom. The structures of most small biomolecules (those with fewer than about 50 atoms), and of many larger molecules such as proteins, have been deduced by this means. X-ray crystallography yields a static picture of the molecule within the confines of the crystal. However, biomolecules almost never exist within cells as crystals; rather, they are dissolved in the cytosol or associated with some other component(s) of the cell. Molecules have more freedom of intramolecular motion in solution than in a crystal. In large molecules such as proteins, the small variations allowed in the three-dimensional structures of their monomeric subunits add up to extensive flexibility. Techniques such as nuclear magnetic resonance (NMR) spectroscopy complement x-ray crystallography by providing information about the three-dimensional structure of biomolecules in solution. Knowledge of the detailed three-dimensional structure of a molecule often sheds light on the mechanisms of the reactions in which the molecule participates.

Figure 3–9  Molecular asymmetry: chiral and achiral molecules. (a) When a carbon atom has four different substituent groups (A, B, X, Y), they can be arranged in two ways that represent nonsuperimposable mirror images of each other (enantiomers). Such a carbon atom is asymmetric and is called a chiral atom or chiral center. (b) When there are only three dissimilar groups around the carbon atom (i.e., the same group occurs twice), only one configuration in space is possible and the molecule is symmetric, or achiral. In this case the molecule is superimposable on its mirror image: the molecule on the left can be rotated counterclockwise (when looking down its vertical bond from A to C) to create the molecule on the right.
Figure 3–10  Pasteur separated crystals of two stereoisomers of tartaric acid and showed that solutions of the separated forms each rotated polarized light to the same extent but in opposite directions. Pasteur’s dextrorotatory and levorotatory forms were later shown to be the R,R and S,S isomers shown here. For compounds with more than one chiral center, the RS system of nomenclature is often more useful than the D and L system described in Chapter 5. In the RS system, each group attached to a chiral carbon is assigned a priority. The priorities of some common substituents are: –OCH3 > –OH > –NH2 > –COOH > –CHO > –CH2OH > –CH3 > –H. The chiral carbon atom is viewed with the group of lowest priority pointing away from the viewer. If the priority of the other three groups decreases in counterclockwise order, the configuration is S; if in clockwise order, R. In this way each chiral carbon is designated as either R or S, and the inclusion of these designations in the name of the compound provides an unambiguous description of the stereochemistry at each chiral center.
Figure 3–11  Configurations of stereoisomers. (a) Isomers such as maleic acid and fumaric acid cannot be interconverted without breaking covalent bonds, which requires the input of much energy. (b) In the vertebrate retina, the initial event in light detection is the absorption of visible light by 11-cis-retinal. The energy of the absorbed light (about 250 kJ/mol) converts 11-cis-retinal to all-trans-retinal, triggering electrical changes in the retinal cell that lead to a nerve impulse.
Figure 3–12  Many conformations of ethane are possible because of freedom of rotation around the carbon–carbon single bond. When the front carbon atom (as viewed by the reader) and its three attached hydrogens are rotated relative to the rear carbon atom, the potential energy of the molecule rises in the fully eclipsed conformation (torsion angle 0°, 120°, etc.), then falls in the fully staggered conformation (torsion angle 60°, 180°, etc.). The energy differences are small enough to allow rapid interconversion of the two forms (millions of times per second), thus the eclipsed and staggered forms cannot be isolated separately.

The tetrahedral arrangement of single bonds around a carbon atom confers on some organic compounds another property of great importance in biology. When four different atoms or functional groups are bonded to a carbon atom in an organic molecule, the carbon atom is said to be asymmetric; it can exist in two different isomeric forms (stereoisomers) that have different configurations in space. A special class of stereoisomers, called enantiomers, are nonsuperimposable mirror images of each other (Fig. 3–9). The two enantiomers of a compound have identical chemical properties, but differ in a characteristic physical property, the ability to rotate the plane of plane-polarized light. A solution of one enantiomer rotates the plane of such light to the right, and a solution of the other, to the left. Compounds without an asymmetric carbon atom do not rotate the plane of plane-polarized light.

Louis Pasteur
1822–1895

Louis Pasteur, in 1843, was the first to arrive at the correct explanation for this phenomenon of optical activity. Investigating the crystalline material that accumulated in wine casks ("paratartaric acid," also called racemic acid, from Latin racemus, "grape"), he had used a fine forceps to separate two types of crystals identical in shape, but mirror images of each other (Fig. 3–10). Both proved to have all of the chemical properties of tartaric acid, but one type rotated polarized light to the left and the other rotated it to the right, to the same extent. He later described the experiment and its interpretation:

In isomeric bodies, the elements and the proportions in which they are combined are the same, only the arrangement of the atoms is different. . . . We know, on the one hand, that the molecular arrangements of the two tartaric acids are asymmetric, and, on the other hand, that these arrangements are absolutely identical, excepting that they exhibit asymmetry in opposite directions. Are the atoms of the dextro acid grouped in the form of a right-handed spiral, or are they placed at the apex of an irregular tetrahedron, or are they disposed according to this or that asymmetric arrangement? We do not know.*

* From Pasteur’s lecture to the Société Chimique de Paris in 1883, quoted in DuBos, R. (1976) Louis Pasteur: Free Lance of Science, p. 95, Charles Scribner’s Sons, New York.

Now we do know. X-ray crystallographic studies in 1951 confirmed that the levorotatory and dextrorotatory forms of tartaric acid are mirror images of each other, and established the absolute configuration of each (Fig. 3–10). The same approach has been used to demonstrate that the amino acid alanine exists in two enantiomeric forms (Chapter 5). The central carbon atom of the alanine molecule is bonded to four different substituent groups: a methyl group, an amino group, a carboxyl group, and a hydrogen atom. The two stereoisomers of alanine are nonsuperimposable mirror images of each other, and thus are enantiomers.

Compounds with asymmetric carbon atoms can be regarded as occurring in left- and right-handed forms, and are therefore called chiral compounds (Greek chiros, "hand"). Correspondingly, the asymmetric atom or center of chiral compounds is called the chiral atom or chiral center (Fig. 3–9). All but one of the 20 amino acids have chiral centers; glycine is the exception.
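The RS rule described in the legend to Figure 3–10 can be expressed as a short procedure. The Python sketch below is only an illustration: the function name `rs_configuration` and the way the substituents are passed in (as the three higher-priority groups listed in clockwise order, with the lowest-priority group pointing away from the viewer) are assumptions made here, and the priority list covers only the common substituents named in that legend.

```python
# Priority ranking of some common substituents (highest first),
# following the legend to Fig. 3-10. Illustrative subset only.
PRIORITY = ["OCH3", "OH", "NH2", "COOH", "CHO", "CH2OH", "CH3", "H"]


def rs_configuration(clockwise_groups):
    """Return 'R' or 'S' for a chiral carbon.

    clockwise_groups: the three higher-priority substituents, listed in
    the clockwise order in which they appear when the lowest-priority
    group points away from the viewer (hypothetical input convention).
    """
    ranks = [PRIORITY.index(group) for group in clockwise_groups]
    if len(ranks) != 3 or len(set(ranks)) != 3:
        raise ValueError("expected three different substituents")
    # Rotate the list so that the highest-priority group comes first.
    start = ranks.index(min(ranks))
    _, second, third = ranks[start:] + ranks[:start]
    # Priority decreasing clockwise means the rank index increases.
    return "R" if second < third else "S"


# NH2 -> COOH -> CH3 read clockwise: priority decreases clockwise.
print(rs_configuration(["NH2", "COOH", "CH3"]))
# The mirror-image arrangement has the opposite designation.
print(rs_configuration(["NH2", "CH3", "COOH"]))
```

Because only the cyclic order of the three groups matters, the function first rotates the list so that it begins with the highest-priority group; any starting point on the clockwise circuit therefore gives the same answer.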

More generally, variations in the three-dimensional structure of biomolecules are described in terms of configuration and conformation. These terms are not synonyms. Configuration denotes the spatial arrangement of an organic molecule that is conferred by the presence of either (1) double bonds, around which there is no freedom of rotation, or (2) chiral centers, around which substituent groups are arranged in a specific sequence. The identifying characteristic of configurational isomers is that they cannot be interconverted without breaking one or more covalent bonds.

Figure 3–11a shows the configurations of maleic acid, which occurs in some plants, and its isomer fumaric acid, an intermediate in sugar metabolism. These compounds are geometric or cis–trans isomers; they differ in the arrangement of their substituent groups with respect to the nonrotating double bond. Maleic acid is the cis isomer and fumaric acid the trans isomer; each is a well-defined compound that can be isolated and purified. These two compounds are stereoisomers but not enantiomers; they are not mirror images of each other.

Molecular conformation refers to the spatial arrangement of substituent groups that are free to assume different positions in space, without breaking any bonds, because of the freedom of bond rotation. In the simple hydrocarbon ethane, for example, there is nearly complete freedom of rotation around the carbon–carbon single bond. Many different, interconvertible conformations of the ethane molecule are therefore possible, depending upon the degree of rotation (Fig. 3–12). Two conformations are of special interest: the staggered conformation, which is more stable than all others and thus predominates, and the eclipsed form, which is least stable. It is not possible to isolate either of these conformational forms, because they are freely interconvertible and in equilibrium with each other. However, when one or more of the hydrogen atoms on each carbon is replaced by a functional group that is either very large or electrically charged, freedom of rotation around the carbon–carbon single bond is hindered. This limits the number of stable conformations of the ethane derivative.
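The way the potential energy of ethane varies with rotation (Fig. 3–12) can be sketched numerically. The example below is a simplified model, not a result taken from the text: it assumes a single threefold cosine term and a rotational barrier of roughly 12 kJ/mol, and the function name `torsional_energy` is hypothetical.

```python
import math

# Assumed approximate barrier to rotation in ethane (kJ/mol).
BARRIER_KJ_PER_MOL = 12.0


def torsional_energy(angle_deg, barrier=BARRIER_KJ_PER_MOL):
    """Relative potential energy (kJ/mol) at a given torsion angle.

    A threefold cosine term: maximal at 0, 120, 240 degrees (eclipsed)
    and zero at 60, 180, 300 degrees (staggered).
    """
    return 0.5 * barrier * (1.0 + math.cos(math.radians(3.0 * angle_deg)))


# Tabulate the energy every 30 degrees around one full rotation.
for angle in range(0, 361, 30):
    print(f"{angle:3d} deg  {torsional_energy(angle):5.2f} kJ/mol")
```

The table alternates between maxima at the eclipsed angles and minima at the staggered angles, which is the pattern described for Figure 3–12.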
Figure 3–13  Stereoisomers that are distinguished by sensory receptors for smell and taste in humans.
(a) Two stereoisomers of carvone, designated R and S (see Fig. 3–10, legend). R-carvone (from spearmint oil) has the characteristic fragrance of spearmint; S-carvone (from caraway seed oil) smells like caraway.
(b) Aspartame, the artificial sweetener sold under the trade name NutraSweet, is easily distinguishable by taste from its bitter-tasting stereoisomer, although the two differ only in the configuration about one of the two chiral carbon atoms (in red).

Many biomolecules besides amino acids are chiral, containing one or more asymmetric carbon atoms. The chiral molecules in living organisms are usually present in only one of their chiral forms. For example, the amino acids occur in proteins only as the L isomers. Glucose, the monomeric subunit of starch, has five asymmetric carbons, but occurs biologically in only one of its chiral forms, the D isomer. (The conventions for naming stereoisomers of the amino acids are described in Chapter 5; those for sugars, in Chapter 11). In contrast, when a compound having an asymmetric carbon atom is chemically synthesized in the laboratory, the nonbiological reactions usually produce all possible chiral forms in an equimolar mixture that does not rotate polarized light (a racemic mixture). The chiral forms in such a mixture can be separated only by painstaking physical methods. Chiral compounds in living cells are produced in only one chiral form because the enzymes that synthesize them are also chiral molecules.

Stereospecificity, the ability to distinguish between stereoisomers, is a common property of enzymes and other proteins and a characteristic feature of the molecular logic of living cells. If the binding site on a protein is complementary to one isomer of a chiral compound, it will not be complementary to the other isomer, for the same reason that a left glove does not fit a right hand. Two striking examples of the ability of biological systems to distinguish stereoisomers are shown in Figure 3–13.

Saturated hydrocarbons – molecules with carbon–carbon single bonds and without double bonds or substituent groups – are not easily attacked by most chemical reagents; biomolecules, with their various functional groups, are much more chemically reactive. Functional groups alter the electron distribution and the geometry of neighboring atoms and thus affect the chemical reactivity of the entire molecule. The breakage and formation of chemical bonds during cellular metabolism release energy, some in the form of heat.

It is possible to analyze and predict the chemical behavior and reactions of biomolecules from the functional groups they bear. Enzymes recognize a specific pattern of functional groups in a biomolecule and catalyze characteristic chemical changes in the compound that contains these groups. Although a large number of different chemical reactions occur in a typical cell, these reactions are of only a few types, readily understandable in terms that apply to all reactions of organic compounds.

When the two atoms sharing electrons in a covalent bond have equal affinities for the electrons, as in the case of two carbon atoms, the resulting bond is nonpolar. When two elements that differ in electron affinity, or electronegativity (Table 3–4), form a covalent bond (e.g., C and O), that bond is polarized; the shared electrons are more likely to be in the region of the more electronegative atom (O) than of the less electronegative (C). In the extreme case of two atoms of very different electronegativity (Na and Cl, for example), one of the atoms actually gives up the electron(s) to the other atom, resulting in the formation of ions and ionic interactions such as those in solid NaCl.
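The three cases above (nonpolar, polarized, and ionic) can be illustrated by comparing electronegativity differences. In the sketch below, the electronegativity values are approximate Pauling-scale numbers and the two cutoffs are common rules of thumb assumed for illustration; neither is taken from Table 3–4, and the name `bond_character` is hypothetical.

```python
# Approximate Pauling electronegativities (illustrative values).
ELECTRONEGATIVITY = {
    "H": 2.1, "C": 2.5, "N": 3.0, "O": 3.5, "Na": 0.9, "Cl": 3.0,
}


def bond_character(atom_a, atom_b, polar_cutoff=0.5, ionic_cutoff=1.7):
    """Classify a bond from the electronegativity difference.

    The cutoffs are rule-of-thumb assumptions, not sharp boundaries.
    """
    diff = abs(ELECTRONEGATIVITY[atom_a] - ELECTRONEGATIVITY[atom_b])
    if diff < polar_cutoff:
        return "nonpolar covalent"
    if diff < ionic_cutoff:
        return "polar covalent"
    return "ionic"


for pair in [("C", "C"), ("C", "O"), ("Na", "Cl")]:
    print(pair, bond_character(*pair))
```

In reality bond polarity varies continuously, so the categories returned here should be read as rough labels rather than distinct kinds of bond.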

The strength of chemical bonds (Table 3–5) depends upon the relative electronegativities of the elements involved, the distance of the bonding electrons from each nucleus, and the nuclear charge. The number of electrons shared also influences bond strength; double bonds are stronger than single bonds, and triple bonds are stronger yet. The strength of a bond is expressed as bond energy, in joules. (In biochemistry, calories have often been used as units of energy – bond energy and free energy, for example. The joule is the unit of energy in the International System of Units, and is used throughout this book. For conversions, 1 cal is equal to 4.18 J.) Bond energy can be thought of as either the amount of energy required to break a bond or the amount of energy gained by the surroundings when two atoms form the bond. One way to put energy into a system is to heat it, which gives the molecules more kinetic energy; temperature is a measurement of the average kinetic energy of a population of molecules. When molecular motion is sufficiently violent, intramolecular vibrations and intermolecular collisions sometimes break chemical bonds. Heating raises the fraction of molecules with energies high enough to react.
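As a small worked example of the unit conversion and of the trend in bond strength, the sketch below converts a few bond energies from kilojoules to kilocalories per mole using the factor given above. The bond energies are approximate values assumed here for illustration (Table 3–5 is the reference to consult), and the helper name `kj_to_kcal` is hypothetical.

```python
JOULES_PER_CALORIE = 4.18  # conversion factor given in the text

# Approximate carbon-carbon bond energies in kJ/mol (illustrative only).
BOND_ENERGY_KJ_PER_MOL = {
    "C-C single": 348,
    "C=C double": 611,
    "C-C triple": 816,
}


def kj_to_kcal(kilojoules):
    """Convert kJ to kcal (the same factor applies to kJ/mol and kcal/mol)."""
    return kilojoules / JOULES_PER_CALORIE


for bond, energy in BOND_ENERGY_KJ_PER_MOL.items():
    print(f"{bond}: {energy} kJ/mol = {kj_to_kcal(energy):.1f} kcal/mol")
```

The values increase from single to double to triple bond, consistent with the statement that sharing more electrons gives a stronger bond.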

In chemical reactions, bonds are broken and new ones are formed. The difference between the energy from the surroundings used to break bonds and the energy gained by the surroundings in the formation of new ones is virtually identical to the enthalpy change for the reaction, ΔH. (The energy difference becomes exactly equal to the enthalpy change after a slight correction for any volume change in the system at constant pressure.) If heat energy is absorbed by the system as the change occurs (that is, if the reaction is endothermic), then ΔH has, by definition, a positive value; when heat is produced, as in exothermic reactions, ΔH is negative. In short, the change in enthalpy for a covalent reaction reflects the kinds and numbers of bonds that are made and broken. As we shall see later in this chapter, the enthalpy change is one of three factors that determine the free-energy change for a reaction; the other two are the temperature and the change in entropy.
Figure 3–14  Examples of five general types of chemical transformations that occur in cells. The reactions (a) through (d) are enzyme-catalyzed reactions that take place in your tissues as you use glucose as a source of energy (Chapter 14). In (a) a phosphoryl group is transferred from ATP to glucose; (b) an aldehyde is oxidized to a carboxylic acid and an oxidized electron carrier (NADP+) is reduced; (c) a rearrangement converts an aldehyde to a ketone; (d) a molecule is cleaved to form two smaller molecules. Reaction (e) represents the condensation of two amino acids with the elimination of H2O to form a peptide bond; condensation reactions occur in many cellular processes in which larger molecules are assembled from small precursors.

Most cells have the capacity to carry out thousands of specific, enzyme-catalyzed reactions: transformation of simple nutrients such as glucose into amino acids, nucleotides, or lipids; extraction of energy from fuels by oxidation; or polymerization of subunits into macromolecules, for example. Fortunately for the student of biochemistry, there is a pattern in this multitude of reactions; we do not need to learn all of these reactions to comprehend the molecular logic of life.

Most of the reactions in living cells fall into one of five general categories (Fig. 3–14): functional-group transfers (a), oxidations and reductions (b), reactions that rearrange the bond structure around one or more carbons (c), reactions that form or break carbon–carbon bonds (d), and reactions in which two molecules condense, with the elimination of a molecule of water (e). Reactions within one category generally occur by similar mechanisms.

The mechanisms of biochemical reactions are not fundamentally different from other chemical reactions. Many biochemical reactions involve interactions between nucleophiles, functional groups rich in electrons and capable of donating them, and electrophiles, electron-deficient functional groups that seek electrons. Nucleophiles combine with, and give up electrons to, electrophiles. Functional groups containing oxygen, nitrogen, and sulfur are important biological nucleophiles (Table 3–6). Positively charged hydrogen atoms (protons) and positively charged metals (cations) frequently act as electrophiles in cells. A carbon atom can act as either a nucleophilic or an electrophilic center, depending upon which bonds and functional groups surround it.

Many of the molecules found within cells are macromolecules, polymers of high molecular weight assembled from relatively simple precursors. Polysaccharides, proteins, and nucleic acids, which may have molecular weights ranging from tens of thousands to (in the case of DNA) billions, are produced by the polymerization of relatively small subunits with molecular weights of 500 or less. The synthesis of macromolecules is a major energy-consuming activity of cells. Macromolecules themselves may be further assembled into supramolecular complexes, forming functional units such as ribosomes, membranes, and organelles.

Table 3–7 shows the major classes of biomolecules in a representative single-celled organism, Escherichia coli. Water is the most abundant single compound in E. coli and in all other cells and organisms. Inorganic salts and mineral elements, on the other hand, constitute only a very small fraction of the total dry weight, but many of them are in approximate proportion to their distribution in seawater (see Table 3–1). Nearly all of the solid matter in all kinds of cells is organic and is present in four forms: proteins, nucleic acids, polysaccharides, and lipids.

Proteins, long polymers of amino acids, constitute the largest fraction (besides water) of cells. Some proteins have catalytic activity and function as enzymes, others serve as structural elements, and still others carry specific signals (in the case of receptors) or specific substances (in the case of transport proteins) into or out of cells. Proteins are perhaps the most versatile of all biomolecules. The nucleic acids, DNA and RNA, are polymers of nucleotides. They store, transmit, and translate genetic information. The polysaccharides, polymers of simple sugars such as glucose, have two major functions: they serve as energy-yielding fuel stores and as extracellular structural elements. Shorter polymers of sugars (oligosaccharides) attached to proteins or lipids at the cell surface serve as specific cellular signals. The lipids, greasy or oily hydrocarbon derivatives, serve as structural components of membranes, as a storage form of energy-rich fuel, and in other roles. These four classes of large biomolecules are all synthesized in condensation reactions (Fig. 3–14e). In macromolecules – proteins, nucleic acids, and polysaccharides – the number of monomeric subunits is very large. Proteins have molecular weights in the range of 5,000 to over 1 million; the nucleic acids have molecular weights ranging up to several billion; and polysaccharides, such as starch, also have molecular weights into the millions. Individual lipid molecules are much smaller (Mr 750 to 1,500), and are not classed as macromolecules. However, when large numbers of lipid molecules associate noncovalently, very large structures result. Cellular membranes are built of enormous aggregates containing millions of lipid molecules.
Figure 3–15  Informational and structural macromolecules. A, T, C, and G represent the four deoxynucleotides of DNA, and glucose (Glc) is the repeating monomeric subunit of starch and cellulose. The number of possible permutations and combinations of four deoxynucleotides is virtually limitless, as is the number of melodies possible with a few musical notes. A polymer of one subunit type is information-poor and monotonous.

Although living organisms contain a very large number of different proteins and different nucleic acids, a fundamental simplicity underlies their structure (Chapter 1). The simple monomeric subunits from which all proteins and all nucleic acids are constructed are few in number and identical in all living species. Proteins and nucleic acids are informational macromolecules: each protein and each nucleic acid has a characteristic information-rich subunit sequence (Fig. 3–15).

Polysaccharides built from only a single kind of unit, or from two different alternating units, are not informational molecules in the same sense as are proteins and nucleic acids (Fig. 3–15). However, complex polysaccharides made up of six or more different kinds of sugars connected in branched chains do have the structural and stereochemical variety that enables them to carry information recognizable by other macromolecules.
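The contrast between information-rich and monotonous polymers can be made concrete by counting the distinct linear sequences possible for a chain of a given length. The sketch below is simple arithmetic; the function name `possible_sequences` and the chain length of 10 are chosen here purely for illustration.

```python
def possible_sequences(kinds_of_subunit, length):
    """Number of distinct linear sequences of the given length."""
    return kinds_of_subunit ** length


length = 10  # hypothetical short chain
print("4 deoxynucleotides:", possible_sequences(4, length))
print("20 amino acids:   ", possible_sequences(20, length))
print("1 sugar (glucose): ", possible_sequences(1, length))
```

Even for a chain of only ten subunits, four kinds of subunit allow more than a million sequences and twenty kinds allow about ten trillion, whereas a polymer of a single subunit type has exactly one.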

Figure 3–16  The organic compounds from which most larger structures in cells are constructed: the ABCs of biochemistry. Shown on these two pages are (a) the 20 amino acids from which the proteins of all organisms are built (the side chains are shaded red), (b) the five nitrogenous bases, two five-carbon sugars, and phosphoric acid from which all nucleic acids are built, (c) five components found in many membrane lipids, and (d) α-D-glucose, the parent sugar from which most carbohydrates are derived. Note that phosphoric acid is a subunit of both nucleic acids and membrane lipids. The five-carbon and six-carbon sugars are shown here in their ring forms rather than their straight-chain forms (Chapter 11). All components are shown in their un-ionized form.
Figure 3–17  Each simple component in Fig. 3–16 is a precursor of many other kinds of biomolecules.

Figure 3–16 shows the structures of some monomeric units, arranged in families. We have already seen that the most abundant polysaccharides in nature, starch and cellulose, are constructed of repeating units of D-glucose. The monomeric subunits of proteins are 20 different amino acids; all have an amino group (an imino group in the case of proline) and a carboxyl group attached to the same carbon atom, called, by convention, the α carbon. These α-amino acids differ from each other only in their side chains (Fig. 3–16).

The recurring structural units of all nucleic acids are eight different nucleotides; four kinds of nucleotides are the structural units of DNA, and four others are the units of RNA. Each nucleotide is made up of three components: (1) a nitrogenous organic base, (2) a five-carbon sugar, and (3) phosphate (Fig. 3–16). The eight different nucleotides of DNA and RNA are built from five different organic bases combined with two different sugars.

Lipids also are constructed from relatively few kinds of subunits. Most lipid molecules contain one or more long-chain fatty acids, of which palmitic acid and oleic acid are parent compounds. Many lipids also contain an alcohol, e.g., glycerol, and some contain phosphate (Fig. 3–16). Thus, only three dozen different organic compounds are the parents of most biomolecules.

Each of the compounds in Figure 3–16 has multiple functions in living organisms (Fig. 3–17). Amino acids are not only the monomeric subunits of proteins; some also act as neurotransmitters and as precursors of hormones and toxins. Adenine serves both as a subunit in the structure of nucleic acids and of ATP, and as a neurotransmitter. Fatty acids serve as components of complex membrane lipids, energy-rich fuel-storage fats, and the protective waxy coats on leaves and fruits. D-Glucose is the monomeric subunit of starch and cellulose, and also is the precursor of other sugars such as D-mannose and sucrose.

J. Willard Gibbs
1839–1903

It is extremely improbable that amino acids in a mixture would spontaneously condense into a protein with a unique sequence. This would represent increased order in a population of molecules; but according to the second law of thermodynamics (Chapter 13) the tendency is toward ever-greater disorder in the universe. To bring about the synthesis of macromolecules from their monomeric subunits, free energy must be supplied to the system (the cell).

The randomness of the components of a chemical system is expressed as entropy, symbolized S. Any change in randomness of the system is the entropy change, ΔS, which has a positive value when randomness increases. J. Willard Gibbs, who developed the theory of energy changes during chemical reactions, showed that the free-energy content (G; recall Chapter 1) of any isolated system can be defined in terms of three quantities: enthalpy (H) (reflecting the number and kinds of bonds; see p. 66), entropy (S), and T, the absolute temperature (Kelvin). The definition of free energy is: G = H − TS. When a chemical reaction occurs at constant temperature, the free-energy change is determined by ΔH, reflecting the kinds and numbers of chemical bonds and noncovalent interactions broken and formed, and ΔS, the change in the system’s randomness:

ΔG = ΔH − TΔS
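The relation ΔG = ΔH − TΔS can be evaluated directly. The sketch below uses made-up values of ΔH and ΔS for a hypothetical reaction (they do not describe any reaction discussed in the text) to show how the sign of ΔG can change with temperature; the function name `free_energy_change` is likewise an assumption.

```python
def free_energy_change(delta_h, temperature, delta_s):
    """Return dG = dH - T*dS.

    delta_h in kJ/mol, temperature in kelvins, delta_s in kJ/(mol*K).
    """
    return delta_h - temperature * delta_s


# Hypothetical endothermic reaction with an increase in randomness.
DELTA_H = 40.0   # kJ/mol (made-up value)
DELTA_S = 0.15   # kJ/(mol*K) (made-up value)

for temperature in (250.0, 298.0):
    dg = free_energy_change(DELTA_H, temperature, DELTA_S)
    print(f"T = {temperature:.0f} K: dG = {dg:+.1f} kJ/mol")
```

With these assumed numbers the TΔS term is smaller than ΔH at 250 K, giving a positive ΔG, but larger than ΔH at 298 K, giving a negative ΔG. The units of ΔH and TΔS must match for the subtraction to be meaningful.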

Recall from Chapter 1 that a process tends to occur spontaneously only if ΔG is negative. How, then, can cells synthesize polymers such as proteins and nucleic acids, if the free-energy change for polymerizing subunits is positive? They couple these thermodynamically unfavorable (endergonic) reactions to other cellular reactions that liberate free energy (exergonic reactions), so that the sum of the free-energy changes is negative:

Amino acids → proteins    ΔG1 is positive (endergonic)
ATP → AMP + 2 PO₄³⁻    ΔG2 is negative (exergonic)

Sum: Amino acids + ATP → proteins + AMP + 2 PO₄³⁻

The sum of ΔG1 and ΔG2 is negative (the overall process is exergonic).
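The bookkeeping behind coupling is simply addition of the individual free-energy changes. In the sketch below, the two ΔG values are hypothetical numbers chosen only to illustrate the signs; they are not measured values for protein synthesis or ATP breakdown, and the function name `coupled_free_energy` is an assumption.

```python
def coupled_free_energy(delta_g_values):
    """Sum the free-energy changes (kJ/mol) of coupled reactions."""
    return sum(delta_g_values)


DELTA_G1 = +20.0  # endergonic step (hypothetical value)
DELTA_G2 = -45.0  # exergonic step (hypothetical value)

total = coupled_free_energy([DELTA_G1, DELTA_G2])
verdict = "exergonic" if total < 0 else "endergonic"
print(f"Sum of dG1 and dG2 = {total:+.1f} kJ/mol ({verdict})")
```

The overall process is favorable only when the exergonic reaction releases more free energy than the endergonic reaction requires, so that the sum is negative.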
Figure 3–18  The structural hierarchy in the molecular organization of cells. The nucleus of this plant cell, for example, contains several types of supramolecular complexes, including chromosomes. Chromosomes consist of macromolecules – DNA and many different proteins. Each type of macromolecule is constructed from simple subunits – DNA from the deoxyribonucleotides, for example.  
(Adapted from Becker, W.M. and Deamer, D.W. (1991) The World of the Cell, 2nd edn, Fig. 2–15, The Benjamin/Cummings Publishing Company, Menlo Park, CA)

The monomeric subunits in Figure 3–16 are very small compared with biological macromolecules. An amino acid molecule such as alanine is less than 0.5 nm long. Hemoglobin, the oxygen-carrying protein of erythrocytes, consists of nearly 600 amino acid units covalently linked into four long chains, which are folded into globular shapes and associated in a tetrameric structure with a diameter of 5.5 nm. Protein molecules in turn are small compared with ribosomes (about 20 nm in diameter), which contain about 70 different proteins and several different RNA molecules. Ribosomes, in their turn, are much smaller than organelles such as mitochondria, typically 1,000 nm in diameter. It is a long jump from the simple biomolecules to the larger cellular structures that can be seen with the light microscope. Figure 3–18 illustrates the structural hierarchy in cellular organization.

In proteins, nucleic acids, and polysaccharides, the individual subunits are joined by covalent bonds. By contrast, in supramolecular complexes, the different macromolecules are held together by noncovalent interactions – much weaker, individually, than covalent bonds. Among these are hydrogen bonds (between polar groups), ionic interactions (between charged groups), hydrophobic interactions (between nonpolar groups), and van der Waals interactions, all of which have energies of only a few kilojoules per mole, compared with covalent bonds, which have bond energies of 200 to 900 kJ/mol (see Table 3–5). The nature of these noncovalent interactions will be described in the next chapter.

The large numbers of weak interactions between macromolecules in supramolecular complexes stabilize the resulting noncovalent structures.

Although the monomeric subunits of macromolecules are so much smaller than cells and organelles, they influence the shape and function of these much larger structures. In sickle-cell anemia, a hereditary human disorder, the hemoglobin molecule is defective. In the two β chains of hemoglobin from healthy individuals, a glutamic acid residue occurs at position 6. In people with sickle-cell anemia, a valine residue occurs at position 6. This single difference in the sequence of the 146 amino acids of the β chain affects only a tiny portion of the molecule, yet it causes the hemoglobin to form large aggregates within the erythrocytes, which become deformed (sickled) and function abnormally.

Because all biological macromolecules are made from the same three dozen subunits, it seems likely that all living organisms descended from a single primordial cell line. These subunits are proposed to have had, singly and collectively, the most successful combination of chemical and physical properties for their function as the raw materials of biological macromolecules and for carrying out the basic energy-transforming and self-replicating features of a living cell. These primordial organic compounds may have been retained during biological evolution over billions of years because of their unique fitness.

Figure 3–19  Lightning evoked by a volcanic eruption that resulted in the formation of the island of Surtsey off the coast of Iceland in 1963. The intense fields of electrical, thermal, and shock-wave energy generated by such cataclysms, which were frequent on the primitive earth, could have been a major factor in the origin of organic compounds.

We come now to a puzzle. Apart from their occurrence in living organisms, organic compounds, including the basic biomolecules, occur only in trace amounts in the earth’s crust, the sea, and the atmosphere. How did the first living organisms acquire their characteristic organic building blocks? In 1922, the biochemist Aleksandr I. Oparin proposed a theory for the origin of life early in the history of the earth, postulating that the atmosphere was once very different from that of today. Rich in methane, ammonia, and water, and essentially devoid of oxygen, it was a reducing atmosphere, in contrast to the oxidizing environment of our era. In Oparin’s theory, electrical energy of lightning discharges or heat energy from volcanoes (Fig. 3–19) caused ammonia, methane, water vapor, and other components of the primitive atmosphere to react, forming simple organic compounds. These compounds then dissolved in the ancient seas, which over many millennia became enriched with a large variety of simple organic compounds. In this warm solution (the "primordial soup") some organic molecules had a greater tendency than others to associate into larger complexes. Over millions of years, these in turn assembled spontaneously to form membranes and catalysts (enzymes), which came together to become precursors of the first primitive cells. For many years, Oparin’s views remained speculative and appeared untestable.

Figure 3–20  Spark-discharge apparatus of the type used by Miller and Urey in experiments demonstrating abiotic formation of organic compounds under primitive atmospheric conditions. After subjecting the gaseous contents of the system to electrical sparks, products were collected by condensation. Biomolecules such as amino acids were among the products (see Table 3–8).
Figure 3–21  Among the products of electrical discharge through an atmosphere containing HCN are compounds such as those in (a). These compounds promote the polymerization of monomers such as amino acids into polymers (b).

A classic experiment on the abiotic (nonbiological) origin of organic biomolecules was carried out in 1953 by Stanley Miller in the laboratory of Harold Urey. Miller subjected gaseous mixtures of NH3, CH4, water vapor, and H2 to electrical sparks produced across a pair of electrodes (to simulate lightning) for periods of a week or more (Fig. 3–20), then analyzed the contents of the closed reaction vessel. The gas phase of the resulting mixture contained CO and CO2, as well as the starting materials. The water phase contained a variety of organic compounds, including some amino acids, hydroxy acids, aldehydes, and hydrogen cyanide (HCN). This experiment established the possibility of abiotic production of biomolecules in relatively short times under relatively mild conditions.

Several developments have allowed more refined studies of the type pioneered by Miller and Urey, and have yielded strong evidence that a wide variety of biomolecules, including proteins and nucleic acids, could have been produced spontaneously from simple starting materials probably present on the earth at the time life arose.

Modern extensions of the Miller experiments have employed “atmospheres” that include CO2 and HCN, and much improved technology for identifying small quantities of products. The formation of hundreds of organic compounds has been demonstrated (Table 3–8). These compounds include more than ten of the common amino acids, a variety of mono-, di-, and tricarboxylic acids, fatty acids, adenine, and formaldehyde. Under certain conditions, formaldehyde polymerizes to form sugars containing three, four, five, and six carbons. The sources of energy that are effective in bringing about the formation of these compounds include heat, visible and ultraviolet (UV) light, x rays, gamma radiation, ultrasound and shock waves, and alpha and beta particles.

In addition to the many monomers that form in these experiments, polymers of nucleotides (nucleic acids) and of amino acids (proteins) also form. Some of the products of the self-condensation of HCN are effective promoters of such polymerization reactions (Fig. 3–21), and inorganic ions present in the earth’s crust (Cu2+, Ni2+, and Zn2+) also enhance the rate of polymerization.

In short, laboratory experiments on the spontaneous formation of biomolecules under prebiotic conditions have provided good evidence that many of the chemical components of living cells, including proteins and RNA, can form under these conditions. Short polymers of RNA can act as catalysts in biologically significant reactions (Chapter 25), and it seems likely that RNA played a crucial role in prebiotic evolution, both as catalyst and as information repository.

Figure 3–22  One possible "RNA world" scenario, showing the transition from the prebiotic RNA world (shades of yellow) to the biotic DNA world (orange).

In modern organisms, nucleic acids encode the genetic information that specifies the structure of enzymes, and enzymes have the ability to catalyze the replication and repair of nucleic acids. The mutual dependence of these two classes of biomolecules poses the perplexing question: which came first, DNA or protein?

The answer may be: neither. The discovery that RNA molecules can act as catalysts in their own formation suggests that RNA may have been the first gene and the first catalyst. According to this scenario (Fig. 3–22), one of the earliest stages of biological evolution was the chance formation, in the primordial soup, of an RNA molecule that had the ability to catalyze the formation of other RNA molecules of the same sequence – a self-replicating, self-perpetuating RNA. The concentration of a self-replicating RNA molecule would increase exponentially, as one molecule formed two, two formed four, and so on. The fidelity of self-replication was presumably less than perfect, so the process would generate variants of the RNA, some of which might be even better able to self-replicate. In the competition for nucleotides, the most efficient of the self-replicating sequences would win, and less efficient replicators would fade from the population.
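The exponential growth and selection described above can be illustrated with a small numerical sketch. This is a toy model, not a result from the experiments discussed here: the function name `variant_fractions` and the two "efficiency" values are hypothetical, and the model ignores the finite supply of nucleotides. An efficiency of 1.0 means every molecule is copied once per round (one forms two, two form four); a lower efficiency means only a fraction of the molecules are copied.

```python
def variant_fractions(rounds, eff_a=1.0, eff_b=0.8):
    """Track the share of replicator A in a mixed population.

    Each round, a variant with efficiency e grows by a factor (1 + e),
    so e = 1.0 corresponds to a doubling per round. Both variants start
    from a single molecule. The efficiencies are illustrative assumptions.
    """
    a = b = 1.0
    fractions = []
    for _ in range(rounds):
        a *= 1.0 + eff_a   # efficient replicator
        b *= 1.0 + eff_b   # less efficient replicator
        fractions.append(a / (a + b))
    return fractions


if __name__ == "__main__":
    for n, frac in enumerate(variant_fractions(20), start=1):
        if n % 5 == 0:
            print(f"round {n:2d}: variant A is {frac:.1%} of the population")
```

Even a modest difference in replication efficiency compounds from round to round, so the more efficient sequence comes to dominate the population without any change in the less efficient one.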

The division of function between DNA (genetic information storage) and protein (catalysis) was, according to the "RNA world" hypothesis, a later development (Fig. 3–22). New variants of self-replicating RNA molecules developed, with the additional ability to catalyze the condensation of amino acids into peptides. Occasionally, the peptide(s) thus formed would reinforce the self-replicating ability of the RNA, and the pair – RNA molecule and helping peptide – could undergo further modifications in sequence, generating even more efficient self-replicating systems. Sometime after the evolution of this primitive protein-synthesizing system, there was a further development: DNA molecules with sequences complementary to the self-replicating RNA molecules took over the function of conserving the "genetic" information, and RNA molecules evolved to play roles in protein synthesis. Proteins proved to be versatile catalysts, and over time, assumed that function. Lipidlike compounds in the primordial soup formed relatively impermeable layers surrounding self-replicating collections of molecules. The concentration of proteins and nucleic acids within these lipid enclosures favored the molecular interactions required in self-replication.

This "RNA world" hypothesis is plausible but by no means universally accepted. The hypothesis does make testable predictions, and to the extent that experimental tests are possible within finite times (less than or equal to the life span of a scientist!), the hypothesis will be tested and refined.

Figure 3–23  Ancient reefs in Australia contain fossil evidence of microbial life in the sea of 3.5 billion years ago. Bits of sand and limestone became trapped in the sticky extracellular coats of cyanobacteria, gradually building up these stromatolites found in Hamelin Bay, Western Australia (a). Microscopic examination of thin sections of stromatolite reveals microfossils of filamentous bacteria (b).

The earth was formed about 4.5 billion years ago, and the first definitive evidence of life dates to about 3.5 billion years ago. An international group of scientists showed in 1980 that certain ancient rock formations (stromatolites; Fig. 3–23) in western Australia contained fossils of primitive microorganisms. Somewhere on earth during that first billion-year period, there arose the first simple organism, capable of replicating its own structure from a template (RNA?) that was the first genetic material. Because the terrestrial atmosphere at the dawn of life was nearly devoid of oxygen, and because there were few microorganisms to scavenge organic compounds formed by natural processes, these compounds were relatively stable. Given this stability and eons of time, the improbable became inevitable: the organic compounds were incorporated into evolving cells to produce more and more effective self-reproducing catalysts. The process of biological evolution had begun. Organisms developed mechanisms for harnessing the energy of sunlight through photosynthesis, to make sugars and other organic molecules from carbon dioxide, and to convert molecular nitrogen from the atmosphere into nitrogenous biomolecules such as amino acids. By developing their own capacities to synthesize biomolecules, cells became independent of the random processes by which such compounds had first appeared on earth. As evolution proceeded, organisms began to interact and to derive mutual benefits from each other’s products, forming increasingly complex ecological systems.
Summary

Most of the dry weight of living organisms consists of organic compounds, molecules containing covalently bonded carbon backbones to which other carbon, hydrogen, oxygen, or nitrogen atoms may be attached. Carbon appears to have been selected in the course of biological evolution because of the ability of carbon atoms to form single and double bonds with each other, making possible formation of linear, cyclic, and branched backbone structures in great variety. To these backbones are attached different kinds of functional groups, which determine the chemical properties of the molecules. Organic biomolecules also have characteristic shapes (configurations and conformations) in three dimensions. Many biomolecules occur in asymmetric or chiral forms called enantiomers, stereoisomers that are nonsuperimposable mirror images of each other. Usually, only one of a pair of enantiomers has biological activity.

The strength of covalent chemical bonds, measured in joules, depends on the electronegativities and sizes of the atoms that share electrons. The enthalpy change (ΔH) for a chemical reaction reflects the number and kind of bonds made and broken. For endothermic reactions, ΔH is positive; for exothermic reactions, negative. The many different chemical reactions that occur within a cell fall into five general categories: group transfers, oxidation–reduction reactions, rearrangements of the bonds around carbon atoms, breakage or formation of carbon–carbon bonds, and condensations.

Most of the organic matter in living cells consists of macromolecules: nucleic acids, proteins, and polysaccharides. Each type of macromolecule is composed of small, covalently linked monomeric subunits of relatively few kinds. Proteins are polymers of 20 different kinds of amino acids, nucleic acids are polymers of different nucleotide units (four in DNA, four in RNA), and polysaccharides are polymers of recurring sugar units. Nucleic acids and proteins are informational macromolecules; the characteristic sequences of their subunits constitute the genetic individuality of a species. Simple polysaccharides act as structural components, but some complex polysaccharides also are informational macromolecules.

There is a structural hierarchy in the molecular organization of cells. Cells contain organelles, such as nuclei, mitochondria, and chloroplasts, which in turn contain supramolecular complexes, such as membranes and ribosomes, and these consist in turn of clusters of macromolecules that are bound together by many relatively weak, noncovalent forces. The macromolecules consist of covalently linked subunits. The formation of macromolecules from simple subunits creates order (decreases entropy); this synthesis requires energy and therefore must be coupled to exergonic reactions.

The small biomolecules such as amino acids and sugars probably first arose spontaneously from atmospheric gases and water under the influence of electrical energy (lightning) during the early history of the earth. Such processes, called chemical evolution, can be simulated in the laboratory. The monomeric subunits of cellular macromolecules appear to have been selected during early biological evolution as being the most fit for their biological functions. These subunit molecules are relatively few in number, but are very versatile; evolution has combined small biomolecules to yield macromolecules capable of diverse biological functions. The first macromolecules may have been RNA molecules that were capable of catalyzing their own replication. Later in evolution, DNA took over the function of storing genetic information, proteins became the cellular catalysts, and RNA mediated between these, allowing the expression of genetic information as proteins.

Further Reading

General

Baker, J.J. & Allen, G.E. (1981) Matter, Energy, and Life: An Introduction to Chemical Concepts, 4th edn, Addison-Wesley Publishing Co., Inc., Reading, MA. 

Callewaert, D.M. & Genyea, J. (1980) Basic Chemistry: General, Organic, Biological, Worth Publishers, Inc., New York. 

Dickerson, R.E. & Geis, I. (1976) Chemistry, Matter, and the Universe, The Benjamin/Cummings Publishing Company, Menlo Park, CA. 

Frieden, E. (1972) The chemical elements of life. Sci. Am. 227 (July), 52–61. 

The Molecules of Life. (1985) Sci. Am. 253 (October). 
An entire issue devoted to the structure and function of biomolecules. It includes articles on DNA, RNA, and proteins, and their subunits.

Chemistry and Stereochemistry

Brewster, J.H. (1986) Stereochemistry and the origins of life. J. Chem. Educ. 63, 667–670. 
An interesting and lucid discussion of the ways in which evolution could have selected only one of two stereoisomers for the construction of proteins and other molecules.

Hegstrom, R.A. & Kondepudi, D.K. (1990) The handedness of the universe. Sci. Am. 262 (January), 108–115. 
Stereochemistry and the asymmetry of biomolecules, viewed in the context of the universe.

Loudon, M. (1988) Organic Chemistry, 2nd edn, The Benjamin/Cummings Publishing Company, Menlo Park, CA. 
This and the following two books provide details on stereochemistry and the chemical reactivity of functional groups. All excellent textbooks.

Morrison, R.T. & Boyd, R.N. (1992) Organic Chemistry, 6th edn, Allyn & Bacon, Inc., Boston, MA. 

Streitwieser, A. Jr. & Heathcock, C.H. (1981) Introduction to Organic Chemistry, 2nd edn, Macmillan Publishing Co., Inc., New York. 

Prebiotic Evolution

Cavalier-Smith, T. (1987) The origin of cells: a symbiosis between genes, catalysts, and membranes. Cold Spring Harb. Symp. Quant. Biol. 52, 805–824 

Darnell, J.E. & Doolittle, W.F. (1986) Speculations on the early course of evolution. Proc. Natl. Acad. Sci. USA 83, 1271–1275 
A clear statement of the RNA world scenario.

Evolution of Catalytic Function. (1987) Cold Spring Harb. Symp. Quant. Biol. 52. 
A collection of almost 100 articles on all aspects of prebiotic and early biological evolution; probably the single best source on molecular evolution.

Ferris, J.P. (1984) The chemistry of life’s origin. Chem. Eng. News 62, 21–35 
A short, clear description of the experimental evidence for the synthesis of biomolecules under prebiotic conditions.

Horgan, J. (1991) In the beginning . . . Sci. Am. 264 (February), 116–125 
A brief, clear statement of current theories regarding prebiotic evolution.

Miller, S.L. (1987) Which organic compounds could have occurred on the prebiotic earth? Cold Spring Harb. Symp. Quant. Biol. 52, 17–27 

Schopf, J.W. (ed) (1983) Earth’s Earliest Biosphere, Princeton University Press, Princeton, NJ. 
A comprehensive discussion of geologic history and its relation to the development of life.

Problems

1. Vitamin C: Is the Synthetic Vitamin as Good as the Natural One?  One claim put forth by purveyors of health foods is that vitamins obtained from natural sources are more healthful than those obtained by chemical synthesis. For example, it is claimed that pure L-ascorbic acid (vitamin C) obtained from rose hips is better for you than pure L-ascorbic acid manufactured in a chemical plant. Are the vitamins from the two sources different? Can the body distinguish a vitamin’s source?

2. Identification of Functional Groups  Figure 3–5 shows the common functional groups of biomolecules. Since the properties and biological activities of biomolecules are largely determined by their functional groups, it is important to be able to identify them. In each of the molecules at right, identify the constituent functional groups.

3. Drug Activity and Stereochemistry  The quantitative differences in biological activity between the two enantiomers of a compound are sometimes quite large. For example, the D-isomer of the drug isoproterenol, used to treat mild asthma, is 50 to 80 times more effective as a bronchodilator than the L-isomer. Identify the chiral center in isoproterenol. Why would the two enantiomers have such radically different bioactivity?

4. Drug Action and Shape of Molecules  Some years ago two drug companies marketed a drug under the trade names Dexedrine and Benzedrine. The structure of the drug is shown below.

The physical properties (C, H, and N analysis, melting point, solubility, etc.) of Dexedrine and Benzedrine were identical. The recommended oral dosage of Dexedrine (which is still available) was 5 mg/d, but the recommended dosage of Benzedrine was significantly higher. Apparently it required considerably more Benzedrine than Dexedrine to yield the same physiological response. Explain this apparent contradiction.

5. Components of Complex Biomolecules  Figure 3–16 shows the structures of the major components of complex biomolecules. For each of the three important biomolecules below (shown in their ionized forms at physiological pH), identify the constituents.

       (a) Guanosine triphosphate (GTP), an energy-rich nucleotide that serves as precursor to RNA:

       (b) Phosphatidylcholine, a component of many membranes:

       (c) Methionine enkephalin, the brain’s own opiate:

6. Determination of the Structure of a Biomolecule  An unknown substance, X, was isolated from rabbit muscle. The structure of X was determined from the following observations and experiments. Qualitative analysis showed that X was composed entirely of C, H, and O. A weighed sample of X was completely oxidized, and the amount of H2O and CO2 produced was measured. From this quantitative analysis, it was concluded that X contains 40.00% C, 6.71% H, and 53.29% O by weight. The molecular mass of X was determined by a mass spectrometer and found to be 90.00. An infrared spectrum of X showed that it contained one double bond. X dissolved readily in water to give an acidic solution. A solution of X was tested in a polarimeter and demonstrated optical activity.

       (a) Determine the empirical and molecular formula of X.
       (b) Draw the possible structures of X that fit the molecular formula and contain one double bond. Consider only linear or branched structures and disregard cyclic structures. Note that oxygen makes very poor bonds to itself.
       (c) What is the structural significance of the observed optical activity? Which structures in (b) does this observation eliminate? Which structures are consistent with the observation?
       (d) What is the structural significance of the observation that a solution of X was acidic? Which structures in (b) are now eliminated? Which structures are consistent with the observation?
       (e) What is the structure of X? Is more than one structure consistent with all the data?
This view of the earth from space shows that most of the planet’s surface is covered with water. The seas, where living organisms probably first arose, are today the habitat of countless modern organisms.
Chapter 4
Water: Its Effect on Dissolved Biomolecules
Water is the most abundant substance in living systems, making up 70% or more of the weight of most organisms. Water pervades all portions of every cell and is the medium in which the transport of nutrients, the enzyme-catalyzed reactions of metabolism, and the transfer of chemical energy occur. The first living organisms probably arose in the primeval oceans; evolution was shaped by the properties of the medium in which it occurred. All aspects of cell structure and function are adapted to the physical and chemical properties of water. This chapter begins with descriptions of these physical and chemical properties. The strong attractive forces between water molecules result in water’s solvent properties. The slight tendency of water to ionize is also of crucial importance to the structure and function of biomolecules, and we will review the topic of ionization in terms of equilibrium constants, pH, and titration curves. Finally, we will consider the way in which aqueous solutions of weak acids or bases and their salts act as buffers against pH changes in biological systems. The water molecule and its ionization products, H+ and OH–, profoundly influence the structure, self-assembly, and properties of all cellular components, including enzymes and other proteins, nucleic acids, and lipids. The noncovalent interactions responsible for the specificity of "recognition" among biomolecules are decisively influenced by the solvent properties of water.

Hydrogen bonds between water molecules provide the cohesive forces that make water a liquid at room temperature and that favor the extreme ordering of molecules typical of crystalline water (ice). Polar biomolecules dissolve readily in water because they can replace energetically favorable water–water interactions with even more favorable water–solute interactions (hydrogen bonds and electrostatic interactions). In contrast, nonpolar biomolecules interfere with favorable water–water interactions and are poorly soluble in water. In aqueous solutions, these molecules tend to cluster together to minimize the energetically unfavorable effects of their presence.

Hydrogen bonds and ionic, hydrophobic (Greek, "water-fearing"), and van der Waals interactions, although individually weak, are numerous in biological macromolecules and collectively have a very significant influence on the three-dimensional structures of proteins, nucleic acids, polysaccharides, and membrane lipids. Before we begin a detailed discussion of these biomolecules in the following chapters, it is useful to review the properties of the solvent, water, in which they are assembled and carry out their functions.
* The heat energy required to convert 1.0 g of a liquid at its boiling point, at atmospheric pressure, into its gaseous state at the same temperature. It is a direct measure of the energy required to overcome attractive forces between molecules in the liquid phase.
Figure 4–1  The dipolar nature of the H2O molecule, shown (a) by ball-and-stick and (b) by spacefilling models. The dashed lines in (a) represent the nonbonding orbitals. There is a nearly tetrahedral arrangement of the outer shell electron pairs around the oxygen atom; the two hydrogen atoms have localized partial positive charges and the oxygen atom has two localized partial negative charges. (c) Two H2O molecules joined by a hydrogen bond (designated by three blue lines) between the oxygen atom of the upper molecule and a hydrogen atom of the lower one. Hydrogen bonds are longer and weaker than covalent O–H bonds.
Figure 4–2  In ice, each water molecule forms the maximum of four hydrogen bonds, creating a regular crystal lattice. In liquid water at room temperature, by contrast, each water molecule forms an average of 3.4 hydrogen bonds with other water molecules. The crystal lattice of ice occupies more space than the same number of H2O molecules occupy in liquid water; ice is less dense than liquid water, and thus floats.

Water has a higher melting point, boiling point, and heat of vaporization than most other common liquids (Table 4–1). These unusual properties are a consequence of strong attractions between adjacent water molecules, which give liquid water great internal cohesion.

What is the cause of these strong intermolecular attractions in liquid water? Each hydrogen atom of a water molecule shares an electron pair with the oxygen atom. The geometry of the water molecule is dictated by the shapes of the outer electron orbitals of the oxygen atom, which are similar to the bonding orbitals of carbon (see Fig. 3–4a). These orbitals describe a rough tetrahedron, with a hydrogen atom at each of two corners and unshared electrons at the other two (Fig. 4–1). The H–O–H bond angle is 104.5°, slightly less than the 109.5° of a perfect tetrahedron; the nonbonding orbitals of the oxygen atom slightly compress the orbitals shared by hydrogen.

The oxygen nucleus attracts electrons more strongly than does the hydrogen nucleus (i.e., the proton); oxygen is more electronegative (see Table 3–4). The sharing of electrons between H and O is therefore unequal; the electrons are more often in the vicinity of the oxygen atom than of the hydrogen. The result of this unequal electron sharing is two electric dipoles in the water molecule, one along each of the H–O bonds; the oxygen atom bears a partial negative charge (δ–), and each hydrogen a partial positive charge (δ+). The resulting electrostatic attraction between the oxygen atom of one water molecule and the hydrogen of another (Fig. 4–1c) constitutes a hydrogen bond.

Hydrogen bonds are weaker than covalent bonds. The hydrogen bonds in liquid water have a bond energy (the energy required to break a bond) of only about 20 kJ/mol, compared with 460 kJ/mol for the covalent O–H bond. At room temperature, the thermal energy of an aqueous solution (the kinetic energy resulting from the motion of individual atoms and molecules) is of the same order as that required to break hydrogen bonds. When water is heated, its temperature increase reflects the faster motion of individual water molecules. Although at any given time most of the molecules in liquid water are hydrogen-bonded, the lifetime of each hydrogen bond is less than 1 × 10⁻⁹ s. The apt phrase “flickering clusters” has been applied to the short-lived groups of hydrogen-bonded molecules in liquid water. The very large number of hydrogen bonds between molecules nevertheless confers great internal cohesion on liquid water.

The nearly tetrahedral arrangement of the orbitals about the oxygen atom (Fig. 4–1a) allows each water molecule to form hydrogen bonds with as many as four neighboring water molecules. At any given instant in liquid water at room temperature, each water molecule forms hydrogen bonds with an average of 3.4 other water molecules. The water molecules are in continuous motion in the liquid state, so hydrogen bonds are constantly and rapidly being broken and formed. In ice, however, each water molecule is fixed in space and forms hydrogen bonds with four other water molecules, to yield a regular lattice structure (Fig. 4–2). To break the large numbers of hydrogen bonds in such a lattice requires much thermal energy, which accounts for the relatively high melting point of water (Table 4–1). When ice melts or water evaporates, heat is taken up by the system:

H2O(s)  →  H2O(l)   ΔH = +5.9 kJ/mol
H2O(l)  →  H2O(g)   ΔH = +44.0 kJ/mol

During melting or evaporation, the entropy of the aqueous system increases as more highly ordered arrays of water molecules relax into the less orderly hydrogen-bonded arrays in liquid water, or the wholly disordered water molecules in the gaseous state. At room temperature, both the melting of ice and the evaporation of water occur spontaneously; the tendency of the water molecules to associate through hydrogen bonds is outweighed by the energetic push toward randomness. Recall that the free-energy change (ΔG) must have a negative value for a process to occur spontaneously: ΔG = ΔH – TΔS, where ΔG represents the driving force, ΔH the energy from making and breaking bonds, and ΔS the increase in randomness. Since ΔH is positive for melting and evaporation, it is clearly the increase in entropy (ΔS) that makes ΔG negative and drives these transformations.
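As a rough numerical check of this argument for melting, the following sketch uses the ΔH of +5.9 kJ/mol given above. The entropy change is not stated in the text; here it is estimated by assuming that ΔG is zero at the normal melting point (273.15 K) and that ΔH and ΔS do not change with temperature. Both are simplifying assumptions, and the function name is illustrative only.

```python
# Enthalpy of melting quoted in the text, converted to J/mol.
DELTA_H_MELT = 5900.0
# Normal melting point of ice in kelvin, where ice and water are at equilibrium.
T_MELT = 273.15
# Assumption: at equilibrium dG = 0, so dS = dH / T_melt (J/(mol*K)).
DELTA_S_MELT = DELTA_H_MELT / T_MELT


def delta_g_melting(temp_kelvin):
    """Free-energy change for ice -> water at the given temperature (J/mol).

    Uses dG = dH - T*dS with dH and dS treated as temperature independent.
    """
    return DELTA_H_MELT - temp_kelvin * DELTA_S_MELT


if __name__ == "__main__":
    for t in (263.15, 273.15, 298.15):
        print(f"T = {t:6.2f} K  dG = {delta_g_melting(t):8.1f} J/mol")
```

Under these assumptions ΔG is positive below the melting point, zero at it, and negative at room temperature: the TΔS term overtakes the unfavorable ΔH as the temperature rises.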
Figure 4–3  Common types of hydrogen bonds. In biological systems, the electronegative atom (the hydrogen acceptor) is usually oxygen or nitrogen. The distance between two hydrogen-bonded atoms varies from 0.26 to 0.31 nm.
Figure 4–4  Some hydrogen bonds of biological importance.
Figure 4–5  Directionality of the hydrogen bond. The attraction between the partial electric charges (see Fig. 4–1) is greatest when the three atoms involved (in this case O, H, and O) lie in a straight line.

Hydrogen bonds are not unique to water. They readily form between an electronegative atom (usually oxygen or nitrogen) and a hydrogen atom covalently bonded to another electronegative atom in the same or another molecule (Fig. 4–3). However, hydrogen atoms covalently bonded to carbon atoms, which are not electronegative, do not participate in hydrogen bonding. The distinction explains why butanol (CH3CH2CH2CH2OH) has a relatively high boiling point of 117 °C, whereas butane (CH3CH2CH2CH3) has a boiling point of only –0.5 °C. Butanol has a polar hydroxyl group and thus can form hydrogen bonds with other butanol molecules.

Uncharged but polar biomolecules such as sugars dissolve readily in water because of the stabilizing effect of the many hydrogen bonds that form between the hydroxyl groups or the carbonyl oxygen of the sugar and the polar water molecules. Alcohols, aldehydes, and ketones all form hydrogen bonds with water, as do compounds containing N–H bonds (Fig. 4–4), and molecules containing such groups tend to be soluble in water.

Hydrogen bonds are strongest when the bonded molecules are oriented to maximize electrostatic interaction, which occurs when the hydrogen atom and the two atoms that share it are in a straight line (Fig. 4–5). Hydrogen bonds are thus highly directional and capable of holding two hydrogen-bonded molecules or groups in a specific geometric arrangement. We shall see later that this property of hydrogen bonds confers very precise three-dimensional structures upon protein and nucleic acid molecules, in which there are many intramolecular hydrogen bonds.

Water is a polar solvent. It readily dissolves most biomolecules, which are generally charged or polar compounds (Table 4–2); compounds that dissolve easily in water are hydrophilic (Greek, "water-loving"). In contrast, nonpolar solvents such as chloroform and benzene are poor solvents for polar biomolecules, but easily dissolve nonpolar biomolecules such as lipids and waxes.

Water dissolves salts such as NaCl by hydrating and stabilizing the Na+ and Cl– ions, weakening their electrostatic interactions and thus counteracting their tendency to associate in a crystalline lattice (Fig. 4–6). The solubility of charged biomolecules in water is also a result of hydration and charge screening. Compounds with functional groups such as ionized carboxylic acids (–COO–), protonated amines (–NH3+), and phosphate esters or anhydrides are generally soluble in water for the same reason.

Water is especially effective in screening the electrostatic interactions between dissolved ions. The strength, or force (F), of these ionic interactions depends upon the magnitude of the charges (Q), the distance between the charged groups (r), and the dielectric constant (ϵ) of the solvent through which the interactions occur:

$$F = \frac{Q_1 Q_2}{\epsilon r^2}$$

The dielectric constant is a physical property reflecting the number of dipoles in a solvent. For water at 25 °C, ϵ (which is dimensionless) is 78.5, and for the very nonpolar solvent benzene, ϵ is 4.6. Thus ionic interactions are much stronger in less polar environments. The dependence on r² is such that ionic attractions or repulsions operate over limited distances, in the range of 10 to 40 nm (depending on the electrolyte concentration) when the solvent is water.
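To make the comparison between solvents concrete, the sketch below evaluates the force expression above for the same pair of charges at the same separation in water and in benzene, using the two dielectric constants quoted in the text. Charges and distance are in arbitrary units, so only the ratio of the two forces is meaningful; the function name `relative_force` is illustrative.

```python
EPS_WATER = 78.5    # dielectric constant of water at 25 degrees C (from the text)
EPS_BENZENE = 4.6   # dielectric constant of benzene (from the text)


def relative_force(q1, q2, r, dielectric):
    """Force between two charges, F = q1*q2 / (dielectric * r**2).

    Units are arbitrary; a negative value indicates attraction between
    opposite charges.
    """
    return (q1 * q2) / (dielectric * r ** 2)


if __name__ == "__main__":
    in_water = relative_force(+1, -1, 1.0, EPS_WATER)
    in_benzene = relative_force(+1, -1, 1.0, EPS_BENZENE)
    ratio = in_benzene / in_water
    print(f"attraction in benzene is {ratio:.1f} times that in water")
```

Because the charges and distance cancel in the ratio, the result is simply 78.5/4.6, or roughly 17: the same ion pair attracts about 17 times more strongly in benzene than in water.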
Figure 4–6  Water dissolves many crystalline salts by hydrating their component ions. The NaCl crystal lattice is disrupted as water molecules cluster about the Cl– and Na+ ions. The ionic charges are thus partially neutralized, and the electrostatic attractions necessary for lattice formation are weakened.

As a salt such as NaCl dissolves, the Na+ and Cl– ions leaving the crystal lattice acquire far greater freedom of motion (Fig. 4–6). The resulting increase in the entropy (randomness) of the system is largely responsible for the ease of dissolving salts such as NaCl in water. In thermodynamic terms, formation of the solution occurs with a favorable change in free energy: ΔG = ΔH – TΔS, where ΔH has a small positive value and TΔS a large positive value; thus ΔG is negative.

* The arrows represent electric dipoles; there is a partial negative charge (δ−) at the head of the arrow, a partial positive charge (δ+; not shown here) at the tail.

The biologically important gases CO2, O2, and N2 are nonpolar. In the diatomic molecules O2 and N2, electrons are shared equally by both atoms. In CO2, each C=O bond is polar, but the two dipoles are oppositely directed and cancel each other (Table 4–3). The movement of these molecules from the disordered gas phase into aqueous solution constrains their motion and therefore represents a decrease in entropy. These gases are consequently very poorly soluble in water (Table 4–3). Some organisms have water-soluble carrier proteins (hemoglobin and myoglobin, for example) that facilitate the transport of O2. Carbon dioxide forms carbonic acid (H2CO3) in aqueous solution, and is transported in that form.

Two other gases, NH3 and H2S, also have biological roles in some organisms; these are polar and dissolve readily in water (Table 4–3).

Figure 4–7  (a) The long-chain fatty acids have very hydrophobic alkyl chains, each of which is surrounded by a layer of highly ordered water molecules. (b) By clustering together in micelles, the fatty acid molecules expose a smaller hydrophobic surface area to the water, and fewer water molecules are found in the shell of ordered water. The energy gained by freeing immobilized water molecules stabilizes the micelle.

When water is mixed with a hydrocarbon such as benzene or hexane, two phases form; neither liquid is soluble in the other. Shorter hydrocarbons such as ethane have small but measurable solubility in water. Nonpolar compounds such as benzene, hexane, and ethane are hydrophobic: they are unable to undergo energetically favorable interactions with water molecules, and they actually interfere with the hydrogen bonding among water molecules. All solute molecules or ions dissolved in water interfere with the hydrogen bonding of some water molecules in their immediate vicinity, but polar or charged solutes (such as NaCl) partially compensate for lost hydrogen bonds by forming new solute–water interactions. The net change in enthalpy (ΔH) for dissolving these solutes is generally small. Hydrophobic solutes offer no such compensation, and their addition to water may therefore result in a small gain of enthalpy; the breaking of hydrogen bonds requires the addition of energy to the system. Furthermore, dissolving hydrophobic solutes in water results in a measurable decrease in entropy. Water molecules in the immediate vicinity of a nonpolar solute are constrained in their possible orientations, resulting in a shell of highly ordered water molecules around each solute molecule. The number of water molecules in the highly ordered shell is proportional to the surface area of the hydrophobic solute. The free-energy change for dissolving a nonpolar solute in water is thus unfavorable: ΔG = ΔH − TΔS, where ΔH has a positive value, ΔS a negative value, and thus ΔG is positive.
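The sign argument for ΔG = ΔH − TΔS can be illustrated with a short calculation. This is only a sketch: the text gives signs and relative magnitudes but no numbers, so the enthalpy and entropy values below are hypothetical, chosen solely to reproduce the two cases described (a salt-like solute and a hydrophobic solute).

```python
# Sign of the free-energy change, dG = dH - T*dS, for two kinds of solute.
T = 298.15  # temperature in kelvin (25 °C)


def delta_g(delta_h_kj, delta_s_j_per_k, temperature=T):
    """Return dG in kJ/mol from dH (kJ/mol) and dS (J/(mol*K))."""
    return delta_h_kj - temperature * delta_s_j_per_k / 1000.0


# Illustrative (hypothetical) values, chosen only to match the signs in the text.
salt = delta_g(delta_h_kj=+4.0, delta_s_j_per_k=+43.0)         # small +dH, large +T*dS
hydrocarbon = delta_g(delta_h_kj=+2.0, delta_s_j_per_k=-80.0)  # +dH, -dS

print(f"salt-like solute:   dG = {salt:+.1f} kJ/mol (dissolution favorable)")
print(f"hydrophobic solute: dG = {hydrocarbon:+.1f} kJ/mol (dissolution unfavorable)")
```

With a positive ΔH, the sign of ΔG is decided entirely by the entropy term, which is the point the text makes for both kinds of solute.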

Amphipathic compounds contain regions that are polar (or charged) and regions that are nonpolar (Table 4–2). When amphipathic compounds are mixed with water, the two regions of the solute molecule experience conflicting tendencies; the polar or charged, hydrophilic region interacts favorably with the solvent and tends to dissolve, but the nonpolar, hydrophobic region has the opposite tendency, to avoid contact with the water (Fig. 4–7a). The nonpolar regions of the molecules cluster together to present the smallest hydrophobic area to the solvent, and the polar regions are arranged to maximize their interaction with the aqueous solvent (Fig. 4–7b). These stable structures of amphipathic compounds in water, called micelles, may contain hundreds or thousands of molecules. The forces that hold the nonpolar regions of the molecules together are called hydrophobic interactions. The strength of these interactions is not due to any intrinsic attraction between nonpolar molecules. Rather, it results from the system’s achieving greatest thermodynamic stability by minimizing the entropy decrease that results from the ordering of water molecules around hydrophobic portions of the solute molecule.

Many biomolecules are amphipathic (Table 4–2); proteins, pigments, certain vitamins, and the sterols and phospholipids of membranes all have polar and nonpolar surface regions. Structures composed of these molecules are stabilized by hydrophobic interactions among the nonpolar regions. Hydrophobic interactions among lipids, and between lipids and proteins, are the most important determinants of structure in biological membranes; and hydrophobic interactions between nonpolar amino acids stabilize the three-dimensional folding patterns of proteins.

Hydrogen bonding between water and polar solutes also causes some ordering of water molecules, but the effect is less significant than with nonpolar solutes. Part of the driving force for the binding of a polar substrate to the complementary polar surface of an enzyme is the entropy increase resulting from the disordering of ordered water molecules around the substrate (reactant), as the enzyme displaces hydrogen-bonded water from the substrate.
Figure 4–8  The changes in energy as two atoms approach. Two opposite forces operate on the atoms, plotted here as a function of the distance between the atoms: an attraction that increases as the two approach (blue), and a repulsion that increases very sharply as the atoms come so close that their outer electron orbitals overlap (black). The net energy of the interaction is the sum of these two (red); an energy minimum occurs just before the repulsive effect dominates (at rme). The closest approach that is energetically feasible, rv, defines the van der Waals radii; it is the sum of the van der Waals radii of the two atoms.

When two uncharged atoms are brought very close together, their surrounding electron clouds influence each other. Random variations in the positions of the electrons around one nucleus may create a transient electric dipole, which induces a transient, opposite electric dipole in the nearby atom. The two dipoles are weakly attracted to each other, bringing the two nuclei closer. The force of this weak attraction is the van der Waals interaction. As the two nuclei draw closer together, their electron clouds begin to repel each other, and at some point the van der Waals attraction exactly balances this repulsive force (Fig. 4–8); the nuclei cannot be brought closer, and are said to be in van der Waals contact. For each atom, there is a characteristic van der Waals radius, a measure of how close that atom will allow another to approach (see Table 3–3).

The noncovalent interactions we have described (hydrogen bonds and ionic, hydrophobic, and van der Waals interactions) are much weaker than covalent bonds (see Table 3–5). The input of about 350 kJ of energy is required to break a mole (6 × 10²³) of C–C single bonds, and of about 410 kJ to break a mole of C–H bonds, but only 4 to 8 kJ is sufficient to disrupt a mole of typical van der Waals interactions (Table 4–4). Hydrophobic interactions are similarly weak, and ionic interactions and hydrogen bonds are only a little stronger; a typical hydrogen bond in aqueous solvent can be broken by the input of about 20 kJ/mol.

In aqueous solvent at 25 °C, the available thermal energy is of the same order as the strength of these weak interactions. Furthermore, the interaction between solute and solvent (water) molecules is nearly as favorable as solute–solute interactions. Consequently, hydrogen bonds and ionic, hydrophobic, and van der Waals interactions are continually formed and broken.

Although these four types of interactions are individually weak relative to covalent bonds, the cumulative effect of many such interactions in a protein or nucleic acid can be very significant. For example, the noncovalent binding of an enzyme to its substrate may involve several hydrogen bonds and one or more ionic interactions, as well as hydrophobic and van der Waals interactions. The formation of each of these weak bonds contributes to a net decrease in free energy; this binding free energy is released as bond formation stabilizes the system. The stability of a noncovalent interaction such as that of a small molecule hydrogen-bonded to its macromolecular partner is calculable from the binding energy. Stability, as measured by the equilibrium constant (see below) of the binding reaction, varies exponentially with binding energy. The unfolding of a molecule stabilized by numerous weak interactions requires many of these interactions to be disrupted at the same time; because the interactions fluctuate randomly, such simultaneous disruptions are very unlikely. The molecular stability bestowed by two or five or 20 weak interactions is therefore much greater than would be expected from a simple addition of binding energies.
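The exponential dependence of stability on binding energy can be made concrete with the standard thermodynamic relation K = exp(−ΔG°/RT). The chapter defers that relation to a later chapter, so it is an assumption here, as is the hypothetical value of −5 kJ/mol per weak interaction. The sketch shows only that adding energies multiplies equilibrium constants; it does not model the additional stabilization the text attributes to the improbability of simultaneous disruption.

```python
import math

R = 8.314e-3  # gas constant in kJ/(mol*K)
T = 298.15    # 25 °C in kelvin


def equilibrium_constant(delta_g_kj):
    """K = exp(-dG/(R*T)); dG is the standard free-energy change in kJ/mol."""
    return math.exp(-delta_g_kj / (R * T))


# Hypothetical binding free energy contributed by one weak interaction.
PER_INTERACTION_KJ = -5.0

for n in (1, 2, 5, 20):
    k = equilibrium_constant(n * PER_INTERACTION_KJ)
    print(f"{n:2d} interactions: dG = {n * PER_INTERACTION_KJ:6.1f} kJ/mol, K = {k:.3g}")
```

Doubling the binding energy squares the equilibrium constant, which is why a handful of individually weak interactions can yield very tight binding.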

Macromolecules such as proteins, DNA, and RNA contain so many sites of potential hydrogen bonding or ionic, van der Waals, or hydrophobic interactions that the cumulative effect of the many small binding forces is enormous. The most stable (native) structure of most macromolecules is that in which weak-bonding possibilities are maximized. The folding of a single polypeptide or polynucleotide chain into its three-dimensional shape is determined by this principle. The binding of an antigen to a specific antibody depends on the cumulative effects of many weak interactions. The energy released when an enzyme binds noncovalently to its substrate is the main source of catalytic power for the enzyme. The binding of a hormone or a neurotransmitter to its cellular receptor protein is the result of weak interactions. One consequence of the size of enzymes and receptors is that their large surfaces provide many opportunities for weak interactions. At the molecular level, the complementarity between interacting biomolecules reflects the complementarity and weak interactions between polar, charged, and hydrophobic groups on the surfaces of the molecules.

Although many of the solvent properties of water can be explained in terms of the uncharged H2O molecule, the small degree of ionization of water to hydrogen ions (H+) and hydroxide ions (OH−) must also be taken into account. Like all reversible reactions, the ionization of water can be described by an equilibrium constant. When weak acids or weak bases are dissolved in water, they can contribute H+ by ionizing (if acids) or consume H+ by being protonated (if bases); these processes are also governed by equilibrium constants. The total hydrogen ion concentration from all sources is experimentally measurable; it is expressed as the pH of the solution. To predict the state of ionization of solutes in water, we must take into account the relevant equilibrium constants for each ionization reaction. We therefore turn now to a brief discussion of the ionization of water and of weak acids and bases dissolved in water.

Water molecules have a slight tendency to undergo reversible ionization to yield a hydrogen ion and a hydroxide ion, giving the equilibrium

$$\mathrm{H_2O} \rightleftharpoons \mathrm{H^+} + \mathrm{OH^-} \tag{4–1}$$

This reversible ionization is crucial to the role of water in cellular function, so we must have a means of expressing the extent of ionization of water in quantitative terms. A brief review of some properties of reversible chemical reactions will show how this can be done.

The position of equilibrium of any chemical reaction is given by its equilibrium constant. For the generalized reaction

A + B ⇌ C + D
(4–2)

an equilibrium constant can be defined in terms of the concentrations of reactants (A and B) and products (C and D) present at equilibrium:

$$K_{\mathrm{eq}} = \frac{[\mathrm{C}][\mathrm{D}]}{[\mathrm{A}][\mathrm{B}]}$$

(Strictly speaking, the concentration terms should be the activities, or effective concentrations in nonideal solutions, of each species. Except in very accurate work, the equilibrium constant may be approximated by measuring the concentrations at equilibrium.)

The equilibrium constant is fixed and characteristic for any given chemical reaction at a specified temperature. It defines the composition of the final equilibrium mixture of that reaction, regardless of the starting amounts of reactants and products. Conversely, one can calculate the equilibrium constant for a given reaction at a given temperature if the equilibrium concentrations of all its reactants and products are known. We will show in a later chapter that the standard free-energy change (ΔG°) is directly related to Keq.

The degree of ionization of water at equilibrium (Eqn 4–1) is small; at 25 °C only about one of every 10⁷ molecules in pure water is ionized at any instant. The equilibrium constant for the reversible ionization of water (Eqn 4–1) is

$$K_{\mathrm{eq}} = \frac{[\mathrm{H^+}][\mathrm{OH^-}]}{[\mathrm{H_2O}]} \tag{4–3}$$

In pure water at 25 °C, the concentration of water is 55.5 M (grams of H2O in 1 L divided by the gram molecular weight, or 1000/18 = 55.5 M), and is essentially constant in relation to the very low concentrations of H+ and OH−, namely, 1 × 10⁻⁷ M. Accordingly, we can substitute 55.5 M in the equilibrium constant expression (Eqn 4–3) to yield

$$K_{\mathrm{eq}} = \frac{[\mathrm{H^+}][\mathrm{OH^-}]}{55.5\ \mathrm{M}}$$

which, on rearranging, becomes
$$(55.5\ \mathrm{M})(K_{\mathrm{eq}}) = [\mathrm{H^+}][\mathrm{OH^-}] = K_{\mathrm{w}} \tag{4–4}$$

where Kw designates the product (55.5 M)(Keq), the ion product of water at 25°C.

The value for Keq has been determined by electrical-conductivity measurements of pure water (in which only the ions arising from the dissociation of H2O can carry current) and found to be 1.8 × 10⁻¹⁶ M at 25 °C. Substituting this value for Keq in Equation 4–4 gives

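$$\begin{aligned}
(55.5\ \mathrm{M})(1.8 \times 10^{-16}\ \mathrm{M}) &= [\mathrm{H^+}][\mathrm{OH^-}] \\
99.9 \times 10^{-16}\ \mathrm{M^2} &= [\mathrm{H^+}][\mathrm{OH^-}] \\
1.0 \times 10^{-14}\ \mathrm{M^2} &= [\mathrm{H^+}][\mathrm{OH^-}] = K_{\mathrm{w}}
\end{aligned}$$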

Thus the product [H+][OH−] in aqueous solutions at 25 °C always equals 1 × 10⁻¹⁴ M². When there are exactly equal concentrations of both H+ and OH−, as in pure water, the solution is said to be at neutral pH. At this pH, the concentration of H+ and OH− can be calculated from the ion product of water as follows:

$$K_{\mathrm{w}} = [\mathrm{H^+}][\mathrm{OH^-}] = [\mathrm{H^+}]^2$$

Solving for [H+] gives

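$$\begin{aligned}
[\mathrm{H^+}] &= \sqrt{K_{\mathrm{w}}} = \sqrt{1 \times 10^{-14}\ \mathrm{M^2}} \\
[\mathrm{H^+}] &= [\mathrm{OH^-}] = 10^{-7}\ \mathrm{M}
\end{aligned}$$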

As the ion product of water is constant, whenever the concentration of H+ ions is greater than 1 × 10⁻⁷ M, the concentration of OH− must become less than 1 × 10⁻⁷ M, and vice versa. When the concentration of H+ is very high, as in a solution of hydrochloric acid, the OH− concentration must be very low. From the ion product of water we can calculate the H+ concentration if we know the OH− concentration, and vice versa (Box 4–1).
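The reciprocal relationship between the two ion concentrations can be sketched in a few lines. The function names are illustrative, and the 0.1 M HCl and NaOH solutions are hypothetical examples that assume complete ionization of the strong acid and base.

```python
KW = 1.0e-14  # ion product of water at 25 °C, in M^2 (from the text)


def hydroxide_from_hydrogen(h_conc):
    """Return [OH-] in M for a given [H+] in M, using Kw = [H+][OH-]."""
    return KW / h_conc


def hydrogen_from_hydroxide(oh_conc):
    """Return [H+] in M for a given [OH-] in M."""
    return KW / oh_conc


# Pure water: both ions are at 1e-7 M.
print(hydroxide_from_hydrogen(1.0e-7))
# Hypothetical 0.1 M HCl, assumed fully ionized so that [H+] = 0.1 M.
print(hydroxide_from_hydrogen(0.1))
# Hypothetical 0.1 M NaOH, assumed fully ionized so that [OH-] = 0.1 M.
print(hydrogen_from_hydroxide(0.1))
```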
Box 4–1  The Ion Product of Water: Two Illustrative Problems

* The expression pOH is sometimes used to describe the basicity, or OH− concentration, of a solution; pOH is defined by the expression pOH = –log[OH−], which is analogous to the expression for pH. Note that for all cases, pH + pOH = 14.
Figure 4–9  The pH of some aqueous fluids.

The ion product of water, Kw, is the basis for the pH scale (Table 4–5). It is a convenient means of designating the actual concentration of H+ (and thus of OH−) in any aqueous solution in the range between 1.0 M H+ and 1.0 M OH−. The term pH is defined by the expression

$$\mathrm{pH} = \log \frac{1}{[\mathrm{H^+}]} = -\log [\mathrm{H^+}]$$

The symbol p denotes "negative logarithm of." For a precisely neutral solution at 25 °C, in which the concentration of hydrogen ions is 1.0 × 10⁻⁷ M, the pH can be calculated as follows:

$$\begin{aligned}
\mathrm{pH} &= \log \frac{1}{1 \times 10^{-7}} = \log (1 \times 10^{7}) = \log 1.0 + \log 10^{7} \\
&= 0 + 7.0 \\
&= 7.0
\end{aligned}$$

The value of 7.0 for the pH of a precisely neutral solution is not an arbitrarily chosen figure; it is derived from the absolute value of the ion product of water at 25 °C, which by convenient coincidence is a round number. Solutions having a pH greater than 7 are alkaline or basic; the concentration of OH− is greater than that of H+. Conversely, solutions having a pH less than 7 are acidic (Table 4–5).

Note that the pH scale is logarithmic, not arithmetic. To say that two solutions differ in pH by 1 pH unit means that one solution has ten times the H+ concentration of the other, but it does not tell us the absolute magnitude of the difference. Figure 4–9 gives the pH of some common aqueous fluids. A cola drink (pH 3.0) or red wine (pH 3.7) has an H+ concentration approximately 10,000 times greater than that of blood (pH 7.4).
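The logarithmic character of the scale is easy to verify by converting pH values back to concentrations. The sketch below uses the pH values quoted in the text; the function names are illustrative. The ratios come out to roughly 25,000 for cola and 5,000 for red wine, which is consistent with the order-of-magnitude figure of 10,000 given above.

```python
import math


def ph_from_h(h_conc):
    """pH = -log10([H+]), with [H+] in mol/L."""
    return -math.log10(h_conc)


def h_from_ph(ph):
    """Invert the definition: [H+] = 10**(-pH)."""
    return 10.0 ** (-ph)


# pH values quoted in the text.
blood, cola, red_wine = 7.4, 3.0, 3.7

cola_ratio = h_from_ph(cola) / h_from_ph(blood)
wine_ratio = h_from_ph(red_wine) / h_from_ph(blood)

print(f"neutral water: pH = {ph_from_h(1.0e-7):.1f}")
print(f"cola vs blood:     [H+] ratio ~ {cola_ratio:,.0f}")
print(f"red wine vs blood: [H+] ratio ~ {wine_ratio:,.0f}")
```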

The pH of an aqueous solution can be approximately measured using various indicator dyes, including litmus, phenolphthalein, and phenol red, which undergo color changes as a proton dissociates from the dye molecule. Accurate determinations of pH in the chemical or clinical laboratory are made with a glass electrode that is selectively sensitive to H+ concentration but insensitive to Na+, K+, and other cations. In a pH meter the signal from such an electrode is amplified and compared with the signal generated by a solution of accurately known pH.

Measurement of pH is one of the most important and frequently used procedures in biochemistry. The pH affects the structure and activity of biological macromolecules; for example, the catalytic activity of enzymes. Measurements of the pH of the blood and urine are commonly used in diagnosing disease. The pH of the blood plasma of severely diabetic people, for example, is often lower than the normal value of 7.4; this condition is called acidosis. In certain other disease states the pH of the blood is higher than normal, the condition of alkalosis.


Hydrochloric, sulfuric, and nitric acids, commonly called strong acids, are completely ionized in dilute aqueous solutions; the strong bases NaOH and KOH are also completely ionized.

Biochemists are often more concerned with the behavior of weak acids and bases – those not completely ionized when dissolved in water. These are common in biological systems and play important roles in metabolism and its regulation. The behavior of aqueous solutions of weak acids and bases is best understood if we first define some terms.

Acids may be defined as proton donors and bases as proton acceptors. A proton donor and its corresponding proton acceptor make up a conjugate acid–base pair (Table 4–6). Acetic acid (CH3COOH), a proton donor, and the acetate anion (CH3COO−), the corresponding proton acceptor, constitute a conjugate acid–base pair, related by the reversible reaction

$$\mathrm{CH_3COOH} \rightleftharpoons \mathrm{H^+} + \mathrm{CH_3COO^-}$$

Each acid has a characteristic tendency to lose its proton in an aqueous solution. The stronger the acid, the greater its tendency to lose its proton. The tendency of any acid (HA) to lose a proton and form its conjugate base (A−) is defined by the equilibrium constant (K) for the reversible reaction

$$\mathrm{HA} \rightleftharpoons \mathrm{H^+} + \mathrm{A^-}$$

which is

$$K = \frac{[\mathrm{H^+}][\mathrm{A^-}]}{[\mathrm{HA}]}$$

Equilibrium constants for ionization reactions are more usually called ionization or dissociation constants. The dissociation constants of some acids, often designated Ka, are given in Table 4–7. Stronger acids, such as formic and lactic acids, have higher dissociation constants; weaker acids, such as dihydrogen phosphate (H2PO4−), have lower dissociation constants.

Also included in Table 4–7 are values of pKa, which is analogous to pH and is defined by the equation

$$\mathrm{p}K_{\mathrm{a}} = \log \frac{1}{K_{\mathrm{a}}} = -\log K_{\mathrm{a}}$$

The more strongly dissociated the acid, the lower its pKa. As we shall now see, the pKa of any weak acid can be determined quite easily.
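The conversion between Ka and pKa is a one-line calculation in either direction. The sketch below uses the dissociation constant of acetic acid and the pKa values of dihydrogen phosphate and ammonium ion that are quoted later in this chapter; the function names are illustrative.

```python
import math


def pka_from_ka(ka):
    """pKa = -log10(Ka), with Ka in M."""
    return -math.log10(ka)


def ka_from_pka(pka):
    """Invert the definition: Ka = 10**(-pKa)."""
    return 10.0 ** (-pka)


KA_ACETIC = 1.74e-5  # dissociation constant of acetic acid at 25 °C (from the text)

print(f"acetic acid:          pKa = {pka_from_ka(KA_ACETIC):.2f}")
# pKa values quoted in the text for two weaker acids.
print(f"dihydrogen phosphate: Ka = {ka_from_pka(6.86):.2e} M")
print(f"ammonium ion:         Ka = {ka_from_pka(9.25):.2e} M")
```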
Figure 4–10  The titration curve of acetic acid. After the addition of each increment of NaOH to the acetic acid solution, the pH of the mixture is measured. This value is plotted against the fraction of the total amount of NaOH required to neutralize the acetic acid (i.e., to bring it to pH ≈ 7). The points so obtained yield the titration curve. Shown in the boxes are the predominant ionic forms at the points designated. At the midpoint of the titration, the concentrations of the proton donor and proton acceptor are equal. The pH at this point is numerically equal to the pKa of acetic acid. The shaded zone is the useful region of buffering power.
Figure 4–11  Comparison of the titration curves of three weak acids, CH3COOH, H2PO4−, and NH4+. The predominant ionic forms at designated points in the titration are given in boxes. The regions of buffering capacity are indicated at the right. Conjugate acid–base pairs are effective buffers between approximately 25 and 75% neutralization of the proton-donor species.

Titration is used to determine the amount of an acid in a given solution. In this procedure, a measured volume of the acid is titrated with a solution of a strong base, usually sodium hydroxide (NaOH), of known concentration. The NaOH is added in small increments until the acid is consumed (neutralized), as determined with an indicator dye or with a pH meter. The concentration of the acid in the original solution can be calculated from the volume and concentration of NaOH added.

A plot of the pH against the amount of NaOH added (a titration curve) reveals the pKa of the weak acid. Consider the titration of a 0.1 M solution of acetic acid (HAc) with 0.1 M NaOH at 25 °C (Fig. 4–10). Two reversible equilibria are involved in the process:

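$$\mathrm{H_2O} \rightleftharpoons \mathrm{H^+} + \mathrm{OH^-} \tag{4–5}$$

$$\mathrm{HAc} \rightleftharpoons \mathrm{H^+} + \mathrm{Ac^-} \tag{4–6}$$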

The equilibria must simultaneously conform to their characteristic equilibrium constants, which are, respectively,

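$$K_{\mathrm{w}} = [\mathrm{H^+}][\mathrm{OH^-}] = 1 \times 10^{-14}\ \mathrm{M^2} \tag{4–7}$$

$$K_{\mathrm{a}} = \frac{[\mathrm{H^+}][\mathrm{Ac^-}]}{[\mathrm{HAc}]} = 1.74 \times 10^{-5}\ \mathrm{M} \tag{4–8}$$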

At the beginning of the titration, before any NaOH is added, the acetic acid is already slightly ionized, to an extent that can be calculated from its dissociation constant (Eqn 4–8).
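As a rough illustration of that calculation, the sketch below solves the dissociation-constant expression for the 0.1 M acetic acid solution before any NaOH is added. It assumes that all H+ comes from the acid (the small contribution from the ionization of water is neglected) and that concentrations can stand in for activities; the variable names are illustrative.

```python
import math

KA = 1.74e-5   # Ka of acetic acid at 25 °C, in M (from the text)
C_TOTAL = 0.1  # total acetic acid concentration, in M (from the text)

# Let x = [H+] = [Ac-] and [HAc] = C_TOTAL - x, so Ka = x**2 / (C_TOTAL - x).
# Rearranged: x**2 + Ka*x - Ka*C_TOTAL = 0; take the positive root.
x = (-KA + math.sqrt(KA ** 2 + 4.0 * KA * C_TOTAL)) / 2.0

initial_ph = -math.log10(x)
fraction_ionized = x / C_TOTAL

print(f"[H+] = {x:.2e} M, pH = {initial_ph:.2f}, ionized = {fraction_ionized:.1%}")
```

Under these assumptions only a little over 1% of the acid is ionized at the start of the titration, which is what "slightly ionized" means quantitatively.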

As NaOH is gradually introduced, the added OH− combines with the free H+ in the solution to form H2O, to an extent that satisfies the equilibrium relationship in Equation 4–7. As free H+ is removed, HAc dissociates further to satisfy its own equilibrium constant (Eqn 4–8). The net result as the titration proceeds is that more and more HAc ionizes, forming Ac−, as the NaOH is added. At the midpoint of the titration (Fig. 4–10), at which exactly 0.5 equivalent of NaOH has been added, one-half of the original acetic acid has undergone dissociation, so that the concentration of the proton donor, [HAc], now equals that of the proton acceptor, [Ac−]. At this midpoint a very important relationship holds: the pH of the equimolar solution of acetic acid and acetate is exactly equal to the pKa of acetic acid (pKa = 4.76; see Table 4–7 and Fig. 4–10). The basis for this relationship, which holds for all weak acids, will soon become clear.

As the titration is continued by adding further increments of NaOH, the remaining undissociated acetic acid is gradually converted into acetate. The end point of the titration occurs at about pH 7.0: all the acetic acid has lost its protons to OH−, to form H2O and acetate. Throughout the titration the two equilibria (Eqns 4–5 and 4–6) coexist, each always conforming to its equilibrium constant.

Figure 4–11 compares the titration curves of three weak acids with very different dissociation constants: acetic acid (pKa = 4.76); dihydrogen phosphate (pKa = 6.86); and ammonium ion, or NH4+ (pKa = 9.25). Although the titration curves of these acids have the same shape, they are displaced along the pH axis because these acids have different strengths. Acetic acid is the strongest and loses its proton most readily, since its Ka is highest (pKa lowest) of the three. Acetic acid is already half dissociated at pH 4.76. H2PO4− loses a proton less readily, being half dissociated at pH 6.86. NH4+ is the weakest acid of the three and does not become half dissociated until pH 9.25.

The most important point about the titration curve of a weak acid is that it shows graphically that a weak acid and its anion – a conjugate acid–base pair – can act as a buffer.

Almost every biological process is pH dependent; a small change in pH produces a large change in the rate of the process. This is true not only for the many reactions in which the H+ ion is a direct participant, but also for those in which there is no apparent role for H+ ions. The enzymes that catalyze cellular reactions, and many of the molecules on which they act, contain ionizable groups with characteristic pKa values. The protonated amino (–NH3+) and carboxyl groups of amino acids and the phosphate groups of nucleotides, for example, function as weak acids; their ionic state depends upon the pH of the solution in which they are dissolved. As we noted above, ionic interactions are among the forces that stabilize a protein molecule and allow an enzyme to recognize and bind its substrate.

Cells and organisms maintain a specific and constant cytosolic pH, keeping biomolecules in their optimal ionic state, usually near pH 7. In multicellular organisms, the pH of the extracellular fluids (blood, for example) is also tightly regulated. Constancy of pH is achieved primarily by biological buffers: mixtures of weak acids and their conjugate bases.

We describe here the ionization equilibria that account for buffering, and show the quantitative relationship between the pH of a buffered solution and the pKa of the buffer. Biological buffering is illustrated by the phosphate and carbonate buffering systems of humans.

Figure 4–12  Capacity of the acetic acid–acetate couple to act as a buffer system, capable of absorbing either H+ or OH− through the reversibility of the dissociation of acetic acid. The proton donor, in this case acetic acid (HAc), contains a reserve of bound H+, which can be released to neutralize an addition of OH− to the system, forming H2O. This happens because the product [H+][OH−] transiently exceeds Kw (1 × 10⁻¹⁴ M²). The equilibrium quickly adjusts so that this product equals 1 × 10⁻¹⁴ M² (at 25 °C), thus transiently reducing the concentration of H+. But now the quotient [H+][Ac−]/[HAc] is less than Ka, so HAc dissociates further to restore equilibrium. Similarly, the conjugate base, Ac−, can react with H+ ions added to the system; again, the two ionization reactions simultaneously come to equilibrium. Thus a conjugate acid–base pair, such as acetic acid and acetate ion, tends to resist a change in pH when small amounts of acid or base are added. Buffering action is simply the consequence of two reversible reactions taking place simultaneously and reaching their points of equilibrium as governed by their equilibrium constants, Kw and Ka.

Buffers are aqueous systems that tend to resist changes in their pH when small amounts of acid (H+) or base (OH−) are added. A buffer system consists of a weak acid (the proton donor) and its conjugate base (the proton acceptor). As an example, a mixture of equal concentrations of acetic acid and acetate ion, found at the midpoint of the titration curve in Figure 4–10, is a buffer system. The titration curve of acetic acid has a relatively flat zone extending about 0.5 pH units on either side of its midpoint pH of 4.76. In this zone there is only a small change in pH when increments of either H+ or OH− are added to the system. This relatively flat zone is the buffering region of the acetic acid–acetate buffer pair. At the midpoint of the buffering region, where the concentration of the proton donor (acetic acid) exactly equals that of the proton acceptor (acetate), the buffering power of the system is maximal; that is, its pH changes least on addition of an increment of H+ or OH−. The pH at this point in the titration curve of acetic acid is equal to its pKa. The pH of the acetate buffer system does change slightly when a small amount of H+ or OH− is added, but this change is very small compared with the pH change that would result if the same amount of H+ (or OH−) were added to pure water or to a solution of the salt of a strong acid and strong base, such as NaCl, which have no buffering power.

Buffering results from two reversible reaction equilibria occurring in a solution of nearly equal concentrations of a proton donor and its conjugate proton acceptor. Figure 4–12 helps to explain how a buffer system works. Whenever H+ or OH− is added to a buffer, the result is a small change in the ratio of the relative concentrations of the weak acid and its anion and thus a small change in pH. The decrease in concentration of one component of the system is balanced exactly by an increase in the other. The sum of the buffer components does not change, only their ratio.
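The contrast between a buffered and an unbuffered solution can be sketched numerically. The system below is hypothetical: 1 L containing 0.1 mol of acetic acid and 0.1 mol of acetate, to which 0.001 mol of strong acid is added. The sketch assumes that the added H+ is consumed entirely by acetate, ignores any volume change, and uses the relation pH = pKa + log([acceptor]/[donor]) that follows from the dissociation-constant expression.

```python
import math

PKA_ACETIC = 4.76  # from the text


def buffer_ph(acid_mol, base_mol, pka=PKA_ACETIC):
    """pH of a conjugate acid-base mixture from moles of donor and acceptor."""
    return pka + math.log10(base_mol / acid_mol)


# Hypothetical system: 1 L containing 0.1 mol acetic acid and 0.1 mol acetate,
# to which 0.001 mol of strong acid is added (volume change ignored).
acid, base, added_h = 0.1, 0.1, 0.001

ph_before = buffer_ph(acid, base)
# Added H+ converts acetate to acetic acid; the sum acid + base is unchanged.
ph_after = buffer_ph(acid + added_h, base - added_h)

# Same addition to 1 L of pure water: [H+] rises from 1e-7 M to about 1e-3 M.
water_before = 7.0
water_after = -math.log10(added_h)

print(f"buffer: pH {ph_before:.2f} -> {ph_after:.2f}")
print(f"water:  pH {water_before:.2f} -> {water_after:.2f}")
```

The buffer's pH shifts by about a hundredth of a unit, whereas the same addition moves pure water by four full pH units.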

Each conjugate acid–base pair has a characteristic pH zone in which it is an effective buffer (Fig. 4–11). The H2PO4−/HPO42− pair has a pKa of 6.86 and thus can serve as a buffer system near pH 6.86; the NH4+/NH3 pair, with a pKa of 9.25, can act as a buffer near pH 9.25.

The quantitative relationship among pH, the buffering action of a mixture of weak acid with its conjugate base, and the pKa of the weak acid is given by the Henderson–Hasselbalch equation. The titration curves of acetic acid, H2PO4−, and NH4+ (Fig. 4–11) have nearly identical shapes, suggesting that they all reflect a fundamental law or relationship. This is indeed the case. The shape of the titration curve of any weak acid is expressed by the Henderson–Hasselbalch equation, which is important for understanding buffer action and acid–base balance in the blood and tissues of the vertebrate organism. This equation is simply a useful way of restating the expression for the dissociation constant of an acid. For the dissociation of a weak acid HA into H+ and A−, the Henderson–Hasselbalch equation can be derived as follows:

$$K_{\mathrm{a}} = \frac{[\mathrm{H^+}][\mathrm{A^-}]}{[\mathrm{HA}]}$$

First solve for [H+]:

$$[\mathrm{H^+}] = K_{\mathrm{a}} \frac{[\mathrm{HA}]}{[\mathrm{A^-}]}$$

Then take the negative logarithm of both sides:

$$-\log [\mathrm{H^+}] = -\log K_{\mathrm{a}} - \log \frac{[\mathrm{HA}]}{[\mathrm{A^-}]}$$

Substitute pH for –log [H+] and pKa for –log Ka:

$$\mathrm{pH} = \mathrm{p}K_{\mathrm{a}} - \log \frac{[\mathrm{HA}]}{[\mathrm{A^-}]}$$

Now invert –log [HA]/[A−], which involves changing its sign, to obtain the Henderson–Hasselbalch equation:

$$\mathrm{pH} = \mathrm{p}K_{\mathrm{a}} + \log \frac{[\mathrm{A^-}]}{[\mathrm{HA}]}$$

which is stated more generally as

$$\mathrm{pH} = \mathrm{p}K_{\mathrm{a}} + \log \frac{[\text{proton acceptor}]}{[\text{proton donor}]}$$

This equation fits the titration curve of all weak acids and enables us to deduce a number of important quantitative relationships. For example, it shows why the pKa of a weak acid is equal to the pH of the solution at the midpoint of its titration. At this point [HA] = [A−], and

pH = pKa + log 1.0 = pKa + 0 = pKa
The Henderson–Hasselbalch equation also makes it possible to calculate the pKa of any acid from the molar ratio of proton-donor and proton-acceptor species at any given pH; to calculate the pH of a conjugate acid–base pair of a given pKa and a given molar ratio; and to calculate the molar ratio of proton donor and proton acceptor at any pH given the pKa of the weak acid (Box 4–2).
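To see how the Henderson–Hasselbalch equation generates titration curves of identical shape displaced along the pH axis, the sketch below evaluates it at several degrees of neutralization for the three acids discussed in the text. It assumes that when a fraction f of the acid has been neutralized, the ratio [proton acceptor]/[proton donor] is f/(1 − f); this approximation is reasonable in the buffering region but not at the very start or end of a titration. The function name is illustrative.

```python
import math


def ph_during_titration(pka, fraction_titrated):
    """Henderson-Hasselbalch pH when a fraction f of the acid has been neutralized.

    Assumes [A-]/[HA] = f / (1 - f), which is a reasonable approximation only
    away from the very start and the very end of the titration (0 < f < 1).
    """
    if not 0.0 < fraction_titrated < 1.0:
        raise ValueError("fraction_titrated must lie strictly between 0 and 1")
    return pka + math.log10(fraction_titrated / (1.0 - fraction_titrated))


# pKa values quoted in the text.
ACIDS = {"acetic acid": 4.76, "dihydrogen phosphate": 6.86, "ammonium ion": 9.25}

for name, pka in ACIDS.items():
    points = [ph_during_titration(pka, f) for f in (0.1, 0.25, 0.5, 0.75, 0.9)]
    print(f"{name:22s}", " ".join(f"{p:5.2f}" for p in points))
```

Each row has the same spacing around its midpoint; only the midpoint itself, the pKa, differs from acid to acid.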
Box 4–2  Solving Problems with the Henderson–Hasselbalch Equation

1. Calculate the pKa of lactic acid, given that when the concentration of free lactic acid is 0.010 M and the concentration of lactate is 0.087 M, the pH is 4.80.

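$$\mathrm{pH} = \mathrm{p}K_{\mathrm{a}} + \log \frac{[\text{lactate}]}{[\text{lactic acid}]}$$

$$\begin{aligned}
\mathrm{p}K_{\mathrm{a}} &= \mathrm{pH} - \log \frac{[\text{lactate}]}{[\text{lactic acid}]} \\
&= 4.80 - \log \frac{0.087}{0.010} = 4.80 - \log 8.7 \\
&= 4.80 - 0.94 = 3.86 \quad \text{(answer)}
\end{aligned}$$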

2. Calculate the pH of a mixture of 0.1 M acetic acid and 0.2 M sodium acetate. The pKa of acetic acid is 4.76.

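$$\begin{aligned}
\mathrm{pH} &= \mathrm{p}K_{\mathrm{a}} + \log \frac{[\text{acetate}]}{[\text{acetic acid}]} \\
&= 4.76 + \log \frac{0.2}{0.1} = 4.76 + 0.301 \\
&= 5.06 \quad \text{(answer)}
\end{aligned}$$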

3. Calculate the ratio of the concentrations of acetate and acetic acid required in a buffer system of pH 5.30.

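$$\mathrm{pH} = \mathrm{p}K_{\mathrm{a}} + \log \frac{[\text{acetate}]}{[\text{acetic acid}]}$$

$$\begin{aligned}
\log \frac{[\text{acetate}]}{[\text{acetic acid}]} &= \mathrm{pH} - \mathrm{p}K_{\mathrm{a}} \\
&= 5.30 - 4.76 = 0.54 \\
\frac{[\text{acetate}]}{[\text{acetic acid}]} &= \operatorname{antilog} 0.54 = 3.47 \quad \text{(answer)}
\end{aligned}$$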
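The three problems in this box can also be checked with a few lines of code. This is a minimal sketch in which each function is one rearrangement of the Henderson–Hasselbalch equation; the function names are illustrative, and the numerical inputs are those given in the problems.

```python
import math


def pka_from_ph(ph, acceptor, donor):
    """pKa = pH - log10([acceptor]/[donor])."""
    return ph - math.log10(acceptor / donor)


def ph_from_pka(pka, acceptor, donor):
    """pH = pKa + log10([acceptor]/[donor])."""
    return pka + math.log10(acceptor / donor)


def ratio_from_ph(ph, pka):
    """[acceptor]/[donor] = 10**(pH - pKa)."""
    return 10.0 ** (ph - pka)


# Problem 1: lactic acid 0.010 M, lactate 0.087 M, pH 4.80.
print(f"pKa of lactic acid:  {pka_from_ph(4.80, acceptor=0.087, donor=0.010):.2f}")
# Problem 2: acetic acid 0.1 M, sodium acetate 0.2 M, pKa 4.76.
print(f"pH of mixture:       {ph_from_pka(4.76, acceptor=0.2, donor=0.1):.2f}")
# Problem 3: acetate/acetic acid ratio needed for pH 5.30.
print(f"acetate/acetic acid: {ratio_from_ph(5.30, 4.76):.2f}")
```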

Figure 4–13  The amino acid histidine, a component of proteins, is a weak acid. The pKa of the protonated nitrogen of the side chain is 6.0.

The cytoplasm of most cells contains high concentrations of proteins, which contain many amino acids with functional groups that are weak acids or weak bases. The side chain of the amino acid histidine (Fig. 4–13) has a pKa of 6.0, and proteins containing histidine residues can therefore buffer effectively near neutral pH. Nucleotides such as ATP, as well as many low molecular weight metabolites, contain ionizable groups that can contribute buffering power to the cytoplasm. Some highly specialized organelles and extracellular compartments have high concentrations of compounds that contribute buffering capacity: organic acids buffer the vacuoles of plant cells; ammonia buffers urine.

Figure 4–14  The pH optima of some enzymes: pepsin, a digestive enzyme secreted into gastric juice (black); trypsin, a digestive enzyme that acts in the small intestine (red); alkaline phosphatase of bone tissue (blue).

The intracellular and extracellular fluids of all multicellular organisms have a characteristic and nearly constant pH, which is regulated by various biological activities. The organism’s first line of defense against changes in internal pH is provided by buffer systems. Two important biological buffers are the phosphate and bicarbonate systems. The phosphate buffer system, which acts in the cytoplasm of all cells, consists of H2PO4− as proton donor and HPO42− as proton acceptor:

H2PO4−   ⇌   H+  +  HPO42−

The phosphate buffer system works exactly like the acetate buffer system, except for the pH range in which it functions. The phosphate buffer system is maximally effective at a pH close to its pKa of 6.86 (see Table 4–7 and Fig. 4–11), and thus tends to resist pH changes in the range between about 6.4 and 7.4. It is therefore effective in providing buffering power in intracellular fluids; in mammals, for example, extracellular fluids and most cytoplasmic compartments have a pH in the range of 6.9 to 7.4.
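
To see why the useful range sits around the pKa, it can help to compute how the phosphate pair is divided between its donor and acceptor forms at the edges of that range. The sketch below assumes only the pKa of 6.86 quoted in the text; the function name is hypothetical.

```python
PKA_PHOSPHATE = 6.86  # pKa of the H2PO4-/HPO4(2-) pair, as quoted in the text


def fraction_as_acceptor(ph, pka=PKA_PHOSPHATE):
    """Fraction of the conjugate pair present as proton acceptor.

    From Henderson-Hasselbalch, [acceptor]/[donor] = 10**(pH - pKa),
    so the acceptor fraction is ratio / (1 + ratio).
    """
    ratio = 10 ** (ph - pka)
    return ratio / (1 + ratio)


for ph in (6.4, 6.86, 7.4):
    print(f"pH {ph:.2f}: {fraction_as_acceptor(ph):.2f} of phosphate as HPO4(2-)")
```

At pH 6.86 the two forms are present in equal amounts. At pH 6.4 and 7.4 roughly a quarter and three quarters, respectively, are in the acceptor form, so substantial amounts of both species are still available to absorb added acid or base.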

Blood plasma is buffered in part by the bicarbonate system, consisting of carbonic acid (H2CO3) as proton donor and bicarbonate (HCO3−) as proton acceptor:

H2CO3   ⇌   H+  +  HCO3−

This system has an equilibrium constant

$$K_1 = \frac{[\mathrm{H^+}][\mathrm{HCO_3^-}]}{[\mathrm{H_2CO_3}]}$$

and functions as a buffer in the same way as other conjugate acid–base pairs. It is unique, however, in that one of its components, carbonic acid (H2CO3), is formed from dissolved (d) carbon dioxide and water, according to the reversible reaction

CO2(d) + H2O   ⇌   H2CO3

which has an equilibrium constant given by the expression

$$K_2 = \frac{[\mathrm{H_2CO_3}]}{[\mathrm{CO_2(d)}][\mathrm{H_2O}]}$$

Carbon dioxide is a gas under normal conditions, and the concentration of dissolved CO2 is the result of equilibration with CO2 of the gas phase:

CO2(g)   ⇌   CO2(d)

This process has an equilibrium constant given by

$$K_3 = \frac{[\mathrm{CO_2(d)}]}{[\mathrm{CO_2(g)}]}$$

The pH of a bicarbonate buffer system depends on the concentration of H2CO3 and HCO3−, the proton donor and acceptor components. The concentration of H2CO3 in turn depends on the concentration of dissolved CO2, which in turn depends on the concentration or partial pressure of CO2 in the gas phase; thus the pH of a bicarbonate buffer exposed to a gas phase is ultimately determined by the concentration of HCO3− in the aqueous phase and the partial pressure of CO2 in the gas phase (Box 4–3).
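
This chain of dependencies can be made explicit by combining the three equilibrium expressions defined above. The derivation below is a sketch that uses only K1, K2, and K3 as given in the text; treating [H2O] as effectively constant in dilute solution is an added assumption.

```latex
\begin{aligned}
[\mathrm{H^+}] &= K_1\,\frac{[\mathrm{H_2CO_3}]}{[\mathrm{HCO_3^-}]}
  && \text{(from } K_1\text{)} \\
[\mathrm{H_2CO_3}] &= K_2\,[\mathrm{CO_2(d)}]\,[\mathrm{H_2O}]
  && \text{(from } K_2\text{)} \\
[\mathrm{CO_2(d)}] &= K_3\,[\mathrm{CO_2(g)}]
  && \text{(from } K_3\text{)} \\[4pt]
[\mathrm{H^+}] &= K_1 K_2 K_3\,[\mathrm{H_2O}]\,
  \frac{[\mathrm{CO_2(g)}]}{[\mathrm{HCO_3^-}]} \\[4pt]
\mathrm{pH} &= -\log\!\bigl(K_1 K_2 K_3\,[\mathrm{H_2O}]\bigr)
  + \log\frac{[\mathrm{HCO_3^-}]}{[\mathrm{CO_2(g)}]}
\end{aligned}
```

With the constants and [H2O] fixed, the only variables left are the bicarbonate concentration and the amount of CO2 in the gas phase, which is the conclusion stated in the paragraph above.
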
Human blood plasma normally has a pH close to 7.40. Should the pH-regulating mechanisms fail or be overwhelmed, as may happen in severe uncontrolled diabetes when an overproduction of metabolic acids causes acidosis, the pH of the blood can fall to 6.8 or below, leading to irreparable cell damage and death. In other diseases the pH may rise to lethal levels. Although many aspects of cell structure and function are influenced by pH, it is the catalytic activity of enzymes that is especially sensitive. Enzymes typically show maximal catalytic activity at a characteristic pH, called the optimum pH (Fig. 4–14). On either side of the optimum pH their catalytic activity often declines sharply. Thus a small change in pH can make a large difference in the rate of some crucial enzyme-catalyzed reaction. Biological control of the pH of cells and body fluids is therefore of central importance in all aspects of metabolism and cellular activities.
Box 4–3
Blood, Lungs, and Buffer: The Bicarbonate Buffer System

In animals with lungs, the bicarbonate buffer system is an effective physiological buffer near pH 7.4 because the H2CO3 of the blood plasma is in equilibrium with a large reserve capacity of CO2(g) in the air space of the lungs. This buffer system involves three reversible equilibria between gaseous CO2 in the lungs and bicarbonate (HCO3−) in the blood plasma (Fig. 1). When H+ is added to blood as it passes through the tissues, reaction 1 proceeds toward a new equilibrium, in which the concentration of H2CO3 is increased. This increases the concentration of CO2(d) in the blood (reaction 2), and thus increases the pressure of CO2(g) in the air space of the lungs (reaction 3); the extra CO2 is exhaled.

Conversely, when OH− is added to the blood plasma, the opposite events occur: the H+ concentration is lowered, causing more H2CO3 to dissociate into H+ and HCO3−. This in turn causes more CO2(g) from the lungs to dissolve in the blood plasma. The rate of breathing, that is, the rate of inhaling and exhaling CO2, can quickly adjust these equilibria to keep the blood pH nearly constant.

Figure 1  The CO2 in the air space of the lungs is in equilibrium with the bicarbonate buffer in the blood plasma passing through the lung capillaries. Because the concentration of dissolved CO2 can be adjusted rapidly through changes in the rate of breathing, the bicarbonate buffer system of the blood is in near-equilibrium with a large potential reservoir of CO2.
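
The effect of breathing rate on blood pH can be illustrated numerically. The Python sketch below uses the form of the Henderson–Hasselbalch equation commonly applied to plasma, in which dissolved CO2 is estimated from its partial pressure. The apparent pKa of 6.1, the CO2 solubility of 0.03 mmol per liter per mm Hg, and the sample bicarbonate and CO2 values are assumptions taken from standard physiology usage, not from this chapter, and the function name is hypothetical.

```python
import math

# Assumed constants (not given in the text): apparent pKa of the
# CO2/bicarbonate system in plasma at 37 °C, and the solubility of CO2
# in plasma in mmol per liter per mm Hg.
PKA_APPARENT = 6.1
CO2_SOLUBILITY = 0.03


def plasma_ph(bicarbonate_mM, pco2_mmHg):
    """pH = pKa' + log10([HCO3-] / (solubility * pCO2))."""
    dissolved_co2_mM = CO2_SOLUBILITY * pco2_mmHg
    return PKA_APPARENT + math.log10(bicarbonate_mM / dissolved_co2_mM)


# Illustrative values: 24 mM bicarbonate at three CO2 partial pressures.
for label, pco2 in (("normal breathing", 40.0),
                    ("hypoventilation", 60.0),
                    ("hyperventilation", 25.0)):
    print(f"{label}: pCO2 {pco2:.0f} mm Hg -> pH {plasma_ph(24.0, pco2):.2f}")
```

Under these assumed values, raising the CO2 pressure in the lungs lowers the computed pH and lowering it raises the pH, which is the qualitative behavior described in this box.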
Figure 4–15  Water participates directly in a variety of reactions. (a) ATP is a phosphate anhydride formed by a condensation reaction (loss of the elements of water) between ADP and phosphate. R represents adenosine monophosphate (AMP). This condensation reaction requires energy. The hydrolysis (addition of the elements of water) of ATP releases an equivalent amount of energy. (b), (c), and (d) represent similar condensation and hydrolysis reactions common in biological systems.

Water is not just the solvent in which the chemical reactions of living cells occur; it is very often a direct participant in those reactions. The formation of ATP from ADP and inorganic phosphate is a condensation reaction (see Fig. 3–14) in which the elements of water are eliminated (Fig. 4–15a). The compound formed by this condensation is called a phosphate anhydride. Hydrolysis reactions are responsible for the enzymatic depolymerization of proteins, carbohydrates, and nucleic acids ingested in the diet. Hydrolytic enzymes (hydrolases) catalyze the addition of the elements of water to the bonds that connect monomeric subunits in these macromolecules (Fig. 4–15). Hydrolysis reactions are almost invariably exergonic, and the formation of cellular polymers from their subunits by simple reversal of hydrolysis would be endergonic and as such does not occur. We shall see that cells circumvent this thermodynamic obstacle by coupling the endergonic condensation reactions to exergonic processes, such as breakage of the anhydride bond in ATP.

You are (we hope!) consuming oxygen as you read. Water and carbon dioxide are the end products of the oxidation of fuels such as glucose. The overall reaction of this process can be summarized by the equation:

C6H12O6   +  6O2   →   6CO2  +  6H2O
Glucose

The "metabolic water" thus formed from stored fuels is actually enough to allow some animals in very dry habitats (gerbils, kangaroo rats, camels) to survive without drinking water for extended periods.
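
The amount of metabolic water follows directly from the stoichiometry of the equation above. The following Python sketch estimates the mass of water formed per gram of glucose oxidized; the atomic masses are standard rounded values rather than figures from the text, and the helper names are hypothetical.

```python
# Standard atomic masses in g/mol (rounded; not taken from the text).
ATOMIC_MASS = {"C": 12.011, "H": 1.008, "O": 15.999}

GLUCOSE = {"C": 6, "H": 12, "O": 6}  # C6H12O6
WATER = {"H": 2, "O": 1}             # H2O


def molar_mass(formula):
    """Molar mass in g/mol for a formula given as {element: count}."""
    return sum(ATOMIC_MASS[element] * count for element, count in formula.items())


def water_per_gram_glucose():
    """Grams of water formed per gram of glucose fully oxidized.

    Stoichiometry: C6H12O6 + 6 O2 -> 6 CO2 + 6 H2O
    """
    return 6 * molar_mass(WATER) / molar_mass(GLUCOSE)


print(f"{water_per_gram_glucose():.2f} g water per g glucose")
```

Complete oxidation of glucose therefore yields about 0.6 g of water per gram of fuel.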
Green plants and algae use the energy of sunlight (represented by hν, the energy of light of frequency ν; h is Planck’s constant) to split water in the process of photosynthesis:

$$2\,\mathrm{H_2O} + 2\,\mathrm{A} \xrightarrow{\;h\nu\;} \mathrm{O_2} + 2\,\mathrm{AH_2}$$

In this reaction, A is an electron-accepting species, which varies with the type of photosynthetic organism.
Aqueous environments support a myriad of species. Soft corals, sponges, bryozoans, and algae compete for space on this reef substrate off the Philippine Islands.

Organisms have effectively adapted to their aqueous environment and have even evolved means of exploiting the unusual properties of water. The high specific heat of water (the heat energy required to raise the temperature of 1 g of water by 1 °C) is useful to cells and organisms because it allows water to act as a "heat buffer," permitting the temperature of an organism to remain relatively constant as the temperature of the air fluctuates and as heat is generated as a byproduct of metabolism. Furthermore, some vertebrates exploit the high heat of vaporization of water (see Table 4–1) by using (thus losing) excess body heat to evaporate sweat. The high degree of internal cohesion of liquid water, due to hydrogen bonding, is exploited by plants as a means of transporting dissolved nutrients from the roots to the leaves during the process of transpiration. Even the lower density of ice than of liquid water has important biological consequences in the life cycles of aquatic organisms. Ponds freeze from the top down, and the layer of ice at the top insulates the water below from frigid air, preventing the pond (and the organisms in it) from freezing solid. Most fundamental to all living organisms is the fact that many physical and biological properties of cell macromolecules, particularly the proteins and nucleic acids, derive from their interactions with water molecules of the surrounding medium. The influence of water on the course of biological evolution has been profound and determinative. If life forms have evolved elsewhere in the universe, it is unlikely that they resemble those of earth, unless their extraterrestrial origin is also a place in which plentiful liquid water is available as solvent.

Summary

Water is the most abundant compound in living organisms. Its relatively high freezing point, boiling point, and heat of vaporization are the result of strong intermolecular attractions in the form of hydrogen bonding between adjacent water molecules. Liquid water has considerable short-range order and consists of short-lived hydrogen-bonded clusters. The polarity and hydrogen-bonding properties of water make it a potent solvent for many ionic compounds and other polar molecules. Nonpolar compounds, including the gases CO2, O2, and N2, are poorly soluble in water. Water disperses amphipathic molecules to form micelles, clusters of molecules in which the hydrophobic groups are hidden from water and the polar groups are exposed on the external surface.

Four types of weak interactions occur within and between biomolecules in an aqueous solvent: hydrogen bonds and ionic, hydrophobic, and van der Waals interactions. Although weak individually, these interactions collectively create a very strong stabilizing force for proteins, nucleic acids, and membranes. Weak (noncovalent) interactions are also at the heart of enzyme catalysis, antibody function, and receptor–ligand interactions.

Water ionizes very slightly to form H+ and OH− ions. In dilute aqueous solutions, the concentrations of H+ and OH− ions are inversely related by the expression Kw = [H+][OH−] = 1 × 10⁻¹⁴ M² (at 25 °C). The hydrogen-ion concentration of biological systems is usually expressed in terms of pH, defined as pH = –log [H+]. The pH of aqueous solutions is measured by means of glass electrodes sensitive to H+ concentration.

Acids are defined as proton donors and bases as proton acceptors. A conjugate acid–base pair consists of a proton donor (HA) and its corresponding proton acceptor (A−). The tendency of an acid HA to donate protons is expressed by its dissociation constant (Ka = [H+][A−]/[HA]) or by the function pKa, defined as –log Ka, which can be determined from an experimental titration curve. The pH of a solution of a weak acid is quantitatively related to its pKa and to the ratio of the concentrations of its proton-donor and proton-acceptor species by the Henderson–Hasselbalch equation.

A conjugate acid–base pair can act as a buffer and resist changes in pH; its capacity to do so is greatest at a pH equal to its pKa. Many types of biomolecules have functional groups that contribute buffering capacity. H2CO3/HCO3− and H2PO4−/HPO42− are important biological buffer systems. The catalytic activity of enzymes is strongly influenced by pH, and it is essential that the environments in which they function be buffered against large pH changes.

Water is not only the solvent in which metabolic reactions occur; it participates directly in many of the reactions, including hydrolysis and condensation reactions.

The physical and chemical properties of water are central to biological structure and function. The evolution of life on earth was doubtless influenced greatly by both the solvent and reactant properties of water.

Problems

1. Artificial Vinegar  One way to make vinegar (not the preferred way) is to prepare a solution of acetic acid, the sole acid component of vinegar, at the proper pH (see Fig. 4–9) and add appropriate flavoring agents. Acetic acid (Mr 60) is a liquid at 25 °C with a density of 1.049 g/mL. Calculate the amount (volume) that must be added to distilled water to make 1 L of simulated vinegar (see Table 4–7).

2. Acidity of Gastric HCl  In a hospital laboratory, a 10.0 mL sample of gastric juice, obtained several hours after a meal, was titrated with 0.1 M NaOH to neutrality; 7.2 mL of NaOH was required. The stomach contained no ingested food or drink, so assume that no buffers were present. What was the pH of the gastric juice?

3. Measurement of Acetylcholine Levels by pH Changes  The concentration of acetylcholine, a neurotransmitter, can be determined from the pH changes that accompany its hydrolysis. When incubated with a catalytic amount of the enzyme acetylcholinesterase, acetylcholine is quantitatively converted into choline and acetic acid, which dissociates to yield acetate and a hydrogen ion:

Acetylcholine  +  H2O   →   choline  +  acetate  +  H+

In a typical analysis, 15 mL of an aqueous solution containing an unknown amount of acetylcholine had a pH of 7.65. When incubated with acetylcholinesterase, the pH of the solution decreased to a final value of 6.87. Assuming that there was no buffer in the assay mixture, determine the number of moles of acetylcholine in the 15 mL of unknown.

4. Significance of the pKa of an Acid  One common description of the pKa of an acid is that it represents the pH at which the acid is half ionized, that is, the pH at which it exists as a 50:50 mixture of the acid and the conjugate base. Demonstrate this relationship for an acid HA, starting from the equilibrium-constant expression.

5. Properties of a Buffer  The amino acid glycine is often used as the main ingredient of a buffer in biochemical experiments. The amino group of glycine, which has a pKa of 9.6, can exist either in the protonated form (–NH3+) or as the free base (–NH2) because of the reversible equilibrium

R–NH3+   ⇌   R–NH2  +  H+

       (a) In what pH range can glycine be used as an effective buffer due to its amino group?
       (b) In a 0.1 M solution of glycine at pH 9.0, what fraction of glycine has its amino group in the –NH3+ form?
       (c) How much 5 M KOH must be added to 1.0 L of 0.1 M glycine at pH 9.0 to bring its pH to exactly 10.0?
       (d) In order to have 99% of the glycine in its –NH3+ form, what must the numerical relation be between the pH of the solution and the pKa of the amino group of glycine?

6. The Effect of pH on Solubility  The strongly polar hydrogen-bonding nature of water makes it an excellent solvent for ionic (charged) species. By contrast, un-ionized, nonpolar organic molecules, such as benzene, are relatively insoluble in water. In principle, the aqueous solubility of all organic acids or bases can be increased by deprotonation or protonation of the molecules, respectively, to form charged species. For example, the solubility of benzoic acid in water is low. The addition of sodium bicarbonate raises the pH of the solution and deprotonates the benzoic acid to form benzoate ion, which is quite soluble in water.

Are the molecules in (a) to (c) (below) more soluble in an aqueous solution of 0.1 M NaOH or 0.1 M HCl? (The dissociable protons are shown in red.)

7. Treatment of Poison Ivy Rash  Catechols substituted with long-chain alkyl groups are the components of poison ivy and poison oak that produce the characteristic itchy rash.

If you were exposed to poison ivy, which of the treatments below would you apply to the affected area? Justify your choice.

       (a) Wash the area with cold water.
       (b) Wash the area with dilute vinegar or lemon juice.
       (c) Wash the area with soap and water.
       (d) Wash the area with soap, water, and baking soda (sodium bicarbonate).

8. pH and Drug Absorption  Aspirin is a weak acid with a pKa of 3.5.

It is absorbed into the blood through the cells lining the stomach and the small intestine. Absorption requires passage through the cell membrane, which is determined by the polarity of the molecule: charged and highly polar molecules pass slowly, whereas neutral hydrophobic ones pass rapidly. The pH of the gastric juice in the stomach is about 1.5 and the pH of the contents of the small intestine is about 6. Is more aspirin absorbed into the bloodstream from the stomach or from the small intestine? Clearly justify your choice.

9. Preparation of Standard Buffer for Calibration of a pH Meter  The glass electrode used in commercial pH meters gives an electrical response proportional to the hydrogen-ion concentration. To convert these responses into pH, glass electrodes must be calibrated against standard solutions of known hydrogen-ion concentration. Determine the weight in grams of sodium dihydrogen phosphate (NaH2PO4 ‧ H2O; formula weight (FW) 138.01) and disodium hydrogen phosphate (Na2HPO4; FW 141.98) needed to prepare 1 L of a standard buffer at pH 7.00 with a total phosphate concentration of 0.100 M (see Table 4–7).

10. Control of Blood pH by the Rate of Respiration 

       (a) The partial pressure of CO2 in the lungs can be varied rapidly by the rate and depth of breathing. For example, a common remedy to alleviate hiccups is to increase the concentration of CO2 in the lungs. This can be achieved by holding one’s breath, by very slow and shallow breathing (hypoventilation), or by breathing in and out of a paper bag. Under such conditions, the partial pressure of CO2 in the air space of the lungs rises above normal. Qualitatively explain the effect of these procedures on the blood pH.
       (b) A common practice of competitive short-distance runners is to breathe rapidly and deeply (hyperventilation) for about half a minute to remove CO2 from their lungs just before running in, say, a 100 m dash. Their blood pH may rise to 7.60. Explain why the blood pH goes up.
       (c) During a short-distance run the muscles produce a large amount of lactic acid from their glucose stores. In view of this fact, why might hyperventilation before a dash be useful?
Part II
Structure and Catalysis

Facing page: End-on view of the triple-stranded collagen superhelix. Collagen,
a component of connective tissue, provides tensile strength and resiliency. Its
strength is derived in part from the three tightly wrapped identical helical strands
(shown in gray, purple, and blue), much the way a length of rope is stronger than
its constituent fibers. The tight wrapping is made possible by the presence of glycine,
shown in red, at every third position along each strand, where the strands are in
contact. Glycine’s small size allows for very close contact.

In Part I we contrasted the complex structure and function of living cells with the relative simplicity of the monomeric units from which the enzymes, supramolecular complexes, and organelles of the cells are constructed. Part II is devoted to the structure and function of the major classes of cellular constituents: amino acids and proteins (Chapters 5 through 8), fatty acids, lipids, and membranes (Chapters 9 and 10), sugars and polysaccharides (Chapter 11), and nucleotides and nucleic acids (Chapter 12). We begin in each case by considering the covalent structure of the simple subunits (amino acids, fatty acids, monosaccharides, and nucleotides). These subunits are a major part of the language of biochemistry; familiarity with them is a prerequisite for understanding more advanced topics covered in this book, as well as the rapidly growing and exciting literature of biochemistry.

After describing the covalent chemistry of the monomeric units, we consider the structure of the macromolecules and supramolecular complexes derived from them. An overriding theme is that the polymeric macromolecules in living systems, though large, are highly ordered chemical entities, with specific sequences of monomeric subunits giving rise to discrete structures and functions. This fundamental theme can be broken down into three interrelated principles: (1) the unique structure of each macromolecule determines its function; (2) noncovalent interactions play a critical role in the structure and function of macromolecules; and (3) the specific sequences of monomeric subunits in polymeric macromolecules contain the information upon which the ordered living state depends. Each of these principles deserves further comment.

The relationship between structure and function is especially evident in proteins, which exhibit an extraordinary diversity of functions. One particular polymeric sequence of amino acids produces a strong, fibrous structure found in hair and wool; another produces a protein that transports oxygen in the blood. Similarly, the special functions of lipids, polysaccharides, and nucleic acids can be understood as a direct manifestation of their chemical structure, with their characteristic monomeric subunits linked in precise functional groups or polymers. Lipids aggregate to form membranes; sugars linked together become energy stores and structural fibers; nucleotides in a polymer become the blueprint for an entire organism.

As we move from monomeric units to larger and larger polymers, the chemical focus shifts from covalent bonds to noncovalent interactions. The covalent nature of monomeric units, and of the bonds that connect them in polymers, places strong constraints upon the shapes assumed by large molecules. It is the numerous noncovalent interactions, however, that dictate the stable native conformation and provide the flexibility necessary for the biological function of these large molecules. We will see that noncovalent interactions are essential to the catalytic power of enzymes, the arrangement and properties of lipids in a membrane, and the critical interaction of complementary base pairs in nucleic acids.

The principle that sequences of monomeric subunits are information-rich emerges fully in the discussion of nucleic acids in Chapter 12. However, proteins and some polysaccharides are also information-rich molecules. The amino acid sequence is a form of information that directs the folding of the protein into its unique three-dimensional structure, and ultimately determines the function of the protein. Some polysaccharides also have unique sequences and three-dimensional structures that can be recognized by other macromolecules.

For each class of molecules we find a similar structural hierarchy, in which subunits of fixed structure are connected by bonds of limited flexibility, to form macromolecules with three-dimensional structures determined by noncovalent interactions. Together, the molecules described in Part II are the “stuff” of life. We begin with the amino acids.

Figure 5–1  The protein keratin is formed by all vertebrates. It is the chief structural component of hair, scales, horn, wool, nails, and feathers. The black rhinoceros is nearing extinction in the wild because of the myths prevalent in some parts of the world that a powder derived from its horn has aphrodisiac properties. In reality, the chemical properties are no different from those of powdered bovine hooves or human fingernails.
Chapter 5
Amino Acids and Peptides
Proteins are the most abundant macromolecules in living cells, occurring in all cells and all parts of cells. Proteins also occur in great variety; thousands of different kinds may be found in a single cell. Moreover, proteins exhibit great diversity in their biological function. Their central role is made evident by the fact that proteins are the most important final products of the information pathways discussed in Part IV of this book. In a sense, they are the molecular instruments through which genetic information is expressed. It is appropriate to begin the study of biological macromolecules with the proteins, whose name derives from the Greek prōtos, meaning "first" or "foremost".

Relatively simple monomeric subunits provide the key to the structure of the thousands of different proteins. All proteins, whether from the most ancient lines of bacteria or from the most complex forms of life, are constructed from the same ubiquitous set of 20 amino acids, covalently linked in characteristic linear sequences. Because each of these amino acids has a distinctive side chain that determines its chemical properties, this group of 20 precursor molecules may be regarded as the alphabet in which the language of protein structure is written.

Proteins are chains of amino acids, each joined to its neighbor by a specific type of covalent bond. What is most remarkable is that cells can produce proteins that have strikingly different properties and activities by joining the same 20 amino acids in many different combinations and sequences. From these building blocks different organisms can make such widely diverse products as enzymes, hormones, antibodies, the lens protein of the eye, feathers, spider webs, rhinoceros horns (Fig. 5–1), milk proteins, antibiotics, mushroom poisons, and a myriad of other substances having distinct biological activities.

Protein structure and function is the topic for the next four chapters. In this chapter we begin with a description of amino acids and the covalent bonds that link them together in peptides and proteins.

Proteins can be reduced to their constituent amino acids by a variety of methods, and the earliest studies of proteins naturally focused on the free amino acids derived from them. The first amino acid to be discovered in proteins was asparagine, in 1806. The last of the 20 to be found, threonine, was not identified until 1938. All the amino acids have trivial or common names, in some cases derived from the source from which they were first isolated. Asparagine was first found in asparagus, as one might guess; glutamate was found in wheat gluten; tyrosine was first isolated from cheese (thus its name is derived from the Greek tyros, "cheese"); and glycine (Greek glykos, "sweet") was so named because of its sweet taste.

Figure 5–2  General structure of the amino acids found in proteins. With the exception of the nature of the R group, this structure is common to all the α-amino acids. (Proline, because it is an imino acid, is an exceptional component of proteins.) The α carbon is shown in blue. R (in red) represents the R group or side chain, which is different in each amino acid. In all amino acids except glycine (shown for comparison) the α-carbon atom has four different substituent groups.
Figure 5–3  (a) The two stereoisomers of alanine. L- and D-alanine are nonsuperimposable mirror images of each other. (b, c) Two different conventions for showing the configurations in space of stereoisomers. In perspective formulas (b) the wedge-shaped bonds project out of the plane of the paper, the dashed bonds behind it. In projection formulas (c) the horizontal bonds are assumed to project out of the plane of the paper, the vertical bonds behind. However, projection formulas are often used casually without reference to stereochemical configuration.
Figure 5–4  Steric relationship of the stereoisomers of alanine to the absolute configuration of L- and D-glyceraldehyde. In these perspective formulas, the carbons are lined up vertically, with the chiral atom in the center. The carbons in these molecules are numbered beginning with the aldehyde or carboxyl carbons on the end, or 1 to 3 from top to bottom as shown. When presented in this way, the R group of the amino acid (in this case the methyl group of alanine) is always below the α carbon. L-Amino acids are those with the α-amino group on the left, and D-amino acids have the α-amino group on the right.

All of the 20 amino acids found in proteins have a carboxyl group and an amino group bonded to the same carbon atom (the α carbon) (Fig. 5–2). They differ from each other in their side chains, or R groups, which vary in structure, size, and electric charge, and influence the solubility of amino acids in water. When the R group contains additional carbons in a chain, they are designated β, γ, δ, ε, etc., proceeding out from the α carbon. The 20 amino acids of proteins are often referred to as the standard, primary, or normal amino acids, to distinguish them from amino acids within proteins that are modified after the proteins are synthesized, and from many other kinds of amino acids present in living organisms but not in proteins. The standard amino acids have been assigned three-letter abbreviations and one-letter symbols (Table 5–1), which are used as shorthand to indicate the composition and sequence of amino acids in proteins.

We note in Figure 5–2 that for all the standard amino acids except one (glycine) the α carbon is asymmetric, bonded to four different substituent groups: a carboxyl group, an amino group, an R group, and a hydrogen atom. The α-carbon atom is thus a chiral center (see Fig. 3–9). Because of the tetrahedral arrangement of the bonding orbitals around the α-carbon atom of amino acids, the four different substituent groups can occupy two different arrangements in space, which are nonsuperimposable mirror images of each other (Fig. 5–3). These two forms are called enantiomers or stereoisomers (see Fig. 3–9). All molecules with a chiral center are also optically active – i.e., they can rotate plane-polarized light, with the direction of the rotation differing for different stereoisomers.

* A scale combining hydrophobicity and hydrophilicity; can be used to predict which amino acids will be found in an aqueous environment (– values) and which will be found in a hydrophobic environment (+ values). See Box 10–2. From Kyte, J. & Doolittle, R.F. (1982) J. Mol. Biol. 157, 105–132.

† Average occurrence in over 200 proteins. From Klapper, M.H. (1977) Biochem. Biophys. Res. Commun. 78, 1018–1024.

The classification and naming of stereoisomers is based on the absolute configuration of the four substituents of the asymmetric carbon atom. For this purpose a reference compound has been chosen, to which all other optically active compounds are compared. This reference compound is the 3-carbon sugar glyceraldehyde (Fig. 5–4), the smallest sugar to have an asymmetric carbon atom. The naming of configurations of both simple sugars and amino acids is based on the absolute configuration of glyceraldehyde, as established by x-ray diffraction analysis. The stereoisomers of all chiral compounds having a configuration related to that of L-glyceraldehyde are designated L (for levorotatory, derived from levo, meaning “left”), and the stereoisomers related to D-glyceraldehyde are designated D (for dextrorotatory, derived from dextro, meaning “right”). The symbols L and D thus refer to the absolute configuration of the four substituents around the chiral carbon.

Nearly all biological compounds with a chiral center occur naturally in only one stereoisomeric form, either D or L. The amino acids in protein molecules are the L stereoisomers. D-Amino acids have been found only in small peptides of bacterial cell walls and in some peptide antibiotics (see Fig. 5–19).

It is remarkable that the amino acids of proteins are all L stereoisomers. As we noted in Chapter 3, when chiral compounds are formed by ordinary chemical reactions, a racemic mixture of D and L isomers results. Whereas the L and D forms of chiral molecules are difficult for a chemist to distinguish and isolate, they are as different as night and day to a living system. The ability of cells to specifically synthesize the L isomer of amino acids reflects one of many extraordinary properties of enzymes (Chapter 8). The stereospecificity of the reactions catalyzed by some enzymes is made possible by the asymmetry of their active sites. The characteristic three-dimensional structures of proteins (Chapter 7), which dictate their diverse biological activities, require that all their constituent amino acids be of one stereochemical series.

Figure 5–5  Nonionic and zwitterionic forms of amino acids. Note the separation of the + and – charges in the zwitterion, which makes it an electric dipole. The nonionic form does not occur in significant amounts in aqueous solutions. The zwitterion predominates at neutral pH.

Amino acids in aqueous solution are ionized and can act as acids or bases. Knowledge of the acid–base properties of amino acids is extremely important in understanding the physical and biological properties of proteins. Moreover, the technology of separating, identifying, and quantifying the different amino acids, which are necessary steps in determining the amino acid composition and sequence of protein molecules, is based largely on their characteristic acid–base behavior.

Those α-amino acids having a single amino group and a single carboxyl group crystallize from neutral aqueous solutions as fully ionized species known as zwitterions (German for "hybrid ions"), each having both a positive and a negative charge (Fig. 5–5). These ions are electrically neutral and remain stationary in an electric field. The dipolar nature of amino acids was first suggested by the observation that crystalline amino acids have melting points much higher than those of other organic molecules of similar size. The crystal lattice of amino acids is held together by strong electrostatic forces between positively and negatively charged functional groups of neighboring molecules, resembling the stable ionic crystal lattice of NaCl (see Fig. 4–6).

Figure 5–6  The 20 standard amino acids of proteins. They are shown with their amino and carboxyl groups ionized, as they would occur at pH 7.0. The portions in black are those common to all the amino acids; the portions shaded in red are the R groups.
Figure 5–7  Comparison of the light absorbance spectra of the aromatic amino acids at pH 6.0. The amino acids are present in equimolar amounts (10⁻³ M) under identical conditions. The light absorbance of tryptophan is as much as fourfold higher than that of tyrosine. Phenylalanine absorbs less light than either tryptophan or tyrosine. Note that the absorbance maximum for tryptophan and tyrosine occurs near a wavelength of 280 nm.

An understanding of the chemical properties of the standard amino acids is central to an understanding of much of biochemistry. The topic can be simplified by grouping the amino acids into classes based on the properties of their R groups (Table 5–1), in particular, their polarity or tendency to interact with water at biological pH (near pH 7.0). The polarity of the R groups varies widely, from totally nonpolar or hydrophobic (water-insoluble) to highly polar or hydrophilic (water-soluble).

The structures of the 20 standard amino acids are shown in Figure 5–6, and many of their properties are listed in Table 5–1. There are five main classes of amino acids, those whose R groups are: nonpolar and aliphatic; aromatic (generally nonpolar); polar but uncharged; negatively charged; and positively charged. Within each class there are gradations of polarity, size, and shape of the R groups.

Nonpolar, Aliphatic R Groups  The hydrocarbon R groups in this class of amino acids are nonpolar and hydrophobic (Fig. 5–6). The bulky side chains of alanine, valine, leucine, and isoleucine, with their distinctive shapes, are important in promoting hydrophobic interactions within protein structures. Glycine has the simplest amino acid structure. Where it is present in a protein, the minimal steric hindrance of the glycine side chain allows much more structural flexibility than the other amino acids. Proline represents the opposite structural extreme. The secondary amino (imino) group is held in a rigid conformation that reduces the structural flexibility of the protein at that point.
Aromatic R Groups  Phenylalanine, tyrosine, and tryptophan, with their aromatic side chains (Fig. 5–6), are relatively nonpolar (hydrophobic). All can participate in hydrophobic interactions, which are particularly strong when the aromatic groups are stacked on one another. The hydroxyl group of tyrosine can form hydrogen bonds, and it acts as an important functional group in the activity of some enzymes. Tyrosine and tryptophan are significantly more polar than phenylalanine because of the tyrosine hydroxyl group and the nitrogen of the tryptophan indole ring.

Tryptophan and tyrosine, and to a lesser extent phenylalanine, absorb ultraviolet light (Fig. 5–7 and Box 5–1). This accounts for the characteristic strong absorbance of light by proteins at a wavelength of 280 nm, and is a property exploited by researchers in the characterization of proteins.

Polar, Uncharged R Groups  The R groups of these amino acids (Fig. 5–6) are more soluble in water, or hydrophilic, than those of the nonpolar amino acids, because they contain functional groups that form hydrogen bonds with water. This class of amino acids includes serine, threonine, cysteine, methionine, asparagine, and glutamine. The polarity of serine and threonine is contributed by their hydroxyl groups; that of cysteine and methionine by their sulfur atom; and that of asparagine and glutamine by their amide groups.

Asparagine and glutamine are the amides of two other amino acids also found in proteins, aspartate and glutamate, respectively, to which asparagine and glutamine are easily hydrolyzed by acid or base. Cysteine has an R group (a thiol group) that is approximately as acidic as the hydroxyl group of tyrosine. Cysteine requires special mention for another reason. It is readily oxidized to form a covalently linked dimeric amino acid called cystine, in which two cysteine molecules are joined by a disulfide bridge. Disulfide bridges of this kind occur in many proteins, stabilizing their structures.

Negatively Charged (Acidic) R Groups  The two amino acids having R groups with a net negative charge at pH 7.0 are aspartate and glutamate, each with a second carboxyl group (Fig. 5–6). These amino acids are the parent compounds of asparagine and glutamine, respectively.

Positively Charged (Basic) R Groups  The amino acids in which the R groups have a net positive charge at pH 7.0 are lysine, which has a second amino group at the ϵ position on its aliphatic chain; arginine, which has a positively charged guanidino group; and histidine, containing an imidazole group (Fig. 5–6). Histidine is the only standard amino acid having a side chain with a pKa near neutrality.

Box 5–1
Absorption of Light by Molecules: The Lambert–Beer Law

Measurement of light absorption is an important tool for analysis of many biological molecules. The fraction of the incident light absorbed by a solution at a given wavelength is related to the thickness of the absorbing layer (path length) and the concentration of the absorbing species. These two relationships are combined into the Lambert–Beer law, given in integrated form as

\[ \log \frac{I_0}{I} = \epsilon c l \]

where I0 is the intensity of the incident light, I is the intensity of the transmitted light, ϵ is the molar absorption coefficient (in units of liters per mole-centimeter), c the concentration of the absorbing species (in moles per liter), and l the path length of the light-absorbing sample (in centimeters). The Lambert–Beer law assumes that the incident light is parallel and monochromatic and that the solvent and solute molecules are randomly oriented. The expression log (I0/I) is called the absorbance, designated A.

It is important to note that each millimeter path length of absorbing solution in a 1.0 cm cell absorbs not a constant amount but a constant fraction of the incident light. However, with an absorbing layer of fixed path length, the absorbance A is directly proportional to the concentration of the absorbing solute.
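
The two statements above can be checked numerically. The following is a minimal Python sketch of the Lambert–Beer law; the function names and the values chosen for ϵ and c are illustrative assumptions, not taken from the text.

```python
def absorbance(epsilon, conc, path_cm):
    """Absorbance A = epsilon * c * l (Lambert-Beer law)."""
    return epsilon * conc * path_cm


def transmitted_fraction(epsilon, conc, path_cm):
    """Fraction I/I0 of the incident light that emerges: 10 ** (-A)."""
    return 10 ** (-absorbance(epsilon, conc, path_cm))


# Hypothetical values chosen only for illustration.
EPSILON = 5000.0   # liters per mole-centimeter
CONC = 1.0e-4      # moles per liter

# Each millimeter of solution transmits the same fraction of the light entering it,
# so ten successive millimeters multiply to the value for the whole 1.0 cm cell.
per_mm = transmitted_fraction(EPSILON, CONC, 0.1)
whole_cell = transmitted_fraction(EPSILON, CONC, 1.0)
print(round(per_mm, 4), round(per_mm ** 10, 4), round(whole_cell, 4))

# At fixed path length, doubling the concentration doubles the absorbance.
print(absorbance(EPSILON, CONC, 1.0), absorbance(EPSILON, 2 * CONC, 1.0))
```

The absorbed amount of light falls with each successive millimeter because less light reaches it, while the absorbed fraction stays the same.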

The molar absorption coefficient varies with the nature of the absorbing compound, the solvent, the wavelength, and also with pH if the light-absorbing species is in equilibrium with another species having a different spectrum through gain or loss of protons.

In practice, absorbance measurements are usually made on a set of standard solutions of known concentration at a fixed wavelength. A sample of unknown concentration can then be compared with the resulting standard curve, as shown in Figure 1.
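
Reading an unknown from a standard curve can be sketched in a few lines of Python. The standards below are hypothetical numbers, not the data of Figure 1, and fitting a straight line is itself an assumption; real standard curves may bend at high concentrations.

```python
def fit_line(xs, ys):
    """Least-squares slope and intercept for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def concentration_from_absorbance(a_unknown, slope, intercept):
    """Invert the standard curve to estimate the unknown concentration."""
    return (a_unknown - intercept) / slope


# Hypothetical standards: concentration in ug/mL and measured absorbance.
standards_conc = [0.0, 50.0, 100.0, 150.0, 200.0]
standards_abs = [0.00, 0.25, 0.50, 0.75, 1.00]

slope, intercept = fit_line(standards_conc, standards_abs)
unknown = concentration_from_absorbance(0.60, slope, intercept)
print(f"estimated concentration: {unknown:.1f} ug/mL")
```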

Figure 1  Eight standard solutions containing known amounts of protein and one sample containing an unknown amount of protein were reacted with the Bradford reagent. This reagent contains a dye that shifts its absorption maximum to 595 nm when it binds amino acid residues. The A595 (absorbance at 595 nm) of the standard samples was plotted against the protein concentration to create the standard curve, shown here. The A595 of the unknown sample, 0.58, corresponds to a protein concentration of 122 μg/mL.
Figure 5–8  (a) Some nonstandard amino acids found in proteins; all are derived from standard amino acids. The extra functional groups are shown in red. Desmosine is formed from four residues of lysine, whose carbon backbones are shaded in gray. Selenocysteine is derived from serine. (b) Ornithine and citrulline are intermediates in the biosynthesis of arginine and in the urea cycle. Note that two systems are used to number carbons in the naming of these amino acids. The α, β, γ system used for γ-carboxyglutamate begins at the α carbon (see Fig. 5–2) and extends into the R group. The α-carboxyl group is not included. In contrast, the numbering system used to identify the modified carbon in 4-hydroxyproline, 5-hydroxylysine, and 6-N-methyllysine includes the α-carboxyl carbon, which is designated carbon 1 (or C-1).

In addition to the 20 standard amino acids that are common in all proteins, other amino acids have been found as components of only certain types of proteins (Fig. 5–8a). Each of these is derived from one of the 20 standard amino acids, in a modification reaction that occurs after the standard amino acid has been inserted into a protein. Among the nonstandard amino acids are 4-hydroxyproline, a derivative of proline, and 5-hydroxylysine; the former is found in plant cell-wall proteins, and both are found in the fibrous protein collagen of connective tissues. N-Methyllysine is found in myosin, a contractile protein of muscle. Another important nonstandard amino acid is γ-carboxyglutamate, found in the blood-clotting protein prothrombin as well as in certain other proteins that bind Ca2+ in their biological function. More complicated is the nonstandard amino acid desmosine, a derivative of four separate lysine residues, found in the fibrous protein elastin. Selenocysteine contains selenium rather than the oxygen of serine, and is found in glutathione peroxidase and a few other proteins.

Some 300 additional amino acids have been found in cells and have a variety of functions but are not constituents of proteins. Ornithine and citrulline (Fig. 5–8b) deserve special note because they are key intermediates in the biosynthesis of arginine and in the urea cycle. These pathways are described in Chapters 21 and 17, respectively.

When a crystalline amino acid, such as alanine, is dissolved in water, it exists in solution as the dipolar ion, or zwitterion, which can act either as an acid (proton donor):

+H3N–CH(R)–COO− ⇌ H2N–CH(R)–COO− + H+

or as a base (proton acceptor):

+H3N–CH(R)–COO− + H+ ⇌ +H3N–CH(R)–COOH

Substances having this dual nature are amphoteric and are often called ampholytes, from "amphoteric electrolytes". A simple monoamino monocarboxylic α-amino acid, such as alanine, is actually a diprotic acid when it is fully protonated, that is, when both its carboxyl group and amino group have accepted protons. In this form it has two groups that can ionize to yield protons, as indicated in the following equation:

+H3N–CH(R)–COOH ⇌ +H3N–CH(R)–COO− + H+
+H3N–CH(R)–COO− ⇌ H2N–CH(R)–COO− + H+

Figure 5–9  The titration curve of 0.1 M glycine at 25 °C. The ionic species predominating at key points in the titration are shown above the graph. The shaded boxes, centered about pK1 = 2.34 and pK2 = 9.60, indicate the regions of greatest buffering power.
Figure 5–10  (a) Interactions between the α-amino and α-carboxyl groups in an α-amino acid. The nearby positive charge of the –NH3+ group makes ionization of the carboxyl group more likely (i.e., lowers the pKa for –COOH). This is due to a stabilizing interaction between opposite charges on the zwitterion and a repulsive interaction between the positive charges of the amino group and the departing proton. (b) The normal pKa for a carboxyl group is approximately 4.76, as for acetic acid.

Titration involves the gradual addition or removal of protons. Figure 5–9 shows the titration curve of the diprotic form of glycine. Each molecule of added base results in the net removal of one proton from one molecule of amino acid. The plot has two distinct stages, each corresponding to the removal of one proton from glycine. Each of the two stages resembles in shape the titration curve of a monoprotic acid, such as acetic acid (see Fig. 4–10), and can be analyzed in the same way. At very low pH, the predominant ionic species of glycine is +H3N–CH2–COOH, the fully protonated form. At the midpoint in the first stage of the titration, in which the –COOH group of glycine loses its proton, equimolar concentrations of proton-donor (+H3N–CH2–COOH) and proton-acceptor (+H3N–CH2–COO−) species are present. At the midpoint of a titration (see Fig. 4–11), the pH is equal to the pKa of the protonated group being titrated. For glycine, the pH at the midpoint is 2.34, thus its –COOH group has a pKa of 2.34. [Recall that pH and pKa are simply convenient notations for proton concentration and the equilibrium constant for ionization, respectively (Chapter 4). The pKa is a measure of the tendency of a group to give up a proton, with that tendency decreasing tenfold as the pKa increases by one unit.] As the titration proceeds, another important point is reached at pH 5.97. Here there is a point of inflection, at which removal of the first proton is essentially complete, and removal of the second has just begun. At this pH the glycine is present largely as the dipolar ion +H3N–CH2–COO−. We shall return to the significance of this inflection point in the titration curve shortly.

The second stage of the titration corresponds to the removal of a proton from the –NH3+ group of glycine. The pH at the midpoint of this stage is 9.60, equal to the pKa for the –NH3+ group. The titration is complete at a pH of about 12, at which point the predominant form of glycine is H2N–CH2–COO−.

From the titration curve of glycine we can derive several important pieces of information. First, it gives a quantitative measure of the pKa of each of the two ionizing groups, 2.34 for the –COOH group and 9.60 for the –NH3+ group. Note that the carboxyl group of glycine is over 100 times more acidic (more easily ionized) than the carboxyl group of acetic acid, which has a pKa of 4.76. This effect is caused by the nearby positively charged amino group on the α-carbon atom, as described in Figure 5–10.
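
The "over 100 times" figure follows from the difference between the two pKa values. As a short worked calculation, using only the definition pKa = −log Ka from Chapter 4:

```latex
\[
\frac{K_a(\text{glycine carboxyl})}{K_a(\text{acetic acid})}
  = \frac{10^{-2.34}}{10^{-4.76}}
  = 10^{\,4.76 - 2.34}
  = 10^{2.42} \approx 260
\]
```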

The second piece of information given by the titration curve of glycine (Fig. 5–9) is that this amino acid has two regions of buffering power (see Fig. 4–12). One of these is the relatively flat portion of the curve centered about the first pKa of 2.34, indicating that glycine is a good buffer near this pH. The other buffering zone extends for ~1.2 pH units centered around pH 9.60. Note also that glycine is not a good buffer at the pH of intracellular fluid or blood, about 7.4.

The Henderson–Hasselbalch equation (Chapter 4) can be used to calculate the proportions of proton-donor and proton-acceptor species of glycine required to make a buffer at a given pH within the buffering ranges of glycine; it also makes it possible to solve other kinds of buffer problems involving amino acids (see Box 4–2).
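
As an illustration of such a calculation, the sketch below applies the Henderson–Hasselbalch equation to the amino group of glycine. The function names and the target pH of 9.0 are arbitrary choices made for the example.

```python
def acceptor_to_donor_ratio(ph, pka):
    """Henderson-Hasselbalch: pH = pKa + log([acceptor]/[donor])."""
    return 10 ** (ph - pka)


def acceptor_fraction(ph, pka):
    """Fraction of the group present in the proton-acceptor (deprotonated) form."""
    ratio = acceptor_to_donor_ratio(ph, pka)
    return ratio / (1 + ratio)


PK2_GLYCINE = 9.60  # pKa of the -NH3+ group, from the titration curve

# Composition of a glycine buffer at pH 9.0.
ratio = acceptor_to_donor_ratio(9.0, PK2_GLYCINE)
print(f"[H2N-CH2-COO-] / [+H3N-CH2-COO-] = {ratio:.3f}")
print(f"fraction deprotonated = {acceptor_fraction(9.0, PK2_GLYCINE):.3f}")
```

At pH 9.0, roughly one glycine molecule in five has lost the proton from its amino group; at pH 9.60 the ratio is exactly 1.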

Another important piece of information derived from the titration curve of an amino acid is the relationship between its net electric charge and the pH of the solution. At pH 5.97, the point of inflection between the two stages in its titration curve, glycine is present as its dipolar form, fully ionized but with no net electric charge (Fig. 5–9). This characteristic pH is called the isoelectric point or isoelectric pH, designated pI or pHI. For an amino acid such as glycine, which has no ionizable group in the side chain, the isoelectric point is the arithmetic mean of the two pKa values:

pI = (pK1 + pK2) / 2
which in the case of glycine is

pI = (2.34 + 9.60) / 2 = 5.97

As is evident in Figure 5–9, glycine has a net negative charge at any pH above its pI and will thus move toward the positive electrode (the anode) when placed in an electric field. At any pH below its pI, glycine has a net positive charge and will move toward the negative electrode, the cathode. The farther the pH of a glycine solution is from its isoelectric point, the greater the net electric charge of the population of glycine molecules. At pH 1.0, for example, glycine exists entirely as the form +H3N–CH2–COOH, with a net positive charge of 1.0. At pH 2.34, where there is an equal mixture of +H3N–CH2–COOH and +H3N–CH2–COO−, the average or net positive charge is 0.5. The sign and the magnitude of the net charge of any amino acid at any pH can be predicted in the same way.
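
The prediction can be sketched in Python by applying the Henderson–Hasselbalch relationship to each ionizing group in turn. The function names are illustrative; only the two pKa values come from the text. Note that the calculated charge at pH 1.0 is about +0.96, which the text rounds to 1.0.

```python
PK1 = 2.34  # -COOH group of glycine
PK2 = 9.60  # -NH3+ group of glycine


def protonated_fraction(ph, pka):
    """Fraction of a group still carrying its proton at the given pH."""
    return 1.0 / (1.0 + 10 ** (ph - pka))


def glycine_net_charge(ph):
    """Average net charge per glycine molecule at the given pH."""
    amino = protonated_fraction(ph, PK2)               # +1 when protonated
    carboxyl = -(1.0 - protonated_fraction(ph, PK1))   # -1 when deprotonated
    return amino + carboxyl


def migrates_toward(ph):
    """Electrode toward which glycine moves in an electric field."""
    charge = glycine_net_charge(ph)
    if abs(charge) < 1e-3:
        return "neither (isoelectric)"
    return "cathode (-)" if charge > 0 else "anode (+)"


for ph in (1.0, 2.34, 5.97, 9.60, 12.0):
    print(f"pH {ph:5.2f}: net charge {glycine_net_charge(ph):+.2f}, "
          f"moves toward {migrates_toward(ph)}")
```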

This information has practical importance. For a solution containing a mixture of amino acids, the different amino acids can be separated on the basis of the direction and relative rate of their migration when placed in an electric field at a known pH.

Figure 5–11  The titration curves of (a) glutamate and (b) histidine. The pKa of the R group is designated pKR.

The shared properties of many amino acids permit some simplifying generalizations about the acid–base behavior of different classes of amino acids.

All amino acids with a single α-amino group, a single α-carboxyl group, and an R group that does not ionize have titration curves resembling that of glycine (Fig. 5–9). This group of amino acids is characterized by having very similar, although not identical, values for pK1 (the pK of the –COOH group) in the range of 1.8 to 2.4 and for pK2 (of the –NH3+ group) in the range of 8.8 to 11.0 (Table 5–1).

Amino acids with an ionizable R group (Table 5–1) have more complex titration curves with three stages corresponding to the three possible ionization steps; thus they have three pKa values. The third stage for the titration of the ionizable R group merges to some extent with the others. The titration curves of two representatives of this group, glutamate and histidine, are shown in Figure 5–11. The isoelectric points of amino acids in this class reflect the type of ionizing R groups present. For example, glutamate has a pI of 3.22, considerably lower than that of glycine. This is a result of the presence of two carboxyl groups which, at the average of their pKa values (3.22), contribute a net negative charge of –1 that balances the +1 contributed by the amino group. Similarly, the pI of histidine, with two groups that are positively charged when protonated, is 7.59 (the average of the pKa values of the amino and imidazole groups), much higher than that of glycine.
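
The same isoelectric points can be recovered without deciding in advance which two pKa values to average, by searching for the pH at which the net charge is zero. The sketch below does this by bisection. The pKa values are those commonly tabulated for glutamate and histidine and are assumed here to match Table 5–1; the function names are illustrative.

```python
# Each group is (pKa, charge of the protonated form). Values are assumed
# to match Table 5-1; they are the commonly tabulated ones.
GLUTAMATE = [(2.19, 0), (4.25, 0), (9.67, +1)]   # alpha-COOH, R-group COOH, alpha-NH3+
HISTIDINE = [(1.82, 0), (6.00, +1), (9.17, +1)]  # alpha-COOH, imidazole, alpha-NH3+


def net_charge(ph, groups):
    """Average net charge summed over all ionizing groups."""
    total = 0.0
    for pka, charge_protonated in groups:
        frac_protonated = 1.0 / (1.0 + 10 ** (ph - pka))
        # Losing the proton lowers the charge of the group by one unit.
        total += frac_protonated * charge_protonated
        total += (1.0 - frac_protonated) * (charge_protonated - 1)
    return total


def isoelectric_point(groups, lo=0.0, hi=14.0, tol=1e-6):
    """Find the pH of zero net charge by bisection (charge falls as pH rises)."""
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if net_charge(mid, groups) > 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


print(f"glutamate pI is about {isoelectric_point(GLUTAMATE):.2f}")
print(f"histidine pI is about {isoelectric_point(HISTIDINE):.2f}")
```

The numerical result agrees with the averaging rule because, near the pI, the third ionizing group is almost entirely in one form and contributes a nearly constant charge.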

Another important generalization can be made about the acid–base behavior of the 20 standard amino acids. Under the general condition of free and open exposure to the aqueous environment, only histidine has an R group (pKa = 6.0) providing significant buffering power near the neutral pH usually found in the intracellular and intercellular fluids of most animals and bacteria. All the other amino acids have pKa values too far away from pH 7 to be effective physiological buffers (Table 5–1), although in the interior of proteins the pKa values of amino acid side chains are often altered.

Figure 5–12  Ion-exchange chromatography. An example of a cation-exchange resin is presented.
(a) Negatively charged sulfonate groups (–SO3−) on the resin surface attract and bind cations, such as H+, Na+, or cationic forms of amino acids. (b) An acidic solution (pH 3.0) of the amino acid mixture is poured on a column packed with resin and allowed to percolate through slowly. At pH 3.0 the amino acids are largely cations with net positive charges, but they differ in the pKa values of their R groups, and hence in the extent to which they are ionized and in their tendency to bind to the anionic resin. As a result, they move through the column at different rates.

Ion-exchange chromatography is the most widely used method for separating, identifying, and quantifying the amounts of each amino acid in a mixture. This technique primarily exploits differences in the sign and magnitude of the net electric charges of amino acids at a given pH, which are predictable from their pKa values or their titration curves.

The chromatographic column consists of a long tube filled with particles of a synthetic resin containing fixed charged groups; those with fixed anionic groups are called cation-exchange resins and those with fixed cationic groups, anion-exchange resins. A simple form of ion-exchange chromatography on a cation-exchange resin is described in Figure 5–12. The affinity of each amino acid for the resin is affected by pH (which determines the ionization state of the molecule) and the concentration of other salt ions that may compete with the resin by associating with the amino acid. Separation of amino acids can therefore be optimized by gradually changing the pH and/or salt concentration of the solution being passed through the column so as to create a pH or salt gradient. A modern enhancement of this and other chromatographic techniques is called high-performance liquid chromatography (HPLC). This takes advantage of stronger resin material and improved apparatus designed to permit chromatography at high pressures, allowing better separations in a much shorter time. For amino acids, the entire procedure has been automated, so that elution, collection of fractions, analysis of each fraction, and recording of data are performed automatically in an amino-acid analyzer. Figure 5–13 shows a chromatogram of an amino acid mixture analyzed in this way.

Figure 5–13  Automatically recorded high-performance liquid chromatographic analysis of amino acids on a cation-exchange resin. The area under each peak on the chromatogram is proportional to the amount of each amino acid in the mixture.
Figure 5–14  Reagents that react with the α-amino group of amino acids. The reactions producing 2,4-dinitrophenyl and fluorescamine derivatives are illustrated. The reactions of dansyl chloride and dabsyl chloride are similar to that of 1-fluoro-2,4-dinitrobenzene (Sanger’s reagent). Because the derivatives of these reagents absorb light, they greatly facilitate the detection and quantification of the amino acids.

As for all organic compounds, the chemical reactions of amino acids are those characteristic of their functional groups. Because all amino acids contain amino and carboxyl groups, all will undergo chemical reactions characteristic for these groups. For example, their amino groups can be acetylated or formylated, and their carboxyl groups can be esterified. We will not examine all such organic reactions of amino acids, but several widely used reactions are noteworthy because they greatly simplify the detection, measurement, and identification of amino acids.

One of the most important, technically and historically, is the ninhydrin reaction, which has been used for many years to detect and quantify microgram amounts of amino acids. When amino acids are heated with excess ninhydrin, all those having a free α-amino group yield a purple product. Proline, in which the α-amino group is substituted (forming an imino group), yields a yellow product. Under appropriate conditions the intensity of color produced (optical absorbance of the solution; see Box 5–1) is proportional to the amino acid concentration. Comparing the absorbance to that of appropriate standard solutions is an accurate and technically simple method for measuring amino acid concentration.

Several other convenient reagents are available that react with the α-amino group to form colored or fluorescent derivatives. Unlike ninhydrin, these have the advantage that the intact R group of the amino acid remains part of the product, so that derivatives of different amino acids can be distinguished. Fluorescamine reacts rapidly with amino acids and provides great sensitivity, yielding a highly fluorescent derivative that permits the detection of nanogram quantities of amino acids (Fig. 5–14). Dabsyl chloride, dansyl chloride, and 1-fluoro-2,4-dinitrobenzene (Fig. 5–14) yield derivatives that are stable under harsh conditions such as those used in the hydrolysis of proteins.


We now turn to polymers of amino acids, the peptides. Biologically occurring peptides range in size from small molecules containing only two or three amino acids to macromolecules containing thousands of amino acids. The focus here is on the structure and chemical properties of the smaller peptides, providing a prelude to the discussion of the large peptides called proteins in the next two chapters.

Figure 5–15  Formation of a peptide bond (shaded in gray) in a dipeptide. This is a condensation reaction. The α-amino group of amino acid 2 acts as a nucleophile (see Table 3–6) to displace the hydroxyl group of amino acid 1 (red). Amino groups are good nucleophiles, but the hydroxyl group is a poor leaving group and is not readily displaced. At physiological pH the reaction as shown does not occur to any appreciable extent. Peptide bond formation is endergonic, with a free-energy change of about +21 kJ/mol.
Figure 5–16  Structure of the pentapeptide serylglycyltyrosylalanylleucine, or Ser-Gly-Tyr-Ala-Leu. Peptides are named beginning with the amino-terminal residue, which by convention is placed at the left. The peptide bonds are shown shaded in gray, the R groups in red.

Two amino acid molecules can be covalently joined through a substituted amide linkage, termed a peptide bond, to yield a dipeptide. Such a linkage is formed by removal of the elements of water from the α-carboxyl group of one amino acid and the α-amino group of another (Fig. 5–15). Peptide-bond formation is an example of a condensation reaction, a common class of reaction in living cells. Note that as shown in Figure 5–15, this reaction has an equilibrium that favors reactants rather than products. To make the reaction thermodynamically more favorable, the carboxyl group must be chemically modified or activated so that the hydroxyl group can be more readily eliminated. A chemical approach to this problem is outlined at the end of this chapter (see Box 5–2). The biological approach to peptide bond formation is a major topic of Chapter 26.

Three amino acids can be joined by two peptide bonds to form a tripeptide; similarly, amino acids can be linked to form tetrapeptides and pentapeptides. When a few amino acids are joined in this fashion, the structure is called an oligopeptide. When many amino acids are joined, the product is called a polypeptide. Proteins may have thousands of amino acid units. Although the terms "protein" and "polypeptide" are sometimes used interchangeably, molecules referred to as polypeptides generally have molecular weights below 10,000.

Figure 5–16 shows the structure of a pentapeptide. The amino acid units in a peptide are often called residues (each has lost a hydrogen atom from its amino group and a hydroxyl moiety from its carboxyl group). The amino acid residue at that end of a peptide having a free α-amino group is the amino-terminal (or N-terminal) residue; the residue at the other end, which has a free carboxyl group, is the carboxyl-terminal (C-terminal) residue. By convention, short peptides are named from the sequence of their constituent amino acids, beginning at the left with the amino-terminal residue and proceeding toward the carboxyl terminus at the right (Fig. 5–16).
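
The naming convention is mechanical enough to express in code. The following sketch converts a string of one-letter symbols, read from the amino terminus, into the hyphenated three-letter form; the function name and the returned dictionary layout are illustrative choices.

```python
# One-letter symbols and three-letter abbreviations of the 20 standard amino acids.
THREE_LETTER = {
    "A": "Ala", "R": "Arg", "N": "Asn", "D": "Asp", "C": "Cys",
    "Q": "Gln", "E": "Glu", "G": "Gly", "H": "His", "I": "Ile",
    "L": "Leu", "K": "Lys", "M": "Met", "F": "Phe", "P": "Pro",
    "S": "Ser", "T": "Thr", "W": "Trp", "Y": "Tyr", "V": "Val",
}


def describe_peptide(one_letter_sequence):
    """Return the hyphenated name and the terminal residues of a linear peptide."""
    residues = [THREE_LETTER[symbol] for symbol in one_letter_sequence.upper()]
    return {
        "name": "-".join(residues),
        "amino_terminal": residues[0],
        "carboxyl_terminal": residues[-1],
        "peptide_bonds": len(residues) - 1,
    }


info = describe_peptide("SGYAL")
print(info)
```

For the pentapeptide of Figure 5–16 this gives the name Ser-Gly-Tyr-Ala-Leu, with serine at the amino terminus, leucine at the carboxyl terminus, and four peptide bonds.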

Although hydrolysis of peptide bonds is an exergonic reaction, it occurs slowly because of its high activation energy. As a result, the peptide bonds in proteins are quite stable under most intracellular conditions.

The peptide bond is the single most important covalent bond linking amino acids in peptides and proteins. The only other type of covalent bond that occurs frequently enough to deserve special mention here is the disulfide bond sometimes formed between two cysteine residues. Disulfide bonds play a special role in the structure of many proteins, particularly those that function extracellularly, such as the hormone insulin and the immunoglobulins or antibodies.

Figure 5–17  Ionization and electric charge of peptides. The groups ionized at pH 7.0 are in red.
(a) A tetrapeptide with two ionizable R groups. (b) The cationic, isoelectric, and anionic forms of a dipeptide lacking ionizable R groups.

Peptides contain only one free α-amino group and one free α-carboxyl group (Fig. 5–17). These groups ionize as they do in simple amino acids, although the ionization constants are different because the oppositely charged group is absent from the α carbon. The α-amino and α-carboxyl groups of all other constituent amino acids are covalently joined in the form of peptide bonds, which do not ionize and thus do not contribute to the total acid–base behavior of peptides. However, the R groups of some amino acids can ionize (Table 5–1), and in a peptide these contribute to the overall acid–base properties (Fig. 5–17). Thus the acid–base behavior of a peptide can be predicted from its single free α-amino and α-carboxyl groups and the nature and number of its ionizable R groups. Like free amino acids, peptides have characteristic titration curves and a characteristic isoelectric pH at which they do not move in an electric field. These properties are exploited in some of the techniques used to separate peptides and proteins (Chapter 6).
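The prediction described above can be sketched numerically. The following is a minimal illustration, assuming that each ionizable group titrates independently according to the Henderson–Hasselbalch relation; the function name and the two pKa values are illustrative assumptions, not values taken from Table 5–1.

```python
def net_charge(ph, cationic_pkas, anionic_pkas):
    """Average net charge of a peptide at a given pH.

    cationic_pkas: pKa values of groups that carry +1 when protonated
                   (free alpha-amino group, basic R groups).
    anionic_pkas:  pKa values of groups that carry -1 when deprotonated
                   (free alpha-carboxyl group, acidic R groups).
    Each group is treated as independent of the others (an approximation).
    """
    positive = sum(1.0 / (1.0 + 10 ** (ph - pka)) for pka in cationic_pkas)
    negative = sum(1.0 / (1.0 + 10 ** (pka - ph)) for pka in anionic_pkas)
    return positive - negative


# Assumed, illustrative pKa values for a dipeptide lacking ionizable R groups.
ALPHA_AMINO_PKA = 8.0
ALPHA_CARBOXYL_PKA = 3.1

for ph in (1.0, 5.55, 12.0):
    charge = net_charge(ph, [ALPHA_AMINO_PKA], [ALPHA_CARBOXYL_PKA])
    print(f"pH {ph:5.2f}: average net charge {charge:+.2f}")
```

With these assumed values the net charge passes through zero midway between the two pKa values, which corresponds to the isoelectric pH of such a dipeptide. Ionizable R groups would simply be added to the appropriate list.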

Figure 5–18  The amino-terminal residue of a tetrapeptide can be identified by labeling it with dabsyl chloride, then hydrolyzing the peptide bonds in strong acid. The result is a mixture of amino acids of which only the amino-terminal amino acid (and lysine) is labeled.

Like other organic molecules, peptides undergo chemical reactions that are characteristic of their functional groups: the free amino and carboxyl groups and the R groups.

Peptide bonds can be hydrolyzed by boiling with either strong acid (typically 6 M HCl) or base to yield the constituent amino acids.

Hydrolysis of peptide bonds in this manner is a necessary step in determining the amino acid composition of proteins. The reagents shown in Figure 5–14 label only free amino groups: those of the amino-terminal residue and the R groups of any lysines present. If dabsyl chloride, dansyl chloride, or 1-fluoro-2,4-dinitrobenzene is used before acid hydrolysis of the peptide, the amino-terminal residue can be separated and identified (Fig. 5–18).
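The logic of this labeling experiment can be expressed as a short sketch. The function name and the example tetrapeptide are hypothetical and serve only to show which residues are expected to carry the label after hydrolysis: the amino-terminal residue and any lysine.

```python
def labeled_residues(sequence):
    """List the residues expected to carry the label after acid hydrolysis.

    sequence: residue names (three-letter codes), amino terminus first.
    Returns (position, residue, labeled group) tuples.
    """
    labeled = []
    if sequence:
        # The only free alpha-amino group belongs to the amino-terminal residue.
        labeled.append((1, sequence[0], "alpha-amino group"))
    for position, residue in enumerate(sequence, start=1):
        # Lysine R groups contain a free amino group and are labeled as well.
        if residue == "Lys":
            labeled.append((position, residue, "R-group amino group"))
    return labeled


# Hypothetical tetrapeptide, written from the amino terminus.
for entry in labeled_residues(["Gly", "Lys", "Ala", "Ser"]):
    print(entry)
```

Because lysine is labeled on its R group rather than on its α-amino group, its derivative can be distinguished from that of a true amino-terminal residue.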

Peptide bonds can also be hydrolyzed by certain enzymes called proteases. Proteolytic (protein-cleaving) enzymes are found in all cells and tissues, where they degrade unneeded or damaged proteins or aid in the digestion of food.

Figure 5–19  Some naturally occurring peptides with intense biological activity. The amino-terminal residues are at the left end. (a) Bradykinin, a hormonelike peptide that inhibits inflammatory reactions. (b) Oxytocin, formed by the posterior pituitary gland. The shaded portion is a residue of glycinamide (H2N–CH2–CONH2). (c) Thyrotropin-releasing factor, formed by the hypothalamus. (d) Two enkephalins, brain peptides that affect the perception of pain. (e) Gramicidin S, an antibiotic produced by the bacterium Bacillus brevis. The arrows indicate the direction from the amino toward the carboxyl end of each residue. The peptide has no termini because it is circular. Orn is the symbol for ornithine, an amino acid that generally does not occur in proteins. Note that gramicidin S contains two residues of a D-amino acid (D-phenylalanine).

Much of the material in the chapters to follow will revolve around the activities of proteins with molecular weights measured in the tens and even hundreds of thousands. Not all polypeptides are so large, however. There are many naturally occurring small polypeptides and oligopeptides, some of which have important biological activities and exert their effects at very low concentrations. For example, a number of vertebrate hormones (intercellular chemical messengers) (Chapter 22) are small polypeptides. The hormone insulin contains two polypeptide chains, one having 30 amino acid residues and the other 21. Other polypeptide hormones include glucagon, a pancreatic hormone of 29 residues that opposes the action of insulin, and corticotropin, a 39-residue hormone of the anterior pituitary gland that stimulates the adrenal cortex.

Some biologically important peptides have only a few amino acid residues. That small peptides can have large biological effects is readily illustrated by the activity of the commercially synthesized dipeptide, L-aspartylphenylalanyl methyl ester. This compound is an artificial sweetener better known as aspartame or NutraSweet®.

Among naturally occurring small peptides are hormones such as oxytocin (nine amino acid residues), which is secreted by the posterior pituitary and stimulates uterine contractions; bradykinin (nine residues), which inhibits inflammation of tissues; and thyrotropin-releasing factor (three residues), which is formed in the hypothalamus and stimulates the release of another hormone, thyrotropin, from the anterior pituitary gland (Fig. 5–19). Also noteworthy among short peptides are the enkephalins, compounds formed in the central nervous system that bind to receptors in certain cells of the brain and induce analgesia (deadening of pain sensations). Enkephalins represent one of the body’s own mechanisms for control of pain. The enkephalin receptors also bind morphine, heroin, and other addicting opiate drugs (although these are not peptides). Some extremely toxic mushroom poisons, such as amanitin, are also peptides, as are many antibiotics.
A growing number of small peptides are proving to be important commercially as pharmaceutical reagents. Unfortunately, they are often present in exceedingly small amounts and hence are hard to purify. For these and other reasons, the chemical synthesis of peptides has become one of the major technologies associated with biochemistry (Box 5–2).

Box 5–2
Chemical Synthesis of Peptides and Small Proteins

Many peptides are potentially useful as pharmacological reagents, and their synthesis is of considerable commercial importance. There are three ways to obtain a peptide: (1) purification from tissue, a task often made difficult by the vanishingly low concentrations of some peptides; (2) genetic engineering; or (3) direct chemical synthesis. Powerful techniques now make direct chemical synthesis an attractive option in many cases. In addition to commercial applications, the synthesis of specific peptide portions of larger proteins is an increasingly important tool for the study of protein structure and function.

The complexity of proteins makes the traditional synthetic approaches of organic chemistry impractical for peptides with more than four or five amino acids. One problem is the difficulty of purifying the product after each step, because the chemical properties of the peptide change each time a new amino acid is added.

The major breakthrough in this technology was provided by R. Bruce Merrifield. His innovation involved synthesizing a peptide while keeping it attached at one end to a solid support. The support is an insoluble polymer (resin) contained within a column, similar to that used for chromatographic procedures. The peptide is built up on this support one amino acid at a time using a standard set of reactions in a repeating cycle (Fig. 1).

The technology for chemical peptide synthesis has been automated, and several commercial instruments are now available. The most important limitation of the process involves the efficiency of each amino acid addition, as can be seen by calculating the overall yields of peptides of various lengths when the yield for addition of each new amino acid is 96.0 versus 99.8% (Table 1). The chemistry has been optimized to permit the synthesis of proteins 100 amino acids long in about 4 days in reasonable yield. A very similar approach is used to synthesize nucleic acids (Fig. 12–38). It is worth noting that this technology, impressive as it is, still pales when compared with biological processes. The same 100-amino-acid protein would be synthesized with exquisite fidelity in about 5 seconds in a bacterial cell.
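The effect of stepwise efficiency on overall yield can be worked through with a short calculation. This is a sketch under stated assumptions: the overall yield is taken as the per-step yield raised to the number of coupling steps, a peptide of n residues is assumed to need n − 1 couplings (the first residue being already attached to the resin), and the chain lengths chosen are illustrative rather than copied from Table 1.

```python
def overall_yield(step_yield, n_residues):
    """Overall fractional yield for a peptide built one residue at a time.

    step_yield: fractional yield of each amino acid addition (e.g. 0.96).
    n_residues: length of the finished peptide.
    Assumes n_residues - 1 coupling steps, each with the same yield.
    """
    couplings = n_residues - 1
    return step_yield ** couplings


# Illustrative chain lengths (assumed for this sketch).
for n in (11, 21, 31, 51, 100):
    low = overall_yield(0.960, n)
    high = overall_yield(0.998, n)
    print(f"{n:4d} residues: {low:6.1%} at 96.0% per step, "
          f"{high:6.1%} at 99.8% per step")
```

Because the losses compound at every step, a difference of a few percent per addition separates a usable yield from a negligible one once the chain approaches 100 residues.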

 

 

Figure 1  Chemical synthesis of a peptide on a solid support. The reactions of the repeating cycle are necessary for the formation of each peptide bond.

Summary

The 20 amino acids commonly found as hydrolysis products of proteins contain an α-carboxyl group, an α-amino group, and a distinctive R group substituted on the α-carbon atom. The α-carbon atom of the amino acids (except glycine) is asymmetric, and thus amino acids can exist in at least two stereoisomeric forms. Only the L stereoisomers, which are related to the absolute configuration of L-glyceraldehyde, are found in proteins. The amino acids are classified on the basis of the polarity of their R groups. The nonpolar, aliphatic class includes alanine, glycine, isoleucine, leucine, proline, and valine. Phenylalanine, tryptophan, and tyrosine have aromatic side chains and are also relatively hydrophobic. The polar, uncharged class includes asparagine, cysteine, glutamine, methionine, serine, and threonine. The negatively charged (acidic) amino acids are aspartate and glutamate; the positively charged (basic) ones are arginine, histidine, and lysine. There are also a large number of nonstandard amino acids that occur in some proteins (as a result of the modification of standard amino acids) or as free metabolites in cells.

Monoamino monocarboxylic amino acids are diprotic acids (+H3NCH(R)COOH) at low pH. As the pH is raised to about 6, near the isoelectric point, the proton is lost from the carboxyl group to form the dipolar or zwitterionic species +H3NCH(R)COO−, which is electrically neutral. Further increase in pH causes loss of the second proton, to yield the ionic species H2NCH(R)COO−. Amino acids with ionizable R groups may exist in additional ionic species, depending on the pH and the pKa of the R group. Thus amino acids vary in their acid–base properties. Amino acids form colored derivatives with ninhydrin. Other colored or fluorescent derivatives are formed in reactions of the α-amino group of amino acids with fluorescamine, dansyl chloride, dabsyl chloride, and 1-fluoro-2,4-dinitrobenzene. Complex mixtures of amino acids can be separated and identified by ion-exchange chromatography or HPLC.

Amino acids can be joined covalently through peptide bonds to form peptides, which can also be formed by incomplete hydrolysis of polypeptides. The acid–base behavior and chemical reactions of a peptide are functions of its amino-terminal amino group, its carboxyl-terminal carboxyl group, and its R groups. Peptides can be hydrolyzed to yield free amino acids. Some peptides occur free in cells and tissues and have specific biological functions. These include some hormones and antibiotics, as well as other peptides with powerful biological activity.

Further Reading

General

Cantor, C.R. & Schimmel, P.R. (1980) Biophysical Chemistry, Part I: The Conformation of Biological Macromolecules, W.H. Freeman and Company, San Francisco. 
Excellent textbook outlining the properties of biological macromolecules and their monomeric subunits.

Creighton, T.E. (1984) Proteins: Structures and Molecular Properties, W.H. Freeman and Company, New York. 
Very useful general source.

Dickerson, R.E. & Geis, I. (1983) Proteins: Structure, Function, and Evolution, 2nd edn, The Benjamin/Cummings Publishing Company, Menlo Park, CA. 
Beautifully illustrated and interesting account.

Amino Acids

Corrigan, J.J. (1969) D-Amino acids in animals. Science 169, 142–148. 

Meister, A. (1965) Biochemistry of the Amino Acids, 2nd edn, Vols. 1 and 2, Academic Press, Inc., New York. 
Encyclopedic treatment of the properties, occurrence, and metabolism of amino acids.

Montgomery, R. & Swenson, C.A. (1976) Quantitative Problems in the Biochemical Sciences, 2nd edn, W.H. Freeman and Company, New York. 

Segel, I.H. (1976) Biochemical Calculations, 2nd edn, John Wiley & Sons, New York. 

Peptides

Haschemeyer, R.H. & Haschemeyer, A.E.V. (1973) Proteins: A Guide to Study by Physical and Chemical Methods, John Wiley & Sons, New York. 

Merrifield, B. (1986) Solid phase synthesis. Science 232, 341–347. 

Smith, L.M. (1988) Automated synthesis and sequence analysis of biological macromolecules. Analyt. Chem. 60, 381A–390A. 

Problems

1. Absolute Configuration of Citrulline  Is citrulline isolated from watermelons (shown below) a D- or L-amino acid? Explain.

2. Relation between the Structures and Chemical Properties of the Amino Acids  The structures and chemical properties of the amino acids are crucial to understanding how proteins carry out their biological functions. The structures of the side chains of 16 amino acids are given below. Name the amino acid that contains each structure and match the R group with the most appropriate description of its properties, (a) to (m). Some of the descriptions may be used more than once.

(a) Small polar R group containing a hydroxyl group; this amino acid is important in the active site of some enzymes.

(b) Provides the least amount of steric hindrance.

(c) R group has pKa ≈ 10.5, making it positively charged at physiological pH.

(d) Sulfur-containing R group; neutral at any pH.

(e) Aromatic R group, hydrophobic in nature and neutral at any pH.

(f) Saturated hydrocarbon, important in hydrophobic interactions.

(g) The only amino acid having an ionizing R group with a pKa near 7; it is an important group in the active site of some enzymes.

(h) The only amino acid having a substituted α-amino group; it influences protein folding by forcing a bend in the chain.

(i) R group has a pKa near 4 and thus is negatively charged at pH 7.

(j) An aromatic R group capable of forming hydrogen bonds; it has a pKa near 10.

(k) Forms disulfide cross-links between polypeptide chains; the pKa of its functional group is about 10.

(l) R group with pKa ≈ 12, making it positively charged at physiological pH.

(m) When this polar but uncharged R group is hydrolyzed, the amino acid is converted into another amino acid having a negatively charged R group at pH near 7.

3. Relationship between the Titration Curve and the Acid–Base Properties of Glycine  A 100 mL solution of 0.1 M glycine at pH 1.72 was titrated with 2 M NaOH solution. During the titration, the pH was monitored and the results were plotted in the graph shown. The key points in the titration are designated I to V on the graph. For each of the statements below, identify the appropriate key point in the titration and justify your choice.

       (a) At what point will glycine be present predominantly as the species +H3N–CH2–COOH?
       (b) At what point is the average net charge of glycine +½?
       (c) At what point is the amino group of half of the molecules ionized?
       (d) At what point is the pH equal to the pKa of the carboxyl group?
       (e) At what point is the pH equal to the pKa of the protonated amino group?
       (f) At what points does glycine have its maximum buffering capacity?
       (g) At what point is the average net charge zero?
       (h) At what point has the carboxyl group been completely titrated (first equivalence point)?
       (i) At what point are half of the carboxyl groups ionized?
       (j) At what point is glycine completely titrated (second equivalence point)?
       (k) At what point is the structure of the predominant species +H3N–CH2–COO−?
       (l) At what point do the structures of the predominant species correspond to a 50:50 mixture of +H3N–CH2–COO− and H2N–CH2–COO−?
       (m) At what point is the average net charge of glycine –1?
       (n) At what point do the structures of the predominant species consist of a 50:50 mixture of +H3N–CH2–COOH and +H3N–CH2–COO−?
       (o) What point corresponds to the isoelectric point?
       (p) At what point is the average net charge on glycine –½?
       (q) What point represents the end of the titration?
       (r) If one wanted to use glycine as an efficient buffer, which points would represent the worst pH regions for buffering power?
       (s) At what point in the titration is the predominant species H2N–CH2–COO−?

4. How Much Alanine Is Present as the Completely Uncharged Species?  At a pH equal to the isoelectric point, the net charge on alanine is zero. Two structures can be drawn that have a net charge of zero (zwitterionic and uncharged forms), but the predominant form of alanine at its pI is zwitterionic.

       (a) Explain why the form of alanine at its pI is zwitterionic rather than completely uncharged.

       (b) Estimate the fraction of alanine present at its pI as the completely uncharged form. Justify your assumptions.

5. Ionization State of Amino Acids  Each ionizable group of an amino acid can exist in one of two states, charged or neutral. The electric charge on the functional group is determined by the relationship between its pKa and the pH of the solution. This relationship is described by the Henderson–Hasselbalch equation.

       (a) Histidine has three ionizable functional groups. Write the relevant equilibrium equations for its three ionizations and assign the proper pKa for each ionization. Draw the structure of histidine in each ionization state. What is the net charge on the histidine molecule in each ionization state?
       (b) Draw the structures of the predominant ionization state of histidine at pH 1, 4, 8, and 12. Note that the ionization state can be approximated by treating each ionizable group independently.
       (c) What is the net charge of histidine at pH 1, 4, 8, and 12? For each pH, will histidine migrate toward the anode (+) or cathode (–) when placed in an electric field?

6. Preparation of a Glycine Buffer  Glycine is commonly used as a buffer. Preparation of a 0.1 M glycine buffer starts with 0.1 M solutions of glycine hydrochloride (HOOC–CH2–NH3+Cl−) and glycine (−OOC–CH2–NH3+), two commercially available forms of glycine. What volumes of these two solutions must be mixed to prepare 1 L of 0.1 M glycine buffer having a pH of 3.2? (Hint: See Box 4–2.)

7. Separation of Amino Acids by Ion-Exchange Chromatography  Mixtures of amino acids are analyzed by first separating the mixture into its components through ion-exchange chromatography. On a cation-exchange resin containing sulfonate groups (see Fig. 5–12), the amino acids flow down the column at different rates because of two factors that retard their movement: (1) ionic attraction between the –SO3− residues on the column and positively charged functional groups on the amino acids and (2) hydrophobic interaction between amino acid side chains and the strongly hydrophobic backbone of the polystyrene resin. For each pair of amino acids listed, determine which member will be eluted first from an ion-exchange column by a pH 7.0 buffer.

       (a) Asp and Lys
       (b) Arg and Met
       (c) Glu and Val
       (d) Gly and Leu
       (e) Ser and Ala

8. Naming the Stereoisomers of Isoleucine  The structure of the amino acid isoleucine is:

       (a) How many chiral centers does it have?

       (b) How many optical isomers?
       (c) Draw perspective formulas for all the optical isomers of isoleucine.

9. Comparison of the pKa Values of an Amino Acid and Its Peptides  The titration curve of the amino acid alanine shows the ionization of two functional groups with pKa values of 2.34 and 9.69, corresponding to the ionization of the carboxyl and the protonated amino groups, respectively. The titration of di-, tri-, and larger oligopeptides of alanine also shows the ionization of only two functional groups, although the experimental pKa values are different. The trend in pKa values is summarized in the table.

       (a) Draw the structure of Ala–Ala–Ala. Identify the functional groups associated with pK1 and pK2.

       (b) The value of pK1 increases in going from Ala to an Ala oligopeptide. Provide an explanation for this trend.
       (c) The value of pK2 decreases in going from Ala to an Ala oligopeptide. Provide an explanation for this trend.

10. Peptide Synthesis  In the synthesis of polypeptides on solid supports, the α-amino group of each new amino acid is "protected" by a t-butyloxycarbonyl group (see Box 5–2). What would happen if this protecting group were not present?

Chapter 6
An Introduction to Proteins
Almost everything that occurs in the cell involves one or more proteins. Proteins provide structure, catalyze cellular reactions, and carry out a myriad of other tasks. Their central place in the cell is reflected in the fact that genetic information is ultimately expressed as protein. For each protein there is a segment of DNA (a gene; see Chapters 12 and 23) that encodes information specifying its sequence of amino acids. There are thousands of different kinds of proteins in a typical cell, each encoded by a gene and each performing a specific function. Proteins are among the most abundant biological macromolecules and are also extremely versatile in their functions.

The chapter begins with a discussion of some of the general properties of proteins. This is followed by a short summary of some common techniques used to purify and study proteins. Finally, we will examine the primary structure of protein molecules: the covalent backbone structure and the sequence of amino acid residues. One goal is to discover the relationships between amino acid sequence and biological function.

An understanding of these important macromolecules must begin with the fundamentals. What do proteins do? How big are they? What forms or shapes do they take? What are their chemical properties? The answers serve as an orientation to much that follows.

Figure 6–1  Functions of proteins. (a) The light produced by fireflies is the result of a light-producing reaction involving luciferin and ATP that is catalyzed by the enzyme luciferase (see Box 13–3). (b) Erythrocytes contain large amounts of the oxygen-transporting protein hemoglobin. (c) The white color of milk is derived primarily from the protein casein. (d) The movement of cilia in protozoans depends on the action of the protein dynein. (e) The protein fibroin is the major structural component of spider webs. (f) Castor beans contain a highly toxic protein called ricin. (g) Cancerous tumors are often made up of cells that have defects involving one or more of the proteins that regulate cell division.

We can classify proteins according to their biological roles.

Enzymes  The most varied and most highly specialized proteins are those with catalytic activity – the enzymes. Virtually all the chemical reactions of organic biomolecules in cells are catalyzed by enzymes. Many thousands of different enzymes, each capable of catalyzing a different kind of chemical reaction, have been discovered in different organisms (Fig. 6–1a).

Transport Proteins  Transport proteins in blood plasma bind and carry specific molecules or ions from one organ to another. Hemoglobin of erythrocytes (Fig. 6–1b) binds oxygen as the blood passes through the lungs, carries it to the peripheral tissues, and there releases it to participate in the energy-yielding oxidation of nutrients. The blood plasma contains lipoproteins, which carry lipids from the liver to other organs. Other kinds of transport proteins are present in the plasma membranes and intracellular membranes of all organisms; these are adapted to bind glucose, amino acids, or other substances and transport them across the membrane.

Nutrient and Storage Proteins  The seeds of many plants store nutrient proteins required for the growth of the germinating seedling. Particularly well-studied examples are the seed proteins of wheat, corn, and rice. Ovalbumin, the major protein of egg white, and casein, the major protein of milk, are other examples of nutrient proteins (Fig. 6–1c). The ferritin found in some bacteria and in plant and animal tissues stores iron.

Contractile or Motile Proteins  Some proteins endow cells and organisms with the ability to contract, to change shape, or to move about. Actin and myosin function in the contractile system of skeletal muscle and also in many nonmuscle cells. Tubulin is the protein from which microtubules are built. Microtubules act in concert with the protein dynein in flagella and cilia (Fig. 6–1d) to propel cells.

Structural Proteins  Many proteins serve as supporting filaments, cables, or sheets, to give biological structures strength or protection. The major component of tendons and cartilage is the fibrous protein collagen, which has very high tensile strength. Leather is almost pure collagen. Ligaments contain elastin, a structural protein capable of stretching in two dimensions. Hair, fingernails, and feathers consist largely of the tough, insoluble protein keratin. The major component of silk fibers and spider webs is fibroin (Fig. 6–1e). The wing hinges of some insects are made of resilin, which has nearly perfect elastic properties.

Defense Proteins  Many proteins defend organisms against invasion by other species or protect them from injury. The immunoglobulins or antibodies, specialized proteins made by the lymphocytes of vertebrates, can recognize and precipitate or neutralize invading bacteria, viruses, or foreign proteins from another species. Fibrinogen and thrombin are blood-clotting proteins that prevent loss of blood when the vascular system is injured. Snake venoms, bacterial toxins, and toxic plant proteins, such as ricin, also appear to have defensive functions (Fig. 6–1f). Some of these, including fibrinogen, thrombin, and some venoms, are also enzymes.

Regulatory Proteins  Some proteins help regulate cellular or physiological activity. Among them are many hormones. Examples include insulin, which regulates sugar metabolism, and the growth hormone of the pituitary. The cellular response to many hormonal signals is often mediated by a class of GTP-binding proteins called G proteins (GTP is closely related to ATP, with guanine replacing the adenine portion of the molecule; see Figs. 1–12 and 3–16b). Other regulatory proteins bind to DNA and regulate the biosynthesis of enzymes and RNA molecules involved in cell division in both prokaryotes and eukaryotes (Fig. 6–1g).

Other Proteins  There are numerous other proteins whose functions are rather exotic and not easily classified. Monellin, a protein of an African plant, has an intensely sweet taste. It is being studied as a nonfattening, nontoxic food sweetener for human use. The blood plasma of some Antarctic fish contains antifreeze proteins, which protect their blood from freezing.

It is extraordinary that all these proteins, with their very different properties and functions, are made from the same group of 20 amino acids.

How long are the polypeptide chains in proteins? Table 6–1 shows that human cytochrome c has 104 amino acid residues linked in a single chain; bovine chymotrypsinogen has 245 amino acid residues. Probably near the upper limit of size is the protein apolipoprotein B, a cholesterol-transport protein with 4,536 amino acid residues in a single polypeptide chain of molecular weight 513,000. Most naturally occurring polypeptides contain fewer than 2,000 amino acid residues.

Some proteins consist of a single polypeptide chain, but others, called multisubunit proteins, have two or more (Table 6–1). The individual polypeptide chains in a multisubunit protein may be identical or different. If at least some are identical, the protein is sometimes called an oligomeric protein and the subunits themselves are referred to as protomers. The enzyme ribonuclease has one polypeptide chain. Hemoglobin has four: two identical α chains and two identical β chains, all four held together by noncovalent interactions.

The molecular weights of proteins, which can be determined by various physicochemical methods, may range from little more than 10,000 for small proteins such as cytochrome c (104 residues), to more than 10⁶ for proteins with very long polypeptide chains or those with several subunits. The molecular weights of some typical proteins are given in Table 6–1. No simple generalizations can be made about the molecular weights of proteins in relation to their function.

One can calculate the approximate number of amino acid residues in a simple protein containing no other chemical group by dividing its molecular weight by 110. Although the average molecular weight of the 20 standard amino acids is about 138, the smaller amino acids predominate in most proteins; when weighted for the proportions in which the various amino acids occur in proteins (see Table 5–1), the average molecular weight is nearer to 128. Because a molecule of water (Mr 18) is removed to create each peptide bond, the average molecular weight of an amino acid residue in a protein is about 128 – 18 = 110. Table 6–1 shows the number of amino acid residues in several proteins.
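The rule of thumb can be checked against a protein whose size is quoted earlier in this chapter. The sketch below is only an illustration; the function and constant names are assumptions introduced here, and the input is the molecular weight given in the text for apolipoprotein B.

```python
# Weighted average Mr of an amino acid minus the water lost per peptide bond.
AVERAGE_RESIDUE_MR = 128 - 18  # = 110


def estimate_residue_count(molecular_weight):
    """Approximate number of residues in a simple (unconjugated) protein."""
    return round(molecular_weight / AVERAGE_RESIDUE_MR)


# Apolipoprotein B: molecular weight 513,000; 4,536 residues (from the text).
estimate = estimate_residue_count(513_000)
actual = 4536
print(f"estimated {estimate} residues, actual {actual}")
print(f"relative error {abs(estimate - actual) / actual:.1%}")
```

The estimate lands within a few percent of the true residue count, which is about the accuracy one should expect from an average that ignores the actual amino acid composition.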

As is true for simple peptides, hydrolysis of proteins with acid or base yields a mixture of free α-amino acids. When completely hydrolyzed, each type of protein yields a characteristic proportion or mixture of the different amino acids. Table 6–2 shows the composition of the amino acid mixtures obtained on complete hydrolysis of human cytochrome c and of bovine chymotrypsinogen, the inactive precursor of the digestive enzyme chymotrypsin. These two proteins, with very different functions, also differ significantly in the relative numbers of each kind of amino acid they contain. The 20 amino acids almost never occur in equal amounts in proteins. Some amino acids may occur only once per molecule or not at all in a given type of protein; others may occur in large numbers.

Many proteins, such as the enzymes ribonuclease and chymotrypsinogen, contain only amino acids and no other chemical groups; these are considered simple proteins. However, some proteins contain chemical components in addition to amino acids; these are called conjugated proteins. The non-amino acid part of a conjugated protein is usually called its prosthetic group. Conjugated proteins are classified on the basis of the chemical nature of their prosthetic groups (Table 6–3); for example, lipoproteins contain lipids, glycoproteins contain sugar groups, and metalloproteins contain a specific metal. A number of proteins contain more than one prosthetic group. Usually the prosthetic group plays an important role in the protein’s biological function.

The aggregate biochemical picture of protein structure and function is derived from the study of many individual proteins. To study a protein in any detail it must be separated from all other proteins in a cell, and techniques must be available to determine its properties. The necessary methods come from protein chemistry, a discipline as old as biochemistry itself and one that retains a central position in biochemical research. Modern techniques are providing ever newer experimental insights into the critical relationship between the structure of a protein and its function.

Figure 6–2  Two types of chromatographic methods used in protein purification. (a) Size-exclusion chromatography; also called gel filtration. This method separates proteins according to size. The column contains a cross-linked polymer with pores of selected size. Larger proteins migrate faster than smaller ones, because they are too large to enter the pores in the beads and hence take a more direct route through the column. The smaller proteins enter the pores and are slowed by the more labyrinthian path they take through the column. (b) Affinity chromatography separates proteins by their binding specificities. The proteins retained on the column are those that bind specifically to a ligand cross-linked to the beads. (In biochemistry, the term "ligand" is used to refer to a group or molecule that is bound.) After nonspecific proteins are washed through the column, the bound protein of particular interest is eluted by a solution containing free ligand.

Cells contain thousands of different kinds of proteins. A pure preparation of a given protein is essential before its properties, amino acid composition, and sequence can be determined. How, then, can one protein be purified?

Methods for separating proteins take advantage of properties such as charge, size, and solubility, which vary from one protein to the next. Because many proteins bind to other biomolecules, proteins can also be separated on the basis of their binding properties. The source of a protein is generally tissue or microbial cells. The cells must be broken open and the protein released into a solution called a crude extract. If necessary, differential centrifugation can be used to prepare subcellular fractions or to isolate organelles (see Fig. 2–24). Once the extract or organelle preparation is ready, a variety of methods are available for separation of proteins. Ion-exchange chromatography (see Fig. 5–12) can be used to separate proteins with different charges in much the same way that it separates amino acids. Other chromatographic methods take advantage of differences in size, binding affinity, and solubility (Fig. 6–2). Nonchromatographic methods include the selective precipitation of proteins with salt, acid, or high temperatures.

The approach to the purification of a "new" protein, one not previously isolated, is guided both by established precedents and by common sense. In most cases, several different methods must be used sequentially to purify a protein completely. The choice of method is somewhat empirical, and many protocols may be tried before the most effective is determined. Trial and error can often be minimized by using purification procedures developed for similar proteins as a guide. Published purification protocols are available for many thousands of proteins. Common sense dictates that inexpensive procedures be used first, when the total volume and the number of contaminants are greatest. Chromatographic methods are often impractical at early stages because the amount of chromatographic medium needed increases with sample size. As each purification step is completed, the sample size generally becomes smaller (Table 6–4) and more sophisticated (and expensive) chromatographic procedures can be applied.

Figure 6–3  Activity versus specific activity. The difference between these two terms can be illustrated by considering two jars of marbles. The jars contain the same number of red marbles (representing an unknown protein), but different amounts of marbles of other colors. If the marbles are taken to represent proteins, both jars contain the same activity of the protein represented by the red marbles. The second jar, however, has the higher specific activity because here the red marbles represent a much higher fraction of the total.

In order to purify a protein, it is essential to have an assay to detect and quantify that protein in the presence of many other proteins. Often, purification must proceed in the absence of any information about the size and physical properties of the protein, or the fraction of the total protein mass it represents in the extract.

The amount of an enzyme in a given solution or tissue extract can be assayed in terms of the catalytic effect it produces, that is, the increase in the rate at which its substrate is converted to reaction products when the enzyme is present. For this purpose one must know (1) the overall equation of the reaction catalyzed, (2) an analytical procedure for determining the disappearance of the substrate or the appearance of the reaction products, (3) whether the enzyme requires cofactors such as metal ions or coenzymes, (4) the dependence of the enzyme activity on substrate concentration, (5) the optimum pH, and (6) a temperature zone in which the enzyme is stable and has high activity. Enzymes are usually assayed at their optimum pH and at some convenient temperature within the range 25 to 38 °C. Also, very high substrate concentrations are generally required so that the initial reaction rate, which is measured experimentally, is proportional to enzyme concentration (Chapter 8).

By international agreement, 1.0 unit of enzyme activity is defined as the amount of enzyme causing transformation of 1.0 μmol of substrate per minute at 25 °C under optimal conditions of measurement. The term activity refers to the total units of enzyme in the solution. The specific activity is the number of enzyme units per milligram of protein (Fig. 6–3). The specific activity is a measure of enzyme purity: it increases during purification of an enzyme and becomes maximal and constant when the enzyme is pure (Table 6–4).

After each purification step, the activity of the preparation (in units) is assayed, the total amount of protein is determined independently, and their ratio gives the specific activity. Activity and total protein generally decrease with each step. Activity decreases because some loss always occurs due to inactivation or nonideal interactions with chromatographic materials or other molecules in the solution. Total protein decreases because the objective is to remove as much nonspecific protein as possible. In a successful step, the loss of nonspecific protein is much greater than the loss of activity; therefore, specific activity increases even as total activity falls. The data are then assembled in a purification table (Table 6–4). A protein is generally considered pure when further purification steps fail to increase specific activity, and when only a single protein species can be detected (by methods to be described later).
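
The bookkeeping behind a purification table can be sketched in a few lines of Python. The example below is only an illustration: the step names and all numerical values are hypothetical (they are not the data of Table 6–4), and the "yield" and "fold purification" columns are common bookkeeping quantities assumed here rather than defined in the text. Specific activity is computed as described above, as units of activity divided by milligrams of total protein.

```python
from dataclasses import dataclass


@dataclass
class Step:
    """One purification step; all values used below are hypothetical."""
    name: str
    total_protein_mg: float
    activity_units: float

    @property
    def specific_activity(self) -> float:
        # Specific activity = enzyme units per milligram of total protein.
        return self.activity_units / self.total_protein_mg


def purification_table(steps):
    """Return one row per step, with yield and fold purification
    expressed relative to the first step (the crude extract)."""
    first = steps[0]
    rows = []
    for s in steps:
        rows.append({
            "step": s.name,
            "specific_activity": s.specific_activity,
            "yield_percent": 100.0 * s.activity_units / first.activity_units,
            "fold_purification": s.specific_activity / first.specific_activity,
        })
    return rows


# Hypothetical data: total protein falls much faster than activity.
steps = [
    Step("Crude extract", 10000.0, 100000.0),
    Step("Salt precipitation", 3000.0, 96000.0),
    Step("Ion-exchange chromatography", 400.0, 80000.0),
    Step("Affinity chromatography", 5.0, 60000.0),
]

rows = purification_table(steps)
for row in rows:
    print(f"{row['step']:<30} {row['specific_activity']:>10.1f} units/mg"
          f" {row['yield_percent']:>6.1f}% {row['fold_purification']:>8.1f}x")
```

In this made-up series the total activity falls at every step while the specific activity rises, which is the pattern expected of a successful purification.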

For proteins that are not enzymes, other quantification methods are required. Transport proteins can be assayed by their binding to the molecule they transport, and hormones and toxins by the biological effect they produce; for example, growth hormones will stimulate the growth of certain cultured cells. Some structural proteins represent such a large fraction of a tissue mass that they can be readily extracted and purified without an assay. The approaches are as varied as the proteins themselves.

Figure 6–4  Electrophoresis. (a) Different samples are loaded in wells or depressions at the top of the polyacrylamide gel. The proteins move into the gel when an electric field is applied. The gel minimizes convection currents caused by small temperature gradients, and it minimizes protein movements other than those induced by the electric field. (b) Proteins can be visualized after electrophoresis by treating the gel with a stain such as Coomassie blue, which binds to the proteins but not to the gel itself. Each band on the gel represents a different protein (or protein subunit); smaller proteins are found nearer the bottom of the gel. This gel illustrates the purification of the enzyme RNA polymerase from the bacterium E. coli. The first lane shows the proteins present in the crude cellular extract. Successive lanes show the proteins present after each purification step. The purified protein contains four subunits, as seen in the last lane on the right.
Figure 6–5  Estimating the molecular weight of a protein. The electrophoretic mobility of a protein on an SDS polyacrylamide gel is related to its molecular weight, Mr. (a) Standard proteins of known molecular weight are subjected to electrophoresis (lane 1). These marker proteins can be used to estimate the Mr of an unknown protein (lane 2). (b) A plot of log Mr of the marker proteins versus relative migration during electrophoresis allows the Mr of the unknown protein to be read from the graph.
Figure 6–6  Isoelectric focusing. This technique separates proteins according to their isoelectric points. A stable pH gradient is established in the gel by the addition of appropriate ampholytes. A protein mixture is placed in a well on the gel. With an applied electric field, proteins enter the gel and migrate until each reaches a pH equivalent to its pI. Remember that the net charge of a protein is zero when pH = pI.
Figure 6–7  Two-dimensional electrophoresis. (a) Proteins are first separated by isoelectric focusing. The gel is then laid horizontally on a second gel, and the proteins are separated by SDS polyacrylamide gel electrophoresis. In this two-dimensional gel, horizontal separation reflects differences in pI; vertical separation reflects differences in molecular weight. (b) More than 1,000 different proteins from E. coli can be resolved using this technique.

In addition to chromatography, another important set of methods is available for the separation of proteins, based on the migration of charged proteins in an electric field, a process called electrophoresis. These procedures are not often used to purify proteins in large amounts because simpler alternative methods are usually available and electrophoretic methods often inactivate proteins. Electrophoresis is, however, especially useful as an analytical method. Its advantage is that proteins can be visualized as well as separated, permitting a researcher to estimate quickly the number of proteins in a mixture or the degree of purity of a particular protein preparation. Also, electrophoresis allows determination of crucial properties of a protein such as its isoelectric point and approximate molecular weight.

In electrophoresis, the force moving the macromolecule (nucleic acids as well as proteins are separated this way) is the electrical potential, E. The electrophoretic mobility of the molecule, μ, is the ratio of the velocity of the particle, V, to the electrical potential. Electrophoretic mobility is also equal to the net charge of the molecule, Z, divided by the frictional coefficient, ƒ. Thus:

μ  =  V / E  =  Z / ƒ

Electrophoresis of proteins is generally carried out in gels made up of the cross-linked polymer polyacrylamide (Fig. 6–4). The polyacrylamide gel acts as a molecular sieve, slowing the migration of proteins approximately in proportion to their mass, or molecular weight.

An electrophoretic method commonly used for estimation of purity and molecular weight makes use of the detergent sodium dodecyl sulfate (SDS). SDS binds to most proteins (probably by hydrophobic interactions; see Chapter 4) in amounts roughly proportional to the molecular weight of the protein, about one molecule of SDS for every two amino acid residues. The bound SDS contributes a large net negative charge, rendering the intrinsic charge of the protein insignificant.

In addition, the native conformation of a protein is altered when SDS is bound, and most proteins assume a similar shape, and thus a similar ratio of charge to mass. Electrophoresis in the presence of SDS therefore separates proteins almost exclusively on the basis of mass (molecular weight), with smaller polypeptides migrating more rapidly. After electrophoresis, the proteins are visualized by adding a dye such as Coomassie blue (Fig. 6–4b) which binds to proteins but not to the gel itself. This type of gel provides one method to monitor progress in isolating a protein, because the number of protein bands should decrease as the purification proceeds. When compared with the positions to which proteins of known molecular weight migrate in the gel, the position of an unknown protein can provide an excellent measure of its molecular weight (Fig. 6–5). If the protein has two or more different subunits, each subunit will generally be separated by the SDS treatment, and a separate band will appear for each.
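
The graphical estimate of molecular weight (Fig. 6–5b) amounts to fitting a straight line to log Mr versus relative migration for the marker proteins and reading off the value for the unknown. The following Python sketch shows one way this might be done. The marker values and the migration of the unknown are hypothetical, and an ordinary least-squares fit is assumed in place of reading the graph by eye.

```python
import math


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def estimate_mr(markers, unknown_migration):
    """Estimate Mr of an unknown from (relative migration, Mr) markers,
    assuming log10(Mr) is linear in relative migration."""
    xs = [migration for migration, _ in markers]
    ys = [math.log10(mr) for _, mr in markers]
    slope, intercept = fit_line(xs, ys)
    return 10 ** (slope * unknown_migration + intercept)


# Hypothetical marker proteins: (relative migration, known Mr).
markers = [(0.2, 100000), (0.6, 31623), (1.0, 10000)]

# Hypothetical unknown that migrates between the first two markers.
estimate = estimate_mr(markers, 0.4)
print(f"Estimated Mr: {estimate:.0f}")
```

With these invented markers the unknown comes out at roughly 56,000. The linear relationship holds only over the range covered by the markers, so an estimate outside that range should be treated with caution.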

Isoelectric focusing is a procedure used to determine the isoelectric point (pI) of a protein (Fig. 6–6). A pH gradient is established by allowing a mixture of low molecular weight organic acids and bases (ampholytes; see p. 118) to distribute themselves in an electric field generated across the gel. When a protein mixture is applied, each protein migrates until it reaches the pH that matches its pI. Proteins with different isoelectric points are thus distributed differently throughout the gel (Table 6–5).

Combining these two electrophoretic methods in two-dimensional gels permits the resolution of complex mixtures of proteins (Fig. 6–7). This is a more sensitive analytical method than either isoelectric focusing or SDS electrophoresis alone. Two-dimensional electrophoresis separates proteins of identical molecular weight that differ in pI, or proteins with similar pI values but different molecular weights.

Figure 6–8  The immune response and the action of antibodies. (a) A molecule of immunoglobulin G (IgG) consists of two polypeptides known as heavy chains (white and light blue) and two known as light chains (purple and dark blue). Immunoglobulins are glycoproteins and contain bound carbohydrate (yellow). (b) Each antigen evokes a specific set of antibodies, which will recognize and combine only with that antigen or closely related molecules. (Antibody-binding sites are shown as red areas on the antigen.) The Y-shaped antibodies each have two binding sites for the antigen, and can precipitate the antigen by forming an insoluble, latticelike aggregate.
Figure 6–9  Analytical methods based on the interaction of antibodies with antigen. (a) An enzyme-linked immunosorbent assay (ELISA) used in testing for human pregnancy. Human chorionic gonadotropin (hCG), a hormone produced by the placenta, is detectable in maternal urine a few days after conception. In the ELISA, an antibody specific for hCG is attached to the bottom of a well in a plastic tray, to which a few drops of urine are added. If any hCG is present, it will bind to the antibodies. The well is then washed, and a second antibody (also specific for hCG) is added. This second antibody is linked to an enzyme that catalyzes the conversion of a colorless compound to a colored one; the amount of colored compound produced provides a sensitive measure of the amount of hormone present. The ELISA has been adapted for use in determining the amount of specific proteins in tissue samples, in blood, or in urine. (b) Immunoblot (or Western blot) technique. Proteins are separated by electrophoresis, then antibodies are used to determine the presence and size of the proteins. After separation, the proteins are transferred electrophoretically from an SDS polyacrylamide gel to a special paper (which makes them more accessible). Specific, labeled antibody is added, then the paper is washed to remove unbound antibody. The label can be a radioactive element, a fluorescent compound, or an enzyme as in the ELISA. The position of the labeled antibody defines the Mr of the detected protein. All of the proteins are seen in the stained gel; only the protein bound to the antibody is seen in the immunoblot. (c) In immunocytochemistry, labeled antibodies are introduced into cells to reveal the subcellular location of a specific protein. Here, fluorescently labeled antibodies and a fluorescence microscope have been used to locate tubulin filaments in a human fibroblast.

Several sensitive analytical procedures have been developed from the study of a class of proteins called antibodies or immunoglobulins. Antibody molecules appear in the blood serum and certain tissues of a vertebrate animal in response to injection of an antigen, a protein or other macromolecule foreign to that individual. Each foreign protein elicits the formation of a set of different antibodies, which can combine with the antigen to form an antigen–antibody complex. The production of antibodies is part of a general defense mechanism in vertebrates called the immune response.

Antibodies are Y-shaped proteins consisting of four polypeptide chains. They have two binding sites that are complementary to specific structural features of the antigen molecule, making possible the formation of a three-dimensional lattice of alternating antigen and antibody molecules (Fig. 6–8). If sufficient antigen is present in a sample, the addition of antibodies or blood serum from an immunized animal will result in the formation of a quantifiable precipitate. No such precipitate is formed when serum of an unimmunized animal is mixed with the antigen.

Antibodies are highly specific for the foreign proteins or other macromolecules that evoke their formation. It is this specificity that makes them valuable analytical reagents. A rabbit antibody formed to horse serum albumin, for example, will combine with the latter but will not usually combine with other horse proteins, such as horse hemoglobin.

Two types of antibody preparations are in use: polyclonal and monoclonal. Polyclonal antibodies are those produced by many different types (or populations) of antibody-producing cells in an animal immunized with an antigen (in this case a protein). Each type of cell produces an antibody that binds only to a specific, small part of the antigen protein. Consequently, polyclonal preparations contain a mixture of antibodies that recognize different parts of the protein. Monoclonal antibodies, in contrast, are synthesized by a population of identical cells (a clone) grown in cell culture. These antibodies are homogeneous, all recognizing the same specific part of the protein. The techniques for producing monoclonal antibodies were worked out by Georges Köhler and Cesar Milstein.

Antibodies are so exquisitely specific that they can in some cases distinguish between two proteins differing by only a single amino acid.

When a mixture of proteins is added to a chromatography column in which the antibody is covalently attached to a resin, the antibody will specifically bind its target protein and retain it on the column while other proteins are washed through. The target protein can then be eluted from the resin by a salt solution or some other agent. This can be a powerful tool for protein purification.

A variety of other analytical techniques rely on antibodies. In each case the antibody is attached to a radioactive label or some other reagent to make it easy to detect. The antibody binds the target protein, and the label reveals its presence in a solution or its location in a gel or even a living cell. Several variations of this procedure are illustrated in Figure 6–9. We shall examine some other aspects of antibodies in chapters to follow; they are of extreme importance in medicine and also tell much about the structure of proteins and the action of genes.

All proteins in all species, regardless of their function or biological activity, are built from the same set of 20 amino acids (Chapter 5). What is it, then, that makes one protein an enzyme, another a hormone, another a structural protein, and still another an antibody? How do they differ chemically? Quite simply, proteins differ from each other because each has a distinctive number and sequence of amino acid residues. The amino acids are the alphabet of protein structure; they can be arranged in an almost infinite number of sequences to make an almost infinite number of different proteins. A specific sequence of amino acids folds up into a unique three-dimensional structure, and this structure in turn determines the function of the protein.

The amino acid sequence of a protein, or its primary structure, can be very informative to a biochemist. No other property so clearly distinguishes one protein from another. This now becomes the focus of the remainder of the chapter. We first consider empirical clues that amino acid sequence and protein function are closely linked, then describe how amino acid sequence is determined, and finally outline the many uses to which this information can be put.

The bacterium E. coli produces about 3,000 different proteins. A human being produces 50,000 to 100,000 different proteins. In both cases, each separate type of protein has a unique structure and this structure confers a unique function. Each separate type of protein also has a unique amino acid sequence. Intuition suggests that the amino acid sequence must play a fundamental role in determining the three-dimensional structure of the protein, and ultimately its function, but is this expectation correct? A quick survey of proteins and how they vary in amino acid sequence provides a number of empirical clues that help substantiate the important relationship between amino acid sequence and biological function. First, as we have already noted, proteins with different functions always have different amino acid sequences. Second, more than 1,400 human genetic diseases have been traced to the production of defective proteins (Table 6–6). Perhaps a third of these proteins are defective because of a single change in the amino acid sequence; hence, if the primary structure is altered, the function of the protein may also be changed. Finally, on comparing proteins with similar functions from different species, we find that these proteins often have similar amino acid sequences. An extreme case is ubiquitin, a 76 amino acid protein involved in regulating the degradation of other proteins. The amino acid sequence of ubiquitin is identical in species as disparate as fruit flies and humans.

Is the amino acid sequence absolutely fixed, or invariant, for a particular protein? No; some flexibility is possible. An estimated 20 to 30% of the proteins in humans are polymorphic, having amino acid sequence variants in the human population. Many of these variations in sequence have little or no effect on the function of the protein. Furthermore, proteins that carry out a broadly similar function in distantly related species often differ greatly in overall size and amino acid sequence. An example is DNA polymerase, the primary enzyme involved in DNA synthesis. The DNA polymerase of a bacterium is very different in much of its sequence from that of a mouse cell.

The amino acid sequence of a protein is inextricably linked to its function. Proteins often contain crucial substructures within their amino acid sequence that are essential to their biological functions. The amino acid sequence in other regions might vary considerably without affecting these functions. The fraction of the sequence that is critical varies from protein to protein, complicating the task of relating sequence to structure, and structure to function. Before we can consider this problem further, however, we must examine how sequence information is obtained.

Figure 6–10  The amino acid sequence of the two chains of bovine insulin, which are joined by disulfide cross-linkages. The A chain is identical in human, pig, dog, rabbit, and sperm whale insulins. The B chains of the cow, pig, dog, goat, and horse are identical. Such identities between similar proteins of different species are discussed later in this chapter.

Two major discoveries in 1953 ushered in the modern era of biochemistry. In that year James D. Watson and Francis Crick deduced the double-helical structure of DNA and proposed a structural basis for the precise replication of DNA (Chapter 12). Implicit in their proposal was the idea that the sequence of nucleotide units in DNA bears encoded genetic information. In that same year, Frederick Sanger worked out the sequence of amino acids in the polypeptide chains of the hormone insulin (Fig. 6–10), surprising many researchers who had long thought that elucidation of the amino acid sequence of a polypeptide would be a hopelessly difficult task. These achievements together suggested that the nucleotide sequence of DNA and the amino acid sequence of proteins were somehow related. Within just over a decade, the nucleotide code that determines the amino acid sequence of protein molecules had been revealed (Chapter 26).

Today the amino acid sequences of thousands of different proteins from many species are known, determined using principles first developed by Sanger. These methods are still in use, although with many variations and improvements in detail.

Figure 6–11  Steps in sequencing a polypeptide. (a) Determination of amino acid composition and (b) identification of the amino-terminal residue are the first steps for many polypeptides. Sanger’s method for identifying the amino-terminal residue is shown here. The Edman degradation procedure (c) reveals the entire sequence of a peptide. For shorter peptides, this method alone readily yields the entire sequence, and steps (a) and (b) are often omitted. The latter procedures are useful in the case of larger polypeptides, which are often fragmented into smaller peptides for sequencing (see Fig. 6–13).

Three procedures are used in the determination of the sequence of a polypeptide chain (Fig. 6–11). The first is to hydrolyze it and determine its amino acid composition (Fig. 6–11a). This information is often valuable in later steps, and can also be useful in itself. Because amino acid composition differs from one protein to the next, it can serve as a kind of fingerprint. It can be used, for example, to help determine whether proteins isolated by different laboratories are the same or different.

Often, the next step is to identify the amino-terminal amino acid residue (Fig. 6–11b). For this purpose Sanger developed the reagent 1-fluoro-2,4-dinitrobenzene (FDNB; see Fig. 5–14). Other reagents used to label the amino-terminal residue are dansyl chloride and dabsyl chloride (see Figs. 5–14 and 5–18). The dansyl derivative is highly fluorescent and can be detected and measured in much lower concentrations than dinitrophenyl derivatives. The dabsyl derivative is intensely colored and also provides greater sensitivity than the dinitrophenyl compounds. These methods destroy the polypeptide and their utility is therefore limited to identification of the amino-terminal residue.

To sequence the entire polypeptide, a chemical method devised by Pehr Edman is usually employed. The Edman degradation procedure labels and removes only the amino-terminal residue from a peptide, leaving all other peptide bonds intact (Fig. 6–11c). The peptide is reacted with phenylisothiocyanate, and the amino-terminal residue is ultimately removed as a phenylthiohydantoin derivative. After removal and identification of the amino-terminal residue, the new amino-terminal residue so exposed can be labeled, removed, and identified by repeating the same series of reactions. This procedure is repeated until the entire sequence is determined. Refinements of each step permit the sequencing of up to 50 amino acid residues in a large peptide.

The many individual steps and the careful bookkeeping required in the determination of the amino acid sequence of long polypeptide chains are usually carried out by programmed and automated analyzers. The Edman degradation is carried out on a programmed machine, called a sequenator, which mixes reagents in the proper proportions, separates the products, identifies them, and records the results. Such instruments have greatly reduced the time and labor required to determine the amino acid sequence of polypeptides. These methods are extremely sensitive. Often, less than a microgram of protein is sufficient to determine its complete amino acid sequence.

Figure 6–12  Breaking disulfide bonds in proteins. The two common methods are illustrated. Oxidation of cystine with performic acid produces two cysteic acid residues. Reduction by dithiothreitol to form cysteine residues must be followed by further modification of the reactive –SH groups to prevent reformation of the disulfide bond. Carboxymethylation by iodoacetate serves this purpose.
Figure 6–13  Fragmenting proteins prior to sequencing, and placing peptide fragments in their proper order with overlaps. The one-letter abbreviations for amino acids are given in Table 5–1. In this example, there are only two Cys residues, thus one possibility for location of the disulfide bridge (black bracket). In polypeptides with three or more Cys residues, disulfide bridges can be located as described in the text.

The overall accuracy for determination of an amino acid sequence generally declines as the length of the polypeptide increases, especially for polypeptides longer than 50 amino acids. The very large polypeptides found in proteins must usually be broken down into pieces small enough to be sequenced efficiently. There are several steps in this process. First, any disulfide bonds are broken, and the protein is cleaved into a set of specific fragments by chemical or enzymatic methods. Each fragment is then purified, and sequenced by the Edman procedure. Finally, the order in which the fragments appear in the original protein is determined and disulfide bonds (if any) are located.

Breaking Disulfide Bonds  Disulfide bonds interfere with the sequencing procedure. A cystine residue (p. 116) that has one of its peptide bonds cleaved by the Edman procedure will remain attached to the polypeptide. Disulfide bonds also interfere with the enzymatic or chemical cleavage of the polypeptide (described below). Two approaches to irreversible breakage of disulfide bonds are outlined in Figure 6–12.

Cleaving the Polypeptide Chain  Several methods can be used for fragmenting the polypeptide chain. These involve a set of enzymes (proteases) and chemical reagents that cleave peptide chains adjacent to specific amino acid residues (Table 6–7). The digestive enzyme trypsin, for example, catalyzes the hydrolysis of only those peptide bonds in which the carbonyl group is contributed by either a Lys or an Arg residue, regardless of the length or amino acid sequence of the chain. The number of smaller peptides produced by trypsin cleavage can thus be predicted from the total number of Lys or Arg residues in the original polypeptide (Fig. 6–13). A polypeptide with five Lys and/or Arg residues will usually yield six smaller peptides on cleavage with trypsin. Moreover, all except one of these will have a carboxyl-terminal Lys or Arg. The fragments produced by trypsin action are separated by chromatographic or electrophoretic methods.

Sequencing of Peptides  All the peptide fragments resulting from the action of trypsin are sequenced separately by the Edman procedure.
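
The predictability of trypsin cleavage can be illustrated with a short Python sketch that cuts a sequence, written in one-letter code, on the carboxyl side of every Lys (K) or Arg (R). The polypeptide used here is hypothetical, and the function is a deliberately simplified model that ignores any exceptions to the cleavage rule.

```python
def cleave_after(sequence, residues):
    """Split a one-letter sequence on the carboxyl side of each listed
    residue. A simplified model: every matching bond is assumed to be cut."""
    fragments = []
    start = 0
    for i, amino_acid in enumerate(sequence):
        # Do not cut after the final residue; that would leave an empty piece.
        if amino_acid in residues and i < len(sequence) - 1:
            fragments.append(sequence[start:i + 1])
            start = i + 1
    fragments.append(sequence[start:])
    return fragments


def trypsin(sequence):
    """Cleave where the carbonyl group is contributed by Lys (K) or Arg (R)."""
    return cleave_after(sequence, "KR")


# Hypothetical polypeptide containing five Lys/Arg residues.
polypeptide = "AGKMCFRDLKWMSRTYKGEH"
tryptic_fragments = trypsin(polypeptide)
print(tryptic_fragments)
```

The five Lys and Arg residues of this invented sequence give six fragments, and all but the carboxyl-terminal fragment end in K or R, as the text predicts.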

Ordering Peptide Fragments  The order of these trypsin fragments in the original polypeptide chain must now be determined. Another sample of the intact polypeptide is cleaved into small fragments using a different enzyme or reagent, one that cleaves peptide bonds at points other than those cleaved by trypsin. For example, the reagent cyanogen bromide cleaves only those peptide bonds in which the carbonyl group is contributed by Met (Table 6–7). The fragments resulting from this new procedure are then separated and sequenced as before.

The amino acid sequences of each fragment obtained by the two cleavage procedures are examined, with the objective of finding peptides from the second procedure whose sequences establish continuity, because of overlaps, between the fragments obtained by the first cleavage procedure (Fig. 6–13). Overlapping peptides obtained from the second fragmentation yield the correct order of the peptide fragments produced in the first. Moreover, the two sets of fragments can be compared for possible errors in determining the amino acid sequence of each fragment. If the amino-terminal amino acid has been identified before the original cleavage of the protein, this information can be used to establish which fragment is derived from the amino terminus.
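
The logic of ordering fragments by overlap can be sketched in Python as follows. The two fragment sets are hypothetical, corresponding to an invented 20-residue polypeptide cleaved after Lys/Arg in the first procedure and after Met in the second. The brute-force search over all possible orderings is an assumption made for brevity; it is practical only for a small number of fragments and is not how the task would be approached for a large protein.

```python
from itertools import permutations


def order_fragments(first_set, second_set):
    """Return every sequence obtained by ordering the fragments of
    first_set such that each peptide in second_set appears intact in it."""
    solutions = set()
    for ordering in permutations(first_set):
        candidate = "".join(ordering)
        # A correct ordering must contain every overlapping peptide.
        if all(peptide in candidate for peptide in second_set):
            solutions.add(candidate)
    return sorted(solutions)


# Hypothetical fragments from cleavage after Lys or Arg, in arbitrary order.
first_cleavage = ["TYK", "AGK", "WMSR", "GEH", "DLK", "MCFR"]

# Hypothetical fragments from cleavage after Met of the same polypeptide.
second_cleavage = ["CFRDLKWM", "SRTYKGEH", "AGKM"]

reconstructed = order_fragments(first_cleavage, second_cleavage)
print(reconstructed)
```

Here the overlaps permit only one ordering. If more than one sequence were returned, the second cleavage would have failed to establish continuity between all the fragments and a further cleavage method would be needed.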

If the second cleavage procedure fails to establish continuity between all peptides from the first cleavage, a third or even a fourth cleavage method must be used to obtain a set of peptides that can provide the necessary overlap(s). A variety of proteolytic enzymes with different specificities are available (Table 6–7).

Locating Disulfide Bonds  After sequencing is completed, locating the disulfide bonds requires an additional step. A sample of the protein is again cleaved with a reagent such as trypsin, this time without first breaking the disulfide bonds. When the resulting peptides are separated by electrophoresis and compared with the original set of peptides generated by trypsin, two of the original peptides will be missing and a new, larger peptide will appear. The two missing peptides represent the regions of the intact polypeptide that are linked by a disulfide bond.

Figure 6–14  Correspondence of DNA and amino acid sequences. Each amino acid is encoded by a specific sequence of three nucleotides (triplet) in DNA. The genetic code is described in detail in Chapter 26.

The approach outlined above is not the only way to obtain amino acid sequences. The development of rapid DNA sequencing methods (Chapter 12), the elucidation of the genetic code (Chapter 26), and the development of techniques for the isolation of genes (Chapter 28) make it possible to deduce the sequence of a polypeptide by determining the sequence of nucleotides in its gene (Fig. 6–14). The two techniques are complementary. When the gene is available, sequencing the DNA can be faster and more accurate than sequencing the protein. If the gene has not been isolated, direct sequencing of peptides is necessary, and this can provide information (e.g., the location of disulfide bonds) not available in a DNA sequence. In addition, a knowledge of the amino acid sequence can greatly facilitate the isolation of the corresponding gene (Chapter 28).
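Deducing a polypeptide sequence from a gene amounts to reading the DNA three nucleotides at a time. The following Python sketch illustrates the idea under several assumptions: the DNA string is a hypothetical coding strand read 5'→3', the reading frame begins at the first nucleotide, the standard genetic code applies, and the names `CODON_TABLE` and `translate` are invented for this example. The 64-letter string lists the amino acid (or `*` for a stop codon) for each triplet, with the bases ordered T, C, A, G at each position.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, one letter per codon, ordered TTT, TTC, TTA, TTG, TCT, ...
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna):
    """Translate a coding-strand DNA sequence into one-letter amino acids.

    Reading starts at the first nucleotide and stops at the first stop
    codon; any incomplete trailing triplet is ignored.
    """
    dna = dna.upper()
    protein = []
    for i in range(0, len(dna) - 2, 3):
        aa = CODON_TABLE[dna[i:i + 3]]
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


# Hypothetical 12-nucleotide sequence: ATG GCC AAA TGA
print(translate("ATGGCCAAATGA"))  # Met-Ala-Lys, then a stop codon
```

A translation of this kind yields only the order of amino acids; features such as the location of disulfide bonds still require direct analysis of the protein.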

The sequence of amino acids in a protein can offer insights into its three-dimensional structure and its function, cellular location, and evolution. Most of these insights are derived by searching for similarities with other known sequences. Thousands of sequences are known and available in computerized databases. The comparison of a newly obtained sequence with this large bank of stored sequences often reveals relationships both surprising and enlightening.

The relationship between amino acid sequence and three-dimensional structure, and between structure and function, is not understood in detail. However, a growing number of protein families are being revealed that have at least some shared structural and functional features that can be readily identified on the basis of amino acid sequence similarities alone. For example, there are four major families of proteases, several families of naturally occurring protease inhibitors, a large number of closely related protein kinases, and a similarly large number of related protein phosphatases. Individual proteins are generally assigned to families by the degree of similarity in amino acid sequence (identical to other members of the family across 30% or more of the sequence), and proteins in these families generally share at least some structural and functional characteristics. Some families are defined, however, by identities involving only a few amino acids that are critical to a certain function. Many membrane-bound protein receptors share important structural features and have similar amino acid sequences, even though the extracellular molecules they bind are quite different. Even the immunoglobulin family includes a host of extracellular and cell-surface proteins in addition to antibodies.

The similarities may involve the entire protein or may be confined to relatively small segments of it. A number of similar substructures (domains) occur in many functionally unrelated proteins. An example is a 40 to 45 amino acid sequence called the EGF (epidermal growth factor) domain that makes up part of the structure of urokinase, the low-density lipoprotein receptor, several proteins involved in blood clotting, and many others. These domains often fold up into structural configurations that have an unusual degree of stability or that are specialized for a certain environment. Evolutionary relationships can also be inferred from the structural and functional similarities within protein families.

Certain amino acid sequences often serve as signals that determine the cellular location, chemical modification, and half-life of a protein. Special signal sequences, usually at the amino terminus, are used to target certain proteins for export from the cell, while other proteins are distributed to the nucleus, the cell surface, the cytosol, and other cellular locations. Other sequences act as attachment sites for prosthetic groups, such as glycosyl groups in glycoproteins and lipids in lipoproteins. Some of these signals are well characterized, and are easily recognized if they occur in the sequence of a newly discovered protein.

The probability that information about a new protein can be deduced from its primary structure improves constantly with the almost daily addition to the number of published amino acid sequences stored in shared databanks.

Figure 6–15  The amino acid sequence of human cytochrome c. Amino acid substitutions found at different positions in the cytochrome c of other species are listed below the sequence of the human protein. The amino acids are color-coded to help distinguish conservative and nonconservative substitutions: invariant amino acids are shaded in yellow, conservative amino acid substitutions are shaded in blue, and nonconservative substitutions are unshaded. X is an unusual amino acid, trimethyllysine. The one-letter abbreviations for amino acids are used here (see Table 5–1).
Figure 6–16  Main branches of the evolutionary tree constructed from the number of amino acid differences between cytochrome c molecules of different species. The numbers represent the number of residues by which the cytochrome c of a given line of organism differs from its ancestors.

Several important conclusions have come from study of the amino acid sequences of homologous proteins from different species. Homologous proteins are those that are evolutionarily related. They usually perform the same function in different species; an example is hemoglobin, which has the same oxygen-transport function in different vertebrates. Homologous proteins from different species often have polypeptide chains that are identical or nearly identical in length. Many positions in the amino acid sequence are occupied by the same amino acid in all species and are thus called invariant residues. But in other positions there may be considerable variation in the amino acid from one species to another; these are called variable residues.

The functional significance of sequence homology can be illustrated by cytochrome c, an iron-containing mitochondrial protein that transfers electrons during biological oxidations in eukaryotic cells. The polypeptide chain of this protein has a molecular weight of about 13,000 and has about 100 amino acid residues in most species. The amino acid sequences of cytochrome c from over 60 different species have been determined, and 27 positions in the chain of amino acid residues are invariant in all species tested (Fig. 6–15), suggesting that they are the most important residues specifying the biological activity of cytochrome c. The residues in other positions in the chain exhibit some interspecies variation. There are clear gradations in the number of changes observed in the variable residues. In some positions, all substitutions involve similar amino acid residues (e.g., Arg will replace Lys, both of which are positively charged); these are called conservative substitutions. At other positions the substitutions are more random. As we will show in the next chapter, the polypeptide chains of proteins are folded into characteristic and specific conformations and these conformations depend on amino acid sequence. Clearly, the invariant residues are more critical to the structure and function of a protein than the variable ones. Recognizing which amino acids fall into each category is an important step in deciphering the complicated question of how amino acid sequence is translated into a specific three-dimensional structure.
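The sorting of positions into invariant, conservatively substituted, and more randomly substituted classes can be illustrated with a short Python sketch. Everything specific in it is an assumption made for illustration: the three five-residue sequences are hypothetical (they are not cytochrome c data), they are taken to be already aligned and of equal length, and the grouping of "similar" amino acids in `SIMILAR_GROUPS` is one simple choice among many.

```python
# Illustrative groups of chemically similar residues (an assumption of
# this sketch, not a standard classification).
SIMILAR_GROUPS = [
    set("KR"),    # positively charged
    set("DE"),    # negatively charged
    set("ST"),    # hydroxyl-containing
    set("NQ"),    # amides
    set("ILVM"),  # aliphatic or sulfur-containing hydrophobic
    set("FYW"),   # aromatic
]


def classify_column(residues):
    """Classify one alignment position from the residues found there."""
    observed = set(residues)
    if len(observed) == 1:
        return "invariant"
    if any(observed <= group for group in SIMILAR_GROUPS):
        return "conservative"
    return "nonconservative"


def classify_alignment(sequences):
    """Classify every position of equal-length, pre-aligned sequences."""
    if len({len(s) for s in sequences}) != 1:
        raise ValueError("sequences must be aligned to the same length")
    return [classify_column(column) for column in zip(*sequences)]


# Hypothetical aligned sequences from three species.
aligned = ["GKCAL", "GRCSL", "GKCTI"]
for position, label in enumerate(classify_alignment(aligned), start=1):
    print(position, label)
```

In this toy alignment, positions 1 and 3 are invariant, positions 2 (Lys/Arg) and 5 (Leu/Ile) show conservative substitutions, and position 4 does not fit any single group.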

The variable amino acids provide information of another sort. Evolution is sometimes regarded as a theory that is accepted but difficult to test, yet the phylogenetic trees established by taxonomy have been tested and experimentally confirmed through biochemistry. The examination of sequences of cytochrome c and other homologous proteins has led to an important conclusion: the number of residues that differ in homologous proteins from any two species is in proportion to the phylogenetic difference between those species. For example, 48 amino acid residues differ in the cytochrome c molecules of the horse and of yeast, which are very widely separated species, whereas only two residues differ in the cytochrome c of the much more closely related duck and chicken. In fact, the cytochrome c molecule has identical amino acid sequences in the chicken and the turkey, and in the pig, cow, and sheep. Information on the number of residue differences between homologous proteins of different species allows the construction of evolutionary maps that show the origin and sequence of development of different animals and plants during the evolution of species (Fig. 6–16). The relationships established by taxonomy and biochemistry agree well.
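The raw data behind such an evolutionary map are simply the pairwise counts of differing residues. A minimal Python sketch of that tabulation follows. The four ten-residue sequences and the species labels are hypothetical placeholders, not real cytochrome c data, and the sequences are assumed to be aligned and of equal length; the function names are likewise invented for this example.

```python
from itertools import combinations


def count_differences(seq_a, seq_b):
    """Count positions at which two aligned, equal-length sequences differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(a != b for a, b in zip(seq_a, seq_b))


def difference_table(sequences):
    """Map each pair of species names to its number of residue differences."""
    return {
        (name_a, name_b): count_differences(sequences[name_a], sequences[name_b])
        for name_a, name_b in combinations(sequences, 2)
    }


# Hypothetical aligned segments from four hypothetical species.
segments = {
    "species_a": "GDVEKGKKIF",
    "species_b": "GDVEKGKKIF",  # identical to species_a
    "species_c": "GDIEKGKKIF",  # one substitution
    "species_d": "GSAKKGATLF",  # many substitutions
}

for pair, differences in difference_table(segments).items():
    print(pair, differences)
```

Pairs with few differences would be placed on neighboring branches of the tree, and pairs with many differences on widely separated branches. Building the tree itself from such a table requires a clustering method that is beyond this sketch.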
Summary

Cells generally contain thousands of different proteins, each with a different function or biological activity. These functions include enzymatic catalysis, molecular transport, nutrition, cell or organismal motility, structural roles, organismal defense, regulation, and many others. Proteins consist of very long polypeptide chains having from 100 to over 2,000 amino acid residues joined by peptide linkages. Some proteins have several polypeptide chains, which are then referred to as subunits. Simple proteins yield only amino acids on hydrolysis; conjugated proteins contain in addition some other component, such as a metal ion or organic prosthetic group.

Proteins are purified by taking advantage of properties in which they differ, such as size, shape, binding affinities, charge, etc. Purification also requires a method for quantifying or assaying a particular protein in the presence of others. Proteins can be both separated and visualized by electrophoretic methods. Antibodies that specifically bind a certain protein can be used to detect and locate that protein in a solution, a gel, or even in the interior of a cell.

All proteins are made from the same set of 20 amino acids. Their differences in function result from differences in the composition and sequence of their amino acids. The amino acid sequences of polypeptide chains can be established by fragmenting them into smaller pieces using several specific reagents, and determining the amino acid sequence of each fragment by the Edman degradation procedure. The sequencing of suitably sized peptide fragments has been automated. The peptide fragments are then placed in the correct order by finding sequence overlaps between fragments generated by different methods. Protein sequences can also be deduced from the nucleotide sequence of the corresponding gene in the DNA. The amino acid sequence can be compared with the thousands of known sequences, often revealing insights into the structure, function, cellular location, and evolution of the protein.

Homologous proteins from different species show sequence homology: certain positions in the polypeptide chains contain the same amino acids, regardless of the species. In other positions the amino acids may differ. The invariant residues are evidently essential to the function of the protein. The degree of similarity between amino acid sequences of homologous proteins from different species correlates with the evolutionary relationship of the species.

Further Reading

See Chapter 5 for additional useful references.

Properties of Proteins

Creighton, T.E. (1984) Proteins: Structures and Molecular Properties, W.H. Freeman and Company, New York. 

Dickerson, R.E. & Geis, I. (1983) Proteins: Structure, Function, and Evolution, 2nd edn, The Benjamin/Cummings Publishing Company, Menlo Park, CA. 
A beautifully illustrated introduction to proteins.

Doolittle, R.F. (1985) Proteins. Sci. Am. 253 (October), 88–99. 
An overview that highlights evolutionary relationships.

Srinivasan, P.R., Fruton, J.S., & Edsall, J.T. (eds) (1979) The Origins of Modern Biochemistry: A Retrospect on Proteins. Ann. N.Y. Acad. Sci. 325. 
A collection of very interesting articles on the history of protein research.

Structure and Function of Proteins. (1989) Trends Biochem. Sci. 14 (July). 
A special issue devoted to reviews on protein chemistry and protein structure.

Working with Proteins

Hirs, C.H.W. & Timasheff, S.N. (eds) (1983) Methods in Enzymology, Vol. 91, Part I: Enzyme Structure, Academic Press, Inc., New York. 
An excellent collection of authoritative articles on techniques in protein chemistry. Includes information on sequencing.

Kornberg, A. (1990) Why purify enzymes? In Methods in Enzymology, Vol. 182: Guide to Protein Purification (Deutscher, M.P., ed), pp. 1–5, Academic Press, Inc., New York. 
The critical role of classical biochemical methods in a new age.

O’Farrell, P.H. (1975) High resolution two-dimensional electrophoresis of proteins. J. Biol. Chem. 250, 4007–4021. 
An interesting attempt to count all the proteins in the E. coli cell.

Plummer, D.T. (1987) An Introduction to Practical Biochemistry, 3rd edn, McGraw-Hill, London. 
Good descriptions of many techniques for beginning students.

Scopes, R.K. (1987) Protein Purification: Principles and Practice, 2nd edn, Springer-Verlag, New York. 

Tonegawa, S. (1985) The molecules of the immune system. Sci. Am. 253 (October), 122–131. 

The Covalent Structure of Proteins

Dickerson, R.E. (1972) The structure and history of an ancient protein. Sci. Am. 226 (April), 58–72. 
A nice summary of information gleaned from interspecies comparisons of cytochrome c sequences.

Doolittle, R. (1981) Similar amino acid sequences: chance or common ancestry? Science 214, 149–159. 
A good discussion of what can be learned by comparing amino acid sequences.

Hunkapiller, M.W., Strickler, J.E., & Wilson, K.J. (1984) Contemporary methodology for protein structure determination. Science 226, 304–311. 

Reidhaar-Olson, J.F. & Sauer, R.T. (1988) Combinatorial cassette mutagenesis as a probe of the informational content of protein sequences. Science 241, 53–57. 
A systematic study of possible amino acid substitutions in a short segment of one protein.

Wilson, A.C. (1985) The molecular basis of evolution. Sci. Am. 253 (October), 164–173. 

Problems

1. How Many β-Galactosidase Molecules Are Present in an E. coli Cell?  E. coli is a rod-shaped bacterium 2 μm long and 1 μm in diameter. When grown on lactose (a sugar found in milk), the bacterium synthesizes the enzyme β-galactosidase (Mr 450,000), which catalyzes the breakdown of lactose. The average density of the bacterial cell is 1.2 g/mL, and 14% of its total mass is soluble protein, of which 1.0% is β-galactosidase. Calculate the number of β-galactosidase molecules in an E. coli cell grown on lactose.

2. The Number of Tryptophan Residues in Bovine Serum Albumin  A quantitative amino acid analysis reveals that bovine serum albumin contains 0.58% by weight of tryptophan, which has a molecular weight of 204.

       (a) Calculate the minimum molecular weight of bovine serum albumin (i.e., assuming there is only one tryptophan residue per protein molecule).
       (b) Gel filtration of bovine serum albumin gives a molecular weight estimate of about 70,000. How many tryptophan residues are present in a molecule of serum albumin?

3. The Molecular Weight of Ribonuclease  Lysine makes up 10.5% of the weight of ribonuclease. Calculate the minimum molecular weight of ribonuclease. The ribonuclease molecule contains ten lysine residues. Calculate the molecular weight of ribonuclease.

4. The Size of Proteins  What is the approximate molecular weight of a protein containing 682 amino acids in a single polypeptide chain?

5. Net Electric Charge of Peptides  A peptide isolated from the brain has the sequence

Glu-His-Trp-Ser-Tyr-Gly-Leu-Arg-Pro-Gly

Determine the net charge on the molecule at pH 3. What is the net charge at pH 5.5? At pH 8? At pH 11? Estimate the pI for this peptide. (Use pKa values for side chains and terminal amino and carboxyl groups as given in Table 5–1.)

6. The Isoelectric Point of Pepsin  Pepsin of gastric juice (pH ≈ 1.5) has a pI of about 1, much lower than that of other proteins (see Table 6–5). What functional groups must be present in relatively large numbers to give pepsin such a low pI? What amino acids can contribute such groups?

7. The Isoelectric Point of Histones  Histones are proteins of eukaryotic cell nuclei. They are tightly bound to deoxyribonucleic acid (DNA), which has many phosphate groups. The pI of histones is very high, about 10.8. What amino acids must be present in relatively large numbers in histones? In what way do these residues contribute to the strong binding of histones to DNA?

8. Solubility of Polypeptides  One method for separating polypeptides makes use of their differential solubilities. The solubility of large polypeptides in water depends upon the relative polarity of their R groups, particularly on the number of ionized groups: the more ionized groups there are, the more soluble the polypeptide. Which of each pair of polypeptides below is more soluble at the indicated pH?

       (a) (Gly)20 or (Glu)20 at pH 7.0
       (b) (Lys–Ala)3 or (Phe–Met)3 at pH 7.0
       (c) (Ala–Ser–Gly)5 or (Asn–Ser–His)5 at pH 6.0
       (d) (Ala–Asp–Gly)5 or (Asn–Ser–His)5 at pH 3.0

9. Purification of an Enzyme  A biochemist discovers and purifies a new enzyme, generating the purification table below:

       (a) From the information given in the table, calculate the specific activity of the enzyme solution after each purification procedure.

       (b) Which of the purification procedures used for this enzyme is most effective (i.e., gives the greatest increase in purity)?
       (c) Which of the purification procedures is least effective?
       (d) Is there any indication in this table that the enzyme is now pure? What else could be done to estimate the purity of the enzyme preparation?

10. Fragmentation of a Polypeptide Chain by Proteolytic Enzymes  Trypsin and chymotrypsin are specific enzymes that catalyze the hydrolysis of polypeptides at specific locations (Table 6–7). The sequence of the B chain of insulin is shown below. Note that the cystine cross-linkage between the A and B chains has been cleaved through the action of performic acid (see Fig. 6–12).

Phe–Val–Asn–Gln–His–Leu–CysSO3–Gly–
Ser–His–Leu–Val–Glu–Ala–Leu–Tyr–Leu–
Val–CysSO3–Gly–Glu–Arg–Gly–Phe–Phe–
Tyr–Thr–Pro–Lys–Ala

Indicate the points in the B chain that are cleaved by (a) trypsin and (b) chymotrypsin. Note that these proteases will not remove single amino acids from either end of a polypeptide chain.

11. Sequence Determination of the Brain Peptide Leucine Enkephalin  A group of peptides that influence nerve transmission in certain parts of the brain has been isolated from normal brain tissue. These peptides are known as opioids, because they bind to specific receptors that bind opiate drugs, such as morphine and naloxone. Opioids thus mimic some of the properties of opiates. Some researchers consider these peptides to be the brain’s own pain killers. Using the information below, determine the amino acid sequence of the opioid leucine enkephalin. Explain how your structure is consistent with each piece of information.

       (a) Complete hydrolysis by 1 M HCl at 110 °C followed by amino acid analysis indicated the presence of Gly, Leu, Phe, and Tyr, in a 2:1:1:1 molar ratio.
       (b) Treatment of the peptide with 1-fluoro-2,4-dinitrobenzene followed by complete hydrolysis and chromatography indicated the presence of the 2,4-dinitrophenyl derivative of tyrosine. No free tyrosine could be found.
       (c) Complete digestion of the peptide with pepsin followed by chromatography yielded a dipeptide containing Phe and Leu, plus a tripeptide containing Tyr and Gly in a 1:2 ratio.

12. Structure of a Peptide Antibiotic from Bacillus brevis  Extracts from the bacterium Bacillus brevis contain a peptide with antibiotic properties. Such peptide antibiotics form complexes with metal ions and apparently disrupt ion transport across the cell membrane, killing certain bacterial species. The structure of the peptide has been determined from the following observations.

       (a) Complete acid hydrolysis of the peptide followed by amino acid analysis yielded equimolar amounts of Leu, Orn, Phe, Pro, and Val. Orn is ornithine, an amino acid not present in proteins but present in some peptides. It has the structure

       (b) The molecular weight of the peptide was estimated as about 1,200.

       (c) When treated with the enzyme carboxypeptidase, the peptide failed to undergo hydrolysis.
       (d) Treatment of the intact peptide with 1-fluoro-2,4-dinitrobenzene, followed by complete hydrolysis and chromatography, yielded only free amino acids and the following derivative:

(Hint: Note that the 2,4-dinitrophenyl derivative involves the amino group of a side chain rather than the α-amino group.)

       (e) Partial hydrolysis of the peptide followed by chromatographic separation and sequence analysis yielded the di- and tripeptides below (the amino-terminal amino acid is always at the left):

Leu–Phe     Phe–Pro     Orn–Leu     Val–Orn
Val–Orn–Leu     Phe–Pro–Val     Pro–Val–Orn

Given the above information, deduce the amino acid sequence of the peptide antibiotic. Show your reasoning. When you have arrived at a structure, go back and demonstrate that it is consistent with each experimental observation.

Chapter 7
The Three-Dimensional Structure of Proteins

Figure 7–1  The structure of the enzyme chymotrypsin, a globular protein. A molecule of glycine (blue) is shown for size comparison.

The covalent backbone of proteins is made up of hundreds of individual bonds. If free rotation were possible around even a fraction of these bonds, proteins could assume an almost infinite number of three-dimensional structures. Each protein has a specific chemical or structural function, however, strongly suggesting that each protein has a unique three-dimensional structure (Fig. 7–1). The simple fact that proteins can be crystallized provides strong evidence that this is the case. The ordered arrays of molecules in a crystal can generally form only if the molecular units making up the crystal are identical. The enzyme urease (Mr 483,000) was among the first proteins crystallized, by James Sumner in 1926. This accomplishment demonstrated dramatically that even very large proteins are discrete chemical entities with unique structures, and it revolutionized thinking about proteins.

In this chapter, we will explore the three-dimensional structure of proteins, emphasizing several principles. First, the three-dimensional structure of a protein is determined by its amino acid sequence. Second, the function of a protein depends upon its three-dimensional structure. Third, the three-dimensional structure of a protein is unique, or nearly so. Fourth, the most important forces stabilizing the specific three-dimensional structure maintained by a given protein are noncovalent interactions. Finally, even though the structure of proteins is complicated, several common patterns can be recognized.

The relationship between the amino acid sequence and the three-dimensional structure of a protein is an intricate puzzle that has yet to be solved in detail. Polypeptides with very different amino acid sequences sometimes assume similar structures, and similar amino acid sequences sometimes yield very different structures. To find and understand patterns in this biochemical labyrinth requires a renewed appreciation for fundamental principles of chemistry and physics.

The spatial arrangement of atoms in a protein is called a conformation. The term conformation refers to a structural state that can, without breaking any covalent bonds, interconvert with other structural states. A change in conformation could occur, for example, by rotation about single bonds. Of the innumerable conformations that are theoretically possible in a protein containing hundreds of single bonds, one generally predominates. This is usually the conformation that is thermodynamically the most stable, having the lowest Gibbs’ free energy (G). Proteins in their functional conformation are called native proteins.

What principles determine the most stable conformation of a protein? Although protein structures can seem hopelessly complex, close inspection reveals recurring structural patterns. The patterns involve different levels of structural complexity, and we now turn to a biochemical convention that serves as a framework for much of what follows in this chapter.

Figure 7–2  Levels of structure in proteins. The primary structure consists of a sequence of amino acids linked together by covalent peptide bonds, and includes any disulfide bonds. The resulting polypeptide can be coiled into an α helix, one form of secondary structure. The helix is a part of the tertiary structure of the folded polypeptide, which is itself one of the subunits that make up the quaternary structure of the multimeric protein, in this case hemoglobin.
Figure 7–3  The different structural domains in the polypeptide troponin C, a calcium-binding protein associated with muscle. The separate calcium-binding domains, indicated in blue and purple, are connected by a long α helix, shown in white.

Conceptually, protein structure can be considered at four levels (Fig. 7–2). Primary structure includes all the covalent bonds between amino acids and is normally defined by the sequence of peptide-bonded amino acids and locations of disulfide bonds. The relative spatial arrangement of the linked amino acids is unspecified.

Polypeptide chains are not free to take up any three-dimensional structure at random. Steric constraints and many weak interactions stipulate that some arrangements will be more stable than others. Secondary structure refers to regular, recurring arrangements in space of adjacent amino acid residues in a polypeptide chain. There are a few common types of secondary structure, the most prominent being the α helix and the β conformation. Tertiary structure refers to the spatial relationship among all amino acids in a polypeptide; it is the complete three-dimensional structure of the polypeptide. The boundary between secondary and tertiary structure is not always clear. Several different types of secondary structure are often found within the three-dimensional structure of a large protein. Proteins with several polypeptide chains have one more level of structure: quaternary structure, which refers to the spatial relationship of the polypeptides, or subunits, within the protein.

Continued advances in the understanding of protein structure, folding, and evolution have made it necessary to define two additional structural levels intermediate between secondary and tertiary structure. A stable clustering of several elements of secondary structure is sometimes referred to as supersecondary structure. The term is used to describe particularly stable arrangements that occur in many different proteins and sometimes many times in a single protein. A somewhat higher level of structure is the domain. This refers to a compact region, including perhaps 40 to 400 amino acids, that is a distinct structural unit within a larger polypeptide chain. A polypeptide that is folded into a dumbbell-like shape might be considered to have two domains, one at either end. Many domains fold independently into thermodynamically stable structures. A large polypeptide chain can contain several domains that often are readily distinguishable within the overall structure (Fig. 7–3). In some cases the individual domains have separate functions. As we will see, important patterns exist at each of these levels of structure that provide clues to understanding the overall structure of large proteins.

The native conformation of a protein is only marginally stable; the difference in free energy between the folded and unfolded states in typical proteins under physiological conditions is in the range of only 20 to 65 kJ/mol. A given polypeptide chain can theoretically assume countless different conformations, and as a result the unfolded state of a protein is characterized by a high degree of conformational entropy. This entropy, and the hydrogen-bonding interactions of many groups in the polypeptide chain with solvent (water), tend to maintain the unfolded state. The chemical interactions that counteract these effects and stabilize the native conformation include disulfide bonds and the weak (noncovalent) interactions described in Chapter 4: hydrogen bonds, and hydrophobic, ionic, and van der Waals interactions. An appreciation of the role of these weak interactions is especially important to understanding how polypeptide chains fold into specific secondary, tertiary, and quaternary structures.
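To see what a stability of 20 to 65 kJ/mol means in practice, the free-energy difference can be converted into an equilibrium constant and a fraction of folded molecules. The Python sketch below does this under assumptions that go beyond the text: folding is treated as a simple two-state equilibrium between one unfolded and one folded state, the temperature is taken as 310 K, and the function name `fraction_folded` is invented for this example. The relationship used is K = exp(ΔG/RT), where ΔG here denotes the free energy by which the folded state is favored.

```python
import math

GAS_CONSTANT = 8.314  # J/(mol*K)


def fraction_folded(stability_kj_per_mol, temperature_k=310.0):
    """Fraction of molecules folded in a two-state model.

    stability_kj_per_mol is the free energy (kJ/mol) by which the folded
    state is favored over the unfolded state; zero gives a 50:50 mixture.
    """
    k_fold = math.exp(stability_kj_per_mol * 1000.0 / (GAS_CONSTANT * temperature_k))
    return k_fold / (1.0 + k_fold)


for stability in (0.0, 20.0, 65.0):
    unfolded = 1.0 - fraction_folded(stability)
    print(f"{stability:5.1f} kJ/mol -> fraction unfolded = {unfolded:.2e}")
```

Under these assumptions, even the low end of the range leaves only a small fraction of molecules unfolded at any instant. The stability is nonetheless marginal in the sense that it is far smaller than the energy of a single covalent bond.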

Every time a bond is formed between two atoms, some free energy is released in the form of heat or entropy. In other words, the formation of bonds is accompanied by a favorable (negative) change in free energy. The ΔG for covalent bond formation is generally in the range of –200 to –460 kJ/mol. For weak interactions, ΔG = –4 to –30 kJ/mol. Although covalent bonds are clearly much stronger, weak interactions predominate as a stabilizing force in protein structure because of their number. In general, the protein conformation with the lowest free energy (i.e., the most stable) is the one with the maximum number of weak interactions.

The stability of a protein is not simply the sum of the free energies of formation of the many weak interactions within it, however. We have already noted that the stability of proteins is marginal. Every hydrogen-bonding group in a polypeptide chain was hydrogen bonded to water prior to folding. For every hydrogen bond formed in a protein, hydrogen bonds (of similar strength) between the same groups and water were broken. The net stability contributed by a given weak interaction, or the difference in free energies of the folded and unfolded state, is close to zero. We must therefore explain why the native conformation of a protein is favored. The contribution of weak interactions to protein stability can be understood in terms of the properties of water (Chapter 4). Pure water contains a network of hydrogen-bonded water molecules. No other molecule has the hydrogen-bonding potential of water, and other molecules present in an aqueous solution will disrupt the hydrogen bonding of water to some extent. Optimizing the hydrogen bonding of water around a hydrophobic molecule results in the formation of a highly structured shell or solvation layer of water in the immediate vicinity, resulting in an unfavorable decrease in the entropy of water. The association among hydrophobic or nonpolar groups results in a decrease in this structured solvation layer, or a favorable increase in entropy. As described in Chapter 4, this entropy term is the major thermodynamic driving force for the association of hydrophobic groups in aqueous solution, and hydrophobic amino acid side chains therefore tend to be clustered in a protein’s interior, away from water.

The formation of hydrogen bonds and ionic interactions in a protein is also driven largely by this same entropic effect. Polar groups can generally form hydrogen bonds with water and hence are soluble in water. However, the number of hydrogen bonds per unit mass is generally greater for pure water than for any other liquid or solution, and there are limits to the solubility of even the most polar molecules because of the net decrease in hydrogen bonding that occurs when they are present. Therefore, a solvation shell of structured water will also form to some extent around polar molecules. Even though the energy of formation of an intramolecular hydrogen bond or ionic interaction between two polar groups in a macromolecule is largely canceled out by the elimination of such interactions between the same groups and water, the release of structured water when the intramolecular interaction is formed provides an entropic driving force for folding. Most of the net change in free energy that occurs when weak interactions are formed within a protein is therefore derived from the increase in entropy in the surrounding aqueous solution.

Of the different types of weak interactions, hydrophobic interactions are particularly important in stabilizing a protein conformation; the interior of a protein is generally a densely packed core of hydrophobic amino acid side chains. It is also important that any polar or charged groups in the protein interior have suitable partners for hydrogen bonding or ionic interactions. One hydrogen bond makes only a small apparent contribution to the stability of a native structure, but the presence of a single hydrogen-bonding group without a partner in the hydrophobic core of a protein can be so destabilizing that conformations containing such a group are often thermodynamically untenable.

Most of the structural patterns outlined in this chapter reflect these two simple rules: (1) hydrophobic residues must be buried in the protein interior and away from water, and (2) the number of hydrogen bonds must be maximized. Insoluble proteins and proteins within membranes (Chapter 10) follow somewhat different rules because of their function or their environment, but weak interactions are still critical structural elements.

Several types of secondary structure are particularly stable and occur widely in proteins. The most prominent are the α helix and β conformations described below. Using fundamental chemical principles and a few experimental observations, Linus Pauling and Robert Corey predicted the existence of these secondary structures in 1951, several years before the first complete protein structure was elucidated.

In considering secondary structure, it is useful to classify proteins into two major groups: fibrous proteins, having polypeptide chains arranged in long strands or sheets, and globular proteins, with polypeptide chains folded into a spherical or globular shape. Fibrous proteins play important structural roles in the anatomy and physiology of vertebrates, providing external protection, support, shape, and form. They may constitute one-half or more of the total body protein in larger animals. Most enzymes and peptide hormones are globular proteins. Globular proteins tend to be structurally complex, often containing several types of secondary structure; fibrous proteins usually consist largely of a single type of secondary structure. Because of this structural simplicity, certain fibrous proteins played a key role in the development of the modern understanding of protein structure and provide particularly clear examples of the relationship between structure and function; they are considered in some detail after the general discussion of secondary structure.
Figure 7–4  (a) The planar peptide group. Each peptide bond has some double-bond character due to resonance and cannot rotate. The carbonyl oxygen has a partial negative charge and the amide nitrogen a partial positive charge, setting up a small electric dipole. Note that the oxygen and hydrogen atoms in the plane are on opposite sides of the C–N bond. This is the trans configuration. Virtually all peptide bonds in proteins occur in this configuration, although an exception is noted in Fig. 7–10. (b) Three bonds separate sequential Cα carbons in a polypeptide chain. The N–Cα and Cα–C bonds can rotate, with bond angles designated Φ and ψ, respectively. (c) Limited rotation can occur around two of the three types of bonds in a polypeptide chain. The C–N bonds in the planar peptide groups (shaded in blue), which make up one-third of all the backbone bonds, are not free to rotate. Other single bonds in the backbone may also be rotationally hindered, depending on the size and charge of the R groups. (d) By convention, Φ and ψ are both defined as 0° when the two peptide bonds flanking an α carbon are in the same plane. In a protein, this conformation is prohibited by steric overlap between a carbonyl oxygen and an α-amino hydrogen atom.

Pauling and Corey began their work on protein structure in the late 1930s by first focusing on the structure of the peptide bond. The α carbons of adjacent amino acids are separated by three covalent bonds, arranged Cα–C–N–Cα. X-ray diffraction studies of crystals of amino acids and of simple dipeptides and tripeptides demonstrated that the amide C–N bond in a peptide is somewhat shorter than the C–N bond in a simple amine and that the atoms associated with the bond are coplanar. This indicated a resonance or partial sharing of two pairs of electrons between the carbonyl oxygen and the amide nitrogen (Fig. 7–4a).

The oxygen has a partial negative charge and the nitrogen a partial positive charge, setting up a small electric dipole. The four atoms of the peptide group lie in a single plane, in such a way that the oxygen atom of the carbonyl group and the hydrogen atom of the amide nitrogen are trans to each other. From these studies Pauling and Corey concluded that the amide C–N bonds are unable to rotate freely because of their partial double-bond character. The backbone of a polypeptide chain can thus be pictured as a series of rigid planes separated by substituted methylene groups, –CH(R)– (Fig. 7–4c). The rigid peptide bonds limit the number of conformations that can be assumed by a polypeptide chain.

Rotation is permitted about the N–Cα and the Cα–C bonds. By convention the bond angles resulting from rotations are labeled Φ (phi) for the N–Cα bond and ψ (psi) for the Cα–C bond. Again by convention, both Φ and ψ are defined as 0° in the conformation in which the two peptide bonds connected to a single α carbon are in the same plane, as shown in Figure 7–4d. In principle, Φ and ψ can have any value between –180° and +180°, but many values of Φ and ψ are prohibited by steric interference between atoms in the polypeptide backbone and amino acid side chains. The conformation in which Φ and ψ are both 0° is prohibited for this reason; it is used merely as a reference point for describing the angles of rotation.
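As an illustration of how such a rotation angle can be obtained from atomic coordinates, the following minimal sketch computes the signed angle defined by four consecutive atoms. The function name `dihedral` and the example coordinates are hypothetical. The atom choices usually used (the preceding carbonyl C, N, Cα, C for Φ; N, Cα, C, and the following N for ψ) are the customary definitions and are an assumption not spelled out in the text.

```python
import math

def dihedral(p0, p1, p2, p3):
    """Signed dihedral angle in degrees about the p1-p2 bond.

    Returns 0 when p0 and p3 are eclipsed (cis) and +/-180 when they
    are fully extended (trans).
    """
    def sub(a, b):
        return tuple(x - y for x, y in zip(a, b))

    def dot(a, b):
        return sum(x * y for x, y in zip(a, b))

    def cross(a, b):
        return (a[1] * b[2] - a[2] * b[1],
                a[2] * b[0] - a[0] * b[2],
                a[0] * b[1] - a[1] * b[0])

    b0 = sub(p0, p1)                      # points from p1 back to p0
    b1 = sub(p2, p1)                      # the central bond
    b2 = sub(p3, p2)
    length = math.sqrt(dot(b1, b1))
    b1 = tuple(x / length for x in b1)    # unit vector along the central bond

    # Components of b0 and b2 perpendicular to the central bond
    v = sub(b0, tuple(dot(b0, b1) * x for x in b1))
    w = sub(b2, tuple(dot(b2, b1) * x for x in b1))

    return math.degrees(math.atan2(dot(cross(b1, v), w), dot(v, w)))

# Hypothetical coordinates: central bond along z, quarter-turn twist
print(dihedral((1, 0, 0), (0, 0, 0), (0, 0, 1), (0, 1, 1)))
```

Applied to the four backbone atoms that define Φ or ψ in a real structure, the same function would give one coordinate of that residue on a Ramachandran plot.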

Every possible secondary structure is described completely by the two bond angles Φ and ψ that are repeated at each residue. Allowed values for Φ and ψ can be shown graphically by simply plotting Φ versus ψ, an arrangement known as a Ramachandran plot. The Ramachandran plot in Figure 7–5 shows the conformations permitted for most amino acid residues.

Figure 7–5  A Ramachandran plot. The theoretically allowed conformations of peptides are shown, defined by the values of Φ and ψ. The shaded areas reflect conformations that can be taken up by all amino acids (dark shading) or all except valine and isoleucine (medium shading); the lightest shading reflects conformations that are somewhat unstable but are found in some protein structures.
Figure 7–6  Four models of the α helix, showing different aspects of its structure. (a) Formation of a right-handed α helix. The planes of the rigid peptide bonds are parallel to the long axis of the helix. (b) Ball-and-stick model of a right-handed α helix, showing the intrachain hydrogen bonds. The repeat unit is a single turn of the helix, 3.6 residues. (c) The α helix as viewed from one end, looking down the longitudinal axis. Note the positions of the R groups, represented by red spheres. (d) A space-filling model of the α helix.

Pauling and Corey were aware of the importance of hydrogen bonds in orienting polar chemical groups such as the –C=O and –N–H groups of the peptide bond. They also had the experimental results of William Astbury, who in the 1930s had conducted pioneering x-ray studies of proteins. Astbury demonstrated that the protein that makes up hair and wool (the fibrous protein α-keratin) has a regular structure that repeats every 0.54 nm. With this information and their data on the peptide bond, and with the help of precisely constructed models, Pauling and Corey set out to determine the likely conformations of protein molecules.

The simplest arrangement the polypeptide chain could assume with its rigid peptide bonds (but with the other single bonds free to rotate) is a helical structure, which Pauling and Corey called the α helix (Fig. 7–6). In this structure the polypeptide backbone is tightly wound around the long axis of the molecule, and the R groups of the amino acid residues protrude outward from the helical backbone. The repeating unit is a single turn of the helix, which extends about 0.56 nm along the long axis, corresponding closely to the periodicity Astbury observed on x-ray analysis of hair keratin. The amino acid residues in an α helix have conformations with ψ = –45° to –50° and Φ = –60°, and each helical turn includes 3.6 amino acids. The twisting of the helix has a right-handed sense (Box 7–1) in the most common form of the α helix, although a very few left-handed variants have been observed.
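The two numbers quoted for the α helix (3.6 residues per turn and about 0.56 nm per turn) fix the rotation and the rise contributed by each residue. The short sketch below works through that arithmetic; the function and constant names are hypothetical, and the values are only as precise as the rounded figures in the text.

```python
RESIDUES_PER_TURN = 3.6   # from the text
PITCH_NM = 0.56           # length of one turn along the helix axis, from the text

def helix_geometry(residues_per_turn=RESIDUES_PER_TURN, pitch_nm=PITCH_NM):
    """Return (rotation per residue in degrees, rise per residue in nm)."""
    rotation_deg = 360.0 / residues_per_turn
    rise_nm = pitch_nm / residues_per_turn
    return rotation_deg, rise_nm

rotation_deg, rise_nm = helix_geometry()
print(f"rotation per residue: {rotation_deg:.0f} degrees")
print(f"rise per residue:     {rise_nm:.3f} nm")
```

With these inputs each residue turns the chain by about 100° and advances it by roughly 0.16 nm along the axis.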

The α helix is one of two prominent types of secondary structure in proteins. It is the predominant structure in α-keratins. In globular proteins, about one-fourth of all amino acid residues are found in α helices, the fraction varying greatly from one protein to the next.

Why does such a helix form more readily than many other possible conformations? The answer is, in part, that it makes optimal use of internal hydrogen bonds. The structure is stabilized by a hydrogen bond between the hydrogen atom attached to the electronegative nitrogen atom of each peptide linkage and the electronegative carbonyl oxygen atom of the fourth amino acid on the amino-terminal side of it in the helix (Fig. 7–6b). Every peptide bond of the chain participates in such hydrogen bonding. Each successive coil of the α helix is held to the adjacent coils by several hydrogen bonds, which in summation give the entire structure considerable stability.
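The hydrogen-bonding pattern just described can be enumerated for an idealized, isolated helical segment. In this sketch residues are numbered from the amino terminus, and the N–H of residue i is paired with the C=O of residue i – 4; the function names are hypothetical, and real helices embedded in a longer chain can differ at their ends.

```python
def helix_hbonds(n_residues):
    """(donor N-H residue, acceptor C=O residue) pairs in an ideal alpha helix."""
    return [(i, i - 4) for i in range(5, n_residues + 1)]

def unpaired_groups(n_residues):
    """Residues whose N-H or C=O group has no partner inside the segment."""
    bonds = helix_hbonds(n_residues)
    donors = {donor for donor, _ in bonds}
    acceptors = {acceptor for _, acceptor in bonds}
    residues = range(1, n_residues + 1)
    free_nh = [i for i in residues if i not in donors]
    free_co = [i for i in residues if i not in acceptors]
    return free_nh, free_co

print(helix_hbonds(12))
print(unpaired_groups(12))
```

For a 12-residue segment this leaves the N–H groups of the first four residues and the C=O groups of the last four without partners inside the helix.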

Further model-building experiments have shown that an α helix can form with either L- or D-amino acids. However, all residues must be of one stereoisomeric series; a D-amino acid will disrupt a regular structure consisting of L-amino acids, and vice versa. Naturally occurring L-amino acids can form either right- or left-handed helices, but, with rare exceptions, only right-handed helices are found in proteins.

B O X  7–1
Knowing the Right Hand from the Left

There is a simple method for determining the handedness of a helical structure, whether right-handed or left-handed. Make fists of your two hands with thumbs outstretched and pointing away from you. Looking at your right hand, think of a helix spiraling away in the direction indicated by your right thumb, and the spiral occurring in the direction in which the other four fingers are curled as shown (clockwise). The resulting helix is right-handed. Repeating the process with your left hand will produce an image of a left-handed helix, which rotates in the counterclockwise direction as it spirals away from you.
Figure 7–7  Interactions between R groups of amino acids three residues apart in an α helix. An ionic interaction between Asp100 and Arg103 in an α-helical region of the protein troponin C is shown in this space-filling model. The polypeptide backbone (carbons, α-amino nitrogens, and α-carbonyl oxygens) is shown in white for a helix segment about 12 amino acids long. The only side chains shown are the interacting Asp and Arg residues, with the aspartate in red and the arginine in blue. The side chain interaction illustrated occurs within the white connecting helix in Fig. 7–3.
Figure 7–8  The electric dipole of a peptide bond (Fig. 7–4a) is transmitted along an α-helical segment through the intrachain hydrogen bonds, resulting in an overall helix dipole. In this illustration, the amino and carbonyl constituents of each peptide bond are indicated by + and – symbols, respectively. Unbonded amino and carbonyl constituents in the peptide bonds near either end of the α-helical region are shown in red.

Not all polypeptides can form a stable α helix. Additional interactions occur between amino acid side chains that can stabilize or destabilize this structure. For example, if a polypeptide chain has many Glu residues in a long block, this segment of the chain will not form an α helix at pH 7.0. The negatively charged carboxyl groups of adjacent Glu residues repel each other so strongly that they overcome the stabilizing influence of hydrogen bonds on the α helix. For the same reason, if there are many adjacent Lys and/or Arg residues, with positively charged R groups at pH 7.0, they will also repel each other and prevent formation of the α helix. The bulk and shape of certain R groups can also destabilize the α helix or prevent its formation. For example, Asn, Ser, Thr, and Leu residues tend to prevent formation of the α helix if they occur close together in the chain.

The twist of an α helix ensures that critical interactions occur between an amino acid side chain and the side chain three (and sometimes four) residues away on either side of it (Fig. 7–7). Positively charged amino acids are often found three residues away from negatively charged amino acids, permitting the formation of an ionic interaction. Two aromatic amino acids are often similarly spaced, resulting in a hydrophobic interaction.
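Why three and four residues? A rough geometric sketch, assuming an ideal helix with 3.6 residues per turn, is to ask how far apart two side chains point when the helix is viewed end on. The function name `angular_offset_deg` is hypothetical.

```python
def angular_offset_deg(k, residues_per_turn=3.6):
    """Angle between the side chains of residues i and i+k, viewed down
    the helix axis and folded into the range 0 to 180 degrees."""
    angle = (k * 360.0 / residues_per_turn) % 360.0
    return min(angle, 360.0 - angle)

for k in range(1, 6):
    print(k, round(angular_offset_deg(k)))
```

Among close neighbors, only the residues three and four positions away fall within about 60° of the starting side chain, so they project from nearly the same face of the helix roughly one turn along.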

A minor constraint on the formation of the α helix is the presence of Pro residues. In proline the nitrogen atom is part of a rigid ring (Fig. 5–6), and rotation about the N–Cα bond is not possible. In addition, the nitrogen atom of a Pro residue in peptide linkage has no substituent hydrogen with which to form a hydrogen bond to other residues. For these reasons, proline is only rarely found within an α helix.

A final factor affecting the stability of an α helix is the identity of the amino acids located near the ends of the α-helical segment of a polypeptide. A small electric dipole exists in each peptide bond (see Fig. 7–4). These dipoles add across the hydrogen bonds in the helix so that the net dipole increases as helix length increases (Fig. 7–8). The four amino acids at either end of the helix do not participate fully in the helix hydrogen bonds. The partial positive and negative charges of the helix dipole actually reside on the peptide amino and carbonyl groups near the amino-terminal and carboxyl-terminal ends of the helix, respectively. For this reason, negatively charged amino acids are often found near the amino terminus of the helical segment, where they have a stabilizing interaction with the positive charge of the helix dipole; a positively charged amino acid at the amino-terminal end is destabilizing. The opposite is true at the carboxyl-terminal end of the helical segment.

Thus there are five different kinds of constraints that affect the stability of an α helix: (1) the electrostatic repulsion (or attraction) between amino acid residues with charged R groups, (2) the bulkiness of adjacent R groups, (3) the interactions between amino acid side chains spaced three (or four) residues apart, (4) the occurrence of Pro residues, and (5) the interaction between amino acids at the ends of the helix and the electric dipole inherent to this structure.
Figure 7–9  The β conformation of polypeptide chains. Views show the R groups extending out from the β pleated sheet and emphasize the pleated sheet described by the planes of the peptide bonds. Hydrogen-bond cross-links between adjacent chains are also shown. (a) Antiparallel β sheets, in which the amino-terminal to carboxyl-terminal orientation of adjacent chains (arrows) is inverse. (b) Parallel β sheets. (c) Silk fibers are made up of the protein fibroin. Its structure consists of layers of antiparallel β sheets rich in Ala (purple) and Gly (yellow) residues. The small side chains interdigitate and allow close packing of each layered sheet, as shown in this side view.

Pauling and Corey predicted a second type of repetitive structure, the β conformation. This is the more extended conformation of the polypeptide chains, as seen in the silk protein fibroin (a member of a class of fibrous proteins called β-keratins), and its structure has been confirmed by x-ray analysis. In the β conformation, which like the α helix is common in proteins, the backbone of the polypeptide chain is extended into a zigzag rather than helical structure (Fig. 7–9). In fibroin the zigzag polypeptide chains are arranged side by side to form a structure resembling a series of pleats; such a structure is called a β pleated sheet. In the β conformation the hydrogen bonds can be either intrachain, or interchain between the peptide linkages of adjacent polypeptide chains. All the peptide linkages of β-keratin participate in interchain hydrogen bonding. The R groups of adjacent amino acids protrude in opposite directions from the zigzag structure, creating an alternating pattern as seen in the side view (Fig. 7–9c).

The adjacent polypeptide chains in a β pleated sheet can be either parallel (having the same amino-to-carboxyl polypeptide orientation) or antiparallel (having the opposite amino-to-carboxyl orientation). The structures are similar, although the repeat period is shorter for the parallel conformation (0.65 nm, as opposed to 0.7 nm for antiparallel).

In some structural situations there are limitations to the kinds of amino acids that can occur in the β structure. When two or more pleated sheets are layered closely together within a protein, the R groups of the amino acid residues on the contact surfaces must be relatively small. β-Keratins such as silk fibroin and the protein of spider webs have a very high content of Gly and Ala residues, those with the smallest R groups. Indeed, in silk fibroin Gly and Ala alternate over large parts of the sequence (Fig. 7–9c).

Figure 7–10  Structure of a β turn or β bend. (a) Note the hydrogen bond between the peptide groups of the first and fourth residues involved in the bend. (b) The trans and cis isomers of a peptide bond involving the imino nitrogen of proline. Over 99.95% of the peptide bonds between amino acid residues other than Pro are in the trans configuration. About 6% of the peptide bonds involving the imino nitrogen of proline, however, are in the cis configuration, and many of these occur at β turns.

The α helix and the β conformation are the major repetitive secondary structures easily recognized in a wide variety of proteins. Other repetitive structures exist, often in only one or a few specialized proteins. An example is the collagen helix (see Fig. 7–14). One other type of secondary structure is common enough to deserve special mention. This is a β bend or β turn (Fig. 7–10), often found where a polypeptide chain abruptly reverses direction. (These turns often connect the ends of two adjacent segments of an antiparallel β pleated sheet, hence the name.) The structure is a tight turn (~180°) involving four amino acids. The peptide groups flanking the first amino acid are hydrogen bonded to the peptide groups flanking the fourth. Gly and Pro residues often occur in β turns, the former because it is small and flexible; and the latter because peptide bonds involving the imino nitrogen of proline readily assume the cis configuration (Fig. 7–10b), a form that is particularly amenable to a tight turn. β Turns are often found near the surface of a protein.

Figure 7–11  A Ramachandran plot. The values of Φ and ψ for the various secondary structures are overlaid on the plot from Fig. 7–5.

The α helix and β conformation are stable because steric repulsion is minimized and hydrogen bonding is maximized. As shown by a Ramachandran plot, these structures fall within a range of sterically allowed structures that is relatively restricted. Values of Φ and ψ for common secondary structures are shown in Figure 7–11. Most values of Φ and ψ for amino acid residues, taken from known protein structures, fall into the expected regions, with high concentrations near the α helix and β conformation values as expected. The only amino acid often found in a conformation outside these regions is glycine. Because its hydrogen side chain is small, a Gly residue can take up many conformations that are sterically forbidden for other amino acids.

Some amino acids are accommodated in the different types of secondary structures better than others. An overall summary is presented in Figure 7–12. Some biases, such as the presence of Pro and Gly residues in β turns, can be explained readily; other evident biases are not understood.

Figure 7–12  Relative probabilities that a given amino acid will occur in the three common types of secondary structure.
Figure 7–13  (a) Hair α-keratin is an elongated α helix with somewhat thicker domains near the amino and carboxy termini. Pairs of these helices are interwound, probably in a left-handed sense, to form two-chain coiled coils. These then combine in higher-order structures called protofilaments and protofibrils, as shown in (b). (About four protofibrils combine to form a filament.) The individual two-chain coiled coils in the various substructures also appear to be interwound, but the handedness of the interwinding and other structural details are unknown.
Figure 7–14  Structure of collagen. The collagen helix is a repeating secondary structure unique to this protein. (a) The repeating tripeptide sequence Gly–X–Pro or Gly–X–Hyp adopts a left-handed helical structure with three residues per turn. The repeating sequence used to generate this model is Gly–Pro–Hyp. (b) Space-filling model of the collagen helix shown in (a). (c) Three of these helices wrap around one another with a right-handed twist. The resulting three-stranded molecule is referred to as tropocollagen (see Fig. 7–15). (d) The three-stranded collagen superhelix shown from one end, in a ball-and-stick representation. Glycine residues are shown in red. Glycine, because of its small size, is required at the tight junction where the three chains are in contact.
Figure 7–15  The structure of collagen fibers. Tropocollagen (Mr 300,000) is a rod-shaped molecule, about 300 nm long and only 1.5 nm thick. The three helically intertwined polypeptides are of equal length, each having about 1,000 amino acid residues. In some collagens all three chains are identical in amino acid sequence, but in others two chains are identical and the third differs. The heads of adjacent molecules are staggered, and the alignment of the head groups of every fourth molecule produces characteristic cross-striations 64 nm apart that are evident in an electron micrograph.
Figure 7–16  Tropoelastin molecules and their linkage to form a network of polypeptide chains in elastin. Elastin consists of tropoelastin molecules cross-linked to give two-dimensional or three-dimensional elasticity. In addition to desmosine residues (in red), which can link two, three, or four tropoelastin molecules, as shown, elastin contains other kinds of cross-linkages, such as lysinonorleucine, also designated in red.

α-Keratin, collagen, and elastin provide clear examples of the relationship between protein structure and biological function (Table 7–1). These proteins share properties that give strength and/or elasticity to structures in which they occur. They have relatively simple structures, and all are insoluble in water, a property conferred by a high concentration of hydrophobic amino acids both in the interior of the protein and on the surface. These proteins represent an exception to the rule that hydrophobic groups must be buried. The hydrophobic core of the molecule therefore contributes less to structural stability, and covalent bonds assume an especially important role.

α-Keratin and collagen have evolved for strength. In vertebrates, α-keratins constitute almost the entire dry weight of hair, wool, feathers, nails, claws, quills, scales, horns, hooves, tortoise shell, and much of the outer layer of skin. Collagen is found in connective tissue such as tendons, cartilage, the organic matrix of bones, and the cornea of the eye. The polypeptide chains of both proteins have simple helical structures. The α-keratin helix is the right-handed α helix found in many other proteins (Fig. 7–13). However, the collagen helix is unique. It is left-handed (see Box 7–1) and has three amino acid residues per turn (Fig. 7–14). In both α-keratin and collagen, a few amino acids predominate. α-Keratin is rich in the hydrophobic residues Phe, Ile, Val, Met, and Ala. Collagen is 35% Gly, 11% Ala, and 21% Pro and Hyp (hydroxyproline; see Fig. 5–8). The unusual amino acid content of collagen is imposed by structural constraints unique to the collagen helix. The amino acid sequence in collagen is generally a repeating tripeptide unit, Gly–X–Pro or Gly–X–Hyp, where X can be any amino acid. The food product gelatin is derived from collagen. Although it is protein, it has little nutritional value because collagen lacks significant amounts of many amino acids that are essential in the human diet.

In both α-keratin and collagen, strength is amplified by wrapping multiple helical strands together in a superhelix, much the way strings are twisted to make a strong rope (Figs. 7–13, 7–14). In both proteins the helical path of the supertwists is opposite in sense to the twisting of the individual polypeptide helices, a conformation that permits the closest possible packing of the multiple polypeptide chains. The superhelical twisting is probably left-handed in α-keratin (Fig. 7–13) and right-handed in collagen (Fig. 7–14). The tight wrapping of the collagen triple helix provides great tensile strength with no capacity to stretch: Collagen fibers can support up to 10,000 times their own weight and are said to have greater tensile strength than a steel wire of equal cross section.

The strength of these structures is also enhanced by covalent cross-links between polypeptide chains within the multi-helical "ropes" and between adjacent ones. In α-keratin, the cross-links are contributed by disulfide bonds (Box 7–2). In the hardest and toughest α-keratins, such as those of tortoise shells and rhinoceros horns, up to 18% of the residues are cysteines involved in disulfide bonds. The arrangement of α-keratin to form a hair fiber is shown in Figure 7–13. In collagen, the cross-links are contributed by an unusual type of covalent link between two Lys residues that creates a nonstandard amino acid residue called lysinonorleucine, found only in certain fibrous proteins.

Collagen fibrils consist of recurring three-stranded polypeptide units called tropocollagen, arranged head to tail in parallel bundles (Fig. 7–15). The rigid, brittle character of the connective tissue in older people is the result of an accumulation of covalent cross-links in collagen as we age.
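The tropocollagen dimensions quoted in Figure 7–15 allow a quick consistency check on how extended each chain is. The sketch below uses only the rounded figures given there and in the α-helix discussion; the function names are hypothetical, and the result is a rough estimate that treats the whole molecule as uniformly helical.

```python
def mean_residue_mass(mr_total, n_chains, residues_per_chain):
    """Average relative mass contributed by one residue."""
    return mr_total / (n_chains * residues_per_chain)

def rise_per_residue_nm(length_nm, residues_per_chain):
    """Average advance along the molecular axis per residue."""
    return length_nm / residues_per_chain

# Tropocollagen: Mr 300,000; three chains of about 1,000 residues; about 300 nm long
collagen_mass = mean_residue_mass(300_000, 3, 1_000)
collagen_rise = rise_per_residue_nm(300.0, 1_000)

# Alpha helix, from the values quoted earlier: 0.56 nm per turn, 3.6 residues per turn
alpha_rise = 0.56 / 3.6

print(f"mean residue mass in collagen: {collagen_mass:.0f}")
print(f"collagen rise per residue:     {collagen_rise:.2f} nm")
print(f"ratio to alpha-helix rise:     {collagen_rise / alpha_rise:.1f}")
```

On these numbers each collagen chain advances about 0.3 nm per residue, nearly twice the rise of an α helix, which is consistent with a much more extended chain.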

Human genetic defects involving collagen illustrate the close relationship between amino acid sequence and three-dimensional structure in this protein. Osteogenesis imperfecta results in abnormal bone formation in human babies. Ehlers–Danlos syndrome is characterized by loose joints. Both can be lethal and both result from the substitution of a Cys or Ser residue, respectively, for a Gly (a different Gly residue in each case) in the amino acid sequence of collagen. These seemingly small substitutions have a catastrophic effect on collagen function because they disrupt the Gly–X–Pro repeat that gives collagen its unique helical structure.

Elastic connective tissue contains the fibrous protein elastin, which resembles collagen in some of its properties but is very different in others. The polypeptide subunit of elastin fibrils is tropoelastin (Mr 72,000), containing about 800 amino acid residues. Like collagen, it is rich in Gly and Ala residues. Tropoelastin differs from tropocollagen in having many Lys but few Pro residues; it forms a special type of helix, different from the α helix and the collagen helix. Tropoelastin consists of lengths of helix rich in Gly residues separated by short regions containing Lys and Ala residues. The helical portions stretch on applying tension but revert to their original length when tension is released.

The regions containing Lys residues form covalent cross-links. Four Lys side chains come together and are enzymatically converted into desmosine (see Fig. 5–8) and a related compound, isodesmosine; these amino acids are found only in elastin. Lysinonorleucine (p. 173) also occurs in elastin. These nonstandard amino acids are capable of joining tropoelastin chains into arrays that can be stretched reversibly in all directions (Fig. 7–16).
B O X  7–2
Permanent Waving Is Biochemical Engineering

α-Keratins exposed to moist heat can be stretched into the β conformation, but on cooling revert to the α-helical conformation spontaneously. This is because the R groups of α-keratins are larger on average than those of β-keratins and thus are not compatible with a stable β conformation. This characteristic of α-keratins, as well as their content of disulfide cross-linkages, is the basis of permanent waving. The hair to be waved is first bent around a form of appropriate shape. A solution of a reducing agent, usually a compound containing a thiol or sulfhydryl group (–SH), is then applied with heat. The reducing agent cleaves the disulfide cross-linkages by reducing each cystine to two cysteine residues, one in each adjacent chain. The moist heat breaks hydrogen bonds and causes the α-helical structure of the polypeptide chains to uncoil and stretch. After a time the reducing solution is removed, and an oxidizing agent is added to establish new disulfide bonds between pairs of Cys residues of adjacent polypeptide chains, but not the same pairs that existed before the treatment. On washing and cooling the hair, the polypeptide chains revert to their α-helical conformation. The hair fibers now curl in the desired fashion because new disulfide cross-linkages have been formed where they will exert some torsion or twist on the bundles of α-helical coils in the hair fibers.

Although fibrous proteins generally have only one type of secondary structure, globular proteins can incorporate several types of secondary structure in the same molecule. Globular proteins – including enzymes, transport proteins, some peptide hormones, and immunoglobulins – are folded structures much more compact than α or β conformations (as shown for serum albumin in Figure 7–17).

The three-dimensional arrangement of all atoms in a protein is referred to as the tertiary structure, and this now becomes our focus. Whereas the secondary structure of polypeptide chains is determined by the short-range structural relationship of amino acid residues, tertiary structure is conferred by longer-range aspects of amino acid sequence. Amino acids that are far apart in the polypeptide sequence and reside in different types of secondary structure may interact when the protein is folded. The formation of bends in the polypeptide chain during folding and the direction and angle of these bends are determined by the number and location of specific bend-producing amino acids, such as Pro, Thr, Ser, and Gly residues. Moreover, loops of the highly folded polypeptide chain are held in their characteristic tertiary positions by different kinds of weak-bonding interactions (and sometimes by covalent bonds such as disulfide cross-links) between R groups of adjacent loops.

We will now consider how secondary structures contribute to the tertiary folding of a polypeptide chain in a globular protein, and how this structure is stabilized by weak interactions, in particular by hydrophobic interactions involving nonpolar amino acid side chains in the tightly packed core of the protein.

Figure 7–18  The heme group, present in myoglobin, hemoglobin, cytochrome b, and many other heme proteins, consists of a complex organic ring structure, protoporphyrin, to which is bound an iron atom in its ferrous (Fe2+) state. Two representations are shown in (a) and (b). (c) The iron atom has six coordination bonds, four in the plane of, and bonded to, the flat porphyrin molecule and two perpendicular to it. (d) In myoglobin and hemoglobin, one of the perpendicular coordination bonds is bound to a nitrogen atom of a His residue. The other is “open” and serves as the binding site for an O2 molecule, as shown here in the edge view.
Figure 7–19  Tertiary structure of sperm whale myoglobin. The orientation of the protein is the same in all panels; the heme group is shown in red. (a) The polypeptide backbone, shown in a ribbon representation of a type introduced by Jane Richardson; this highlights regions of secondary structure. The α-helical regions in myoglobin are evident. Amino acid side chains are not shown. (b) A space-filling model, showing that the heme group is largely buried. All amino acid side chains are included. (c) A ribbon representation, including side chains (purple) for the hydrophobic residues Leu, Ile, Val, and Phe. (d) A space-filling model with all amino acid side chains. The hydrophobic residues are again shown in purple; most are not visible because they are buried in the interior of the protein.

The breakthrough in understanding globular protein structure came from x-ray diffraction studies of the protein myoglobin carried out by John Kendrew and his colleagues in the 1950s (Box 7–3). Myoglobin is a relatively small (Mr 16,700), oxygen-binding protein of muscle cells that functions in the storage and transport of oxygen for mitochondrial oxidation of cell nutrients. Myoglobin contains a single polypeptide chain of 153 amino acid residues of known sequence and a single iron-porphyrin, or heme, group (Fig. 7–18), identical to that of hemoglobin, the oxygen-binding protein of erythrocytes. The heme group is responsible for the deep red-brown color of both myoglobin and hemoglobin. Myoglobin is particularly abundant in the muscles of diving mammals such as the whale, seal, and porpoise, whose muscles are so rich in this protein that they are brown. Storage of oxygen by muscle myoglobin permits these animals to remain submerged for long periods of time.

Figure 7–19 shows several structural representations of myoglobin, illustrating how the polypeptide chain is folded in three dimensions – its tertiary structure. The backbone of the myoglobin molecule is made up of eight relatively straight segments of α helix interrupted by bends. The longest α helix has 23 amino acid residues and the shortest only seven; all are right-handed. More than 70% of the amino acids in the myoglobin molecule are in these α-helical regions. X-ray analysis also revealed the precise position of each of the R groups, which occupy nearly all the open space between the folded loops.

Other important conclusions were drawn from the structure of myoglobin. The positioning of amino acid side chains reflects a structure that derives much of its stability from hydrophobic interactions. Most of the hydrophobic R groups are in the interior of the myoglobin molecule, hidden from exposure to water. All but two of the polar R groups are located on the outer surface of the molecule, and all of them are hydrated. The myoglobin molecule is so compact that in its interior there is room for only four molecules of water. This dense hydrophobic core is typical of globular proteins. The fraction of space occupied by atoms in an organic liquid is 0.4 to 0.6; in a typical solid the fraction is 0.75. In a protein the fraction is 0.72 to 0.76, very comparable to that in a solid. In this closely packed environment weak interactions strengthen and reinforce each other. For example, the nonpolar side chains in the core are so close together that short-range van der Waals interactions make a significant contribution to stabilizing hydrophobic interactions. By contrast, in an oil droplet suspended in water, the van der Waals interactions are minimal and the cohesiveness of the droplet is based almost exclusively on entropy.

The structure of myoglobin both confirmed some expectations and introduced some new elements of secondary structure. As predicted by Pauling and Corey, all the peptide bonds are in the planar trans configuration. The α helices in myoglobin provided the first direct experimental evidence for the existence of this type of secondary structure. Each of the four Pro residues of myoglobin occurs at a bend (recall that the rigid R group of proline is largely incompatible with α-helical structure). Other bends contain Ser, Thr, and Asn residues, which are among the amino acids that tend to be incompatible with α-helical structure if they are in close proximity (p. 168).

The flat heme group rests in a crevice, or pocket, in the myoglobin molecule. The iron atom in the center of the heme group has two bonding (coordination) positions perpendicular to the plane of the heme. One of these is bound to the R group of the His residue at position 93; the other is the site to which an O2 molecule is bound. Within this pocket, the accessibility of the heme group to solvent is highly restricted. This is important for function because free heme groups in an oxygenated solution are rapidly oxidized from the ferrous (Fe2+) form, which is active in the reversible binding of O2, to the ferric (Fe3+) form, which does not bind O2.
Box 7–3
X-Ray Diffraction

The spacing of atoms in a crystal lattice can be determined by measuring the angles and the intensities at which a beam of x rays of a given wavelength is diffracted by the electron shells around the atoms. For example, x-ray analysis of sodium chloride crystals shows that Na+ and Cl− ions are arranged in a simple cubic lattice. The spacing of the different kinds of atoms in complex organic molecules, even very large ones such as proteins, can also be analyzed by x-ray diffraction methods. However, this is far more difficult than for simple salt crystals because the very large number of atoms in a protein molecule yields thousands of diffraction spots that must be analyzed by computer.

The process may be understood at an elementary level by considering how images are generated in a light microscope. Light from a point source is focused on an object. The light waves are scattered by the object, and these scattered waves are recombined by a series of lenses to generate an enlarged image of the object. The limit to the size of an object whose structure can be determined by such a system (i.e., its resolving power) is determined by the wavelength of the light. Objects smaller than half the wavelength of the incident light cannot be resolved. This is why x rays, with wavelengths in the range of a few tenths of a nanometer (often measured in angstroms, Å; 1 Å = 0.1 nm), must be used for proteins. There are no lenses that can recombine x rays to form an image; the pattern of diffracted light is collected directly and converted into an image by computer analysis.
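The half-wavelength rule above can be turned into a quick calculation. The following is a minimal sketch; the specific wavelengths and the bond length used for comparison are illustrative assumptions, not values given in the text.

```python
# Rough resolution limit: objects smaller than about half the wavelength
# of the incident radiation cannot be resolved (the rule stated above).

def resolution_limit_nm(wavelength_nm: float) -> float:
    """Smallest resolvable object size, taken as half the wavelength."""
    return wavelength_nm / 2.0

# Illustrative wavelengths (assumed values, not from the text).
VISIBLE_LIGHT_NM = 500.0   # green light
XRAY_NM = 0.15             # 1.5 angstroms

# Typical covalent bond length, assumed here for comparison (about 1.5 angstroms).
BOND_LENGTH_NM = 0.15

for name, wavelength in [("visible light", VISIBLE_LIGHT_NM), ("x rays", XRAY_NM)]:
    limit = resolution_limit_nm(wavelength)
    can_resolve = limit <= BOND_LENGTH_NM
    print(f"{name}: limit {limit:g} nm, resolves a bond: {can_resolve}")
```

Under these assumptions the limit for visible light is more than a thousand times larger than a bond length, whereas the limit for x rays falls below it.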

Operationally, there are several steps in x-ray structural analysis. The amount of information obtained depends on the degree of structural order in the sample. Some important structural parameters were obtained from early studies of the diffraction patterns of the fibrous proteins that occur in fairly regular arrays in hair and wool. More detailed three-dimensional structural information, however, requires a highly ordered crystal of a protein. Protein crystallization is something of an empirical science, and the structures of many important proteins are not yet known simply because they have proven difficult to crystallize. Once a crystal is obtained, it is placed in an x-ray beam between the x-ray source and a detector. A regular array of spots called reflections (Fig. 1) is generated by precessional motion of the crystal. The spots represent reflections of the x-ray beam, and each atom in a molecule makes a contribution to each spot. The overall pattern of spots is related to the structure of the protein through a mathematical device called a Fourier transform. The intensity of each spot is measured, and from the positions and intensities of the spots in several of these diffraction patterns the precise three-dimensional structure of the protein is calculated.
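The relationship between atoms, spots, and the Fourier transform can be illustrated with a toy calculation. The sketch below assumes a one-dimensional "crystal" with three point atoms whose positions and scattering weights are invented; it is not the procedure used for a real protein, where only intensities are recorded and the phases must be obtained by other means.

```python
import cmath
import math

# Toy one-dimensional unit cell of N grid points. Atom positions and
# scattering weights are invented for illustration only.
N = 32
atoms = {5: 1.0, 12: 2.0, 20: 1.5}   # grid position -> scattering weight

def structure_factor(h: int) -> complex:
    """Sum the contributions of every atom to reflection h."""
    return sum(w * cmath.exp(2j * math.pi * h * x / N) for x, w in atoms.items())

# Each "spot" has an amplitude and a phase; a detector records the intensity.
F = [structure_factor(h) for h in range(N)]
intensities = [abs(f) ** 2 for f in F]

def density(x: int) -> float:
    """Inverse Fourier sum: rebuild the density at grid point x."""
    total = sum(F[h] * cmath.exp(-2j * math.pi * h * x / N) for h in range(N))
    return (total / N).real

rho = [density(x) for x in range(N)]
# The three grid points with the highest reconstructed density.
peaks = sorted(range(N), key=lambda x: rho[x], reverse=True)[:3]
print(sorted(peaks))
```

Every atom appears in the sum for every reflection, and the inverse sum over all reflections places the density back at the atom positions.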

John Kendrew found that the x-ray diffraction pattern of crystalline myoglobin from muscles of the sperm whale is very complex, with nearly 25,000 reflections. Computer analysis of these reflections took place in stages. The resolution improved at each stage, until in 1959 the positions of virtually all the atoms in the protein could be determined. The amino acid sequence deduced from the structure agreed with that obtained by chemical analysis. The structures of hundreds of proteins have since been determined to a similar level of resolution, many of them much more complex than myoglobin.

Figure 7–20  The three-dimensional structures of three small proteins: cytochrome c, lysozyme, and ribonuclease. For lysozyme and ribonuclease the active site of the enzyme faces the viewer. Key functional groups (the heme in cytochrome c, and amino acid side chains in the active site of lysozyme and ribonuclease) are shown in red; disulfide bonds are shown in yellow. Two representations of each protein are shown: a space-filling model and a ribbon representation. In the ribbon depictions, the β structures are represented by flat arrows and the α helices by spiral ribbons; the orientation in each case is the same as that of the space-filling model, to facilitate comparison.

With the elucidation of the tertiary structures of hundreds of other globular proteins by x-ray analysis, it is clear that myoglobin represents only one of many ways in which a polypeptide chain can be folded. In Figure 7–20 the structures of cytochrome c, lysozyme, and ribonuclease are compared. All have different amino acid sequences and different tertiary structures, reflecting differences in function. Like myoglobin, cytochrome c is a small heme protein (Mr 12,400) containing a single polypeptide chain of about 100 residues and a single heme group, which in this case is covalently attached to the polypeptide. It functions as a component of the respiratory chain of mitochondria (Chapter 18). X-ray analysis of cytochrome c (Fig. 7–20) shows that only about 40% of the polypeptide is in α-helical segments, compared with almost 80% of the myoglobin chain. The rest of the cytochrome c chain contains bends, turns, and irregularly coiled and extended segments. Thus, cytochrome c and myoglobin differ markedly in structure, even though both are small heme proteins.

Lysozyme (Mr 14,600) is an enzyme in egg white and human tears that catalyzes the hydrolytic cleavage of polysaccharides in the protective cell walls of some families of bacteria. Lysozyme is so named because it can lyse, or degrade, bacterial cell walls and thus serve as a bactericidal agent. Like cytochrome c, about 40% of its 129 amino acid residues are in α-helical segments, but the arrangement is different and some β structure is also present. Four disulfide bonds contribute stability to this structure. The α helices line a long crevice in the side of the molecule (Fig. 7–20), called the active site, which is the site of substrate binding and action. The bacterial polysaccharide that is the substrate for lysozyme fits into this crevice.

Ribonuclease, another small globular protein (Mr 13,700), is an enzyme secreted by the pancreas into the small intestine, where it catalyzes the hydrolysis of certain bonds in the ribonucleic acids present in ingested food. Its tertiary structure, determined by x-ray analysis, shows that little of its 124 amino acid polypeptide chain is in α-helical conformation, but it contains many segments in the β conformation. Like lysozyme, ribonuclease has four disulfide bonds between loops of the polypeptide chain (Fig. 7–20).

Table 7–2 shows the relative percentages of α helix and β conformation among several small, single-chain, globular proteins. Each of these proteins has a distinct structure, adapted for its particular biological function. These proteins do share several important properties, however. Each is folded compactly, and in each case the hydrophobic amino acid side chains are oriented toward the interior (away from water) and the hydrophilic side chains are on the surface. These specific structures are also stabilized by a multitude of hydrogen bonds and some ionic interactions.

The way to demonstrate the importance of a specific protein structure for biological function is to alter the structure and determine the effect on function. One extreme alteration is the total loss or randomization of three-dimensional structure, a process called denaturation. This is the familiar process that occurs when an egg is cooked. The white of the egg, which contains the soluble protein egg albumin, coagulates to a white solid on heating. It will not redissolve on cooling to yield a clear solution of protein as in the original unheated egg white. Heating of egg albumin has therefore changed it, seemingly in an irreversible manner. This effect of heat occurs with virtually all globular proteins, regardless of their size or biological function, although the precise temperature at which it occurs may vary and it is not always irreversible. The change in structure brought about by denaturation is almost invariably associated with loss of function. This is an expected consequence of the principle that the specific three-dimensional structure of a protein is critical to its function.

Proteins can be denatured not only by heat, but also by extremes of pH, by certain miscible organic solvents such as alcohol or acetone, by certain solutes such as urea, or by exposure of the protein to detergents. Each of these denaturing agents represents a relatively mild treatment in the sense that no covalent bonds in the polypeptide chain are broken. Boiling a protein solution disrupts a variety of weak interactions. Organic solvents, urea, and detergents act primarily by disrupting the hydrophobic interactions that make up the stable core of globular proteins; extremes of pH alter the net charge on the protein, causing electrostatic repulsion and disruption of some hydrogen bonding. Remember that the native structure of most proteins is only marginally stable. It is not necessary to disrupt all of the stabilizing weak interactions to reduce the thermodynamic stability to a level that is insufficient to keep the protein conformation intact.
Figure 7–21  Renaturation of unfolded, denatured ribonuclease, with reestablishment of correct disulfide cross-links. Urea is added to denature ribonuclease, and mercaptoethanol (HOCH2CH2SH) to reduce and thus cleave the disulfide bonds of the four cystine residues to yield eight cysteine residues.

The most important proof that the tertiary structure of a globular protein is determined by its amino acid sequence came from experiments showing that denaturation of some proteins is reversible. Some globular proteins denatured by heat, extremes of pH, or denaturing reagents will regain their native structure and their biological activity, a process called renaturation, if they are returned to conditions in which the native conformation is stable.

A classic example is the denaturation and renaturation of ribonuclease. Purified ribonuclease can be completely denatured by exposure to a concentrated urea solution in the presence of a reducing agent. The reducing agent cleaves the four disulfide bonds to yield eight Cys residues, and the urea disrupts the stabilizing hydrophobic interactions, thus freeing the entire polypeptide from its folded conformation. Under these conditions the enzyme loses its catalytic activity and undergoes complete unfolding to a randomly coiled form (Fig. 7–21). When the urea and the reducing agent are removed, the randomly coiled, denatured ribonuclease spontaneously refolds into its correct tertiary structure, with full restoration of its catalytic activity (Fig. 7–21). The refolding of ribonuclease is so accurate that the four intrachain disulfide bonds are reformed in the same positions in the renatured molecule as in the native ribonuclease. In theory, the eight Cys residues could have recombined at random to form up to four disulfide bonds in 105 different ways. This classic experiment, carried out by Christian Anfinsen in the 1950s, proves that the amino acid sequence of the polypeptide chain of proteins contains all the information required to fold the chain into its native, three-dimensional structure.
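The figure of 105 follows from counting the ways to pair eight Cys residues into four disulfide bonds. The function below is a minimal sketch of that count; the function name is hypothetical.

```python
def disulfide_pairings(n_cys: int) -> int:
    """Number of ways to pair n_cys cysteines into n_cys / 2 disulfide bonds."""
    if n_cys % 2:
        raise ValueError("need an even number of Cys residues")
    ways = 1
    # The first free Cys can pair with any of (n - 1) partners, the next
    # free one with any of (n - 3), and so on: (n - 1) * (n - 3) * ... * 1.
    for partners in range(n_cys - 1, 0, -2):
        ways *= partners
    return ways

print(disulfide_pairings(8))  # 7 * 5 * 3 * 1 = 105
```

Only one of these 105 pairings is the native one, so random recombination would give the correct set of cross-links less than 1% of the time.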

The study of homologous proteins has strengthened this conclusion. We have seen that in a series of homologous proteins, such as cytochrome c, from different species, the amino acid residues at certain positions in the sequence are invariant, whereas at other positions the amino acids may vary (see Fig. 6–15). This is also true for myoglobins isolated from different species of whales, from the seal, and from some terrestrial vertebrates. The similarity of the tertiary structures and amino acid sequences of myoglobins from different sources led to the conclusion that the amino acid sequence of myoglobin somehow must determine its three-dimensional folding pattern, an idea substantiated by the similar structures found by x-ray analysis of myoglobins from different species. Other sets of homologous proteins also show this relationship; in each case there are sequence homologies as well as similar tertiary structures.

Many of the invariant amino acid residues of homologous proteins appear to occur at critical points along the polypeptide chain. Some are found at or near bends in the chain, others at cross-linking points between loops in the tertiary structure, such as Cys residues involved in disulfide bonds. Still others occur at the catalytic sites of enzymes or at the binding sites for prosthetic groups, such as the heme group of cytochrome c.

Looking at naturally occurring amino acid substitutions has an important limitation. Any change that abolishes the function of an essential protein (e.g., a change in an invariant residue) usually results in death of the organism very early in development. This severe form of natural selection eliminates many potentially informative changes from study. Fortunately, biochemists have devised methods to specifically alter amino acid sequences in the laboratory and examine the effects of these changes on protein structure and function. These methods are derived from recombinant DNA technology (Chapter 28) and rely on altering the genetic material encoding the protein. By this process, called site-directed mutagenesis, specific amino acid sequences can be changed by deleting, adding, rearranging, or substituting amino acid residues. The catalytic roles of certain amino acids lining the active sites of enzymes such as triose phosphate isomerase and chymotrypsin have been elucidated by substituting different amino acids in their place. The importance of certain amino acids in protein folding and structure is being addressed in the same way.

Although the native tertiary conformation of a globular protein is the thermodynamically most stable form its polypeptide chain can assume, this conformation must not be regarded as absolutely rigid. Globular proteins have a certain amount of flexibility in their backbones and undergo short-range internal fluctuations. Many globular proteins also undergo small conformational changes in the course of their biological function. In many instances, these changes are associated with the binding of a ligand. The term ligand in this context refers to a specific molecule that is bound by a protein (from Latin, ligare, "to tie" or "bind"). For example, the hemoglobin molecule, which we shall examine later in this chapter, has one conformation when oxygen is bound, and another when the oxygen is released. Many enzyme molecules also undergo a conformational change on binding their substrates, a process that is part of their catalytic action (Chapter 8).

Figure 7–22  A possible protein-folding pathway. (a) Protein folding often begins with spontaneous formation of a structural nucleus consisting of a few particularly stable regions of secondary structure. (b) As other regions adopt secondary structure, they are stabilized by long-range interactions with the structural nucleus. (c) The folding process continues until most of the polypeptide has assumed regular secondary structure. (d) The final structure generally represents the most thermodynamically stable conformation.
Figure 7–23  Extended β chains of amino acids tend to twist in a right-handed sense because the slightly twisted conformation is more stable than the linear conformation (a). This influences the conformation of the polypeptide segments that connect two β strands, and also the stable conformations assumed by several adjacent β strands. (b) Connections between parallel β chains are right-handed. (c) The β turn is a common connector between antiparallel β chains. (d) The tendency for right-handed twisting is seen in two particularly stable arrangements of adjacent β chains: the β barrel and the saddle; these structures form the stable core of many proteins.
Figure 7–24  The βαβ loop. The shaded region denotes the area where stabilizing hydrophobic interactions occur.

In living cells, proteins are made from amino acids at a very high rate. For example, Escherichia coli cells can make a complete, biologically active protein molecule containing 100 amino acid residues in about 5 s at 37 °C. Yet calculations show that at least 10^50 yr would be required for a polypeptide chain of 100 amino acid residues to fold itself spontaneously by a random process in which it tries out all possible conformations around every single bond in its backbone until it finds its native, biologically active form. Thus protein folding cannot be a completely random, trial-and-error process. There simply must be shortcuts.
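An order-of-magnitude version of this calculation can be sketched as follows. The text gives only the conclusion, so the number of conformations per residue and the sampling rate below are assumptions chosen for illustration; other reasonable choices change the exponent but not the conclusion.

```python
import math

# Illustrative assumptions, not values from the text.
RESIDUES = 100                  # chain length used in the example above
CONFORMATIONS_PER_RESIDUE = 10  # assumed average per residue
SAMPLES_PER_SECOND = 1e13       # assumed rate of trying new conformations
SECONDS_PER_YEAR = 3.156e7

def log10_search_years(residues: int, per_residue: float, rate: float) -> float:
    """log10 of the years needed to try every conformation once."""
    log10_conformations = residues * math.log10(per_residue)
    return log10_conformations - math.log10(rate) - math.log10(SECONDS_PER_YEAR)

log_years = log10_search_years(RESIDUES, CONFORMATIONS_PER_RESIDUE, SAMPLES_PER_SECOND)
print(f"exhaustive random search: roughly 10^{log_years:.0f} yr")
print("observed synthesis and folding in E. coli: about 5 s")
```

With these assumed values the estimate is far above the lower bound quoted in the text, which is why an exhaustive search can be ruled out.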

The folding pathway of a large polypeptide chain is unquestionably complicated, and the principles that guide this process have not yet been worked out in detail. For several proteins, however, there is evidence that folding proceeds through several discrete intermediates, and that some of the earliest steps involve local folding of regions of secondary structure. In one model (Fig. 7–22), the process is envisioned as hierarchical, following the levels of structure outlined at the beginning of this chapter. Local secondary structures would form first, followed by longer-range interactions between, say, two α helices with compatible amino acid side chains, a process continuing until folding was complete. In an alternative model, folding is initiated by a spontaneous collapse of the polypeptide into a compact state mediated by hydrophobic interactions among nonpolar residues. The state resulting from this "hydrophobic collapse" may have a high content of secondary structure, but many amino acid side chains are not entirely fixed. Either or both models (and perhaps others) may apply to a given protein.

A number of structural constraints help to guide the interaction of regions of secondary structure. The most common patterns are sometimes referred to as supersecondary structures. A prominent one is a tendency for extended β conformations to twist in a right-handed sense (Fig. 7–23a). This influences both the arrangement of β sheets relative to one another and the path of the polypeptide segment connecting two β strands. Two parallel β strands, for example, must be connected by a crossover strand (Fig. 7–23b). In principle, this crossover could have a right- or left-handed conformation, but only the right-handed form is found in proteins. The twisting of β sheets also leads to a characteristic twisting of the structure formed when many sheets are put together. Two examples of resulting structures are the β barrel and saddle shapes (Fig. 7–23d), which form the core of many larger structures.

Weak-bonding interactions represent the ultimate thermodynamic constraint on the interaction of different regions of secondary structure. The R groups of amino acids project outward from α-helical and β structures, and thus the need to bury hydrophobic residues means that water-soluble proteins must have more than one layer of secondary structure. One simple structural method for burying hydrophobic residues is a supersecondary structural unit called a βαβ loop (Fig. 7–24), a structure often repeated multiple times in larger proteins. More elaborate structures are domains made up of facing β sheets (with hydrophobic residues sandwiched between), and β sheets covered on one side with several α helices, as described later.

It becomes more difficult to bury hydrophobic residues in smaller structures, and the number of potential weak interactions available for stabilization decreases. For this reason, smaller proteins are often held together with a number of covalent bonds, principally disulfide linkages. Recall the multiple disulfide bonds in the small proteins insulin (see Fig. 6–10) and ribonuclease (Fig. 7–21). Other types of covalent bonds also occur. The heme group in cytochrome c, for example, is covalently linked to the protein on two sides, providing a significant stabilization of the entire protein structure.

Not all proteins fold spontaneously as they are synthesized in the cell. Proteins that facilitate the folding of other proteins have been found in a wide variety of cells. These are called polypeptide chain binding proteins or molecular chaperones. Several of these proteins can bind to polypeptide chains, preventing nonspecific aggregation of weak-bonding side chains. They guide the folding of some polypeptides, as well as the assembly of multiple polypeptides into larger structures. Dissociation of polypeptide chain binding proteins from polypeptides is often coupled to ATP hydrolysis. One family of such proteins has structures that are highly conserved in organisms ranging from bacteria to mammals. These proteins (Mr 70,000), as well as several other families of polypeptide chain binding proteins, were originally identified as “heat shock” proteins because they are induced in many cells when heat stress is applied, and apparently help stabilize other proteins.

Some proteins have also been found that promote polypeptide folding by catalyzing processes that otherwise would limit the rate of folding, such as the reversible formation of disulfide bonds or proline isomerization (the interconversion of the cis and trans isomers of peptide bonds involving the imino nitrogen of proline; see Fig. 7–10).

Figure 7–25  Examples of some common structural motifs in proteins. (a) The α/β barrel, found in pyruvate kinase and triose phosphate isomerase, enzymes of the glycolytic pathway. This structure also occurs in the larger domain of ribulose-1,5-bisphosphate carboxylase/oxygenase (known also as rubisco), an enzyme essential to the fixation of CO2 by plants; in glycolate oxidase, an enzyme in photorespiration; and in a number of other unrelated proteins. (b) The four-helix bundle, shown here in cytochrome b562 and myohemerythrin. A dinuclear iron center and coordinating amino acids in myohemerythrin are shown in orange. Myohemerythrin is a nonheme oxygen-transporting protein found in certain worms and mollusks. The four-helix bundle is also found in apoferritin and the tobacco mosaic virus coat-protein. Apoferritin is a widespread protein involved in iron transport and storage. (c) αβ with saddle at core, in carboxypeptidase, a protein-hydrolyzing (proteolytic) enzyme, and lactate dehydrogenase, a glycolytic enzyme.
(d) β–β Sandwich. In the protein insecticyanin of moths, the hydrophobic pocket binds biliverdin, a colored substance that plays a role in camouflage. α1-Antitrypsin is a naturally occurring inhibitor of the proteolytic enzyme trypsin.

Following the folding patterns outlined above and others yet to be discovered, a newly synthesized polypeptide chain quickly assumes its most stable tertiary structure. Although each protein has a unique structure, several patterns of tertiary structure seem to occur repeatedly in proteins that differ greatly in biological function and amino acid sequence (Fig. 7–25). This may reflect an unusual degree of stability and/or functional flexibility conferred by these particular tertiary structures. It also demonstrates that biological function is determined not only by the overall three-dimensional shape of the protein, but also by the arrangement of amino acids within that shape.

One structural motif is made up of eight β strands arranged in a circle with each β strand connected to its neighbor by an α helix. The β regions are arranged in the barrel structure described in Figure 7–23, and they influence the overall tertiary structure, giving rise to the name α/β barrel (Fig. 7–25a). This structure is found in many enzymes; a binding site for a cofactor or substrate is often found in a pocket formed near an end of the barrel.

Another structural motif is the four-helix bundle (Fig. 7–25b), in which four α helices are connected by three peptide loops. The helices are slightly tilted to form a pocket in the middle, which often contains a binding site for a metal or other cofactors essential for biological function. A somewhat similar structure in which seven helices are arranged in a barrel-like motif is found in some membrane proteins (see Fig. 10–10). The seven helices often surround a channel that spans the membrane.

A third motif has a β sheet in the "saddle" conformation forming a stable core, often surrounded by a number of α-helical regions (Fig. 7–25c). Structures of this kind are found in many enzymes. The location of the substrate binding site varies, determined by the placement of the α helices and other variable structural elements.

One final motif makes use of a sandwich of β sheets, layered so that the strands of the sheets form a quiltlike cross-hatching when viewed from above (Fig. 7–25d). This creates a hydrophobic pocket between the β sheets that is often a binding site for a planar hydrophobic molecule.


Some proteins contain two or more separate polypeptide chains or subunits, which may be identical or different in structure. One of the best-known examples of a multisubunit protein is hemoglobin, the oxygen-carrying protein of erythrocytes. Among the larger, more complex multisubunit proteins are the enzyme RNA polymerase of E. coli, responsible for initiation and synthesis of RNA chains; the enzyme aspartate transcarbamoylase (12 chains; see Fig. 8–26), important in the synthesis of nucleotides; and, as an extreme case, the enormous pyruvate dehydrogenase complex of mitochondria, which is a cluster of three enzymes containing a total of 102 polypeptide chains.

The arrangement of proteins and protein subunits in three-dimensional complexes constitutes quaternary structure. The interactions between subunits are stabilized and guided by the same forces that stabilize tertiary structure: multiple noncovalent interactions. The association of polypeptide chains can serve a variety of functions. Many multisubunit proteins serve regulatory functions; their activities are altered by the binding of certain small molecules. Interactions between subunits can permit very large changes in enzyme activity in response to small changes in the concentration of substrate or regulatory molecules (Chapter 8). In other cases, separate subunits can take on separate but related functions. Entire metabolic pathways are often organized by the association of a supramolecular complex of enzymes, permitting an efficient channeling of pathway intermediates from one enzyme to the next. Other associations, such as the histones in a nucleosome or the coat proteins of a virus, serve primarily structural roles. Large assemblies sometimes reflect complex functions. One obvious example is the complicated structure of ribosomes (see Fig. 26–12), which carry out protein synthesis.

X-ray and other analytical methods for structure determination become more difficult as the size and number of subunits in a protein increase. Nevertheless, sufficient data are already available to yield some very important information about the structure and function of multisubunit proteins.

Figure 7–26  The three-dimensional (quaternary) structure of deoxyhemoglobin, revealed by x-ray diffraction analysis, showing how the four subunits are packed together. (a) A ribbon representation. (b) A space-filling model. The α subunits are shown in white and light blue; the β subunits are shown in pink and purple. Note that the heme groups, shown in red, are relatively far apart.
Figure 7–27  Scanning electron micrographs of (a) normal and (b) sickled human erythrocytes. The sickled cells are fragile, and their breakdown causes anemia.

The first oligomeric protein to be subjected to x-ray analysis was hemoglobin (Mr 64,500), which contains four polypeptide chains and four heme prosthetic groups, in which the iron atoms are in the ferrous (Fe2+) state. The protein portion, called globin, consists of two α chains (141 residues each) and two β chains (146 residues each). Note that α and β do not refer to secondary structures in this case. Because hemoglobin is four times as large as myoglobin, much more time and effort were required to solve its three-dimensional structure, finally achieved by Max Perutz, John Kendrew, and their colleagues in 1959.

The hemoglobin molecule is roughly spherical, with a diameter of about 5.5 nm. The α and β chains contain several segments of α helix separated by bends, with a tertiary structure very similar to that of the single polypeptide of myoglobin. In fact, there are 27 invariant amino acid residues in these three polypeptide chains, and closely related amino acids at 40 additional positions, indicating that these polypeptides (myoglobin and the α and β chains of hemoglobin) are evolutionarily related. The four polypeptide chains in hemoglobin fit together in an approximately tetrahedral arrangement (Fig. 7–26).

One heme is bound to each polypeptide chain of hemoglobin. The oxygen-binding sites are rather far apart given the size of the molecule, about 2.5 nm from one another. Each heme is partially buried in a pocket lined with hydrophobic amino acid side chains. It is bound to its polypeptide chain through a coordination bond of the iron atom to the R group of a His residue (see Fig. 7–18). The sixth coordination bond of the iron atom of each heme is available to bind O2.

Closer examination of the quaternary structure of hemoglobin, with the help of molecular models, shows that although there are few contacts between the two α chains or between the two β chains, there are many contact points between the α and β chains. These contact points consist largely of hydrophobic side chains of amino acid residues, but also include ionic interactions involving the carboxyl-terminal residues of the four subunits.

Naturally occurring changes in the amino acid sequence of hemoglobin provide some useful insights into the relationship between structure and function in proteins. More than 300 genetic variants of hemoglobin are known to occur in the human population. Most of these variations are single amino acid changes that have only minor structural or functional effects. An exception is a substitution of valine for glutamate at position 6 of the β chain. This residue is on the outer surface of the molecule, and the change produces a "sticky" hydrophobic spot on the surface that results in abnormal quaternary association of hemoglobin. When oxygen concentrations are below a critical level, the subunits polymerize into linear arrays of fibers that distort cell shape. The result is a sickling of erythrocytes (Fig. 7–27), the cause of sickle-cell anemia.

Figure 7–28  The oxygen-binding curves of myoglobin (Mb) and hemoglobin (Hb). Myoglobin has a much greater affinity for oxygen than does hemoglobin. It is 50% saturated at oxygen partial pressures (pO2) of only 0.15 to 0.30 kPa, whereas hemoglobin requires a pO2 of about 3.5 kPa for 50% saturation. Note that although both hemoglobin and myoglobin are more than 95% saturated at the pO2 in arterial blood leaving the lungs (~13 kPa), hemoglobin is only about 75% saturated in resting muscle, where the pO2 is about 5 kPa, and only 10% saturated in working muscle, where the pO2 is only about 1.5 kPa. Thus hemoglobin can release its oxygen very effectively in muscle and other peripheral tissues. Myoglobin, on the other hand, is still about 80% saturated at a pO2 of 1.5 kPa, and therefore unloads very little oxygen even at very low pO2. Thus the sigmoid O2-saturation curve of hemoglobin is a molecular adaptation for its transport function in erythrocytes, assuring the binding and release of oxygen in the appropriate tissues.
Figure 7–29  Conformational changes induced in hemoglobin when oxygen binds. (The oxygen-bound form is shown at bottom.) There are multiple structural changes, some not visible here; most of the changes are subtle. The α and β subunits are colored as in Fig. 7–26.

Hemoglobin is an instructive model for studying the function of many regulatory oligomeric proteins. The blood in a human being must carry about 600 L of oxygen from the lungs to the tissues every day, but very little of this is carried by the blood plasma because oxygen is only sparingly soluble in aqueous solutions. Nearly all the oxygen carried by whole blood is bound and transported by the hemoglobin of the erythrocytes. Normal human erythrocytes are small (6 to 9 μm in diameter), biconcave disks (Fig. 7–27a). They have no nucleus, mitochondria, endoplasmic reticulum, or other organelles. The hemoglobin of the erythrocytes in arterial blood passing from the lungs to the peripheral tissues is about 96% saturated with oxygen. In the venous blood returning to the heart, the hemoglobin is only about 64% saturated. Thus blood passing through a tissue releases about one-third of the oxygen it carries.
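The one-third figure follows directly from the two saturation values just quoted. As a quick arithmetic check (a minimal sketch; the function name is illustrative and not from the text):

```python
def fraction_released(arterial_saturation, venous_saturation):
    """Fraction of the bound O2 that is unloaded between arterial and venous blood."""
    return (arterial_saturation - venous_saturation) / arterial_saturation

# Saturation values from the text: 96% in arterial blood, 64% in venous blood.
released = fraction_released(0.96, 0.64)
print(f"Fraction of bound O2 released: {released:.2f}")
```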

The special properties of the hemoglobin molecule that make it such an effective oxygen carrier are best understood by comparing the O2-binding or O2-saturation curves of myoglobin and hemoglobin (Fig. 7–28). These show the percentage of O2-binding sites of hemoglobin or myoglobin that are occupied by O2 molecules when solutions of these proteins are in equilibrium with different partial pressures of oxygen in the gas phase. (The partial pressure of oxygen, abbreviated pO2, is the pressure contributed by oxygen to the overall pressure of a mixture of gases, and is directly related to the concentration of oxygen in the mixture.)

From its saturation curve, it is clear that myoglobin has a very high affinity for oxygen (Fig. 7–28). Furthermore, the O2-saturation curve of myoglobin is a simple hyperbolic curve, as might be expected from the mass action of oxygen on the equilibrium myoglobin + O2 ⇌ oxymyoglobin. In contrast, the oxygen affinity of each of the four O2-binding sites of deoxyhemoglobin is much lower, and the O2-saturation curve of hemoglobin is sigmoid (S-shaped) (Fig. 7–28). This shape indicates that whereas the affinity of hemoglobin for binding the first O2 molecule (to any of the four sites) is relatively low, the second, third, and fourth O2 molecules are bound with a very much higher affinity. This accounts for the steeply rising portion of the sigmoid curve. The increase in the affinity of hemoglobin for oxygen after the first O2 molecule is bound is almost 500-fold. Thus the oxygen affinity of each heme–polypeptide subunit of hemoglobin depends on whether O2 is bound to neighboring subunits. The conversion of deoxyhemoglobin to oxyhemoglobin requires the disruption of ionic interactions involving the carboxyl-terminal residues of the four subunits, interactions that constrain the overall structure in a low-affinity state. The increase in affinity for successive O2 molecules reflects the fact that more of these ionic interactions must be broken for binding the first O2 than for binding later ones.

Once the first heme–polypeptide subunit binds an O2 molecule, it communicates this information to the remaining subunits through interactions at the subunit interfaces. The subunits respond by greatly increasing their oxygen affinity. This involves a change in the conformation of hemoglobin that occurs when oxygen binds (Fig. 7–29). Such communication among the four heme–polypeptide subunits of hemoglobin is the result of cooperative interactions among the subunits. Because binding of one O2 molecule increases the probability that further O2 molecules will be bound by the remaining subunits, hemoglobin is said to have positive cooperativity. Sigmoid binding curves, like that of hemoglobin for oxygen, are characteristic of positive cooperative binding. Cooperative oxygen binding does not occur with myoglobin, which has only one heme group within a single polypeptide chain and thus can bind only one O2 molecule; its saturation curve is therefore hyperbolic. The multiple subunits of hemoglobin and the interactions between these subunits result in a fundamental difference between the O2-binding actions of myoglobin and hemoglobin.
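The difference between a hyperbolic and a sigmoid saturation curve can be made concrete with a small numerical sketch. The code below uses the Hill equation, an empirical description of cooperative binding that is not derived in this chapter. The Hill coefficient of 2.8 for hemoglobin is an assumed, commonly quoted value; the half-saturation pressures are taken from Figure 7–28 (0.3 kPa is the upper end of the range given there for myoglobin). Function and variable names are illustrative.

```python
def saturation(p_o2, p50, n=1.0):
    """Fractional O2 saturation from the Hill equation.

    p_o2 : oxygen partial pressure (kPa)
    p50  : pressure giving 50% saturation (kPa)
    n    : Hill coefficient; n = 1 gives a hyperbola (no cooperativity),
           n > 1 gives a sigmoid curve (positive cooperativity)
    """
    return p_o2**n / (p50**n + p_o2**n)

MB_P50 = 0.3   # kPa, upper end of the range in Fig. 7-28
HB_P50 = 3.5   # kPa, from Fig. 7-28
HB_HILL = 2.8  # assumed Hill coefficient for hemoglobin (not from the text)

for p in (1.5, 5.0, 13.0):  # working muscle, resting muscle, lungs
    mb = saturation(p, MB_P50)           # hyperbolic curve
    hb = saturation(p, HB_P50, HB_HILL)  # sigmoid curve
    print(f"pO2 = {p:4.1f} kPa   Mb {mb:5.1%}   Hb {hb:5.1%}")
```

With these assumed parameters the sketch gives values close to those read from Figure 7–28: hemoglobin is roughly 10% saturated at 1.5 kPa and more than 95% saturated at 13 kPa, whereas myoglobin remains more than 80% saturated at all three pressures.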

Positive cooperativity is not the only result of subunit interactions in oligomeric proteins. Some oligomeric proteins show negative cooperativity: binding of one ligand molecule decreases the probability that further ligand molecules will be bound. These and additional regulatory mechanisms used by these proteins are considered in Chapter 8.

In the lungs the pO2 in the air spaces is about 13 kPa; at this pressure hemoglobin is about 96% saturated with oxygen. However, in the cells of a working muscle the pO2 is only about 1.5 kPa because muscle cells use oxygen at a high rate and thus lower its local concentration. As the blood passes through the muscle capillaries, oxygen is released from the nearly saturated hemoglobin in the erythrocytes into the blood plasma and thence into the muscle cells. As is evident from the O2-saturation curve in Figure 7–28, hemoglobin releases about a third of its bound oxygen as it passes through the muscle capillaries, so that when it leaves the muscle, it is only about 64% saturated. When the blood returns to the lungs, where the pO2 is much higher (13 kPa), the hemoglobin quickly binds more oxygen until it is 96% saturated again.

Now suppose that the hemoglobin in the erythrocyte were replaced by myoglobin. We see from the hyperbolic O2-saturation curve of myoglobin (Fig. 7–28) that only 1 or 2% of the bound oxygen can be released from myoglobin as the pO2 decreases from 13 kPa in the lungs to 3 kPa in the muscle. Myoglobin therefore is not very well adapted for carrying oxygen from the lungs to the tissues, because it has a much higher affinity for oxygen and releases very little of it at the pO2 in muscles and other peripheral tissues. However, in its true biological function within muscle cells, which is to store oxygen and make it available to the mitochondria, myoglobin is in fact much better suited than hemoglobin, because its very high affinity for oxygen at low pO2 enables it to bind and store oxygen effectively. Thus hemoglobin and myoglobin are specialized and adapted for different kinds of O2-binding functions.
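This thought experiment can be sketched numerically. The following is an illustration only: it models both curves with the Hill equation, which is an assumption not made in the text, using a Hill coefficient of 1 for myoglobin and an assumed value of 2.8 for hemoglobin, with half-saturation pressures taken from Figure 7–28. The exact percentages depend on these parameter choices; the point is the size of the contrast.

```python
def saturation(p_o2, p50, n=1.0):
    """Fractional O2 saturation from the Hill equation (n = 1 is a hyperbola)."""
    return p_o2**n / (p50**n + p_o2**n)

def fraction_unloaded(p_lungs, p_tissue, p50, n=1.0):
    """Fraction of the O2 bound in the lungs that is released in the tissue."""
    loaded = saturation(p_lungs, p50, n)
    remaining = saturation(p_tissue, p50, n)
    return (loaded - remaining) / loaded

P_LUNGS, P_MUSCLE = 13.0, 3.0  # kPa, the values used in the text

# Myoglobin: hyperbolic curve, P50 within the range given in Fig. 7-28.
mb = fraction_unloaded(P_LUNGS, P_MUSCLE, p50=0.2)
# Hemoglobin: sigmoid curve, P50 from Fig. 7-28, assumed Hill coefficient.
hb = fraction_unloaded(P_LUNGS, P_MUSCLE, p50=3.5, n=2.8)

print(f"Myoglobin unloads  {mb:.1%} of its bound O2")
print(f"Hemoglobin unloads {hb:.1%} of its bound O2")
```

In this simplified model myoglobin gives up only a few percent of its oxygen, whereas hemoglobin gives up more than half.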

The relatively large size of proteins reflects their functions. The function of an enzyme, for example, requires a protein large enough to form a specifically structured pocket to bind its substrate. The size of proteins has limits, however, imposed by the genetic coding capacity of nucleic acids and the accuracy of the protein biosynthetic process. The use of many copies of one or a few proteins to make a large enclosing structure is important for viruses because this strategy conserves genetic material. Remember that there is a linear correspondence between the sequence of a gene in nucleic acid and the amino acid sequence of the protein for which it codes (see Fig. 6–14). The nucleic acids of viruses are much too small to encode the information required for a protein shell made of a single polypeptide. By using many copies of much smaller proteins for the virus coat, a much shorter nucleic acid is needed for the protein subunits, and this nucleic acid can be efficiently used over and over again. Cells also use large protein complexes in muscle, cilia, the cytoskeleton, and other structures. It is simply more efficient to make many copies of a small protein than one copy of a very large one. The second factor limiting the size of proteins is the error frequency during protein biosynthesis. This error frequency is low but can become significant for very large proteins. Simply put, the potential for incorporating a "wrong" amino acid in a protein is greater for a large protein than a small one.
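The effect of protein size on the chance of a synthesis error can be illustrated with a short calculation. The sketch assumes that errors at different positions are independent and uses an error frequency of 1 in 10,000 residues; that number is an illustrative assumption, not a value given in the text.

```python
def fraction_error_free(n_residues, error_rate=1e-4):
    """Probability that a chain of n_residues contains no wrong amino acid,
    assuming independent errors at the given per-residue frequency."""
    return (1.0 - error_rate) ** n_residues

# Chain lengths chosen only to show the trend from small to very large proteins.
for n in (100, 1_000, 10_000, 100_000):
    print(f"{n:>7} residues: {fraction_error_free(n):.1%} of copies error-free")
```

Under this assumption nearly all copies of a 100-residue protein are error-free, but fewer than half of the copies of a 10,000-residue chain would be.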

Figure 7–30  Myosin and actin, the two filamentous proteins of contractile systems. (a) The myosin molecule has a long tail consisting of two supercoiled α-helical polypeptide chains (heavy chains). The head of each heavy chain is associated with two light chains and is an enzyme capable of hydrolyzing ATP. (b) A representation of an F-actin fiber, which consists of two chains of G-actin subunits coiled about each other to form a filament.

The same principles that govern the stability of secondary, tertiary, and quaternary structure in proteins guide the formation of very large protein complexes. These function, for example, as biological engines (muscle and cilia), large structural enclosures (virus coats), cellular skeletons (actin and tubulin filaments), DNA-packaging complexes (chromatin), and machines for protein synthesis (ribosomes). In many cases the complex consists of a small number of distinct proteins, specialized so that they spontaneously polymerize to form large structures.

Muscle provides an example of a supramolecular complex of multiple copies of a limited number of proteins. The contractile force of muscle is generated by the interaction of two proteins, actin and myosin (Chapter 2). Myosin is a long, rodlike molecule (Mr 540,000) consisting of six polypeptide chains, two so-called heavy chains (Mr ~230,000) and four light chains (Mr ~20,000) (Fig. 7–30a). The two heavy chains have long α-helical tails that twist around each other in a left-handed fashion. The large head domain, at one end of each heavy chain, interacts with actin and contains a catalytic site for ATP hydrolysis. Many myosin molecules assemble together to form the thick filaments of skeletal muscle (Fig. 7–31).

The other protein, actin, is a polymer of the globular protein G-actin (Mr 42,000); two such polymers coil around each other in a right-handed helix to form a thin filament (Fig. 7–30b). The interaction between actin and myosin is dynamic; contacts consist of multiple weak interactions that are strong enough to provide a stable association but weak enough to allow dissociation when needed. Hydrolysis of ATP in the myosin head is coupled to a series of conformational changes that bring about muscle contraction (Fig. 7–32). A similar engine involving an interaction between tubulin and dynein brings about the motion of cilia (Chapter 2).

The protein structures in virus coats (called capsids) generally function simply as enclosures. In many cases capsids are made up of one or a few proteins that assemble spontaneously around a viral DNA or RNA molecule. Two types of viral structures are shown in Figure 7–33. The tobacco mosaic virus is a right-handed helical filament with 2,130 copies of a single protein that interact to form a cylinder enclosing the RNA genome. Another common structure for virus coats is the icosahedron, a regular 12-cornered polyhedron having 20 equilateral triangular faces. Two examples are poliovirus and human rhinovirus 14 (a common cold virus), each made up of 60 protein units (Fig. 7–33). Each protein unit consists of single copies of four different polypeptide chains, three of which are accessible at the outer surface. The resulting shell encloses the genetic material (RNA) of the virus.
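The economy of building a capsid from repeated subunits can be put in rough numbers. The sketch below uses the subunit counts given above (2,130 for tobacco mosaic virus; 60 protein units of four chains each for the icosahedral viruses). The length of the tobacco mosaic virus coat protein (158 residues) and of its RNA genome (about 6,400 nucleotides) are values assumed here for illustration and do not appear in the text. Three nucleotides encode one amino acid residue.

```python
NUCLEOTIDES_PER_RESIDUE = 3

def coding_length(n_residues):
    """Nucleotides needed to encode a polypeptide of n_residues."""
    return n_residues * NUCLEOTIDES_PER_RESIDUE

TMV_COPIES = 2130        # coat protein copies per particle (from the text)
SUBUNIT_RESIDUES = 158   # assumed length of one coat protein subunit
GENOME_NT = 6400         # assumed approximate length of the TMV RNA genome

one_subunit = coding_length(SUBUNIT_RESIDUES)
single_chain = coding_length(SUBUNIT_RESIDUES * TMV_COPIES)

print(f"Encoding one subunit:           {one_subunit:>9,} nucleotides")
print(f"Encoding a one-chain coat:      {single_chain:>9,} nucleotides")
print(f"Assumed genome length:          {GENOME_NT:>9,} nucleotides")

# Icosahedral capsids in the text: 60 protein units, each with 4 chains.
chains_per_capsid = 60 * 4
print(f"Polypeptide chains per icosahedral capsid: {chains_per_capsid}")
```

On these assumptions a coat made of a single polypeptide would need a coding sequence more than a hundred times longer than the entire genome, whereas the repeated subunit takes up only a small fraction of it.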

The primary forces guiding the assembly of even these very large structures are the weak noncovalent interactions that have dominated this discussion. Each protein has several surfaces that are complementary to surfaces in adjacent protein subunits. Each protein is most stable only when it is part of the larger structure.

Summary

Every protein has a unique three-dimensional structure that reflects its function, a structure stabilized by multiple weak interactions. Hydrophobic interactions provide the major contribution to stabilizing the globular form of most soluble proteins; hydrogen bonds and ionic interactions are optimized in the specific structure that is thermodynamically most stable.

There are four generally recognized levels of protein structure. Primary structure refers to the amino acid sequence and the location of disulfide bonds. Secondary structure refers to the spatial relationship of adjacent amino acids. Tertiary structure is the three-dimensional conformation of an entire polypeptide chain. Quaternary structure involves the spatial relationship of multiple polypeptide chains (e.g., enzyme subunits) that are tightly associated.

The nature of the bonds in the polypeptide chain places constraints on structure. The peptide bond is characterized by a partial double-bond character that keeps the entire amide group in a rigid planar configuration. The N–Cα and Cα–C bonds can rotate, with rotation angles φ and ψ, respectively. Secondary structure can be defined completely by these two angles.

There are two general classes of proteins: fibrous and globular. Fibrous proteins, which serve mainly structural roles, have simple repeating structures and provided excellent models for the early studies of protein structure. Two major types of secondary structure were predicted by model building based on information obtained from fibrous proteins: the α helix and the β conformation. Both are characterized by optimal hydrogen bonding between amide nitrogens and carbonyl oxygens in the peptide backbone. The stability of these structures within a protein is influenced by their amino acid content and by the relative placement of amino acids in the sequence. Another nonrepeating type of secondary structure common in proteins is the β bend.

In fibrous proteins such as keratin and collagen, a single type of secondary structure predominates. The polypeptide chains are supertwisted into ropes and then combined in larger bundles to provide strength. The structure of elastin permits stretching.

Globular proteins have more complicated tertiary structures, often containing several types of secondary structure in the same polypeptide chain. The first globular protein structure to be determined, using x-ray diffraction methods, was that of myoglobin. This structure confirmed that a predicted secondary structure (α helix) occurs in proteins; that hydrophobic amino acids are located in the protein interior; and that globular proteins are compact. Subsequent research on protein structure has reinforced these conclusions while demonstrating that different proteins often differ in tertiary structure.

The three-dimensional structure of proteins can be destroyed by treatments that disrupt weak interactions, a process called denaturation. Denaturation destroys protein function, demonstrating a relationship between structure and function. Some denatured proteins (e.g., ribonuclease) can renature spontaneously to give active protein, showing that the tertiary structure of a protein is determined by its amino acid sequence.

The folding of globular proteins is believed to begin with local formation of regions of secondary structure, followed by interactions of these regions and adjustments to reach the final tertiary structure. Sometimes regions of a polypeptide chain, called domains, fold up separately and can have separate functions. The final structure and the steps taken to reach it are influenced by the need to bury hydrophobic amino acid side chains in the protein interior away from water, the tendency of a polypeptide chain to twist in a right-handed sense, and the need to maximize hydrogen bonds and ionic interactions. These constraints give rise to structural patterns such as the βαβ fold and twisted β pleated sheets. Even at the level of tertiary structure, some common patterns are found in proteins that have no known functional relationship.

Quaternary structure refers to the interaction between the subunits of oligomeric proteins or large protein assemblies. The best-studied oligomeric protein is hemoglobin. The four subunits of hemoglobin exhibit cooperative interactions on oxygen binding. Binding of oxygen to one subunit facilitates oxygen binding to the next, giving rise to a sigmoid binding curve. These effects are mediated by subunit–subunit interactions and subunit conformational changes. Very large protein structures consisting of many copies of one or a few different proteins are referred to as supramolecular complexes. These are found in cellular skeletal structures, muscle and other types of cellular "engines," and virus coats.

Further Reading

General

Anfinsen, C.B. (1973) Principles that govern the folding of protein chains. Science 181, 223–230. 
The author reviews his classic work on ribonuclease.

Cantor, C.R. & Schimmel, P.R. (1980) Biophysical Chemistry, Part I: The Conformation of Biological Macromolecules, W.H. Freeman and Company, New York. 

Evolution of Catalytic Function. (1987) Cold Spring Harb. Symp. Quant. Biol. 52. 
A source of excellent articles on many topics, including protein structure, folding, and function.

Creighton, T.E. (1984) Proteins: Structures and Molecular Properties, W.H. Freeman and Company, New York. 

Oxender, D.L. (ed) (1987) Protein Structure, Folding, and Design 2, UCLA Symposia on Molecular and Cellular Biology, New Series, Vol. 69, Alan R. Liss, Inc., New York. 
Summary papers from a major symposium on the title subject.

Structure and Function of Proteins. (1989) Trends Biochem. Sci. 14 (July). 
A special issue devoted to reviews on protein chemistry and protein structure. Includes good summaries of protein folding, protein structure prediction, and many other topics.

Secondary, Tertiary, and Quaternary Structure

Dickerson, R.E. & Geis, I. (1982) Hemoglobin: Structure, Function, Evolution, and Pathology, The Benjamin/Cummings Publishing Company, Menlo Park, CA. 

Ingram, V.M. (1957) Gene mutations in human haemoglobin: the chemical difference between normal and sickle cell haemoglobin. Nature 180, 326–328. 
Discovery of the amino acid replacement in sickle-cell hemoglobin (hemoglobin S).

Kendrew, J.C. (1961) The three-dimensional structure of a protein molecule. Sci. Am. 205 (December), 96–111. 
Describes how the structure of myoglobin was determined and what was learned from it.

Kim, P.S. & Baldwin, R.L. (1990) Intermediates in the folding reactions of small proteins. Annu. Rev. Biochem. 59, 631–660. 

Koshland, D.E., Jr. (1973) Protein shape and biological control. Sci. Am. 229 (October), 52–64. 
A discussion of the importance of flexibility in protein structures.

McPherson, A. (1989) Macromolecular crystals. Sci. Am. 260 (March), 62–69.
Describes how macromolecules such as proteins are crystallized.

Pace, C.N. (1990) Conformational stability of globular proteins. Trends Biochem. Sci. 15, 14–17. 

Perutz, M.F. (1978) Hemoglobin structure and respiratory transport. Sci. Am. 239 (December), 92–125. 

Richards, F.M. (1991) The protein folding problem. Sci. Am. 264 (January), 54–63. 

Richardson, J.S. (1981) The anatomy and taxonomy of protein structure. Adv. Prot. Chem. 34, 167–339. 
An outstanding summary of protein structural patterns and principles; the author originated the very useful "ribbon" representations of protein structure that are used in many places in this chapter.

Rothman, J.E. (1989) Polypeptide chain binding proteins: catalysts of protein folding and related processes in cells. Cell 59, 591–601. 

Shortle, D. (1989) Probing the determinants of protein folding and stability with amino acid substitutions. J. Biol. Chem. 264, 5315–5318. 

Problems

1. Properties of the Peptide Bond  In x-ray studies of crystalline peptides Linus Pauling and Robert Corey found that the C–N bond in the peptide link is intermediate in length (0.132 nm) between a typical C–N single bond (0.149 nm) and a C=N double bond (0.127 nm). They also found that the peptide bond is planar (all four atoms attached to the C–N group are located in the same plane) and that the two α-carbon atoms attached to the C–N are always trans to each other (on opposite sides of the peptide bond):

       (a) What does the length of the C–N bond in the peptide linkage indicate about its strength and its bond order, i.e., whether it is single, double, or triple?

       (b) In light of your answer to part (a), provide an explanation for the observation that such a C–N bond is intermediate in length between a double and single bond.
       (c) What do the observations of Pauling and Corey tell us about the ease of rotation about the C–N peptide bond?

2. Early Observations on the Structure of Wool  William Astbury discovered that the x-ray pattern of wool shows a repeating structural unit spaced about 0.54 nm along the direction of the wool fiber. When he steamed and stretched the wool, the x-ray pattern showed a new repeating structural unit at a spacing of 0.70 nm. Steaming and stretching the wool and then letting it shrink gave an x-ray pattern consistent with the original spacing of about 0.54 nm. Although these observations provided important clues to the molecular structure of wool, Astbury was unable to interpret them at the time. Given our current understanding of the structure of wool, interpret Astbury’s observations.

3. Rate of Synthesis of Hair α-Keratin  In human dimensions, the growth of hair is a relatively slow process, occurring at a rate of 15 to 20 cm/yr. All this growth is concentrated at the base of the hair fiber, where α-keratin filaments are synthesized inside living epidermal cells and assembled into ropelike structures (see Fig. 7–13). The fundamental structural element of α-keratin is the α helix, which has 3.6 amino acid residues per turn and a rise of 0.56 nm per turn (see Fig. 7–6). Assuming that the biosynthesis of α-helical keratin chains is the rate-limiting factor in the growth of hair, calculate the rate at which peptide bonds of α-keratin chains must be synthesized (peptide bonds per second) to account for the observed yearly growth of hair.

4. The Effect of pH on the Conformations of Polyglutamate and Polylysine  The unfolding of the α helix of a polypeptide to a randomly coiled conformation is accompanied by a large decrease in a property called its specific rotation, a measure of a solution’s capacity to rotate plane-polarized light. Polyglutamate, a polypeptide made up of only L-Glu residues, has the α-helical conformation at pH 3. However, when the pH is raised to 7, there is a large decrease in the specific rotation of the solution. Similarly, polylysine (L-Lys residues) is an α helix at pH 10, but when the pH is lowered to 7, the specific rotation also decreases, as shown by the following graph.

What is the explanation for the effect of the pH changes on the conformations of poly(Glu) and poly(Lys)? Why does the transition occur over such a narrow range of pH?

5. The Disulfide-Bond Content Determines the Mechanical Properties of Many Proteins  A number of natural proteins are very rich in disulfide bonds, and their mechanical properties (tensile strength, viscosity, hardness, etc.) are correlated with the degree of disulfide bonding. For example, glutenin, a wheat protein rich in disulfide bonds, is responsible for the cohesive and elastic character of dough made from wheat flour. Similarly, the hard, tough nature of tortoise shell is due to the extensive disulfide bonding in its α-keratin. What is the molecular basis for the correlation between disulfide-bond content and mechanical properties of the protein?

6. Why Does Wool Shrink?  When wool sweaters or socks are washed in hot water and/or dried in an electric dryer, they shrink. From what you know of α-keratin structure, how can you account for this? Silk, on the other hand, does not shrink under the same conditions. Explain.

7. Heat Stability of Proteins Containing Disulfide Bonds  Most globular proteins are denatured and lose their activity when briefly heated to 65 °C. Globular proteins that contain multiple disulfide bonds often must be heated longer at higher temperatures to denature them. One such protein is bovine pancreatic trypsin inhibitor (BPTI), which has 58 amino acid residues in a single chain and contains three disulfide bonds. On cooling a solution of denatured BPTI, the activity of the protein is restored. Can you suggest a molecular basis for this property?

8. Bacteriorhodopsin in Purple Membrane Proteins  Under the proper environmental conditions, the salt-loving bacterium Halobacterium halobium synthesizes a membrane protein (Mr 26,000) known as bacteriorhodopsin, which is purple because it contains retinal. Molecules of this protein aggregate into “purple patches” in the cell membrane. Bacteriorhodopsin acts as a light-activated proton pump that provides energy for cell functions. X-ray analysis of this protein reveals that it consists of seven parallel α-helical segments, each of which traverses the bacterial cell membrane (thickness 4.5 nm). Calculate the minimum number of amino acids necessary for one segment of α helix to traverse the membrane completely. Estimate the fraction of the bacteriorhodopsin protein that occurs in α-helical form. Justify all your assumptions. (Use an average amino acid residue weight of 110.)

9. Biosynthesis of Collagen  Collagen, the most abundant protein in mammals, has an unusual amino acid composition. Unlike most other proteins, collagen is very rich in proline and hydroxyproline (see p. 172). Hydroxyproline is not one of the 20 standard amino acids, and its incorporation in collagen could occur by one of two routes: (1) proline is hydroxylated by enzymes before incorporation into collagen or (2) a Pro residue is hydroxylated after incorporation into collagen. To differentiate between these two possibilities, the following experiments were performed. When [14C]proline was administered to a rat and the collagen from the tail isolated, the newly synthesized tail collagen was found to be radioactive. If however, [14C]hydroxyproline was administered to a rat, no radioactivity was observed in the newly synthesized collagen. How do these experiments differentiate between the two possible mechanisms for introducing hydroxyproline into collagen?

10. Pathogenic Action of Bacteria That Cause Gas Gangrene  The highly pathogenic anaerobic bacterium Clostridium perfringens is responsible for gas gangrene, a condition in which animal tissue structure is destroyed. This bacterium secretes an enzyme that efficiently catalyzes the hydrolysis of the peptide bond between X and Gly in the sequence:

–X–Gly–Pro–Y–  +  H2O  →  –X–COO⁻  +  H3N⁺–Gly–Pro–Y–

where X and Y are any of the 20 standard amino acids. How does the secretion of this enzyme contribute to the invasiveness of this bacterium in human tissues? Why does this enzyme not affect the bacterium itself?

11. Formation of Bends and Intrachain Cross-Linkages in Polypeptide Chains  In the following polypeptide, where might bends or turns occur? Where might intrachain disulfide cross-linkages be formed?

1    2    3    4    5    6    7    8    9    10   11   12   13   14   15   16   17   18   19   20
Ile– Ala– His– Thr– Tyr– Gly– Pro– Phe– Glu– Ala– Ala– Met– Cys– Lys– Trp– Glu– Ala– Gln– Pro– Asp–

21   22   23   24   25   26   27   28
Gly– Met– Glu– Cys– Ala– Phe– His– Arg–

12. Location of Specific Amino Acids in Globular Proteins  X-ray analysis of the tertiary structure of myoglobin and other small, single-chain globular proteins has led to some generalizations about how the polypeptide chains of soluble proteins fold. With these generalizations in mind, indicate the probable location, whether in the interior or on the external surface, of the following amino acid residues in native globular proteins: Asp, Leu, Ser, Val, Gln, Lys. Explain your reasoning.

13. The Number of Polypeptide Chains in an Oligomeric Protein  A sample (660 mg) of an oligomeric protein of Mr 132,000 was treated with an excess of 1-fluoro-2,4-dinitrobenzene under slightly alkaline conditions until the chemical reaction was complete. The peptide bonds of the protein were then completely hydrolyzed by heating it with concentrated HCl. The hydrolysate was found to contain 5.5 mg of the following compound:

However, 2,4-dinitrophenyl derivatives of the α-amino groups of other amino acids could not be found.

       (a) Explain why this information can be used to determine the number of polypeptide chains in an oligomeric protein.
       (b) Calculate the number of polypeptide chains in this protein.

14. Molecular Weight of Hemoglobin  The first indication that proteins have molecular weights greatly exceeding those of the (then known) organic compounds was obtained over 100 years ago. For example, it was known at that time that hemoglobin contains 0.34% by weight of iron.

       (a) From this information determine the minimum molecular weight of hemoglobin.
       (b) Subsequent experiments indicated that the true molecular weight of hemoglobin is 64,500. What information did this provide about the number of iron atoms in hemoglobin?

15. Comparison of Fetal and Maternal Hemoglobin  Studies of oxygen transport in pregnant mammals have shown that the O2-saturation curves of fetal and maternal blood are markedly different when measured under the same conditions. Fetal erythrocytes contain a structural variant of hemoglobin, hemoglobin F, consisting of two α and two γ subunits (α2γ2), whereas maternal erythrocytes contain the usual hemoglobin A (α2β2).

       (a) Which hemoglobin has a higher affinity for oxygen under physiological conditions, hemoglobin A or hemoglobin F? Explain.

       (b) What is the physiological significance of the different oxygen affinities? Explain.
Chapter 8
Enzymes

We now come to the most remarkable and highly specialized proteins, the enzymes. Enzymes are the reaction catalysts of biological systems. They have extraordinary catalytic power, often far greater than that of synthetic catalysts. They have a high degree of specificity for their substrates, they accelerate specific chemical reactions, and they function in aqueous solutions under very mild conditions of temperature and pH. Few nonbiological catalysts show all these properties.

Enzymes are one of the keys to understanding how cells survive and proliferate. Acting in organized sequences, they catalyze the hundreds of stepwise reactions in metabolic pathways by which nutrient molecules are degraded, chemical energy is conserved and transformed, and biological macromolecules are made from simple precursors. Some of the many enzymes participating in metabolism are regulatory enzymes, which can respond to various metabolic signals by changing their catalytic activity accordingly. Through the action of regulatory enzymes, enzyme systems are highly coordinated to yield a harmonious interplay among the many different metabolic activities necessary to sustain life.

The study of enzymes also has immense practical importance. In some diseases, especially inheritable genetic disorders, there may be a deficiency or even a total absence of one or more enzymes in the tissues (see Table 6–6). Abnormal conditions can also be caused by the excessive activity of a specific enzyme. Measurements of the activity of certain enzymes in the blood plasma, erythrocytes, or tissue samples are important in diagnosing disease. Enzymes have become important practical tools, not only in medicine but also in the chemical industry, in food processing, and in agriculture. Enzymes play a part even in everyday activities in the home such as food preparation and cleaning.

The chapter begins with descriptions of the properties of enzymes and the principles underlying their catalytic power. Following is an introduction to enzyme kinetics, a discipline that provides much of the framework for any discussion of enzymes. Specific examples of enzyme mechanisms are then provided, illustrating principles introduced earlier in the chapter. We will end with a discussion of regulatory enzymes.

Figure 8–1  Crystals of pyruvate kinase, an enzyme of the glycolytic pathway. The protein in a crystal is generally characterized by a high degree of purity and structural homogeneity.

Much of the history of biochemistry is the history of enzyme research. Biological catalysis was first recognized and described in the early 1800s, in studies of the digestion of meat by secretions of the stomach and the conversion of starch into sugar by saliva and various plant extracts. In the 1850s Louis Pasteur concluded that fermentation of sugar into alcohol by yeast is catalyzed by "ferments." He postulated that these ferments, later named enzymes, are inseparable from the structure of living yeast cells, a view that prevailed for many years. The discovery by Eduard Buchner in 1897 that yeast extracts can ferment sugar to alcohol proved that the enzymes involved in fermentation can function when removed from the structure of living cells. This encouraged biochemists to attempt the isolation of many different enzymes and to examine their catalytic properties.

James Sumner’s isolation and crystallization of urease in 1926 provided a breakthrough in early studies of the properties of specific enzymes. Sumner found that the urease crystals consisted entirely of protein and postulated that all enzymes are proteins. Lacking other examples, this idea remained controversial for some time. Only later in the 1930s, after John Northrop and his colleagues crystallized pepsin and trypsin and found them also to be proteins, was Sumner’s conclusion widely accepted. During this period, J.B.S. Haldane wrote a treatise entitled "Enzymes." Even though the molecular nature of enzymes was not yet fully appreciated, this book contained the remarkable suggestion that weak-bonding interactions between an enzyme and its substrate might be used to distort the substrate and catalyze the reaction. This insight lies at the heart of our current understanding of enzymatic catalysis. The latter part of the twentieth century has seen intensive research on the enzymes catalyzing the reactions of cell metabolism. This has led to the purification of thousands of enzymes (Fig. 8–1), elucidation of the structure and chemical mechanism of hundreds of these, and a general understanding of how enzymes work.

With the exception of a small group of catalytic RNA molecules (Chapter 25), all enzymes are proteins. Their catalytic activity depends upon the integrity of their native protein conformation. If an enzyme is denatured or dissociated into subunits, catalytic activity is usually lost. If an enzyme is broken down into its component amino acids, its catalytic activity is always destroyed. Thus the primary, secondary, tertiary, and quaternary structures of protein enzymes are essential to their catalytic activity.

Enzymes, like other proteins, have molecular weights ranging from about 12,000 to over 1 million. Some enzymes require no chemical groups other than their amino acid residues for activity. Others require an additional chemical component called a cofactor. The cofactor may be either one or more inorganic ions, such as Fe2+, Mg2+, Mn2+, or Zn2+ (Table 8–1), or a complex organic or metalloorganic molecule called a coenzyme (Table 8–2). Some enzymes require both a coenzyme and one or more metal ions for activity. A coenzyme or metal ion that is covalently bound to the enzyme protein is called a prosthetic group. A complete, catalytically active enzyme together with its coenzyme and/or metal ions is called a holoenzyme. The protein part of such an enzyme is called the apoenzyme or apoprotein. Coenzymes function as transient carriers of specific functional groups (Table 8–2). Many vitamins, organic nutrients required in small amounts in the diet, are precursors of coenzymes. Coenzymes will be considered in more detail as they are encountered in the discussion of metabolic pathways in Part III of this book.

Finally, some enzymes are modified by phosphorylation, glycosylation, and other processes. Many of these alterations are involved in the regulation of enzyme activity.

Many enzymes have been named by adding the suffix "-ase" to the name of their substrate or to a word or phrase describing their activity. Thus urease catalyzes hydrolysis of urea, and DNA polymerase catalyzes the synthesis of DNA. Other enzymes, such as pepsin and trypsin, have names that do not denote their substrates. Sometimes the same enzyme has two or more names, or two different enzymes have the same name. Because of such ambiguities, and the ever-increasing number of newly discovered enzymes, a system for naming and classifying enzymes has been adopted by international agreement. This system places all enzymes in six major classes, each with subclasses, based on the type of reaction catalyzed (Table 8–3). Each enzyme is assigned a four-digit classification number and a systematic name, which identifies the reaction catalyzed. As an example, the formal systematic name of the enzyme catalyzing the reaction

ATP + D-glucose   →   ADP + D-glucose-6-phosphate

is ATP:glucose phosphotransferase, which indicates that it catalyzes the transfer of a phosphate group from ATP to glucose. Its enzyme classification number (E.C. number) is 2.7.1.1; the first digit (2) denotes the class name (transferase) (see Table 8–3); the second digit (7), the subclass (phosphotransferase); the third digit (1), phosphotransferases with a hydroxyl group as acceptor; and the fourth digit (1), D-glucose as the phosphate-group acceptor. When the systematic name of an enzyme is long or cumbersome, a trivial name may be used; in this case, hexokinase.
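
The four-level structure of an E.C. number can be made concrete with a short sketch. The function name `parse_ec_number` and the returned field names are hypothetical, chosen only for illustration; the class names are those of the six-class system described above.

```python
# Names of the six major enzyme classes in the international system.
EC_CLASSES = {
    1: "oxidoreductase",
    2: "transferase",
    3: "hydrolase",
    4: "lyase",
    5: "isomerase",
    6: "ligase",
}


def parse_ec_number(ec):
    """Split an E.C. number such as '2.7.1.1' into its four levels."""
    parts = ec.split(".")
    if len(parts) != 4:
        raise ValueError("an E.C. number has exactly four fields: " + ec)
    class_no, subclass, sub_subclass, serial = (int(p) for p in parts)
    if class_no not in EC_CLASSES:
        raise ValueError("unknown enzyme class: " + str(class_no))
    return {
        "class": class_no,
        "class_name": EC_CLASSES[class_no],
        "subclass": subclass,          # e.g. 7 = phosphotransferase
        "sub_subclass": sub_subclass,  # e.g. 1 = hydroxyl group as acceptor
        "serial": serial,              # e.g. 1 = D-glucose as acceptor
    }


hexokinase = parse_ec_number("2.7.1.1")
print(hexokinase["class_name"], hexokinase["subclass"])
```

Only the first digit is decoded to a name here, because the meaning of the lower levels depends on the class and would need the full classification tables.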

A complete list and description of the thousands of known enzymes would be well beyond the scope of this book. This chapter is instead devoted primarily to principles and properties common to all enzymes.

Figure 8–2  Binding of a substrate to an enzyme at the active site. The enzyme chymotrypsin is shown, bound to a substrate (in blue). Some key active-site amino acids are shown in red.

The enzymatic catalysis of reactions is essential to living systems. Under biologically relevant conditions, uncatalyzed reactions tend to be slow. Most biological molecules are quite stable in the neutral-pH, mild-temperature, aqueous environment found inside cells. Many common reactions in biochemistry involve chemical events that are unfavorable or unlikely in the cellular environment, such as the transient formation of unstable charged intermediates or the collision of two or more molecules in the precise orientation required for reaction. Reactions required to digest food, send nerve signals, or contract muscle simply do not occur at a useful rate without catalysis.

An enzyme circumvents these problems by providing a specific environment within which a given reaction is energetically more favorable. The distinguishing feature of an enzyme-catalyzed reaction is that it occurs within the confines of a pocket on the enzyme called the active site (Fig. 8–2). The molecule that is bound by the active site and acted upon by the enzyme is called the substrate. The enzyme–substrate complex is central to the action of enzymes, and it is the starting point for mathematical treatments defining the kinetic behavior of enzyme-catalyzed reactions and for theoretical descriptions of enzyme mechanisms.

Figure 8–3  Reaction coordinate diagram for a chemical reaction. The free energy of the system is plotted against the progress of the reaction. A diagram of this kind is a description of the energetic course of the reaction, and the horizontal axis (reaction coordinate) reflects the progressive chemical changes (e.g., bond breakage or formation) as S is converted to P. The S and P symbols mark the free energies of the substrate and product ground states. The transition state is indicated by the symbol ‡. The activation energies, ΔG‡, for the S → P and P → S reactions are indicated. ΔG°’ is the overall standard free-energy change in going from S to P.
Figure 8–4  Reaction coordinate diagram comparing the enzyme-catalyzed and uncatalyzed reactions S → P. The ES and EP intermediates occupy minima in the energetic progress curve of the enzyme-catalyzed reaction. The terms ΔG‡uncat and ΔG‡cat correspond to the activation energies for the uncatalyzed and catalyzed reactions, respectively. The activation energy for the overall process is lower when the enzyme catalyzes the reaction.

A tour through an enzyme-catalyzed reaction serves to introduce some important concepts and definitions.

A simple enzymatic reaction might be written

E  +  S   ⇌   ES   ⇌   EP   ⇌   E  +  P
(8–1)

where E, S, and P represent the enzyme, substrate, and product, respectively. ES and EP are complexes of the enzyme with the substrate and with the product, respectively.

To understand catalysis, we must first appreciate the important distinction between reaction equilibria (discussed in Chapter 4) and reaction rates. The function of a catalyst is to increase the rate of a reaction. Catalysts do not affect reaction equilibria. Any reaction, such as S ⇌ P, can be described by a reaction coordinate diagram (Fig. 8–3). This is a picture of the energetic course of the reaction. As introduced in Chapters 1 and 3, energy in biological systems is described in terms of free energy, G. In the coordinate diagram, the free energy of the system is plotted against the progress of the reaction (reaction coordinate). In its normal stable form or ground state, any molecule (such as S or P) contains a characteristic amount of free energy. To describe the free-energy changes for reactions, chemists define a standard set of conditions (temperature 298 K; partial pressure of gases each 1 atm or 101.3 kPa; concentration of solutes each 1 M), and express the free-energy change for this reacting system as ΔG°, the standard free-energy change. Because biochemical systems commonly involve H+ concentrations far from 1 M, biochemists define a constant ΔG°’, the standard free-energy change at pH 7.0, which we will employ throughout the book. A more complete definition of ΔG°’ is given in Chapter 13.

The equilibrium between S and P reflects the difference in the free energy of their ground states. In the example shown in Figure 8–3, the free energy of the ground state of P is lower than that of S, so ΔG°’ for the reaction is negative and the equilibrium favors P. This equilibrium is not affected by any catalyst.

A favorable equilibrium, however, does not mean that the S → P conversion is fast. The rate of a reaction is dependent on an entirely different parameter. There is an energetic barrier between S and P that represents the energy required for alignment of reacting groups, formation of transient unstable charges, bond rearrangements, and other transformations required for the reaction to occur in either direction. This is illustrated by the energetic "hill" in Figures 8–3 and 8–4. To undergo reaction, the molecules must overcome this barrier and therefore must be raised to a higher energy level. At the top of the energy hill is a point at which decay to the S or P state is equally probable (it is downhill either way). This is called the transition state. The transition state is not a chemical species with any significant stability and should not be confused with a reaction intermediate. It is simply a fleeting molecular moment in which events such as bond breakage, bond formation, and charge development have proceeded to the precise point at which a collapse to either substrate or product is equally likely. The difference between the energy levels of the ground state and the transition state is called the activation energy (ΔG‡). The rate of a reaction reflects this activation energy; a higher activation energy corresponds to a slower reaction. Reaction rates can be increased by raising the temperature, thereby increasing the number of molecules with sufficient energy to overcome this energy barrier. Alternatively, the activation energy can be lowered by adding a catalyst (Fig. 8–4). Catalysts enhance reaction rates by lowering activation energies.

Enzymes are no exception to the rule that catalysts do not affect reaction equilibria. The bidirectional arrows in Equation 8–1 make this point: any enzyme that catalyzes the reaction S → P also catalyzes the reaction P → S. Its only role is to accelerate the interconversion of S and P. The enzyme is not used up in the process, and the equilibrium point is unaffected. However, the reaction reaches equilibrium much faster when the appropriate enzyme is present because the rate of the reaction is increased.
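
The point that a catalyst speeds the approach to equilibrium without moving the equilibrium itself can be checked numerically. The sketch below treats S ⇌ P as a pair of first-order steps and uses the standard closed-form solution for a reaction starting from pure S. The rate constants and the millionfold enhancement are invented for illustration, and the assumption that the catalyst multiplies the forward and reverse rate constants by the same factor follows from its lowering a single shared transition state.

```python
import math


def fraction_as_product(t, k_forward, k_reverse):
    """Fraction of material present as P at time t for S <=> P, starting from pure S."""
    k_sum = k_forward + k_reverse
    return (k_forward / k_sum) * (1.0 - math.exp(-k_sum * t))


def equilibrium_ratio(k_forward, k_reverse):
    """[P]/[S] at equilibrium, which equals k_forward / k_reverse."""
    return k_forward / k_reverse


# Hypothetical first-order rate constants (s^-1) for the uncatalyzed reaction.
K_FORWARD = 1.0e-4
K_REVERSE = 1.0e-5
# Hypothetical catalyst: both directions are accelerated by the same factor.
ENHANCEMENT = 1.0e6

for label, factor in (("uncatalyzed", 1.0), ("catalyzed", ENHANCEMENT)):
    kf, kr = K_FORWARD * factor, K_REVERSE * factor
    print(label, equilibrium_ratio(kf, kr), fraction_as_product(1.0, kf, kr))
```

With these numbers the equilibrium ratio [P]/[S] stays at about 10 in both cases, whereas the fraction converted after 1 s rises from roughly 0.0001 to about 0.91, which is the equilibrium value of 10/11.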

This general principle can be illustrated by considering the reaction of glucose and O2 to form CO2 and H2O. This reaction has a very large and negative ΔG°’, and at equilibrium the amount of glucose present is negligible. Glucose, however, is a stable compound, and it can be combined in a container with O2 almost indefinitely without reacting. Its stability reflects a high activation energy for reaction. In cells, glucose is broken down in the presence of O2 to CO2 and H2O in a pathway of reactions catalyzed by enzymes. These enzymes not only accelerate the reactions, they organize and control them so that much of the energy released in this process is recovered in other forms and made available to the cell for other tasks. This is the primary energy-yielding pathway for cells (Chapters 14 and 18), and these enzymes allow it to occur on a time scale that is useful to the cells.

In practice, any reaction may have several steps involving the formation and decay of transient chemical species called reaction intermediates. When the S ⇌ P reaction is catalyzed by an enzyme, the ES and EP complexes are intermediates (Eqn 8–1); they occupy valleys in the reaction coordinate diagram (Fig. 8–4). When several steps occur in a reaction, the overall rate is determined by the step (or steps) with the highest activation energy; this is called the rate-limiting step. In a simple case the rate-limiting step is the highest-energy point in the diagram for interconversion of S and P (Fig. 8–4). In practice, the rate-limiting step can vary with reaction conditions, and for many enzymes several steps may have similar activation energies, which means they are all partially rate-limiting.

As described in Chapter 1, activation energies are energetic barriers to chemical reactions; these barriers are crucial to life itself. The stability of a molecule increases with the height of its activation barrier. Without such energetic barriers, complex macromolecules would revert spontaneously to much simpler molecular forms. The complex and highly ordered structures and metabolic processes in every cell could not exist. Enzymes have evolved to lower activation energies selectively for reactions that are needed for cell survival.

Reaction equilibria are inextricably linked to ΔG°’ and reaction rates are linked to ΔG‡. A basic introduction to these thermodynamic relationships is the next step in understanding how enzymes work.

As introduced in Chapter 4, an equilibrium such as S ⇌ P is described by an equilibrium constant, Keq. Under the standard conditions used to compare biochemical processes, an equilibrium constant is denoted Keq’:

Keq’  =  [P]/[S]
(8–2)

From thermodynamics, the relationship between Keq’ and ΔG°’ can be described by the expression

ΔG°’  =  –RT ln Keq’
(8–3)

where R is the gas constant (8.315 J/mol ∙ K) and T is the absolute temperature (298 K). This expression will be developed and discussed in more detail in Chapter 13. The important point here is that the equilibrium constant is a direct reflection of the overall standard free-energy change in the reaction (Table 8–4). A large negative value for ΔG°’ reflects a favorable reaction equilibrium, but as already noted this does not mean the reaction will proceed at a rapid rate.
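
As a rough numerical illustration of Equation 8–3, the sketch below converts between Keq’ and ΔG°’ using the values of R and T given above. The function names and the sample equilibrium constants are chosen here for illustration and do not come from the text.

```python
import math

R = 8.315   # gas constant, J/(mol K), as given in the text
T = 298.0   # absolute temperature, K


def delta_g_standard(k_eq):
    """Standard free-energy change (J/mol) for a given equilibrium constant (Eqn 8-3)."""
    return -R * T * math.log(k_eq)


def k_eq_from_delta_g(delta_g):
    """Equilibrium constant for a given standard free-energy change (J/mol)."""
    return math.exp(-delta_g / (R * T))


# Sample equilibrium constants, chosen only to show the logarithmic relationship.
for k_eq in (1e-3, 10.0, 1e3):
    print(k_eq, round(delta_g_standard(k_eq) / 1000.0, 1), "kJ/mol")
```

Because the relationship is logarithmic, each tenfold change in Keq’ shifts ΔG°’ by the same amount, about 5.7 kJ/mol at 298 K.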

The rate of any reaction is determined by the concentration of the reactant (or reactants) and by a rate constant, usually denoted by the symbol k. For the unimolecular reaction S → P, the rate or velocity of the reaction, V, representing the amount of S that has reacted per unit time, is expressed by a rate law:

V  =  k[S]
(8–4)

In this reaction, the rate depends only on the concentration of S. This is called a first-order reaction. The factor k is a proportionality constant that reflects the probability of reaction under a given set of conditions (pH, temperature, etc.). Here, k is a first-order rate constant and has units of reciprocal time (e.g., s–1). If a first-order reaction has a rate constant k of 0.03 s–1, this may be interpreted (qualitatively) to mean that 3% of the available S will be converted to P in 1 s. A reaction with a rate constant of 2,000 s–1 will be over in a small fraction of a second. If the reaction rate depends on the concentration of two different compounds, or if two molecules of the same compound react, the reaction is second order and k is a second-order rate constant (with the units M–1s–1). The rate law has the form

V  =  k[S1][S2]
(8–5)
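
The qualitative reading of a first-order rate constant can be made quantitative with the integrated form of Equation 8–4, [S] = [S]0 e^(–kt). This is a standard result that is not derived in the text. The sketch below applies it to the two rate constants mentioned above; the function names are illustrative.

```python
import math


def fraction_converted(k, t):
    """Fraction of S converted to P after time t (s) for a first-order reaction."""
    return 1.0 - math.exp(-k * t)


def half_life(k):
    """Time (s) for half of S to react in a first-order reaction."""
    return math.log(2) / k


# k = 0.03 s^-1: close to 3% of S reacts in the first second.
print(fraction_converted(0.03, 1.0))
# k = 2,000 s^-1: the half-life is well under a millisecond.
print(half_life(2000.0))
```

The "3% per second" interpretation is an approximation that holds only while kt is small; over longer times the exponential form must be used.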

From transition-state theory, an expression can be derived that relates the magnitude of a rate constant to the activation energy:

k  =  (kB T/h) e^(–ΔG‡/RT)
(8–6)

where kB is the Boltzmann constant and h is Planck’s constant. The important point here is that the relationship between the rate constant, k, and the activation energy, ΔG‡, is inverse and exponential. In simplified terms, this is the basis for the statement that a lower activation energy means a higher reaction rate, and vice versa.
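
The inverse, exponential dependence in Equation 8–6 is easy to see numerically. The sketch below evaluates the equation at 298 K for a few activation energies. The physical constants are standard values; the activation energies are arbitrary and do not refer to any particular reaction.

```python
import math

K_BOLTZMANN = 1.380649e-23   # Boltzmann constant, J/K
H_PLANCK = 6.62607015e-34    # Planck's constant, J s
R = 8.315                    # gas constant, J/(mol K)
T = 298.0                    # absolute temperature, K


def rate_constant(delta_g_activation):
    """First-order rate constant (s^-1) from the activation energy (J/mol), Eqn 8-6."""
    prefactor = K_BOLTZMANN * T / H_PLANCK
    return prefactor * math.exp(-delta_g_activation / (R * T))


# Arbitrary activation energies (J/mol), chosen to show the trend.
for barrier in (80_000.0, 70_000.0, 60_000.0):
    print(barrier / 1000.0, "kJ/mol ->", rate_constant(barrier), "s^-1")
```

Each 10 kJ/mol reduction in the barrier raises the calculated rate constant by a factor of roughly 57, so modest changes in activation energy produce very large changes in rate.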

Now we turn from what enzymes do to how they do it.

Enzymes are extraordinary catalysts. The rate enhancements brought about by enzymes are often in the range of 7 to 14 orders of magnitude (Table 8–5). Enzymes are also very specific, readily discriminating between substrates with quite similar structures. How can these enormous and highly selective rate enhancements be explained? Where does the energy come from to provide a dramatic lowering of the activation energies for specific reactions?

Part of the explanation for enzyme action lies in well-studied chemical reactions that take place between a substrate and enzyme functional groups (specific amino acid side chains, metal ions, and coenzymes). Catalytic functional groups on enzymes can interact transiently with a substrate and activate it for reaction. In many cases, these groups lower the activation energy (and thereby accelerate the reaction) by providing a lower-energy reaction path. Common types of enzymatic catalysis are outlined later in this chapter.

Catalytic functional groups, however, are not the only contributor to enzymatic catalysis. The energy required to lower activation energies is generally derived from weak, noncovalent interactions between the substrate and the enzyme. The factor that really sets enzymes apart from most nonenzymatic catalysts is the formation of a specific ES complex. The interaction between substrate and enzyme in this complex is mediated by the same forces that stabilize protein structure, including hydrogen bonds and hydrophobic, ionic, and van der Waals interactions (Chapter 7). Formation of each weak interaction in the ES complex is accompanied by a small release of free energy that provides a degree of stability to the interaction. The energy derived from enzyme–substrate interaction is called binding energy. Its significance extends beyond a simple stabilization of the enzyme–substrate interaction. Binding energy is the major source of free energy used by enzymes to lower the activation energies of reactions.

Two fundamental and interrelated principles provide a general explanation for how enzymes work. First, the catalytic power of enzymes is ultimately derived from the free energy released in forming the multiple weak bonds and interactions that occur between an enzyme and its substrate. This binding energy provides specificity as well as catalysis. Second, weak interactions are optimized in the reaction transition state; enzyme active sites are complementary not to the substrates per se, but to the transition states of the reactions they catalyze. These themes are critical to an understanding of enzymes, and they now become the primary focus of the chapter.

Figure 8–5  Complementary shapes of a substrate and its binding site on an enzyme. The enzyme dihydrofolate reductase is shown with its substrate, NADP+ (red), unbound (top) and bound (bottom). Part of a tetrahydrofolate molecule (yellow), also bound to the enzyme, is visible. The NADP+ binds to a pocket that is complementary to it in shape and ionic properties. Emil Fischer proposed that enzymes and their substrates have shapes that closely complement each other, like a lock and key. This idea can readily be extended to the interactions of other types of proteins with ligands or other proteins. In reality, the complementarity is rarely perfect, and the interaction of a protein with a ligand often involves changes in the conformation of one or both molecules. This lack of perfect complementarity between an enzyme and its substrate (not evident in this figure) is important to enzymatic catalysis.
Figure 8–6  An imaginary enzyme (stickase) designed to catalyze the breaking of a metal stick.
(a) To break, the stick must first be bent (the transition state). In the stickase, magnetic interactions take the place of weak-bonding interactions between enzyme and substrate. (b) An enzyme with a magnet-lined pocket complementary in structure to the stick (the substrate) will stabilize this substrate. Bending will be impeded by the magnetic attraction between stick and stickase. (c) An enzyme complementary to the reaction transition state will help to destabilize the stick, resulting in catalysis of the reaction. The magnetic interactions provide energy that compensates for the increase in free energy required to bend the stick. Reaction coordinate diagrams show the energetic consequences of complementarity to substrate versus complementarity to transition state. The term ΔGM represents the energy contributed by the magnetic interactions between the stick and stickase. When the enzyme is complementary to the substrate, as in (b), the ES complex is more stable and has less free energy in the ground state than substrate alone. The result is an increase in the activation energy. For simplicity, the EP complexes are not shown.
Figure 8–7  The role of binding energy in catalysis. To lower the activation energy for a reaction, the system must acquire an amount of energy equivalent to the amount by which ΔG‡ is lowered. This energy comes largely from binding energy (ΔGB) contributed by formation of weak noncovalent interactions between substrate and enzyme in the transition state. The role of ΔGB is analogous to that of ΔGM in Fig. 8–6.

How does an enzyme use binding energy to lower the activation energy for reaction? Formation of the ES complex is not the explanation in itself, although some of the earliest considerations of enzyme mechanisms began with this idea. Studies on enzyme specificity carried out by Emil Fischer led him to propose, in 1894, that enzymes were structurally complementary to their substrates, so that they fit together like a "lock and key" (Fig. 8–5).

This elegant idea, that a specific (exclusive) interaction between two biological molecules is mediated by molecular surfaces with complementary shapes, has greatly influenced the development of biochemistry, and lies at the heart of many biochemical processes. However, the "lock and key" hypothesis can be misleading when applied to the question of enzymatic catalysis. An enzyme completely complementary to its substrate would be a very poor enzyme. Consider an imaginary reaction, the breaking of a metal stick. The uncatalyzed reaction is shown in Figure 8–6a. We will examine two imaginary enzymes to catalyze this reaction, both of which employ magnetic forces as a paradigm for the binding energy used by real enzymes. We first design an enzyme perfectly complementary to the substrate (Fig. 8–6b). The active site of this "stickase" enzyme is a pocket lined with magnets. To react (break), the stick must reach the transition state of the reaction. The stick fits so tightly in the active site that it cannot bend, because bending of the stick would eliminate some of the magnetic interactions between stick and enzyme. Such an enzyme impedes the reaction, stabilizing the substrate instead. In a reaction coordinate diagram (Fig. 8–6b), this kind of ES complex would correspond to an energy well from which it would be difficult for the substrate to escape. Such an enzyme would be useless.

The modern notion of enzymatic catalysis was first proposed by Haldane in 1930, and elaborated by Linus Pauling in 1946. In order to catalyze reactions, an enzyme must be complementary to the reaction transition state. This means that the optimal interactions (through weak bonding) between substrate and enzyme can occur only in the transition state. Figure 8–6c demonstrates how such an enzyme can work. The metal stick binds, but only a few magnetic interactions are used in forming the ES complex. The bound substrate must still undergo the increase in free energy needed to reach the transition state. Now, however, the increase in free energy required to draw the stick into a bent and partially broken conformation is offset or "paid for" by the magnetic interactions that form between the enzyme and substrate in the transition state. Many of these interactions involve parts of the stick that are distant from the point of breakage; thus interactions between the stickase and nonreacting parts of the stick provide some of the energy needed to catalyze stick breakage. This "energy payment" translates into a lower net activation energy and a faster reaction rate.

Real enzymes work on an analogous principle. Some weak interactions are formed in the ES complex, but the full complement of possible weak interactions between substrate and enzyme is formed only when the substrate reaches the transition state. The free energy (binding energy) released by the formation of these interactions partially offsets the energy required to get to the top of the energy hill. The summation of the unfavorable (positive) ΔG‡ and the favorable (negative) binding energy (ΔGB) results in a lower net activation energy (Fig. 8–7). Even on the enzyme, the transition state represents a brief point in time that the substrate spends atop an energy hill. The enzyme-catalyzed reaction is much faster than the uncatalyzed process, however, because the hill is much smaller. The important principle is that weak-bonding interactions between the enzyme and the substrate provide the major driving force for enzymatic catalysis. The groups on the substrate that are involved in these weak interactions can be at some distance from the bonds that are broken or changed. The weak interactions that are formed only in the transition state are those that make the primary contribution to catalysis.

The requirement for multiple weak interactions to drive catalysis is one reason why enzymes (and some coenzymes) are so large. The enzyme must provide functional groups for ionic interactions, hydrogen bonds, and other interactions, and also precisely position these groups so that binding energy is optimized in the transition state.

Can binding energy account for the huge rate accelerations brought about by enzymes? Yes. As a point of reference, Equation 8–6 allows us to calculate that about 5.7 kJ/mol of free energy is required to accelerate a first-order reaction by a factor of ten under conditions commonly found in cells. The energy available from formation of a single weak interaction is generally estimated to be 4 to 30 kJ/mol. The overall energy available from formation of a number of such interactions can lower activation energies by the 60 to 80 kJ/mol required to explain the large rate enhancements observed for many enzymes.
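
The arithmetic behind these figures can be sketched briefly. From Equation 8–6, lowering the activation energy by a given amount multiplies the rate constant by e raised to that amount divided by RT, so the enhancement in orders of magnitude is the lowering divided by RT ln 10. The function names below are illustrative, and a temperature of 298 K is assumed for "conditions commonly found in cells".

```python
import math

R = 8.315   # gas constant, J/(mol K)
T = 298.0   # assumed temperature, K


def energy_per_tenfold():
    """Free energy (kJ/mol) needed to accelerate a reaction tenfold."""
    return R * T * math.log(10) / 1000.0


def orders_of_magnitude(lowering_kj_per_mol):
    """Rate enhancement, in powers of ten, from lowering the activation energy."""
    return lowering_kj_per_mol / energy_per_tenfold()


print(round(energy_per_tenfold(), 1))   # about 5.7 kJ/mol
for lowering in (60.0, 80.0):
    print(lowering, "kJ/mol ->", round(orders_of_magnitude(lowering), 1), "orders of magnitude")
```

On this estimate, 60 to 80 kJ/mol corresponds to roughly 10.5 to 14 orders of magnitude. At 4 to 30 kJ/mol per weak interaction, 60 kJ/mol could in principle be supplied by between 2 and 15 such interactions.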

The same binding energy that provides energy for catalysis also makes the enzyme specific. Specificity refers to the ability of an enzyme to discriminate between two competing substrates. Conceptually, this idea is easy to distinguish from the idea of catalysis. Catalysis and specificity are much more difficult to distinguish experimentally because they arise from the same phenomenon. If an enzyme active site has functional groups arranged optimally to form a variety of weak interactions with a given substrate in the transition state, the enzyme will not be able to interact as well with any other substrate. For example, if the normal substrate has a hydroxyl group that forms a specific hydrogen bond with a Glu residue on the enzyme, any molecule lacking that particular hydroxyl group will generally be a poorer substrate for the enzyme. In addition, any molecule with an extra functional group for which the enzyme has no pocket or binding site is likely to be excluded from the enzyme. In general, specificity is also derived from the formation of multiple weak interactions between the enzyme and many or all parts of its specific substrate molecule.

The general principles outlined above can be illustrated by a variety of recognized catalytic mechanisms. These mechanisms are not mutually exclusive, and a given enzyme will often incorporate several in its own complete mechanism of action. It is often difficult to quantify the contribution of any one catalytic mechanism to the rate and/or specificity of an enzyme-catalyzed reaction.

Binding energy is the dominant driving force in several mechanisms, and these can be the major, and sometimes the only, contribution to catalysis. This can be illustrated by considering what needs to occur for a reaction to take place. Prominent physical and thermodynamic barriers to reaction include (1) entropy (the freedom of motion of two molecules in solution); (2) the solvation shell of hydrogen-bonded water that surrounds and helps to stabilize most biomolecules in aqueous solution; (3) the electronic or structural distortion of substrates that must occur in many reactions; and (4) the need to achieve proper alignment of appropriate catalytic functional groups on the enzyme. Binding energy can be used to overcome all of these barriers.

A large reduction in the relative motions of two substrates that are to react, or entropy reduction, is one of the obvious benefits of binding them to an enzyme. Binding energy holds the substrates in the proper orientation to react – a major contribution to catalysis because productive collisions between molecules in solution can be exceedingly rare. Substrates can be precisely aligned on the enzyme. A multitude of weak interactions between each substrate and strategically located groups on the enzyme clamp the substrate molecules into the proper positions. Studies have shown that constraining the motion of two reactants can produce rate enhancements of as much as 10⁸ M (a rate equivalent to that expected if the reactants were present at the impossibly high concentration of 100,000,000 M).

Formation of weak bonds between substrate and enzyme also results in desolvation of the substrate. Enzyme–substrate interactions replace most or all of the hydrogen bonds that may exist between the substrate and water in solution.

Binding energy involving weak interactions formed only in the reaction transition state helps to compensate thermodynamically for any strain or distortion that the substrate must undergo to react. Distortion of the substrate in the transition state may be electrostatic or structural.

The enzyme itself may undergo a change in conformation when the substrate binds, induced again by multiple weak interactions with the substrate. This is referred to as induced fit, a mechanism postulated by Daniel Koshland in 1958. Induced fit may serve to bring specific functional groups on the enzyme into the proper orientation to catalyze the reaction. The conformational change may also permit formation of additional weak-bonding interactions in the transition state. In either case the new conformation may have enhanced catalytic properties.

Figure 8–8  Unfavorable charge development during cleavage of an amide. This type of reaction is catalyzed by chymotrypsin and other proteases. Charge development can be circumvented by donation of a proton by H3O+ (specific acid catalysis) or by HA (general acid catalysis), where HA represents any acid. Similarly, charge can be neutralized by proton abstraction by OH− (specific base catalysis) or by B: (general base catalysis), where B: represents any base.
Figure 8–9  Many organic reactions are promoted by proton donors (general acids) or proton acceptors (general bases). The active sites of some enzymes contain amino acid functional groups, such as those shown here, that can participate in the catalytic process as proton donors or proton acceptors.
Figure 8–10  The first step in the reaction catalyzed by chymotrypsin, also called the acylation step. The hydroxyl group of Ser195 is the nucleophile in a reaction aided by general base catalysis (the base is the side chain of His57). The chymotrypsin reaction is described in more detail in Fig. 8–19.

Once a substrate is bound, additional modes of catalysis can be employed by an enzyme to aid bond cleavage and formation, using properly positioned catalytic functional groups. Among the best characterized mechanisms are general acid–base catalysis and covalent catalysis. These are distinct from mechanisms based on binding energy because they generally involve covalent interaction with a substrate, or group transfer to or from a substrate.

General Acid–Base Catalysis  Many biochemical reactions involve the formation of unstable charged intermediates that tend to break down rapidly to their constituent reactant species, thus failing to undergo reaction (Fig. 8–8). Charged intermediates can often be stabilized (and the reaction thereby catalyzed) by transferring protons to or from the substrate or intermediate to form a species that breaks down to products more readily than to reactants. The proton transfers can involve the constituents of water alone or may involve other weak proton donors or acceptors. Catalysis that simply involves the H+ (H3O+) or OH− ions present in water is referred to as specific acid or base catalysis. If protons are transferred between the intermediate and water faster than the intermediate breaks down to reactants, the intermediate will effectively be stabilized every time it forms. No additional catalysis mediated by other proton acceptors or donors will occur. In many cases, however, water is not enough. The term general acid–base catalysis refers to proton transfers mediated by other classes of molecules. It is observed in aqueous solutions only when the unstable reaction intermediate breaks down to reactants faster than the rate of proton transfer to or from water. A variety of weak organic acids can supplement water as proton donors in this situation, or weak organic bases can serve as proton acceptors. A number of amino acid side chains can similarly act as proton donors and acceptors (Fig. 8–9). These groups can be precisely positioned in an enzyme active site to allow proton transfers, providing rate enhancements on the order of 10² to 10⁵.

Covalent Catalysis  This involves the formation of a transient covalent bond between the enzyme and substrate. Consider the hydrolysis of a bond between groups A and B:

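$$\mathrm{A{-}B} \xrightarrow{\;\mathrm{H_2O}\;} \mathrm{A} + \mathrm{B}$$

In the presence of a covalent catalyst (an enzyme with a nucleophilic group X:) the reaction becomes

$$\mathrm{A{-}B} + \mathrm{X{:}} \longrightarrow \mathrm{A{-}X} + \mathrm{B} \xrightarrow{\;\mathrm{H_2O}\;} \mathrm{A} + \mathrm{X{:}} + \mathrm{B}$$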

This alters the pathway of the reaction and results in catalysis only when the new pathway has a lower activation energy than the uncatalyzed pathway. Both of the new steps must be faster than the uncatalyzed reaction. A number of amino acid side chains (including all of those in Fig. 8–9), as well as the functional groups of some enzyme cofactors, serve as nucleophiles on some enzymes in the formation of covalent bonds with substrates. These covalent complexes always undergo further reaction to regenerate the free enzyme. The covalent bond formed between the enzyme and the substrate can activate a substrate for further reaction in a manner that is usually specific to the group or coenzyme involved. The chemical contribution to catalysis provided by individual coenzymes is described in detail as each coenzyme is encountered in Part III of this book.

Metal Ion Catalysis  Metals, whether tightly bound to the enzyme or taken up from solution along with the substrate, can participate in catalysis in several ways. Ionic interactions between an enzyme-bound metal and the substrate can help orient a substrate for reaction or stabilize charged reaction transition states. This use of weak-bonding interactions between the metal and the substrate is similar to some of the uses of enzyme–substrate binding energy described earlier. Metals can also mediate oxidation–reduction reactions by reversible changes in the metal ion’s oxidation state. Nearly a third of all known enzymes require one or more metal ions for catalytic activity.

A combination of several catalytic strategies is usually employed by an enzyme to bring about a rate enhancement. A good example of the use of both covalent catalysis and general acid–base catalysis occurs in chymotrypsin. The first step in the reaction catalyzed by chymotrypsin is the cleavage of a peptide bond. This is accompanied by formation of a covalent linkage between a Ser residue on the enzyme and part of the substrate; this reaction is enhanced by general base catalysis by other groups on the enzyme (Fig. 8–10). The chymotrypsin reaction is described in more detail later in this chapter.

Multiple approaches are commonly used to study the mechanism of action of purified enzymes. A knowledge of the three-dimensional structure of a protein provides important information. The value of structural information is greatly enhanced by classical protein chemistry and modern methods of site-directed mutagenesis (changing the amino acid sequence of a protein in a defined way by genetic engineering; see Chapter 28) that permit enzymologists to examine the role of individual amino acids in structure and enzyme action. However, the rate of the catalyzed reaction can also reveal much about the enzyme. The study of reaction rates and how they change in response to changes in experimental parameters is known as kinetics. This is the oldest approach to understanding enzyme mechanism, and one that remains most important today. The following is a basic introduction to the kinetics of enzyme-catalyzed reactions. The more advanced student may wish to consult the texts and articles cited at the end of this chapter.

Figure 8–11  Effect of substrate concentration on the initial velocity of an enzyme-catalyzed reaction. Vmax can only be approximated from such a plot, because V0 will approach but never quite reach Vmax. The substrate concentration at which V0 is half maximal is Km, the Michaelis–Menten constant. The concentration of enzyme E in an experiment such as this is generally so low that [S] ≫ [E] even when [S] is described as low or relatively low. The units given are typical for enzyme-catalyzed reactions and are presented only to help illustrate the meaning of V0 and [S]. (Note that the curve describes part of a rectangular hyperbola, with one asymptote at Vmax. If the curve were continued below [S] = 0, it would approach a vertical asymptote at [S] = –Km.)

A discussion of kinetics must begin with some fundamental concepts. One of the key factors affecting the rate of a reaction catalyzed by a purified enzyme in vitro is the amount of substrate present, [S]. But studying the effects of substrate concentration is complicated by the fact that [S] changes during the course of a reaction as substrate is converted to product. One simplifying approach in a kinetic experiment is to measure the initial rate (or initial velocity), designated V0, when [S] is generally much greater than the concentration of enzyme. Then, if the time is sufficiently short following the start of a reaction, changes in [S] are negligible, and [S] can be regarded as a constant.

The effect on V0 of varying [S] when the enzyme concentration is held constant is shown in Figure 8–11. At relatively low concentrations of substrate, V0 increases almost linearly with an increase in [S]. At higher substrate concentrations, V0 increases by smaller and smaller amounts in response to increases in [S]. Finally, a point is reached beyond which there are only vanishingly small increases in V0 with increasing [S] (Fig. 8–11). This plateau is called the maximum velocity, Vmax.

The ES complex is the key to understanding this kinetic behavior, just as it represented a starting point for the discussion of catalysis. The kinetic pattern in Figure 8–11 led Victor Henri to propose in 1903 that an enzyme combines with its substrate molecule to form the ES complex as a necessary step in enzyme catalysis. This idea was expanded into a general theory of enzyme action, particularly by Leonor Michaelis and Maud Menten in 1913. They postulated that the enzyme first combines reversibly with its substrate to form an enzyme–substrate complex in a relatively fast reversible step:

$$\mathrm{E} + \mathrm{S} \underset{k_{-1}}{\overset{k_1}{\rightleftharpoons}} \mathrm{ES}$$
(8–7)

The ES complex then breaks down in a slower second step to yield the free enzyme and the reaction product P:

$$\mathrm{ES} \underset{k_{-2}}{\overset{k_2}{\rightleftharpoons}} \mathrm{E} + \mathrm{P}$$
(8–8)

In this model the second reaction (Eqn 8–8) is slower and therefore limits the rate of the overall reaction. It follows that the overall rate of the enzyme-catalyzed reaction must be proportional to the concentration of the species that reacts in the second step, that is, ES.

At any given instant in an enzyme-catalyzed reaction, the enzyme exists in two forms, the free or uncombined form E and the combined form ES. At low [S], most of the enzyme will be in the uncombined form E. Here, the rate will be proportional to [S] because the equilibrium of Equation 8–7 will be pushed toward formation of more ES as [S] is increased. The maximum initial rate of the catalyzed reaction (Vmax) is observed when virtually all of the enzyme is present as the ES complex and the concentration of E is vanishingly small. Under these conditions, the enzyme is “saturated” with its substrate, so that further increases in [S] have no effect on rate. This condition will exist when [S] is sufficiently high that essentially all the free enzyme will have been converted into the ES form. After the ES complex breaks down to yield the product P, the enzyme is free to catalyze another reaction. The saturation effect is a distinguishing characteristic of enzyme catalysts and is responsible for the plateau observed in Figure 8–11.

When the enzyme is first mixed with a large excess of substrate, there is an initial period called the pre-steady state during which the concentration of the ES complex builds up. The pre-steady state is usually too short to be easily observed. The reaction quickly achieves a steady state in which [ES] (and the concentration of any other intermediates) remains approximately constant over time. The measured V0 generally reflects the steady state even though V0 is limited to early times in the course of the reaction. Michaelis and Menten concerned themselves with the steady-state rate, and this type of analysis is referred to as steady-state kinetics.
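The build-up of ES and its levelling off can be illustrated numerically. The following is a minimal sketch, not a kinetic-analysis tool: it integrates the two-step scheme E + S ⇌ ES → E + P with a simple forward-Euler loop. The rate constants, concentrations, and time step are hypothetical values chosen only so that the pre-steady state lasts a few milliseconds, and the reverse of the second step is neglected.

```python
def simulate_es(k1, k_minus1, k2, e_total, s0, dt, steps):
    """Forward-Euler integration of E + S <-> ES -> E + P.

    Returns a list of (time, [ES], [S]) samples, one per step.
    """
    es, s = 0.0, s0
    samples = [(0.0, es, s)]
    for i in range(1, steps + 1):
        free_e = e_total - es
        formation = k1 * free_e * s        # E + S -> ES
        breakdown = (k_minus1 + k2) * es   # ES -> E + S and ES -> E + P
        d_es = formation - breakdown
        d_s = -formation + k_minus1 * es   # substrate consumed, or released from ES
        es += d_es * dt
        s += d_s * dt
        samples.append((i * dt, es, s))
    return samples


# Hypothetical values, for illustration only.
K1 = 1.0e6        # M^-1 s^-1
K_MINUS1 = 100.0  # s^-1
K2 = 10.0         # s^-1
E_TOTAL = 1.0e-8  # M
S0 = 1.0e-3       # M, a large excess over the enzyme

trace = simulate_es(K1, K_MINUS1, K2, E_TOTAL, S0, dt=1.0e-5, steps=2000)

if __name__ == "__main__":
    for t, es, s in trace[::400]:
        print(f"t = {t * 1000:5.1f} ms   [ES]/[Et] = {es / E_TOTAL:.3f}   [S] = {s:.6e} M")
```

With these made-up values, [ES] climbs within a few milliseconds and then stays essentially flat while [S] barely changes, which is the behavior the steady-state description relies on.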

Figure 8–12  Dependence of initial velocity on substrate concentration, showing the kinetic parameters that define the limits of the curve at high and low [S]. At low [S], Km ≫ [S], and the [S] term in the denominator of the Michaelis–Menten equation (Eqn 8–20) becomes insignificant; the equation simplifies to V0 = Vmax[S]/Km and V0 exhibits a linear dependence on [S], as observed. At high [S], where [S] ≫ Km, the Km term in the denominator of the Michaelis–Menten equation becomes insignificant, and the equation simplifies to V0 = Vmax; this is consistent with the plateau observed at high [S]. The Michaelis–Menten equation is therefore consistent with the observed dependence of V0 on [S], with the shape of the curve defined by the terms Vmax/Km at low [S] and Vmax at high [S].

Figure 8–11 shows the relationship between [S] and V0 for an enzymatic reaction. The curve expressing this relationship has the same general shape for most enzymes (it approaches a rectangular hyperbola). The hyperbolic shape of this curve can be expressed algebraically by the Michaelis–Menten equation, derived by these workers starting from their basic hypothesis that the rate-limiting step in enzymatic reactions is the breakdown of the ES complex to form the product and the free enzyme.

The important terms are [S], V0, Vmax, and a constant called the Michaelis–Menten constant or Km. All of these terms are readily measured experimentally.

Here we shall develop the basic logic and the algebraic steps in a modern derivation of the Michaelis–Menten equation. The derivation starts with the two basic reactions involved in the formation and breakdown of ES (Eqns 8–7 and 8–8). At early times in the reaction, the concentration of the product [P] is negligible and the simplifying assumption is made that k−2 can be ignored. The overall reaction then reduces to

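$$\mathrm{E} + \mathrm{S} \underset{k_{-1}}{\overset{k_1}{\rightleftharpoons}} \mathrm{ES} \overset{k_2}{\longrightarrow} \mathrm{E} + \mathrm{P}$$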
(8–9)

V0 is determined by the breakdown of ES to give product, which is determined by [ES]:

V0  =  k2[ES]
(8–10)

As [ES] in Equation 8–10 is not easily measured experimentally, we must begin by finding an alternative expression for [ES]. First, we will introduce the term [Et], representing the total enzyme concentration (the sum of the free and substrate-bound enzyme). Free or unbound enzyme can then be represented by [Et] – [ES]. Also, because [S] is ordinarily far greater than [Et], the amount of substrate bound by the enzyme at any given time is negligible compared with the total [S]. With these in mind, the following steps will lead us to an expression for V0 in terms of parameters that are easily measured.

Step 1.  The rates of formation and breakdown of ES are determined by the steps governed by the rate constants k1 (formation) and k−1 + k2 (breakdown), according to the expressions

Rate of ES formation  =  k1([Et] – [ES])[S]
(8–11)
Rate of ES breakdown  =  k−1[ES]  +  k2[ES]
(8–12)

Step 2.  An important assumption is now made that the initial rate of reaction reflects a steady state in which [ES] is constant, i.e., the rate of formation of ES is equal to its rate of breakdown. This is called the steady-state assumption. The expressions in Equations 8–11 and 8–12 can be equated at the steady state, giving

k1([Et] – [ES])[S]  =  k−1[ES]  +  k2[ES]
(8–13)

Step 3.  A series of algebraic steps is now taken to solve Equation 8–13 for [ES]. The left side is multiplied out and the right side is simplified to give

k1[Et][S] – k1[ES][S]  =  (k−1 + k2)[ES]
(8–14)
Adding the term k1[ES][S] to both sides of the equation and simplifying gives

k1[Et][S]  =  (k1[S] + k−1 + k2)[ES]
(8–15)

Solving this equation for [ES] gives

$$[\mathrm{ES}] = \frac{k_1[\mathrm{E_t}][\mathrm{S}]}{k_1[\mathrm{S}] + k_{-1} + k_2}$$
(8–16)

This can now be simplified further, in such a way as to combine the rate constants into one expression:

$$[\mathrm{ES}] = \frac{[\mathrm{E_t}][\mathrm{S}]}{[\mathrm{S}] + (k_2 + k_{-1})/k_1}$$
(8–17)

The term (k2 + k−1)/k1 is defined as the Michaelis–Menten constant, Km. Substituting this into Equation 8–17 simplifies the expression to

$$[\mathrm{ES}] = \frac{[\mathrm{E_t}][\mathrm{S}]}{K_\mathrm{m} + [\mathrm{S}]}$$
(8–18)

Step 4.  V0 can now be expressed in terms of [ES]. Equation 8–18 is used to substitute for [ES] in Equation 8–10, giving

$$V_0 = \frac{k_2[\mathrm{E_t}][\mathrm{S}]}{K_\mathrm{m} + [\mathrm{S}]}$$
(8–19)

This equation can be further simplified. Because the maximum velocity will occur when the enzyme is saturated and [ES] = [Et], Vmax can be defined as k2[Et]. Substituting this in Equation 8–19 gives

$$V_0 = \frac{V_\mathrm{max}[\mathrm{S}]}{K_\mathrm{m} + [\mathrm{S}]}$$
(8–20)

This is the Michaelis–Menten equation, the rate equation for a one-substrate, enzyme-catalyzed reaction. It is a statement of the quantitative relationship between the initial velocity V0, the maximum initial velocity Vmax, and the initial substrate concentration [S], all related through the Michaelis–Menten constant Km. Does the equation fit the facts? Yes; we can confirm this by considering the limiting situations where [S] is very high or very low, as shown in Figure 8–12.
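The two limiting situations can also be checked numerically. The sketch below evaluates the Michaelis–Menten equation (Eqn 8–20) over a wide range of [S] and compares it with the low-[S] approximation Vmax[S]/Km and with the plateau Vmax. The parameter values and units are hypothetical and serve only as an illustration.

```python
def michaelis_menten(s, vmax, km):
    """Initial velocity V0 from Eqn 8-20: V0 = Vmax[S] / (Km + [S])."""
    return vmax * s / (km + s)


# Hypothetical parameters, for illustration only.
VMAX = 100.0  # uM/min
KM = 50.0     # uM

if __name__ == "__main__":
    print(" [S] (uM)      V0   Vmax[S]/Km   V0/Vmax")
    for s in (0.5, 5.0, 50.0, 500.0, 5000.0, 50000.0):
        v0 = michaelis_menten(s, VMAX, KM)
        linear = VMAX * s / KM  # approximation valid only when [S] << Km
        print(f"{s:9.1f} {v0:8.2f} {linear:11.2f} {v0 / VMAX:9.3f}")
```

At the lowest concentrations the full equation and the linear approximation nearly coincide, and at the highest concentrations V0/Vmax approaches, but does not reach, 1.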

An important numerical relationship emerges from the Michaelis–Menten equation in the special case when V0 is exactly one-half Vmax (Fig. 8–12). Then

$$\frac{V_\mathrm{max}}{2} = \frac{V_\mathrm{max}[\mathrm{S}]}{K_\mathrm{m} + [\mathrm{S}]}$$
(8–21)

On dividing by Vmax, we obtain

$$\frac{1}{2} = \frac{[\mathrm{S}]}{K_\mathrm{m} + [\mathrm{S}]}$$
(8–22)

Solving for Km, we get Km + [S] = 2[S], or

Km  =  [S],   when V0 = Vmax/2
(8–23)

This represents a very useful, practical definition of Km: Km is equivalent to that substrate concentration at which V0 is one-half Vmax. Note that Km has units of molarity.
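The same algebra works for any fraction of Vmax, not only one-half. As a small illustration, setting V0 = f·Vmax in Equation 8–20 and solving for [S] gives [S] = f·Km/(1 − f). The function name and the fractions tried below are illustrative choices, not part of the original derivation.

```python
def substrate_for_fraction(fraction, km):
    """[S] at which V0 = fraction * Vmax, from Eqn 8-20 solved for [S].

    `fraction` must lie strictly between 0 and 1, because V0 only
    approaches Vmax and never reaches it.
    """
    if not 0.0 < fraction < 1.0:
        raise ValueError("fraction must be between 0 and 1 (exclusive)")
    return fraction * km / (1.0 - fraction)


if __name__ == "__main__":
    km = 1.0  # expressing [S] in multiples of Km
    for f in (0.1, 0.5, 0.9, 0.99):
        print(f"V0 = {f:.2f} Vmax at [S] = {substrate_for_fraction(f, km):.2f} x Km")
```

For f = 0.5 this returns Km itself, in agreement with Equation 8–23; reaching 90% of Vmax requires [S] of about 9 Km, which is one way to see why Vmax is hard to read directly from a plot of V0 versus [S].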

The Michaelis–Menten equation (8–20) can be algebraically transformed into forms that are useful in the practical determination of Km and Vmax (Box 8–1) and, as we will describe later, in the analysis of inhibitor action (see Box 8–2).

Box 8–1
Transformations of the Michaelis–Menten Equation: The Double-Reciprocal Plot

The Michaelis–Menten equation:

$$V_0 = \frac{V_\mathrm{max}[\mathrm{S}]}{K_\mathrm{m} + [\mathrm{S}]}$$

can be algebraically transformed into forms that are more useful in plotting experimental data. One common transformation is derived simply by taking the reciprocal of both sides of the Michaelis–Menten equation to give

$$\frac{1}{V_0} = \frac{K_\mathrm{m} + [\mathrm{S}]}{V_\mathrm{max}[\mathrm{S}]}$$

Separating the components of the numerator on the right side of the equation gives

$$\frac{1}{V_0} = \frac{K_\mathrm{m}}{V_\mathrm{max}[\mathrm{S}]} + \frac{[\mathrm{S}]}{V_\mathrm{max}[\mathrm{S}]}$$

which simplifies to

$$\frac{1}{V_0} = \frac{K_\mathrm{m}}{V_\mathrm{max}}\,\frac{1}{[\mathrm{S}]} + \frac{1}{V_\mathrm{max}}$$

This equation is a transform of the Michaelis–Menten equation called the Lineweaver–Burk equation. For enzymes obeying the Michaelis–Menten relationship, a plot of 1/V0 versus 1/[S] (the “double-reciprocal” of the V0-versus-[S] plot we have been using to this point) yields a straight line (Fig. 1). This line will have a slope of Km/Vmax, an intercept of 1/Vmax on the 1/V0 axis, and an intercept of –1/Km on the 1/[S] axis. The double-reciprocal presentation, also called a Lineweaver–Burk plot, has the great advantage of allowing a more accurate determination of Vmax, which can only be approximated from a simple plot of V0 versus [S] (see Fig. 8–12).
Figure 1  A double-reciprocal, or Lineweaver–Burk, plot.
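The use of the double-reciprocal transformation can be sketched in a few lines. The example below generates noise-free V0 values from hypothetical parameters (Km = 50 µM, Vmax = 100 µM/min), fits a straight line to 1/V0 versus 1/[S] by ordinary least squares, and recovers Km and Vmax from the slope (Km/Vmax) and the intercept (1/Vmax). The data and helper names are assumptions made for illustration.

```python
def linear_fit(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def lineweaver_burk(s_values, v0_values):
    """Estimate (Km, Vmax) from a fit of 1/V0 against 1/[S]."""
    inv_s = [1.0 / s for s in s_values]
    inv_v = [1.0 / v for v in v0_values]
    slope, intercept = linear_fit(inv_s, inv_v)
    vmax = 1.0 / intercept   # intercept on the 1/V0 axis is 1/Vmax
    km = slope * vmax        # slope is Km/Vmax
    return km, vmax


# Synthetic, noise-free data from hypothetical Km = 50 uM, Vmax = 100 uM/min.
S_VALUES = [10.0, 20.0, 50.0, 100.0, 200.0]
V0_VALUES = [100.0 * s / (50.0 + s) for s in S_VALUES]

km_est, vmax_est = lineweaver_burk(S_VALUES, V0_VALUES)

if __name__ == "__main__":
    print(f"Km   ~ {km_est:.2f} uM")
    print(f"Vmax ~ {vmax_est:.2f} uM/min")
```

Because the synthetic data contain no error, the fit returns the input parameters. With real measurements the reciprocal transformation magnifies the error in the points at low [S], so the estimates are only as good as those points.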

Other transformations of the Michaelis–Menten equation have been derived and used. Each has some particular advantage in analyzing enzyme kinetic data.

The double-reciprocal plot of enzyme reaction rates is very useful in distinguishing between certain types of enzymatic reaction mechanisms (see Fig. 8–14) and in analyzing enzyme inhibition (see Box 8–2).

It is important to distinguish between the Michaelis–Menten equation and the specific kinetic mechanism upon which it was originally based. The equation describes the kinetic behavior of a great many enzymes, and all enzymes that exhibit a hyperbolic dependence of V0 on [S] are said to follow Michaelis–Menten kinetics. The practical rule that Km = [S] when V0 = Vmax/2 (Eqn 8–23) holds for all enzymes that follow Michaelis–Menten kinetics (the major exceptions to Michaelis–Menten kinetics are the regulatory enzymes, discussed at the end of this chapter). However, this equation does not depend on the relatively simple two-step reaction mechanism proposed by Michaelis and Menten (Eqn 8–9). Many enzymes that follow Michaelis–Menten kinetics have quite different reaction mechanisms, and enzymes that catalyze reactions with six or eight identifiable steps will often exhibit the same steady-state kinetic behavior. Even though Equation 8–23 holds true for many enzymes, both the magnitude and the real meaning of Vmax and Km can change from one enzyme to the next. This is an important limitation of the steady-state approach to enzyme kinetics. Vmax and Km are parameters that can be obtained experimentally for any given enzyme, but by themselves they provide little information about the number, rates, or chemical nature of discrete steps in the reaction. Steady-state kinetics nevertheless represents the standard language by which the catalytic efficiencies of enzymes are characterized and compared. We now turn to the application and interpretation of the terms Vmax and Km.

A simple graphical method for obtaining an approximate value for Km is shown in Figure 8–12. A more convenient procedure, using a double-reciprocal plot, is presented in Box 8–1. The Km can vary greatly from enzyme to enzyme, and even for different substrates of the same enzyme (Table 8–6). The term is sometimes used (inappropriately) as an indication of the affinity of an enzyme for its substrate.

The actual meaning of Km depends on specific aspects of the reaction mechanism such as the number and relative rates of the individual steps of the reaction. Here we will consider reactions with two steps. In the derivation above (following Eqn 8–17), Km is defined by the expression

$$K_\mathrm{m} = \frac{k_2 + k_{-1}}{k_1}$$
(8–24)

For the Michaelis–Menten reaction, k2 is rate-limiting; thus k2 ≪ k−1 and Km reduces to k−1/k1, which is defined as the dissociation constant, KS, for the ES complex. Where these conditions hold, Km does represent a measure of the affinity of the enzyme for the substrate in the ES complex. However, this scenario does not apply to all enzymes. Sometimes k2 ≫ k−1, and then Km = k2/k1. In other cases, k2 and k−1 are comparable, and Km remains a more complex function of all three rate constants (Eqn 8–24). These situations were first analyzed by Haldane along with George E. Briggs in 1925. The Michaelis–Menten equation and the characteristic saturation behavior of the enzyme still apply, but Km cannot be considered a simple measure of substrate affinity. Even more common are cases in which the reaction goes through multiple steps after formation of the ES complex; Km can then become a very complex function of many rate constants.
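The three situations can be compared with a short calculation. The sketch below computes Km from Equation 8–24 and sets it beside the dissociation constant KS = k−1/k1 and the ratio k2/k1 for three sets of rate constants. All of the numerical values are hypothetical and were picked only to make one term dominate in the first and third cases.

```python
def km_from_rate_constants(k1, k_minus1, k2):
    """Km = (k2 + k-1) / k1 (Eqn 8-24), in molar units."""
    return (k2 + k_minus1) / k1


def dissociation_constant(k1, k_minus1):
    """KS = k-1 / k1, the dissociation constant of the ES complex."""
    return k_minus1 / k1


# Hypothetical rate constants: (k1 in M^-1 s^-1, k-1 in s^-1, k2 in s^-1).
CASES = {
    "k2 << k-1": (1.0e6, 1000.0, 1.0),
    "k2 ~  k-1": (1.0e6, 100.0, 100.0),
    "k2 >> k-1": (1.0e6, 1.0, 1000.0),
}

if __name__ == "__main__":
    for label, (k1, k_minus1, k2) in CASES.items():
        km = km_from_rate_constants(k1, k_minus1, k2)
        ks = dissociation_constant(k1, k_minus1)
        print(f"{label}:  Km = {km:.3e} M   KS = {ks:.3e} M   k2/k1 = {k2 / k1:.3e} M")
```

Only in the first case does Km come out close to KS; in the third it is close to k2/k1 and far larger than KS, so reading it as an affinity would be misleading.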

Vmax also varies greatly from one enzyme to the next. If an enzyme reacts by the two-step Michaelis–Menten mechanism, Vmax is equivalent to k2[Et], where k2 is the rate constant for the rate-limiting step. However, the number of reaction steps and the identity of the rate-limiting step(s) can vary from enzyme to enzyme. For example, consider the quite common situation where product release, EP → E + P, is rate-limiting:

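$$\mathrm{E} + \mathrm{S} \underset{k_{-1}}{\overset{k_1}{\rightleftharpoons}} \mathrm{ES} \underset{k_{-2}}{\overset{k_2}{\rightleftharpoons}} \mathrm{EP} \overset{k_3}{\longrightarrow} \mathrm{E} + \mathrm{P}$$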
(8–25)

In this case, most of the enzyme is in the EP form at saturation, and Vmax = k3[Et]. It is useful to define a more general rate constant, kcat, to describe the limiting rate of any enzyme-catalyzed reaction at saturation. If there are several steps in the reaction, and one is clearly rate-limiting, kcat is equivalent to the rate constant for that limiting step. For the Michaelis–Menten reaction, kcat = k2. For the reaction of Equation 8–25, kcat = k3. When several steps are partially rate-limiting, kcat can become a complex function of several of the rate constants that define each individual reaction step. In the Michaelis–Menten equation, kcat = Vmax/[Et], and Equation 8–19 becomes

$$V_0 = \frac{k_\mathrm{cat}[\mathrm{E_t}][\mathrm{S}]}{K_\mathrm{m} + [\mathrm{S}]}$$
(8–26)

The constant kcat is a first-order rate constant with units of reciprocal time, and is also called the turnover number. It is equivalent to the number of substrate molecules converted to product in a given unit of time on a single enzyme molecule when the enzyme is saturated with substrate. The turnover numbers of several enzymes are given in Table 8–7.
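As a brief worked example of the relationship kcat = Vmax/[Et], the sketch below converts a measured Vmax and a known total enzyme concentration into a turnover number and then uses Equation 8–26 to predict V0. The numerical values are hypothetical; in particular, Vmax and [Et] must be expressed in consistent units (here M/s and M) for kcat to come out in s⁻¹.

```python
def turnover_number(vmax, e_total):
    """kcat = Vmax / [Et]; with Vmax in M/s and [Et] in M, kcat is in s^-1."""
    return vmax / e_total


def initial_velocity(s, kcat, e_total, km):
    """V0 from Eqn 8-26: V0 = kcat[Et][S] / (Km + [S])."""
    return kcat * e_total * s / (km + s)


# Hypothetical measurements, for illustration only.
VMAX = 2.0e-6     # M/s
E_TOTAL = 1.0e-9  # M
KM = 1.0e-4       # M

kcat = turnover_number(VMAX, E_TOTAL)

if __name__ == "__main__":
    print(f"kcat = {kcat:.0f} s^-1")
    print(f"time per catalytic cycle at saturation = {1000.0 / kcat:.2f} ms")
    for s in (1.0e-5, 1.0e-4, 1.0e-2):
        print(f"[S] = {s:.0e} M   V0 = {initial_velocity(s, kcat, E_TOTAL, KM):.2e} M/s")
```

The reciprocal of kcat gives the average time one enzyme molecule needs to complete a catalytic cycle when saturated, which is sometimes a more intuitive way to picture the turnover number.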

The kinetic parameters kcat and Km are generally useful for the study and comparison of different enzymes, whether their reaction mechanisms are simple or complex. Each enzyme has optimum values of kcat and Km that reflect the cellular environment, the concentration of substrate normally encountered in vivo by the enzyme, and the chemistry of the reaction being catalyzed.

Comparison of the catalytic efficiency of different enzymes requires the selection of a suitable parameter. The constant kcat is not entirely satisfactory. Two enzymes catalyzing different reactions may have the same kcat (turnover number), yet the rates of the uncatalyzed reactions may be different and thus the rate enhancement brought about by the enzymes may differ greatly. Also, kcat reflects the properties of an enzyme when it is saturated with substrate, and is less useful at low [S]. The constant Km is also unsatisfactory by itself. As shown by Equation 8–23, Km must have some relationship to the normal [S] found in the cell. An enzyme that acts on a substrate present at a very low concentration in the cell will tend to have a lower Km than an enzyme that acts on a substrate that is normally abundant.

The most useful parameter for a discussion of catalytic efficiency is one that includes both kcat and Km. When [S] ≪ Km, Equation 8–26 reduces to the form

$$V_0 = \frac{k_\mathrm{cat}}{K_\mathrm{m}}[\mathrm{E_t}][\mathrm{S}]$$
(8–27)
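To see why a parameter combining kcat and Km is informative at low [S], the sketch below compares two hypothetical enzymes that share the same kcat but differ a hundredfold in Km. It evaluates both the full rate equation (Eqn 8–26) and the low-[S] form (Eqn 8–27). The enzyme names and all numbers are invented for illustration.

```python
def full_velocity(s, kcat, km, e_total):
    """Eqn 8-26: V0 = kcat[Et][S] / (Km + [S])."""
    return kcat * e_total * s / (km + s)


def low_substrate_velocity(s, kcat, km, e_total):
    """Eqn 8-27: V0 ~ (kcat/Km)[Et][S]; a good approximation only when [S] << Km."""
    return (kcat / km) * e_total * s


# Two hypothetical enzymes: same kcat (s^-1), different Km (M).
ENZYMES = {
    "enzyme A": {"kcat": 100.0, "km": 1.0e-5},
    "enzyme B": {"kcat": 100.0, "km": 1.0e-3},
}
E_TOTAL = 1.0e-9  # M
S_LOW = 1.0e-7    # M, well below both Km values

results = {
    name: (
        full_velocity(S_LOW, p["kcat"], p["km"], E_TOTAL),
        low_substrate_velocity(S_LOW, p["kcat"], p["km"], E_TOTAL),
    )
    for name, p in ENZYMES.items()
}

if __name__ == "__main__":
    for name, (full, approx) in results.items():
        ratio = ENZYMES[name]["kcat"] / ENZYMES[name]["km"]
        print(f"{name}: kcat/Km = {ratio:.1e} M^-1 s^-1   "
              f"V0 (8-26) = {full:.3e} M/s   V0 (8-27) = {approx:.3e} M/s")
```

Although the two enzymes would be indistinguishable by kcat alone, at this low substrate concentration the one with the smaller Km works about a hundred times faster, in step with its larger kcat/Km.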