Defining the Genetic Coding Problem, 1954-1957

James Watson and Francis Crick's insight that genetic information is embedded in the physical structure of deoxyribonucleic acid (DNA) made possible a new understanding of heredity at the molecular level and opened up new avenues of research into the genetic control of essential biological processes, most importantly the synthesis of proteins. Watson and Crick were the first to realize that the seemingly random sequence of the four bases in DNA formed a code which specified the order of the twenty amino acids that make up most proteins. (It was Watson and Crick who drew up the list of twenty from dispersed and confused information in the biochemical literature.)

Before their discovery of the double helix, the term genetic code had no meaning; afterwards, deciphering the code--putting together the dictionary by which the four-letter nucleic acid language is translated into the twenty-letter protein language--became the most urgent and ambitious undertaking of biologists throughout the world, an effort that defined the classical age of molecular biology. Francis Crick was the intellectual leader in this effort, distributing, critiquing, and connecting experimental results from many sources, mediating between scientists of differing opinions, and proposing new experiments and lines of investigation.

During the 1950s, Crick pursued research of remarkable breadth. He developed a general theory about the structure of protein and ribonucleic acid (RNA) in viruses with Watson when the latter returned to Cambridge for a visit in 1955. Also during the mid-1950s, Crick collaborated with the American biochemist Alexander Rich in unraveling the structure of collagen, a family of extracellular proteins that are found in connective tissue and give it strength and resilience, and of polyadenylic acid, a uniform polymer composed entirely of adenosine units linked by phosphate groups. Adenosine, a nucleoside consisting of the base adenine attached to the sugar ribose, plays a major role in regulating metabolism, and so was of particular interest to biomedical researchers.

Crick's most important and enduring work during the 1950s and 1960s, however, was on the genetic code for protein synthesis, work begun with Vernon Ingram and continued with Sydney Brenner. How amino acids were strung together in polypeptide chains to form proteins, nature's most complex, diverse, and versatile molecules, and how genes controlled this process were problems that fascinated Crick from the time he first entered biology in 1947. By the early 1950s, researchers had developed two competing theories about the mechanism for protein synthesis. The first of these, the peptide or multi-enzyme theory, held that proteins were assembled by stepwise coupling of small units, the amino acids, into polypeptide chains, a process guided by enzymes, proteins that catalyze a wide range of reactions in the cell. The second was the template theory, which held that proteins were synthesized on templates, one for each protein. The templates were genes.

Scientists who followed the discovery of the double helix--many did not--saw that it strengthened the template theory, because it suggested how stretches of DNA might serve as templates for protein synthesis. Among these scientists was the physicist and cosmologist George Gamow, who conjectured that DNA provided a direct template for proteins. When looked at in a certain way, Gamow contended, DNA could be seen to have twenty different cavities along its length, a number equal to that of the common amino acids.

To explore further these emerging ideas about genetic control of protein synthesis, Gamow founded the RNA Tie Club, a hand-picked group of twenty molecular biologists, which included Crick, Watson, and Rich. As a badge of membership, each received a tie with the chemical symbol of one of the twenty amino acids embroidered upon it. The group never met, but Crick used it as a sounding board for some of his most important theories about protein synthesis by circulating his ideas in draft form among its members.

Crick, like others, doubted that either DNA or ribonucleic acid (RNA, a single-stranded nucleic acid whose sugar groups consist of ribose, not deoxyribose as in DNA) could provide a physical template intricate enough to accommodate the twenty different types of amino acid. Moreover, protein synthesis took place in the cytoplasm of the cell outside the nucleus, while almost all DNA was found in the nucleus. Watson, Crick, and others agreed that a genetic messenger system was needed, with RNA as the likely messenger: a mechanism for transcribing genetic information from DNA onto RNA and for exporting the RNA to the site of protein synthesis in the cytoplasm, a workbench called the ribosome. In addition, Crick postulated the existence of a second type of RNA molecule, this one designed to capture amino acids (also found in the cytoplasm of the cell), to transport them to the ribosome, and to fit them into the growing polypeptide chain. He circulated his idea among members of the RNA Tie Club in January 1955 in a note entitled "On Degenerate Templates and the Adaptor Hypothesis," one of the most important unpublished papers in the history of science.

Crick suggested that the job of transporting and joining amino acids to the polypeptide chain at the ribosome was performed by twenty different adaptor molecules, one for each amino acid and all consisting of RNA. At one end, each adaptor was shaped to hold a particular amino acid, which was attached to the adaptor by enzymes whose existence was also predicted by Crick. At the other end, adaptors had a sequence of bases (soon to be called the anticodon) that matched a complementary sequence (the codon) in the messenger RNA template through base pairing. The adaptor molecule could thus identify the place on the messenger RNA at which a particular amino acid was called for, to be inserted into the growing polypeptide chain. Once it had done its job, the adaptor would be released, to perform its job again. To the surprise of biochemists (who thought that if adaptor enzymes existed they would have already been found), experiments soon confirmed the existence of these adaptors and enzymes exactly as Crick had predicted. The adaptors have since come to be known as transfer RNA (tRNA).
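The matching of adaptor to messenger described above can be sketched in a few lines of Python. This is a minimal illustration, not a model of the cell: the names `PAIR`, `ADAPTORS`, `complement`, and `translate` are invented for the example, the three-base codon length is assumed, and the three codon assignments (AUG, UUU, UGG) come from the genetic code as it was deciphered later, not from Crick's 1955 note. Anticodons are written position by position against the codon, ignoring the convention for strand direction.

```python
# RNA base pairing: A pairs with U, G pairs with C.
PAIR = {"A": "U", "U": "A", "G": "C", "C": "G"}


def complement(seq):
    """Return the bases that pair, position by position, with seq."""
    return "".join(PAIR[base] for base in seq)


# Hypothetical adaptor table for illustration: anticodon -> amino acid carried.
# Only three of the twenty adaptors are listed.
ADAPTORS = {
    complement("AUG"): "Met",  # anticodon UAC
    complement("UUU"): "Phe",  # anticodon AAA
    complement("UGG"): "Trp",  # anticodon ACC
}


def translate(mrna):
    """Read the messenger three bases at a time and collect the amino acid
    carried by the adaptor whose anticodon pairs with each codon."""
    chain = []
    usable = len(mrna) - len(mrna) % 3  # ignore a trailing partial codon
    for i in range(0, usable, 3):
        codon = mrna[i:i + 3]
        anticodon = complement(codon)
        chain.append(ADAPTORS[anticodon])  # KeyError if no adaptor is listed
    return chain


print(translate("AUGUUUUGG"))
```

Each adaptor is used by lookup and is not consumed, which mirrors the point that a released adaptor can perform its job again.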

The major problem yet to be resolved was how a sequence of the four bases of DNA (A, T, C, G) could encode instructions for assembling the twenty amino acids in proteins. Clearly, the code could not consist of two letters, or bases, because there are only sixteen possible combinations of the four bases taken two at a time (4x4), too few to specify twenty amino acids. A three-letter code, on the other hand, offered 64 possible combinations (4x4x4), seemingly too many to code for only twenty amino acids. In trying to understand how a triplet code might work, Crick surmised that a mutation in a gene could arise as a change in the order of the bases. According to his coding hypothesis, this would produce a change in the order of the amino acids in the protein for which the gene coded; if such a change in the protein could be detected, it might be possible to deduce the correlation, or, as Crick called it, the colinearity, between the order of the bases in the gene and of the amino acids in the protein.
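The counting argument above is easy to check by enumeration. The following Python sketch lists every ordered codeword of a given length over the four bases; the function names `count_codewords` and `shortest_sufficient_length` are made up for this illustration.

```python
from itertools import product

BASES = "ATCG"      # the four DNA bases named in the text
AMINO_ACIDS = 20    # the number of amino acids to be specified


def count_codewords(length):
    """Count the ordered sequences of the given length over the four bases."""
    return len(list(product(BASES, repeat=length)))


def shortest_sufficient_length(n_items=AMINO_ACIDS):
    """Return the smallest codeword length giving at least n_items codewords."""
    length = 1
    while count_codewords(length) < n_items:
        length += 1
    return length


for n in (1, 2, 3):
    print(n, count_codewords(n))
print("shortest sufficient length:", shortest_sufficient_length())
```

Lengths one, two, and three give 4, 16, and 64 codewords, so three is the shortest length that can cover twenty amino acids, with 44 codewords to spare.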

In the summer of 1954, Crick embarked on a series of experiments to test this hypothesis, working with lysozyme, an enzyme found in egg white and also in human tears. Each morning during the course of these experiments, an assistant would hold a slice of raw onion under Crick's eye and collect the tears that flowed. (When Crick proposed to use the tears of his two-year-old daughter, he was sternly forbidden to do so by his wife.) However, Crick did not find a mutant form of lysozyme in either the tears or the egg whites he used, and thus could not test his theory.

Crick summarized his ideas about the genetic code in a paper entitled "On Protein Synthesis" and presented it at University College London in September 1957. The paper, according to the historian Horace Judson, "permanently altered the logic of biology." In it Crick first proposed the "sequence hypothesis," which held that genetic information was encoded in the sequence of the bases in DNA. The base sequence was to be read in linear fashion, from a fixed starting point and in one direction. Second, Crick propounded the "Central Dogma" of molecular biology, which stated that genetic information, once transcribed from DNA to messenger RNA and used to build a protein, could not flow in the reverse direction from protein back to RNA. The Central Dogma thus implied that acquired changes in a protein could not be inherited, an implication that conformed to Charles Darwin's theory of evolution. The Central Dogma further implied that DNA contained all the information necessary for specifying the sequence of amino acids in a protein, and thus its shape and function; no external information was needed. Finally, Crick asserted that the genetic code was universal to all higher forms of life, as in fact it has proved to be.
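Why the sequence hypothesis required a fixed starting point can be seen in a short Python sketch. It assumes, for illustration only, that the bases are read in non-overlapping groups of three; the function `read_triplets` and the sample sequence are invented for the example and are not taken from Crick's paper.

```python
def read_triplets(sequence, start=0):
    """Split a base sequence into non-overlapping triplets, reading in one
    direction from the given starting position. A trailing group of fewer
    than three bases is dropped."""
    usable_end = len(sequence) - (len(sequence) - start) % 3
    return [sequence[i:i + 3] for i in range(start, usable_end, 3)]


sequence = "ATGGCCTAA"  # an arbitrary example sequence
print(read_triplets(sequence, start=0))
print(read_triplets(sequence, start=1))
```

Starting one base later yields an entirely different set of triplets from the same sequence, so the message is well defined only if the starting point is fixed.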