Sigma Xi, The Scientific Research Society is collaborating with JSTOR to digitize, preserve and extend access
to American Scientist.
David B. Searls
Linguistic metaphors have been a part of molecular biology ever since the structure of DNA was solved in 1953. Biologists speak of the genetic code, of gene expression and of reading frames in nucleic acids. DNA is transcribed into RNA, which is then translated into protein. Certain enzymes are even said to edit RNA. Despite all this linguistic terminology, however, there has been little effort to apply the tools of formal language theory to the problems of interpreting biological sequences.

At about the same time that Watson and Crick were laying the foundation of molecular biology with their model of the double helix, Noam Chomsky was initiating an equally fundamental revolution in linguistics. The engine of this revolution was the idea of a generative grammar: a set of rules that can generate all the grammatically acceptable sentences of a language, without generating any erroneous ones. Through this device a relatively small set of rules can capture the structure of a [...]

[...] mathematics and computer science. It now appears that the idea of a generative grammar may also be a powerful tool for expressing the biological messages inscribed in DNA and RNA.

Finding effective methods for reading the language of nucleic acids is rapidly becoming an issue of practical concern. The Human Genome Project, in its most ambitious phase, proposes to record the entire sequence of the three billion DNA base pairs that make up the human genetic endowment. If this archive is to be of any use, biologists will have to be able to identify and retrieve meaningful segments of it. This is no trivial challenge, given that functional genes make up only a few percent of the total DNA, and our knowledge of how they are controlled by surrounding elements of the genome is still rudimentary. Equipped with knowledge of the linguistic structure of the genome, one can endeavor to write a computer program that parses genes and other high-level features of DNA.

[...] English sentence in progressively greater detail, until finally individual words are introduced. The first rule states that a sentence consists of a noun phrase followed by a verb phrase; the second rule then defines a noun phrase as an article followed by a modified noun; the third rule declares that a verb phrase is made up of a verb and a noun phrase. The fourth rule offers a choice: The vertical bar | simply means "or," so that a modified noun may be either a noun by itself or an adjective followed by a modified noun. The remaining production rules give a small vocabulary of nouns, verbs, adjectives and articles. All of the italicized words are "nonterminal" symbols, meaning that they never appear in the actual sentences that are the final output of the grammar; lexical items such as linguist and sees, on the other hand, are "terminal" symbols that will be part of the generated sentences. The arrow within each rule can be read as "produces" or "expands into."
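The four structural rules described above can be tried out in a few lines of Python. This is only a minimal sketch, not the article's own figure: of the vocabulary, just linguist and sees come from the text, and the remaining words, the GRAMMAR dictionary and the generate function are assumptions made for illustration.

```python
import random

# Each nonterminal maps to its alternative expansions (the "|" choices).
# Only "linguist" and "sees" appear in the text; the other vocabulary
# items are hypothetical stand-ins.
GRAMMAR = {
    "sentence": [["noun-phrase", "verb-phrase"]],
    "noun-phrase": [["article", "modified-noun"]],
    "verb-phrase": [["verb", "noun-phrase"]],
    "modified-noun": [["noun"], ["adjective", "modified-noun"]],
    "noun": [["linguist"], ["gene"]],
    "verb": [["sees"], ["reads"]],
    "adjective": [["small"], ["curious"]],
    "article": [["the"], ["a"]],
}


def generate(symbol="sentence", rng=random):
    """Expand a symbol until only terminal words remain."""
    if symbol not in GRAMMAR:  # terminal symbol: emit the word itself
        return [symbol]
    expansion = rng.choice(GRAMMAR[symbol])  # pick one alternative
    words = []
    for part in expansion:
        words.extend(generate(part, rng))
    return words


if __name__ == "__main__":
    print(" ".join(generate()))
```

Because modified-noun can expand into adjective modified-noun, repeated runs produce noun phrases with varying numbers of adjectives; no nonterminal name ever appears in the output.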
[Figure: annotated DNA sequence, positions 19,410 to 20,100, marked with an upstream region containing a CCAAT-box and a TATA-box, the transcript with its CAP site, exons with their amino acid translations, introns, a branch point, and 5' and 3' splice sites.]
[...] infinite capacity is the rule for modified-noun, which has a recursive structure: It invokes itself. Each time modified-noun expands into the sequence adjective modified-noun, there is another instance of modified-noun to be expanded. The process can string together an unlimited number of adjectives. Although the sentences would quickly grow tiresome and even nonsensical, there is no point at which adding just one more adjective makes the sentence ungrammatical.

Such a toy grammar can hardly begin to express the diversity of English syntax. To make the grammar more realistic, it would have to be greatly elaborated, but that exercise would probably reveal more about the peculiarities of natural languages than about the nature of grammars. A better strategy is to look instead at the grammars of still simpler languages, namely the abstract and formal languages studied in mathematics and computer science, and molecular genetics.

A Hierarchy of Languages

Formal languages look nothing like the languages of everyday experience. The theory of formal languages defines a language as a set of "strings" formed by writing down a sequence of symbols drawn from some specified alphabet. The strings do not have to mean anything; indeed, the term "string" is favored over more familiar alternatives such as "word" or "sentence" because it carries no connotation that the sequence of symbols should be meaningful. A string is a member of a language if it satisfies some formally defined property, such as being derivable from a given grammar; there are no other criteria.

As an example, consider the alphabet made up of the two symbols 0 and 1, and the language consisting of all possible strings drawn from this alphabet. The language, which is equivalent to the set of all integers expressed in binary notation (allowing leading 0's), includes strings such as 0, 101, 111011, 010101101, and countless others. Indeed, the language is infinite, even though the alphabet has only a finite number of symbols.

The language of binary integers has a simple generative grammar:

S -> 0S | 1S | ε

Here the symbol S is a nonterminal, whereas 0 and 1 are terminals. As in the English grammar, the vertical bar means "or." The one new element of the grammar is the symbol ε (the Greek letter epsilon), which represents the "empty string," the string of zero length. The three alternative productions can be translated into words as follows: "Any instance of the nonterminal S yields the string 0S or the string 1S or the empty string." Choosing the third possibility, ε, amounts to erasing the S.

The operation of the grammar is straightforward. Beginning with S, any of the productions can be chosen. If the first production selected is S -> 1S, then [...]
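The operation of the binary-integer grammar can be traced mechanically. The sketch below is an illustration under stated assumptions, not part of the article: the function name derive is hypothetical, and it simply records each intermediate string as the productions S -> 0S, S -> 1S and S -> ε are applied from left to right.

```python
import re


def derive(bits):
    """Return the derivation steps that produce `bits` from the start symbol S."""
    if not re.fullmatch(r"[01]*", bits):
        raise ValueError("not a string over the alphabet {0, 1}")
    steps = ["S"]
    prefix = ""
    for b in bits:
        prefix += b                 # apply S -> 0S or S -> 1S
        steps.append(prefix + "S")
    steps.append(prefix)            # apply S -> ε, erasing the S
    return steps


if __name__ == "__main__":
    print(" => ".join(derive("101")))
```

Every string of 0's and 1's has exactly one such derivation, since each symbol forces the choice of production and the final step always erases the S.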
[...] said to be nested, and nested dependencies are characteristic of context-free languages. When dependencies cross (as in a copy language), a context-sensitive language is mandated, because the crossing dependencies can be established only with the freedom of movement that nonterminals enjoy during a context-sensitive derivation.

Another approach to understanding the hierarchy looks at the kind of machine needed to generate each class of language. The machines in question are not devices of steel or silicon but abstract and idealized computing mechanisms. For a regular language, the machine required is a finite-state automaton, or FSA. As the name suggests, such a machine has a finite number of states (typically one state for each nonterminal in the grammar), which can be represented as nodes connected by arrows showing possible transitions between the states. The FSA begins operation in the state corresponding to the start symbol, then with each symbol produced moves along an arrow to a new state. An important property of an FSA is that it has no storage facility apart from the collection of states; it has no way of recording information for later use, and so it can produce patterns only when they are "hard-wired" into the connections between states.

More powerful languages are associated with more sophisticated machines. A context-free language is generated by a "pushdown automaton," which consists of an FSA augmented by a "stack" that provides a limited form of auxiliary memory. The stack works like a stack of cafeteria trays in that only the topmost item is accessible; it is also [...]

[...] length. When the memory tape is allowed to grow to any length, the resulting automaton is a Turing machine, which is the device needed to generate an unrestricted language. The Turing machine is a kind of universal computer; no one has yet found any enhancement to this computing architecture that would allow it to execute some algorithm that it cannot already handle. Thus any language that can be produced by a digital computer program can also be precisely specified by some unrestricted grammar, and vice versa.

When viewing languages as sets of strings, it is natural to ask what happens when languages are combined by the operations of set union and set intersection. The union of two languages is a concept that should be familiar to anyone who is bilingual, and who therefore recognizes strings in either of two languages. Two people who speak closely related languages might define the intersection of their languages as the set of strings they both recognize. Mathematicians investigate the "closure" of collections of sets under such operations. A collection of languages is closed under the operation of set union if the union of any two languages in the collection is also a language in the same collection. All the levels of the Chomsky hierarchy are closed under set union; in other words, the union of two context-free languages (for example) must still be a context-free language. It is interesting, however, that the intersection of two context-free languages (the strings those languages have in common) cannot be guaranteed to be context-free. This fact may have some bearing on the linguistic status of DNA.

Although the term generative grammar suggests only the production of language, grammars are equally useful in recognizing and analyzing language; they can listen as well as speak. One way to do this is to apply the rules of the grammar "in reverse," starting with a string of terminal symbols and build [...]

[...] 1's and 2's. In the language on the left, the number of 1's in each string must equal the number of 2's, whereas in the language on the right the number of 0's always equals the number of 1's. Both languages are context-free, and their union (the set of all strings that are members of either language) is also context-free. However, the intersection of the languages (the set of strings common to both) is the language with equal numbers of 0's, 1's and 2's, which is greater than context-free.

Figure 7. Dependencies between distant elements of a string characterize the nonregular languages. If the dependencies are strictly nested, as in the palindrome at the top, the language is typically context-free. When dependencies of unrestricted extent cross one another, as in the example of a copy language at the bottom, the language is beyond context-free.
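The union and intersection example described above, with two languages over the symbols 0, 1 and 2, can be made concrete with simple membership tests. This is a hedged sketch: it assumes that every string consists of a run of 0's, then 1's, then 2's (the surviving caption text does not state this explicitly), and the function names are hypothetical. The tests only decide membership by counting; they say nothing about which class of grammar is needed to generate each set.

```python
import re

# Assumed string shape: a run of 0's, then 1's, then 2's.
BLOCKS = re.compile(r"(0*)(1*)(2*)")


def counts(s):
    """Return (#0, #1, #2) if s has the assumed shape, else None."""
    m = BLOCKS.fullmatch(s)
    return tuple(len(g) for g in m.groups()) if m else None


def in_left(s):
    """Left language: the number of 1's equals the number of 2's."""
    c = counts(s)
    return c is not None and c[1] == c[2]


def in_right(s):
    """Right language: the number of 0's equals the number of 1's."""
    c = counts(s)
    return c is not None and c[0] == c[1]


def in_union(s):
    return in_left(s) or in_right(s)


def in_intersection(s):
    # Both constraints at once: equal numbers of 0's, 1's and 2's.
    return in_left(s) and in_right(s)
```

A string such as 01122 belongs to the left language and therefore to the union, but only strings such as 001122, with all three counts equal, survive the intersection.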
Figure 8. Linguistic themes abound in the mechanism of gene expression, illustrated here in a highly compressed and schematic way. Transduction, or the element-by-element mapping of one string to another having a different alphabet, is marked by red arrows; instances of this process are the transcription of the double-helical DNA (purple) into RNA (blue) by the enzyme RNA polymerase, shown as an ellipsoid near the middle of the diagram, and the subsequent translation of RNA into protein (orange) by the snowman-shaped ribosome at the lower right. Recognition of specific classes of strings is indicated by yellow, as at the upper left where regulatory proteins find promoter sequences in the DNA that help determine when and where transcription should begin. Transformation, the piecewise or wholesale modification of a string, is indicated by orange sequences, such as the intron being spliced out of the RNA at left. The splicing is mediated by RNA-protein complexes that recognize splice signals shown as yellow circles, squares and diamonds; the resulting lariat-shaped piece of RNA is discarded. The ribosome, another RNA-protein complex, is a transducer from triplets of RNA nucleotide bases to amino acids, shown as red spheres. The cruciform shapes are transfer RNAs that act as triplet recognizers (at their yellow end) and carry the corresponding amino acids for the ribosome to add to the protein. Dependencies between segments of a string are shown in green; they are most obviously manifested in the folding of protein and RNA. Part of the newly synthesized protein is shown being clipped off in a process called post-translational modification, which is another example of transformation. Even DNA often undergoes transformation, for example via the insertion of transposable elements such as the small circular DNA molecule at the upper right, which is shown recognizing a specific site.
[Figure 9 grammar, only partly legible:]
coding-region -> codon coding-region | stop-codon | splice coding-region
codon -> lys | asn | ile | thr | met | ser | gln | his | arg | pro | asp | glu | ala | gly | val | tyr | trp | cys | phe | leu
start-codon -> [right-hand side not legible]
stop-codon -> taa | tag | tga
Figure 9. Partial grammar specifies some of the structure of a typical protein-encoding gene. The transcript, which is the part of the gene that is copied into messenger RNA, has flanking 5'- and 3'-untranslated-regions, around a coding-region initiated by a start-codon. The coding-region consists of a codon followed by more coding-region, a recursion that ultimately terminates in a stop-codon. A stop-codon is any of the three triplets indicated, whereas a codon is any of the 61 other possibilities, given in individual codon rules. Some of those rules refer to the nucleotide classes purine and pyrimidine to capture variability in the third codon position. The rule for coding-region also allows for a splice at any point in the recursion, resulting in the insertion of an intron. A series of context-sensitive splice rules allow introns to shift left or right, reflecting the fact that introns can appear within as well as between codons. An intron of this type is generally bracketed by splice signals including gt and ag. This simple syntactic description omits many other imperfectly understood signals, such as promoters of transcription usually found in the region upstream of the transcript.
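The recursive coding-region rule in the caption above lends itself to a small recognizer. The following is a sketch under explicit assumptions, not the article's grammar in full: splices and untranslated regions are ignored, the start codon is taken to be atg (the DNA form of the aug start signal), the recursion is unrolled into a loop, and the function names are hypothetical.

```python
STOP_CODONS = {"taa", "tag", "tga"}  # the three stop-codon triplets
START_CODON = "atg"                  # assumption: DNA form of aug (met)


def parse_coding_region(dna, pos):
    """coding-region -> codon coding-region | stop-codon (no splices).

    Return the index just past the stop codon, or None if there is no parse.
    """
    while pos + 3 <= len(dna):
        triplet = dna[pos:pos + 3]
        if any(base not in "acgt" for base in triplet):
            return None              # not a string over the DNA alphabet
        pos += 3
        if triplet in STOP_CODONS:
            return pos               # the recursion terminates here
    return None                      # ran out of sequence before a stop codon


def parse_gene_body(dna):
    """start-codon coding-region, anchored at the start of `dna`."""
    if not dna.startswith(START_CODON):
        return None
    return parse_coding_region(dna, len(START_CODON))
```

For example, parse_gene_body("atggcgaagtaa") consumes all twelve bases, whereas the same sequence without its final taa has no parse.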
Before proceeding further with a linguistic analysis of biological sequences, it would be well to briefly review the actual structure and chemistry of nucleic acids and proteins. The architecture of DNA is well known: Two strands of nucleotides, each thousands or millions of bases long, twine around each other in a double helix. The two strands are held together by a specific pattern of hydrogen bonding in which g mates with c and t with a. Thus the strands have complementary sequences: wherever a g appears in one strand there must be a c in the other strand, and likewise every t requires a complementary a. The strands are oppositely oriented, so that if one strand reads from left to right, the other reads from right to left. Such base-pairing is the key to the faithful replication of DNA, in which the strands separate and then serve as templates for new complementary strands.

When a gene is expressed, a region along one strand of the DNA is transcribed by the enzyme RNA polymerase. The resulting molecule of RNA is also a chain of nucleotides, and it is complementary to the transcribed strand of DNA. RNA is chemically quite similar to DNA, except that thymine is replaced by the closely related base uracil, so that the RNA alphabet consists of a, c, g and u. Although RNA is single-stranded, base-pairing nonetheless has an important place in its chemistry: Complementary pairing between regions of the strand determines how the RNA folds up to form a distinctive three-dimensional structure.

The RNA transcribed from most genes is not an end product but rather serves as an intermediary, called messenger RNA, which is subsequently translated into protein. Proteins are another class of linear molecules, but they are assembled from a wholly different alphabet, namely 20 amino acids. Each sequence of amino acids folds up to form a specific three-dimensional structure, guided by chemical interactions that are even more complex than those observed in DNA and RNA. The translation from RNA to protein is done by ribosomes, which are large molecular assemblies made up of both protein and RNA; amino acids are carried to the site of translation by transfer RNAs. The ribosomal RNAs and transfer RNAs are examples of RNA molecules that are not themselves translated into proteins. Even so, like proteins, they derive their functionality in large degree from the shape of their folded structure.

In translation, RNA can be interpreted as a language of triplets, in which successive groups of three adjacent bases, called codons, specify the sequence of amino acids in a protein. This is the language that biologists had in mind when they first began to speak of a genetic code and of transcription and translation. Four letters taken three at a time yield 64 possible codons, all of which are used, although there are only 20 amino acids spelled out in the genetic code. It follows that the code must have a good deal of redundancy, where several codons all specify the same amino acid. A few codons serve as marks of punctuation, signaling where translation should start and end.

The RNA polymerase molecule can be viewed as a simple linguistic transducer, which reads bases of DNA and writes complementary bases in the slightly different alphabet of RNA. Indeed, the enzyme is remarkably machine-like in its procession down the DNA template and synthesis of RNA output. Similarly, the ribosome is a transducer from RNA to protein. The ribosome begins by scanning an RNA molecule one base at a time, looking for the codon aug, which is the start signal of the genetic code and which also specifies the amino acid methionine (met).
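The view of RNA polymerase as a transducer can be written down almost literally. The sketch below is a deliberately idealized illustration: the name transcribe is hypothetical, the template strand is assumed to be supplied in the order in which the enzyme reads it, and promoters, termination and strand orientation conventions are left out.

```python
# Each DNA base of the template maps to its complementary RNA base.
DNA_TO_RNA = {"a": "u", "c": "g", "g": "c", "t": "a"}


def transcribe(template):
    """Read the template one base at a time and write the RNA complement."""
    return "".join(DNA_TO_RNA[base] for base in template)


if __name__ == "__main__":
    print(transcribe("tacgcg"))
```

The mapping is strictly element by element: the output has the same length as the input, and no memory of earlier bases is needed, which is what makes the transducer so simple.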
[Figure 10 sequences: atgttcgaacat; caaatcgatcatcgaagagctcttgttg]
Figure 10. Biological palindromes in the genome give rise to distinctive secondary structures in folded molecules. In double-stranded DNA (far left), each g pairs with a c on the opposite strand, and each t with an a. When a substring of one strand appears on the other strand running in the opposite direction, the resulting pattern is called an inverted repeat. The symmetry of the pattern allows either of the strands to fold and pair with itself (middle); the corresponding single-stranded RNA can adopt the same stem configuration. The language of such biological palindromes is specified by a context-free grammar. (In real nucleic acids there is a loop of unpaired bases at the tip of the stem; the grammar is easily extended to accommodate such features.) By adding a rule that doubles the start symbol, the grammar is able to generate strings that form branched secondary structure (right), as is often found in RNA molecules. The parse tree for any derivation of these grammars reflects the actual physical structure of the folded RNA and is shown here drawn within the structure.
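A recognizer for idealized biological palindromes follows directly from a context-free grammar of the kind the caption mentions. The grammar used here, S -> a S t | t S a | g S c | c S g | ε, is an assumption, since the figure's own rules are not reproduced in the text; the function name is hypothetical, and no unpaired loop is allowed at the tip of the stem.

```python
COMPLEMENT = {"a": "t", "t": "a", "g": "c", "c": "g"}


def is_inverted_repeat(s):
    """True if s derives from S -> x S x' | ε, where x' complements x."""
    if s == "":
        return True                      # S -> ε
    if len(s) < 2 or COMPLEMENT.get(s[0]) != s[-1]:
        return False                     # outermost bases cannot pair
    return is_inverted_repeat(s[1:-1])   # S -> x S x'


if __name__ == "__main__":
    print(is_inverted_repeat("atgttcgaacat"))
```

Each level of recursion corresponds to one base pair in the stem, so the chain of calls mirrors the parse tree drawn within the folded structure.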
On finding an aug, the transducer outputs a met unit, then continues scanning in a mode where it looks at groups of three successive nucleotides. For each triplet, the transducer (with the help of transfer RNA "adaptors") adds a specific amino acid to the growing protein chain; for example, gcg corresponds to alanine (ala), and aag to lysine (lys). The scanning continues until the transducer comes upon one of the triplets uaa, uag or uga, which do not specify an amino acid but instead terminate the translation process.

If these two transducers (as described so far) could be combined and set loose on a real genome, they would eventually translate into protein all the coding regions present in the DNA. Unfortunately, the transducers would also produce an enormous volume of utter nonsense. One problem is that although every protein starts with a methionine unit, not every methionine appears at the start of a protein chain. Furthermore, each strand of DNA has three "reading frames," distinguished by whether the transducer begins reading with the first, the second, or the third nucleotide; since a DNA molecule is double-stranded, there are six reading frames altogether. Each of the reading frames generates a totally different message, and with few exceptions only one such transcript is valid. Thus, the actual transducers must be guided to the correct sites and the correct reading frames by other factors. It is worth remembering in this regard that DNA is not simply an abstract string of symbols but rather a molecular object in a cellular context. For example, it spends much of its existence tightly wound around proteins like thread on a succession of spools, and it interacts with a great many other proteins at specific sites.

The reality is no less complex for RNA. Even when the ribosome transducer happens to find a valid start signal in the correct reading frame, it cannot be taken for granted that the resulting protein is an actual gene product. In higher organisms genes are generally discontinuous, with long stretches of noncoding verbiage that must be spliced out in a step called processing. The meaningful regions that are preserved for translation are called exons; the intervening excised regions are introns. During transcription the entire length of the gene, including both exons and introns, is copied into RNA, but processing must be completed, to remove the introns, before the RNA can be translated into protein.

The removal of introns is largely governed by specific sequences in the RNA transcript. In some genes in some organisms these sequences participate in a characteristic folding of the RNA and are sufficient to remove the intron without outside help. Protein-coding genes rely on a more involved mechanism. The intron signals are found at and near the ends of the introns to be spliced, and they are recognized and bound by a complex of RNA and protein that supervises the precise removal of the intron. The sequence at the upstream end of the intron includes, among other features, the two-base sequence gu. The downstream end of the intron has sev [...]

Figure 11. Cloverleaf pattern of transfer RNA is an example of branched secondary structure. The loop of the topmost stem includes a triplet complementary to a codon recognized on the messenger RNA. This codon specifies the amino acid carried at the bottom of the main stem of the transfer RNA.
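The ribosome transducer described above (scan for aug, then read triplets until a stop codon) can be sketched as follows. The codon table is the standard genetic code written compactly in u, c, a, g order; the function names, the three-letter output format and the decision to return a partial chain when no stop codon is found are assumptions made for this illustration. The helper reading_frames covers only the three frames of a single strand.

```python
BASES = "ucag"
# Standard genetic code, one letter per codon in u, c, a, g order; "*" = stop.
CODE = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
THREE_LETTER = {
    "A": "ala", "R": "arg", "N": "asn", "D": "asp", "C": "cys",
    "Q": "gln", "E": "glu", "G": "gly", "H": "his", "I": "ile",
    "L": "leu", "K": "lys", "M": "met", "F": "phe", "P": "pro",
    "S": "ser", "T": "thr", "W": "trp", "Y": "tyr", "V": "val",
}
CODON_TABLE = {
    x + y + z: CODE[16 * i + 4 * j + k]
    for i, x in enumerate(BASES)
    for j, y in enumerate(BASES)
    for k, z in enumerate(BASES)
}


def translate(rna):
    """Scan base by base for the first aug, then read triplets until a stop."""
    start = rna.find("aug")
    if start < 0:
        return []                       # no start signal found
    protein = []
    for pos in range(start, len(rna) - 2, 3):
        amino = CODON_TABLE[rna[pos:pos + 3]]
        if amino == "*":                # uaa, uag or uga: terminate
            break
        protein.append(THREE_LETTER[amino])
    return protein                      # partial chain if no stop was reached


def reading_frames(rna):
    """The three frames of one strand: begin at the first, second or third base."""
    return [rna[offset:] for offset in range(3)]
```

On the input ccauggcgaaguaagg the transducer skips the leading cc, emits met for aug, ala for gcg and lys for aag, and halts at uaa. Shifting the frame by one base changes every triplet, which is why each frame reads as a different message.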
A Grammar of Genes
A grammar describing coding sequences as they appear in the DNA would capture the genetic code in a straightforward manner, recursively building up strings of codons beginning with a met codon and ending with one of the stop codons. The rules mapping codons to amino acids would constitute the lowest, lexical, level of the grammar. Introns might be inserted at any point during this accretion of codons, but there is a complication: Processing of the RNA is not constrained to a particular reading frame, so that it is quite possible for an intron splice site to interrupt a codon. One way to accommodate such interpolations is with context-sensitive grammar rules that give rise to movement of nonterminals in the developing string.

The part of the gene grammar dealing with translated sequences would be embedded in a higher-level rule for the entire RNA transcript, including transcribed but untranslated regions at both ends of the RNA; and the transcript rule would in turn be embedded within a rule of still wider scope covering upstream and downstream control regions associated with the gene. This rule structure allows for a natural hierarchical organization of our knowledge about the mechanisms of gene expression, with detail always presented at its appropriate level. What is most challenging at this point is the incorporation of grammar elements for the subtle signals controlling which potential coding regions are expressed as genes, and how they are processed.

Important features that signal the presence of a protein-coding gene, which are collectively called the promoter region, lie upstream of the transcript itself. In this region is a sequence called the tata box, after its characteristic nucleotide sequence tata. In higher organisms many other sequences may be present, such as a caat box further upstream (but with more variability) and gc boxes, which can be found in multiple copies on either strand of the DNA. Promoter sequences are involved in the binding of RNA polymerase at the start of transcription. Their effectiveness is greatly influenced by the presence of other genetic elements called enhancers, whose positions and orientations relative to the gene vary enormously; they may lie thousands of bases away, upstream or downstream.

The variation in the sequences of promoters, enhancers, splice signals and so on has made them hard to identify reliably. Most of them have been discovered either by noting similar sequences at similar positions relative to many genes, or else by direct evidence that they are binding sites of other molecules such as transcription factors, the proteins that mediate gene expression. In either case, the linguistic problem is one of recognition. From one perspective, the challenge is to recognize a "word" similar to a statistical aggregate of examples called a consensus sequence. From the other perspective, recognition models the action of a transcription factor (for example) in finding and binding to the appropriate sequence. Although grammars have their shortcomings in this regard, they also have some great strengths.

A recurring theme in these recognition regions is that more than one sequence comes into play at once. Transcription factors seldom act alone; instead a number of them seem to be required to work cooperatively. Similarly, in the processing of introns several RNA-protein complexes bind different sites. Grammars, by their nature, describe the relationships of words, and of categories of words, and of categories of categories. This last point is important because some transcription factors recognize not a DNA sequence directly but rather other transcription factors already bound to specific sequences. The picture that emerges is of a complex of transcription factors being modeled by a parse tree describing the organization of those factors and of the sites to which they bind on the string of DNA.

[Figure 12 sequence, shown in three foldings: gaatattcgaatattc]

Figure 12. Ambiguity in the grammar for nested palindromes gives rise to strings that have more than one parse tree, or molecules with more than one secondary structure. A double inverted repeat can form a simple hairpin (upper left), an intermediate cruciform structure (upper right) or a dumbbell (bottom). Although there is an unambiguous grammar for the language of general secondary structure, the ambiguous grammar may be preferable since the alternative parse trees reflect alternative secondary structures. Moreover, any grammar that specifically generates only adjacent pairs of inverted repeats must be ambiguous.
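The recognition problem described above, finding a "word" that resembles a consensus sequence, can be approximated very crudely by allowing a fixed number of mismatches. This sketch is an assumption-laden simplification: a real consensus is a statistical aggregate, which a mismatch count does not capture, and the function names and the max_mismatches parameter are hypothetical.

```python
def hamming(a, b):
    """Number of positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


def find_sites(dna, consensus="tata", max_mismatches=0):
    """Return start positions whose window lies within max_mismatches of the consensus."""
    n = len(consensus)
    return [
        i
        for i in range(len(dna) - n + 1)
        if hamming(dna[i:i + n], consensus) <= max_mismatches
    ]


if __name__ == "__main__":
    print(find_sites("ggtatacc"))
    print(find_sites("ggtaaacc", max_mismatches=1))
```

With no mismatches allowed the scan finds only exact copies of tata; relaxing the threshold by one admits the variant taaa, which hints at why variable signals are hard to identify reliably.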
Figure 13. Attenuators are regulatory mechanisms that are thought to depend on alternative secondary structure for their operation. They are found upstream of certain bacterial genes coding for proteins that help to manufacture amino acids. When RNA polymerase begins to transcribe this upstream region, ribosomes immediately attach to the RNA and begin to translate the sequence. If the amino acid to be synthesized is present in abundance (and thus the corresponding transfer RNA is abundant as well), the first ribosome reads through a group of codons for that amino acid and into a region capable of forming either of two alternative hairpins (left). The ribosome obstructs the first part of this region and thus favors the formation of the second stem. Formation of the second hairpin sends a signal that causes the RNA polymerase to cease transcribing and to fall off the DNA. This shuts down gene expression and the synthesis of the amino acid (which was already abundant). If the amino acid is scarce, the ribosomes cannot read through the codons for that amino acid (right); they stall upstream and allow the first alternative hairpin to form instead of the second. This event in turn allows the RNA polymerase to proceed and manufacture the protein that will alleviate the shortage of the amino acid.
[...] diagram at right, results when one side of a stem resides within the loop of another stem. Only the context-free part of the pseudoknot grammar is given; the complete grammar also includes all the context-sensitive rules shown above in the grammar for direct repeats. The pseudoknot grammar generates an idealized pseudoknot language, without any unpaired bases. Like the direct-repeat grammar, it "skips over" the nonterminals produced by the Q rules and gathers them up at the position of the X in the parse tree. Tracing the terminal string around the parse tree in this manner produces the topologically equivalent secondary structure shown at the left, and preserves the correct base-pairing dependencies.

[...] add to the existing productions a new one, stating that S -> SS. Duplicating the start symbol plants the seed of a new palindrome anywhere within an existing palindrome. It can be shown that this is a completely general grammar describing structures of branched [...]
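The effect of adding S -> SS can be explored with a memoized recognizer. The sketch below assumes that the existing productions are the base-pairing rules S -> a S u | u S a | g S c | c S g | ε over the RNA alphabet, which the surviving text does not list; the function names are hypothetical, and unpaired bases are not modeled.

```python
from functools import lru_cache

PAIR = {"a": "u", "u": "a", "g": "c", "c": "g"}


def folds_completely(s):
    """True if s derives from S -> SS | x S x' | ε, with x' the complement of x."""

    @lru_cache(maxsize=None)
    def derives(i, j):
        """Can the substring s[i:j] be derived from S?"""
        if i == j:
            return True                  # S -> ε
        if (j - i) % 2:
            return False                 # every base must be paired
        if PAIR.get(s[i]) == s[j - 1] and derives(i + 1, j - 1):
            return True                  # S -> x S x'
        # S -> SS: split into two shorter, non-empty, even-length halves.
        return any(derives(i, k) and derives(k, j) for k in range(i + 2, j, 2))

    return derives(0, len(s))
```

The string ggcc is accepted as a single stem, while gcau is accepted only through the doubling rule, as two adjacent stems gc and au. A string that admits both kinds of derivation has more than one parse, corresponding to more than one possible folding.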
Figure 16. Evolutionary "operations," such as the translocation or inversion of genetic sequences, may promote a language to a higher level in the Chomsky hierarchy. The strictly nested dependencies of the palindrome at the center become crossed dependencies when a translocation occurs, as in the word re-ordering at the top. This creates a pseudoknot-like structure. An inversion within the palindrome, as at the bottom, may create a direct repeat, also with crossing dependencies.

Specific secondary structures are characteristic of ribosomal RNA and transfer RNA and other forms of RNA that are not translated into protein. In protein-coding genes the evidence for such conserved syntactic structures is [...]