
Lambek Grammars Are Context Free

M. Pentus Department of Mathematical Logic Faculty of Mechanics and Mathematics Moscow State University Moscow, RUSSIA, 119899
Abstract
In this paper the Chomsky Conjecture is proved: all languages recognized by the Lambek calculus are context free.

Introduction
The notion of a basic categorial grammar was introduced in [1]. In the same paper it was proved that basic categorial grammars are precisely the context-free ones. Another kind of categorial grammars was introduced by J. Lambek [8]. These grammars are based on a syntactic calculus, presently known as the Lambek calculus (cf. [2] for its semantic interpretations). Chomsky [6] conjectured that these grammars are also equivalent to context-free ones. In [7] Cohen proved that every basic categorial grammar (and, thus, every context-free grammar) is equivalent to a Lambek grammar. He also proposed a proof of the converse. However, as pointed out in [3], this proof contains an error. Buszkowski proved that some special kinds of Lambek grammars are context-free [3, 4, 5]. These grammars use weakly unidirectional types or types of order at most two. The main result of this paper (Theorem 2) says that Lambek grammars generate only context-free languages. Thus they are equivalent to context-free grammars and also to basic categorial grammars.

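1 Preliminaries

1.1 Lambek calculus

We consider the syntactic calculus introduced in [8]. The types of the Lambek calculus are built of primitive types $p_1, p_2, \ldots$ and three binary connectives $\bullet$, $\backslash$, $/$. We shall denote the set of all types by $\mathrm{Tp}$. Capital letters $A, B, \ldots$ range over types. Capital Greek letters range over finite (possibly empty) sequences of types. Sequents of the Lambek calculus are of the form $\Gamma \to A$, where $\Gamma$ is a nonempty sequence of types.

Axioms: $p_i \to p_i$

Rules:

$$\frac{\Phi \to A \qquad \Gamma\, B\, \Delta \to C}{\Gamma\, \Phi\, (A \backslash B)\, \Delta \to C}\ (\backslash\to)$$

$$\frac{A\, \Pi \to B}{\Pi \to A \backslash B}\ (\to\backslash), \text{ where } \Pi \text{ is not empty}$$

$$\frac{\Phi \to A \qquad \Gamma\, B\, \Delta \to C}{\Gamma\, (B / A)\, \Phi\, \Delta \to C}\ (/\to)$$

$$\frac{\Pi\, A \to B}{\Pi \to B / A}\ (\to/), \text{ where } \Pi \text{ is not empty}$$

$$\frac{\Gamma\, A\, B\, \Delta \to C}{\Gamma\, (A \bullet B)\, \Delta \to C}\ (\bullet\to)$$

$$\frac{\Gamma \to A \qquad \Delta \to B}{\Gamma\, \Delta \to A \bullet B}\ (\to\bullet)$$

$$\frac{\Phi \to B \qquad \Gamma\, B\, \Delta \to A}{\Gamma\, \Phi\, \Delta \to A}\ (\mathrm{CUT})$$

The cut-elimination theorem for this calculus is proved in [8]. We write $L \vdash \Gamma \to A$ if the sequent $\Gamma \to A$ is derivable in the Lambek calculus.

Definition. The length of a type is defined as the total number of primitive type occurrences in the type:

$$\|p_i\| \rightleftharpoons 1, \qquad \|A \bullet B\| = \|A \backslash B\| = \|A / B\| \rightleftharpoons \|A\| + \|B\|.$$

1.2 Lambek grammars and context-free grammars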

1043-6871/93 $03.00 © 1993 IEEE

Authorized licensed use limited to: Tec de Monterrey. Downloaded on July 23, 2009 at 19:13 from IEEE Xplore. Restrictions apply.

Definition. We assume that a finite alphabet $\mathcal{T}$ and a distinguished type $D$ are given. A Lambek grammar is a mapping $f$ such that, for all $t \in \mathcal{T}$, $f(t) \subset \mathrm{Tp}$ and $f(t)$ is finite. The language generated by the Lambek grammar is defined as the set of all expressions $t_1 \ldots t_n$ over the alphabet $\mathcal{T}$ for which there exists a derivable sequent $B_1 \ldots B_n \to D$ such that $B_i \in f(t_i)$ for all $i \le n$. We shall denote this language by $L(\mathcal{T}, D, f)$.
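As an illustration of this definition, the following Python sketch decides membership in $L(\mathcal{T}, D, f)$ by brute force: it tries every assignment of types $B_i \in f(t_i)$ and tests derivability of $B_1 \ldots B_n \to D$ by backward proof search over the cut-free rules, which suffices because of the cut-elimination theorem [8]. The tuple encoding of types, the function names `derivable` and `recognizes`, and the toy lexicon are assumptions made for this example only. The search is exponential and plays no role in the argument of the paper.

```python
from functools import lru_cache
from itertools import product

# Hypothetical encoding: a primitive type is a string; a compound type is a
# tuple (op, X, Y) with op in {"*", "\\", "/"}, read as X*Y, X\Y and X/Y.

@lru_cache(maxsize=None)
def derivable(gamma, c):
    """Return True iff the sequent gamma -> c has a cut-free derivation."""
    if not gamma:                       # antecedents must be nonempty
        return False
    if gamma == (c,) and isinstance(c, str):
        return True                     # axiom p -> p
    if not isinstance(c, str):          # right rules
        op, x, y = c
        if op == "\\" and derivable((x,) + gamma, y):
            return True
        if op == "/" and derivable(gamma + (y,), x):
            return True
        if op == "*" and any(derivable(gamma[:i], x) and derivable(gamma[i:], y)
                             for i in range(1, len(gamma))):
            return True
    for i, t in enumerate(gamma):       # left rules
        if isinstance(t, str):
            continue
        op, x, y = t
        left, right = gamma[:i], gamma[i + 1:]
        if op == "*" and derivable(left + (x, y) + right, c):
            return True
        if op == "\\":                  # t = x\y consumes an argument on its left
            for j in range(i):
                if derivable(gamma[j:i], x) and derivable(gamma[:j] + (y,) + right, c):
                    return True
        if op == "/":                   # t = x/y consumes an argument on its right
            for j in range(i + 2, len(gamma) + 1):
                if derivable(gamma[i + 1:j], y) and derivable(left + (x,) + gamma[j:], c):
                    return True
    return False


def recognizes(f, d, word):
    """Return True iff word (a sequence of symbols) belongs to L(T, D, f)."""
    if not word:
        return False
    choices = [list(f[t]) for t in word]
    return any(derivable(tuple(bs), d) for bs in product(*choices))


# Toy lexicon (illustrative only): np\s for intransitive verbs,
# (np\s)/np for transitive verbs.
f = {"John": {"np"}, "Mary": {"np"},
     "sleeps": {("\\", "np", "s")},
     "loves": {("/", ("\\", "np", "s"), "np")}}
```

With this lexicon and distinguished type `"s"`, `recognizes(f, "s", ["John", "loves", "Mary"])` holds because the sequent np, (np\s)/np, np → s is derivable, whereas the reversed word order `["sleeps", "John"]` is rejected.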

Definition. We assume that two disjoint alphabets $\mathcal{T}$ and $\mathcal{W}$ are given. The elements of $\mathcal{T}$ are called terminal symbols and those of $\mathcal{W}$ are auxiliary symbols. A context-free rewrite rule is of the form $X \Rightarrow e$, where $X$ is an auxiliary symbol and $e$ is a word in the alphabet $\mathcal{T} \cup \mathcal{W}$. A context-free grammar is a finite set $\mathcal{R}$ of context-free rewrite rules, with one auxiliary symbol $S$ designated as its start symbol. By $G(\mathcal{T}, \mathcal{W}, S, \mathcal{R})$ we denote the set of all expressions over the alphabet $\mathcal{T} \cup \mathcal{W}$ that arise through some finite sequence of rewritings of the start symbol $S$ via the rules of $\mathcal{R}$. The language generated by the context-free grammar is defined as

$$L(\mathcal{T}, \mathcal{W}, S, \mathcal{R}) \rightleftharpoons G(\mathcal{T}, \mathcal{W}, S, \mathcal{R}) \cap \mathcal{T}^+,$$

where $\mathcal{T}^+$ denotes the set of all nonempty expressions over the alphabet $\mathcal{T}$.

2 Main result

In this section we show that every language recognized by a Lambek grammar can also be generated by a context-free grammar. The crucial point is that every sequent $B_1 \ldots B_n \to D$ derivable in the Lambek calculus follows immediately (i.e., by means of the cut rule only) from some short derivable sequents containing at most three types each, where none of the types is longer than the longest type in $B_1 \ldots B_n \to D$ (cf. Theorem 1). The proof of Theorem 1 will be carried out later in Section 3. In order to formalize the notion of immediate consequence, we introduce for each natural number $m$ two calculi $\mathrm{Lcut}_m$ and $\mathrm{Lcut}_m^-$.

Definition. A sequent $A_1 \ldots A_n \to B$ is an axiom of $\mathrm{Lcut}_m$ iff

(1) $n \le 2$;

(2) the sequent $A_1 \ldots A_n \to B$ is derivable in the Lambek calculus;

(3) $\|B\| \le m$ and $\|A_i\| \le m$ for all $i \le n$.

The only rule of $\mathrm{Lcut}_m$ is (CUT).

Definition. The calculus $\mathrm{Lcut}_m^-$ has the same axioms as $\mathrm{Lcut}_m$. The only rule of $\mathrm{Lcut}_m^-$ is (CUT) with the restriction that the left premise $\Phi \to B$ must be an axiom of $\mathrm{Lcut}_m^-$.

Theorem 1 $\mathrm{Lcut}_m \vdash B_1 \ldots B_n \to D$ if and only if $\|B_i\| \le m$ for all $i \le n$, $\|D\| \le m$, and $L \vdash B_1 \ldots B_n \to D$.

The theorem will be proved in Section 3.

Lemma 1 $\mathrm{Lcut}_m^- \vdash \Phi \to A$ if and only if $\mathrm{Lcut}_m \vdash \Phi \to A$.

PROOF. The 'only if' part is obvious. The 'if' part is proved by induction on the length of the $\mathrm{Lcut}_m^-$-derivation of the left premise of a cut. A derivation of the form

$$\dfrac{\dfrac{\Phi \to B \qquad \Gamma\, B\, \Delta \to A}{\Gamma\, \Phi\, \Delta \to A}\ (\mathrm{CUT}) \qquad \Theta\, A\, \Pi \to C}{\Theta\, \Gamma\, \Phi\, \Delta\, \Pi \to C}\ (\mathrm{CUT})$$

will be rearranged in the following way.

$$\dfrac{\Phi \to B \qquad \dfrac{\Gamma\, B\, \Delta \to A \qquad \Theta\, A\, \Pi \to C}{\Theta\, \Gamma\, B\, \Delta\, \Pi \to C}\ (\mathrm{CUT})}{\Theta\, \Gamma\, \Phi\, \Delta\, \Pi \to C}\ (\mathrm{CUT})$$

∎

Theorem 2 For any Lambek grammar there exists a context-free grammar such that the languages generated by these grammars coincide.

PROOF. If a fixed alphabet $\mathcal{T}$, a designated type $D$, and a mapping $f$ are given, then the set of types relevant in the definition of $L(\mathcal{T}, D, f)$ is finite. Let $m$ be the maximum of the lengths of these types. Then $\|D\| \le m$ and, for any $t \in \mathcal{T}$ and any $B \in f(t)$, $\|B\| \le m$. The set of primitive types involved in the grammar is also finite. Below we shall consider only types consisting of these primitive types. We take as the alphabet of auxiliary symbols $\mathcal{W}$ the set of all types not longer than $m$ (and containing only relevant primitive types):

$$\mathcal{W} \rightleftharpoons \{A \in \mathrm{Tp} \mid \|A\| \le m\}.$$

We take the distinguished type $D$ as the start symbol of the context-free grammar. The set $\mathcal{R}$ consists of obvious rules describing the mapping $f$ and of $\mathrm{Lcut}_m$-axioms with the sequent arrows reversed:

$$\mathcal{R} \rightleftharpoons \{B \Rightarrow t \mid t \in \mathcal{T} \text{ and } B \in f(t)\} \cup \{A \Rightarrow B\,C \mid A, B, C \in \mathcal{W} \text{ and } L \vdash B\,C \to A\} \cup \{A \Rightarrow B \mid A, B \in \mathcal{W} \text{ and } L \vdash B \to A\}.$$

First, we prove that $L(\mathcal{T}, D, f) \subset G(\mathcal{T}, \mathcal{W}, D, \mathcal{R})$. Suppose that $t_1 \ldots t_n \in L(\mathcal{T}, D, f)$. According to the definition of $L(\mathcal{T}, D, f)$ there are types $B_1, \ldots, B_n$ such that $L \vdash B_1 \ldots B_n \to D$ and $B_i \in f(t_i)$ for all $i \le n$. By construction, $B_i \Rightarrow t_i \in \mathcal{R}$ for all $i \le n$. Thus it suffices to prove that $B_1 \ldots B_n \in G(\mathcal{T}, \mathcal{W}, D, \mathcal{R})$.

In view of Theorem 1 and Lemma 1, $\mathrm{Lcut}_m^- \vdash B_1 \ldots B_n \to D$. Straightforward induction on the length of a $\mathrm{Lcut}_m^-$-derivation shows that if $\mathrm{Lcut}_m^- \vdash B_1 \ldots B_n \to D$ then $B_1 \ldots B_n \in G(\mathcal{T}, \mathcal{W}, D, \mathcal{R})$.

Second, we prove that $L(\mathcal{T}, \mathcal{W}, D, \mathcal{R}) \subset L(\mathcal{T}, D, f)$. We extend the mapping $f$ to the set $\mathcal{T} \cup \mathcal{W}$ by stipulating $f(B) = \{B\}$ for all $B \in \mathcal{W}$. Easy induction on the number of rewritings establishes that if an expression $X_1 \ldots X_n$ over the alphabet $\mathcal{T} \cup \mathcal{W}$ belongs to $G(\mathcal{T}, \mathcal{W}, D, \mathcal{R})$ then there are auxiliary symbols $B_1, \ldots, B_n$ such that $B_i \in f(X_i)$ for all $i \le n$, and $\mathrm{Lcut}_m \vdash B_1 \ldots B_n \to D$. In particular, if $t_1, \ldots, t_n$ are terminal symbols and $t_1 \ldots t_n$ belongs to $G(\mathcal{T}, \mathcal{W}, D, \mathcal{R})$ then $t_1 \ldots t_n \in L(\mathcal{T}, D, f)$. ∎
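The construction of the auxiliary alphabet and the rule set used in this proof can be sketched in Python as follows. This is only an illustration under stated assumptions: types use a hypothetical tuple encoding, the derivability test for the Lambek calculus is supplied by the caller as the parameter `derivable` (any decision procedure for $L \vdash \Gamma \to A$ would do), terminal names are assumed to differ from primitive type names, and every function name is invented for the example. The demonstration at the end plugs in a deliberately incomplete stub oracle, so the resulting rule set is smaller than the one defined in the proof.

```python
from itertools import product

OPS = ("*", "\\", "/")   # hypothetical encoding: (op, X, Y) stands for X op Y


def length(a):
    """Number of primitive type occurrences in the type a."""
    return 1 if isinstance(a, str) else length(a[1]) + length(a[2])


def primitives(a):
    """Set of primitive types occurring in the type a."""
    return {a} if isinstance(a, str) else primitives(a[1]) | primitives(a[2])


def types_up_to(prims, m):
    """All types of length <= m built from the primitive types in prims."""
    by_len = {1: set(prims)}
    for n in range(2, m + 1):
        by_len[n] = {(op, x, y)
                     for k in range(1, n)
                     for x in by_len[k] for y in by_len[n - k]
                     for op in OPS}
    return set().union(*by_len.values())


def build_cfg(f, d, derivable):
    """Return (W, R) for the Lambek grammar (f, d); rules are (lhs, rhs) pairs."""
    relevant = {d} | {b for bs in f.values() for b in bs}
    m = max(length(a) for a in relevant)
    prims = set().union(*(primitives(a) for a in relevant))
    w = types_up_to(prims, m)
    rules = {(b, (t,)) for t, bs in f.items() for b in bs}       # B => t
    rules |= {(a, (b,)) for a in w for b in w if derivable((b,), a)}
    rules |= {(a, (b, c)) for a in w for b, c in product(w, w)
              if derivable((b, c), a)}
    return w, rules


# Demonstration with a stub oracle that only knows A -> A and np, np\s -> s.
NP_S = ("\\", "np", "s")
f = {"John": {"np"}, "sleeps": {NP_S}}


def stub(gamma, a):
    return gamma == (a,) or (gamma, a) == (("np", NP_S), "s")


w, rules = build_cfg(f, "s", stub)
```

Here $m = 2$ and the relevant primitive types are np and s, so `w` contains the 2 primitive types and the 12 compound types of length two. The size of `w` grows exponentially with $m$, which is why the sketch is suitable only for very small grammars.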

3 Proof of Theorem 1

Let $FG$ stand for the free group generated by all primitive types $\{p_i \mid i \in \mathbb{N}\}$. The identity element will be denoted by $\varepsilon$. For any element $u \in FG$, we write $|u|$ for the length of $u$ written as a reduced word, i.e., a word that does not contain any fragments of the form $p_i p_i^{-1}$ or $p_i^{-1} p_i$.

Definition. The free group interpretation of types (written as $[\![\ ]\!]$) is the following mapping of types into $FG$.

$$\begin{aligned}
[\![p_i]\!] &\rightleftharpoons p_i \\
[\![A \bullet B]\!] &\rightleftharpoons [\![A]\!] \circ [\![B]\!] \\
[\![A \backslash B]\!] &\rightleftharpoons [\![A]\!]^{-1} \circ [\![B]\!] \\
[\![A / B]\!] &\rightleftharpoons [\![A]\!] \circ [\![B]\!]^{-1} \\
[\![A_1 \ldots A_n]\!] &\rightleftharpoons [\![A_1]\!] \circ \ldots \circ [\![A_n]\!]
\end{aligned}$$

Lemma 2 If a sequent $\Gamma \to C$ is derivable in the Lambek calculus, then $[\![\Gamma]\!] = [\![C]\!]$.

D. Roorda obtained this result in terms of atomic markings. The lemma also has an immediate proof in the free group environment [9].

Definition. For each natural number $i$ we define the counter of occurrences of the primitive type $p_i$ as follows.

$$\begin{aligned}
\sigma_i p_i &\rightleftharpoons 1 \\
\sigma_i p_j &\rightleftharpoons 0, \text{ if } i \ne j \\
\sigma_i(A \bullet B) = \sigma_i(A \backslash B) = \sigma_i(A / B) &\rightleftharpoons \sigma_i A + \sigma_i B
\end{aligned}$$

We extend this definition to finite sequences of types:

$$\sigma_i(A_1 \ldots A_n) \rightleftharpoons \sigma_i A_1 + \ldots + \sigma_i A_n$$

Remark. If a sequent $\Gamma \to A$ is derivable in the Lambek calculus, then for any $i$, $\sigma_i(\Gamma A)$ is an even number.

Remark. $\|A\| = \sum_i \sigma_i A$.

Definition. A sequent $\Gamma \to A$ is thin iff

(1) $\Gamma \to A$ is derivable in the Lambek calculus;

(2) $\sigma_i(\Gamma A) = 2$ for all primitive types $p_i$ occurring in $\Gamma \to A$.

Lemma 3 If $L \vdash \Gamma\, \Phi\, \Delta \to C$ then there is a type $B$ (an interpolant for $\Phi$ in $\Gamma\, \Phi\, \Delta \to C$) such that

(i) $L \vdash \Phi \to B$;

(ii) $L \vdash \Gamma\, B\, \Delta \to C$;

(iii) $\sigma_i B \le \min(\sigma_i \Phi, \sigma_i(\Gamma \Delta C))$.

PROOF. This lemma is a slight modification of the Interpolation theorem for the Lambek calculus proved by Dirk Roorda in [10, p. 84]. The only difference is that Roorda also allows sequents with empty antecedents to occur in derivations. We omit the straightforward proof by induction on a cut-free derivation of $\Gamma\, \Phi\, \Delta \to C$. ∎

Lemma 4 If a sequent $\Gamma\, \Phi\, \Delta \to C$ is thin, then there is a type $B$ such that

(i) the sequent $\Phi \to B$ is thin;

(ii) the sequent $\Gamma\, B\, \Delta \to C$ is thin;

(iii) $\|B\| = |[\![\Phi]\!]|$, i.e., the length of $B$ equals the length of the reduced word for the free group interpretation of $\Phi$.

PROOF. Let $B$ be an interpolant for $\Phi$ in $\Gamma\, \Phi\, \Delta \to C$ (Lemma 3) ... containing at most one occurrence of every literal. For any $i$,

$$\sigma_i(\Phi B) = \sigma_i \Phi + \sigma_i B \le \sigma_i(\Gamma \Phi \Delta C) + \sigma_i B \le 2 + 1.$$

Since $\sigma_i(\Phi B)$ is even, we conclude that $\sigma_i(\Phi B)$ is either 0 or 2. This proves (i). The claim (ii) is proved similarly.

According to Lemma 2, $[\![\Gamma]\!][\![\Phi]\!][\![\Delta]\!] = [\![C]\!]$, whence $[\![\Phi]\!] = [\![\Gamma]\!]^{-1}[\![C]\!][\![\Delta]\!]^{-1}$. We conclude that the reduced words for $[\![\Phi]\!]$ and $[\![\Gamma]\!]^{-1}[\![C]\!][\![\Delta]\!]^{-1}$ coincide. If $\sigma_i \Phi = 0$ then the letter $p_i$ does not occur in $[\![\Phi]\!]$. If $\sigma_i \Phi = 1$ then there is exactly one occurrence of $p_i$ in $[\![\Phi]\!]$. If $\sigma_i \Phi = 2$, then $\sigma_i(\Gamma \Delta C) = 0$, whence $p_i$ does not occur in $[\![\Gamma]\!]^{-1}[\![C]\!][\![\Delta]\!]^{-1}$ and consequently it has no occurrences in the reduced word for $[\![\Phi]\!]$. We have verified that the reduced word for $[\![\Phi]\!]$ contains exactly the literals that occur in the type $B$. We also see that no literal has more than one occurrence in the reduced word. This proves statement (iii). ∎
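The free group interpretation and the occurrence counters are both simple recursions over the structure of a type, which the following Python sketch makes explicit. The tuple encoding of types, the representation of reduced words as tuples of (generator, exponent) pairs, and all function names are assumptions made for this example.

```python
from collections import Counter

# Hypothetical encoding: primitive type = string, compound type = (op, X, Y)
# with op in {"*", "\\", "/"}. A free group element is a reduced word, stored
# as a tuple of (generator, exponent) pairs with exponent +1 or -1.


def reduce_word(word):
    """Cancel adjacent pairs p p^-1 and p^-1 p using a stack."""
    out = []
    for gen, exp in word:
        if out and out[-1] == (gen, -exp):
            out.pop()
        else:
            out.append((gen, exp))
    return tuple(out)


def inverse(word):
    """Inverse of a word: reverse the order and flip every exponent."""
    return tuple((gen, -exp) for gen, exp in reversed(word))


def interpret(a):
    """Free group interpretation of a single type."""
    if isinstance(a, str):
        return ((a, 1),)
    op, x, y = a
    if op == "*":
        return reduce_word(interpret(x) + interpret(y))
    if op == "\\":
        return reduce_word(inverse(interpret(x)) + interpret(y))
    return reduce_word(interpret(x) + inverse(interpret(y)))   # op == "/"


def interpret_seq(gamma):
    """Free group interpretation of a sequence of types."""
    word = ()
    for a in gamma:
        word = reduce_word(word + interpret(a))
    return word


def counts(a):
    """Counter mapping each primitive type p_i to sigma_i(a)."""
    if isinstance(a, str):
        return Counter([a])
    return counts(a[1]) + counts(a[2])


def length(a):
    """Length of a type, computed as the sum of all its counters."""
    return sum(counts(a).values())


def meets_counting_condition(gamma, c):
    """Condition (2) of thinness only; derivability (1) is not checked here."""
    total = sum((counts(a) for a in gamma), Counter()) + counts(c)
    return all(n == 2 for n in total.values())
```

For the derivable sequent np, np\s → s, `interpret_seq(("np", ("\\", "np", "s")))` and `interpret("s")` give the same reduced word, as Lemma 2 requires, and each of np and s is counted exactly twice, so the counting condition of thinness holds.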

Recall that a sequent $\Gamma \to A$ is thin iff it is derivable in the Lambek calculus and $\sigma_i(\Gamma A) = 2$ for all primitive types $p_i$ occurring in it.

Lemma 5 If no type in a thin sequent $A_1 \ldots A_n \to C$ is longer than $m$, then $\mathrm{Lcut}_m \vdash A_1 \ldots A_n \to C$.

To prove Lemma 5, we need some facts about lengths of reduced words in a free group.

Lemma 6 If $u, v, w \in FG$ and $|uv| > |v|$, then $|uvw| > |vw|$.

PROOF. Given two reduced words $u$ and $v$, there exist reduced words $a$, $b$, $c$ such that $u = ab^{-1}$, $v = bc$, $uv = ac$, and the words $ab^{-1}$, $bc$, $ac$ are reduced. Similarly, for $v$ and $w$ there exist reduced words $d$, $e$, and $f$ such that $v = de$, $w = e^{-1}f$, $vw = df$, where $de$, $e^{-1}f$, $df$ are reduced. We consider two cases.

CASE 1: $|b| \le |d|$. Evidently $d = bg$, where $g$ is a reduced word.

$$uvw = \underbrace{ab^{-1}}_{u}\ \underbrace{bge}_{v}\ \underbrace{e^{-1}f}_{w}$$

Obviously $|v| = |b| + |g| + |e|$, $|vw| = |b| + |g| + |f|$, $|uv| = |a| + |g| + |e|$, and $|uvw| = |a| + |g| + |f|$. The assumption of the lemma implies $|a| + |g| + |e| > |b| + |g| + |e|$, whence $|a| > |b|$. Thus $|uvw| > |vw|$.

CASE 2: $|b| > |d|$. Evidently $b = dh$, where $h$ is a reduced word.

$$uvw = \underbrace{ah^{-1}d^{-1}}_{u}\ \underbrace{dhc}_{v}\ \underbrace{c^{-1}h^{-1}f}_{w}$$

Obviously $uvw = ah^{-1}f$ and $ah^{-1}f$ is a reduced word. The assumption of the lemma entails $|a| + |c| > |d| + |h| + |c|$, whence $|uvw| = |a| + |h| + |f| > |d| + |f| = |vw|$. ∎

Lemma 7 If $u, v, w \in FG$, $|uv| > \max(|u|, |v|)$, and $|vw| > \max(|v|, |w|)$, then $|uvw| > \max(|u|, |vw|)$.

PROOF. We verify that $|uvw| > |vw|$ and $|uvw| > |u|$. First, $|uv| > |v|$ implies $|uvw| > |vw|$ by Lemma 6. Dually, $|vw| > |v|$ implies $|uvw| > |uv|$. Thus in view of $|uv| > |u|$, $|uvw| > |u|$. ∎

Lemma 8 If $u_1, \ldots, u_n \in FG$, $n > 1$, and $u_1 \ldots u_n = \varepsilon$, then there is a number $k < n$ such that $|u_k u_{k+1}| \le \max(|u_k|, |u_{k+1}|)$.

PROOF. Induction on $n$. The case $n = 2$ is obvious, since $|u_1 u_2| = |\varepsilon| = 0$. Now we prove the lemma for $n + 1$ words $u_1, \ldots, u_{n-1}, v, w$, assuming that it holds for any sequence of $n$ words. Suppose that $|u_{n-1} v| > \max(|u_{n-1}|, |v|)$ and $|vw| > \max(|v|, |w|)$. By Lemma 7,

$$|u_{n-1} v w| > \max(|u_{n-1}|, |vw|). \qquad (1)$$

Applying the induction hypothesis for $u_1, \ldots, u_{n-1}, u_n$, where $u_n = vw$, we find a number $k < n$ such that $|u_k u_{k+1}| \le \max(|u_k|, |u_{k+1}|)$. In view of (1), $k \ne n - 1$. This completes the proof. ∎

PROOF OF LEMMA 5. Induction on $n$. If $n \le 2$ then $A_1 \ldots A_n \to C$ is an axiom of $\mathrm{Lcut}_m$. Assume that $n > 2$. In view of Lemma 2, $[\![A_1]\!] \ldots [\![A_n]\!][\![C]\!]^{-1} = \varepsilon$. We apply Lemma 8 for $u_1 = [\![A_1]\!], \ldots, u_n = [\![A_n]\!]$, $u_{n+1} = [\![C]\!]^{-1}$. Evidently $|u_i| \le m$ for all $i \le n + 1$. According to Lemma 8 there is a number $k \le n$ such that $|u_k u_{k+1}| \le m$. The following two cases arise.

CASE 1: $k < n$. This means that $|[\![A_k A_{k+1}]\!]| \le m$. Applying Lemma 4 for

$$\underbrace{A_1 \ldots A_{k-1}}_{\Gamma}\ \underbrace{A_k A_{k+1}}_{\Phi}\ \underbrace{A_{k+2} \ldots A_n}_{\Delta} \to C$$

we find a type $B$ such that $A_k A_{k+1} \to B$ is thin, $A_1 \ldots A_{k-1}\, B\, A_{k+2} \ldots A_n \to C$ is thin, and $\|B\| = |[\![A_k A_{k+1}]\!]| = |u_k u_{k+1}| \le m$. Note that $A_k A_{k+1} \to B$ is an axiom of $\mathrm{Lcut}_m$. Now we use the induction hypothesis for the sequent $A_1 \ldots A_{k-1}\, B\, A_{k+2} \ldots A_n \to C$ and after that apply the cut rule.

CASE 2: $k = n$. This means that $|[\![A_n]\!][\![C]\!]^{-1}| \le m$. Applying Lemma 4 for

$$\underbrace{A_1 \ldots A_{n-1}}_{\Phi}\ \underbrace{A_n}_{\Delta} \to C$$

we find a type $B$ such that $A_1 \ldots A_{n-1} \to B$ is thin, $B\, A_n \to C$ is thin, and $\|B\| = |[\![A_1 \ldots A_{n-1}]\!]|$. In view of Lemma 2,

$$[\![A_1 \ldots A_{n-1}]\!] = ([\![A_n]\!][\![C]\!]^{-1})^{-1} = (u_n u_{n+1})^{-1}.$$

Thus $\|B\| = |(u_n u_{n+1})^{-1}| = |u_n u_{n+1}| \le m$. Hence $B\, A_n \to C$ is an axiom of $\mathrm{Lcut}_m$. By the induction hypothesis, $\mathrm{Lcut}_m \vdash A_1 \ldots A_{n-1} \to B$. This completes the proof of Lemma 5. ∎

We prove now that $\mathrm{Lcut}_m \vdash B_1 \ldots B_n \to D$ if and only if $\|B_i\| \le m$ for all $i \le n$, $\|D\| \le m$, and $L \vdash B_1 \ldots B_n \to D$.

PROOF OF THEOREM 1. The 'only if' part is obvious. To prove the 'if' part, we assume that $\|B_i\| \le m$ for all $i \le n$, $\|D\| \le m$, and $L \vdash B_1 \ldots B_n \to D$.

We introduce a new primitive type for each instance of an axiom in the derivation of $B_1 \ldots B_n \to D$ and replace all occurrences of old primitive types in the derivation by corresponding new ones. We obtain a derivation of a sequent $\hat{B}_1 \ldots \hat{B}_n \to \hat{D}$, where $\sigma_i(\hat{B}_1 \ldots \hat{B}_n \hat{D}) = 2$ for all primitive types $p_i$ occurring in $\hat{B}_1 \ldots \hat{B}_n \to \hat{D}$. In view of Lemma 5, $\mathrm{Lcut}_m \vdash \hat{B}_1 \ldots \hat{B}_n \to \hat{D}$. Replacing new primitive types by corresponding old ones in this $\mathrm{Lcut}_m$-derivation, we obtain a $\mathrm{Lcut}_m$-derivation of $B_1 \ldots B_n \to D$. This completes the proof of Theorem 1. ∎
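Lemma 8 is the combinatorial step that lets the proof of Lemma 5 find two adjacent types whose joint interpretation is short. The following Python sketch checks its statement empirically on random sequences of reduced words whose product is the identity. It is a sanity check under assumed encodings (reduced words as tuples of (generator, exponent) pairs, invented function names, an arbitrary three-letter generator set), not a substitute for the proof.

```python
import random


def reduce_word(word):
    """Cancel adjacent pairs p p^-1 and p^-1 p using a stack."""
    out = []
    for gen, exp in word:
        if out and out[-1] == (gen, -exp):
            out.pop()
        else:
            out.append((gen, exp))
    return tuple(out)


def inverse(word):
    """Inverse of a word: reverse the order and flip every exponent."""
    return tuple((gen, -exp) for gen, exp in reversed(word))


def mul(u, v):
    """Product of two reduced words, returned as a reduced word."""
    return reduce_word(u + v)


def find_k(us):
    """Least 1-based k < n with |u_k u_{k+1}| <= max(|u_k|, |u_{k+1}|), else None."""
    for k in range(len(us) - 1):
        if len(mul(us[k], us[k + 1])) <= max(len(us[k]), len(us[k + 1])):
            return k + 1
    return None


def random_word(rng, gens, max_len):
    """A random reduced word with at most max_len letters."""
    raw = tuple((rng.choice(gens), rng.choice((1, -1)))
                for _ in range(rng.randint(0, max_len)))
    return reduce_word(raw)


def random_trivial_product(rng, n, gens=("p", "q", "r"), max_len=6):
    """n reduced words (n > 1) whose product is the identity element."""
    us = [random_word(rng, gens, max_len) for _ in range(n - 1)]
    prod = ()
    for u in us:
        prod = mul(prod, u)
    return us + [inverse(prod)]


rng = random.Random(0)           # fixed seed so the run is reproducible
samples = [random_trivial_product(rng, rng.randint(2, 6)) for _ in range(2000)]
ks = [find_k(us) for us in samples]
```

Every entry of `ks` should be a number rather than `None`, since Lemma 8 guarantees a suitable $k$ for each sample.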

Acknowledgements

I would like to thank Prof. S. Artemov for guiding me into the subject, pointing out the most important problems, and running a seminar at Moscow University, which provides a proper environment to approach these problems. I am grateful to Prof. M. Kanovich, who teaches the formal grammars and has made several useful comments on the subject of this paper. I also wish to thank L. Beklemishev, V. Krupski, and N. Pankratiev for checking the proof and making a number of valuable suggestions.

References

[1] Y. Bar-Hillel, C. Gaifman, and E. Shamir. On categorial and phrase-structure grammars. Bull. Res. Council Israel Sect. F, 9F:1-16, 1960.

[2] J. van Benthem. Language in Action. North-Holland, Amsterdam, 1991.

[3] W. Buszkowski. The equivalence of unidirectional Lambek categorial grammars and context-free grammars. Zeitschrift für mathematische Logik und Grundlagen der Mathematik, 31:369-384, 1985.

[4] W. Buszkowski. Generative power of categorial grammars. In R.T. Oehrle, E. Bach, and D. Wheeler, editors, Categorial Grammars and Natural Language Structures, pages 69-94, Reidel, Dordrecht, 1988.

[5] W. Buszkowski. On generative capacity of the Lambek calculus. In J. van Eijck, editor, Logics in AI, pages 139-152, Springer, Berlin, 1991.

[6] N. Chomsky. Formal properties of grammars. In R.D. Luce et al., editors, Handbook of Mathematical Psychology, vol. 2, pages 323-418, Wiley, New York, 1963.

[7] J.M. Cohen. The equivalence of two concepts of categorial grammar. Information and Control, 10:475-484, 1967.

[8] J. Lambek. The mathematics of sentence structure. American Mathematical Monthly, 65(3):154-170, 1958.

[9] M. Pentus. Equivalent Types in Lambek Calculus and Linear Logic. Preprint No. 2 of the Department of Math. Logic, Steklov Math. Institute, Series Logic and Computer Science, Moscow, 1992.

[10] D. Roorda. Resource Logics: Proof-theoretical Investigations. PhD thesis, Fac. Math. and Comp. Sc., University of Amsterdam, 1991.
