
A Minimalist Theory of Human Sentence Processing

Amy Weinberg
Linguistics Department/UMIACS, University of Maryland
weinberg@umiacs.umd.edu

I. Introduction

Research in the theory of human sentence processing can be characterized by three styles of explanation. Researchers taking the first track have tried to motivate principles of structural preference from extralinguistic considerations like storage capacity in working memory, or bounds on the complexity of incremental analysis. Frazier and Rayner's (1982) Minimal Attachment and Right Association principles, and Gorrell's simplicity metric, are examples of this type of theory.

The second track eschews "parsing strategies", replacing them with a fairly complex tuning by speaker/hearers to frequency in the hearer's linguistic environment. The difficulty of recovering an analysis of a construction in a particular case is a function of how often similar structures or thematic role arrays appear in the language as a whole. The work of Trueswell et al. (1994), Jurafsky (1996) and MacDonald et al. (1994) are examples of frequency or probability based constraint satisfaction theories.

The third track takes a more representational view and ties processing principles to independently needed restrictions derived from competence and language learning. This approach claims that the natural language faculty is extremely well designed, in the sense that the same set of principles that govern language learning also contribute to a theory of sentence processing. This track is represented by the work of Gibson (1991), Gorrell (1995), Pritchett (1992), Phillips (1995, 1996) and Weinberg (1992), who argue that processing can be seen as the rapid incremental satisfaction of grammatical constraints, such as the Theta Criterion, which are needed independently to explain language learning or language variation. A variant of this approach, represented by Crain and Steedman (1985) among others, retains the grammatical source for parsing principles but locates these principles within a discourse or semantic, rather than a syntactic, component. This paper proposes a model of the last type.
We argue that a particular version of the Minimalist Program (Chomsky (1993), Uriagereka (this volume)) provides both the principles needed to explain initial human preferences for ambiguous structures and a theory of reanalysis, explaining when initial preferences can be revised given subsequent disconfirming data, and when they lead to unrevisable garden paths. We will then argue that this type of theory is to be preferred to theories motivated on extralinguistic principles. In the first section of this paper we discuss the Minimalist theory of syntax upon which we will base our parsing proposals. Features that distinguish this theory from its precursors are:

(1) The theory is derivational, providing principles for how an analysis is constructed rather than filtering conditions that constrain output representations. The main derivational constraints are the so-called Economy Conditions (Chomsky 1993).

(2) The theory applies constraints strictly locally. Derivations are evaluated at each point in the analysis. They are optimized with respect to how well they satisfy the constraints of a given item that is a candidate for integration into the structure at each point. How a proposed structure satisfies constraints imposed by the derivation as a whole is irrelevant.

(3) The theory incorporates a claim about a one to one mapping between precedence order and structural hierarchy or dominance that is embodied in the Linear Correspondence Axiom (Kayne (1994), Uriagereka (this volume)).

Next, we show how to interpret Minimalist principles as a parsing algorithm. We will show that the Economy conditions below define a crosslinguistically attested theory of preference judgments. (2) and (3) combined distinguish cases where an initial preference can be reanalyzed from those cases where reanalysis into the appropriate structure is impossible, with a resulting garden path. The next section compares our model with Colin Phillips' model of sentence processing. Phillips shares our view that principles of grammatical theory should form the basis of the theory of sentence processing. The processing principles that he invokes are based on a slightly different grammatical theory, one that he claims is identical to the theory of linguistic competence. We will first discuss what we see as strengths of his theory and then discuss three types of problems with his approach. The final section argues that this type of theory has advantages over theories relying on extralinguistic frequency or parsing strategy principles.

II. Some Minimalist Assumptions:

Readers of this volume are already familiar with many of the features of the minimalist system. We provide a brief review here of the features that are important for the construction of our parsing algorithm. The two most salient features of this system are its derivational character and the role that Economy conditions play in regulating possible derived structures. At least at the level of competence, the model has moved away from the overgeneration and filtering character of its Government and Binding precursor. Structures that do not pass the Economy conditions are simply not generated.
The two major grammatical operations used to generate structure, Merge and Move, are seen as feature checking operations. Categories are input from the lexicon with features, such as Case and theta role, that have to be checked. Checking is satisfied when a category needing a feature is in construction with some other element that can supply that feature in the sentence. Movement or merger operations are only licensed if they allow feature checking to occur; movement and merger serve to allow an element to transfer a feature necessary to satisfy some constraint. The relevant conditions that rule out overgeneration are the following:

(4) Last Resort: Operations do not apply unless required to satisfy a constraint. A minimal number of operations is applied to satisfy the constraint.

(5) Greed: "The operation cannot apply to α to enable some different element β to satisfy its properties... Benefiting other elements is not allowed."

III. Multiple Spell-Out:

A corollary assumption that has been incorporated into the Minimalist program is the derivation of a correlation originally due to Kayne (1994). Previous grammatical formalisms had argued that restrictions on linear precedence and immediate dominance were the product of two separate subsystems. Kayne (1994) suggested that these two systems were linked, and that one could derive precedence order from information about dominance. This conjecture is known as the Linear Correspondence Axiom (LCA), given in (6).

(6) LCA:
Base Step:

If α precedes β, then α c-commands β.
Induction Step: If γ precedes β, and γ dominates α, then α precedes β.

C-command is defined as in Epstein (this volume), repeated below:

(7) α c-commands all and only terms with which α was paired by Merge or Move in the course of the derivation.

(8) illustrates the relationships licensed by these definitions.

(8) [IP [DP [D the] [NP man]] [I' [I tense] [VP slept]]]

The precedence relations among elements in the subject are licensed because the determiner c-commands and precedes the NP (man). The second part of the definition is needed since the terminal elements in the subject position did not directly combine with the elements in the VP by either Merge or Move. Therefore they do not c-command these VP elements, even though the terminals in the subject precede those in the VP as required by the base step of the LCA. Their presence is allowed, however, by the second clause in the definition, because the DP that precedes the VP dominates both the determiner and the NP, which inherit precedence by a kind of transitivity. Uriagereka (this volume) argues that the base step of the definition follows from the kind of "virtual conceptual necessity" inherent in the Minimalist program. The simplest kind of mapping between precedence and dominance is one to one, and therefore we might expect a grammar that specifies linear and dominance order to have this simplifying restriction (see Uriagereka (this volume) for details). We cannot derive the induction step in the same way; it appears only to allow terminals in a c-command relation to co-exist in a structure. General goals of the Minimalist program, which try to derive features of the grammatical system from "virtual conceptual necessity", force us either to derive the induction step from other considerations, or to eliminate it from the system. Uriagereka adopts the latter course. Uriagereka (this volume) claims that we can maintain the simple relationship between command and precedence given by the base step in (6) if we allow the operation of Spell-Out to apply many times during the course of the derivation. Spell-Out is the operation that removes material from the syntactic component and feeds it to the interpretive components of Logical Form (LF) and Phonetic Form (PF) when that material is ready for interpretation.
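The division of labor between the two steps can be made concrete with a small sketch. The code below is purely illustrative (the tuple encoding of Merge and the function names are my own, not part of any proposal in this paper): it computes the terminal precedence order of the tree in (8) and lists the precedence pairs that the base step alone cannot license, i.e. exactly the pairs that need the induction step.

```python
# Illustrative sketch: a node is either a terminal (str) or a pair
# (left, right) of subtrees built by Merge.

def terminals(node):
    """Left-to-right terminal yield; this is the precedence order."""
    if isinstance(node, str):
        return [node]
    left, right = node
    return terminals(left) + terminals(right)

def c_command_pairs(node):
    """Terminal-level c-command per definition (7): a terminal c-commands
    everything inside the constituent it was directly merged with."""
    if isinstance(node, str):
        return set()
    left, right = node
    pairs = set()
    if isinstance(left, str):
        pairs |= {(left, b) for b in terminals(right)}
    if isinstance(right, str):
        pairs |= {(right, a) for a in terminals(left)}
    return pairs | c_command_pairs(left) | c_command_pairs(right)

def needs_induction_step(node):
    """Precedence pairs among terminals NOT matched by c-command,
    i.e. pairs the base step of the LCA cannot license on its own."""
    terms = terminals(node)
    cc = c_command_pairs(node)
    return [(a, b) for i, a in enumerate(terms)
            for b in terms[i + 1:] if (a, b) not in cc]

# The tree in (8): [[the man] [tense slept]]
tree = (("the", "man"), ("tense", "slept"))
print(terminals(tree))             # ['the', 'man', 'tense', 'slept']
print(needs_induction_step(tree))  # subject terminals vs. VP terminals
```

As the discussion above predicts, the pairs that fail the terminal-level check are exactly those relating material inside the subject to material inside the VP; in the text these are rescued by the induction step, or, on Uriagereka's alternative, handled by Spell-Out.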
Uriagereka points out that since the minimalist system dispenses with a global level of s-structure as the conduit to the interpretive components, there is nothing to stop material from being passed for interpretation multiple times.

We assume that spell-out applies whenever two categories cannot be joined together by the Merge operation. If Merge doesn't apply, then the category currently being built is spelled out, or reduced. We retain the notion from earlier theories of grammar that Spell-out is a conduit between the syntax and the phonology. It is well known that the constituency established by the syntax is not relevant for phonological processes. Spell-out turns a syntactic structure with relevant constituent relationships into a string ready for phonological interpretation. Uriagereka uses Spell-Out as a repair mechanism to retain the one to one correspondence between domination and precedence. He assumes that both precedence and dominance must be established between terminal elements at all points of the derivation. Precedence implies merger, and merger is only possible when a chain of domination can be established. When merger is not possible, the string is linearized (turned into an unstructured string where only previously established precedence relations are preserved). Since the elements that have been linearized are invisible in the syntax, precedence does not have to be established between them and other items in the structure. Thus, when two categories cannot be combined through merger or movement (the only syntactic operations) to form a dominating category, the material that has been given structure so far is "spelled out", or linearized.

(9) "L is an operation L(c) = p mapping command units (units that can be formed through merger, ASW) c to intermediate PF sequences p and removing phrasal boundaries from c representations" (Uriagereka, this volume)

This idea preserves the one to one mapping between precedence and dominance, but at the cost of never building a single phrase marker. Instead one builds blocks (Uriagereka calls them "command blocks") where all elements stand in a c-command relation to each other.
When this c-command relation is interrupted, the unit is spelled out, with an unstructured unit shipped to the phonology for phonological interpretation and a structured unit shipped to LF (Logical Form) for semantic interpretation. The result of Spell-Out is an unstructured string (a syntactic word) with no further internal phrase structure. Within the context of the Minimalist system, Spell-Out is a grammatical operation, on a par with movement transformations. As such it is governed by conditions on transformations, in particular by the Economy Conditions discussed above. These conditions establish a preference for derivations which utilize the fewest operations possible: an operation is applied only to satisfy some independent grammatical condition. In this case, this means that we will Spell-Out, or linearize, only when we could not otherwise establish a chain of precedence.

III. Minimalist Principles as a Parsing Algorithm

We will now apply a theory incorporating economy conditions and multiple spell-out to parsing. We assume that the algorithm applies left to right and evaluates ambiguities with respect to the economy conditions. As in minimalist theory, items are inserted into the derivation (or moved) with the goal of checking features. The feature checking aspect of the theory imposes a preference for argument over adjunct attachment, along the lines of Pritchett (1992) and Gibson (1991), on the assumption that theta roles are relevant features for checking. Attachment as an adjunct will never lead to receipt or transfer of theta, case or other features, whereas insertion into an argument position will allow this transfer to occur. We will see that this preference is well attested. Unlike in Pritchett (1992) and Gibson (1991), however, feature transfer is optimized locally.
Pritchett and Gibson allowed the parser to scan the entire derivation at the point of an item's attachment and to compare whether the attachment of a category optimized the assignment of features over all elements of the tree built so far. By contrast, since feature checking is subject to Greed in the Minimalist system, this theory only allows optimal feature checking on the particular category that is being attached, irrespective of whether this optimizes feature checking across the derivation as a whole. We will see that this is crucial for some of our examples below.

Insertion or movement is governed by the Economy Conditions discussed above. The preference to attach a category using minimal structure follows immediately from this notion of Economy. At each point a category is inserted using the least number of operations necessary for feature transference or merger. This ban on unnecessary operations subsumes Frazier and Rayner's (1982) Minimal Attachment and Gorrell's (1995) simplicity condition, with the advantage of following from independently motivated grammatical principles. Following Uriagereka, we assume that Spell-out occurs whenever a derivation would otherwise violate the LCA (now containing only the base step). The spell-out conditions thus also provide us with an independently motivated theory of reanalysis. If a preferred reading induces a precedence/dominance mismatch, the category that precedes but does not dominate will be spelled out. Again following Uriagereka, this means that the material inside the spelled out category is linearized and all internal syntactic structure is removed, creating a nondecomposable syntactic word. Given this, reanalysis from the preferred to the dispreferred reading that requires either extraction of material from, or insertion of material into, this syntactic word will be impossible. As a lexical item, the spelled out material is an atomic unit, which can no longer be decomposed into its component pieces. If, however, reanalysis occurs within a domain where Spell-Out has not applied, then material can be accessed and the preferred reading can be transformed into the dispreferred structure. Incorporating Spell-out and Economy conditions into the grammar also explains the preference for right branching derivations without the need for extra explicit principles which favor this type of derivation. As a grammatical operation, Spell-out is governed by Economy. Since it does not allow the checking of any features, it is an operation of last resort.
As such, it will only be invoked when no other feature checking operation can apply, and the minimal number of spell-outs needed to guarantee satisfaction of the LCA will apply at each step in the derivation. A right branching structure ensures that an element that precedes will also c-command a following category, and thus minimizes the need for Spell-Out. Therefore, right branching structures will be preferred because they economize on the need for spell-out. The algorithm in (10) embodies these principles.

(10) A derivation proceeds left to right. At each point in the derivation:
a. Merge, using the smallest number of operations needed to check a feature on the category about to be attached.
b. If Merge is not possible, try to Move within the current command path.
c. If neither merger nor movement is licensed, Spell-Out the command path.
d. Repeat until all terminals are incorporated into the derivation.

IV. Some Cases:

(i) Argument/adjunct attachment ambiguities: These cases illustrate the role of optimizing feature checking relative to Economy conditions. In all cases, attachment as an argument is preferred because it allows assignment of features.
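The control structure of (10) can be sketched as a small loop. This is my own schematic rendering, not an implementation of the theory: Merge, Move, and Spell-Out are passed in as stub callables, so the sketch shows only the order in which (10) tries the operations and how Spell-Out empties the current command path as a last resort.

```python
def parse(words, try_merge, try_move, spell_out):
    """Schematic rendering of algorithm (10). For each incoming terminal:
    try Merge first; failing that, try Move within the current command
    path; failing that, Spell-Out the command path (last resort) and
    retry. Raises if a word can never be incorporated (a garden path)."""
    command_path = []   # structure built so far (the current command unit)
    spelled_out = []    # linearized material, frozen for the syntax
    for word in words:
        while True:
            if try_merge(command_path, word):
                break
            if try_move(command_path, word):
                break
            if not command_path:
                raise ValueError(f"cannot incorporate {word!r}: garden path")
            spelled_out.append(spell_out(command_path))
            command_path = []
    return command_path, spelled_out

# Toy run: "merge" succeeds only while the command path holds < 2 items,
# forcing one Spell-Out; Move never applies in this toy grammar.
toy_merge = lambda path, w: (path.append(w) or True) if len(path) < 2 else False
no_move = lambda path, w: False
linearize = lambda path: " ".join(path)   # strip structure, keep the string

structure, frozen = parse(["the", "man", "fell", "over"],
                          toy_merge, no_move, linearize)
print(structure)  # ['fell', 'over']
print(frozen)     # ['the man']
```

In the toy run, the man is frozen into an unstructured string the moment fell cannot merge into it, which is the behavior the preposed adverbial cases below turn on; in the actual proposal, of course, Merge succeeds or fails through feature checking rather than through a length cap.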

(a) Direct object/complement subject ambiguity: The sentences and the relevant structures are given in (11).

(11) a. The man believed his sister would win the Nobel Prize.
b. The man believed his sister.
c. [VP [V believed] [DP [D his] [NP sister]]]
d. [VP [V believed] [CP [IP [DP [D his] [NP sister]] ...]]]

The DP his sister will be assigned both case and theta features by the preceding verb if it is attached as the direct object. Under the complement subject reading, case and theta features can only be assigned by the case and theta assigner that heads the complement clause; since this category has not yet been processed, no features will be assigned by an attachment of his sister as the subject of the complement clause. Therefore (c) is the preferred structure. It is also the structure that is the most economical, involving fewer operations, although this is not a crucial determinant of attachment for this case. In neither case is Spell-Out necessary at the site of attachment of his sister. Notice that the attachment motivated by the desire to check features does not cause any spell-out within the VP. Both the verb and the object are available when the embedded verb is encountered in a case like (a). Therefore, the object DP is available for reinsertion as the embedded subject in (d) even though the initial structure chosen for this case is (c). All elements remain on the command path.

b. Preposed object/matrix subject ambiguity: Next consider (12), where there is a preference to treat the word following the first verb as an object in the preposed adverbial, rather than as the subject of the matrix sentence.

(12) a. After Mary mended the socks fell off the table.
b. After Mary mended the socks they fell off the table.

Again, incorporation as an object allows case and theta features to be checked on the phrase the socks. Incorporation as the matrix subject does not allow any case or theta feature checking, again because the case and theta assigning head of the IP has not yet been incorporated into the structure. The relevant structures are given in (13).

(13)

a. [IP [PP [P after] [IP [DP Mary] [VP [V mended] [DP [D the] [NP socks]]]]] ...]
b. [IP [PP [P after] [IP [DP Mary] [VP [V mended]]]] [IP [DP [D the] [NP socks]] ...]]

We do not expect reanalysis to be possible given the algorithm in (10). After building the optimal structure in (13a), the phrase fell cannot be incorporated into the preposed adverbial clause. A globally optimizing algorithm might look to see what series of transformations could be made to incorporate this category. However, our algorithm is a dumb one that acts only to incorporate local material. Since the second verb phrase can't incorporate into any node within the preposed adverbial, the adverbial is spelled out in a phrase by phrase manner, leaving the structure in (14). This structure respects the LCA.

(14) [IP [AP ## After Mary mended the socks ##] fell]

However, there is no way to incorporate the remaining structure into this remnant either. The preceding material has been spelled out, and so there is no way to retrieve anything from this phrase to be inserted as the necessary matrix subject. Since no further operations apply, and there is remaining unincorporated terminal material, the parse fails and a garden path is detected.

c. Ditransitive/complex transitive object ambiguity:

(15) a. John gave the man the dog for Christmas.
#b. John gave the man the dog bit a bandage.

The preferred reading for (b) is to treat the dog as a ditransitive object, as in (16), as opposed to treating this category as the subject of a relative clause modifying the man, as in (17).

(16) [VP [V gavei] [VP [DP the man] [V' [V ei] [DP [D the] [NP dog]]]]]

(17) [VP [V gavei] [VP [DP [DP the man] [CP [IP [DP the dog] ...]]] [V' [V ei] ...]]]

Clearly (17) is more complicated and requires more mergers than (16), violating Economy. This is again not crucial, because the analysis as an indirect object allows features to be checked on the DP the dog, while attachment as the subject of the relative clause does not allow feature transference. Reanalysis is not possible in this structure. We crucially assume the Larsonian shell structure in (16) to explain why. Reanalysis would involve incorporating the category in the indirect object position into a relative clause on the direct object. This, however, cannot be accomplished while the trace of the moved V remains in the structure, because a relative clause inside the direct object would not command the verb trace. Therefore, maintenance of the terminals of the preceding relative and the verb trace in the same tree would violate the LCA, and the V in (17) must be spelled out. If this category is spelled out, however, there is no host site for subsequent attachment of the true indirect object, because all structure under the V node is no longer accessible.

d. Subcategorized PP/NP modifier ambiguities: There is a preference to treat the PP on the table as an argument of the verb put rather than as a modifier of the NP the book. We will assume (non-crucially) the Larsonian analysis of PP complements as well. Whatever the structure is, attachment as an argument allows the PP to receive, and the V to discharge, features. The structures are given in (18c) and (18d).

(18) a. I put the book on the table.
b. I put the book on the table into my bag.

(18) c. [VP [V puti] [VP [DP [D the] [NP book]] [V' [V ei] [PP on the table]]]]
d. [VP [V puti] [VP [DP [DP [D the] [NP book]] [PP on the table]] [V' [V ei] ...]]]

Reanalysis is not possible, for the same reason as in the ditransitive case above. To reanalyze the PP as part of the direct object, as an adjunct to the book, requires Spell-Out of the V, since material inside the modifier would not command this category. If this category is spelled out, though, there is no site for the true locative PP into my bag to merge to.

The final case of an argument/adjunct ambiguity is the famous main clause/relative clause ambiguity exhibited in cases like (19).

(19) The horse raced past the barn fell.

These are strict garden paths: native speakers prefer a main clause reading "The horse raced past the barn" for these cases and are unable to reanalyze them as reduced relative clauses. Interestingly, Pritchett (1992) and Stevenson and Merlo (1997) have suggested that these types of ambiguities do not always yield garden paths. When transitive and unaccusative verbs replace unergatives like that in (19), the sentences become quite easy to process, as shown in (20).

(20) a. The student found in the classroom was asleep.
b. The butter melted in the pan was burnt.

Within the context of the Minimalist account, these subtle facts are accounted for because both transitives and unaccusatives must have traces inserted in the postverbal position, whether these structures are analysed as main clauses or as relative clauses. This is because the theta grid of both transitives and unaccusatives signals to the parser that these verbs require NP objects. Since there is no overt object in the postverbal position, a trace must be inserted here.
So, even if the preferred analysis for these cases is as main clauses, the material needed to appropriately interpret these structures as open sentences, with traces in the postverbal position, is built as part of the main clause analysis, BEFORE the spell-out required by the disambiguating main verb for cases which are truly reduced relatives. The initial analyses are given in (21a) and (21b). The reanalysis proceeds along the lines discussed above. The material preceding the main verb is initially analysed as a main clause. When the true matrix verb is encountered, everything preceding that verb is spelled out in accordance with the LCA. Now, however, the spelled out material can be appropriately interpreted as a relative clause, and so no garden path results.

(21) a. [IP The studenti [VP found ei [PP in the classroom]]]
b. [IP The butteri [VP melted ei [PP in the pan]]]

In all of the above cases, Economy seemed redundantly to track feature checking, in the sense that the most economical structure was also the one that allowed features to be checked. We now turn to cases where local economy is crucial to predicting both preference and reanalysis judgments. These cases deal primarily with instances where the ambiguity is between two different types of adjunct attachment. In neither case will a feature be checked, so Economy is the only factor in play.

(ii) Adjunct/adjunct attachment:

(a) Adverb or particle placement: The grammar presents multiple attachment sites directly after the italicized words in all of the cases in (21). The parser always chooses the position after the most recently encountered word as the preferential site of attachment.

(21) I told Mary that I will come yesterday.
I called to pick the box up.
I yelled to take the cat out.

In the first case, the adverb yesterday is construed with the embedded verb, despite the fact that this reading is semantically anomalous and despite the fact that an alternative attachment to the matrix verb would result in an acceptable reading. The other two cases show that the particle prefers low attachment as well. These preferences can be explained on the assumption that Spell-out, as one of the grammatically licensed operations, is also subject to the Economy conditions. Therefore mergers involving fewer spell-outs will be preferred. Consider (22) at the point when the adverb yesterday enters the derivation.

(22) [VP [V toldi] [VP [DP ## Mary ##] [V' [V ei] [CP [C that] [IP [DP ## I ##] [I' [I will] [VP [V comej] [V' [V ej] AP]]]]]]]]

Assuming attachment into a Larsonian shell associated with the lowest verb, where adverbs occupy the position of complements, no Spell-Outs would be required at this point: the adverb would simply be merged in the lowest AP position. Assuming Uriagereka's version of the LCA, though, attachment as an adjunct to the higher verb would require Spell-out of the lower VP, I and IP respectively, given the algorithm in (10). The algorithm in (10) requires spell-out of only the material that would not c-command the site of a potential merger. Therefore, if the parser has processed everything down to the lowest clause, it will require multiple spell-outs to return to the highest level of the structure. In the competence model, one could think of high or low attachment as requiring an equal number of spell-outs, each with a different number of phrases in the spelled out component of the analysis. In a parser, however, one does not keep the whole structure in memory at a given point, and therefore one must provide an explicit procedure for dealing with previously processed material. The parser cannot retrieve a site for attachment in this case without successive iterations of spell-out, given (10). Since lower attachment involves fewer iterations of the spell-out procedure, economy conditions thus favor this attachment choice. This will be true for the rest of the cases in (21). Attachment of the particle to the higher verb will cause spell-out of the phrases remaining on the c-command path of the lower clause; these phrases are italicized in (23). Attachment of the particle to the lower verb requires no Spell-out and will again be preferred by Economy.

(23) a. [VP [VP [V calledi] [V' [V ei] [IP [DP ## PRO ##] [I' [I to] [VP [V pickk] [V' [DP ## the box ##] [V' [V ek] [PP up]]]]]]]] PP]
b. [VP [VP [V yelledi] [V' [V ei] [IP [DP ## PRO ##] [I' [I to] [VP [V takek] [V' [DP ## the cat ##] [V' [V ek] [PP out]]]]]]]] PP]

The next case was discussed in Phillips and Gibson (in press). Normally relative clause attachments are dispreferred, but in this case they are the favored reading.

(24) a. Although Erica hated the house she had owned it for years.
b. Although Erica hated the house she owned her family lived in it for years.

Phillips and Gibson presented sentences like these with either temporal or non-temporal adverbial modifiers in a word by word self-paced reading task with a moving window display. At the disambiguation point (either it or her family), subjects showed a clear preference for the attachment of the preceding clause as a relative clause modifying the noun the house. There was a significant increase in reaction time at the disambiguation point if the ambiguous noun phrase was disambiguated as the matrix subject. We can explain this preference again with reference to economy of spell-out. Again, at the relevant point, neither attachment will allow the discharge of a feature. However, attachment as a relative clause permits much more of the preceding material to remain in the derivation, as it would still command the incoming merged material. Attachment as the matrix subject requires spell-out of the entire preposed adverbial. The relevant structure is given in (25); the matrix subject reading requires spelling out far more of the preposed adverbial than the relative clause reading does.

(25) [IP [AP [A although] [IP [DP Erica] [VP [V hated] [DP [DP ## the house ##] [CP [IP [DP she] ...]]]]]] [DP she] ...]

Right Branching Structure in the Grammar and in the Parser: A Comparison with Colin Phillips' Approach

Phillips (1995, 1996) presents very interesting work that argues for an alternative grammatically based theory of processing. In fact, Phillips claims that there is no distinction between the parser and the grammar. Derivations in both the competence and performance systems are built up incrementally, left to right. Given this grammatical underpinning, Phillips tries to link performance preferences to the grammar in the following way. First, he defines a condition called Branch Right, given in (26) below.

(26) Branch Right: "Select the most right branching available attachment of an incoming item.
Reference Set: all attachments of a new item that are compatible with a given interpretation."

The preference for right branching structure is in turn derived from a principle that ensures that the base step of the Linear Correspondence Axiom (LCA) discussed above is incrementally satisfied to the greatest extent possible. As Phillips writes: "I assume that a structure is right branching to the extent that there is a match between precedence relations among terminal elements and c-command relations among terminal elements." Phillips couples this with the idea that grammatical as well as parsing derivations proceed left to right to handle a variety of bracketing paradoxes. Consider (27).

(27) a. John showed the men each other's pictures.
b. John showed each other the men's pictures.

These examples suggest that double object constructions have right branching structures where the indirect object c-commands the direct object, as in (28).

(28) [VP [V showedi] [VP [DP the men] [V' [V ei] [DP each other's pictures]]]]

The fact that (29a-c) are grammatical as VP fronting structures suggests that the structure for the PPs should be left branching, allowing the right subparts to be constituents. The structure is given in (30).
(29) I said I would show the men the pictures in libraries on weekends, and
(a) show the men the pictures in libraries on weekends, I will.
(b) show the men the pictures in libraries, I will, on weekends.

(c) show the men the pictures, I will, in libraries, on weekends.

(30) [V [V [V [V [V show] [DP the men]] [DP the pictures]] [PP [P in] [DP libraries]]] [PP [P on] [DP weekends]]]

Phillips shows that we can derive the effects of a structure like (30) without the need to assume it, by assuming that Branch Right applies from left to right, with the seeming left branching structures actually being intermediate structures in the derivation. Branch Right, for example, would first combine show and the men to form a constituent. This constituent would then be reconfigured when subsequent material was uncovered. Phillips (1996) presents a variety of advantages for his approach over other treatments of paradoxical constituency. The definition in (26) suffices to handle all of these paradoxes. Phillips claims that we can use Branch Right to resolve various parsing ambiguities. In order to do this, he redefines Branch Right as (31).

(31) Metric: Select the attachment that uses the shortest path(s) from the last item in the input to the current input item.
Reference Set: all attachments of a new item that are compatible with a given interpretation.

(32), repeated from above, is a simple illustration of how the principle works.

(32) a. The man believed his sister would win the Nobel Prize.
b. The man believed his sister.

Branch Right predicts this preference because there are fewer branches in the path between believed and his sister if one construes the postverbal noun phrase as a direct object than if one construes this phrase as the subject of an embedded clause, as shown in (33).

(33) a. [VP [V believe] [DP [D his] [NP sister]]] (direct object)
b. [VP [V believe] [CP [IP [DP [D his] [NP sister]] ...]]] (embedded subject)

Path for (33a): 1 step up from V to VP; 1 step down from DP to D.
Path for (33b): 1 step up from V to VP; 4 steps down from VP to CP, IP, DP and D.

Since the embedded subject reading takes more steps on the downward path, it is dispreferred. Phillips uses this simple principle to handle a wide range of data in English, and illustrative cases from German and Japanese. The empirical coverage of this simple principle is impressive. In addition, the use of Branch Right is argued to be independently justified by the LCA, or at least by its ability to handle bracketing paradoxes, so it appears that we are getting a parsing principle for free from independently needed competence principles. For these reasons the proposal is quite interesting. Nonetheless, I will argue against this approach on several grounds.

a. Problems with "Branch Right": The range of problems to which we now turn does not concern the empirical coverage of the theory per se. We note in passing, however, that this theory is intended merely to be a theory of initial preference. It is well known that certain initial preferences, such as (32) above, can be overridden given subsequent disambiguating material, while cases like (34) are not subject to reanalysis and remain garden paths.

(34) The horse raced past the barn fell.

(34) is initially interpreted as a main clause, The horse raced past the barn. Reanalysis as a reduced relative, The horse that was raced past the barn, is impossible. The availability of this grammatically licensed interpretation has to be pointed out to naive speakers, as is well known. Phillips' theory is silent on the issue of when reanalysis is possible. Phillips claims that reanalysis should not be part of the theory of sentence processing: "...it is not clear that we should want Branch Right to account for recovery from error.
I assume that Branch Right is a property of the system that generates and parses sentences in a single left-to-right pass, and that reanalyses require backtracking and are handled by other mechanisms." This claim depends on the unargued-for presupposition that the sentence processor proceeds in a purely left-to-right manner. However, we know from eyetracking studies that backwards saccades, even in the processing of unambiguous and perfectly understandable text, are the norm. Secondly, given that both interpretations in (32) are easily processable, it is hard to see why these reanalyses are

not the domain of the human sentence processor. We agree with Phillips that the actual mechanisms of reanalysis, particularly in cases where conscious breakdown occurs, may not be the domain of the processor. We see no reason, however, not to demand that a full theory of sentence processing distinguish the cases where these mechanisms can apply, that is, where the human sentence processor presents the appropriate representations for these mechanisms to operate on, from the cases where it does not present the appropriate representations for the operation of potentially external, general purpose reanalysis mechanisms. Phillips' theory is mute on this domain of empirical prediction. We turn now from the domain of prediction to that of independent motivation. Part of the main appeal of the Branch Right theory is its independent motivation in terms of the LCA and the bracketing paradoxes: we get a processing principle for nothing. However, we will see that this motivation is partial at best. (32) above illustrates the next two problems with this condition. (32) crucially relies on a comparison of the number of steps needed to derive both possible readings, independently of whether either of these readings causes a precedence/c-command mismatch. Both of the structures in (33) respect the grammatically relevant version of Branch Right given in (26) above, where right-branchingness is defined in terms of respect for the base step of the LCA. In both structures the verb both precedes and c-commands the following NP, whether it is construed as the direct object, as in (32b), or as the complement subject, as in (32a). Nonetheless, speakers have a clear preference for the direct object interpretation (32b) over (32a). The prediction thus rests entirely on the notion of shortest path, which is not independently motivated by any of Phillips' grammatical considerations.
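The shortest-path comparison can be made concrete with a small sketch. This is my own toy illustration, not Phillips's implementation: the tree encoding, the helper names, and the exact tally are invented for exposition (the raw numbers differ from the node-by-node count given above, but the relative comparison comes out the same way).

```python
# Toy illustration of the Branch Right metric in (31): measure the path
# between the previous terminal ("believed") and the new input item
# ("his") under each candidate attachment.
# NOTE: invented sketch; tree encoding and tally are not Phillips's own.

def path_length(tree, src, dst):
    """Steps between two leaves of `tree`, a nested tuple
    (label, child, ...) whose leaves are strings."""
    def labels_to(node, leaf, acc):
        # Return the list of node labels above `leaf`, or None if absent.
        if node == leaf:
            return acc
        if isinstance(node, tuple):
            for child in node[1:]:
                found = labels_to(child, leaf, acc + [node[0]])
                if found is not None:
                    return found
        return None
    p_src, p_dst = labels_to(tree, src, []), labels_to(tree, dst, [])
    # Discard the shared ancestors, count what remains on both sides.
    i = 0
    while i < min(len(p_src), len(p_dst)) and p_src[i] == p_dst[i]:
        i += 1
    return (len(p_src) - i) + (len(p_dst) - i)

# (33a) direct object:    [VP believe [DP [D his] [NP sister]]]
obj  = ("VP", "believed", ("DP", ("D", "his"), ("NP", "sister")))
# (33b) embedded subject: [VP believe [CP [IP [DP [D his] [NP sister]] ...]]]
subj = ("VP", "believed", ("CP", ("IP", ("DP", ("D", "his"), ("NP", "sister")))))

print(path_length(obj, "believed", "his"))   # 2: shorter path, preferred
print(path_length(subj, "believed", "his"))  # 4: longer path, dispreferred
```

The point of the comparison is only relative: the direct object attachment yields the shorter path, so it is selected.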
In effect, Phillips has sneaked in the grammatically unmotivated Minimal Attachment principle of Frazier and Rayner (1982), yielding a combined Minimal Attachment/Branch Right principle which is only half motivated by the grammar. Without the minimal attachment part of this principle, the theory is too weak to predict the preference for (32b) over (32a). In (35) we present a case where the theory, without the minimal attachment addendum, is too strong.

(35) a. The man told the doctor that he was having trouble with his feet.
# b. The man told the doctor that he was having trouble with to leave.

Building either structure at the ambiguous point involves creating a precedence/c-command mismatch. Nonetheless, there is a strong preference for (35a) over (35b). Phillips assumes that the preferred structure is analyzed as a VP shell. As such, the structure would be as in (36).

(36) [VP told_i [VP [DP [D the] [NP doctor]] [V' e_i [CP [C that] ...]]]]

In this structure the terminals of the direct object the doctor precede but do not c-command the trace of the verb told or the complementiser of the complement clause. The structure therefore induces a precedence/c-command mismatch. The same is true in the less highly valued relative clause reading.

(37) [VP [V told] [VP [DP ...] [V' [DP [D the] [NP doctor]] [CP [C that] ...]]]]

The difference between these cases is thus again attributable not to the metric of precedence/c-command correspondence or mismatch, but to the length of the path between the previous input item and the next terminal node. Again, this reduces to the unprincipled minimal attachment portion of Phillips' Branch Right. To sum up, we have identified two problems with Phillips' Branch Right. First, it fails to provide a theory of reanalysis; more precisely, it does not distinguish representations in such a way as to form a basis even for an independent theory of reanalysis. Second, it incorporates both a minimal path condition and a preference for right branching structure, in such a way that the former cannot be derived from the latter. As a result, a large portion of the constraint is not grammatically motivated, and without this unmotivated portion the theory is both empirically too strong and too weak.

VI. Constraint-Based Theories: In this section, I would like to review some data presented above with the goal of contrasting a grammatically based view, such as the two previously discussed, with frequency based or probabilistic constraint based theories. MacDonald, Pearlmutter, and Seidenberg (1994) present a theory of this type. The main tenet of the theory is summarized as follows: "Processing involves factors such as the frequencies of occurrence and co-occurrence of different types of information and the weighing of probabilistic and grammatical constraints." "Our approach has suggested... that syntactic parsing, including ambiguity resolution, can be seen as a lexical process."
Structural heuristics under this view are replaced with frequency information about the use of either a lexical item or, in some theories, a construction type. For example, the "minimal attachment" preference in (32) above would derive not from a minimal attachment strategy, or from its grammatical derivation through economy. Rather, speakers can tune either to the fact that believe is used much more frequently with a simple NP as its direct object than with a sentential complement, or to the fact that simple sentences occur more frequently in the language than sentences with embeddings. Since this theory is "verb sensitive", it can easily account for the verb sensitivity of a variety of preference judgements. For example, verbs like decide, which occur much more frequently with sentential complements, are correctly predicted to be immune from the "minimal attachment" effect.

(38) John decided the contest was fair.
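The lexical tuning idea can be sketched in a few lines. The counts below are invented for illustration (they come from no corpus), and subcategorization is simplified to two frames, an NP direct object versus a sentential complement; the sketch shows only how verb-specific frequencies would drive the initial attachment choice.

```python
# Minimal sketch of lexically driven attachment preference.
# NOTE: frame counts are hypothetical, invented for illustration only.

FRAME_COUNTS = {
    "believe": {"NP": 800, "S": 200},   # hypothetical: mostly NP objects
    "decide":  {"NP": 100, "S": 900},   # hypothetical: mostly S complements
}

def preferred_frame(verb):
    """Initial attachment follows the verb's most frequent frame."""
    counts = FRAME_COUNTS[verb]
    return max(counts, key=counts.get)

print(preferred_frame("believe"))  # NP: direct-object attachment first
print(preferred_frame("decide"))   # S: no "minimal attachment" effect
```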

I would like to argue that, while speakers may very likely track frequency, this variable works in tandem with independent grammatical constraints. If a structure occurs very frequently in a given construction, it can influence the initial preferred analysis; but once an analysis is chosen, based on an amalgam of frequency and grammatical variables, the grammatically driven reanalysis principles decide what will or will not be a garden path. In (20) above, repeated as (39), we considered a case where lexical choice was also relevant to preference judgements. Stevenson and Merlo (1997) showed that unaccusative and transitive verbs were much better as reduced relative clauses than were unergative verbs.

(39) a. The student found in the classroom was asleep.
b. The butter melted in the pan was burnt.

Table 1 gives grammaticality ratings for unaccusative versus unergative single argument verbs. They found that unaccusatives were indistinguishable from transitive cases with respect to grammaticality judgements, yielding a two-way distinction, with unergatives being terrible, and transitives and unaccusatives being fine as reduced relatives.

              Ambiguous           Unambiguous
              VERB      SCORE     VERB      SCORE
Unaccusative  melt      2         begin     2
              mutate    1.66      break     1
              pour      1.66      freeze    1.5
              reach     1         grow      1
              sink      3.25
Unergative    advance   5         fly       4.25
              glide     5         ring      3.75
              march     5         run       5
              rotate    5         withdraw  3.40
              sail      5         walk      3.75

Table 1: Grammaticality ratings (1 = perfect, 5 = terrible)

Merlo and Stevenson surveyed corpora with the goal of determining whether this two-way distinction could be derived from frequency of occurrence. Using the Wall Street Journal corpus as the reference, they counted how many times a verb appeared in a reduced relative versus how many times it appeared in a main clause. The results are given in Table 2. The important thing to notice here is that both unaccusatives and unergatives occur extremely infrequently as reduced relatives. Nonetheless, they yield radically different judgements with respect to whether clauses containing them yield garden paths: unergatives are unerringly garden paths, whereas unaccusatives are not. Thus, frequency along this dimension does not predict the distinction.

               RR    MV    Totals
Unergatives    1     327   328
Unaccusatives  6     358   364
Ordinary       16    361   377

Table 2: Number of reduced relatives vs. main clauses in the 1.5 million word Wall Street Journal corpus

Next, they looked at the number of times a verb appeared as a transitive or an intransitive verb. Since reduced relative clauses are uniformly transitive, perhaps this variable is what is tracked, and frequency of occurrence as a transitive is what predicts the ability to appear in a reduced relative. Interestingly, these data show a three-way distinction, with unergatives normally used with one argument, unaccusatives showing a more even distribution, and a third class of "ordinary" verbs showing a distinct tendency to be transitive. Ordinary verbs are distinguished from unergative and unaccusative verbs in that adding the second argument does not invoke a causative interpretation on the predicate. A paradigm is given in (40).

(40) I raced the horse (= cause the horse to race) vs. The horse raced.
I broke the vase (= cause the vase to break) vs. The vase broke.
I played baseball vs. I played.

               trans  intrans  Totals
Unergatives    86     242      328
Unaccusatives  176    228      404
Ordinary       268    114      382

Table 3: Number of transitive vs. intransitive frames from the Penn treebanked subsection of the Wall Street Journal

The interesting point here is that this three-way distinction is again not mirrored in native speaker judgements. Ordinary verbs pattern like unergatives unless there are extra pragmatic clues, as shown in (41).

(41) The author studied in the English class was boring.

Results like these suggest a picture where frequency has a role to play but is filtered through grammatically justified constraints. Given the Minimalist theory discussed above,

ordinary verbs pattern like unergatives because, when they are given their preferred interpretations as main clauses, they are pure intransitives with no trace in the object position. The main verb was in a case like (42) triggers reanalysis as a relative clause, but by that time the material preceding it has already been spelled out, and the trace necessary for interpretation as a reduced relative cannot be inserted. The structure is given in (42), where # marks already spelled-out material. Frequency, coupled with the Economy driven conditions, may drive the initial preference for a given verb to be either a main clause or a reduced relative, but if this preference is incorrectly set to a main clause in the first analysis, reanalysis as a reduced relative will be impossible.

(42) [IP [DP #The author studied#] [I' was ...]]

This contrasts with the unaccusative cases, as discussed above. These cases must insert a trace in the postverbal position whether the structure is interpreted as a main clause or as a reduced relative. Therefore, whichever reading is initially chosen (perhaps based on frequency), reanalysis is possible. If this analysis is correct, we are driven to a theory where frequency information interacts with grammatically based principles, but frequency does not replace these principles.

VII. Conclusions: In this paper we have argued for a theory of processing preference and reanalysis that is heavily based on independently needed conditions within Chomsky's grammatical theory. There are no independent "parsing principles." In this case, the theory of preference is grounded in the Economy Conditions of Chomsky's (1993) minimalist theory. We contrasted our approach with one proposed by Colin Phillips. These theories are similar in that their principles are all independently motivated by grammatical considerations. We argued, however, that our economy conditions allow us to derive the effects of the shortest path portion of Phillips' Branch Right, which is otherwise unmotivated.
The principle of Least Effort discussed above favors feature passing that involves the minimal number of steps. Next, we followed Uriagereka in eliminating the induction step of the LCA in favor of a theory involving multiple Spell-Out. We showed that multiple Spell-Out, when combined with the independently motivated economy conditions, also provides a motivation for the preference for right branching structures and an independently motivated theory of reanalysis. In the last section, we argued that these principles interact with frequency derived parsing constraints in interesting ways and can explain otherwise mysterious differences between the garden path status of reduced relatives derived from unergatives, unaccusatives, and transitives. This argues in turn for a theory where grammatical principles are supplemented, but not replaced, by considerations of frequency or probability.

References:
Chomsky, N. (1993) "A minimalist program for linguistic theory." in K. Hale and S.J. Keyser, eds., The View from Building 20: Essays in Linguistics in Honor of Sylvain Bromberger. MIT Press.
Frazier, L. and K. Rayner (1982) "Making and correcting errors during sentence comprehension: Eye movements in the analysis of structurally ambiguous sentences." Cognitive Psychology 14, pp. 178-210.
Gibson, E. (1991) A Computational Theory of Human Language Processing: Memory Limitations and Processing Breakdown. Unpublished PhD dissertation, Carnegie Mellon University.

Gorrell, P. (1995) Syntax and Parsing. Cambridge University Press.
Hale, K. and S.J. Keyser (1993) "On argument structure and the lexical expression of syntactic relations." in K. Hale and S.J. Keyser, eds., The View from Building 20: Essays in Linguistics in Honor of Sylvain Bromberger. MIT Press.
Jackendoff, R. (1972) Semantic Interpretation in Generative Grammar. MIT Press.
Jurafsky, D. (1996) "A probabilistic model of lexical and syntactic access and disambiguation." Cognitive Science, pp. 137-194.
Kayne, R. (1994) The Antisymmetry of Syntax. MIT Press.
Larson, R. (1990) "Double objects revisited: Reply to Jackendoff." Linguistic Inquiry 21, pp. 589-632.
MacDonald, M.C., N.J. Pearlmutter, and M.S. Seidenberg (1994) "The lexical nature of syntactic ambiguity resolution." Psychological Review 101, pp. 678-703.
Merlo, P. (1994) "A corpus-based analysis of verb continuation classes for syntactic processing." Journal of Psycholinguistic Research 23:6, pp. 676-703.
Phillips, C. (1995) "Right association in parsing and grammar." in C. Schütze, J. Ganger, and K. Broihier, eds., Papers in Language Processing and Acquisition. MIT Working Papers in Linguistics 26, pp. 37-93.
Phillips, C. (1996) Order and Structure. Unpublished PhD dissertation, MIT.
Phillips, C. and E. Gibson (in press) "On the strength of the local attachment preference." Journal of Psycholinguistic Research.
Pritchett, B. (1992) Grammatical Competence and Parsing Performance. University of Chicago Press.
Steedman, M. (1996) Surface Structure and Interpretation. MIT Press.
Stevenson, S. (1993) "A competition-based explanation of syntactic attachment preferences and garden path phenomena." Proceedings of the 31st Annual Meeting of the Association for Computational Linguistics, pp. 266-273.
Stevenson, S. and P. Merlo (1997) "Lexical structure and processing complexity." Language and Cognitive Processes 12.2/3, pp. 349-399.
Trueswell, J.C. and M.K. Tanenhaus (1994) "Toward a lexicalist framework for constraint-based syntactic ambiguity resolution." in C.
Clifton, L. Frazier, and K. Rayner, eds., Perspectives on Sentence Processing, pp. 155-179.
Uriagereka, J. (this volume) "Multiple Spell-Out."
Weinberg, A. (1992) "Parameters in the theory of sentence processing: Minimal Commitment Theory goes east." Journal of Psycholinguistic Research 22.3, pp. 339-364.

Toward a Minimalist Theory of Syntactic Structure David R. Dowty Ohio State University Paper prepared for the Tilburg Conference on Discontinuous Constituency January 25-27, 1989 [Corrected version February 24, 1992] 1. Introduction. No assumption is more fundamental in the theory (and practice) of syntax than that natural languages should always be described in terms of constituent structure, at least wherever possible. To be sure, certain kinds of cases are well-known where constituent structure of the familiar sort runs into problems, e.g. (1) a. VSO languages

b. Cases for which "Wrapping" operations (Bach 1980, Pollard 1984) have been proposed
c. Free word order and free constituent order languages
d. Various other instances of proposed "flat" structures
e. Clause unions
f. Extrapositions
g. Parenthetical phrases

But even here, the strategy has usually been to accommodate these matters while modifying the received notion of constituent structure as little as possible. Indeed, the most extreme and general theoretical proposal I know of for systematically departing from constituent structure, Zwicky's (1986) direct liberation framework, which incidentally was the inspiration for this paper, still takes the familiar hierarchical constituent structure as its point of departure, in an important sense, and derives "flat" structure from these (more on this below). There are two things that worry me about the situation in which syntactic theory finds itself in 1990. Since hierarchical syntactic structure is so often assumed, syntacticians don't usually ask questions---at least beyond the elementary syntax course---as to what the nature of the evidence for a constituent structure in a particular sentence in a particular language is: we just take whatever structure our favorite syntactic theory would predict as the expected one for the string of words in question---by the current X-bar theory, etc.---unless and until that assumption is contradicted by some particular fact. My second concern is closely related: I suspect syntacticians today have almost come to think of the "primary empirical data" of syntactic research as phrase structure trees, so firm are our convictions as to what the right S-structure tree for most any given sentence is. But speakers of natural languages do not speak trees, nor do they write trees on paper when they communicate. The primary data for syntax are of course only STRINGS of words, and everything in syntactic description beyond that is part of a theory, invented by a linguist.
What I want to do today is describe what I will call a minimalist theory of syntax, that is, one in which the default assumption in syntactic description is that a clause or group of words is only a string; hierarchical structure is postulated only when there is direct evidence for it or when it is unavoidable in generating the data right. Unlike Zwicky's approach, which is really very similar in the class of phenomena that can be analyzed1, this theory is deliberately formalized in such a
1 Beyond differences of how the two formulations affect the methodology of the linguist, there are only two differences between Zwicky's formulations and mine that I can see: (i) my description of bounding requires one to list the "bounding categories" for the whole language; these then are treated as bounded by all rules (though, at present, I also allow additional rule-specific bounding as a marked option). Bounding (for which his corresponding concept is "concatenated daughter", as opposed to "liberated daughter") can be specified by Zwicky only rule by rule, so he cannot, e.g., directly describe a generalization (assuming there is such) that all rules of some language mentioning NP must treat it as bounded. (ii) Zwicky does not employ anything corresponding to syntactic "attachment" (see below).

way as to make linear relationships more salient and hierarchical ones less so. As you might expect from the context in which I am presenting this paper, I am suggesting that this is also a theory which permits the description of various discontinuous syntactic phenomena---in fact, descriptions which are simpler, I will argue, by virtue of its taking linear structure as the norm; in the course of the discussion, I will treat examples of many of the problems in (1). While not appealing so much to tree structure, I will on the other hand take much advantage of the idea of Linear Precedence Principles from GPSG (GKPS 1985), and also an idea which I think has been too little pursued: that some words and constituents are more tightly bound (or attached) to adjacent words than others are. This draws a parallel between syntax in general and the well-studied phenomenon of clitics. Though the theory as described here is in a very simple and embryonic form, and there are many aspects of the problems under discussion that I cannot yet give a satisfactory treatment of, I hope

the reader can get some flavor of the possibilities and constraints from this presentation. One of the main interests that such a theory has for me is the way it challenges us to justify our assumptions about constituent structure at every step of the syntactic description of a natural language.

1.5 Two senses of "Constituent"

Now, there is one sense in which I am proposing to do away with much syntactic constituency and one sense in which I am not. I still assume that sentences of a language are described by rules of a recursive grammar which specify how words and phrases of some categories can be combined to form expressions of another category. And I assume that language is interpreted by semantic rules, corresponding one-to-one to these syntactic rules, that specify how the interpretation of a syntactically derived phrase is determined by the interpretations of the inputs to the rule. All this implies syntactic constituents, in one sense. I am going to introduce a distinction which H. B. Curry drew in 1963 (Curry 1963) and which I have found it useful to appeal to before (Dowty 1982a): the sense of syntactic structure we have just been talking about is tectogrammatical structure---the steps by which a sentence is built up from its parts, but without regard to the actual form that these combinations of parts take. This notion may be best visualized by a Montague-style analysis tree as in (2), which you might imagine as illustrating the generation of a sentence, in an unknown language, meaning "Harold gave Max a mango", with lexical items only glossed:

(2)                 [...............................]S
        "Harold"NP              [.......................]VP
                      [...................]TV       "a mango"NP
                 "give"TV/NP       "Max"NP
Here we see the lexical expressions involved, the syntactic categories, the steps by which words and phrases are combined, and the implicit form of the compositional semantics. But the complex expressions at the nodes of the tree have been omitted. What is missing is, in Curry's terms, the phenogrammatical structure: how the words and phrases are combined, in what order, whether word order is fixed or free, whether

inflectional morphology marks the syntactic organization or not, whether the tectogrammatical groups in (2) come out continuously or discontinuously in the sentence itself, and so on. It is in phenogrammatical structure, not tectogrammatical structure, that I am suggesting natural language may be more string-like, less tree-like, than has been suspected. One might ask at this point whether it really matters whether phenogrammatical structure is string-like or not, as long as tectogrammatical constituent structure exists; doesn't the latter do everything we want phrase-markers to do anyway? The answer is, it would not matter at all, if all languages had fixed word order and syntactic formation rules that invariably concatenated input expressions as undivided wholes: the phenogrammatical structure of every sentence like (2) would in that case be a straightforward mapping of the tree's leaves into a linear string. The problem, of course, is that languages do not always work this way: they have discontinuous word order, extraposition, and all the other phenomena in (1) that are problematic for the context-free phrase structure model. My claim is therefore that when one gets around to describing such phenomena, one may well find that they are better formulated in terms of syntactic operations applying to strings of words than to phrase markers.2 Furthermore, I am questioning whether some of the familiar tree-based descriptions of apparently context-free phenomena (in languages like English) are really the best

descriptions, so I'm suggesting the class of non-phrase-structural phenomena could be wider than is generally assumed. The distinction between tectogrammatics and phenogrammatics is an important one if we are to seriously examine the basis for traditional constituent structure. Many arguments and lines of reasoning that are taken to show, inter alia, that surface structures are trees really establish only tectogrammatical constituency, not phenogrammatical constituency. Here is an example to illustrate. The sentences in (3a)-(3c) would constitute one good traditional argument that the English VP has a hierarchical "nested" VP structure:

(3) a. They said he could have been slicing the spinach, and have been slicing the spinach he could.
b. They said he could have been slicing the spinach, and been slicing the spinach he could have.
c. They said he could have been slicing the spinach, and slicing the spinach he could have been.

This conclusion, in particular, meant that (4) had at least three VP constituents:

(4) He could have been slicing the spinach.

However, (3) and other distributional facts about VPs show only that English has nested VPs tectogrammatically---that rules create various kinds of (fronted) VPs in (3), and that these rules derive all these kinds of VPs sequentially in the process of
2 This might appear to have much in common with early transformational grammar, in which a context-free phrase-structural component generated strings which are operated on, in a purely string-based way, by transformations. Many differences exist, however, in the question of which phenomena are to be described by string-based operations (e.g. I assume here that Passive, Raising, etc. are not), differences in the restrictions on how "structural descriptions" are stated, the ways "movement" is determined (e.g. here by LP and other general principles, etc.), and the fact that here string-based operations are either tied to particular syntactic formative operations (i.e. like generalized transformations) or not a consequence of any one rule at all (LP principles). (WH-extraction is not treated in the present paper, but both a functional-composition analysis like that of Steedman (1985a, 1985b) and the "feature-passing" analysis of GKPS (1985) are compatible with my proposals.)

producing (4): it does not show that (4) itself has a phenogrammatical form that involves nested VP constituents; and I will argue below that the VP in (4) has no such constituent structure.3 For a second example, consider co-reference conditions involving pronouns and reflexives in English. There is a long tradition that describes these in terms of tree structure. But in what sense is the tree structure necessary? In 1980, Emmon Bach and Barbara Partee (Bach and Partee 1980) showed how to construct, for a reasonable fragment of English, an analysis of anaphora and pronominal expressions in terms of the kind of compositional semantic interpretation I am assuming here. It was as adequate as the approach of Reinhart (1983), they argued, and superior in a few respects---namely, where there was an asymmetry in anaphoric behavior that would necessarily be paralleled by some asymmetry in semantic interpretation but no well-motivated asymmetry in traditional constituent structure. Bach and Partee's compositional semantic interpretation had the same form as the tectogrammatical syntactic structure---as it does by definition in a "rule-to-rule" semantic interpretation. But the actual phenogrammatical form of the English sentences plays no role in their system. In this paper, I will begin by describing this theoretical framework briefly, then illustrate it, first with a very small corpus of Finnish data, to indicate the treatment of relatively free word order. I will then turn to English, where I will address the "constituency" of the verb phrase, and then to extraposition data. English extraposition offers a very interesting challenge to someone who would postulate much less constituent structure but more reliance on LP principles: "extraposed" relative clauses and PPs end up at exactly that place in the clause where English LP principles say that subordinate clauses and PPs should end up anyway: at the right-hand margin.
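The way LP principles alone could deliver "extraposed" order can be sketched as follows. This is my own toy illustration, not Dowty's formalism: the categories, the numeric ranks, and the encoding of LP principles as a single total ranking are all invented simplifications (real LP statements are partial, violable defaults over the output of the whole grammar).

```python
# Toy sketch of linearization by Linear Precedence principles alone.
# NOTE: invented categories and ranks, for illustration only.

LP_RANK = {"NP": 1, "V": 2, "PP": 8, "RelCl": 9}  # lower rank precedes

def linearize(multiset):
    """Order an unordered collection of (expression, category) pairs by
    LP rank; ties keep their given order (Python's sort is stable)."""
    return [expr for expr, cat in sorted(multiset, key=lambda p: LP_RANK[p[1]])]

# The relative clause is generated together with its head noun, but the
# LP ranking alone places it at the right margin -- no extraposition rule.
clause = [("who was wearing a hat", "RelCl"), ("a man", "NP"), ("arrived", "V")]
print(" ".join(linearize(clause)))  # a man arrived who was wearing a hat
```

Because the input is an unordered multiset, the relative clause surfaces at the right margin no matter where it was introduced in the derivation.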
The challenge is to see whether the phenomenon of extraposition can in fact be made to

follow automatically from these LP rules simply by "loosening" the English constituent structure properly.

2. What would a "minimalist" syntactic theory look like?

The components of the theory I have in mind are as follows:

a. A Categorial Grammar with compositional semantics; syntactic operations build up expressions from words to larger expressions.

b. The "default operation" for combining two expressions syntactically is to merge their words into a single (unordered) multiset.

c. However, Linear Precedence Principles, as in Generalized Phrase Structure Grammar (or GPSG) (Gazdar, Klein, Pullum, and Sag 1985---henceforth GKPS 1985) and Head-Driven Phrase Structure Grammar (or HPSG) (Pollard 1984, Pollard and Sag 1987), which apply to the output of the whole grammar, limit the orders in which expressions may appear relative to others, either partially (leaving
3 Zwicky, coincidentally, uses the same example to illustrate his system of direct generation, noting that phonological phrasing does not motivate the nested VP structure in English (Zwicky 1986: 67). One reason that I prefer the present formulation to Zwicky's is that his system does not clearly distinguish between phenogrammatical and tectogrammatical structure, as it comes from a tradition that presumes these are the same: the phrase markers his rules generate are, presumably, phenogrammatical, and if there are any "flattened nodes" (corresponding to my tectogrammatical constituents which are not phenogrammatical ones), these do not appear in his trees at all and are like "phantom categories" in early GPSG (though they must be "reconstructed", in effect, in the compositional semantic interpretation). An auxiliary notation might be added to his system to remedy this, but I prefer one based on tectogrammatical analysis trees, permitting phenogrammatical trees (in bracketed form) as well, when needed.

5
some word order variation) or entirely (leaving none). Specifications like "must appear in second position" are to be allowed. But LP principles are actually defaults, not absolute rules, as they can be overridden by rule-particular syntactic operations (Zwicky 1986, Powers 1988); see below.
d. For each language, there is a list of Bounding Categories: parts of expressions of these categories cannot mingle with expressions outside the bounding category expression and vice-versa; these are "constituents" in the traditional sense. The list of bounding categories is regarded as a language-specific parameter. For (probably) every language, "Sentence" is a bounding category, since even in very free word order languages, words do not stray outside their own clause. For some languages, NP is a bounding category (so-called "free constituent order languages" are of this type, e.g. Makua (Stucky 1981)); for others it is not (these are so-called "free word order languages", where an adjective may stray away from its head noun). A language in which all categories were bounded would of course be completely "constituentized," as normal phrase structure theory assumes all languages to be.
e. Constituent-formation can be specified as a rule-specific syntactic operation, but a marked one: there are two other kinds of possible syntactic operations besides the default one: (i) ordering one expression4 to the left or right of the head5 of another, and (ii) attaching one expression to the left or right of the head of another. The difference is that two expressions connected by attachment cannot be subsequently separated by other expressions, while ordering allows this.
f.
Finally, since categorial grammar is the basis of this theory, it is important to emphasize that agreement and government morphology would still be treated either as in Bach (1983), or in a unification-based system as in Karttunen (1989) or Zeevat, Klein and Calder (1987); both of these allow one to observe the so-called "Keenan principle" that in a functor-argument pair, agreement morphology is "copied" from the argument to the functor, while the functor determines what government morphology appears on the argument (Keenan 1974). Needless to say, the details of this proposal are provisional (or absent at present) and might well be modified; some will in fact be modified as we proceed.

3. A simple "free word order" example: Finnish

I will begin with a very brief illustration of how a relatively free word order language would be described in this method. Fortunately, Karttunen (1989) describes a small
4 I assume for the moment that ordering one expression before (after) the head of another is only meaningful if the first is bounded. Thus, options the theory could take at this point are (i) that such ordering is only to be well-defined for bounded expressions, and (ii) that if the first expression is not bounded, this operation has the effect of ordering the head of the first before (after) the head of the second. The latter will be adopted below: cf. (81), (82).
5 I assume head is defined as more or less customary in categorial grammar: in a complex expression formed from one expression of category A/B and a second of category B, where A ≠ B, the first is the head; if A = B, then the second is the head (i.e. the functor A/A is a modifier). Below, we will have occasion to refer to the lexical head of an expression, the simplest definition of which is that it is both a head and a lexical expression (i.e. is not a syntactically derived phrase). Some expressions would not have a lexical head by this definition, of course. If it turns out that we are to assume lexical heads for all expressions, the intuitive way to proceed is by a recursive definition: one seeks the (non-modifier) functor of the expression, and if that head is not lexical, then one seeks the head of the head, and so on. However, the problem with this kind of definition is that it indirectly makes reference to the derivational history of an expression and therefore to its tectogrammatical "constituent structure", thereby defeating my goal of avoiding reference to constituency in syntactic operations. But the data treated in this paper do not suffice to determine what sort of notion of "head" is needed; the experience of Pollard (1984) and Pollard and Sag (1987) should be relevant, but I am not sure how to apply it here.
Cases of attachment in this paper involve attachment to a (first-order) lexical head, or attachment to a group of attached words, of which the group is the (first-order) head and which also includes the (ultimate) lexical head.

part of the grammar of Finnish that is just right for my purposes, so I will borrow his data (but not his analysis). First, we will need some notation.

3.1 Basic Notation

Since a set is a familiar mathematical construct that has members with no inherent ordering on them, I will use the set notation to represent expressions that have been produced by grammatical operations but not linearly ordered by the LP (Linear Precedence) principles, as for example (5) is:

(5) {a, b, c, d}

Suppose the language has the LP principles in (6),

(6) A < B, C < D

which are interpreted as saying that expressions of category A (to which we assume a belongs) must precede those of category B (to which b belongs), and those of C (etc.) must precede those of D. This will then authorize the following as well-formed linearly ordered sentences of the language:

(7) a b c d, c d a b, a c b d, c a d b, a c d b, c a b d

(This is taken over from GPSG, as in Gazdar and Pullum (1981).) To indicate a bounded constituent, a set "inside" (represented as a member of) an expression will be used. For example, if in the derivation of the expression in (5) the combination of c and d had been of a bounded category, then the whole expression produced would have been (8):

(8) {a, b, {c, d}}

and since bounded expressions cannot be separated by expressions outside, the linear phrases allowed by the same LP rules would now be only:

(9) a b c d, a c d b, c d a b

For grammatical rules, I will adopt a Montague-style notation (since phrase-structure rules will obviously not be suitable), and the default combination operation will be represented by set union:

(10) (Default syntactic operation)
S1. If α ∈ A/B and β ∈ B, then F1(α,β) ∈ A, where F1(α,β) = α ∪ β.

Lexical items themselves will be singleton sets; therefore all expressions, lexical and complex, will be sets, so set union is always well-defined on them. The two "marked" operations will be symbolized as in (11):

(11) a. F2(α,β) = α << β ("α ordered to the left of β")
b. F3(α,β) = α + β ("β attached to the head of α")
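The interaction of LP principles with bounding in (5)-(9) can be checked mechanically. The following Python sketch is my own encoding, not part of the proposal: words carry hypothetical category labels, unordered expressions are frozensets, and a nested frozenset marks a bounded constituent. It enumerates the linearizations permitted by the LP principles in (6), requiring bounded sub-expressions to stay contiguous:

```python
from itertools import permutations

# Hypothetical encoding of (5)-(9): lowercase words with category labels,
# frozensets for unordered expressions, nested frozensets for bounded ones.
CAT = {'a': 'A', 'b': 'B', 'c': 'C', 'd': 'D'}
LP = [('A', 'B'), ('C', 'D')]          # (6): A < B and C < D

def leaves(expr):
    """All words contained in a possibly nested multiset expression."""
    out = []
    for x in expr:
        out.extend(leaves(x) if isinstance(x, frozenset) else [x])
    return out

def lp_ok(order):
    """No word of a 'right' category may precede a word of its 'left' category."""
    return all(not (CAT[w] == right and CAT[v] == left)
               for left, right in LP
               for i, w in enumerate(order) for v in order[i + 1:])

def contiguous(order, expr):
    """Bounded sub-expressions must occupy unbroken spans of the string."""
    for x in expr:
        if isinstance(x, frozenset):
            idx = sorted(order.index(w) for w in leaves(x))
            if idx != list(range(idx[0], idx[0] + len(idx))) or not contiguous(order, x):
                return False
    return True

def linearizations(expr):
    return sorted(''.join(p) for p in permutations(leaves(expr))
                  if lp_ok(p) and contiguous(p, expr))

flat = frozenset('abcd')                          # (5) {a, b, c, d}
bounded = frozenset({'a', 'b', frozenset('cd')})  # (8) {a, b, {c, d}}
print(linearizations(flat))     # the six orders in (7)
print(linearizations(bounded))  # the three orders in (9)
```

Run on the flat expression this yields exactly the six orders in (7); adding the bound around {c, d} cuts the output to the three orders in (9).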

3.2 Finnish Data

Karttunen is concerned with four statements about Finnish grammar:

(12) a. In declarative sentences, subjects and objects may occur in any order with respect to the verb and to one another.
b. In yes-no questions and imperatives, the finite verb comes first.
c. The negative auxiliary (e-) precedes the temporal one (ole-) and both precede the main verb, but the three types of verbs need not be adjacent.
d. Elements of participial and infinitival clauses can be interspersed among the constituents of a superordinate clause. (Karttunen 1989: 48)

Assume that Finnish has at least these operations for combining subjects and objects with verbs:6

(13) a. If α ∈ S/NP and β ∈ NP, then F4(α,β) ∈ S, where F4(α,β) = α ∪ NOM(β).
b. If α ∈ TV and β ∈ NP, then F5(α,β) ∈ VP, where F5(α,β) = α ∪ ACC(β).

(I give only a very rudimentary treatment of morphology here, merely to point out at what point in the derivation case government is determined; see Bach (1983) or Karttunen (1989) or Zeevat, Klein and Calder (1987) for fully-developed theories.) And if Finnish has no LP principles affecting NPs and verbs, then the following kinds of sentences will be generated, all of which mean "John loved Lisa":

(14) a. Jussi rakasti Liisaa.
b. Jussi Liisaa rakasti.
c. Liisaa Jussi rakasti.
d. Liisaa rakasti Jussi.
e. Rakasti Jussi Liisaa.
f. Rakasti Liisaa Jussi.

For Karttunen's second condition, we need simply a Linear Precedence Condition for

6 Since under the convention we have adopted β is here a set, "NOM(β)" would be more properly understood as applying the nominative

morphological operation to the expression in the set β; the same applies to attachment and rule-specified precedence in the rules below. The unification-based treatment of Karttunen (1989), or Bach's (1983) "lexical implementation", is actually preferable here, since we do not want to take rules like (13a) and (13b) to imply that a syntactically analysed version of the phrase is necessary at this point (to apply the morphological operation correctly to the appropriate words within β), but only that a derivation of a nominative (or accusative, etc.) phrase is well-defined elsewhere by the grammar.

Finnish that specifies that certain kinds of words, namely interrogative auxiliaries7, must be first in their clause.

(15) V[+Q] < X
Here "X" is understood as a variable over categories, i.e. an interrogative verb precedes anything. I assume a syntactic rule derives interrogative verbs from ordinary verbs, by suffixing -ko, and performing the necessary semantic operation. This will allow (16a) but not (16b):

(16) a. Rakastiko Jussi Liisaa? "Did John love Lisa?"
b. *Jussi rakastiko Liisaa?

Karttunen's third condition, restricting the order of the two kinds of auxiliaries and the main verb, is only a slightly more complicated LP condition,

(17) V[+Aux,+Neg] < V[+Aux,+Tem] < V[-Aux]

permitting four (but only four) ways of saying that Lisa has not slept. Here, ei is the negative auxiliary and ole the past auxiliary:

(18) a. Liisa ei ole nukkunut. "Lisa has not slept"
b. Ei Liisa ole nukkunut.
c. Ei ole Liisa nukkunut.
d. Ei ole nukkunut Liisa.

The more interesting case, for our concerns, is the fourth one, for Finnish is rather unusual in permitting words from a subordinate infinitive complement to "escape" into a main clause. Below is some data. We will have an infinitive meaning "play tennis in these (clothes)" embedded as complement to the verb "start" (ruveta), all of which is then embedded as complement to "intend" (aikoa), along with negative and tense auxiliary verbs:

(19) En minä ole aikonut ruveta pelaamaan näissä tennistä
not I have intend start play these-in tennis
"I did not intend to start to play tennis in these."

(ruveta takes infinitive complements with the verb in the 3rd infinitive in the illative case; aikoa requires the 1st infinitive.) In this example, of course, the infinitives appear in contiguous groups. But the object tennistä and the adverbial näissä can "scramble" to any of six possible positions in the superordinate clause. Karttunen says that it is not clear that all of the 42 possible "free" word orderings are acceptable, but it is not clear that any are ungrammatical either. Some clearly acceptable ones are in (20), where the "discontinuous constituents" are in boldface:
7 To save time, I will ignore the other ways of forming questions that Karttunen mentions, namely by suffixing -ko to an NP or by using a WH-word such as ketä. Thus the LP principle should order any word with some feature corresponding to -ko at the beginning of the sentence. The syntax and semantics of deriving these other questions is more complex.

(20) En minä näissä ole tennistä aikonut ruveta pelaamaan
not I these-in have tennis intend start play
En minä tennistä näissä ole aikonut ruveta pelaamaan
not I tennis these-in have intend start play
En minä tennistä ole aikonut näissä ruveta pelaamaan
not I tennis have intend these-in start play

In the present theory, Karttunen's property (12d), that elements of an infinitival clause can escape into a superordinate clause, which is responsible for this data, reflects simply that infinitival VP is not a bounding category in Finnish, as it is in many other languages. The syntactic rule combining the verbal subcategory to which aikoa ("intend") belongs with its complement is approximately as in (21), morphology again simplified:

(21) If α ∈ VP/VP and β ∈ VP[-fin], then F6(α,β) ∈ VP, where F6(α,β) = α ∪ [3rd-Inf](β).

And assume a parallel rule for ruveta ("start"). The analysis tree responsible for (19)-(20) is then (22):

(22) {En, minä, ole, aikonut, ruveta, pelaamaan, näissä, tennistä}S
       {minä}NP "I"
       {en, ole, aikonut, ruveta, pelaamaan, näissä, tennistä}VP
         {en}VP/VP "not"
         {ole, aikonut, ruveta, pelaamaan, näissä, tennistä}VP
           {ole}VP/VP PAST
           {aikonut, ruveta, pelaamaan, näissä, tennistä}VP
             {aikonut}VP/VP "intend"
             {ruveta, pelaamaan, näissä, tennistä}VP
               {ruveta}VP/VP "start"
               {pelaamaan, näissä, tennistä}VP
                 {näissä}VP/VP "in these"
                 {pelaamaan, tennistä}VP
                   {pelaamaan}TV "play"
                   {tennistä}NP "tennis"
As the only LP principles we have used for Finnish relate to the order of the first two auxiliaries and as VP is not a bounding node, the sentences in (20), and many others, are all permissible linearizations of (22).8
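That the scrambled orders in (20) are licensed can be checked against LP principle (17) applied to the leaves of (22). The Python sketch below is my own encoding: the feature labels 'Neg', 'Tem', and 'Main' are shorthand for the verb classes in (17), and the non-verbal words are given labels that no LP principle mentions:

```python
# Leaves of (22) with shorthand features; 'Neg' and 'Tem' abbreviate the
# [+Aux,+Neg] and [+Aux,+Tem] verb classes of LP principle (17).
CAT = {'en': 'Neg', 'ole': 'Tem', 'aikonut': 'Main', 'ruveta': 'Main',
       'pelaamaan': 'Main', 'minä': 'NP', 'näissä': 'Adv', 'tennistä': 'NP'}
LP = [('Neg', 'Tem'), ('Neg', 'Main'), ('Tem', 'Main')]   # (17), pairwise

def lp_ok(words):
    """True if no word of a 'right' class precedes a word of its 'left' class."""
    return all(not (CAT[a] == right and CAT[b] == left)
               for left, right in LP
               for i, a in enumerate(words) for b in words[i + 1:])

scrambled = [
    "en minä näissä ole tennistä aikonut ruveta pelaamaan",
    "en minä tennistä näissä ole aikonut ruveta pelaamaan",
    "en minä tennistä ole aikonut näissä ruveta pelaamaan",
]
print(all(lp_ok(s.split()) for s in scrambled))   # all of (20) pass
print(lp_ok("minä ole en aikonut ruveta pelaamaan näissä tennistä".split()))
# False: 'ole' before 'en' violates (17)
```

Since nothing orders the NPs or the adverbial, the scrambling in (20) comes out as free, while the auxiliary ordering of (18) is still enforced.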
8 Actually, this grammar may be inadequate for this or slightly larger fragments of Finnish because it permits the order of complement-taking verbs to be too unrestricted (once the morphological restrictions on these particular complement verb forms no longer disambiguate); if so, restrictions like those suggested below for English auxiliaries might be needed, which order the head of a VP before (for Finnish, non-NP) elements of the VP.


With this brief illustration, I turn to a language which is not traditionally thought to have any free word order phenomena of this sort, namely English, and will first discuss verb phrase word order, then turn to extraposition.

4. English: Verb Phrase Order and Constituency

4.1 English Bounding Nodes

With regard to bounding categories for English, surely "sentence" is one. One can see this not only from, e.g., the fact that NPs do not float freely from one clause to another but, as will turn out to be very relevant to this paper, from the familiar fact that extrapositions from NP also do not escape the clause of their host NP, the so-called "right roof constraint" (Ross 1967). The bounding status of NP itself is complicated and will be delayed until we talk about extraposition. Is VP a bounding category in English? It seems to me that we have no choice but to treat infinitives as bounding categories, where by "infinitive" I mean a nonfinite VP introduced by to. As for other kinds of English VPs, one might think there is a good argument that they too are bounding nodes, from the fact that, as shown by Baltin (1982), Extraposition seems not to be able to escape from VP in examples like (23), where a VP in fronted position permits us to visibly demonstrate that an extraposed PP (or relative clause) cannot move outside it.

(23) a. They said he would call people from Boston up, and call people from Boston up he did.
b. *They said he would call people from Boston up, and call people up he did from Boston.

However, I believe this is not really the correct conclusion to draw from (23). Example (23) is one of a more general class of focus sentences, which include not only VP focus sentences but NP focus sentences such as (24):

(24) a. She said she would hire a good syntactician, and by golly, a good syntactician she hired!
b. We expected him to catch a big one, and a whopper he caught.
c. They said he would pick out a nice one, and a nice one it was.
These of course can include PP or relative clause modifiers, and just as with Baltin's examples, we see that extraposition from the fronted constituent is prohibited (whereas relative clauses and PPs can of course be extraposed from non-preposed NPs):

(24) a. She said she would hire a syntactician with a PhD, and by golly, a syntactician with a PhD she hired!
a'. *She said she would hire a syntactician with a PhD, and by golly, a syntactician she hired with a PhD!
b. We expected him to catch a big one with red scales, and a big one with red scales he caught.
b'. *We expected him to catch a big one with red scales, and a big one

he caught with red scales.
c. They said he would pick out a nice one that fit her, and a nice one that fit her it was.
c'. *They said he would pick out a nice one that fit her, and a nice one it was that fit her.9

Since the fronted NPs in (24) cannot possibly be dominated by VP, Baltin's constraint cannot apply to them. The more general way to formulate a constraint which covers both kinds of examples is (25):

(25) Elements may not be extraposed out of a focussed constituent (and perhaps not out of any non-WH fronted constituent?)

Here are some further kinds of examples of frontings of various kinds: in some of these the fronted element is a VP (or can be analyzed that way, anyway) and so are not

relevant to demonstrating that Baltin's conclusion is insufficient. But once examples like (24) have done that, these show the generality of the phenomenon. A third sentence is added in each case showing that extraposition is possible in a different structure:

(26) a. A student who excelled at math is what he wanted to become.
b. *A student is what he wanted to become who excelled at math.
c. A student visited us yesterday who excelled at math.
d. A mile from the spring where we got our water there appeared a big cloud of dust.
e. *A mile from the spring there appeared a big cloud of dust where we got our water. (If not *, then not synonymous with (26d).)
f. We rode about a mile from the spring almost every day where we got our water.
g. Under the bed which we put in the attic is a good place to hide.
h. *Under the bed is a good place to hide which we put in the attic.
i. I put it under the bed one day last week which we put in the attic.

Thus I will assume that the syntactic rules for focus and inversion (which I will not treat here) must mark their pre-verbal constituents as bounded, but I believe VPs in general are not bounded.

4.2 Grammatical Categories

I will assume at least the following syntactic categories for English, shown with a few exemplary members of each:
9 This sentence actually has a grammatical reading, but not one that is synonymous with (24c); it is rather a focussed version of the cleft sentence It was a nice one that fit her.


(27) a. NP (John, Mary, etc.)
b. VP (= S/NP) (walk, talk)
c. TV (= VP/NP) (eat, love)
d. TV/NP (ask, give)
e. TV/PP[to] (give, offer)
f. IV/VP[inf] (try, want)
g. TV/VP[inf] (persuade, convince)
h. (VP/VP[inf])/NP (promise)
i. VP/VP (slowly, carefully)
j. S/S (today, obviously, also parenthetical phrases like on the other hand, she suggested, etc.)
k. PP (= XP/XP) (in the garden, etc.)
l. P (= PP/NP) (to, for, in)
m. CN (dog, cat, etc.)
n. NP/CN (a, the, many)
o. CN/CN[atr] (former, imitation, etc.)
p. CN/CN[prd] (predicative adjectives: asleep, etc.; PPs: on the sofa, etc.; relative clauses: who loves Mary)

This is intended to be a fairly uncontroversial category system for a categorial grammar for English, though of course much detail, especially in morphology, is omitted from the category descriptions in order to save space. Prepositional Phrase is treated as a category schema, XP/XP, where X ranges over (at least) VP, S, and CN. One thing to note is the parentheticals, which I have assigned to category S/S (to get their compositional semantics right) and about whose internal syntax I will have nothing to say, though I will return to their external syntax below. Also note that I have divided noun modifiers into attributive adjectives versus predicative modifiers, the latter including predicating adjectives, prepositional phrases, and relative clauses; I believe this distinction can be justified both syntactically and semantically. In order for LP principles to work properly, certain further distinctions in category must be introduced beyond the usual categorial ones. VP/VP includes both adverbs

and prepositional phrases, but the latter must be distinguished, possibly with a feature, from adverbs. Complement sentences are often treated categorially as NPs, with their complementizer of category NP/S, but their sentential nature must be specified somehow. And so on. For convenience, I will refer to traditional categories like PP below without spelling out any one of the several ways these might be defined with features on the category system above.

4.3 English LP Principles

For the purposes of this paper, I will work with the LP principles in (28):

(28) V, N, Adj < NP < Prt < PP < VP[inf] < S

These are based loosely on the LP principles worked out in GPSG (Gazdar and Pullum 1981, GKPS 1985). Here, the complement structure of verbs like persuade is treated in the now-traditional way for categorial grammar (Bach 1980), in which they combine

with their infinitive complements before their NP complements.10 By the use of LP principles and a "flat" structure, the correct word order (e.g. persuade Mary to leave) results without any wrapping operation (in much the same way as in GPSG, though here with a categorial syntax). One thing that should be noticed about (28) is that it seems to order constituents according to their length. That is, I think it is fairly likely (though I have not attempted a statistical study) that an average English NP is longer than an average verb; that an average PP is longer than an average NP; an average infinitive is longer than an average PP; and an average clause longer than an average infinitive. As is known, and as we shall have occasion to see later on, violations of this ordering are often acceptable when the particular instance of a constituent placed on the right is significantly longer than whatever (28) says should instead be on the right of it. This raises the question whether (28), or part of it, should be replaced by a general rule sensitive to the length of particular phrases and saying "heavier constituents to the right".11 I don't have an answer to this question, but I think one should keep it in mind.

4.4 The Double NP Construction and Attachment

How should we ensure the correct word order when there are two NPs following the verb, as in (29)?

(29) a. Mary gave Harold a book.
b. Richard asked Sue a question.

Here the order of NPs is rigidly fixed, unlike the order of two PPs, which the LP rule (28) correctly predicts to be free (e.g. talk to Bill about John, talk to John about Bill). We could, as in the GPSG treatment, introduce via the syntactic combinatory rule a syntactic feature [+F] on the NP that we want to appear to the left, then add the LP principle:

NP[+F] < NP[-F]

It has been objected that this step is ad hoc and is at odds with the spirit of the LP program. There is, however, a more interesting way to account for this ordering.
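The claim that LP principles over a flat structure yield persuade Mary to leave without any wrapping can be sketched in a few lines of Python. The element list and category tags below are my own encoding of the example; the bounded infinitive is treated as a single block:

```python
from itertools import permutations

# LP principles (28) as a rank order; categories not listed are unordered.
ORDER = ['V', 'NP', 'Prt', 'PP', 'VP[inf]', 'S']

# Flat VP for "persuade Mary to leave": a hypothetical encoding in which
# the bounded infinitive behaves as one block.
elems = [('persuade', 'V'), ('Mary', 'NP'), ('to leave', 'VP[inf]')]

def lp_ok(seq):
    """Since (28) is a total order on these categories, a sequence is
    LP-compliant exactly when its category ranks are nondecreasing."""
    ranks = [ORDER.index(cat) for _, cat in seq]
    return ranks == sorted(ranks)

valid = [' '.join(word for word, _ in p)
         for p in permutations(elems) if lp_ok(p)]
print(valid)   # only one LP-compliant order survives
```

Of the six permutations of the three elements, only persuade Mary to leave satisfies (28), which is the point of the flat analysis.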
It has often been observed (e.g. Postal 1974) that adverbs can intervene almost anywhere in an English VP except between the verb and an immediately following NP:

(30) a. I believe very strongly that Tony is honest.
b. *I believe very strongly Tony to be honest. (Postal 1974)
c. I persuaded George quite easily to leave the party.
d. I persuaded George to leave the party quite easily.
e. *I persuaded quite easily George to leave the party.
10 However, the motivation for the category TV/VP[inf] (as opposed to the category (VP/VP[inf])/NP), and therefore the motivation for (some effect of) wrapping in this construction, does not really seem to me to be on firm ground; see Dowty (1982a) and Dowty (1982b). English almost surely requires wrapping (or the equivalent) in other cases, however, such as too hot to eat and easy man to please.
11 A related idea was proposed by Ross (1967, sec. 3.1), though not in an LP-framework but as an output filter on a transformational grammar. Another perhaps relevant fact (if not one observed already) is that constituents containing a verb are placed at the end of the clause; this could have a disambiguating effect, as NPs or PPs placed after such a verb-containing constituent could sometimes be ambiguously parsed as belonging to the constituent in question or to the main clause.


f. I easily could have been passing the note to her.
g. I could easily have been passing the note to her.
h. I could have easily been passing the note to her.
i. I could have been easily passing the note to her.
j. *I could have been passing easily the note to her.
k. ?I could have been passing the note easily to her.
l. I could have been passing the note to her easily.

With parentheticals (however, I think, George, on the other hand, etc.), the same prohibition against separating a verb from its object obtains. This suggests to me that this is one point in the English verb phrase where a tighter connection exists between adjacent elements than at other points, and it is this kind of case for which I included the syntactic operation of attachment (cf. (11b) above). Thus, the syntactic rule for combining verb with direct object is to be revised from the default operation to that in (31). (This will be revised slightly below.)

(31) S5. If α ∈ TV and β ∈ NP, then F3(α,β) ∈ VP, where F3(α,β) = α + β (β attached to the head of α).

A derivation of the VP give a book to Harold, for example, would proceed as in (32):

(32) {give+{a, book}, {to, Harold}}VP
       {give, {to, Harold}}TV
         {give}TV/PP
         {to, Harold}PP
       {a, book}NP

That is, the attachment of the NP a book to the verb is by the special "marked" syntactic operation introduced earlier and cannot be interrupted by adverbs. The PP to Harold necessarily follows afterward (but cf. below). A phenomenon which may correlate with this one in an interesting way is that unstressed personal pronoun objects are unacceptable when they do not appear immediately to the right of their verbs, necessarily preceding even a verb particle (Zwicky 1986):

(33) a. We took in the unhappy little mutt right away.
b. *We took in it right away.
c. We took it in right away.

(34) a. Martha told Noel the plot of Gravity's Rainbow.
b. *Martha told Noel it.
c. Martha told it to Noel.

After surveying the range of syntactic environments where unstressed personal pronouns are acceptable in English, Zwicky offers the following two-part solution to this problem. Unstressed personal pronouns are leaners (in the sense of Zwicky (1982); what have been called clitics in other, less fine-grained analyses), and must form a prosodic unit with adjacent material, either preceding or following, in order to be acceptable. (Stressed personal pronouns, on the other hand, have a freer distribution.) The first part of his solution is the Unaccented Pronoun Constraint:

(35) A personal pronoun cannot constitute a prosodic phrase by itself unless it bears accent.

Connection of a subject or possessive pronoun to a following prosodic host is relatively unconstrained. For object cases, Zwicky proposes the constraint (36):

(36) A personal pronoun can form a prosodic phrase with a preceding prosodic host only if:
a. the prosodic host and the NP constituted by the pronoun are sisters;
b. the prosodic host is a lexical category;
c. the prosodic host is a category that governs case marking.

The last two conditions generalize the condition from verb + object cases to preposition + object and adjective + object cases (as in I'm nearer her than you are).

The interesting question here is whether the cases in which prosodic combination of a pronoun with its functor is "obligatory" are the same as the ones in which, in our framework, the syntactic operation of attachment is motivated. If so, syntactic attachment may offer a way of describing this constraint, or even a motivation for it. Zwicky points out a number of other kinds of cases besides (33)-(34) where the effect can be observed. A subject pronoun can appear after an auxiliary when the latter is inverted (examples below from Zwicky (1986a)), but in this situation it may not be separated from that verb:12

(37) a. Was he posing on the couch?
b. *Was apparently he posing on the couch?

(38) a. When did she learn that pigs can't fly?
b. *When did supposedly she learn that pigs can't fly?

(39) a. He isn't dangerous, is he?
b. *He isn't dangerous, is conceivably he?
12 Zwicky also deals with another class of sentences where an unstressed subject pronoun is prohibited, namely inversions:

Across the plains came the 20th Century Limited.
*Across the plains came it.
"Gee whillikers!" exclaimed Ona, with great feeling.
*"Gee whillikers!" exclaimed she, with great feeling.

and argues that their ill-formedness is due to their having a somewhat different structure from that in (37)-(41), though one that his condition applies to nonetheless. These sentences, unlike the ones discussed above, permit adverbial material, or, for that matter, further auxiliary and main verbs, to separate the subject from the main verb, as in (Zwicky's examples):

"Gee whillikers!" suddenly exclaimed Ona with great feeling.
Across the plains would come the train every few days.

However, I think there is reason to wonder whether the pronoun unacceptability here is due to the fact that these are presentational sentences (at least the second), whose function is to introduce a new discourse referent. Personal pronouns, by their meaning, on the other hand, cannot introduce a new referent. And in fact, I find even a stressed pronoun rather strange here, though Zwicky's analysis assumes that stressed pronouns should always be acceptable:

(*) Around the corner came SHE, of all people!
(*) And now "Gee whillikers!" exclaimed SHE, as well.

(40) a. Were she prime minister, she would dissolve parliament.
b. *Were, I suspect, she prime minister, she would dissolve parliament.

(41) a. Not only would he eat the snails, he also enjoyed the brains in black butter.
b. *Not only would, however, he eat the snails, he also enjoyed the brains in black butter.

But in fact this effect is not peculiar to pronouns; no NP in these positions can be separated by an adverb from its auxiliary (in contrast to the uninverted form of the same sentence):

(37') *Was apparently Henry Kissinger posing on the couch?
Henry Kissinger apparently was posing on the couch.

(38') *When did supposedly Aunt Susan learn that pigs can't fly?
Aunt Susan supposedly did learn that pigs can't fly.

(39') *He isn't dangerous, is conceivably Bad Bart?
He isn't dangerous, is Bad Bart?
Bad Bart conceivably is dangerous, isn't he?

(40') *Were, I suspect, Mrs. Gardmore prime minister, she would dissolve parliament.
If Mrs. Gardmore, I suspect, were prime minister, she would dissolve parliament.

(41') *Not only would, however, Henry eat the snails, he also enjoyed the brains in black butter.
Henry, however, would eat the snails, and he even enjoyed the brains in black butter.13

To finish out the range of English cases, we note the familiar facts that prepositions cannot be separated from their objects:

(42) *We put the fertilizer on, probably, the table.
*The president of, he remarked, Brazil will be there.

Nor adjectival objects from their adjectives:

(43) *The dog was near, I thought, the vase.

Thus the generalization that is apparently true of English syntax is (44):14
13 I find the judgments in the following example a bit clearer than in Zwicky's, for reasons I am not sure of:

John obviously has never lied.
John has obviously never lied.
Never has John obviously lied.
*Never has obviously John lied.

14 We will deal with Heavy-NP Shift below, which is an exception to this principle.


(44) Whenever a functor combines with an NP argument to its right in English, they combine via the operation of syntactic attachment.

And the corresponding prosodic principle, giving us an analysis somewhat similar to Zwicky's but stated in terms of attachment rather than tree structure, is (45):

(45) An unstressed personal pronoun which has been combined with its functor by the operation of syntactic attachment is only acceptable if it can form a phonological phrase with the lexical head of that functor.

Note that there is a naturalness about this: a tighter syntactic binding requires, for certain unstressed items, a tighter phonological binding as well. If this is right for English, one would like to know if the combination is found in other languages. Unfortunately, I do not know of any research investigating a possible connection between cliticization conditions and syntactic "intervention constraints" in other languages. Consider first the verb-particle examples. I assume that verb-particle combinations are single lexical entries which can consist of more than one word; this step is motivated semantically by the fact that many such combinations are not compositional in meaning at all (egg on, buzz off, etc.) or are not completely predictable (e.g. try out, try on, but not try in, try up, etc.). I also assume that lexical entries can consist of more than one word; see Dowty (1979) for discussion. In fact, I will propose that verb-particle constructions each have two kinds of lexical entries, one which consists of a verbal functor and a particle complement, as in (46),

(46) Cat TV: {lookTV/Prt, upPrt}

and one which consists of the same two words, but in which they are somehow "glued together", either by means of bounding or by means of attachment; for my purposes here, it doesn't matter:

(47) Cat TV: {{lookTV/Prt, upPrt}}

These two forms will result in the familiar two forms of verb-particle VPs, the first in (48) (cf.
LP principles above), the second in (49): (48) {look+{the, answer}, up} (49) {{look, up}+{the, answer}} Consider what happens in the case of an object pronoun. For the first kind of lexical entry there is not a problem, as the pronominal argument is syntactically attached to the lexical head of its functor and can form a phonological phrase: (50) {look+{it}, up} but in the second the attachment (or bounding) that keeps the particle next to the verbal head keeps the pronoun away from it (and thus violates (45), if we assume with Zwicky that pronouns cannot form phonological phrases with particles, prepositions or parentheticals to the left): (51) {{look, up}+it}

For the double-NP constructions, I must make it explicit that I am assuming that in a VP like give Harold the oyster, the verb give combines (tectogrammatically) first with its immediately adjacent NP Harold, then with the other object the oyster. This is not the same order as in most other works dealing with grammatical relations and categorial grammar (e.g. Bach 1980, Dowty 1982a), but grammatical relations can nonetheless still be defined in terms of the categories of this system, as pointed out in Dowty

(1982b).15 (It would in fact still be possible to treat the pronoun data assuming the NPs to combine the other way, but with a more complicated analysis.16) And since the principle (44) clearly applies to both objects of a two-object verb, we must assume, to maintain its full generality, that syntactic attachment is used for both objects of a double-object verb. And indeed, separation of either object from the verb by an adverb sounds pretty bad, though separation of the first object sounds worse:

(52) a. Harry had undoubtedly slipped Roger the Kiwi sauce.
b. *Harry had slipped undoubtedly Roger the Kiwi sauce.
c. ?*Harry had slipped Roger undoubtedly the Kiwi sauce.
d. Harry had slipped Roger the Kiwi sauce, undoubtedly.

If so, the difference between the acceptable and unacceptable derivations is as in (53) (good) and (54) (bad):

(53) {give+{him}+{the, box}}VP
  {give+{him}}TV  {the, box}NP
  {give}TV/NP  {him}NP

(54) {give+{Mary}+{it}}VP
  {give+{Mary}}TV  {it}NP
  {give}TV/NP  {Mary}NP
In the latter case, (54), the pronoun cannot form a phonological phrase with the lexical head of its functor, because the attached NP Mary intervenes; (53) has no such problem.
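The adjacency logic behind principle (45), as it applies to (50)–(51) and (53)–(54), can be sketched in a few lines of Python. The representation (a linearized string of words) and the function name are my own illustrative inventions, not part of the formal system; the sketch approximates "can form a phonological phrase with the lexical head of its functor" as "immediately follows that head in the string".

```python
# Toy check of principle (45): an unstressed pronoun combined with its
# functor by syntactic attachment is acceptable only if it can form a
# phonological phrase with the functor's lexical head -- approximated
# here as immediate string adjacency to that head.

def pronoun_ok(linearized, head, pronoun):
    """True iff the pronoun immediately follows the lexical head."""
    i = linearized.index(pronoun)
    return i > 0 and linearized[i - 1] == head

# (50) {look+{it}, up}: pronoun attached directly to the head "look"
assert pronoun_ok(["look", "it", "up"], head="look", pronoun="it")

# (51) {{look, up}+it}: the bound particle intervenes -> *"look up it"
assert not pronoun_ok(["look", "up", "it"], head="look", pronoun="it")

# (53) {give+{him}+{the, box}}: "him" right after the head "give"
assert pronoun_ok(["give", "him", "the", "box"], head="give", pronoun="him")

# (54) {give+{Mary}+{it}}: the attached NP Mary intervenes -> *"give Mary it"
assert not pronoun_ok(["give", "Mary", "it"], head="give", pronoun="it")

print("all judgments reproduced")
```

The point of the sketch is only that one adjacency condition covers both the particle cases and the double-object cases.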
15 That is, direct object is redefined as the "first NP argument" that a verb combines with, while subject is the "last NP argument". Rules like Passive, which apply to direct objects (or to direct objects only), can be given a schematized definition that permits the "first NP argument" to be affected, over a variety of verbal types. This schematic form for relation-changing rules is motivated by the need to give parameterized definitions of relation-changing rules, as explained in Dowty (1982a).

16 The other method would require us to suppose that the first NP to combine with a ditransitive verb is attached to it: {give} and {a, book} combine to give {give+{a, book}}. Then when the second NP is added, the syntactic operation again being attachment and requiring us to attach the new NP to the head of the TV, the result is that this new NP "displaces" the old one: {give+{a, book}} plus {Harold} gives {give+{Harold}+{a, book}}. A question for this approach is what happens to the original attachment that is broken: is {a, book} reattached to {Harold}? Some such requirement of a link is needed to explain the badness of *I gave Harold it.

4.5 Heavy NP Shift

Of course, the English direct object need not always appear adjacent to the verb but can, in a process known traditionally as "Heavy NP Shift", move to the right of the clause if it is long or otherwise complex ("heavy"):

(55) Susan gave to Ellen for her birthday a large purple ceramic tureen with lavender handles.

I do not have a great deal to say about this but can suggest two possibilities: (i) syntactic attachment is optional for the TV + NP rule in the case of "heavy" objects, and the LP principles should be understood in such a way as to allow a very heavy constituent to gravitate to the right of the clause, no matter what its category, or (ii) an alternative operation for the TV + NP rule is to bound all the words within the TV and order the object NP after it, i.e. if α is the TV and β is the NP, the phrase produced is

{{α} << β}.

One interesting argument for both formulations is that they, together with the syntactic category assignment to verbs given earlier, predict the observation (Bach 1980) that Heavy-NP Shift is possible for sentences with persuade but not promise:

(56) a. I persuaded to go to the game all those nutheads who were afraid of a little rain.
b. *I promised to go to the party all the people who were afraid I would stay home.

This is because Heavy-NP Shift would be tied to the operation for a particular syntactic category, TV, and as sentences with promise never involve this category, (56b) would not be produced. By the same token, the left-most NP in double-object sentences would be predicted not to undergo Heavy-NP Shift under either of these analyses,

(57) *I gave a copy of the book everyone I saw today in the office or on the street.

since that NP does not participate in the TV + NP rule. It does predict, however, that the second of the two NPs should be shiftable (it is combined via the TV + NP rule), and this seems to be so:

(58) I gave Mary yesterday for her birthday the old book I had bought in the used bookstore on 17th Street.

(The configurational analysis of Zwicky (1986) shares these same predictions, incidentally.)

4.6 Ordering of Auxiliary Verbs

Here, again, I am not so much defending a particular analysis as pointing out various alternatives which exist in a linear-oriented theory. One thing I would like to assume is that a VP with several auxiliary verbs has a flat structure, not a hierarchical one. As is well known, English auxiliaries appear in one order (Modal, Perfective, Progressive, Main Verb) and in one order only. So we will have derivations roughly like (59):

(59) {could, have, been, bathing+{the, cat}}VP
  {could}VP/VP  {have, been, bathing+{the, cat}}VP
  {have}VP/VP  {be, bathing+{the, cat}}VP
  {be}VP/VP  {bathe+{the, cat}}VP
  {bathe}TV  {the, cat}NP

In a flat-structured VP, the most obvious alternative would be to build this into the LP principles:

(60) [+V, +M] < [+V, +Perf] < [+V, +Prog] < [+V, -Aux]

Note that we cannot thereby dispense with a system of subcategorization and agreement features among the auxiliaries, for reasons of both morphology and semantics: having the auxiliaries in the proper order would do little good if the wrong interpretation were assigned. There is therefore a bit of redundancy in (60). A more interesting method would be to try to order each successive auxiliary verb added in the course of a derivation at the beginning of its phrase, as the LP principle (61):

(61) V < X

might (at first) seem to do, where X is a variable over syntactic categories. But this is not enough: in a flat VP containing several auxiliaries, (61) would apply to all such verbs equally, once they have been introduced. Could one save (61) by adopting a kind of "freezing principle"? That is, LP principles would be applied, as it were, "immediately", not at the end of the derivation, and once two constituents were ordered with respect to each other, their order could not change. The intent would be that the variable X above stands for everything already ordered by LP principles, and the V for the new auxiliary. This, however, is still inadequate: the previously added auxiliary verbs would still be indistinguishable in syntactic category from the newly added one, yet LP principles are traditionally understood as depending only on category. What one needs is an LP principle that orders the lexical functor of the current phrase being produced before the rest of the phrase, something like (62):

(62) VP/X < Y

where VP/X is any verbal category and is the lexical head of the current phrase. And this has to be understood in such a way that it can be applied recursively, repeatedly ordering "new" functors before "old" functors.
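The effect of the category-based LP statement (60) on a flat VP can be sketched as a simple rank-based sort. The feature labels and the idea of implementing LP ordering as sorting are illustrative assumptions of mine, not the paper's mechanism; as noted above, such a statement fixes only order, not selection, so the subcategorization system is still needed alongside it.

```python
# Sketch of LP principle (60) over a flat auxiliary sequence:
# [+M] < [+Perf] < [+Prog] < [-Aux].
# Fixes linear order only; it does not replace subcategorization.

LP_RANK = {"+M": 0, "+Perf": 1, "+Prog": 2, "-Aux": 3}

def linearize(verbs):
    """Order the (word, feature) pairs of a flat VP by LP rank."""
    return [w for w, f in sorted(verbs, key=lambda vf: LP_RANK[vf[1]])]

vp = [("bathing", "-Aux"), ("could", "+M"), ("been", "+Prog"), ("have", "+Perf")]
print(linearize(vp))  # ['could', 'have', 'been', 'bathing']
```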

"Current lexical head" is, however, a very different notion from the other categories we have used in LP rules: what is current at one stage of the derivation is not current later on, so we are admitting in a roundabout way a feature of tectogrammatical structure---which is hierarchical structure---into the determination of word order, even if the phrase we are building is not to be hierarchical. Is this undesirable? I am not sure. Since both GPSG and HPSG (Pollard and Sag 1987) have found "(lexical) head" to play such an important role, it seems possible that one should admit just this derivational notion without admitting others.

The only other way I can see of handling auxiliary order in a flat structure is to make ordering specific to each auxiliary rule, i.e. all such rules are of the form of (63):

(63) If α is of category VP/VP[+aux] and β of category VP, then F2(α, β) is of category VP, where F2(α, β) = α << β.

(Note the operation actually orders α only before the head of β.) This is a less general statement.

4.7 Subject-Predicate Rule

I really have nothing to say about this, except that I want to assume that it, like the verbal auxiliary rules, does not introduce bounding. Since English otherwise has functors before arguments, I suppose this rule will involve rule-specific ordering of the argument (or at least its head---see below) to the left of the functor.

4.8 Sentence Adverbs and Parentheticals

As mentioned, I have assumed that both sentence adverbs and parentheticals are of category S/S and are thus sentence modifiers semantically, which is appropriate. I will not deal with how parenthetical expressions such as I think, on the other hand, if you don't mind, George, please, etc. are generated. The syntactic operation for combining such an expression with a sentence is the default one17, i.e. "union", or merging. As there are no LP principles applying to adverbs or parentheticals, an adverb is free to "settle" anywhere as long as it (i) does not cross a bounding category boundary and (ii) does not separate attached words. Since the (finite) VP is not bounded, this predicts such adverbs can appear just about anywhere in the VP, except between a verb or preposition and its object, or in a subordinate VP or S (see examples throughout this paper). This is more or less correct, and it is a natural virtue of this kind of theory to deal with this traditionally knotty problem in this simple way. (See below on adverbs and parentheticals in NPs.)

5. Extraposition and the Bounding of NPs

As I said at the beginning, an important attraction of this kind of theory is to be able to describe extraposition by the same LP principles that determine the position of complement Ss and PPs in the VP. But in order to determine just how we want to
17 Since S is a bounding category, this raises the question of what exactly happens to its bound when an S/S combines with it. Clearly, we want to make an exception, so that the "outer" S created by this combination is bounded but the "inner" S is not, to get the right adverb positions. Cf. the similar discussion about modifiers of NP below; maybe there is a cross-categorial principle about modifiers of bounded categories. Other than just stipulating this exception, another option would be to postulate something like an S-bar node and make it, not S, the bounded category. However, S-bar has no natural definition (and possibly no need) in categorial grammar, at least in the positions where we would need it. Another possibility is general type-raising for adverbs, i.e. S/S ==> VP/VP (==> TV/TV, etc.), which has a number of independent motivations. This would bring S-adverbs inside the S-bound. Possibly Obviously, John hasn't left yet has an S/S adverb (outside the bound) and John obviously hasn't left yet has its adverb in VP/VP.

permit NP modifiers to "break away" from NPs and extrapose, and therefore the nature of NP bounding in this theory, we will want to look at some extraposition data.

5.1 What extraposes?

First of all, I will not discuss sentences like (64) in this paper:

(64) a. That John is asleep is obvious.
b. It is obvious that John is asleep.

In accord with GKPS (1985), as well as Dowty (1982b), I will assume that the clause that John is asleep in (64b) is an argument of the verb itself, with a "dummy" subject being an independent argument, and thus these examples do not involve anything peculiar in word order or constituency.

Here, instead, are the kinds of things I assume to "extrapose":

(65) Relative clauses from subjects and objects
a. Someone arrived who I have never met before.
b. We met a student yesterday at lunch who only recently arrived in this country.

(66) Non-restrictive relative clauses
a. The President appeared shortly at the party, who had just come from an important meeting.

(67) Prepositional phrases from subjects and objects
a. Two people were there from Alaska.
b. I called some doctors up with late office hours.

(68) PPs from nominalizations
a. My appointment was at 2 PM with Dr. Smith.
b. She gave me a picture at that time of Mr. Howard.

(69) Predicative adjectives
a. I want to see someone at every window armed and alert.
b. Nothing ever shows up on her table even remotely palatable.
(both examples from Stucky 1987: 388)

(70) Infinitival relative clauses
The person didn't arrive to fix the plumbing?

5.2 How are NPs bounded?

Were the above examples the extent of the extraposed phrases (and these are certainly the majority), the generalization would be clear: one can extrapose a predicative modifier (i.e. a CN/CN[+prd]) from an NP. It is even observable that non-predicative adjectives (which appear exclusively in pre-nominal position in an NP) cannot extrapose:

(71) a. *I saw some senators at the party former and future.
b. *It appears I have given the assignment to a fool after all complete and utter. (Stucky)

At this point, let's consider how the NP bounding might be described. Since extraposable phrases appear both within their "host" NP and in separated position, I assume NPs will have to have two syntactic analyses, one in which these phrases are somehow connected to the rest of the NP, and one in which they are not (and therefore appear at the end of the clause in accord with LP principles).
If one followed Bach and Cooper (1978) in treating restrictive relative clauses (and, shall we suppose, all the above predicative modifiers as well) as of category NP/NP, then a simple analysis of NP bounding is at hand. In the Bach-Cooper analysis, there are two expressions of category NP whenever there is such a modifier, an "inner" and an "outer" one, as in (72):

(72) {{someone}NP {whom I have never met}NP/NP}NP

If we stipulate that in such cases either one or the other of these NPs can be bounded, but not both, we have the result we need: if the bound is taken to be the outermost NP, the relative clause remains immediately adjacent to the head noun of the NP (as in (73a)); but if the bound is taken to be the inner NP, and if subject-predicate and verb-object operations order (or attach) only the head of the NP, then the remaining parts, namely the modifying relative clause, will, like all other parts of the clause not specifically attached, appear in whatever position the LP rules specify. In the case of a clausal constituent, this is clause-final position, as in (73b):

(73) a. {{{someone}NP {whom I have never met}NP/NP}NP arrived}
b. {{someone}NP arrived {whom I have never met}NP/NP}

However, the NP/NP analysis of restrictive relative clauses is not the most semantically appropriate one. If instead the more familiar CN/CN analysis is chosen, then some way must be found of making all CN/CN modifiers of the CN within an NP "optionally free" from the bound part of the NP at the point at which the NP combines with its functor. For example, one might propose that anything to the right of the CN in an NP is extraposable, anything to the left is not, regardless of category.

One interesting (or at least possible) argument that the distinction between CN/CN and other NP constituents is the right one to invoke here comes from a fact noticed by Ross (1967): when a PP modifies a CN, the object of the preposition can be extracted if the PP remains adjacent to its head (74a), but not if the PP has been extraposed (74b):

(74) a. Who did Bill give Harry a picture of yesterday?
b. *Who did Bill give Harry a picture yesterday of? (cf. Bill gave Harry a picture yesterday of his brother wearing a rabbit suit.)
c. Who did you witness the arrest of at the station?
d. *Who did you witness the arrest at the station of? (cf. I witnessed the arrest at the station of the entire group of demonstrators that had gathered there.)

I argued in Dowty (1989) that prepositional NPs attached to nominals should be analyzed as modifiers rather than arguments; even if this is not as widely true as suggested there, I believe it is sometimes valid. On the other hand, these PPs are in some respects like arguments. It has been suggested a number of times in various theoretical frameworks that a fundamental difference in extractability exists between arguments and adjuncts (or modifiers): one can extract from arguments but not from adjuncts. (Different theories have different ways of capturing such a constraint, of course; compare for example Kaplan and Zaenen 1989 with Steedman 1985a, 1985b.)
Thus, if the PPs accompanying event-nominals and picture-nominals are in principle ambiguously either adjuncts or arguments of the nominals, but they are only extraposable on their adjunct (i.e. NP/NP) analysis, not when they are arguments, then the difference in extractability Ross observed will follow.

5.3 More kinds of extraposition

However, there are still more kinds of extraposition we have not yet considered:

(75) Noun complements
a. Sue denied the charge vehemently that she had been in the study at midnight.
b. The idea appalled Bill that anyone would have considered him appropriate.

(76) Too- and enough- infinitival complements
a. Mary was too tired after all that work to take another step.
b. His examples were clear enough to everyone to make it obvious that he had researched the subject.

(77) Comparative than-clauses
She was wiser when it was all over than anyone would have expected her to be.

(78) So and too result clauses
a. So many people came to the lecture that I couldn't find a place to sit down.
b. He put too much chlorine in the pool for us to swim in it without our eyes burning.

Now I can imagine a CN/CN (or NP/NP) analysis being given to the noun complement clauses in (75), though it is not the first analysis that comes to mind. But I cannot envision any modifier analysis at all for the complements of too and enough in (76) or for the than-clauses in (77). I cannot say such analyses are impossible, either. (I will return to (78) later.) Pending the development of such modifier analyses, I will tentatively conclude that distinguishing extraposable from non-extraposable complements on the basis of predicative modifier status is to be ruled out. (In addition, there are extrapositions which are not from NPs at all: see below.)

The alternative I will pursue here is to see if there is evidence that the rest of the NP---the (pre-)determiner, adjective(s) and CN---might all be syntactically attached. And in fact, it seems that such constituents are not separable by parentheticals:

(79) a. *The, however, big Dalmatian may take the prize.
b. *The judge picked several, he said, collies that were quite impressive.
c. *We looked at some large, in any event, dark houses.
d. *I talked to almost, apparently, all the students.
e. *A very, in my opinion, absurd judgment was rendered.
f. *We drank too, possibly, much wine to give a reliable judgment.
g. *We drank too much, possibly, wine to give a reliable judgment.

Is there a contrast with the post-noun modifiers as to interruptability? While not all the examples below are totally felicitous, I think there is a clear difference between those of (79) and those of (80), where something intervenes between the CN and a non-extraposed post-nominal modifier:

(80) a. He invited the vice-chairman, I think it was, of the nominating committee to come to the party.
b. An undergraduate student, supposedly, who had witnessed the event reported it to him.
c. ?Someone, apparently, asleep at the wheel must have caused the accident.
d. They dismissed the suggestion, she said, that anyone would protest the action as unrealistic.
e. A taller person, if that's possible, than they had ever encountered was coming up the walk.
f. Too much wine, possibly, to permit a reliable judgment was consumed on that occasion.

This indicates that even where these phrases are NOT extraposed, some sort of syntactic difference exists between them and the pre-CN constituents of the NP.18 Though the other alternatives should perhaps not be completely ruled out, I will adopt at present the hypothesis that an attributive adjective is syntactically attached to its
18 Coordinated adjectives within NPs are an exception to these general principles, as one can sometimes felicitously insert parenthetical material after the and: A handsome and, if I'm not mistaken, expensive overcoat was lying on the sofa. But if coordinated conjuncts are each bounded expressions (as we would need to say in this approach for several reasons, such as producing the right word order within coordinated VPs and in non-constituent coordinations), then there is no contradiction in saying that a coordinated attributive adjective expression is attached to its CN (and the determiner subsequently attached to it), even though there are unattached expressions within the coordinated adjective expression; that is, the coordination construction itself can be assumed not to involve attachment but only ordering.

CN and that a determiner is syntactically attached to its following material (i.e. lexical or phrasal CN)---as one might expect it to be, in this approach, because of the phonological status of determiners as clitics dependent on following material. We have thus motivated this situation:

(81) The pre-CN elements of an NP are syntactically attached to each other and to the CN; the post-CN elements (both CN-modifiers and others) are not syntactically attached.

Because of this linking of the pre-CN elements, the group will inevitably behave like a traditional "constituent", whether the category NP itself is a bounded category or not.19 This makes possible a straightforward treatment of the bounding problem for NP and therefore for extraposition (from NP):

(82) The category NP in English is optionally bounded: both derivations in which NP expressions are treated as bounded and those in which they are treated as unbounded are well-formed.

If an NP is taken to be bounded, an unextraposed sentence is derived; if the NP is taken to be unbounded, the extraposed version is produced, and produced automatically.20 Since the VP with which a subject NP combines is not bounded, as I have tried to show above, the unattached modifying PP, AdjP or relative clause of the NP becomes a "constituent" of the clause itself by the default "union" operation, as in (83):

(83) {a+woman, arrived, who+{John knew}}S
  {a+woman, who+{John knew}}NP  {arrived}VP
  {a}NP/CN  {woman, who+{John, knew}}CN
  {woman}CN  {who+{John, knew}}CN/CN

Consequently, the LP principles which order the rest of the clause will also be responsible for the linear order of the "extraposed" relative clause, PP or AdjP: in

19 Though I will not discuss them in detail, it can be noted briefly that this treatment produces a "wrapping" effect in several other CN and Adj modifier constructions, e.g.
(i) taller than Susan ==> taller woman than Susan
(ii) easy to please ==> easy man to please
(iii) too to enter ==> too tall to enter
I assume the attachment operation that attaches attributive modifiers to nouns (or adjectives) is really head-attachment. Thus if the adjective (or Adj/Adj) is a phrasal one, the head is attached to the left of the CN (or Adj) by syntactic head attachment, and the rest of the adjective phrase is unattached and appears wherever the LP principles order it. In the above examples, the remnant phrases are PPs or infinitive VPs, so they follow it. This treatment also predicts the possibility of extraposing these phrases. (Why the complements of easy-adjectives do not extrapose, I cannot explain presently.)

20 The present account makes no provision for NPs in which one modifier has been extraposed and another left behind (remaining with the NP). If such sentences exist, and they apparently do, a more sophisticated definition of "optionally bounded" must be worked out.
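The two derivational options in (82), and the linearization they yield for the example in (83), can be sketched as a toy function. The decomposition into an attached NP core, a freed post-CN modifier, and the VP is my own illustrative simplification; in particular, the sketch hard-codes the LP generalization that a freed clausal modifier ends up clause-final.

```python
# Toy linearization for (82)/(83): a bounded NP keeps its post-CN
# modifier adjacent; an unbounded NP releases the modifier into the
# clause, where LP principles place the clausal constituent at the end.

def linearize_clause(np_core, np_modifier, vp, np_bounded):
    if np_bounded:
        # Bounded NP: the modifier is inseparable from the NP.
        return np_core + np_modifier + vp
    # Unbounded NP: only the attached core is ordered with the VP;
    # the freed clausal modifier gravitates to clause-final position.
    return np_core + vp + np_modifier

core, rel, vp = ["a", "woman"], ["who", "John", "knew"], ["arrived"]
print(" ".join(linearize_clause(core, rel, vp, np_bounded=True)))
# a woman who John knew arrived
print(" ".join(linearize_clause(core, rel, vp, np_bounded=False)))
# a woman arrived who John knew
```

The same choice point, applied to object NPs and to adjective phrases, would cover the other extraposition cases discussed below.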

general, these will be at or near the end of the clause. Since direct objects are not part of a bounded TV constituent in English, nor are objects of prepositions so confined (I assume a preposition is united to its NP object via syntactic attachment), choosing the unbounded option for these NPs will likewise free their post-nominal constituents to be positioned within the clause as a whole by the LP principles.

It is time to bring a couple of formal definitions up to date. Above, when discussing attachment, ordering and LP principles, I have mentioned the category NP as if it were always a bounded category. The syntactic operation of attachment has already been defined so that when an unbounded functor attaches to an argument, it is actually the head of the functor that "attaches" (becomes linearly inseparable). Now we must do the same thing for the argument:

(84) Attachment (redefined): the operation α + β is defined as specifying that the expression α (if α is bounded), or the lexical head of α (if α is unbounded), or α and any material already attached to α, is ordered to the left of, and is subsequently inseparable from, β (if β is bounded), or the head of β, or β and any material already attached to β.

A parallel change must be made in the syntactic ordering operation "<<", to linearly order only the lexical heads of unbounded expressions (without thereby ordering their attachments), as well as ordering bounded expressions in the way already described. A similar change must be made in the interpretation of LP principles:

(85) If an LP principle mentions a category A, and A is not a bounded category, then the interpretation of the principle is that only the head of an expression of category A, together with any material syntactically attached, is to be ordered as the LP principle specifies for A itself.

Two final comments for this section: first, extraposition also occurs from adjective phrases, for example:

(86) a. I am anxious to meet someone from there right away.
b. I am anxious to meet someone right away from there.

As far as I can see, these can be treated satisfactorily just by making the category adjective phrase, as well as NP, optionally bounded. The second comment is that at least one class of potentially extraposable complements exists which does not extrapose: the complements of pre-nominal tough-adjectives:

(87) a. He seemed to be an easy person to talk to when I met him.
b. ?*He seemed to be an easy person when I met him to talk to.

Yet the complement seems quite interruptible by parentheticals:

(88) a. He was an easy person, I thought, to talk to.

Possibly the difference relates to the fact that here the extraposed phrase is a complement to an attributive adjective, not to the CN itself. Yet too Adj and Adj enough complements and comparative than-clauses are arguably complements of attributive adjectives, and they extrapose. I have no solution to propose to this at the moment.

6. Where do extraposed phrases end up?

The matter of the resulting position of extraposed phrases is the main point of comparison I want to make between the present theory and other linguistic theories of extraposition. First of all, however, notice that in this theory one cannot raise the question whether an extraposed phrase is (phrase-structurally) attached to the VP node or the S node of a clause, though this is a matter which has been much disputed in the literature. If the category VP is not bounded, as I have argued, membership in the VP is membership in the S and vice versa. (I personally suspect this particular simplification of theoretical options is a fortunate one.)
Second, note that the "upward boundedness" of extraposition (Ross 1967) is captured, since we need S to be a bounded category in English for various reasons anyway.21

6.1 Multiple Extrapositions

Extraposition of multiple modifiers from the same NP is possible (Stucky 1987: 390-395), though such examples are usually awkward. If one is a PP and the other a relative clause, usually only the extraposed order PP - S is acceptable:

(89) a. And then, a man suddenly appeared at the door from the CIA who I had seen the previous week.
b. ??And then, a man suddenly appeared at the door who I had seen the previous week from the CIA.

(90) a. Surprisingly enough, several books have appeared over the years by that author that had fewer than 300 pages.
b. ??Surprisingly enough, several books have appeared over the years that had fewer than 300 pages by that author. [NB: avoid the irrelevant reading where the author writes less than the whole book.]

(91) a. Can you give me the names of any newcomers as soon as possible from Finland who may have programming experience?
b. *Can you give me the names of any newcomers as soon as possible who may have programming experience from Finland?
(examples from Stucky 1987: 391-392)

The appeal to LP rules in describing extraposition predicts this, as English has the LP rule PP < S. It is not clear to me how a "landing site" theory of extraposition (Baltin 1982) or other movement theory could predict this effect. Note that an appeal to a preference for nesting in multiple dependencies is not possible: if these modifiers were adjacent to their head NPs, their order would likewise be PP before relative clause, so the
21 As already mentioned, I do not attempt to deal with leftward extraction in this paper, though some other theories of leftward extraction are not incompatible with what I am proposing here: cf. footnote 2.

hypothesized relationships between "fillers" and "gaps" in such a movement analysis would have to be the unnested ones, not the nested ones, for the acceptable examples. If, however, the PP is made especially long or "heavy", then the ordering in which the PP is on the right can be acceptable:

(92) a. And then a man suddenly appeared at the door whom I had seen last week from that organization that we all know but which will go unnamed here.
b. Surprisingly enough, several books have appeared over the years that had fewer than 300 pages by that author whom we all know to be especially long-winded.
c. Can you give me the names of any newcomers as soon as possible who know LISP from any of the countries on our list of overseas development targets?

But this too is just the effect one finds with PPs and Ss that originate in the VP:

(93) a. We announced to the students that the exam was over.
b. ?We announced that the exam was over to the students.
c. We announced that the treaty had been signed to all the dignitaries and reporters who were waiting in the outer hall.

The point is not to delve into the well-known but mysterious trade-off between grammatical category and length ("heaviness") in the ordering principles for the English VP. I don't know the solution to this problem, whether there is a "basic" versus "heavy" order, or even whether there are actually any ordering principles based on categories, but it doesn't matter here. The point is rather that whatever solution is found to this problem, this theory will allow us to describe the ordering of extraposed complements by the same principles as those for VP-original clause elements, which looks as if it is the right thing to do.

6.2 Multiple Extrapositions and Nesting

As Stucky (1987: 393) points out, extrapositions from two different noun phrases in the same clause are awkward at best, and then only acceptable when they are (i) in a "nested" relationship to their hosts22 and (ii) preferably of different categories.
Her nested example (modified to remove an ambiguity she did not intend) is (94):

(94) Improbable as it may seem, an impeccably dressed man struck up a conversation with me on the plane last month about Situations and Attitudes who was going to Missoula, Montana. [PP, relative clause]

Example (95) (from Stucky, likewise modified) is the same except for "unnested" host-modifier relationships;
22 Note

the difference between the hypothetical "nesting" explanation rejected for (89)-(91) above and those discussed here (and in the following section): in (89)-(91) we were talking about two predicative modifiers of the same NP. Even if it made sense to talk about the "original" vs. "final" order of the modifiers (which I am claiming it doesn't, since no movement is involved), "nesting" could not make any difference, since the semantic interpretation would be the same no matter in which order the two modifiers were combined. In this example, we are dealing with modifiers of two different NPs (or, later, of an NP vs. of a complement of a verb), so getting the right element with the right head is crucial for the right semantic interpretation.

(95) *Improbable as it may seem, an impeccably dressed man struck up a conversation with me on the plane last month who was going to Missoula, Montana about Situations and Attitudes. [Relative clause, PP]

it is worse, and for me, completely ungrammatical. The following, adapted from Stucky's example, is about the best I can construct with extraposed relative clauses from different NPs (nested order):

(96) ??Improbable as it may seem, an impeccably dressed man struck up a conversation with me on the plane last month that was about Situations and Attitudes who was going to Missoula, Montana.

Why relative clause extrapositions from two NPs should be impossible while two extrapositions from the same NP are at least marginally acceptable, I have no idea, unless this is entirely due to parsing difficulties; there is no obvious way to block the former at all in this approach.

6.3 Extraposition and VP Constituents

Stucky also offers, from Henry Thompson and Mark Liberman, an example in which the extraposed relative clause does not make it all the way to the end of the clause, appearing instead before a (very heavy) PP that modifies the verb:

(97) A man arrived that I had been expecting at the time he had said he would come. [Relative, PP]

This particular example is dubious, as the phrase at the time... can be parsed as an S-modifier as well as a VP-modifier (cf. The man's arrival occurred at the time he had said...) and is therefore not necessarily an element of the clause. But because of the theoretical significance of this kind of example (cf. below) it is relevant to try to construct others. (For clarity, the first extraposed phrase is underlined, the second marked in italics.)

(98) a. Some guy was hired that Sue knows to fix the plumbing and the air conditioning. [Relative clause, infinitival purpose clause]
b. George actually convinced somebody this morning who neither of us had ever seen before to buy 600 shares of stock in our new company for $5 a share. [Relative, subcategorized infinitival]
c. Lois told a woman today from Finland about the great weather in Central Ohio. [PP, subcategorized PP]
d. This event is harder right now than anyone would have imagined for him to just accept as if nothing had happened. [than-S, tough-complement]

(99) By the way, that fellow cabled headquarters whom none of us had ever met that he would be arriving in Paris on Tuesday. (Stucky 1987: 396, from Stuart Shieber.)
[Relative, subcategorized that-clause]

I find such examples to be of reduced acceptability, but definitely not ungrammatical. But there is a good reason for this reduced acceptability: these are examples of a kind of double dependency. Though there is no second instance of a "filler-gap dependency" in the traditional sense here, the meaning of the verb (or adjective) would, in the categorial analysis, have to combine compositionally with that of the italicized complement before the whole verbal meaning combined with the meaning of the subject (or object), but the latter must have combined with its italicized modifying phrase before it does this. To the extent that one can categorize, the dependency is "nested" in (98b, c) and "unnested" in (98a, d) and (99). I suspect it is because the categories of the two nested pairs are different from each other that their unacceptability is not as great as that of (96) above or the more familiar unnested double leftward NP extraction examples, where the parser has to fight ambiguity as well.

These examples, even with their reduced acceptability, are significant for theories which postulate a fixed site for extraposition to move to, and particularly for those that assume extraposed clauses from subject position (as in (97) and (98a)) are attached to S; in these examples, constituents which cannot possibly be dominated by S follow the extraposed element. Again, if extraposed elements are free to be ordered among VP and S constituents only by their heaviness and/or category, such examples are just what we expect to find.

One possible objection to such examples is that here the constituent to the right of the

extraposed one has been moved there by some kind of Heavy Constituent Shift. This is not an option, of course, in accounts that claim that subject relatives are attached to S and that Heavy Shift keeps elements within the VP. But it is implausible in any movement theory, because the putatively shifted constituent need not really be heavier than what it moves over:

(98c′) Lois told a woman today from Finland about the news.

Still, this possibility deserves more study.

6.4 So...that... Result Clause Constructions

All that has been said above about nesting and multiple extrapositions is apparently contradicted by combinations of relative clause extraposition with (what I will call) so...that... Result Clause Constructions, exemplified in this much-discussed example of Williams (1974):

(100) Everybody is so strange whom I like that I can't go out in public with them.

First of all, there seem to be two extraposed clauses, originating from different hosts, yet the sentence has none of the awkwardness of the multiple clausal extrapositions we saw earlier (94, 98d, 99). More striking, the order of the extraposed clauses in (100) is the unnested order (the "host" of the WH-clause is the subject, that of the that-clause is so); the opposite, "properly nested" order is abysmally bad:

(100′) *Everybody is so strange that I can't go out in public with them whom I like.

This then is a considerable problem for the hypothesis that extraposition in English can be viewed as a consequence of LP-principles, unbounded NPs, and "flat" clausal structure.

Gueron and May (1984) have studied the so...that... result clause construction in detail and have observed (among others) several properties relevant to us. First, pronouns in the main clause can have their antecedents in the that-result clause (101a), while this is not possible in a superficially similar extraposed relative clause (101b):

(101) a. I told her_i that so many people attended last year's concert that I made Mary_i nervous.
b.
*I told her_i that many people attended last year's concert who made Mary_i nervous. (Gueron and May 1984: 14)

Second, it has been known since Ross (1967) that extraposition is upward-bounded by S (cf. 102b); this appears not to be the case for the that-result clause in (102a). In fact, the result clause has "moved" out of a syntactic island here:

(102) a. Critics who have reviewed so many books were at the party that I didn't have time to speak to them all.
b. *Critics who have reviewed many books were at the party which I've enjoyed reading. (Gueron and May 1984: 18)

Third, and perhaps most revealing for our purposes, it has been noted by a number of investigators (Hankamer and Sag 1975, Andrews 1975, Liberman 1974, Baltin 1982) that a result clause can have more than one antecedent so-phrase in the host clause:

(103) So many people liked so many paintings at the gallery that the exhibition was held over for two weeks.

And the number of so-phrases need not stop at two; the following is fanciful but in no way ungrammatical or uninterpretable:

(104) So many football fans brought so much beer and so much popcorn in so many knapsacks with so many rips in them in such drunken stupors that a bulldozer had to be brought in to clear the parking lot.

Consider the semantics of such examples, e.g. (105):

(105) John ate so much caviar in 10 minutes that he got sick.

They express a causal relationship, but not a causal relationship that can be described as holding between the host NP's referent and the event described in the result clause: (105) doesn't really say that some quantity of caviar made John sick, or even that his eating that quantity made him sick, but rather that it was his eating it in this period of time that made him sick. In other words, the quantity of caviar such that John ate it in ten minutes caused, or brought it about, that he became sick. The causal relation is between the extent(s) to which something or other obtained in the event described by the first clause, and the event described by the second. (Note on the other hand that the compositional semantic effect of a relative clause on its matrix clause is, in a sense, confined to the denotation of its host NP.) Consider a double-host example like (103): not only can the causal relation not be described without reference to the event reported in the entire first clause, the relevant extents cannot be separated from one another: (103) does not assert that the number of people that liked many paintings caused the exhibition to be held over, nor that the number of paintings liked by many people caused this; rather the combined extents x and y such that x people liked y paintings caused the extension.23 What this indicates, briefly, is that a compositional semantics must involve multiple variable-binding over the first clause; then a causal relation between this and the (event expressed by the) second clause is asserted. Thus (following a series of predecessors) I claim that no "extraposition" in the sense of this paper is involved in that-result clauses. The analysis I am proposing actually has much in common with Gueron and May's (1984) (but see note 23), though as it is proposed within a very different theory, it looks superficially quite unlike it.
The idea, basically, is that so-phrases have unbound variables in them which must be bound at a higher level, specifically a sentential level. The result clause rule then takes a first clause with one or more of these variables, binds it/them semantically, combines it syntactically with a result clause, and semantically has the interpretation that there is a causal relation between the two events. Assuming we don't want to just permit the result-clause rule to apply vacuously, i.e. when there are no so's in the first clause as well as when there are,24 we will need something like a feature [+so] to be passed up from the so-phrase to the sentence containing it; I will assume some mechanism of this sort. The syntactic rule is then (106):

(106) So...That... Result Clause Rule
If α ∈ S[+so], then F10(α) ∈ S/S[that], where F10(α) = α

That is, the initial clause in a result construction becomes a subordinate clause, though one that takes a that-clause as its argument to form a complete sentence.25 Also, I will assume a lexical expression so, probably a determiner modifier, i.e. of category Det/Det (but with restrictions I won't attempt to go into here), and a translation involving a variable x_i over "extents". I will indicate the semantic interpretation rule informally:

(107) If α translates into α′, then F10(α) translates into that function on sentence meanings such that: the extent x_0 and ... and the extent x_n to which α′ held causes it to be the case that φ.

Now, let us return to the properties of result clauses observed by Gueron and May. The fact that the rule permits multiple so-phrases is of course captured by introducing so-phrases independently of the result-clause in the syntax; the right
23 Note,

therefore, that analyses which treat multiple so-sentences simply as having multiple, independent so-operators in semantic interpretation or in their logical form, without summing the extents somehow to get the relevant causal force, would not get the meanings of the sentences right.

This would appear to be a problem with Gueron and May's (1984) analysis, for example.

24 The only reason I can think of that we would want this is to capture sentences, known to me only in (King James) biblical English, such as She brought us bread, that we might eat, which looks suspiciously like a result clause (in the subjunctive) without the preceding so-phrase. But I have not had a chance to check the history of the construction.

25 The only reason I have given the syntax in this form is because of the anaphora data in (101); otherwise, the syntactic rule could, more simply, take two sentences α and β to form α {that + β}. For all I know, alternative accounts of the anaphora data could be given which are compatible with this structure. See Gueron and May (1984), who also use a structure somewhat like that in (106), for some discussion.

interpretation is derived because it is the result-clause rule which binds the (one or more) variables and supplies the causal relationship. The observed fact that result-clauses "escape" syntactic islands and fail to be upward bounded follows here because there is no "movement" of result clauses at all (as in a traditional analysis), hence no movement out of syntactic islands. Rather, only the binding of variables connects the result-clause with its so-phrase(s), so there is nothing to prohibit the so-variable from being inside an island. Ordinary extraposed relative clauses, by contrast, are generated as part of their "host" NP, and cannot appear any farther away from that NP than the clause the NP belongs to. Third, the antecedent of a (definite) embedded pronoun can appear in a result clause in (101a), repeated here,

(101) a. I told her that so many people attended last year's concert that I made Mary nervous.
b. *I told her that many people attended last year's concert who made Mary nervous. (Gueron and May 1984)

because if the result clause construction is generated by (106), the first clause becomes a kind of subordinate clause to the second, and so the pronoun is in the familiar precede-and-command position that permits coreference, just as it is in, e.g., Because I told her that so many people attended the concert, I made Mary nervous. Finally, this analysis, which I have now tried to motivate on other counts, explains why extraposed relative clauses must appear before result clauses, as in Williams's example, repeated here:

(100) Everybody is so strange whom I like that I can't go out in public with them.
(100′) *Everybody is so strange that I can't go out in public with them whom I like.

The result "clause" is indeed an independent clause, outside the bounds of the clause in which the so-phrase originates. As the relative clause on the other hand is not independent of its host's clause and cannot cross its boundary, the relative clause appears first.
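As a side illustration, the mechanics just described — a [+so] feature percolating up from the so-phrase, and a result-clause rule that binds all of the clause's extent variables at once while supplying the causal relation — can be sketched in toy code. This is my own illustration, not the paper's formalism; the dictionary representation and function names are invented:

```python
# Toy model: a phrase is a dict carrying its unbound extent variables;
# a clause counts as [+so] iff it has collected at least one of them.

def so_phrase(var):
    """A so-phrase contributes one free extent variable (hence [+so])."""
    return {"free_vars": [var]}

def clause(*phrases):
    """Feature passing: collect the daughters' unbound so-variables;
    the clause is [+so] iff any were collected."""
    fv = [v for p in phrases for v in p.get("free_vars", [])]
    return {"free_vars": fv, "so": bool(fv)}

def result_clause_rule(host, that_clause):
    """Rules (106)/(107), informally: apply only to a [+so] clause,
    bind all of its extent variables together, and assert a causal
    relation between the host clause and the result clause."""
    if not host["so"]:
        raise ValueError("result-clause rule requires a [+so] host clause")
    bound = ", ".join(host["free_vars"])
    return (f"the extents {bound} to which the first clause holds "
            f"cause it to be the case that: {that_clause}")

# (103): two so-phrases, one result clause; both extents bound together.
s = clause(so_phrase("x"), so_phrase("y"))
print(result_clause_rule(s, "the exhibition was held over"))
```

Because the rule binds every so-variable in one application, the multiply-headed cases like (103)-(104) fall out directly, and a clause with no so-phrase simply cannot feed the rule, matching the ban on vacuous application noted in footnote 24.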
As result clauses are not extrapositions, it is not surprising that they should not be subject to the same limits on multiple extrapositions as ordinary extrapositions are. Of course, this analysis predicts that that-result clauses should never appear adjacent to their hosts, since they do not "originate" there; in fact, the absence of sentences where they do appear there (or just past the CN, anyway):

(108) a. So many books have been published lately that I haven't been able to read them all.
b. *So many books that I haven't been able to read them all have been published lately.
c. I persuaded so many students to come to the lecture that we couldn't all fit in the lecture hall.
d. *I persuaded so many students that we couldn't all fit in the lecture hall to come to the lecture.

has in fact traditionally been viewed as an embarrassment for the extraposition analysis (Stucky 1987: 388). I find these quite clearly bad, but some may accept them and find them only awkward; Stucky accepts the second sentence in (108).26

This analysis also appears to predict that result constructions could be nested by reapplying the result rule, binding one so-phrase the first time, another the second. Though it is often said in the literature that such examples do not exist, the following kind of example has been pointed out too. Everyone finds it somewhat awkward, and a few claim it isn't possible at all, though I find it not very bad:

(109) So many mothers complained that their children ate so much of the candy that they got sick that the manufacturers took it off the market.

(Note the relationship of so-phrases here, which contrasts with the "multiply-headed" result clauses discussed earlier: the outer result clause is a quantification of the sentence x complained that x's child ate so much of the candy that it got sick, which contains an internal result clause that is semantically independent of the outer one.)

7. Conclusions

It is hard to convincingly present and motivate a new theory and a novel analysis of a well-known language in a single paper, and I won't pretend I have done so. But I hope I have shown there might be some interest in exploring the idea that a familiar language could have much less constituent structure than supposed, if one simultaneously admits the possibility that some words and phrases could be more tightly connected than others, and that LP-principles apply across a language in GPSG fashion. Of more interest to me than persuading linguists of all this, however, is getting linguists to question our automatic assumptions about constituents and our basis for assuming as a methodological principle that languages must always have a phenogrammatical syntactic structure describable by phrase structure trees.
26 Stucky

suggests (p. 388) that this sentence is also accepted by May and Gueron, but they exhibit it only as a D-structure (p. 3, 8b); there seem to be no non-extraposed (S-structure) examples in their paper.

REFERENCES
Andrews, Avery 1975 Studies in the Syntax of Relative and Comparative Clauses. Cambridge: MIT Linguistics Dissertation.
Bach, Emmon 1980 In Defense of Passive, Linguistics and Philosophy 3: 297-342.
Bach, Emmon 1983 On the Relationship between Word Grammar and Phrase Grammar, Natural Language and Linguistic Theory 1.1: 65-90.
Bach, Emmon, and Robin Cooper 1978 The NP-S Analysis of Relative Clauses and Compositional Semantics, Linguistics and Philosophy 2: 145-150.
Bach, Emmon, and Barbara H. Partee 1980 Anaphora and Semantic Structure, Papers from the Parasession on Pronouns and Anaphora. Chicago: Chicago Linguistic Society, 1-28.
Baltin, Mark 1981 Strict Bounding, in: Carl L. Baker and John McCarthy (ed.), The Logical Problem of Language Acquisition. Cambridge: MIT Press.
Baltin, Mark 1982 A Landing Site Theory of Movement Rules, Linguistic Inquiry 14: 1-38.
Curry, Haskell B. 1963 Some Logical Aspects of Grammatical Structure, in: Jakobson (ed.), Structure of Language and its Mathematical Aspects: Proceedings of the Twelfth Symposium in Applied Mathematics. American Mathematical Society, 56-68.
Dowty, David 1982a Grammatical Relations and Montague Grammar, in: Pauline Jacobson and Geoffrey Pullum (ed.), The Nature of Syntactic Representation. Dordrecht: Reidel, 79-130.
Dowty, David

1982b More on the Categorial Theory of Grammatical Relations, in: Annie Zaenen (ed.), Subjects and Other Subjects: Proceedings of the Harvard Conference on the Representation of Grammatical Relations. Bloomington: Indiana University Linguistics Club, 115-153.
Dowty, David 1989 On the Semantic Content of the Notion "Thematic Role", in: Barbara Partee, Gennaro Chierchia and Ray Turner (ed.), Properties, Types and Meanings, vol. II. Dordrecht: Kluwer, 69-130.
Dowty, David R. 1979 Word Meaning and Montague Grammar. Dordrecht: Reidel.

Gazdar, Gerald, and Geoffrey K. Pullum 1981 Subcategorization, Constituent Order, and the Notion "Head", in: M. Moortgat, H. van der Hulst, and T. Hoekstra (ed.), The Scope of Lexical Rules. Dordrecht: Foris.
Gazdar, Gerald, Ewan Klein, Geoffrey Pullum, and Ivan Sag 1985 Generalized Phrase Structure Grammar. Harvard University Press/Blackwell.
Gueron, Jacqueline, and Robert May 1984 Extraposition and Logical Form, Linguistic Inquiry 15: 1-32.
Hankamer, Jorge, and Ivan Sag 1975 Result Clauses in English: So-Flip or So What?, paper presented at NWAVE 4, Georgetown University, Washington, DC.
Kaplan, Ronald M., and Annie Zaenen 1989 Long-Distance Dependencies, Constituent Structure, and Functional Uncertainty, in: Mark Baltin and Anthony Kroch (ed.), Alternative Conceptions of Phrase Structure. Chicago: The University of Chicago Press, 12-42.
Karttunen, Lauri 1989 Radical Lexicalism, in: Mark Baltin and Anthony Kroch (ed.), Alternative Conceptions of Phrase Structure. Chicago: The University of Chicago Press, 43-65.
Keenan, Edward 1974 The Functional Principle: Generalizing the Notion of "Subject of", Proceedings of the Tenth Annual Meeting of the Chicago Linguistic Society. Chicago: University of Chicago, 298-303.
Liberman, Mark 1974 Conditioning the Rule of Subject-Aux Inversion, in: Ellen Kaisse and Jorge Hankamer (ed.), Proceedings of the Fifth Regional Meeting of the Chicago Linguistic Society. Department of Linguistics, University of Chicago.
Pollard, Carl 1984 Generalized Phrase Structure Grammars, Head Grammars, and Natural Languages. Stanford: Stanford University Linguistics Dissertation.
Pollard, Carl, and Ivan A. Sag 1987 Information-Based Syntax and Semantics, vol. I. CSLI Lecture Notes No. 13. Stanford: Center for the Study of Language and Information.
Postal, Paul 1974 On Raising. Cambridge: MIT Press.
Powers, Joyce 1988 Exceptions to ECPO, paper presented to the 63rd Annual Meeting of the Linguistic Society of America, New Orleans.

Pullum, Geoffrey 1982 Free Word Order and Phrase Structure Rules, in: James Pustejovsky and

Peter Sells (ed.), Proceedings of the 12th Annual Meeting of the North Eastern Linguistic Society. Amherst: Department of Linguistics, University of Massachusetts, 209-220.
Reinhart, Tanya 1979 Syntactic Domains for Semantic Rules, in: F. Guenthner and S. J. Schmidt (ed.), Formal Semantics and Pragmatics for Natural Languages. Dordrecht: Reidel.
Ross, John R. 1967 Constraints on Variables in Syntax. MIT Linguistics Dissertation [available from Indiana University Linguistics Club].
Steedman, Mark J. 1985a Dependency and Coordination in the Grammar of Dutch and English, Language 61: 523-568.
Steedman, Mark J. 1985b Combinators and Grammars, in: Richard T. Oehrle et al. (ed.), Categorial Grammars and Natural Language Structures. Dordrecht: Kluwer.
Stucky, Susan 1981 Free Word Order Languages, Free Constituent Order Languages, and the Grey Area in Between, NELS 11, 364-376.
Stucky, Susan 1987 Configurational Variation in English: A Study of Extraposition and Related Matters, in: Geoffrey Huck and Almerindo Ojeda (ed.), Discontinuous Constituency. Syntax and Semantics 20. Orlando: Academic Press, 337-405.
Williams, Edwin 1974 Rule Ordering in Syntax. Cambridge, Massachusetts: MIT Linguistics Dissertation.
Zeevat, Henk, Ewan Klein, and Jo Calder 1987 Unification Categorial Grammar, in: Nicholas Haddock, Ewan Klein, and Glyn Morrill (ed.), Edinburgh Working Papers in Cognitive Science I. Edinburgh: Centre for Cognitive Science, University of Edinburgh, 195-222.
Zwicky, Arnold 1986 Concatenation and Liberation, in: Anne M. Farley et al. (ed.), Proceedings of the Twenty-Second Regional Meeting (General Session) of the Chicago Linguistic Society. Chicago: Chicago Linguistic Society, 65-74.
Zwicky, Arnold 1986a The Unstressed Pronoun Constraint in English, in: Arnold Zwicky (ed.), Ohio State University Working Papers in Linguistics 32. Ohio State Department of Linguistics, 114-124.
Zwicky, Arnold M. 1982 Stranded to and Phonological Phrasing in English, Linguistics 20: 3-57.


A minimalist theory of A-movement and control


M. RITA MANZINI & ANNA ROUSSOU

Abstract
In this article, we point out some problems in the theory of A-movement and control within Principles and Parameters models, and specifically within the minimalist approach of Chomsky (1995). In order to overcome these problems, we motivate a departure from the standard transformational theory of A-movement. In particular, we argue that DPs are merged in the position where they surface, and from there they attract a predicate. On this

basis, control can simply be construed as the special case in which the same DP attracts more than one predicate. Arbitrary control reduces to the attraction of a predicate by an operator in C. We show that the basic properties of control follow from an appropriate scopal version of Chomsky's (1995) Last Resort and MLC and from Kayne's (1984) Connectedness, phrased as conditions on the attraction operation, or technically ATTRACT. Our approach has considerable advantages in standard cases of A-movement as well, deriving the distribution of reconstruction effects at LF and of blocking effects on phonosyntactic rules at PF.

1 Classical theories of A-movement and control and their problems

According to Chomsky (1981, 1982) the combination of [±anaphoric, ±pronominal] features yields four different types of empty categories: A-trace, A′-trace, pro and PRO. PRO is identified with the [+anaphoric, +pronominal] empty category, which is subject to
Earlier versions of this paper were presented at the Conference on Syntactic Categories (Bangor, June 1996), the Girona Summer School in Linguistics (July 1996), the Research Seminar in York (January 1997), UCLA (February 1997), GLOW 20 (Rabat, March 1997), Scuola Normale Superiore (Pisa, March 1997), Lezioni di Dottorato, Università di Firenze e Siena (April 1997), MIT and the University of Maryland (October 1997). We are particularly grateful to the following people for comments and suggestions: Maya Arad, Elabbas Benmamoun, Valentina Bianchi, Bob Borsley, Misi Brody, Guglielmo Cinque, Annabel Cormack, Caterina Donati, Ken Hale, Ursel Luhde, Anoop Mahajan, Bernadette Plunkett, Luigi Rizzi, Ian Roberts, Heloisa Salles, Leonardo Savoia, Dominique Sportiche, Tim Stowell, Peter Svenonius, Anna Szabolcsi, and George Tsoulas.

both Principles A and B of Binding Theory. If both of these apply to a given instance of PRO, a contradiction arises, since Principle A requires PRO to be bound locally, i.e. in its governing category, and Principle B requires it to be free in the same domain. Therefore PRO can only be found in positions for which the notion of governing category is not defined, i.e. in ungoverned positions. In this way, no contradiction arises under Binding Theory. This result is known as the PRO Theorem. Thus under Government and Binding (GB), the distribution of PRO is predictable on the basis of independently motivated assumptions about empty categories and their binding properties.

Within the framework of Chomsky (1995) the theory of empty categories just sketched is effectively abandoned. In particular, traces of A′-movement are construed as copies of the moved material, rather than as [-anaphoric, -pronominal] empty categories, i.e. R-expressions, as in the previous GB framework. Evidence from reconstruction phenomena provided by Chomsky (1995) favors this view. Similarly for Chomsky (1995), A-movement leaves traces that are copies, though we shall return shortly to the lack of evidence for reconstruction in these cases. Thus the classification of empty categories on the basis of [±anaphoric, ±pronominal] features collapses. Independent

developments in the theory also undermine the notion of government, as well as the idea that there exists a Binding Theory module within the grammar. In particular, properties of lexical anaphors can be derived via movement, as argued by Pica (1987) and Reinhart and Reuland (1993). Assuming that traces are copies nevertheless leaves pronominal empty categories, namely pro and PRO, to be accounted for. In the present paper we will leave pro and the parametrization issues it implies aside, and we will therefore concentrate on PRO; for a compatible theory of pro see Manzini and Savoia (forthcoming).

Chomsky and Lasnik (1995) seek to maintain the basic descriptive generalization according to which PRO is found only in the subject position of non-finite sentences. According to their analysis PRO is associated with a special type of Case, called null Case. Since null Case is checked only by non-finite I, it follows that PRO is found only in the Spec of this latter category. Furthermore, because null Case is like any other Case from the point of view of Move, we also derive that PRO behaves like a lexical argument under A-movement. Thus in (1) PRO is generated in the internal argument position of kill and raises to [Spec, to] to check its (null) Case:

(1) John tried [PRO to be [killed PRO]]

A standard set of predictions about the distribution of PRO that followed from the PRO Theorem of Chomsky (1981) now follow from the stipulation of null Case. In particular

PRO cannot be found in object position nor in the subject position of tensed sentences, as in (2) and (3) respectively:

(2) *John persuaded PRO
(3) *John believes that PRO will eat

It seems to us that this approach has a number of problems. To begin with, null Case and PRO are only ever seen in connection with one another. In other words, there is no independent way of establishing the existence of either. Thus null Case does not appear to provide a genuine explanation for the distribution of PRO, but rather a way of stating the descriptive generalization concerning its distribution. This represents a step back with respect to the GB framework, where the distribution of PRO did indeed follow from the interaction of independently motivated assumptions, even if their abandonment now seems more than justified.

What is more, the minimalist approach to control inherits a number of problems from the GB analysis, not surprisingly since it essentially adapts it. In particular, as pointed

out by Chomsky (1981), the distribution of PRO represents only one of the empirical facets of the theory of control. Even assuming that this can be accounted for in terms of null Case, the distribution of the antecedents for PRO, i.e. control proper, remains to be explained. The basic descriptive generalization concerning control, which appears to be accepted by Chomsky (1981), is Rosenbaum's (1967) Minimal Distance Principle (MDP), according to which the rule of Equi-NP-Deletion, or the empty category that succeeds it, namely PRO, is controlled by the closest available antecedent, where closest is defined in terms of c-command. Thus in (4) the matrix object, John, obligatorily controls PRO; in (5) the intermediate subject, John, rather than the matrix one, does:

(4) Mary [persuaded John [PRO to eat]]
(5) Mary thinks that [John expected [PRO to eat]]

A small class of English verbs, including promise, provides a counterexample to this generalization, as exemplified in (6), where the matrix subject rather than the matrix object controls PRO:

(6) I promised John [PRO to eat]

This potential problem may be solved, however, by an articulated approach to the internal argument structure of persuade-type and promise-type verbs, of the sort proposed for instance by Larson (1991).

Within the GB framework of Chomsky (1981), there is no obvious way to reduce the MDP to some independently needed principle. Thus Manzini (1983) proposes a general resystematization of control theory, whereby PRO is taken to be a pure anaphor and its distribution determined by the lack of Case, rather than by the PRO Theorem. As an anaphor, PRO is then subjected to a (suitably modified) version of Principle A. This theory can predict locality effects of the type in (5), though it invokes pragmatics to explain the distribution of control within the argument structure of the verb, as in (4) vs. (6). This latter assumption can however be dispensed with, given precisely the VP-shell conception of Larson (1991) just referred to. As we have already mentioned, within the minimalist framework a number of authors in turn take anaphoric binding to reduce to Move. Therefore any attempt at reducing control to binding would appear to translate into a reduction of control directly to movement. This step appears to be attractive to the extent that the locality condition on control, the MDP, is based on exactly the same notion of closeness as the basic locality condition on movement, i.e. the Minimal Link Condition (MLC). But such a radically minimalist approach to control theory is not explored by Chomsky and Lasnik (1995).
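As an aside, the shared notion of "closeness" can be made concrete in a toy sketch. The encoding below (nested (label, children-or-word) tuples) and the function names are my own invention, purely illustrative of the MDP's "closest c-commanding DP" logic, not anything proposed in this paper:

```python
def find_path(node, target):
    """Return the list of nodes from the root down to the leaf whose
    terminal string equals `target`, or None if it is absent."""
    label, content = node
    if content == target:
        return [node]
    if isinstance(content, list):
        for child in content:
            path = find_path(child, target)
            if path:
                return [node] + path
    return None

def closest_controller(tree):
    """Toy MDP: walking upward from PRO, the first DP that is a sister
    of a node on the path c-commands PRO and is the closest such DP."""
    path = find_path(tree, "PRO")
    for parent, child in zip(reversed(path[:-1]), reversed(path[1:])):
        _, kids = parent
        for sib in kids:
            if sib is not child and sib[0] == "DP":
                return sib[1]
    return None

# (4) Mary [persuaded John [PRO to eat]]: object control
tree4 = ("S", [("DP", "Mary"),
               ("VP", [("V", "persuaded"), ("DP", "John"),
                       ("S", [("DP", "PRO"),
                              ("VP", [("V", "to eat")])])])])
print(closest_controller(tree4))  # -> John

# (5) Mary thinks that [John expected [PRO to eat]]: intermediate subject
tree5 = ("S", [("DP", "Mary"),
               ("VP", [("V", "thinks"),
                       ("S", [("DP", "John"),
                              ("VP", [("V", "expected"),
                                      ("S", [("DP", "PRO"),
                                             ("VP", [("V", "to eat")])])])])])])
print(closest_controller(tree5))  # -> John
```

Run on a flat structure for (6) I promised John [PRO to eat], the same function would also return John; that is exactly the sense in which promise-type verbs are counterexamples to the MDP unless their argument structure is rearticulated along Larsonian lines.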

In recent years at least three different proposals have explicitly subsumed control under movement. Martin (1996) maintains along with Chomsky and Lasnik (1995) the existence of a null Case-marked PRO which, however, is subject to abstract movement, exactly like a reflexive in the models of Pica (1987) and Reinhart and Reuland (1993) mentioned above. O'Neil (1995) and Hornstein (1996) subsume control under overt DP-movement: the controller is generated in thematic position within the control sentence and then moves to its surface position through a higher thematic position. All of these theories differ radically from proposals, such as Bresnan's (1982), under which there is no syntactically represented subject at all for non-finite sentences. Bresnan (1982) goes on to argue that control into non-finite sentences is the product of a lexical, rather than a syntactic, operation. This approach denies some of the basic tenets of Principles and Parameters theory, in which the lexicon reduces to a list of primitive terms, and all operations relating them are carried out by the syntax. The approach that we intend to take in turn differs from all of the above, involving first a departure from the standard transformational theory of A-movement. We therefore turn to A-movement next.

Proposing that a DP can move through two thematic positions before reaching its Case position, as Hornstein (1996) does, amounts to a violation of the well-known generalization of Chomsky (1981) whereby movement is never to a theta-position. As Hornstein (1997) carefully explains, this generalization is based on the GB concept of D-structure, whose elimination represents one of the qualifying features of minimalism. Thus the fact that D-structure is by definition a pure representation of GF-θ (Chomsky 1986) forces the conclusion that each theta-role is satisfied by an argument at the point of Merge; this in turn forces the postulation of PRO for the cases where lexical arguments are not available. Once the notion of D-structure is abandoned, this fundamental piece of theoretical justification for PRO is lost. To be more precise, Chomsky (1995) holds on to the idea that there is a one-to-one match between arguments and theta-positions, but does so through an extra assumption. While properties of lexical items are in general lexical features, theta-roles are to be construed differently, namely as a configurational relation between a head and its specifier/complement. This configurational theory of theta-roles has the effect of barring movement to a theta-position exactly as Chomsky's (1981) Theta-Criterion at D-structure does.

But as Hornstein (1997) also points out, if theta-roles are instead construed as features, the null assumption under minimalist theory, then nothing prevents them from acting as attractors for arguments already merged; this in turn opens the way for an overt movement theory of control. What Hornstein (1997) fails to notice is that these same conclusions have implications that go far beyond control. It is these implications that interest us here. If arguments and theta-roles need not be matched at the point of merger, it becomes possible to assume that arguments are generated not in thematic position but directly in the position where they surface, provided suitable means can be found to connect them to their theta-roles. If theta-roles are indeed features, the simplest such means suggested by the minimalist theory of Chomsky (1995) is Move-F. To illustrate these points, let us consider the simple sentence in (7). In Chomsky's (1995) terms, this is associated with a derivational stage of the type in (8a), irrelevant details omitted; by movement of the VP-internal subject to [Spec, I] the final string in (8b) is obtained:

(7) John called.
(8) a. [VP John called]
    b. [IP John I [VP John called]]

In the alternative terms that we are suggesting here, the derivation of (7) would take a form much more similar to (9). In (9) the subject is merged directly in [Spec, I]. A theta-feature, notated provisionally as θ, is moved from V to the IP domain, establishing the relevant thematic interpretation for the subject:

(9) [IP John θ-I [VP called(θ)]]
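As a purely illustrative aid, not part of the theory itself, the contrast between (8) and (9) can be modelled as a toy data structure: under DP-movement a full phrasal copy of the subject remains inside VP, while under F-movement only the theta-feature does. All function and label names below are our own invention, and "th" stands in for θ.

```python
# Toy contrast between the DP-movement derivation in (8) and the
# F-movement derivation in (9). Labels are illustrative only.

def dp_movement(subject, verb):
    # (8): the subject is first merged VP-internally, then moved to
    # [Spec, I], leaving a full phrasal copy (the trace) inside VP.
    vp = ["VP", subject, verb]           # [VP John called]
    return ["IP", subject, "I", vp]      # [IP John I [VP John called]]

def f_movement(subject, verb):
    # (9): the subject is merged directly in [Spec, I]; only the
    # theta-feature of V raises, so VP contains no phrasal copy.
    vp = ["VP", verb + "(th)"]           # [VP called(th)]
    return ["IP", subject, "th-I", vp]   # [IP John th-I [VP called(th)]]

print(dp_movement("John", "called"))  # VP still contains 'John'
print(f_movement("John", "called"))   # VP contains only the verb
```

The empirical arguments from reconstruction and contraction turn precisely on whether such a phrasal copy survives inside VP.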

A number of problems are raised by the derivation in (9), but before considering them, we wish to establish the main empirical reasons that favor the covert movement approach in (9) over the overt movement approach in (8). It is fairly obvious that no considerations pertaining to the operation of movement itself can distinguish the analyses in (8) and (9), since the same constraints operate both on phrasal and F-movement. The only difference between (8) and (9) therefore concerns the nature of the copy, or trace, left behind by the application of movement, which is fully phrasal in (8) and a mere feature in (9). On this basis, the predictions of the DP-movement and the F-movement models empirically differ with respect to reconstruction. Remember that since movement is Copy and Merge, DP-movement to [Spec, I] leaves behind a full copy of the DP in thematic position, exactly like wh-movement to [Spec, C] leaves behind a full copy of the wh-phrase. In the case of wh-movement this copying of phrasal material has been empirically justified by Chomsky (1995) on the basis of the fact that it gives rise to reconstruction effects. Thus in a sentence of the type in (10) the anaphor himself can be interpreted as bound either by John or by Bill:

(10) John wonders [which pictures of himself [Bill saw which pictures of himself]]

Under reasonable assumptions about anaphoric binding, the reading under which himself is anaphoric to John corresponds to the construal of the wh-phrase in its derived position; the reading under which himself is anaphoric to Bill corresponds to the construal of the wh-phrase in the trace position. Chomsky (1995) himself argues, on the other hand, that DP-movement never gives rise to reconstruction. He accounts for this by assuming that reconstruction is a byproduct of Operator-Variable interpretation at LF. Thus consider a sentence of the type in (11) under the standard DP-movement derivation:

(11) *Each other seem to them [each other to work]

The ungrammaticality of (11) requires that the conditions concerning anaphoric binding be computed with respect to the derived position of each other. If each other could reconstruct, then we would expect them to be able to bind it, giving rise to a well-formed reading. Crucially, our approach to raising and in general to A-movement predicts data of the type in (11) without need for any additional stipulation about reconstruction. Indeed the derivation of (11) takes the alternative form in (12):

(12) *Each other [θ-I seem to them [to [work (θ)]]]

In (12) each other is merged directly into the [Spec, I] position in which it surfaces, and the only movement that takes place is that of θ to its checking domain. Therefore we do not expect any effects connected with the reconstruction of each other to its θ-position. We shall return to potential counterexamples in section 6. The other fundamental argument in favour of the F-movement approach over the DP-movement one is represented by the contrast between the behaviour of A-movement traces and A′-movement traces with respect to syntactically conditioned phonological rules. While the reconstruction test in (10)-(11) shows that the presumed DP-copy plays no role at LF, and therefore is best abandoned at this interface level, the phonosyntactic rule test addresses the relevance of the presumed DP-copy at the other interface level, i.e. PF.

As is well known, a number of scholars have argued that A-movement traces, as opposed to A′-movement ones, do not block PF processes. Not surprisingly some of them, notably Postal and Pullum (1982), have concluded from this that DP-traces do not exist. However, the nonexistence of DP-traces does not imply the absence of A-movement. We effectively agree that DP-traces do not exist; but A-movement does exist within the theory being proposed, in the form of F-movement. The blocking of phonosyntactic rules such as wanna contraction represents a classical argument in favor of the reality of A′-movement traces. As the argument goes, the sentence in (13) has two possible interpretations, corresponding to the two structures in (14). In (14a) the trace of wh-movement is interpreted as occupying the object position of the embedded sentence; in (14b) it is interpreted as occupying the embedded subject position:

(13) Who do you want to call?
(14) a. [who do you want [PRO to call who]]
    b. [who do you want [who to call]]

If want to is contracted to wanna, as indicated in (15), the two interpretations reduce to just one, namely (14a); (14b) is blocked under contraction:

(15) Who do you wanna call?

A possible analysis of these data is that the phonological process responsible for the contraction of want to to wanna is sensitive to the presence of lexical material between the two in the shape of a wh-trace; hence it cannot apply in (14b). If so, a natural explanation for the lack of blocking effects in (14a) is that PRO, or whatever else PRO reduces to, is not lexical. Consider then A-traces, as seen for instance in (16). We take it that in (16) have to does not define a control environment, i.e. one involving a theta-role both in the matrix and in the embedded clause, but rather a raising environment:

(16) a. I have to leave
    b. I hafta leave

The lack of blocking effects in (16b) argues that what is traditionally construed as a DP-trace, as in (17a), is better construed as an F-trace, as proposed by the present theory and illustrated in (17b). Indeed (17b) suggests an immediate explanation as to why there are no blocking effects on contraction, namely that F-traces do not count as lexical, while (17a) is prima facie identical to (14b); we will return to these points shortly:

(17) a. [I have [I to leave]]
    b. [I θ-have [to leave (θ)]]

There also appear to be conceptual reasons that favor the F-movement construal over the phrasal construal of A-movement. To begin with, the mechanism illustrated for sentence-internal raising in (9) extends to long-distance raising across sentence boundaries of the type illustrated in (18):

(18) John θ-I [seems [to work (θ)]]

In (18), the θ-role associated with the embedded verb work moves in one step to the checking domain of the DP John, which is merged directly in the [Spec, I] position where it surfaces. From this construal of raising, in which the embedded [Spec, I] is not involved, we derive a first significant consequence for the grammar in general. The embedded subject position in raising contexts is the only one where, according to Chomsky (1995), a D feature can be seen independently of Case. Indeed in Chomsky's (1995) analysis the Extended Projection Principle (EPP) requires the presence of a D-feature on the infinitival I, which forces a DP to pass through its Spec, as in the derivation outlined in (19). At the same time, the fact that the DP can and must move on to the matrix [Spec, I] is due to the inability of the infinitival I to check its Case:

(19) John I [seems [John to [John work]]]

The analysis in (18) proposed here crucially differs from the analysis in (19), in that it does not imply movement through the Spec of the infinitival I, not even movement limited to the θ-feature. This means that within the present theory we dispense with any D-feature associated with a non-finite I. There is therefore no position within the present theory where the D-feature and the Case feature can be seen independently of one another. Thus the two features can be unified, eliminating the considerable redundancy between them in the grammar. We should also stress that according to Chomsky (1995), Case is the only feature which is non-interpretable on both the attractor and the attractee, casting doubt on its existence independently of the redundancy just noted (Roberts and Roussou 1997, Chomsky 1998). See Manzini and Savoia (forthcoming) and Roberts and Roussou (forthcoming) for alternative theories of the EPP compatible with the analysis proposed here.

To summarize, there is no evidence either at the LF or at the PF interface that Copy and Merge of lexical material, i.e. classical phrasal movement, is involved in A-movement contexts. This therefore seems to suggest that lexical arguments are generated directly in the position where they surface and that an appropriate operation connects them to a theta-feature associated with the predicate. The appropriate operation would be Move-F in Chomsky's (1995) framework. In the following section we will, however, propose that there are conceptual and empirical reasons that favor an even more radical approach to A-movement.

2 The theory of A-movement

So far, we have simply suggested a reformulation of A-movement from Copy and Merge of a whole DP, i.e. DP-movement, to Copy and Merge of a single feature, i.e. F-movement. We have argued for this alternative on the basis of two sets of empirical arguments, namely the lack of reconstruction effects at LF and of blocking effects on phonosyntactic rules at PF. In the first case it is obvious that in the absence of a lexical copy we should not expect reconstruction effects. However, in the second case, one could argue that we should be able to detect some effects of the copy of the feature, since F-movement is Copy and Merge after all. In other words, the fact that there is an asymmetry between lexical copies and feature copies at PF has to be stipulated. This suggests to us that F-movement does not involve Copy and Merge of any sort and is therefore not movement in the sense defined by Chomsky (1995).

In the light of these considerations let us go back to the derivation in (9). Let us assume that John has some feature that needs to be checked by called, and specifically by its theta-feature. If John and called were in the same checking domain, i.e. in practice in a head-Spec configuration, then checking could take place directly. The intervention of movement, if only of a feature as in (9), is due to the fact that John and called are not in the same checking domain. Suppose, on the contrary, we give up the notion of checking domain; then whatever operation needs to take place between John and called can do so directly, as tentatively indicated in (20) by means of the italics:

(20) [IP John I [VP called(θ)]]

Chomsky (1998) explicitly recognizes the stipulative nature of the checking domain. Furthermore, he construes Copy and Merge, i.e. movement, which applies to phrases, as a separate operation from Attract, which applies to features. Thus the operation in (20) is a counterpart of the operation Attract in Chomsky's (1998) system; contrary to his

analysis, on the other hand, we want to suggest that Copy and Merge is to be dispensed with. For present purposes we shall simply adopt the name ATTRACT for the operation in (20); this term is meant to capture both the fact that the operation plays the same role as Chomsky's Attract in the general economy of the grammar, and the fact that it differs from it in at least one respect. Attract as defined by Chomsky (1998) still involves the merger of features of one lexical item (the attractee) into another (the attractor). The operation in (20) is conceived as preserving the integrity of both lexical items; the reasons for this lie in feature theory, a topic to which we shall return shortly.

Although we agree with Chomsky (1998) that the notion of checking domain has no theory-independent motivation, in Chomsky's (1995) system there is one more locality condition, namely the MLC, which is defined for the rule of movement itself. The MLC has very clear empirical consequences, namely the minimality effects discussed by Rizzi (1990) and Chomsky (1995). In order to capture these effects within our framework, we propose that the MLC applies to the operation ATTRACT. The original formulation of the MLC of Chomsky (1995) is based on the intervention of potential attractees. In other words, movement of attractee α to attractor γ is blocked if there is an attractee β which is closer to γ. This condition is reproduced in (21), referred now to ATTRACT:

(21) MLC
    γ ATTRACTS α only if there is no β, β closer to γ than α, such that γ ATTRACTS β.

(21) is trivially satisfied in (20) by John and θ, since there is no other candidate theta-role or DP argument in the sentence. In the next section, we will motivate a modification of the MLC which will allow us to deal in a simple and elegant manner with more complex cases, including control. What matters to us at this point, however, is simply to establish the general feasibility of our approach.
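Purely as an expository device, the MLC in (21) can be rendered as a small procedure that, given an attractor and its potential attractees ranked by closeness, licenses attraction only of the closest one. The encoding of closeness as depth of embedding is our own simplification, not a claim of the theory.

```python
# A sketch of the MLC in (21): gamma ATTRACTS alpha only if no other
# potential attractee beta is closer to gamma. Closeness is modelled
# here as depth of embedding below the attractor, for illustration only.

def mlc_attract(candidates):
    """candidates: list of (depth, feature) pairs of potential attractees,
    with depth measured from the attractor. Returns the feature the MLC
    licenses, i.e. the one with no closer competitor."""
    if not candidates:
        return None
    depth, feature = min(candidates, key=lambda c: c[0])
    return feature

# In (20) there is a single theta-feature, so (21) is trivially satisfied:
print(mlc_attract([(2, "theta(called)")]))                # theta(called)
# With two candidates, only the closer one may be attracted:
print(mlc_attract([(2, "theta(V1)"), (4, "theta(V2)")]))  # theta(V1)
```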
While the MLC takes care of what Cinque (1991) calls weak islands, another condition on movement is needed to account for strong islands. We follow Manzini (1992, 1994) in concluding that the best theoretical account of the empirical data is provided by Kayne's (1984) Connectedness Condition. In accordance with our formulation of the MLC in (21), we take Connectedness to hold directly of ATTRACT as well. In (22) we maintain Kayne's (1984) term g-projection, though the relevant notion is defined with reference only to complementation, rather than government, as shown in (22b). Furthermore, although Kayne (1984) postulates a directional asymmetry, we do not, in line with later proposals by Bennis and Hoekstra (1984) and Longobardi (1984). See also Manzini (in press) for a discussion of how the notion of g-projection could actually be subsumed under a reformulation of Merge:

(22) Connectedness Condition
    a. Let β ATTRACT α. Then β together with α and the g-projections of α must form a connected subtree.
    b. γ is a g-projection of α if it is a projection of α, or a projection of some δ such that a g-projection of α is a complement of δ.

In the case of (20), the condition in (22) is satisfied by John and θ. Indeed the g-projection set of V includes VP and IP. John in [Spec, I], V and the g-projections of V, namely VP and IP, form a connected subtree, in this case the whole sentential tree.

We can now consider questions, carefully skirted so far, concerning the nature of the feature theory that underlies the attraction of theta-roles by DP arguments. First, we need to specify exactly what kind of feature θ is. Second, we need to specify which feature of the DP argument ATTRACTs θ. Third, we need to specify what the properties of ATTRACT are with respect to notions such as strength and interpretability introduced by Chomsky's (1995) grammar. We shall then be able to trace our steps backward to a resolution first of the control problem and then of a number of residual problems concerning A-movement itself. Consider first the role played by such general notions as Chomsky's (1995) interpretability and strength in the present theory. Remember that according to Chomsky (1995) there are essentially two cases in which a feature attracts another feature.
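Since the g-projection clause in (22b) is recursive, it may help to see it computed on the configuration in (20). The dictionary encoding of projection and complementation below is our own simplification, used only to make the definition concrete.

```python
# A toy rendering of the Connectedness Condition in (22). A node's head
# is given by projection_of; complement_of records which head selects a
# node as its complement. g_projections implements clause (22b).

def g_projections(head, projection_of, complement_of):
    # Start from the projections of the head itself.
    gset = {n for n, h in projection_of.items() if h == head}
    changed = True
    while changed:
        changed = False
        for node, selector in complement_of.items():
            # (22b): projections of a head that takes a g-projection
            # as its complement are themselves g-projections.
            if node in gset:
                for n, h in projection_of.items():
                    if h == selector and n not in gset:
                        gset.add(n)
                        changed = True
    return gset

# The configuration in (20): V projects VP; VP is the complement of I,
# which projects IP. So the g-projection set of V is {VP, IP}, and John
# in [Spec, I] forms a connected subtree with V and its g-projections.
print(sorted(g_projections("V", {"VP": "V", "IP": "I"}, {"VP": "I"})))
# ['IP', 'VP']
```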
The first case arises when the attractor is strong; in this case attraction is overt and independent of the other properties of the features involved, in particular their interpretability. The other case in which a feature acts as an attractor, independently of its strength, is when it is non-interpretable; in this case another feature moves to check it, even if only abstractly. Chomsky (1998) criticizes the notion of strength on grounds of its conceptual complexity, since it is construed as a feature of a feature. This, however, is not to say that strength does not have empirical relevance. Following Roberts and Roussou (1997) and Manzini and Savoia (forthcoming), we assume that strength is in fact to be understood as an instruction to lexicalize a certain feature. Since lexicalization is typically done by Merge within the present framework, the issue of strength is largely irrelevant for ATTRACT. For instance, in (20) we take it that I is associated with a D feature that needs to be lexicalized; in a non-null subject language like English the lexicalization is accomplished by merging a full DP, John in (20), in [Spec, I].

The issue of interpretability, on the other hand, requires more discussion. It seems to us that maintaining Chomsky's (1995) theory of interpretability within the present framework amounts to nothing more than a terminological gimmick. Thus we could associate every D with a non-interpretable θ feature, which would ATTRACT an interpretable θ feature associated with V; but it is certainly simpler to maintain that every D directly ATTRACTs θ. The intermediate step of postulating a non-interpretable θ feature is entirely ad hoc. In turn, if D directly ATTRACTs θ, this amounts to saying in Chomsky's (1995) terminology that checking is well-defined between two interpretable features. This conclusion, and in fact the stronger conclusion that all features in the grammar are interpretable, has been independently motivated in the literature, notably by Brody (1995) and Roberts and Roussou (1997), and we shall accept it here without further discussion. If so, checking not only can, but also must, involve interpretable features.

Summarizing so far, in a simple sentence of the type in (20) the interpretable D feature of John ATTRACTs the interpretable θ feature of V. As we already saw, the basic locality principles, i.e. the MLC and Connectedness, are directly defined for the operation ATTRACT and obviously satisfied in (20). Before considering more complex cases than (20) and eventually control, we need to go back to the question of what the nature of the θ feature is. More specifically, we can ask whether the present theory of A-movement really chooses between the feature-based theory of theta structure that we have adhered to so far and the configurational view advocated by Chomsky (1995).
As it turns out, ATTRACT, contrary to F-movement, does not really choose between these two options. Hale and Keyser (1993) argue compellingly that the notion of theta-role is deeply flawed. In particular, it is easy to show that it is not restrictive enough, since there is no principled reason why the basic repertory of Agent, Theme, Source, Goal, etc. should not be further expanded; nor is there any reason why two or more of those specifications could not be combined; and so on. Faced with these problems, Hale and Keyser (1993), followed by Chomsky (1995) among others, propose that theta-roles should be replaced by elementary predicate-argument configurations within a VP-shell. Thus instead of (20), where V is associated with a feature θ, we have a structure like (23), where it is the presence of a single VP layer that indicates the Theme nature of the single argument:

(23) [IP John I [VP left]]

The italics in (23) indicate that the operation ATTRACT ought to proceed as before. Indeed the theory of ATTRACT that we have now defined does not require the two features involved in this operation to have any special properties in relation to one another. Therefore there is no reason why the operation ATTRACT could not be defined directly between D and V. In this way, we reconstruct the basic intuition of the configurational theory of theta-roles, namely that they correspond to a relation directly defined between an argument and a predicate. At the same time, it becomes unlikely, if not impossible, that the attraction of V by D should involve merger of V under D, as under Chomsky's (1998) Attract. Hence we confirm our conception of ATTRACT as maintaining the lexical integrity of the items involved.

The analysis in (23) can also be extended to predicates with two or more arguments. In the pure VP-shell notation employed by Hale and Keyser (1993) a lower V represents the basic predicate of the sentence, and a higher V represents an abstract causative predicate in eventive verbs. In the rather more articulated conception of the VP-shell drawn by Chomsky (1995), the higher V corresponds to the light verb v, as in (24):

(24) [v [V]]

Suppose we adopt the structure in (24). Sentences which embed (24), such as (25), raise the problem of the derivation associated with the object, since so far only derivations associated with a subject have been considered:

(25) John killed Mary.

The peculiar problem posed by (25) under ATTRACT is that if Mary is generated within the lower VP, then it would mean that arguments can be generated in thematic positions after all. But if we want to maintain that the subject, John in (25), is not generated in a thematic position, then we need a general conception of predicate properties as weak, or in present terms not lexicalized.
The alternative therefore would be for Mary to be generated in [Spec, v]; but this is even more problematic, since in this case it is not clear why Mary would ATTRACT the lower V rather than v itself, under the MLC. We believe in fact that this set of problems is an artifact of Chomsky's (1995) conception of v, depending in turn on his device of multiple Specs. Suppose that in line with a number of theorists, notably Kayne (1994), we enforce a single-Spec constraint for each head. Then we are forced to attribute the two properties of v in Chomsky's (1995) system to two different heads. On the one hand we can postulate a head, v proper, associated with a D feature; accusative case will be a morphological reflex of the lexicalization of this feature (see Manzini and Savoia 1998 for a precise realization of this general idea). On the other hand, a higher V head will typically be associated with the CAUSE predicate, and its relation to the DP in [Spec, I] will correctly establish the argumental interpretation of the latter. In other words, a transitive sentence like (25) will have the structure in (26):

(26) [John I [V [Mary v [V]]]]

Needless to say, the correct word order for (26) in English requires the verb killed to be in the highest V. Arguments for this are also independently found in the literature (cf. Bobaljik's 1995 discussion of his Stacking Hypothesis). Before concluding on this point, we would like to note that the conception of argument structure of Hale and Keyser (1993) and Chomsky (1995), while not necessarily having recourse to abstract predicates, owes an important debt to lexical decomposition frameworks. In essence the number of argument positions and their properties are restricted by the number of primitive predicate types and their possible combinations. An alternative theory, based on the idea that argument structure is aspectually characterized, is argued for by Tenny (1994), Borer (1994), Salles (1997) and Arad (1998); Manzini and Savoia (1998; forthcoming) go as far as denying the existence of VP-shells. We believe that this approach avoids the problems associated not only with standard theta-roles but also with lexical decomposition and is therefore to be preferred. The relevant lines of empirical argument, however, cannot be pursued here; we shall therefore be satisfied with (26) for the purposes of this article.

In what follows we will provide empirical and conceptual arguments, in addition to those already presented, in favour of our view of A-movement. The first set of arguments comes from the domain of control.
We have independently argued that the classical analysis of control, by means of the empty category PRO, has conceptual and empirical drawbacks, which surface in different forms within the GB and minimalist frameworks. Within the present framework we will show that all of these difficulties can be overcome.

3 The theory of control

The major proposal that we will put forward in this section is that, within the framework of assumptions laid out in section 2, control corresponds to a derivation in which one argument DP ATTRACTs two (or more) different predicates. We shall argue that this model allows for an elegant account of control which overcomes many of the problems noted for previous frameworks in section 1. Consider the basic control sentence in (27):

(27) [IP John I [VP tried [IP to [VP leave]]]]

In (27) John is directly merged into the [Spec, I] position to lexicalize the D-feature of I. In order to achieve the desired control interpretation, it must be the case that the DP ATTRACTs both the matrix V and the embedded V. Under the MLC in (21), the DP is clearly allowed to ATTRACT the higher V, since this is the closest attractee. We may wonder, on the other hand, what allows the DP to ATTRACT the lower V as well. Remember that under Chomsky's (1995) MLC, once movement of a given feature or phrase occurs, its trace ceases to be visible for the purposes of the MLC, allowing for movement of a lower feature or phrase of the same type to take place. Technically, all that is needed in the present framework is an adaptation of this convention under which a DP can ATTRACT a V across another V, as long as the latter is itself ATTRACTed by the same DP. It seems to us, however, that this technical solution comes short of a real answer to our problem.

One conclusion that can be drawn from the difficulty just noted is that, contrary to what the MLC states, potential attractees do not really interact with one another. Rather it is potential attractors that interact, as independently suggested by Manzini (1997). Adapting slightly the formulation in Manzini (1997), we therefore define the MLC in scopal terms, as in (28):

(28) Scopal MLC
    Feature F ATTRACTs feature FA only down to the next F that also ATTRACTs FA.

Given the Scopal MLC, the argument DP in (27) automatically ATTRACTs all of the Vs that it has in its scope, where its scope extends as far down the tree as the next DP. Since there is no DP other than John present in (27), the scope of John includes both the matrix and the embedded predicate, as desired. An analogous problem is posed by Last Resort, in the sense of Chomsky (1995). Clearly, in (27) Last Resort allows the DP to ATTRACT the higher V. However, in Chomsky's (1995) formulation, it does not automatically allow for the lower V to be ATTRACTed by the DP as well.
Note that there is at least one case in which Chomsky (1995) explicitly allows for a single attractor feature to be checked by more than one attractee. Indeed in connection with his discussion of multiple Specs, he admits the possibility that a non-interpretable feature can survive checking by the first attractee and therefore act as an attractor for a further attractee. Technically speaking, all that is needed to allow for the analysis of control suggested in (27) is an assumption parallel to Chomsky's (1995), whereby an attractor, such as DP, can ATTRACT just once or n times.

Once again it seems to us that this solution, whatever its justification may be within Chomsky's (1995) system, does not represent a conceptually adequate answer to our problem. In fact, Chomsky's (1995) minimalist theorizing oscillates considerably between the conception of Last Resort that we have accepted so far (implicitly or explicitly), and what he terms Greed. If Last Resort is based on the need of the attractor to have some feature checked, Greed is based on a parallel need of the attractee. According to Chomsky (1995, 1998), Greed is to be abandoned in favor of Last Resort for conceptual reasons; for Greed, as opposed to Last Resort, requires look-ahead properties, which should be excluded by a minimalist grammar. It seems to us that this latter argument is compelling and therefore bars a return to Greed as a possible solution to our predicament. As it turns out, however, the Scopal MLC that we have just formulated in order to solve the locality problem suggests a solution to the Last Resort problem as well. Quite simply, our proposal is that Last Resort also functions as a Scopal Last Resort principle which requires (and therefore allows) a given attractor to ATTRACT all of the potential attractees down to the domain of the next attractor. To be more precise, remember that Chomsky (1995) unifies Last Resort and the MLC into a single principle. In (29) we provide our version of the Scopal Last Resort + MLC. Effectively, (29) represents a strengthening of (28) to a biconditional:

(29) Scopal Last Resort + MLC
    F ATTRACTs all and only the FAs that are in its scope.

It is worth noticing that the attraction of V by DP within the present framework provides a natural translation of Chomsky's (1981) Theta-Criterion, whereby every argument must be assigned a theta-role and every theta-role must be assigned to an argument. Chomsky (1995) enforces the satisfaction of both of these clauses by his configurational definition of argument structure.
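To make the scopal conditions concrete, here is a toy procedure implementing (29): walking down a clause spine, each DP ATTRACTs every predicate below it until the next DP is reached. The flat-list encoding of the spine and all labels are our own simplifications, offered only for illustration.

```python
# A sketch of Scopal Last Resort + MLC in (29): each attractor (DP)
# ATTRACTs all and only the attractees (predicates) in its scope,
# where its scope ends at the next DP down the tree.

def attract_map(spine):
    """spine: top-down list of ('DP', name) and ('V', name) nodes.
    Returns a dict mapping each DP to the Vs it ATTRACTs under (29)."""
    result, current = {}, None
    for kind, name in spine:
        if kind == "DP":
            current = name          # a new attractor opens a new scope
            result[current] = []
        elif kind == "V" and current is not None:
            result[current].append(name)
    return result

# The control sentence (27): John ATTRACTs both predicates.
print(attract_map([("DP", "John"), ("V", "tried"), ("V", "leave")]))
# {'John': ['tried', 'leave']}

# A transitive spine as in (26): Mary's scope cuts off John's.
print(attract_map([("DP", "John"), ("V", "CAUSE"),
                   ("DP", "Mary"), ("V", "kill")]))
# {'John': ['CAUSE'], 'Mary': ['kill']}
```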
A-movement and control 419

The condition in (29) derives the same results as the Theta-Criterion, when applied to arguments and predicates, since it effectively requires that every DP be matched with all and only the predicates in its immediate scope. In recalling the Theta-Criterion we slightly weakened it by omitting the one-to-one correspondence between arguments and theta-roles present in the original formulation. This requirement does not hold by hypothesis under the present construal of control. To be more precise, every theta-role/predicate is associated with a single DP but, vice versa, a single DP can be associated with more than one theta-role/predicate. In fact, there is independent evidence from secondary predication for this weakening of the theory, as explicitly recognized by Chomsky (1986). We are now in a position to consider whether the present theory derives the first empirical generalization on control reviewed in section 1, namely Chomsky's (1981) PRO Theorem. The relevant data are in (2)-(3). Within the framework of assumptions adopted here their relevant structures are as in (30)-(31) respectively:

(30) [IP John I [VP persuaded [vP v [VP V]]]]
(31) [IP John I [VP believes [CP that [IP will [VP eat]]]]]

Consider (31) first. John cannot be merged directly into the matrix [Spec, I] position, because this implies a violation of the requirement imposed by the strong D-feature of the embedded I represented by will. Suppose then that John is merged in the embedded [Spec, I] position. In the framework of Chomsky (1995) its movement into the matrix [Spec, I] position is straightforwardly blocked by the fact that its (non-interpretable) Case feature has already been checked and cannot therefore check the Case feature of matrix I. In section 2, however, we suggested that Case features do not have any syntactic import and should be subsumed under D features. This means that we must find some alternative means to rule out movement from the lower to the higher [Spec, I] in (31). The generalization that prohibits movement from one Case position to another can be reconstructed within the present framework as a prohibition against the same D(P) lexicalizing more than one D feature. If we extend the constraint we just formulated so that any lexical element is allowed to lexicalize one and only one feature, we expect not to be able to copy it at all. This result is undesirable to the extent that we want to maintain a Copy and Merge derivation for the wh-movement cases.
One possibility is that the reconstruction effects that argue in favour of this derivation according to Chomsky (1995) are themselves to be explained in some alternative way (cf. Kayne 1998). For present purposes, however, it is sufficient to adopt an intermediate generalization, whereby any given lexical item can lexicalize the same feature once and only once. This still allows for wh-movement, in that the wh-phrase lexicalizes a D feature at the position of merger and a different feature, namely wh/Q, in the position it moves to. (30) is blocked in much the same way as (31). Remember that we take v to be associated with a D-feature that needs to be lexicalized. If John is inserted in [Spec, v], it cannot be raised to [Spec, I] to lexicalize the D feature of I, because it has already lexicalized another D feature. If it is inserted directly in [Spec, I], then the D-feature associated with v is not lexicalized at all. In fact, with respect to the distribution of PRO
our results are not so much reminiscent of Chomsky's (1981) PRO Theorem as of the alternative generalization of Manzini (1983), according to which the distribution of PRO is dictated by its lack of Case. Another generalization that remains to be accounted for is that expressed by the MDP, as illustrated in (4)-(5) in section 1. The relevant structure of (5) is as in (32):

(32) [IP Mary I [VP thinks that [IP John I [VP expected [IP to [VP eat]]]]]]

Notice that in (32) Mary lexicalizes the D feature of the matrix I and John lexicalizes the D feature of the intermediate I, as required; by hypothesis, non-finite I does not have a D feature. In order to capture the MDP, we need to establish the conclusion that the most embedded predicate eat is ATTRACTed by John, as indicated by the italics, and not by Mary. Under the Scopal MLC, Mary ATTRACTs only down to the scope of the next attractor, i.e. John. Therefore eat is not a possible attractee of Mary. The only possibility that the Scopal MLC allows for is the correct one, whereby thinks is ATTRACTed by Mary and the other predicates by John. Classical MDP examples of the type in (4), with the structure indicated in (33), are accounted for in much the same way as (32):

(33) [IP Mary I [VP persuaded [vP John v [VP V [IP to [VP eat]]]]]]

If we take v to represent a D-position, lexicalized by John, then the lower Vs cannot be associated with Mary across it, under the Scopal MLC, as desired. Remember that a separate ATTRACT operation connects Mary to the higher V position in the matrix VP-shell. Within the present theory, control and raising differ only in the nature of the predicates involved. This holds for subject control and raising to subject, to which we will return, as well as for object control and raising to object, or Exceptional Case Marking (ECM).
As argued by Johnson (1991) and Chomsky (1995), the overt realization of John as the subject of the ECM infinitival in (34) below depends on the presence of what for them is a Case position and for us a D position in the VP-shell of believe. In present terms, this D position requires to be lexicalized by a DP, which in turn attracts the predicates in its immediate scope, as indicated in (35). Remember once again that John triggers a separate instance of ATTRACT, involving the higher VP-shell predicate. The effect is that of raising to object because of the nature of the predicates involved (as expressed for instance by Chomsky's (1955) idea that believe forms a complex predicate with the embedded verb):

(34) John believes Mary to eat too much.
(35) John I [believes [Mary v [V [to [eat too much]]]]]

It is often assumed in the literature that contexts of obligatory control, such as the ones that we are considering, do not admit of split antecedents. In this respect, the Scopal MLC yields the conventionally accepted results, since the embedded predicate in (36) is in the attraction scope of Mary but not of John:

(36) John persuaded Mary to drink.

Transitive control predicates such as persuade also raise the question of the interaction between control and classical A-movement, such as passive. Consider (37):

(37) John was persuaded to eat.

Our theory predicts that the embedded and the matrix Vs are both ATTRACTed by John only if there is no intermediate D position in the sentence skeleton. Thus, contra Baker, Johnson and Roberts (1989), we are led to the conclusion that in short passives only one D is syntactically represented. This analysis has another consequence that turns out to be correct. It is often noticed in the literature (Williams 1980) that subject control verbs such as promise do not passivize, or, to the extent that they passivize, require control by the derived subject, such as John in (38):

(38) John was promised to be hired.

Within our theory, the well-formedness of (38) with an object control, rather than a subject control, reading corresponds to a derivation whereby hired and promise are both ATTRACTed by John. As desired, this is the only derivation allowed by the theory. Summing up so far, if what precedes is correct, the Scopal MLC yields the two basic properties of control, namely the PRO Theorem and the MDP, without need for extra stipulations in the grammar. In particular, within the present theory there is no need to state the MDP as a separate locality principle, nor to capture the distribution of control by using a specialized empty category PRO and/or the ad hoc notion of null Case. Other properties of control remain nevertheless to be taken into account, notably arbitrary control. Before we go on to arbitrary control, it is worth pausing to note that the idea that there is no A-movement, and that the syntax provides operations for matching DPs inserted in D positions with thematic positions, also characterizes approaches within
the categorial grammar framework. In particular, Cormack (1998) proposes that the [Spec, V] position is filled by an abstract element θ, which passes on the relevant thematic properties to the subject DP, generated directly in Case position. Similarly, the distinction between raising and control in terms of a DP ATTRACTing one or two predicates is reminiscent of work within the categorial grammar tradition such as Jacobson's (1992). What distinguishes the present approach is mainly the fact that through the Scopal MLC + Last Resort the advantages of the transformational approach are also retained.

4 Arbitrary Control

As we mentioned in section 1, the elimination of PRO in the contexts of obligatory control can also be achieved in a framework which adopts the standard construal of A-movement as DP-movement, by assuming that a DP can move from one θ-position to another, along the lines suggested by O'Neil (1995) and Hornstein (1996). Thus, for instance, in (39) control would be expressed by merging John in the embedded [Spec, V] position and moving it to the matrix [Spec, V], via the embedded [Spec, I], and then to the matrix [Spec, I]:

(39) John I [John tried [John to [John leave]]]

Under this approach, Hornstein (1996) achieves a unification of the MDP with the basic locality principle in grammar, i.e. the MLC, making it equivalent to the present proposal in this respect. However, the two theories can be told apart in at least one environment, namely arbitrary control. Arbitrary control is illustrated in (40), where the embedded infinitival verb is apparently not associated with any DP, and ends up being interpreted roughly as having a generic argument:

(40) It is hard [to [work]]

In the terms of Chomsky (1981), the infinitival [Spec, to] position hosts a PRO, which in the absence of any available controller is assigned a free index, and hence arbitrary interpretation. For Hornstein (1996), in the absence both of PRO and of any overt DP argument, the only possible analysis consists in assuming that the embedded [Spec, to] position is occupied by the other pronominal empty category, pro. There are both empirical and conceptual problems with this latter approach. Thus it is not explained why in English pro would be restricted to exactly the arbitrary control
environments. What is more, much current literature argues in favor of the elimination of pro even from the null subject configurations for which it was originally introduced by Chomsky (1982). In particular, Pollock (1996), Nash and Rouveret (1997), and Alexiadou and Anagnostopoulou (1997) argue that V is sufficient to check the strong feature of I in null subject languages, while Platzack (1995) and Manzini and Savoia (forthcoming) treat null subject languages as having a weak (non-lexicalized) D feature in I. For the reasons advanced by all of these authors, we conclude that pro is best abandoned; in this perspective the pro analysis of arbitrary control, and therefore the standard DP-movement approach to obligatory control that forces it, are undesirable. What remains to be demonstrated is that our theory predicts the basic facts concerning arbitrary control without having recourse to the re-introduction of pronominal empty categories. In present terms, the problem that needs to be solved is that, although the embedded predicate in (40) is apparently left without an argument, this does not lead to a failure of interpretation; rather it leads to the interpretation already noted, whereby work takes a variable of a generic operator of some sort as an argument. We can provide a syntactic formalization for this interpretation by assuming that an abstract adverb of quantification is available in sentences of this type, which ATTRACTs the predicate. More precisely, we can identify the position of the operator with that of finite C. The derivation for (40) therefore takes the form in (41), where finite C ATTRACTs the infinitival V:

(41) [C [it is hard [to work]]]

One argument in favor of C acting as an attractor for the embedded predicate is provided by the fact, noticed in the literature (cf. Bresnan 1982), that predicates not associated with a lexical DP can be interpreted as having a specific, rather than a generic, argument given the appropriate context. Thus while a generic reading for the non-lexicalized argument is associated with the generic context in (41), a specific reading is associated with the non-lexicalized argument given the specific temporal context in (42):

(42) It was hard to work (on that beautiful sunny day).

In (41) and (42), then, the interpretation of the non-lexicalized argument varies in accordance with the temporal context. Suppose that the operator in C ATTRACTs Tense, exactly as it ATTRACTs the lexical predicate. If so, we predict that a generic operator in C determines a generic interpretation for both the non-lexicalized argument and for Tense in (41), while the presence of a specific operator in C in (42) determines a specific interpretation for them. Remember that it is independently argued in the
literature, starting with Enç (1987), that the interpretation of Tense depends on C; see Roussou (forthcoming) for more recent discussion. Indeed we can assume that the attraction properties of the operator in C are always satisfied by Tense, independently of whether it ATTRACTs a lexical predicate or not. It is important to stress that we predict a correlation between temporal and argument interpretation only when the argument is not lexical; only in this case do they both depend on C. If there is a lexical argument, as in all of the cases of ordinary control considered in previous sections, and it is the lexical argument itself that satisfies an argument place in the predicate, we predict no correlation between the generic or specific nature of the lexical argument and that of the temporal context. For instance, it has been noticed (cf. Brody and Manzini 1988) that if an overt argument is associated with the matrix predicate in (40), it takes on the role of obligatory controller, as in (43):

(43) It is hard for us [to [work]]

In present terms, us in (43) ATTRACTs the embedded predicate, as predicted by the Scopal MLC. Indeed in (43) the predicate work is in the immediate attraction scope of the DP us. Since in (43) the interpretation of the argument position of work depends on us, while the interpretation of the matrix Tense depends on C, we expect no correlation between the two. In other words, in this case we correctly predict that it is possible to have a specific argument in a generic context. The interaction between temporal and argumental interpretation is strictly confined to arbitrary control, which seems to us exactly the correct empirical result. A problem that we have not addressed so far concerns the role played by the expletive it and the matrix predicate in (40)-(42). Since it is a DP, we might expect it to count as the nearest attractor for the embedded, as well as the matrix, predicate. If so, however, we cannot predict the interpretation of the relevant sentences, which as we have seen depends on the matrix C acting as the attractor for at least the embedded predicate. One partial solution consists in saying that only an argument DP, and not an expletive, ATTRACTs a predicate. This solution, however, has both conceptual and empirical disadvantages. From a purely conceptual point of view we notice that the distinction between arguments and non-arguments is not a feature of DPs but rather a consequence of the derivation associated with them. Thus it is a non-argument if it has an argument associate, and an argument otherwise. These considerations are strengthened by the empirical evidence provided by examples such as (44). (44) contains an instance of there which is expletive with respect to another DP; in this latter case the associate DP is ultimately interpreted as the controller of the embedded predicate:

(44) There arrived several people [without telling us beforehand]

For reasons that have to do with the structure of adjuncts (cf. section 5 below), the interpretation of (44) cannot be obtained by having several people directly ATTRACT the embedded predicate. Rather, there independently ATTRACTs both the embedded predicate and several people (see Chomsky 1995, Manzini and Savoia forthcoming for theories of the expletive-associate pair which involve attraction of some sort or another). This analysis confirms that it is really DPs that attract Vs, independently of whether they are arguments, and not some +/-argument property of DPs. Returning now to the examples in (40)-(42), we still wish to suggest that the reason why it allows C to ATTRACT the embedded predicate is connected to the fact that it is an expletive. In particular, note that the control sentence is the associate of it. In the framework of assumptions just adopted, this means that it ATTRACTs not only the matrix predicate but also the control sentence. Thus the control sentence comes to be interpreted as the argument of the matrix predicate. What is more, the predicate in the control sentence is not in the scope of it but only in the scope of the matrix C. Thus it is C that ATTRACTs it and determines the arbitrary interpretation of its argument. It is worth checking at this point whether the present set of assumptions still allows us to account for raising and superraising. In a simple raising sentence like (18), repeated here as (45), John ATTRACTs both the matrix and the embedded predicate, which are in its immediate scope, as indicated by the italics:

(45) [John I [seems [to [work]]]]

Consider then (46)-(47). (47) represents an example of superraising, classically accounted for by assuming that it intercepts A-movement of the argument across it; (46) is its well-formed counterpart:

(46) It seems [that John was told [that Mary left]]
(47) *John seems [that it was told [that Mary left]]

In (46), John ATTRACTs the predicate told; it ATTRACTs the predicate seem, which is in its immediate scope, and at the same time the associate, in this case the that-clause. The correct interpretation ensues, with the that-sentence taken as the argument of seem. Consider the superraising example in (47). The intended interpretation, under which John is an argument of told, is blocked by the Scopal MLC; indeed John cannot ATTRACT told because told is in the immediate scope of it. Other construals of the sentence are equally excluded. In particular, although seem is in the immediate scope of
John, and can be ATTRACTed by it on the basis of the Scopal MLC, the interpretation fails, since the interpretive requirements of John cannot be satisfied by a predicate like seem (cf. the ungrammaticality of *John seems). The analysis of control presented in this section and in the preceding one depends of course on the idea that the C position of the control complement does not have any operators that block attraction of the embedded predicate by a higher DP or finite C. If such attractors were present, we would not be able to derive obligatory control in (4)-(5), as well as in (43). Furthermore, with respect to arbitrary control we would not be able to predict that the interpretation of the non-lexicalized argument co-varies with that of the matrix Tense, as in (40)-(42). The absence of appropriate operators in the infinitival C can in turn be connected to the absence of temporal properties in infinitivals. This conclusion directly contradicts the conclusions of Chomsky and Lasnik (1995), whose null Case for PRO is supported by temporal properties associated with to (cf. Martin 1996). This latter view is apparently confirmed by the observation of Bresnan (1972) and Stowell (1982) that a future interpretation is associated with the controlled infinitival in examples of the type in (4) or (6). Though the observation is in itself correct, we believe that it does not argue for temporal properties associated with infinitival I. Rather, the future interpretation can be taken to support a modal characterization for infinitivals (see Roussou forthcoming). Recent work by Bošković (1997) seeks to provide further arguments in favour of the temporal analysis of control clauses, based on a comparison between English and French. Though a discussion of his data is outside the scope of the present paper, we believe that the modal-based approach, or eventually an aspectual one, accounts for them as well. Let us summarize so far. On the basis of the Scopal MLC and of the appropriate assumptions concerning attractors, the theory is able to derive the following theorem: a predicate is necessarily ATTRACTed by the first argument DP or finite C that c-commands it. This in turn means that what in standard terms is an arbitrary PRO can only surface if there is no DP argument in the clause that immediately contains the control clause. Thus arbitrary control is impossible in (43), though forced in (40)-(42). The operator that we have postulated in C independently acts as an attractor for Tense and is associated with it in all finite environments. Other empirical consequences can be made to follow from what precedes. Suppose we embed (40) under a higher sentence, as in (48):

(48) John thinks [that it is hard [to work]].

In (48), two interpretations are available for the predicate in the control sentence; remember that in (48) the control sentence itself is the associate of it and its scope is
therefore the same as the subject's. The first reading, the arbitrary one, depends on work being ATTRACTed by an operator associated with the intermediate C. There is, however, a second reading, with John interpreted as an argument of work. Our theory clearly predicts that this reading cannot depend on control by John, in the sense defined so far, that is on the attraction of the predicate by John. Indeed the temporal operator in the intermediate C acts as an attractor for work, as we have just seen, effectively preventing John from ATTRACTing it. There is an alternative way to obtain the desired reading, though. We have already assumed that the embedded finite C is associated with an operator that ATTRACTs work. We suggest that the interpretation whereby John is the argument of work also depends on this derivation. As we have seen, the argument of a predicate ATTRACTed by the operator in C can be interpreted as generic or specific; in this latter case we assume that it can be anaphoric to a DP. Hence the non-lexicalized argument of work can be interpreted as anaphoric to John. In this respect we follow a long tradition in control studies (Williams 1980, Bresnan 1982, and many subsequent works including Hornstein 1996), which suggests that the non-obligatory long-distance variety of control is to be treated in the same terms as arbitrary control. At the same time, contrary to the works just mentioned, the present theory introduces a parallelism between obligatory and non-obligatory control, to the extent that both phenomena reflect the (obligatory) attraction of a predicate by an appropriate attractor. The only difference is that DP is the closest attractor in the case of obligatory control and finite C is the closest attractor in the case of non-obligatory control. One question that remains to be addressed concerns the +human interpretation that appears to be necessarily attached to arbitrary PRO. This apparently provides a good argument in favor of the approach of Chomsky and Lasnik (1995), since their PRO can be endowed with a feature such as +human in the lexicon. In a language like Italian, furthermore, the feature specification of arbitrary PRO can include +masculine and +plural, since participles and adjectives are seen to agree with it in these features. However, if this is the case for arbitrary PRO, then there must be a separate lexical entry for non-arbitrary PRO which has as many combinations of lexical features as its antecedents. This result appears to be unparsimonious. Nor is it likely that a feature like +plural can be construed as a default, even if one were to overcome general theoretical objections to the notion of default itself (on these points see Manzini and Savoia forthcoming). In this connection, Williams (1992) points out the similarity between arbitrary control and logophoricity.
Thus elements that behave logophorically, such as long-distance anaphors, typically target as their antecedent the source of the report, the person with respect to whose consciousness or self the report is made, or the person from whose point of view the report is made (Sells 1987). Naturally, all of these characterisations
point to a +human antecedent for logophors. We therefore conclude that an operator ATTRACTing a predicate leads to a logophoric reading of its argument, whence the surfacing of +human properties for arbitrary control. We shall return to the issue of logophoricity, and its syntactic or pragmatic nature, in section 6.

5 Other consequences for control

In sections 2-4, we established that the Scopal MLC is observed by ATTRACT involving predicates, in that the closest DP or C attractor is obligatorily targeted. There is, however, a second major constraint on ATTRACT whose effects on control we should in principle be able to observe, namely Connectedness, as formulated in (22). The relevant environments correspond of course to the classical strong island ones, namely adjuncts and subjects. Consider first control into adjuncts. As is well known, adjuncts tend to display the same properties as complements with respect to control. Thus the only possible interpretation for the adjunct infinitival in (49)-(50) is one in which the embedded predicate is associated with the matrix subject:

(49) John left before eating.
(50) John left without asking.

One possible interpretation of this fact is that, as suggested by Kayne (1994), right adjunction is not allowed by the grammar; therefore the embedded sentences in (49)-(50) are attached as complements. This analysis, however, is in direct contradiction with a number of other properties of adverbials, as argued in detail by Manzini (1995). One property, which is of direct interest here, is that in general a matrix object cannot serve as controller for an adverbial. Thus (51)-(52) have exactly the same control properties as their counterparts in (49)-(50), despite the insertion of an object in the matrix sentence:

(51) John left us before eating.
(52) John left us without asking.

If adverbials are generated in the most deeply embedded position in (51)-(52), as suggested for simple adverbs by Larson (1988), then control by the object should be possible and in fact obligatory by the MDP or the Scopal MLC. Therefore, as pointed out by Williams (1974), the subject orientation of most adverbials argues in favour of the conclusion that they are attached high enough not to be in the scope of the object.
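The attachment-height argument can be mimicked in a small toy model. The encoding is ours, not the authors' formalism, and the function name and scope sets are our own assumptions: if each attractor's scope is represented as an explicit set of positions, a predicate is claimed by the attractor with the smallest scope containing it, i.e. the closest one, and an adjunct attached above the object position simply never enters the object's scope.

```python
# Illustrative toy model only (our own encoding): scope is an explicit set,
# so attachment height, not linear order, decides attraction. A predicate
# goes to the smallest scope containing it, i.e. the closest attractor.

def attract(scopes, predicates):
    """scopes: {attractor: set of predicates in its scope}.
    Returns {predicate: closest attractor}."""
    out = {}
    for p in predicates:
        holders = [a for a, s in scopes.items() if p in s]
        out[p] = min(holders, key=lambda a: len(scopes[a]))
    return out

# Complement case, "Mary persuaded John to eat": the complement predicate
# 'eat' is inside John's (smaller) scope, so John intercepts it.
print(attract({'Mary': {'persuaded', 'eat'}, 'John': {'eat'}},
              ['persuaded', 'eat']))
# -> {'persuaded': 'Mary', 'eat': 'John'}

# Adjunct case (51), "John left us before eating": the before-adjunct is
# attached above the object position, so 'eating' is in John's scope only.
print(attract({'John': {'left', 'eating'}, 'us': set()},
              ['left', 'eating']))
# -> {'left': 'John', 'eating': 'John'}
```

The contrast between the two calls restates the text's point: object control arises only when the lower predicate actually sits inside the object's scope, which high-attached adverbials never do.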

Incidentally, the same conclusion excludes that in (44) the associate could directly control the embedded predicate. Thus adverbials are not in complement position, but rather adjuncts. We have independent evidence that in the case of wh-movement, extraction from adjuncts is blocked, as in (53) below. Therefore, if control in (49)-(52) is to be construed as attraction of the embedded predicate by the matrix subject, the question arises why this is not blocked as an instance of an adjunct island violation. However, adjunct islands can be circumvented by parasitic gaps. Thus for wh-movement, the adjunct island violation in (53) contrasts with the well-formed parasitic gap configuration in (54):

(53) *Which article did you review my book [without reading t]
(54) Which book did you review t [without reading e]

This suggests that subject control in (49)-(52) also reflects an underlying parasitic gap-like configuration. Consider indeed (49) in terms of the present theory. Assuming that the before sentence is generated as a sister to the highest VP-shell projection, the relevant structure is as in (55):

(55) John I [left [before I [eating]]]

In (55) we independently know that the DP John ATTRACTs the matrix predicate, which is on the main branch of the sentence. But if so, attraction of the embedded predicate, which is on the adjunct branch, by John creates a parasitic gap configuration, as desired. This can be most simply illustrated by comparison with (54), where a single antecedent, the wh-phrase, binds a trace in the main branch of the sentence and a trace inside the adjunct, as schematized in (56). Similarly in (55), a single DP, John, acts as an attractor for both the predicate in the main branch of the sentence and for the predicate inside the adjunct, as schematized in (57):

(56) [tree diagram: which book in [Spec, C] binds both a trace in the matrix VP and a parasitic gap inside the before-adjunct CP]
(57) [tree diagram: John in [Spec, I] acts as attractor both for left in the matrix VP and for the predicate inside the before-adjunct CP]

As we did for control into complements, we can now consider the interaction of control into adjuncts with passivisation. In the cases illustrated in (58)-(59) below, the argument DP in matrix [Spec, I] functions as the attractor for the embedded predicate. This is correctly predicted by our theory on the basis of the Scopal MLC as well as Connectedness. As can be seen by the comparison between the active forms in (b) and the passive forms in (a), we find a switch of the thematic properties of controllers, since in passives it is the patient that controls the embedded predicate, while in actives it is the agent that does so. Our theory thus predicts that notions such as agent and patient are irrelevant for control; instead the embedded predicate in the adjunct is attracted by the closest DP connected to it. As we have argued for the case of control into complements, in short passives the agent is not syntactically projected and cannot therefore interfere with control.

(58) a. John was fired after doing that.
     b. We fired John after doing that.
(59) a. John was hired without moving a finger.
     b. We hired John without moving a finger.

One apparent counterexample to the above conclusions is represented by control into purpose clauses. Well-known examples of the type in (60) are generally taken to show that the agent is syntactically realized in passives and acts as a controller for the adjunct (cf. Brody and Manzini 1988):

(60) The boat was sunk [(in order) to collect the insurance].

In order to account for (60), we are led to assume that the control sentence is adjoined to a projection higher than the VP to which we adjoined the adjunct sentences in (49)-(52). Indeed, it is necessary and sufficient for us to assume that the purpose clause is attached high enough in the matrix sentence to be outside the scope of the subject, in order to predict that it is not controlled by the subject under the Scopal MLC. Following fairly standard assumptions, we can associate the purpose clause with an IP-adjoined position, as in (61):

(61) [tree diagram: the purpose clause CP is adjoined to IP, above the [Spec, IP] subject DP, under the matrix C]

The fact that the embedded predicate in (61) becomes associated with the arbitrary interpretation is in present terms due to it being ATTRACTed by the matrix C. This also follows from the Scopal MLC, since the matrix C is the first attractor that takes the adjunct predicate in its scope. Note that Connectedness is also satisfied, in that the matrix C equally ATTRACTs the matrix I for reasons of Tense interpretation. Therefore, the attraction path from matrix C to the adjunct predicate forms a connected subtree together with the matrix attraction path from C to I. This analysis does not take into account the intuition that the argument of the embedded predicate is interpreted as anaphoric to the agent of the passivized matrix predicate. Since we independently argued

that in short passives the agent is not syntactically represented, we are led to impute this intuition to pragmatic inference. Note that if purpose clauses are adjoined to a position which is outside the scope of the matrix subject, we predict that their predicate is not controlled by it even in active environments, for instance in (62). Hence we are led to claim that the apparent control of the embedded predicate by the matrix subject in (62) reduces to arbitrary control and anaphora, essentially as in the case of apparent long-distance control in (48):
(62) We sank the boat [(in order) to collect the insurance].
In other words, the matrix C acts as an attractor for the adjunct predicate, licensing a

specific interpretation, anaphoric to some argument in the sentence. Once again, we impute the fact that the second interpretation is in fact the only possible one to a pragmatic inference factor.

We are now in a position to consider control into the other major type of strong islands, namely subjects. The relevant data are of the type in (63):
(63) John believes [that [behaving badly at the party] would bother Mary]
(63) is a typical context of non-obligatory control; indeed the non-lexicalized argument in the subject clause can be interpreted as arbitrary, or as anaphoric to either the matrix argument, John, or to the embedded one, Mary. Leaving aside the question of islandhood for a moment, this state of affairs is as expected, given that the nearest attractor of the appropriate sort for the embedded predicate is the operator in the intermediate C. The DP-argument Mary cannot act as an attractor because it does not take the subject sentence in its scope, while the DP-argument John is higher, and thus further away than the intermediate C in terms of the Scopal MLC. In turn, attraction of the control predicate by the intermediate C is compatible either with an arbitrary interpretation or with an interpretation anaphoric to some individual(s) in the discourse, including John/Mary. As for the question whether the intermediate C is accessible to the gerund under Connectedness, remember that wh-extraction from a subject is barred; however, if a parasitic gap configuration is involved, Connectedness is satisfied, as seen in the contrast in (64):
(64) a. *Who do [friends of t] admire you
     b. Who do [friends of e] admire t

Therefore in (63), the gerund can be ATTRACTed by the intermediate C, in that this attraction path forms a connected subtree with the attraction path independently defined between C and the matrix I for reasons of Tense interpretation. This yields a derivation of the type in (65):
(65) [CP C [IP [CP C [IP ...]] would [VP ... V ...]]]
In short, both the fact that non-finite subject sentences are typical contexts of non-obligatory control and the fact that non-finite adjuncts typically display obligatory control can be explained on principled grounds. The alternation between obligatory and non-obligatory control depends on whether the closest attractor is a DP argument (as for

adjuncts) or a C (as for subjects). As for the possibility of reaching into strong islands, this arguably depends on the creation of parasitic gap-like configurations. Thus a DP subject can ATTRACT two predicates, one in the main branch of the sentence and one inside an adjunct. Similarly, a C can ATTRACT an I in the main branch of the sentence and a predicate inside a subject sentence. The conclusion that control into adjuncts amounts to the creation of a parasitic-gap configuration is also put forward by Hornstein (1997), though it is embedded under a completely different analysis of parasitic gaps; furthermore, his analysis does not readily extend to control into subject sentences.

Before concluding the discussion of control, another notoriously problematic set of data is worth reviewing briefly. We have so far illustrated control with embedded non-finite sentences. However, it is well-known that in appropriate contexts, such as interrogatives and exclamatives, infinitivals are acceptable as matrix sentences, as for instance in (66)-(67):
(66) What to do?
(67) Ah, to go to the sea!
What is more, it appears that interrogative sentences embedded as complements to matrix verbs do not display obligatory but rather optional control effects, as in (68):

(68) I asked how [to behave]
Consider first (66)-(67). Because there obviously is no tensed C which takes the infinitival in its scope, our general theory about the licensing of arbitrary control depends on assuming that some other operator can act as an appropriate attractor for the infinitival predicate. As it turns out, this operator can be identified with the one licensing the apparently exceptional matrix use of the infinitival, say some interrogative/exclamative modal operator. If the same operator is present in the embedded context in (68), we fully predict that it will license the arbitrary reading of the non-lexicalized argument of the embedded predicate, pre-empting control by the matrix subject.

6 Further consequences for A-movement

In our discussion of A-movement in section 2, we argued that the fundamental role of A-movement in the grammar is that of conveying theta properties to DP positions, where arguments are realised. In the alternative formalization that we provided, a similar role is played by the attraction of predicates by DPs. This construal of A-movement explains the lack of reconstruction effects at LF and of blocking effects on PF rules.

A number of other facts standardly associated with A-movement remain to be considered. One of them is the triggering of agreement. Let us for instance consider a standard unaccusative sentence in Italian including an embedded past participle. The subject overtly agrees with the past participle in number and gender, as in (69):
(69) Maria è partita.
     Mary is left-sg.fem
     'Mary has left.'
Under standard assumptions about A-movement, the derivation of (69) proceeds along the following lines: Maria moves through [Spec, v]; this configuration triggers agreement between Maria and the past participle, though Maria is ultimately realized in [Spec, I], as illustrated in (70):
(70) Maria è [Maria partita]
In the present framework, Maria is merged directly in [Spec, I], while the lexical verb is ATTRACTed by Maria, without Copy and Merge applying at any point in the

derivation. If that is the case, the question arises what triggers agreement between Maria and the past participle at all. We assume that the past participle, exactly like the DP, is associated with f-features in the lexicon, agreeing in this with Chomsky (1995). On the other hand, we have already established that for reasons entirely independent of agreement, DP ATTRACTs the lexical V. In this framework, agreement of the f-features present on the DP and those present on the past participle can be seen simply as a reflex of the ATTRACT operation that involves these two elements. Manzini and Savoia (forthcoming) show that this approach also extends to configurations standardly taken to involve agreement with pro.

Another set of problems that we need to consider concerns one of the arguments that we put forward for our approach to A-movement, namely the lack of reconstruction effects. Though data of the type in (11) seem to provide a clear-cut argument against reconstruction of A-traces, other data have been held to support it. In particular, Belletti and Rizzi (1988) point out contrasts of the type in (71):
(71) a. Pictures of himself worry John/him.
     b. *Himself worries John/him.
They argue that the grammaticality of (71a) is due to reconstruction of the derived subject, i.e. pictures of himself, into its thematic position, which they take to be lower in the VP-shell than the position of the Experiencer John/him. If so, (71b) can no longer be excluded by the impossibility of reconstruction. Therefore they suggest that Principles B

and C of Chomsky's (1981) Binding Theory must be satisfied at S-structure, whereas Principle A can be satisfied in their terms either at D-structure or at S-structure, i.e. equivalently at S-structure or at LF in a theory that uses reconstruction. Under this set of assumptions, both (71a) and (71b) are well-formed with respect to Principle A. (71b) is independently excluded in that it represents a violation of Binding Principles B/C at S-structure, given that the object John/him is (locally) bound by the subject, contrary to (71a). Crucially, Belletti and Rizzi (1988) make reference to two levels of representation, and this is not consistent with minimalist assumptions. What is more, this recourse to two different levels of representation cannot easily be mimicked within the more limited theoretical resources of minimalism. If we say that both the derived and the trace position count at LF, we can of course derive the ungrammaticality of (71b), but we have no way of deriving the grammaticality of (71a), since in its derived position himself is not bound by John/him. If the subject DP on the other hand is interpreted only in its reconstructed position, we have the original problem of not being able to account for the ungrammaticality of (71b).

Data of the type in (71a) can also be reproduced for standard raising configurations, as in (72):
(72) Pictures of each other seem to them [to be on sale]
We believe that the correct approach for this kind of example is the one proposed by Reinhart and Reuland (1993). They argue in essence that self moves to the closest predicate, marking it as reflexive. Therefore the interpretation of John likes himself is determined by an abstract structure of the form John self-likes him, where the reflexively marked predicate forces the coreference between John and him. Disjoint reference in John likes him corresponds to the absence of a reflexive marker on the predicate. Given this analysis, since the anaphoric construal of himself depends on self acting as a reflexivizer of the nearest predicate, apparent anaphors inside (picture-)DPs, bound across sentence boundaries as in (73a), must correspond to logophors. This conclusion is independently supported by a number of facts; for instance, a first person anaphor can appear in picture-DPs freely, as in (73b):
(73) a. Lucie thought that a picture of herself would be nice on that wall.
     b. A picture of myself would be nice on that wall.
Note that though Reinhart and Reuland (1993), like Sells (1987), conceive the construal of logophors in pragmatic terms, this conclusion need not be subscribed to. Thus we could assume that self in its logophoric interpretation is ATTRACTed by a syntactic

position, even though one different from the predicate. The point that at least the Icelandic logophor sig is subject to syntactic movement is argued for by Manzini (forthcoming) on the grounds that elements like sig obey strong islands and show parasitic gap effects. Though the distribution of logophoric himself is different from that of logophoric sig, we then assume that a syntactic account is equally possible. Remember that according to Williams (1992), arbitrary control itself represents an instance of logophoric interpretation. We suggest therefore that the correct syntax for logophoric himself might be roughly the same as for arbitrary control, i.e. attraction by the nearest finite C. On this understanding we expect logophoric himself to systematically appear inside subject islands, as in (73), exactly like arbitrary control. In short, data of the type in (73) argue not for reconstruction, but for the existence of a logophoric himself. This in turn raises doubts concerning all arguments for movement as Copy and Merge based on the reconstruction of himself, including those presented by Chomsky (1995) for wh-movement and reproduced in (10). As we already remarked at the outset, a discussion of A'-movement is beyond the scope of the present paper. What

is relevant for present purposes is that the availability of a logophoric binding account for (73) and the like neutralizes the potential counterargument to our construal of A-movement.

Further evidence for reconstruction in DP-movement, relating to bound pronominals, is presented by Lebeaux, as quoted by Brody (1996). Thus (75) is considerably worse than (74):
(74) [His_i mother's]_j bread seems to [every man]_i to be known by her_j to be the best there is.
(75) *[His_i mother's]_j bread seems to her_j to be known by [every man]_i to be the best there is.
The argument runs as follows. If in (74) his mother's bread is reconstructed in the intermediate subject position, his is in the scope of every man and his mother c-commands her; this yields a well-formed interpretation. In (75), on the other hand, if his mother's bread is reconstructed within the scope of every, it is also reconstructed in a position c-commanded by her, which means that the resulting representation is ruled out by Principle B. If his mother's bread is not reconstructed, the structure is ruled out by whatever principle requires bound pronouns to be in the scope of their operator. We propose on the contrary that in (74)-(75) the binding of his is accomplished not by reconstruction of his mother, but rather by QR of every: this latter operation is successful when every raises from the matrix sentence, but not when it raises from an embedded one. We suggest that the reason for this is related to the fact that only seem, which, as we

shall motivate below, has an event-less structure, is crossed in (74), while QR would cross the event domain of know in (75). Though our analysis is still tentative, it does make a clear prediction, namely that substituting any other pronoun for her in (75), as in (76), yields a sentence which is still worse than (74). Judgements are subtle, but native speakers at best assign to (76) an intermediate status between (74) and (75). We can therefore group (75) and (76) together, attributing any residual contrast to their different relative complexity:
(76) (*)His mother's bread seems to us to be known by every man to be the best there is.
In short, on the basis of the discussion that precedes, we uphold Chomsky's (1995) generalization that A-movement cannot be reconstructed. This in turn provides a straightforward and powerful argument in favour of our approach, as opposed to the standard DP-movement one. This conclusion does not carry over to the reformulation of A-movement proposed by Sportiche (1996). Sportiche also recognizes the difficulty

represented for DP-movement by contrasts of the type in (71). Thus he proposes that the D head of the DP is merged in the position where it surfaces, essentially as proposed here; if the reflexive in (71b) corresponds to such a head, he predicts that it does not reconstruct. By contrast, the NP predicate is always merged in the thematic position, and NP-movement systematically takes place from thematic to D-position. Thus an anaphor in the NP predicate, as in (71a), can give rise to reconstruction effects. But as we have seen, all of the evidence in favor of DP-/NP-movement from reconstruction of anaphors can be circumvented, thus depriving Sportiche's theory of an important empirical basis. Furthermore, the lack of blocking effects on PF rules is not explained by Sportiche's approach either.

Another phenomenon that appears to involve the LF-interface has been explicitly argued to mirror the underlying structural difference between raising and control. Thus Chomsky (1981) and Burzio (1986) point out a contrast between (77) and (78):
(77) One interpreter each seems to have been assigned to the visiting diplomats.
(78) *One interpreter each tried to be assigned to the visiting diplomats.
In order to explain the data in (77)-(78), a theory of each, hence of distributivity, is clearly needed. Though this is largely outside the scope of the present article, we shall nevertheless consider briefly what an account of (77)-(78) might involve. Each is construed not with an interpreter, i.e. the distributee, but with the visiting diplomats, i.e. the distributor. This is made morphologically obvious in Italian by the fact that each agrees in gender with the distributor, as in (79):

(79) a. Un interprete ciascuno/*ciascuna fu assegnato ai diplomatici
        An interpreter each-m/each-f was assigned to the diplomats
     b. Una guida ciascuno/*ciascuna fu assegnata ai diplomatici
        A guide each-m/each-f was assigned to the diplomats
     c. Un interprete *ciascuno/ciascuna fu assegnato alle mogli
        An interpreter each-m/each-f was assigned to the wives
     d. Una guida *ciascuno/ciascuna fu assegnata alle mogli
        A guide each-m/each-f was assigned to the wives
Following Beghelli and Stowell (1997), we assume that the structure of the sentence includes a projection that acts as a host for distributive quantifiers, namely DistP, which is the highest quantificational position below the inflectional ones. For the sake of the present discussion we shall in fact identify the surface position of each with DistP. According to Beghelli and Stowell (1997), an active Dist head selects a ShareP ... which

in turn requires that an existential QP appear in Spec of ShareP, corresponding to the distributee. In (77)-(78), however, the distributee, being in [Spec, I], is higher than ShareP and is therefore not available to fill its Spec. In such cases Beghelli and Stowell (1997) assume movement of a covert existential quantifier over events to this position. Our idea is that in the raising case in (77) a successful interpretation is reached in that the matrix ShareP can be associated with a quantification over the embedded assign event across the event-less seem. In the control case in (78), on the other hand, a successful interpretation would require that both a quantification over the matrix event and one over the embedded event be associated with the same matrix ShareP. This we take to be impossible, as desired.

Finally, it should be noticed that the present theory of A-movement is not compatible with the analysis of floating quantifiers argued for in Sportiche (1988), according to which quantifiers such as tous in French are stranded in their base-generated thematic position by leftward movement of the D. But though this analysis has proven very influential, there are several pieces of evidence that cast doubt on it, as argued by Bobaljik (1998) quite independently of the present theory.

7 Conclusion

To conclude, in section 1 we argued that there are empirical and conceptual problems for the classical construal of control, involving PRO, as well as for that of A-movement. In section 2 we argued that an alternative formalization for A-movement is possible within the minimalist framework, whereby DPs are merged directly in D-position, where they ATTRACT predicates from the VP-shell. This alternative formalization for A-movement,

which effectively eliminates DP-traces, is supported by the lack of reconstruction effects at LF and of blocking effects on rules at PF. What is more, the theory proposed in section 2 allows for a straightforward account of control which dispenses with PRO. Quite simply, control corresponds to the case in which more than one predicate is ATTRACTed by the same DP argument. In section 4, we concentrated on one difference between our approach and O'Neil's (1995) and Hornstein's (1996), for whom control reduces to DP-movement and arbitrary control must therefore involve an empty pronominal subject pro. In present terms, the empty category pro can be eliminated from the grammar as well. Arbitrary control reduces to the case in which a predicate is controlled by an operator in C, rather than by a DP-argument. Throughout sections 3-4, we showed that the distribution of PRO, the distribution of antecedents for controlled PRO, and the distribution of controlled and arbitrary PRO are determined uniquely by the interplay of ATTRACT with the MLC, which we construe as Scopal MLC. Similarly, in section 5, we explained obligatory control into adjuncts as

a by-product of parasitic gap-like patterns of predicate attraction predicted by Connectedness. Arbitrary control into subjects amounts to the same, with the difference that an operator in C acts as the attractor, rather than a DP-argument. In section 6, we turned to A-movement again, considering more complex data involving reconstruction and other LF effects.
References

Alexiadou, A. & E. Anagnostopoulou (1997). Parametrizing Agr: Word-Order, V-movement and EPP-checking. Ms., FAS Berlin and University of Tilburg.
Arad, M. (1998). VP-Structure and the Syntax-Lexicon Interface. Doctoral dissertation, University College London.
Baker, M., K. Johnson & I. Roberts (1989). Passive Arguments Raised. Linguistic Inquiry 20: 219-251.
Beghelli, F. & T. Stowell (1997). Distributivity and negation. In A. Szabolcsi (ed.). Ways of scope taking. Dordrecht: Kluwer.
Belletti, A. & L. Rizzi (1988). Psych-Verbs and θ-Theory. Natural Language and Linguistic Theory 6: 291-352.
Bennis, H. & T. Hoekstra (1984). Gaps and parasitic Gaps. The Linguistic Review 4: 29-87.
Borer, H. (1994). The Projection of Arguments. UMass Occasional Papers 17. GLSA, University of Massachusetts, Amherst.
Bobaljik, J. (1995). Morphosyntax: The Syntax of Verbal Inflection. Doctoral dissertation, MIT, Cambridge, Mass.
Bobaljik, J. (1998). GLOT.
Bošković, Ž. (1997). The Syntax of Nonfinite Complementation: An Economy Approach. Cambridge, Mass.: MIT Press.
Bresnan, J. (1972). Theory of Complementation in English. Doctoral dissertation, MIT, Cambridge, Mass.
Bresnan, J. (1982). Control and Complementation. Linguistic Inquiry 13: 343-434.
Brody, M. (1995). Towards perfect syntax. Working Papers in the Theory of Grammar 2. Research Institute for Linguistics, Hungarian Academy of Sciences.
Brody, M. (1996). Some Restrictive Aspects of Perfect Syntax. Paper presented at the University of Stuttgart.
Brody, M. & M. R. Manzini (1988). On Implicit Arguments. In Ruth Kempson (ed.). Mental Representations: the interface between language and reality. Cambridge: Cambridge University Press.
Burzio, L. (1986). Italian Syntax. Dordrecht: Kluwer.
Chomsky, N. (1975 [1955]). The Logical Structure of Linguistic Theory. Cambridge, Mass.: The MIT Press.
Chomsky, N. (1981). Lectures on Government and Binding. Dordrecht: Foris.
Chomsky, N. (1982). Some Concepts and Consequences of the Theory of Government and Binding. Cambridge, Mass.: MIT Press.
Chomsky, N. (1986). Knowledge of Language. New York: Praeger.
Chomsky, N. (1995). The Minimalist Program. Cambridge, Mass.: MIT Press.
Chomsky, N. (1998). Minimalist Inquiries: the Framework. Ms., MIT.
Chomsky, N. & H. Lasnik (1995). The theory of Principles and Parameters. In N. Chomsky. The Minimalist Program. Cambridge, Mass.: MIT Press.
Cinque, G. (1990). Types of A'-dependencies. Cambridge, Mass.: The MIT Press.
Cormack, A. (1998). Definitions: Implications for syntax, semantics and the language of thought. New York: Garland.
Enç, M. (1987). Anchoring conditions for Tense. Linguistic Inquiry 18: 633-657.
Hale, K. & S. J. Keyser (1993). On Argument Structure and the Lexical Expression of Syntactic Relations. In K. Hale & S. J. Keyser (eds.). A View from Building 20. 53-109. Cambridge, Mass.: MIT Press.
Hornstein, N. (1996). On Control. Ms., University of Maryland.
Hornstein, N. (1997). Control in GB and Minimalism. GLOT 8.2: 3-6.
Jacobson, P. (1992). Raising without movement. In R. Larson, S. Iatridou, U. Lahiri & J. Higginbotham (eds.). Control and Grammar. 149-194. Dordrecht: Kluwer.
Johnson, K. (1991). Object positions. Natural Language and Linguistic Theory 9: 577-636.
Kayne, R. (1984). Connectedness and Binary Branching. Dordrecht: Foris.
Kayne, R. (1994). The Antisymmetry of Syntax. Cambridge, Mass.: The MIT Press.
Kayne, R. (1998). Talk given at the University of Florence.
Larson, R. (1988). On the double object construction. Linguistic Inquiry 19: 335-391.
Larson, R. (1991). Promise and the Theory of Control. Linguistic Inquiry 22: 103-139.
Longobardi, G. (1984). Connectedness and island constraints. In J. Guéron et al. (eds.). Grammatical representation. 169-185. Dordrecht: Foris.
Manzini, M. R. (1983). On Control and Control Theory. Linguistic Inquiry 14: 421-446.
Manzini, M. R. (1992). Locality. Cambridge, Mass.: The MIT Press.
Manzini, M. R. (1994). Locality, minimalism and parasitic gaps. Linguistic Inquiry 25: 481-508.
Manzini, M. R. (1995). The position of adjuncts: A reply to Kayne. Quaderni del Dipartimento di Linguistica dell'Università di Firenze 6.
Manzini, M. R. (1997). A minimalist theory of weak islands. In Peter Culicover and Louise McNally (eds.). The Limits of Syntax (Syntax and Semantics). New York: Academic Press.
Manzini, M. R. (in press). Dependencies, phrase structure and extractions. In D. Adger et al. (eds.). Specifiers: Minimalist approaches. 188-205. Oxford: Clarendon Press.
Manzini, M. R. (forthcoming). Sentential complementation: The subjunctive. In P. Coopmans, M. Everaert & J. Grimshaw (eds.). Lexical selection and lexical insertions. HIL.
Manzini, M. R. & L. M. Savoia (1998). Clitics and auxiliary choice in Italian dialects: Their relevance for the Person ergativity split. Recherches Linguistiques à Vincennes.
Manzini, M. R. & L. M. Savoia (forthcoming). Parameters of subject inflection in Italian dialects. In P. Svenonius (ed.). Subjects, Expletives and the EPP. Oxford: Oxford University Press.
Martin, R. (1996). A minimalist theory of PRO and control. Ph.D. dissertation, University of Connecticut, Storrs.
Nash, L. & A. Rouveret (1997). Proxy categories in Phrase Structure theory. Ms., University of Paris VIII.
O'Neil, J. (1995). Out of Control. Proceedings of NELS 25. GLSA, University of Massachusetts, Amherst.
Pica, P. (1987). On the Nature of the Reflexivization Cycle. Proceedings of NELS 17. GLSA, University of Massachusetts, Amherst.
Platzack, Ch. (1995). Null subjects, weak Agr and syntactic differences in Scandinavian. Working Papers in Scandinavian Syntax 53: 85-106.
Pollock, J.-Y. (1996). Langage et cognition: Introduction au programme minimaliste de la grammaire générative. Paris: Presses Universitaires de France.
Postal, P. & G. Pullum (1982). The contraction debate. Linguistic Inquiry 13: 122-138.
Reinhart, T. & E. Reuland (1993). Reflexivity. Linguistic Inquiry 24: 657-720.
Rizzi, L. (1990). Relativized Minimality. Cambridge, Mass.: The MIT Press.
Roberts, I. & A. Roussou (1997). Interface Interpretation. Ms., University of Stuttgart and University of Wales, Bangor.
Roberts, I. & A. Roussou (forthcoming). The EPP as a Condition on the T-Dependency. In P. Svenonius (ed.). Subjects, Expletives and the EPP. Oxford: Oxford University Press.
Rosenbaum, P. (1967). The Grammar of English Predicate Complement Constructions. Cambridge, Mass.: MIT Press.
Roussou, A. (forthcoming). Control and Raising in and out of Subjunctive Complements. In B. Joseph, A. Ralli & M.-L. Rivero (eds.). Oxford: Oxford University Press.
Salles, H. (1997). Prepositions and the Syntax of Complementation. Doctoral dissertation, University of Wales, Bangor.
Sells, P. (1987). Aspects of Logophoricity. Linguistic Inquiry 18: 445-480.
Sportiche, D. (1988). A theory of floating quantifiers and its corollaries for constituent structure. Linguistic Inquiry 19: 425-449.
Sportiche, D. (1996). A-Reconstruction. Paper presented at Rutgers University.
Stowell, T. (1982). The Tense of Infinitives. Linguistic Inquiry 13: 561-570.
Tenny, C. (1994). Aspectual Roles and the Syntax-Semantics Interface. Dordrecht: Kluwer.
Williams, E. (1974). Rule Ordering in Syntax. Doctoral dissertation, MIT, Cambridge, Mass.
Williams, E. (1980). Predication. Linguistic Inquiry 11: 208-238.
Williams, E. (1992). Adjunct Control. In R. Larson, S. Iatridou, U. Lahiri & J. Higginbotham (eds.). Control and Grammar. 297-332. Dordrecht: Kluwer.
