Topic Continuity in Discourse: A Quantitative Cross-language Study
Talmy Givón (Ed.)
The functional notion of “topic” or “topicality” has traditionally suffered from two distinct drawbacks. First, it has remained largely ill-defined or only intuitively defined. Second, its definition has quite often boiled down to structure-dependent circularity. This volume represents a major departure from past practices, without rejecting either their intuitive appeal or the many good results they yielded. First, “topic” and “topicality” are re-analyzed as a scalar property, rather than as an either/or discrete prime. Second, the graded property of “topicality” is firmly connected with sensible cognitive notions culled from gestalt psychology, such as “predictability” or “continuity”. Third, we develop and utilize precise measures and quantified methods by which the “topicality” of clausal arguments can be studied in connected discourse, and thus be properly anchored in its rightful context, that of topic identification, maintenance and recoverability in discourse. Fourth, we show that many grammatical phenomena which linguists used to study in isolation all partake in one functional domain of grammar, that of topic identification. Finally, we demonstrate the validity of this new approach to the study of “topic” and “topicality” by applying the same text-based quantifying method to actual texts in a number of typologically diverse languages. The languages studied here are written and spoken English, spoken Spanish, Biblical Hebrew, Amharic, Hausa, Japanese, Chamorro and Ute.
TYPOLOGICAL STUDIES IN LANGUAGE (TSL)
A companion series to the journal "STUDIES IN LANGUAGE"

Honorary Editor: Joseph H. Greenberg
General Editor: T. Givón
Editorial Board: Alton Becker (Michigan), Wallace Chafe (Berkeley), Bernard Comrie (Los Angeles), Gerard Diffloth (Chicago), R.M.W. Dixon (Canberra), John Haiman (Winnipeg), Paul Hopper (Binghamton), Margaret Langdon (San Diego), Charles Li (Santa Barbara), Johanna Nichols (Berkeley), Andrew Pawley (Auckland), Frans Plank (Hanover), Dan Slobin (Berkeley), Sandra Thompson (Los Angeles)

Volumes in this series will be functionally and typologically oriented, covering specific topics in language by collecting together data from a wide variety of languages and language typologies. The orientation of the volumes will be substantive rather than formal, with the aim of investigating universals of human language via as broadly defined a data base as possible, leaning toward cross-linguistic, diachronic, developmental and live-discourse data. The series is, in spirit as well as in fact, a continuation of the tradition initiated by C. Li (Word Order and Word Order Change, Subject and Topic, Mechanisms for Syntactic Change) and continued by T. Givón (Discourse and Syntax) and P. Hopper (Tense and Aspect: Between Semantics and Pragmatics).

Volume 3
T. Givón (ed.)
TOPIC CONTINUITY IN DISCOURSE: A QUANTITATIVE CROSS-LANGUAGE STUDY

edited by T. GIVÓN
Linguistics Department, University of Oregon, Eugene
and Ute Language Program, Southern Ute Tribe, Ignacio, Colorado

JOHN BENJAMINS PUBLISHING COMPANY
Amsterdam/Philadelphia
1983

The paper used in this publication meets the minimum requirements of the American National Standard for Information Sciences – Permanence of Paper for Printed Library Materials, ANSI Z39.48-1984.
Library of Congress Cataloging-in-Publication Data
Topic Continuity in Discourse: A quantitative cross-language study / T. Givón.
p. cm. (Typological Studies in Language, ISSN 0167-7373; v. 3)
Includes bibliographical references and index.
1. Discourse analysis. 2. Grammar, Comparative and general -- Topic and comment. I. Givón, Talmy, 1936-.
P302 .T66x 1983 85673162
ISBN 978 90 272 2867 3 (Hb; alk. paper)
ISBN 978 90 272 2863 5 (Pb; alk. paper)
ISBN 978 90 272 8025 1 (Eb)
© 1983 – John Benjamins B.V.
No part of this book may be reproduced in any form, by print, photoprint, microfilm, or any other means, without written permission from the publisher.
John Benjamins Publishing Co. · P.O. Box 36224 · 1020 ME Amsterdam · The Netherlands
John Benjamins North America · P.O. Box 27519 · Philadelphia PA 19118-0519 · USA

TABLE OF CONTENTS
1. TOPIC CONTINUITY IN DISCOURSE: AN INTRODUCTION
   T. Givón
2. TOPIC CONTINUITY IN JAPANESE
   J. Hinds
3. TOPIC CONTINUITY IN WRITTEN AMHARIC NARRATIVE
   M. Gasser
4. TOPIC CONTINUITY AND WORD-ORDER PRAGMATICS IN UTE
   T. Givón
5. TOPIC CONTINUITY IN BIBLICAL HEBREW NARRATIVE
   A. Fox
6. TOPIC CONTINUITY AND DISCONTINUITY IN DISCOURSE: A STUDY OF SPOKEN LATIN-AMERICAN SPANISH
   P. Bentivoglio
7. TOPIC CONTINUITY IN WRITTEN ENGLISH NARRATIVE
   C. Brown
8. TOPIC CONTINUITY IN SPOKEN ENGLISH
   T. Givón
9. SOME DIMENSIONS OF TOPIC-NP CONTINUITY IN HAUSA NARRATIVE
   P. Jaggar
10. TOPIC CONTINUITY AND THE VOICING SYSTEM OF AN ERGATIVE LANGUAGE: CHAMORRO
   A. Cooreman
Index of Names

TOPIC CONTINUITY IN DISCOURSE: AN INTRODUCTION
T. GIVÓN
Linguistics Department, University of Oregon, Eugene
and Ute Language Program, Southern Ute Tribe, Ignacio, Colorado

TABLE OF CONTENTS
1. The 'topic' strand: Micro traditions
2. The 'paragraph' strand: Macro traditions
3. Major topic functions within the thematic paragraph
4. The discourse file: Topic availability to the hearer
5. Factors affecting topic availability
6. Discourse measurements of topic continuity
   6.1. Referential distance ('look-back')
   6.2. Potential interference ('ambiguity')
   6.3. Persistence ('decay')
7. The grammatical coding of topic continuity
   7.1. Preliminaries: Functional domains and syntactic coding
   7.2. Scales in the coding of topic accessibility
      7.2.1. The scale of phonological size
      7.2.2. The word-order scale
      7.2.3. The scale of roles and animacy
   7.3. Topicality and passive vs. active
   7.4. Topic continuity and main vs. subordinate clauses
   7.5. Referential-indefinite NP's and existential-presentative devices
   7.6. Topic continuity and definite-marking morphology
   7.7. The use of restrictive modifiers
8. The studies in this volume
9. Typological predictions in the grammar of topic continuity
   9.1. Zero anaphora, pronouns and agreement
   9.2. Word-order variation and dislocations
      9.2.1. Languages with rigid word-order
      9.2.2. Languages with pragmatically-flexible word-order
   9.3. Indefinites and existential-presentative constructions
   9.4. Topic continuity and morphology
Notes
References

1. The 'topic' strand: Micro traditions

The intuition, expressed under whatever terminology, which led to shifting the attention of the linguist from the purely structural notion of 'subject' toward the more discourse-functional notion of 'topic', or under some other guises 'theme', may be traced back to a number of sources, among which I find myself disinclined to apportion historical primacy. The sources most of us who became involved with the renascent 'topic movement' in the early nineteen seventies tended (and still tend) to cite more often were either the Prague School (cf.
Firbas, 1966a, 1966b), the Firthian tradition (cf. Halliday, 1967) or Bolinger (1952, 1954). In one form or another, the various strands of this tradition tended to divide sentences ('clauses') into two distinct components, one of them the 'focus' ('rheme', 'comment', 'new information'), the other the 'topic' ('theme', 'old information'). And it was the second, the topic, which all early practitioners would then link to discourse structure, communicative intent, communicative dynamism, functional sentence perspective etc., in ways that tended to be often both vague and mysterious.1

In the early 1970's, when a number of us became involved in studying the phenomena of 'topic' and 'subject' (cf. Hawkinson and Hyman, 1974, Li, (ed.) 1976, inter alia), we tended to incorporate uncritically our predecessors' view of 'topic' as an atomic, discrete entity, a single constituent of the clause. When we worried about the relation between 'topic' and 'subject', in one way or another we gravitated toward viewing the subject as grammaticalized topic (cf. Givón, 1976a). And some of us went further and proposed typologies, whereby 'topic prominent languages' exhibited paucity in such grammaticalization, while 'subject prominent languages' displayed richer grammaticalization of 'topic', which soon turned out to mean morphologization (cf. Li and Thompson, 1975).

But already then, there was a range of rather recalcitrant data which suggested that, at least at the functional level, 'topic' was not an atomic, discrete entity. One could consider, for example, the commonplace phenomena of R- and L-dislocation, as in:

(1) a. L-dislocation: John, we saw him yesterday
    b. R-dislocation: We saw him yesterday, John
    c. Simplex: We saw John yesterday

In both (1a,b) above, the conventional wisdom went, 'John' is the topic. But then, what was the status of the subject 'we'? Obviously, the clause could have more than one 'topic', one grammaticalized as 'subject', the other of a different status yet to be elucidated. The problem is further compounded when the dislocated constituent is coreferential with the grammatical subject, as in:

(2) a. L-dislocation: John, he came yesterday
    b. R-dislocation: He came yesterday, John

Here the same argument is both 'topic' and 'subject'. Is it carrying a double function? What function? How defined?

Just as disturbing were the data concerning dative-shifting ('promotion to direct object'), when it became clear (cf. Givón, 1975, 1979, Ch. 4, further expanded in Givón, 1981a; see also Shir, 1979) that the direct object is also, in some clear way demonstrable by both syntactic and discourse-pragmatic tests, a 'topic' case in the sentence, albeit perhaps a secondary grammaticalized topic, as against the primary one, the subject. So that in sentences such as:

(3) a. John gave the book to Mary
    b. John gave Mary the book

there existed actually three different 'topics', perhaps hierarchized by degree. But degree of what? Importance? Topicality?

The data pertaining to the relation between 'topic', 'definite NP', 'pronoun', 'agreement' and 'zero anaphora' were not initially considered overly damaging to the concept of topic as a functional prime. Eventually, however, their import was bound to sink in. Consider, for example, the various expressions of the subject-topic NP in the following:

(4) a. ...(he came in) and ø sat down... (zero anaphora)
    b. ...(he came in;) he then sat down... (unstressed pronoun)
    c. ...(she came in;) then hé joined her... (stressed pronoun)
    d. ...(the woman came in;) then the man joined her... (definite NP)
    e. ...now the man, he never joined... (L-dislocated NP)

In each example in (4), the italicized NP is subject, topic and definite. In each, however, it seems to perform a different discourse function. But how defined? And how related to 'topic' or 'topicality'?
Several things had then become evident, and received less than consistent expression in some of my own earlier excursions into the subject:

(a) We were dealing with a non-discrete entity of 'topicality', or at best a multi-point scale (cf. Givón, 1975, 1976a, 1976b); or perhaps
(b) We were dealing with a functional dimension, or a number of dimensions, for which the notion 'topicality' was rapidly losing explanatory power.

Experimenting with labels of greater functional (and hopefully psychological) import, I toyed briefly with 'degree of presuppositionality' (Givón, 1978, 1979 Ch. 2), which translated itself rather naturally via 'degree of backgroundiness' into degree of predictability and thus continuity of topic NP's. And those in turn translate, at least eventually when all the empirical dust is raised and then allowed to settle, into a performance dimension of topic accessibility (cf. Givón 1978, 1979 Ch. 2). This is, in brief, one strand of the antecedence of this volume. Like other myopic views of a tapestry yet to be completed, it remains to some extent parochial.2

2. The 'paragraph' strand: Macro traditions

The clause ('sentence') is the basic information processing unit in human discourse. A word may have 'meaning', but only the proposition, grammaticalized as clause, carries information. Human discourse, further, is multipropositional. Within it, chains of clauses are combined into larger thematic units which one may call thematic paragraphs (cf., under various terminological guises, Longacre 1976, 1979, Hinds 1979, Chafe 1979, inter alia). These may further combine into yet larger discourse units (such as 'paragraphs', 'sections', 'chapters', 'parts' or 'stories'). The thematic paragraph is the most immediately relevant level of discourse within which one can begin to discuss the complex process of continuity in discourse.
There are, broadly, three major aspects of discourse continuity which are displayed in or mediated through the thematic paragraph, and which in turn receive structural/grammatical/syntactic expression within the clause. These three continuities thus bridge the gap between the macro and micro organizational levels of language.

(a) Thematic continuity
(b) Action continuity
(c) Topics/participants continuity

While this volume deals primarily with the most concrete of the three, (c), the three are nonetheless deeply interconnected within the thematic paragraph.

Thematic continuity is the overall matrix for all other continuities in the discourse. It is the hardest to specify, yet it is clearly and demonstrably there. Statistically, it coincides with topic and action continuity to quite an extent within the thematic paragraph. The thematic paragraph is by definition about the same theme. Most commonly it also preserves topic and action continuity. However, topics/participants may change within the discourse without necessarily changing either action continuity or theme continuity. And action continuity may change without necessarily changing thematic continuity. One is perhaps justified in viewing the three as an implicational hierarchy (or 'inclusion set'):

(5) THEME > ACTION > TOPICS/PARTICIPANTS

Finally, since the 'theme' is the most nebulous, macro-oriented entity out of the three, it is only to be expected that its structural expression in the 'grammar' is the most weakly coded.3 Thematic continuity is most commonly coded, if at all, via conjunction or clause-subordination particles in the SVO or VSO typology (of, say, English), or via verb-final or clause-chain final suffixes in the strict SOV typology (of, say, Japanese or the New Guinea Highlands).

Action continuity pertains primarily to temporal sequentiality within the thematic paragraph, but also to temporal adjacency therein.
Most commonly, within a thematic paragraph actions are given primarily in the natural sequential order in which they actually occurred, and most commonly there is little if any temporal gap, or pause, between one action and the next.4 In the grammar/syntax, which is primarily (though not exclusively) a clause-level coding instrument, action continuity receives its expression strongly and universally via the tense-aspect-modality sub-system most commonly attached to the verbal word (cf. Hopper, 1979, Givón, 1977, 1982a, 1983, ch. 8).

The functional domain of topic/participant continuity is the main concern of this volume. It is linked to the thematic paragraph in a statistically significant but not absolute fashion: Within the thematic paragraph it is most common for one topic to be the continuity marker, the leitmotif, so that it is the participant most crucially involved in the action sequence running through the paragraph; it is the participant most closely associated with the higher-level 'theme' of the paragraph; and finally, it is the participant most likely to be coded as the primary topic, or grammatical subject, of the vast majority of sequentially-ordered clauses/sentences comprising the thematic paragraph. It is thus, obviously, the most continuous of all the topics mentioned in the various clauses in the paragraph. The grammatical sub-system which codes clause-level topics, or the structural correlates of the functional domain of topic identification, topic maintenance and topic continuity in discourse, is the main focus of the various studies in this volume.

3. Major topic functions within the thematic paragraph

If the thematic paragraph is indeed a chain of equi-topic clauses, i.e. a string of clauses whose main/primary topic remains the same, then one could perceive an initial division of main topics into three major types according to their position within the paragraph.
One would, further, expect the grammar/syntax to code these three main functions in some fashion. The three are:

(a) Chain initial topic:
    (i) Characteristically a newly-introduced, newly-changed or newly-returned topic; thus
    (ii) Characteristically a discontinuous topic in terms of the preceding discourse context; but
    (iii) Potentially, if an important topic, a rather persistent topic in terms of the succeeding discourse context.
(b) Chain medial topic:
    (i) Characteristically a continuing/continuous topic in terms of the preceding discourse context; and also
    (ii) Characteristically persistent, but not maximally so, in terms of the succeeding discourse context, even when an important topic.
(c) Chain final topic:
    (i) Characteristically a continuing/continuous topic in terms of the preceding discourse context; but
    (ii) Characteristically a non-persistent topic in terms of the succeeding discourse context, even if an important topic.

As we shall see further below, these general, indeed almost a priori considerations are intimately involved in the kind of predictions one may make concerning the discourse behavior of various topic-marking devices, as well as the type of measurements one may devise to identify them.

4. The discourse file: Topic availability to the hearer

Linguists traditionally deal with the binary distinction between definite and indefinite, with the former marking topics which the speaker assumes the hearer can identify uniquely and is familiar with, and which are within his file (or register) and thus available for quick retrieval. On the other hand, indefinites are presumably topics introduced by the speaker for the first time, with which the hearer is not familiar, which therefore are not available to the hearer readily in his file, and for which he thus has to open the initial file. In terms of the topic functions within the thematic paragraph, section 3.
above, one may say that paragraph-medial (b) and paragraph-final (c) topics must both be definite. But paragraph-initial topics may be either definite (already identified to the hearer at some prior time, by whatever means) or indefinite.

If the discourse file has psychological reality as an internal filing system, it is legitimate to ask whether it is a permanent file (long term memory), a temporary file (short term memory), or perhaps a double filing system involving both. When the structural/syntactic/morphological coding system of a language codes strongly the definite/indefinite distinction, that constitutes one type of evidence supporting the existence of a permanent file. On the other hand, when the coding system of the language treats in the same fashion indefinite topics introduced for the first time and definite topics brought back into the discourse register after a considerable gap of absence, that constitutes some evidence for the register being a short-term, temporary file, where a gap of absence of a certain length precipitates erasure from the file. As we shall see throughout this volume, languages present syntactic and discourse evidence in support of both filing systems.

There is enough cross-language evidence, further, to suggest that some important topics are in the file permanently, and are thus always available to speakers/hearers as part of their generic firmament. These are most typically unique important features of the universe, such as the sun, the moon, the world etc. They are also inalienably possessed body parts ('my head') or kinship terms ('my mother'). And they are most typically names ('Johnny'). What one observes about these permanently-filed topics and their discourse behavior is that they are much less predictable than other definite topics in terms of their position within the thematic paragraph.
They thus often constitute exceptions to the text measurements that reveal the rules which govern the discourse distribution of topics that are not filed as permanently and as uniquely.

5. Factors affecting topic availability

The discourse measurements performed in the various studies in this volume are derived from certain assumptions concerning what may reasonably affect the degree of difficulty that speakers/hearers may experience in identifying a topic in discourse, i.e. in filing it appropriately in their internal register, so that predications or new information transmitted about those topics would in turn be addressed correctly. The major factors are listed as follows:

(a) Length of absence from the register: If a topic is indefinite and thus introduced for the first time, it is maximally difficult to process, by definition, since a new file has to be opened for it. If a topic is definite and returns to the register after a long gap of absence, it is still difficult to process. The shorter the gap of absence, the easier topic identification is; so that a topic that was there in the preceding clause is by definition easiest to identify and file correctly.

(b) Potential interference from other topics: If no other topics are present in the immediately preceding discourse environment, i.e. the short-term erasable file, topic identification is easiest. The more other topics are present in the immediate register, the more difficult is the task of correct identification and filing of a topic, especially if those other topics qualify semantically (in terms of their 'selectional restrictions') for the role within the clause which the topic in question occupies.

(c) Availability of semantic information:
Especially when other topics clutter the immediate register and may thus create potential interference and difficulties in topic identification, the availability of so-called 'redundant' semantic information within the clause in question may play an important role in facilitating topic identification. This information comes primarily from the predicate of the clause, less so from verb-phrase adverbials (in particular of manner), less so from other topics/participants of the clause. This information concerns generic probabilities that a particular topic could participate in the clause in the specific semantic/grammatical role in question (i.e. as subject, agent, patient, recipient etc.).

(d) Availability of thematic information: Much like generic semantic information about permanent likelihoods, thematic information available from the preceding discourse could help in topic identification, especially when other topics in the register may potentially interfere. Such information establishes specific probabilities (for this story, in this chapter, in this section or in this thematic paragraph) as to the topic identification within a particular clause and in a particular role. It also establishes, for particular discourses, some ranking of importance of the various topics/participants, and thus affects their behavior in terms of the permanent file, at least as pertaining to the particular discourse.

The discourse measurements developed in this study are based at least in part on the four factors outlined above. In particular, we have attempted to assess the more concrete and more readily measurable factors (a) and (b). The fact that it is not yet possible to quantify rigorously factors (c) and (d) in spite of their undeniable importance creates a certain degree of indeterminacy in the results, so that correlations between grammatical devices and particular measurements appear to be less than categorial.
They are nevertheless dramatic, with residues that are important but not devastating. The role of semantic and thematic information in topic identification is going to remain an imponderable for a while, together with the more elusive role of personality and memory of speakers and hearers, their specific life experience and the more subtle assumptions they make about each other and their respective abilities to identify referents specifically as well as in general. That our correlations are as strong as they are merely suggests that over large bodies of texts of rather diverse types, the role of the less obvious factors affecting the grammar of topic identification is less than dominant, while the role of the more easily measurable factors is in some sense decisive.5

6. Discourse measurements of topic continuity

So far I have talked primarily in terms of topic availability or identification. The approach pursued in this volume is to some extent speaker-hearer neutral, but nevertheless is couched in terms of assumptions that the speaker makes about topic-availability to the hearer. The transition from "availability" or "identifiability" to the more neutral "continuity" is motivated by certain assumptions concerning gestalt psychology:

(6) a. "What is continuing is more predictable"
    b. "What is predictable is easier to process"
    or conversely
    c. "What is discontinuous or disruptive is less predictable"
    d. "What is less predictable, hence surprising, is harder to process"

Since our measurements are performed on texts rather than on speakers or hearers, assumptions such as (6) are fundamental in a certain chain of reasoning and empirical justification. The text itself does not reveal the assumptions made by speakers or hearers as to topic identifiability in a direct way, nor does it reveal the ease or difficulty they experience in processing and filing topics in discourse.
The text reveals, however, two types of information which in this study we have endeavored to correlate:

(i) The grammatical, 'purely linguistic' devices used by the speaker to code various topics/participants in the discourse; and
(ii) The exact position of those topics in the discourse, in terms of thematic paragraph structure, distance from last previous appearance, the clustering with potential other interfering topics, and persistence in subsequent discourse context.

Hopefully, once stable, strong and cross-linguistically viable correlations are established between these two types of information available to the linguist in his/her capacities of grammarian and discourse analyst, one may proceed to the obvious next step, that of correlating the grammatical and discourse-distribution data with psycho-linguistic experimentation and measurement.

In the following sub-sections I will describe briefly the three main discourse measurements to which texts were subjected in this study. Previous works attempting to treat these topics cross-linguistically were either devoid of quantification or did not attempt to impose the same methodology across the sample of languages and topics (see e.g. Hinds, ed., 1978 or Givón, ed., 1979, inter alia). This study is thus to quite an extent unique, although it clearly draws on inspired guesses, intuitions and insights gained from previous and less rigorous work.

6.1. Referential distance ('look-back')

This measurement assesses the gap between the previous occurrence in the discourse of a referent/topic and its current occurrence in a clause, where it is marked by a particular grammatical coding device. The gap is thus expressed in terms of number of clauses to the left. The minimal value that can be assigned is thus 1 clause, which is maximally continuous.
Since it is impossible to deal adequately with infinity, and since there are grounds for suspecting that the erasable, short-term file is the crucial psychological correlate of this measurement,6 one must impose a maximal integer on topics whose referential gap exceeds a certain range. In this study I have chosen to impose that arbitrary upper bound at 20 clauses to the left. When a topic does not appear within that range, the value of 20 is automatically assigned and scanning discontinued. This means that referential indefinite topics are assigned this maximal value by definition. 'Presence in the register' at some preceding point is not necessarily overt, but may also be represented by a zero anaphor, provided the topic/referent is indeed a semantic argument of the predicate of the clause.

6.2. Potential interference ('ambiguity')

This measurement assesses the disruptive effect which other referents within the immediately preceding register may have on topic availability or identification within a clause. In order to minimize an obvious correlation with referential distance,7 the "immediately preceding register" was defined arbitrarily as between 1 and 5 clauses to the left, most commonly 3 clauses to the left in the studies in this volume. This is based on the assumption that if a topic has already occupied a dominant/continuous position and unambiguous identification within the last 3 clauses in the register, the presence of other, potentially-interfering topics further away in the preceding register does not interfere as significantly with the task of topic identification.8

This measurement was further mitigated by the factor of semantic compatibility with the predicate of the relevant clause: An interfering topic was counted only if it was just as semantically compatible (most commonly in terms of animacy, humanity, agentivity or semantic plausibility as object or subject) with the predicate of the clause as the topic under consideration.
This measurement thus combines an assessment of both factors (b) and (c) in section 5 above, although (c) is probably less dominant.

If no potentially interfering referent was found within the relevant distance, the value of 1 was assigned. If one or more referents were found within the relevant distance, the value of 2 was assigned. Only in some of the studies were higher values, commonly not exceeding 3, assigned.

6.3. Persistence ('decay')

The first two measurements outlined above relate in an obvious fashion to our postulated 'continuity' and 'identifiability'. Both involve the preceding discourse context; both would hopefully correlate, in some fashion, to the hearer's task of identifying referents and filing them in the register. Our third measure, that of topic persistence in subsequent discourse, is of a different kind. Most directly it is a reflection of the topic's importance in the discourse, and thus a measure of the speaker's topical intent. We will accept as self-evident, by definition, the assumption that:

(7) "More important discourse topics appear more frequently in the register, i.e. they have a higher probability of persisting longer in the register after a relevant measuring point".

Assumption (7) also justifies, at least in part, treating topics that are highly continuous in terms of low referential distance as potentially important discourse topics. But as will be shown shortly, this correlation is not as strong.

In this study, we measure persistence in terms of the number of clauses to the right (i.e. in subsequent discourse from the measured clause) in which the topic/participant continues an uninterrupted presence as a semantic argument of the clause, an argument of whatever role and marked by whatever grammatical means. The minimal value that can be assigned is thus zero, signifying an argument that decays immediately, i.e. of the lowest persistence. There is no maximal value assigned by definition in this case.
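The three measurements of sections 6.1 through 6.3 can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the procedure actually used in the studies: a text is encoded (hypothetically) as a list of clauses, each clause being the set of referents that are its semantic arguments, zero anaphors included; the function names are invented here; semantic compatibility is supplied by the analyst as a function `compatible`; and the value 2 for "one or more interfering referents" is the reading assumed in this sketch.

```python
from typing import Callable, List, Set

# Hypothetical encoding: each clause is the set of referents that are its
# semantic arguments (overt or zero anaphora).
Clause = Set[str]

MAX_DISTANCE = 20  # arbitrary upper bound on look-back (section 6.1)


def referential_distance(clauses: List[Clause], i: int, topic: str,
                         cap: int = MAX_DISTANCE) -> int:
    """Clauses to the left between clause i and the previous occurrence of topic."""
    for distance in range(1, cap + 1):
        if i - distance < 0:
            break  # reached the start of the text without finding the topic
        if topic in clauses[i - distance]:
            return distance
    return cap  # not found within range; first mentions land here too


def potential_interference(clauses: List[Clause], i: int, topic: str,
                           compatible: Callable[[str, str], bool],
                           window: int = 3) -> int:
    """1 if no compatible competitor occurs in the preceding window, else 2.

    The value 2 is an assumption of this sketch.
    """
    start = max(0, i - window)
    for clause in clauses[start:i]:
        for other in clause:
            if other != topic and compatible(other, topic):
                return 2
    return 1


def persistence(clauses: List[Clause], i: int, topic: str) -> int:
    """Number of immediately following clauses in which topic stays an argument."""
    count = 0
    for clause in clauses[i + 1:]:
        if topic not in clause:
            break  # presence interrupted: stop counting
        count += 1
    return count


# Toy text of five clauses; the compatibility test compares animacy only.
text = [{"man"}, {"man", "woman"}, {"man"}, {"woman"}, {"man", "dog"}]
humans = {"man", "woman"}


def same_animacy(a: str, b: str) -> bool:
    return (a in humans) == (b in humans)
```

On the toy text, 'man' in the last clause has a referential distance of 2 and an interference value of 2 (because of 'woman'), while 'dog', a first mention, receives the maximal distance of 20. A real study would need a much richer compatibility test than the animacy check used here, since factor (c) of section 5 involves the predicate of the clause as well.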
As one could guess following the discussion of thematic paragraphs and equi-topic chains in section 3. above, there is a partially predictable relation between our measures of referential distance (6.1.) and persistence (6.3.). This relation, as pertaining to topics of relatively high importance in the discourse, may be given as in (8) below:

(8)  topic position      referential distance       persistence
     within paragraph
     initial             high (low continuity)      high (high continuity)
     medial              low (high continuity)      medium (medium continuity)
     final               low (high continuity)      low (low continuity)

7. The grammatical coding of topic continuity

7.1. Preliminaries: Functional domains and syntactic coding

In the approach to the study of human language pursued here, it is a fundamental assumption that there exists a systematic correlation between message and code. That the correlation is never perfect is a fact that those of us endeavoring to study the use of language in communication, rather than its structural properties in the artificial medium of isolated 'sentences', have had to learn to live with, appreciate and slowly come to understand. The syntactic coding of discourse function, which is the bulk of the functional correlates to syntax,9 is imperfect but geared for a certain efficiency of processing, whereby the loss accruing in the cause of efficient processing is to a large extent offset by the omnipresence of the discourse context, referring particularly to:

(a) Generically shared knowledge coded in the culturally shared lexicon and known semantic likelihoods;
(b) Specifically shared knowledge of the particular discourse, what was said earlier and various inferences thereof including verbal or non-verbal feedback;
(c) Specifically shared knowledge of the particular speaker and hearer, what they know or tend to assume about each other, their respective knowledge, motivation and propensities, not excluding possible telepathy, however unlikely on general grounds.
Context thus plays a crucial role in allowing syntax to be an efficient processing device while retaining, when one ignores context, a less-than-perfect correlation between code and message (for further discussion see Givón, 1979, Ch. 5, 1982b).

As can be easily gathered from the preceding section, the area of topic identification in discourse is a complex functional domain rather than a simple 'function'. This is indeed typical of syntactic coding in general (cf. Givón, 1981b, 1983). If the complex functional domain under scrutiny here is provisionally termed "degree of topic accessibility", then it is clear that at least in some respect we are dealing here with a scalar, graded continuum. Such functional domains are quite common in language, in the message realms of lexicon, propositional semantics and discourse pragmatics (see discussion in Givón, 1980a, in reference to the semantics of verb complementation). Along a scalar functional domain, different languages may use a varying number of coding points, i.e. syntactic devices comprising word-order, morphology, intonation or their possible combinations. Some languages may either over-code or under-code the entire domain or sub-segments of it. This possibility may be schematically represented as in:

(9) [Schematic diagram: coding points arrayed along the functional domain]

To the extent that there exists any language-internal and cross-language predictability as to what coding devices ('syntactic constructions') are more likely to code what relative portions of the scalar domain, the coding points in (9) must represent an implicational hierarchy. One may not predict the coding density in any particular language, nor whether a particular language will actually have a particular coding device, nor even the exact functional point on the scale (or exact boundaries of a sub-section) to be coded by a particular device.
One could, however, predict with extreme accuracy that within any particular language the relative order of the coding points along the scale would be maintained. And thus that if a certain device X codes a certain functional point (or sub-section) of the domain, a certain device Y could only code a higher point on the domain, but never a lower point. This is indeed one of the most stable results demonstrated by this cross-language study.

7.2. Scales in the coding of topic accessibility

In earlier discussions of the syntactic coding of topic accessibility, I was inclined to take for granted that indeed we deal here ultimately with a single though complex scale, thus ranking the most common grammatical devices involved, cross-linguistically, in coding this domain along the following scale (cf. for example Givón, 1978, 1979, 1981b, 1982c):

(10) most continuous/accessible topic
       zero anaphora
       unstressed/bound pronouns or grammatical agreement
       stressed/independent pronouns
       R-dislocated DEF-NP's
       neutral-ordered DEF-NP's
       L-dislocated DEF-NP's
       Y-moved NP's ('contrastive topicalization')
       cleft/focus constructions
       referential indefinite NP's
     most discontinuous/inaccessible topic

While our cross-language studies largely uphold this scale, it is clear now that, to begin with, the scale is still too language-specific, and that better and typologically more relevant predictions can be made by recognizing a number of scales each reflecting some specific syntactic coding means (be those word-order, morphology, intonation or phonological size10) which alone or in various combinations make up the syntactic constructions that code our scalar domain. The discerning reader should detect by now the rather ambitious goal we have set ourselves: to define, in a preliminary but cross-linguistically stable fashion, the basic principles of iconicity underlying the syntactic coding of the topic identification domain.

7.2.1. The scale of phonological size

As is transparent from the overall scale in (10) above, the following subscale exists in the grammar of topic identification:

(11) more continuous/accessible topics
       zero anaphora
       unstressed/bound pronouns ('agreement')
       stressed/independent pronouns
       full NP's
     more discontinuous/inaccessible topics

The iconicity principle underlying this scale must be simple:

(12) "The more disruptive, surprising, discontinuous or hard to process a topic is, the more coding material must be assigned to it"

In turn, this may translate into a relatively sane psychological and indeed motor-behavior principle:11

(13) "Expend only as much energy on a task as is required for its performance"

The coding scale in (11) is only one expression of our phonological size scale. Another involves stress, and may be given in the following three sub-hierarchies:

(14) a. stressed pronouns > unstressed pronouns
     b. cleft/focus NP's > non-focus NP's
     c. Y-moved NP's > non-Y-moved NP's

In each case in (14), the more heavily stressed device to the left is used to code more discontinuous/inaccessible topics than the device to the right. And stress is merely one type of phonological material.

Finally, at least some studies in this volume12 demonstrate that NP's modified by restrictive modifiers code more discontinuous/less accessible topics than unmodified NP's. This must be a reflection of the phonological size scale, since obviously a modification increases the size of the NP.

7.2.2. The word-order scale

The gross overall scale in (10) already presents one prediction concerning the use of word-order to code topic continuity, i.e. the relative position of R-dislocation vs. L-dislocation. These two devices pertain to languages with rigid word-order, such as English (SVO) or Japanese (SOV). They are further found only in the informal, unplanned colloquial register of such languages (cf. Keenan, 1977, Givón, 1979, ch. 5).
The specific scalar prediction in such languages is thus:

(15) R-dislocation > neutral word-order > L-dislocation

whereby the left-most on the scale codes more continuous topics, the right-most more discontinuous ones. This scale is corroborated by the results reported here for colloquial English (Givón, in this volume) as well as by the results obtained in Givón (1982d) for Pidgin English spoken by Philippine (VO) and Korean (VO/OV) speakers in Hawaii. In addition, in languages with pragmatically-controlled flexible word-order, such as Spanish (SV/VS), Biblical Hebrew (SV/VS) or Ute (SV/VS, OV/VO), in this volume, a clearly related scale is evident:

(16) a. VS > SV
     b. VO > OV

The scales in (15) and (16) are quite transparently one and the same, and may be given as pertaining to the relative position of the topic vs. the comment, following Givón (1982d), as:

(17) COMMENT-TOPIC > TOPIC-COMMENT

again with the left-most element in each implicational scale coding more continuous topics, the right-most less continuous ones.

In order to demonstrate beyond a shadow of doubt the fundamental sameness of the word-order scales (15), (16) and (17), I would like to cite the numerical results pertaining to referential distance taken from several studies in this volume as well as from Givón (1982d). The results are presented in Table I, below. While the numerical values are not always the same, the relative ranking is amazingly consistent, with topic-comment orders in each case showing higher average referential distance values than comment-topic orders.

Finally, following Givón (1982d), it is possible to integrate the size universals in section 7.2.1.
with the word-order universals in this section into a single implicational scale:

(18) COMMENT > COMMENT-TOPIC > TOPIC-COMMENT > TOPIC
     (COMMENT alone = zero topic; TOPIC alone = zero comment)

One may then go on to suggest that the most obvious topics receive their coding as zero, the least obvious topics receive their coding as topic repetition (zero comment), and the comment-topic and topic-comment orders are merely the intermediate coding points between those two extremes. The whole scale thus abides by one simple psychological principle:

(19) "Attend first to the most urgent task"

When the topic is most obvious, making the comments is surely a more urgent task. When the topic is less obvious, establishing it is more urgent. In Pidgins and spoken registers (see Givón, 1982d as well as Givón's Spoken English paper, in this volume), when the topic is least available, thus most problematic, topic repetition is the preferred coding device.

The last comment to be added here is that both Y-movement (contrastive topicalization) and cleft-focus can be considered instances of more discontinuous/surprising topic constructions where the topic is placed to the left of the comment. Most commonly the source of the surprise/disruption here is not referential distance but rather the presence of other referents, as well as some element of counter-expectation.

TABLE I: Correlation between Referential Distance and Word-Order [definite NP's]
(average referential distance)

DATA SOURCE                                      TOPIC-COMMENT    NEUTRAL ORDER    COMMENT-TOPIC
                                                 (L-dislocation;  (if any)         (R-dislocation;
                                                 SV or OV)                         VS or VO)
(i) Dislocations
Philippino-English Pidgin (Givón, 1982d)         16.20            8.00             1.00
  [subject NP only]
Korean-English Pidgin (Givón, 1982d)             15.8             (11.86/8.23)     1.00
  [object NP only]
Spoken English (Givón, in this volume)           15.35            10.15            1.00
  [subject NP only]

(ii) Variation
(a) Object
Korean-English Pidgin (Givón, 1982d)             11.86                             8.23
Ute (Givón, in this volume)                      9.67                              4.46
(b) Subject
Ute (Givón, in this volume)                      13.31                             1.48
Spanish-English Pidgin (Givón, 1982d)            13.00                             3.42
Spoken Spanish (Bentivoglio, in this volume)
  unmodified NP:                                 3.41                              1.00
  modified NP:                                   9.24                              7.53
Biblical Hebrew (Fox, in this volume)            8.47                              4.83

7.2.3. The scale of roles and animacy

To many of us (cf. Hawkinson and Hyman, 1974, Givón, 1976a, inter alia) the hierarchy of the case-roles was our first acquaintance with the scalar concept of topicality. The semantic or grammatical case-roles, depending on one's orientation at the time, seemed to exhibit different propensities for becoming the 'topics' of clauses. The semantically based case-role hierarchy may be given as:

(19) AGT > DAT/BEN > ACC > OTHERS

Almost all languages13 have a grammaticalized subject case-role, singled out by word-order, morphology, intonation or their combination(s). Most though perhaps not all languages have some grammatical manifestations of a direct object, which I have argued elsewhere (Givón, 1979, Ch. 4 and more comprehensively in Givón, 1981a) is a "secondary topic" of clauses, with the subject then being the "primary topic". In most languages the coding of the direct object involves word-order, with the more topical direct object most commonly preceding all other objects.
In a sub-set of languages, such word-order coding is also accompanied by a morphological coding, most commonly assigning the morphologically-unmarked form of the accusative to "promoted" direct objects (which are semantically not accusative), but occasionally (cf. Nez Perce, see Rude, 1982) also by assigning marked accusative morphology to "promoted" direct objects. If one recognizes the direct object as a secondary topic, then the case-role hierarchy may be also expressed as a hierarchy of grammatical cases:

(20) SUBJ > DO > OTHERS

This hierarchy does not contradict the semantic-role hierarchy in (19), but rather incorporates it, given the following universal tendencies of human discourse:

(a) Agents tend to be made the clausal subjects in discourse; and
(b) When dative/benefactive objects are present, they tend to be promoted either obligatorily or in high frequency to direct object (see discussion in Givón, 1981a).

One strand running through all the papers in this study is a massive substantiation and amplification of both topic hierarchies, at least so far as the higher topicality of subjects and human/animate/agents is concerned. Unfortunately, the sample of languages involved does not allow us as comprehensive a treatment of the topicality status of the direct object, except in one paper (Cooreman, Chamorro) and there somewhat indirectly. What is shown there is that the anti-passive construction in Chamorro, which typically demotes accusatives from their DO status, also renders them of the lowest topicality, in terms of two of our continuity measures, referential distance and persistence. Further quantitative support of the same type may also be found in Rude (forthcoming), where a similar treatment is given to various grammatical constructions in Nez Perce. The cumulative effect of all these quantified studies is to demonstrate explicitly what exactly one means by the higher topicality of subjects vs. objects and direct objects vs.
all other objects.

7.3. Topicality and passive vs. active

Two studies in this volume, Fox (Biblical Hebrew) and Cooreman (Chamorro), deal with the topicality of passives vs. actives. In general, in languages where the passive is really a passive (rather than an incipient Ergative, see discussion in Givón, 1981b as well as text-studies in Hopper and Thompson, 1982 and Fox, 1982), the text frequency of passives is much lower than that of actives, somewhere between 5 and 20 percent of all main, affirmative, declarative clauses (for text counts in English see Givón, 1979, Ch. 2 and for Biblical Hebrew, Fox, in this volume). This by itself tags the passive as a discontinuous device in discourse, by virtue of its rarity. In addition, our measures also show that the subjects of passives tend to be more discontinuous than the subjects of actives. To some extent, however, this is a function of the text-rarity of passives.

Cooreman's study of Chamorro (in this volume) takes a somewhat different direction, showing that the topicality of non-agents, by our measurements, is much higher when they are subjects (of the passive) than when they are objects (of the middle-voice or Ergative constructions). And it is the lowest, as indicated above, when they appear in the anti-passive. While the passive is a complex, multi-dimensional functional domain (Givón, 1981b), it is clear that one of its dimensions overlaps, to quite an extent, with our domain of topic continuity and topic identification.

7.4. Topic continuity and main vs. subordinate clauses

The conventional wisdom in many discourse studies (see e.g. Givón, 1977, 1979, Ch. 2, Hopper, 1979, inter alia) used to be that main clauses carry the bulk of sequentially-ordered new information in discourse, and various subordinate clauses may carry discontinuous, non-sequential background information. While this is certainly true in some language types (cf. Biblical Hebrew, Givón, 1977), it is hardly the whole story.
And in fact, one type of so-called subordinate clause, the non-finite or participial type, tends to be used, often in long clause chains, as a typical subject/topic continuity device. Some examples may be brought from English to illustrate this potential:

(21) Purpose clauses: I did it to attract attention
(22) Participials:    a. Having finished, he left
                      b. Working hard and fast, he managed to plug the hole

One may argue that these examples are 'localized' and 'grammaticalized', but the total predictability of the identity of the subject, which not accidentally is marked by zero, is clearly an example of the coding of high topical continuity. Further, one could show that the devices in (22) could easily render English a true clause-chaining language. Thus consider:

(23) Working hard and not getting anywhere, trying again and again, marshalling all her ingenuity and internal resources yet finding the going rougher and rougher and getting progressively more frustrated, she finally conceded the obvious and gave up.

(24) Having seen the product and having decided to cancel the order, but having then had a change of heart, he took the day off.

While clause-chaining is a stylistic option in English (as well as in Amharic, see Gasser, in this volume), in some languages it is the main, perhaps only, expressive vehicle in continuous discourse. In languages such as English or Amharic, where other stylistic venues are open, non-finite clause-chaining usually involves obligatory equi-subject (equi-topic) conditions. In languages with no other stylistic alternative, while at the text-frequency level equi-subject predominates inside clause-chains, special provisions are made for switch-subject (switch reference) within such chains. As an example, consider the following passage from Chuave (Thurman, 1978):

(25) a. . . . meina i ne-ro
        money get eat-SS
        '. . . (I) took the money
     b. ena tekoi u-re
        then again come-SS
        then (I) came back
     c. iki moi-i-koro,
        house be-I-DS
        and I stayed home,
     d. tekoi u boi-n-goro,
        again call out-he-DS
        so then he sent for me again,
     e. inako de-ro
        return leave-SS
        and so (I) came back
     f. fu-i-goro
        go-I-DS
        and I went there
     g. tokoi numba lin-lin numba-i naro-Ø-m-e.
        again number one-one number-that give-PAST-he-DECLAR
        and again he made me foreman (of the work-line)'.

Of the entire passage above, only the final clause (25g) is finite in the sense of being marked for tense-aspect-modality and speech-act value. The SS (same subject) and DS (different subject) markers are obligatory in the non-finite chain which precedes, and they are anticipatory, alerting the hearer to subject change in the following clause. Two other characteristics of these markers are of interest. First, the SS marker is phonologically smaller than the DS marker. In some related languages (cf. Haiman, 1980) the SS is zero and the DS a phonologically-realized suffix. Further, in each case where the DS suffix is used above, the subject, which is about to be changed, is marked overtly with a pronoun. But when the SS suffix is used the subject, being more predictable, is marked by zero. These, I believe, are fairly transparent reflections of our phonological size iconicity principle (12), above.

7.5. Referential-indefinite NP's and existential-presentative devices

As suggested earlier, referential-indefinite NP's, being introduced into the discourse for the first time, should be considered maximally surprising/disruptive/discontinuous, at least as far as their continuity vis-a-vis the preceding discourse context is concerned. In addition, however, one could study them instructively as to their persistence properties, which would then indicate their potential topical/thematic importance in the subsequent discourse. Grammatical devices may then be categorized according to whether they are used to introduce into the register important, persistent topics or unimportant and fast-decaying ones.
Some of the studies in the volume have clear bearing on this. For example, Fox (Biblical Hebrew) shows that if one introduces a referential indefinite argument into the register at the accusative object position, it decays much faster than if one introduces it as an indefinite-referential subject, typically with the SV word-order. Human indefinite subjects persist on the average 2.90 clauses after first being introduced, while human indefinite direct objects persist on the average only 0.83 clauses after entry into the register.

In other languages (cf. Hetzron, 1971, but see also Givón, 1978 for further discussion) it is not the SV but rather the VS word-order that tends to be a presentative device, introducing important topics into the register in subject position. Such languages tend to be rigid word-order languages, most commonly SVO, such as English or Mandarin (see Li and Thompson, 1975), where the post-verbal order is used to introduce indefinites into the register. In languages with flexible, pragmatically controlled word-order, such as Ute (for both subject and object) and Biblical Hebrew (for the subject only), the word-order principles outlined earlier above (section 7.2.2.) hold, and the pre-verbal position is used to introduce indefinites into the register.

Another theme that often runs through the grammatical coding of referential-indefinites is the use of morphological means normally reserved for marking logically non-referential NP's for coding referential NP's of lesser importance as they enter the register. I have identified this device elsewhere for languages using the numeral 'one' to mark referential-indefinites as well as languages using other devices to effect the same contrast (Givón, 1978, 1981c). As an example consider the following from Israeli Hebrew:14

(26) a. . . . az atsárti ba-xanút ve-kaníti itón-xad ve-hitxálti
        so I-stopped at-the-store and-I-bought paper-one and-I-started
        '. . . so I stopped by the store and I bought this one paper and I began

        li-kró oto ve-hayá sham maamár-xad norá meanyén ve-ha-itón kulo. . .
        to-read it and-was there article-one very interesting and-the-paper all-of-it
        to read it and it had a very interesting article and the entire paper. . .'

     b. . . . az atsárti ba-xanút ve-kaníti itón-ø ve-haláxti ha-báyta
        so I-stopped at-the-store and-I-bought paper and-I-went home
        '. . . so I stopped by the store and got a paper and I went home

        ve-axálti másheu ve-axár-kax haláxti lishón. . .
        and-I-ate something and-after-that I-went to-sleep
        and I ate something and then I went to sleep. . .'

In both (26a) and (26b) 'paper' is logically referential. However, in (26a) it turns out to be an important, persistent topic in the discourse, and it is marked by the numeral 'one'. In (26b), however, the actual referential identity of the paper is only incidental; it decays rapidly, it retains no import in the discourse, the speaker merely did some 'paper-buying'. So in spite of being logically referential,15 'paper' appears with no morphological marking.

7.6. Topic continuity and definite-marking morphology

While most of the studies in this volume largely skirt around this complex issue, one must point out that in many languages various "topic marking particles", be they prefixal as in Lahu (Matisoff, 1975) and Lisu (Li and Thompson, 1976) or suffixal as in Korean (Hwang, 1982) or Japanese (Hinds, in this volume), play various roles in the grammatical coding of topic continuity, in terms of referential distance, persistence, emphatic contrast (in the presence of other referents) etc. Similarly, in English (Linde, 1979), Dutch (Kirsner, 1979), Persian (Mahootian, 1979) and many others (Givón, 1978) demonstrative articles/pronouns assume similar functions, often developing into definite and indefinite articles of various kinds.

7.7.
The use of restrictive modifiers

Restrictive modifiers, such as adjectives and relative clauses, very clearly are involved in the grammar of topic continuity, and at least one study in this volume (Bentivoglio, colloquial Spanish) documents their behavior in terms of both our referential distance and potential interference measurements (see section 7.2.1. as well as Table I, above).

8. The studies in this volume

Without casting the fine details in cement, we have attempted to follow the same quantitative methodology and the same general approach in all the studies in this volume. While the selection of particular languages to be studied was not by itself completely systematic, we have attempted to insure a reasonable typological balance in the areas of the grammar that count most heavily in the coding of topic continuity: word-order typology and morpho-tactics. In this section I will briefly survey the studies included in this volume, touching upon what seem to be the salient typological variables involved in the grammatical coding of topic continuity.

8.1. Japanese: This is a study of a rigid SOV language with suffixal case-marking, post-nominal topic-marking particles (-ga, -wa) and a relative paucity of clitic-pronoun/verb-agreement morphology, and thus an extensive use of zero anaphora, with independent pronouns being largely contrastive. The possibility of clause-chaining is present but not fully investigated here, remaining a stylistic option in Japanese.

8.2. Amharic: Equally rigid as an SOV language, Amharic has the rich pronominal/agreement morphology of Semitic. It has a mix of prefixal and suffixal case-marking morphology, excluding the subject and indefinite direct object. It also exhibits a number of topic-marking suffixes which are clearly not of Semitic origin. Finally, clause-chaining is a stylistic option that is well documented here.

8.3.
Ute: An ex-SOV Uto-Aztecan language with complete word-order flexibility controlled by the pragmatics of topic-continuity, Ute exhibits suffixal case-marking that interacts, for subject and direct object, with the noun-gender suffixes and the grammar of referentiality. Clitic pronouns/agreement are optional for both subject and direct object, but are much more extensively used for objects. Further, their position is not fixed on the verb (though statistically the verbs in discourse carry most clitics), but rather they cliticize on the first word in the clause. Zero anaphora is predominant for subjects, less so for objects. Various topic-marking particles, including demonstratives, are used but not studied here.

8.4. Biblical Hebrew: This ancient Semitic language is rigidly VO but shows a pragmatically-controlled VS/SV variation. Subject agreement/clitic pronoun is obligatory, but not for direct objects. Independent pronouns are contrastive for subjects of verbal forms but could be also anaphoric for subjects of non-verbal predicates, as well as for direct objects. Zero anaphora is a minor phenomenon, for objects only.

8.5. Colloquial Spanish: Typologically very close to Biblical Hebrew, except that it is already gravitating toward rigidification of SVO. As a result, while some VS/SV variation controlled by topic-continuity pragmatics is exhibited, its extent is smaller than in BH. And the 'presentative' VS word-order, totally absent from BH, is already evident here.

8.6. Written English: A rigid SVO language, with unstressed pronouns used anaphorically, in effect like clitics, but also a stylistic option of zero anaphora, primarily for subjects. The same pronouns when stressed are used contrastively. Clause-chaining is a stylistic option but probably of limited currency.

8.7. Colloquial English: The same typological characteristics as above.
However, the colloquial variety exhibits the use of L- and R-dislocation as well as topic repetition and hesitation as topic-marking devices sensitive to the discourse-pragmatics of topic continuity/predictability.

8.8. Hausa: A rigid SVO Chadic language, with an extensive agreement/pronominal morphology roughly of the BH or Spanish type. The use of zero anaphora is thus attested but limited to objects, and even there not statistically extensive.

8.9. Chamorro: A fairly rigid V-first language that nevertheless allows highly contrastive/discontinuous NP's (primarily subjects) to be moved pre-verbally (thus an SVO variant), a situation that is fairly characteristic of the V-first languages of the Austronesian family. Further, it is an old Ergative language, with the ERG/ABS contrast manifested primarily on the verb-agreement (clitic pronouns) paradigm. This study is divided into two: (I) a study paralleling the rest of the papers in this volume; and (II) a study, using only the measures of referential distance and persistence, documenting the behaviour of the five constructions which can code transitive sentences in Chamorro,

(a) Agentless passive
(b) Agented passive
(c) Middle-voice active
(d) Ergative active
(e) Anti-passive active

in terms of the topic-continuity properties, hence topicality, of the subject and direct object of clauses. The results are exciting both methodologically and substantively, allowing for the first time a rigorous, discourse-based definition of a mature 'surface' Ergative language. Novel uses of the measurement methodology developed in this volume are thus revealed.

The major typological hole in this collection, so far as the goals of its editor are concerned, is the absence of a study of a language with a clear "promotional" direct object case that is pragmatic rather than purely semantic (i.e. 'accusative') (Givón, 1981a).
Indirectly the Chamorro study, part II, bears on this area, since the anti-passive may be considered as "demotion" of the direct object (most commonly also its total deletion from the surface text). We hope to supplement this with an ongoing study of Nez Perce, which has both the "promotion" and "demotion" phenomena (Rude, 1982, Rude, forthcoming).

9. Typological predictions in the grammar of topic continuity

In this section I will outline the kind of typological predictions that one could project out of the studies in this volume taken together. While many of the details are yet to be worked out, a number of solid correlations seem to be emerging. I will deal with them in order.

9.1. Zero anaphora, pronouns and agreement

In terms of the functional domain of topic continuity, as defined most firmly by our measurements of referential distance and potential interference, one could identify three major coding points covering this section of the continuum of topic accessibility. They may be defined functionally with both notional labels as well as our measurements, with point A being the most continuous, point B intermediate and point C the least continuous:

(27)                        A          B              C
     overall continuity:    highest    intermediate   lowest
     topic continuity:      high       high           low
     theme continuity:      high       intermediate   low
     ave. ref. distance:    1.00       1.00-1.20      1.70-2.00
     ave. interference:16   1.00       1.00           close to 2.00

In terms of how individual languages code these three semi-discrete sections of the continuum, one observes the following generalizations:

(a) All languages code point C, the most discontinuous of the three, with stressed independent pronouns. They are used either contrastively or as topic switchers.
(b) In languages with obligatory verb agreement, most commonly pertaining to the subject (cf. Biblical Hebrew, Amharic, Hausa, Spanish), those clitic pronouns/agreement will code both points A and B.
(c) In languages such as English or Ute, where unstressed/clitic pronouns¹⁷ are not obligatory, a three-way division tends to be observed for the coding of subjects:
    (i) Stressed/independent pronouns code point C
    (ii) Unstressed/clitic pronouns code point B
    (iii) Zero anaphora codes point A
However, there is probably a strong quantitative difference between Ute and English in terms of the relative frequency of zero anaphora vs. unstressed pronouns: In Ute subjects are coded in discourse primarily by zero anaphora, while in English probably primarily by unstressed pronouns.¹⁸ For objects the situation is different, so that in both Ute and English the coding distribution is:
    (i) Stressed/independent pronouns code point C
    (ii) Unstressed/clitic pronouns code both points B and A
(d) Given the generalizations in (c) above, one could make a markedness prediction concerning the coding density of subjects, direct objects and obliques: "A language may never have more coding points on the topic continuity functional domain to code a case-role that is lower on the topicality hierarchy". In other words, subjects may exhibit more coding points than DO's, and DO's more than obliques, but not vice versa. This is a reflection of the old notion of functional load.
(e) Languages without unstressed pronouns - such as Japanese, Korean and Mandarin (cf. Li and Thompson, 1979) - will exhibit the following coding distribution:
    (i) Stressed pronouns code point C
    (ii) Zero anaphora codes points B and A
(f) Non-finite forms of verbal clauses, either with or without genitive subject ('agreement'), will on the whole code point A, much like zero anaphora. However, this is relevant only for subject topics, a restriction that again reflects the markedness prediction given in (d) above.

9.2. Word-order variation and dislocations

The use of word-order devices to code topic continuity obviously pertains to a much more discontinuous section of the topic continuity scale, involving either independent pronouns or full NP's, both already highly discontinuous devices. Typologically, one could divide languages into two separate groups here.

9.2.1. Languages with rigid word-order

This category will include, of the languages studied in this volume:
SVO: English, Hausa
SOV: Japanese, Amharic
VSO: Chamorro
For all these languages, Y-movement and dislocations are attested, at least to some extent, although they are most clearly attested in SVO languages. The neutral word-order has an overall continuity value somewhere between the two extremes of pre-verbal ordering (L-dislocation, Y-movement) and post-verbal ordering (R-dislocation). The three specialized ('marked') devices may be described via the same parameters as in (27) above.

(28)                       R-dislocation   Y-movement   L-dislocation
     overall continuity:   high            low          low
     topic continuity:     high            low          low
     theme continuity:     high            high         low
     ave. ref. distance:   1.00-2.00       2.00-3.00    above 15.00
     ave. interference:    1.00-1.50       2.00         1.50-1.75

The three devices may thus be characterized as:
(i) Y-movement: Relatively localized in terms of ref. distance, with the discontinuity/surprise due primarily to contrast with other referents in the immediately-preceding discourse environment, thus exhibiting high potential interference;
(ii) L-dislocation: Used to return topics back into the register over long gaps of absence, thus high ref. distance, and also consequently fairly high potential interference values; often associated with major thematic breaks in discourse structure, i.e. typically a paragraph-initial device.
(iii) R-dislocation: This is the 'hedge strategy', Hyman's (1975) 'afterthought topic', with topic-discontinuity a bit higher than that characteristic of unstressed/clitic pronouns, and probably due more to potential interference than to ref. distance.

In addition, some more specific predictions can be made concerning the likely distribution of these three word-order devices:
(a) SVO languages tend to exhibit all three devices, in addition to cleft, which is also a pre-verbal movement and highly contrastive;
(b) SOV languages tend to exhibit primarily R-dislocation as a distinct 'movement rule', since the neutral position of nominal topics is pre-verbal. The cleft-focus position quite often is fixed immediately preceding the verb. Y-movement is most commonly handled by stress with or without added topic-marking morphology. The function covered elsewhere by L-dislocation is most commonly handled by topic-marking morphology;
(c) VSO languages most commonly allow left-movement (pre-verbal ordering) only for highly contrastive functions such as cleft-focus, L-dislocation and Y-movement. The functions performed elsewhere by R and L-dislocation are most commonly handled by topic/case-marking morphology, including passivization (cf. Philippine languages), but may also involve word-order changes on the right of the verb.

9.2.2. Languages with pragmatically-flexible word-order

The three languages in our sample which display pragmatically-controlled word-order - Ute, Biblical Hebrew and Spanish - are hierarchized in terms of the scope of this phenomenon, with Ute displaying complete flexibility of both subject and object position, Biblical Hebrew complete flexibility of only the subject position (and rigid VO for the object), and Spanish a much more restricted flexibility of only the subject, which is on its way to becoming - at least in colloquial Spanish - a rigidly held pre-verbal element (i.e. SVO).
This hierarchy, which is probably a diachronic cline,¹⁹ is another reflection of the markedness prediction given in section 9.1. above, whereby a particular pragmatic device has a wider scope of application for the main topic - subject - than for the secondary topic.

In a language with pragmatically-controlled word-order flexibility, the preverbal position of NP's covers a wide range of discontinuity, including what in a strict SVO language would be L-dislocation and Y-movement (with the latter distinguished most commonly by stress), while the post-verbal position covers an equally wide range, including probably both the 'neutral' word-order (cf. the proportionately high frequency of the VS word-order in both Ute and Biblical Hebrew) and R-dislocation. One may thus say that a pragmatically-flexible language is undercoded in the functional domain of topic continuity, at least as far as the use of word-order is concerned, as compared to (at least) a rigid SVO language. This relationship may be schematically expressed as:

(29) [schematic diagram not reproduced]

It is quite likely, however, that the coding slack is picked up in a flexible word-order language by various topic-marking, definitizing and indefinitizing morphemes.

9.3. Indefinites and existential-presentative constructions

As both Ute and Biblical Hebrew demonstrate, the pre-verbal position of the topic (SV in both, OV in Ute) is used in a flexible word-order language not only to code discontinuous definite topics, but also to mark referential indefinite topics. On the other hand, Spanish, which is gravitating toward rigidification of SVO, already employs the so-called existential-presentative VS word order to introduce indefinite subjects into the register for the first time.
This is a typical case of "markedness reversal", and may be summarized schematically (excluding V-first languages) as:

(30) [schematic diagram not reproduced]

In V-first languages an existential-presentative construction with the verb 'be' may be used, but at least in some of them (Philippines) the subject must then be fronted, a grammatical operation that is reserved for the most discontinuous topics, such as Y-movement and cleft. Thus, consider the following from Bikol:²⁰

(31) a. DEF-subject:
        nag-gadán 'ang-laláke ning-kandíng
        AGT-kill  TOP-man     ACC-goat
        'The man killed a goat'
     b. INDEF-subject:
        marái 'ang-laláke nanag-gadán  ning-kandíng
        be    TOP-man     SUB AGT-kill ACC-goat
        'There's a man who killed a goat'
        'A man killed a goat'

One may of course argue that (31b) represents two separate sentences, the first one with the verb 'be' being the "presentative" sentence, the second being a relative clause. While historically that may be true, the argument may hinge on pure terminology. Within the presentative "sentence" itself, indeed one observes a VS word-order, characteristic of rigid word-order SVO and SOV languages. However, the verb marái 'be' is completely neutralized in terms of verb morphology (see predictions to this effect in Givón, 1976a), so that one may just as easily call it an "indefinite subject morpheme". Further, semantically the verb 'kill' in (31b) still carries the bulk of the predicate information. And with respect to that verb, the indefinite subject is placed at a pre-verbal position. One may thus conclude that in V-first languages, where the order VS is the neutral order for highly continuous definite subjects, the same relative word-order contrast is observed as for flexible word-order languages in (30), which obviously makes sense in terms of coding differentiation.

9.4. Topic continuity and morphology

At this point it is probably premature to make typological predictions concerning the exact role of various morphological sub-systems in the marking of topic continuity in discourse. Many such sub-systems obviously interact in intricate ways with word-order, intonation and our quantity universals in coding this rich functional domain. I have surveyed many of them elsewhere (Givón, 1978), and at this point one might simply list the main categories that tend to be involved cross-linguistically:
(a) Definite and indefinite articles
(b) Demonstratives/deictics
(c) Case-markers and other "topic markers"²¹
(d) The verb 'be'/'exist' or some similar lexeme²²

NOTES

1) 'Vague' and 'mysterious' are neither synonymous nor necessarily linked in a causal fashion. Under a more charitable reading of 'vague', hereby adopted, one would interpret it to mean 'not fully specified', or 'leaving some details that are not yet well understood out of the description'. Under this more charitable reading, then, 'vagueness' is simply the common scientific practice of handing one's readers a blank check, with the tacit understanding - or at least hope - that future research will fill in the detail.

2) The luxury of sharp, elevated perspective is seldom enjoyed by those engaged at the outlying regions of the tapestry, for which context is yet to be specified.

3) It is of course somewhat transparent that we are dealing here with another instance of the most general iconic expression in grammar, this time mandating that the most macro, hardest-to-define levels of expression receive the least overt structural coding at the lowest, clause level, while the more micro, more concrete levels - action and topic/participant, in that order - receive increasingly more detailed clause-level coding.
One could add to this the ontological observation that in Pidgins and child language, the earliest grammatical coding sub-system discernible involves topic-identification, while the other two lag considerably behind (see Givón, 1979, Ch. 5, 7, as well as 1982b).

4) One may thus arrive at a concept of 'unities' startlingly akin to the classical Greek Theater's: Unity of time, place and action, to which one may add, given our somewhat expanded perspective in epistemology if not in art, the unities of theme and topics/participants. Most commonly, one finds tighter unities (or 'contiguities') in all five within the thematic paragraph than one finds across thematic paragraph boundaries. And this may easily develop into a heuristic test for thematic paragraphs as well as higher discourse units (see e.g. Givón's Ute contribution, in this volume).

5) Ultimately, anything goes that does the job in human communication, including telepathy. The relatively predominant (and thus demonstrable) role of more measurable factors such as referential distance and potential interference once again points to human language as a routinized system of communication, where high-probability (though not absolute!) predictions can be made, and where a rough but efficient processing system - grammar - has indeed evolved.

6) There are grounds for suspecting that the value of 20 clauses is actually over-estimated. Our cross-language study shows that, characteristically, the average value for the most discontinuous definite-topic devices, i.e. those used to return a topic into the register after a relatively long gap of absence, is around 15-17 clauses. This value is already itself biased upward by the arbitrarily assigned 20-clause value.
Further, in a number of languages studied here and elsewhere, the very same grammatical device - involving either word-order or morphology or both - is used to mark both definite topics returning into the register after a long gap of absence, as well as indefinites introduced for the first time. This is as strong a suggestion as one can obtain from 'purely linguistic' data that speakers/hearers tend to assign the same degree of processing difficulty to those two types of discontinuity. To the extent that this suggestion pans out, one is justified in using the average referential distance values of definites returning into the register after a long absence as (i) a rough estimation of the maximal length of the pertinent discourse register, and thus (ii) the arbitrarily assigned referential distance value of referential-indefinites.

7) If the relevant value were stretched further, say to our maximal assigned value of 20 clauses, then automatically topics with a high referential distance would show - all other things being equal - more interfering topics in the preceding register, since more clauses allow more arguments/referents, at least potentially.

8) This is again a rough guess about the nature of immediate memory/recall, assuming that there is a certain decay effect associated with referential distance of topics/referents. Eventually such an assumption requires experimental psycho-linguistic support.

9) The other major functional correlate being that of propositional semantics, involving the specification of the predicate type, hence action/event/state type, including most obviously the semantic role-function of the various case-arguments vis-a-vis the predicate. For an extensive discussion, see Givón (1983).

10) As in the lexicon, a correlation exists in syntax between the phonological size of a coding device, and some functional size along a scalar dimension - provided such a dimension indeed exists. This will be demonstrated repeatedly below.
I first broke the topic identification coding scale into this and other sub-scales in Givón (1982d).

11) This is not, as Haiman (1982, ms) would have it, purely an economy principle, although clearly there is some element of the law of energy conservation - and thus of inertia - in it. It reflects the need to jar the mind and attract the attention of hearers when their attention is focused elsewhere. The inertia of hearers during the processing of discourse is not due to the mind's being asleep or sluggish, but rather to its being engaged elsewhere. Breaking that inertia requires more effort than going along with it.

12) See in particular Bentivoglio's paper on Spanish (in this volume).

13) An argument may be raised that Ergative languages don't really have a uniformly well-marked subject (cf. Anderson, 1976), but there are counter-arguments to this analysis (cf. Givón, 1980b). In a different vein, Li and Thompson (1976) have argued that some languages are 'topic prominent' rather than 'subject prominent'. The argument, however, is based only on case-marking morphology or grammatical agreement, and does not take into account actual text distribution of the 'subjects' or 'topics' in discourse.

14) Other languages where the numeral 'one' functions similarly are Creoles (Bickerton, 1975), Turkish, Sherpa, Persian, Neo-Aramaic, older versions of German, English, Spanish and French, Mandarin Chinese and probably many others. Ute makes use of object suffixes and their removal (via object incorporation into the verb) to effect a similar discourse-pragmatic contrast. Bemba achieves the same end via the use of the prefix-initial vowel of nouns. For details see Givón (1978).

15) For further discussion of the pragmatics of referentiality, see Givón (1982b).

16) On the scale of 1.00 to 2.00, with 1.00 denoting no interference, and 2.00 denoting interference by one or more referents in the immediately preceding discourse environment.
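The two measurements that notes 6, 7 and 16 refer to can be illustrated with a minimal Python sketch. It is only an approximation under stated assumptions, not the procedure used by the contributors: the clause representation (a set of referent labels per clause), the function names, and the look-back `window` used for interference are all hypothetical, and the sketch treats any other referent in the window as interfering, without checking semantic compatibility. Only the 20-clause maximum and the 1.00/2.00 interference values are taken from the notes above.

```python
from statistics import mean
from typing import List, Set

MAX_DISTANCE = 20  # the arbitrarily assigned maximum discussed in note 6


def referential_distance(clauses: List[Set[str]], index: int, referent: str,
                         cap: int = MAX_DISTANCE) -> int:
    """Count clauses back from clause `index` to the previous mention of
    `referent`; return `cap` when no earlier mention is found within reach."""
    for gap in range(1, min(index, cap - 1) + 1):
        if referent in clauses[index - gap]:
            return gap
    return cap


def potential_interference(clauses: List[Set[str]], index: int, referent: str,
                           window: int = 5) -> int:
    """Return 1 when no other referent occurs in the preceding `window`
    clauses and 2 when one or more do (the scale of note 16).
    The window size is an assumption made for this sketch."""
    start = max(0, index - window)
    others = set().union(*clauses[start:index]) - {referent}
    return 2 if others else 1


# Hypothetical four-clause text: each set lists the referents of one clause.
text = [{"man", "goat"}, {"man"}, {"man"}, {"goat"}]
tokens = [(1, "man"), (2, "man"), (3, "goat")]  # (clause index, referent)

avg_distance = mean(referential_distance(text, i, r) for i, r in tokens)
avg_interference = mean(potential_interference(text, i, r, window=1)
                        for i, r in tokens)
```

Averaging the per-token values over all tokens of one grammatical device gives figures of the kind reported as "ave. ref. distance" and "ave. interference"; widening `window` makes interference more likely for any token, which is the effect note 7 describes.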
17) In English the writing system obscures the stress difference between stressed and unstressed pronouns, but a field linguist would easily identify the English unstressed pronouns as clitics. In Ute the independent pronouns have extra phonological material in addition to stress.

18) The frequency differences merely point out the slight artificiality of cutting the continuum into only three sections. There is good reason to believe, cf. the frequencies, that in English unstressed pronouns code portions of point A as well. The frequency of zero vs. clitic subject pronouns in Ute (Givón, in this volume) is 321 vs. 42, or roughly 8 to 1. In English the relation is reversed, with (colloquial English, Givón, in this volume) 117 zeros to 423 pronouns (mostly unstressed), or roughly a ratio of 1 to 3.6 in favor of the pronouns.

19) Late Biblical Hebrew lost its flexibility and became a rigid SVO language, just as Spanish is currently doing (Givón, 1977). French, currently a fairly rigid SVO language, has probably undergone a similar change, as has also Portuguese (Naro, in personal communication). There are also grounds for suspecting that Middle English had a wide range of VS/SV flexibility, currently reflected only in frozen constructions.

20) Bikol is a rigid V-first language of the Philippines, closely related to Tagalog. The data is from my own field notes, originally due to Manuel Factora (in personal communication).

21) Cf. Japanese, Korean or Amharic. But one must consider the case-marking status of subjects and direct objects in general (cf. Givón, 1981a) as part of the grammar of topic continuity, seeing that subjects tend to be the more continuous ('primary') topics, while direct objects tend to be less continuous ('secondary') topics, though obviously more continuous than obliques.

22) Cf. the Bikol data in (31) above.
Any bona fide locative verb, such as 'be', 'stand', 'sit', 'stay', 'lie down' etc., may historically become the grammaticalized existential-presentative marker. Less common are 'appear', 'remain', 'be left', 'be put there', 'enter' etc. All these verbs may be characterized as either verbs of "being there" or of "entering into the scene". See further discussion in Givón (1976a).

REFERENCES

Anderson, S. (1976) "On the notion of 'subject' in Ergative languages", in C. Li (1976, ed.)
Bickerton, D. (1975) "Creolization, linguistic universals, natural semantax and the brain", U. of Hawaii, Honolulu (ms)
Bolinger, D. (1952) "Linear modification", in his Forms of English, Cambridge: Harvard University Press
Bolinger, D. (1954) "Meaningful word order in Spanish", Boletín de Filología, Universidad de Chile, vol. 8
Chafe, W. (1976) "Givenness, contrastiveness, definiteness, subjects, topics and point of view", in C. Li (1976, ed.)
Chafe, W. (1979) "The flow of thought and the flow of language", in T. Givón (1979, ed.)
Firbas, J. (1966a) "Non-thematic subjects in contemporary English", Travaux Linguistiques de Prague, 2
Firbas, J. (1966b) "On defining the theme in functional sentence analysis", Travaux Linguistiques de Prague, 1
Fox, B. (1982) "Clause linking and focus affixes in Old Javanese", UCLA (ms)
Givón, T. (1975) "Promotion, accessibility and case-marking: Toward understanding grammar", Working Papers in Language Universals, vol. 19, Stanford University
Givón, T. (1976a) "Topic, pronoun and grammatical agreement", in C. Li (1976, ed.)
Givón, T. (1976b) "On the VS word-order in Israeli Hebrew: Pragmatics and typological change", in P. Cole (ed.) Studies in Modern Hebrew Syntax and Semantics, Amsterdam: North Holland
Givón, T. (1977) "The drift from VSO to SVO in Biblical Hebrew: The pragmatics of tense-aspect", in C. Li (ed.) Mechanisms for Syntactic Change, Austin: University of Texas Press
Givón, T. (1978) "Definiteness and referentiality", in J. Greenberg (ed.) Universals of Human Language, vol.
4, Syntax, Stanford: Stanford University Press
Givón, T. (1979) On Understanding Grammar, NY: Academic Press
Givón, T. (1979, ed.) Discourse and Syntax, Syntax and Semantics, vol. 12, NY: Academic Press
Givón, T. (1980a) "The binding hierarchy and the typology of complements", Studies in Language, 4.3
Givón, T. (1980b) "The drift away from ergativity in Sherpa", Folia Linguistica Historica, 1.1
Givón, T. (1981a) "Direct object and dative shifting: Semantic and pragmatic case", in F. Plank (ed.) Objects, NY: Academic Press (in press)
Givón, T. (1981b) "Typology and functional domains", Studies in Language
Givón, T. (1981c) "On the development of the numeral 'one' as an indefinite marker", Folia Linguistica Historica, 1.2
Givón, T. (1982a) "Tense-aspect-modality: The Creole prototype and beyond", in P. Hopper (ed.) Tense and Aspect: Between Semantics and Pragmatics, Typological Studies in Language, vol. 1, Amsterdam: J. Benjamins
Givón, T. (1982b) "Logic vs. pragmatics, with human language as the referee: Toward an empirically viable epistemology", J. of Pragmatics, 6.2
Givón, T. (1982c) "Topic continuity in discourse: The functional domain of switch-reference", in J. Haiman and P. Munro (eds) Switch Reference, Typological Studies in Language, vol. 2, Amsterdam: J. Benjamins (in press)
Givón, T. (1982d) "Universals of discourse structure and second language acquisition", in W. Rutherford (ed.) Language Universals and Second Language Acquisition, Typological Studies in Language, vol. 5, Amsterdam: J. Benjamins (in press)
Givón, T. (1983) Syntax: A Functional-Typological Introduction (in preparation)
Haiman, J. (1980) Hua Grammar, Amsterdam: J. Benjamins
Haiman, J. (1982, ms) Iconicity in Language, Cambridge: Cambridge University Press (in press)
Halliday, M.A.K. (1967) "Notes on transitivity and theme in English", J. of Linguistics, 3
Hawkinson, A. and L. Hyman (1974) "Natural topic hierarchies in Shona", Studies in African Linguistics, 5
Hetzron, R. (1971) "Presentative function and presentative movement", Studies in African Linguistics, supplement 2
Hinds, J. (1978, ed.)
Anaphora in Discourse, Edmonton: Linguistic Research
Hinds, J. (1979) "Properties of discourse structure", in T. Givón (1979, ed.)
Hopper, P. (1979) "Aspect and foregrounding in discourse", in T. Givón (1979, ed.)
Hopper, P. and S. Thompson (1982) untitled paper read at the Conference on Language Universals and Second Language Acquisition, University of Southern California, February 1982 (ms)
Hwang, M. (1982) "Topic continuity and discontinuity in Korean narrative", UCLA (ms)
Hyman, L. (1975) "The change from SOV to SVO: Evidence from Niger-Congo", in C. Li (ed.) Word Order and Word Order Change, Austin: University of Texas Press
Keenan, Elinor (1977) "Why look at planned and unplanned discourse?", in E. Keenan and T. Bennett (eds) Discourse Across Time and Space, SCOPIL vol. 5, Los Angeles: University of Southern California
Kirsner, R. (1979) "Deixis in discourse: An explanatory quantitative study in Modern Dutch demonstrative adjectives", in T. Givón (1979, ed.)
Li, C. (1976, ed.) Subject and Topic, NY: Academic Press
Li, C. and S. Thompson (1975) "The semantic function of word-order in Mandarin Chinese", in C. Li (ed.) Word Order and Word Order Change, Austin: University of Texas Press
Li, C. and S. Thompson (1976) "Subject and topic: A new typology for language", in C. Li (1976, ed.)
Li, C. and S. Thompson (1979) "Pronouns in Mandarin Chinese discourse", in T. Givón (1979, ed.)
Linde, C. (1979) "Syntax & Semantics, vol.12, Ac Press", in T. Givón (1979, ed.)
Longacre, R. (1976) Anatomy of Speech Notions
Longacre, R. (1979) "The paragraph as a grammatical unit", in T. Givón (1979, ed.)
Mahootian, S. (1979) "Given/new and definite/indefinite in Farsi", U. of Oregon, Eugene (ms)
Matisoff, J. (1975) Lahu Grammar, Berkeley: U.C. Press
Rude, N. (1982) "Promotion and topicality of Nez Perce objects", BLS, vol. 8, Berkeley: University of California
Rude, N. (forthcoming) Studies in Nez Perce Grammar and Discourse, PhD Dissertation, University of Oregon, Eugene (ms)
Shir, N. (1979) "Discourse constraints on dative movement", in T. Givón (ed.)
Discourse and Syntax, Syntax and Semantics, vol. 12, NY: Academic Press
Thurman, R. (1978) Interclausal Relations in Chuave, MA Thesis, UCLA (ms)

TOPIC CONTINUITY IN JAPANESE

JOHN HINDS
Center for English as a Second Language
Penn State University
University Park, Pennsylvania

TABLE OF CONTENTS

1.0 Introduction
    1.1 Word order
    1.2 Case relationships
2.0 Grammatical devices investigated
    2.1 Ellipsis (Zero anaphora)
        2.1.1 Ellipsis of subject
        2.1.2 Ellipsis of object
    2.2 Stressed/independent pronouns
    2.3 Right-dislocated definite NP (Postposed NP)
    2.4 Scrambling
    2.5 Postpositional particles
        2.5.1 Subject/topic marking particles
        2.5.2 Object marking particles
    2.6 Summary of grammatical devices examined
3.0 Description of methodology
    3.1 Texts
    3.2 Measurements
4.0 Numerical results of measurements
    4.1 Topic continuity properties of subjects
        4.1.1 Distance
            4.1.1.1 Momotaro
            4.1.1.2 Female conversational interaction
            4.1.1.3 Male conversational interactions
            4.1.1.4 Generalizations about distance
        4.1.2 Decay
    4.2 Topic continuity properties of direct objects
        4.2.1 Distance
        4.2.2 Decay
    4.3 Topic continuity properties of indirect objects
5.0 Discussion
    5.1 Distance
    5.2 Decay
6.0 Conclusion
Notes
References
Appendix A
Appendix B
Appendix C

1.0 Introduction

In this chapter I investigate the ways in which referential items in general, and topics in particular, are continued or discontinued in Japanese conversational interaction. The primary means of indicating continued reference in Japanese is through ellipsis, although pronominal forms also play a role. Parameters discussed by Givón 1983 and in the introduction to this volume are especially important in assessing continuity in a variety of discourse types.
These parameters introduced by Givón provide a significant improvement over earlier attempts to plot topic progression. One of the earliest attempts at plotting topic progressions is advanced by Danes 1970, working in the framework of the Prague School. Danes discussed five basic types of topic progression in discourse, each a variation of the concept of communicative dynamism, where thematic elements typically precede rhematic elements. In this formulation, sentences are divided into themes and rhemes, and a text progresses, for example, as a first rheme becomes a succeeding theme, or as a first theme is followed by a succession of different rhemes.

As a descriptive statement, there is little to be said about this approach: it is correct as far as it goes. As an explanatory statement, however, there are difficulties. Despite extensive psychological evidence that there are optimal organizing frameworks in discourse [see for example McKoon 1979], the statements of the Prague School provide no means to predict which progression is more marked than others.

Givón 1983 has given an account of topic progression in discourse which provides such a prediction. He claims that discourse is built of clause-level units which (a) comprise the same theme, and (b) tend to repeat the same participant/topic continuity. In this view, topic continuity, those instances in which the same topic extends over numerous clauses, is the unmarked form. Topic change is the marked form.²

Givón 1983 further claims that there is a scale of crosslinguistic coding devices which may be used to indicate topic continuity in discourse, and these are presented in the Introduction to this volume. This scale provides a point of departure for the discussion of topic continuity in Japanese conversational interaction. Two specific grammatical features of Japanese must first be mentioned.

1.1 Word Order

Japanese has a basic SOV word order, although variations occur with impunity.
In addition to "scramblings" which occur in preverbal position, items may be freely "postposed" (right-dislocated) to the position following the verb. Postposing typically requires a special intonation contour, although this is not a strict requirement [details may be found in Hinds 1982, chapter 7, and Shibamoto 1982].³

1.2 Case Relationships

Postpositional particles are one way to indicate case relationships. A major activity in Japanese linguistic studies has been the investigation of particle alternations and distributions [see Kuno 1973, Kuroda 1978, Tonoike 1975-76]. In addition to particles which mark basic case relationships, there is a set of special particles which function to subdue or highlight themes [Martin 1975]. Of interest is the fact that these special particles may obliterate particles which indicate case relationships. Included in this special set are wa and nara 'topic marking particles', and mo 'too', sae 'even', and dake 'only'. Examples follow.⁴

(1) a. dare ga sushi o  tabemashita ka?
       who  SM sushi OM ate-polite  QU
       Who ate the sushi?
    b. minako-san wa fumiko-san ni hon  agemashita.
       Ms         TM Ms         IO book gave-polite
       Minako gave Fumiko the book.
    c. asoko ni hon  ga arimasu      ne.
       there LC book SM exist-polite EM
       There's a book over there, isn't there.
    d. yoshi-kun wa tookyoo kara oosaka made jibun no
       Mr        TM Tokyo   SR   Osaka  GL   self  LK
       jitensha de ikimashita  yo.
       bicycle  IN went-polite EM
       Yoshi went from Tokyo to Osaka by his own bike, you know.
    e. sensei  mo  biiru dake nomimasu     ne.
       teacher too beer  only drink-polite EM
       The teacher drinks only beer too, doesn't he.

In Hinds 1983 I have demonstrated that there are actually five possible means to indicate case relationships in Japanese. These are (a) the use of postpositional particles, (b) word order, (c) "selectional" restrictions, (d) a saliency principle, and (e) "world knowledge".
The presence of any one of these is enough to indicate case relationships, although in conversational interaction there is usually a measure of redundancy.

2.0 Grammatical devices investigated

In this section, I introduce and illustrate the grammatical devices which play a role in topic continuity. All devices occur with each case relationship.

2.1 Ellipsis (Zero anaphora)

It has been shown (Clancy 1980, Hinds 1978, 1982, 1983, Shibamoto 1980, 1982) that ellipsis is the unmarked form of topic continuity. The pervasiveness of this phenomenon in Japanese conversational interaction cannot be overstated. In fact, one way to understand the importance of ellipsis in Japanese is through a consideration of the frequency of its occurrence.

2.1.1 Ellipsis of Subject

Martin 1975:185 cited statistics to show that grammatical subjects, for example, may be ellipted as much as 74% of the time in normal conversational interaction, and as much as 37% of the time in expository styles such as news broadcasts. Shibamoto 1982 demonstrates a difference between rates of ellipsis for males and females, with males ellipting subject noun phrases in multiparty conversational interaction approximately 61% of the time, and females approximately 73% of the time. Hinds 1982 supports these statistics and shows that noun phrases in all case relationships, postpositional particles, and even main verbals may be ellipted with considerable frequency. It is also shown there that ellipsis may be used to introduce a noun phrase into the discourse if certain conditions are met. Thus, it is quite common for participants in a conversation to make reference to themselves without any overt noun phrases.⁵

H5. doko  de umaremashita?
    where at was-born
    Where were you born?
W6. ano-ne, anoo, ehime-ken no oomishima tte iu  chitcha-na
    uh      uh    Ehime     LK Omishima  QT  say small
    shima  de umaremashita.
    island at was-born
    Uh, let's see, I was born on a small island in Ehime Prefecture called Omishima.
It is also important to note that, unlike many other languages which allow ellipsis in subject position, Japanese has no marking on the verb to provide any clues to the identity of the ellipted subject.6 The following example, decontextualized, could refer to the actions of a first, second, or third person subject, singular or plural.

W9. mm, soko wa umareta dake de,
    there TP was-born only copula
    No, I was only born there, and then

This example actually referred to the actions of the speaker, but in other contexts, the one who was born there could be the addressee, or some other person.

2.1.2 Ellipsis of Object

In contrast to the situation described by Givón for Ute, object ellipsis also occurs with considerable frequency in Japanese conversational interaction. Shibamoto 1980, for instance, reports that the objects of verbal predicates are ellipted as often as 67% of the time. Either animate or inanimate objects may be ellipted, as the following examples demonstrate.

A135. nonde-ru to omou n da kedo.
      taking QT think nom cop but
      I think she was taking them, but,

A71.  tada, onna-no-ko wa ippen dekiru to
      just girl TM once make when
      I think that once a girl gets pregnant [makes a baby]

A72.  sodatetaku-natchau n ja-nai ka.
      want-to-raise nom neg-tag QU
      she wants to have the baby.

2.2 Stressed/independent Pronouns

In present day Japanese, there is little, if anything, to differentiate "pronouns" from nouns syntactically (despite the claims of Hinds 1971), although there are clearcut distributional differences (see below). There are a large number of pronominal elements available to speakers, and a representative list of singular forms is indicated in the following chart. See Hinds 1978:138ff for a discussion of plurality.
I    MALE         boku, ore
     FEMALE       atashi, atakushi
     SEX-NEUTRAL  watashi, watakushi

II   SEX-NEUTRAL  anata, kimi, anta, omae

III  MALE         kare
     FEMALE       kanojo
     INANIMATE    kore 'this', sore 'that', are 'that over there'

Historically, all of these forms are derived from nouns, and they function syntactically as nouns do in the present day language. For example, it is relatively common for these pronominal forms to have modifiers or determiners preceding them. Thus, the following examples all have determiners modifying the head nouns. A80 and A81 have pronouns as head nouns, while A10 and A67 have nouns.

A10. da-kedo ne, da-kedo, boku wa, boku wa, sono onna-no-ko
     but EM but I TM I TM that girl
     baka da to omotte nee.
     fool cop QT think-and EM
     But, uh, but, I, I think that girl's stupid, and

A67. un, sore ne, da-kara, ippen sono banii no yatsu wa shikago
     un that EM so once that Bunny LK guy TM Chicago
     ni ai ni itte, ne, biru ni ai ni itte,
     to meet to go-and EM Bill to meet to go-and
     Um, about that, one time that Bunny went to Chicago to see him, to see Bill, and

A80. da-kedo, ima, sono kanojo mo ne, , kookai, kono-mae
     but now that she too EM uh regret recently
     tegami ni kaite kite ne, yappari, kookai
     letter in write-and come-and EM expectedly regret
     shite ne.
     do-and EM
     But, now, that girl too, uh, she regrets, she wrote in a letter recently that she regrets, and

A81. sono kanojo, atama ii n da yo, sugoku.
     that she head good nom cop EM very
     That girl is smart, you know, very.

The forms kare and kanojo, used currently as third person pronouns, owe some part of their frequency of usage to influences from western languages and the requirement there that all sentences have an expressed subject. The use of these forms is further complicated in that they also mean 'boyfriend' and 'girlfriend', respectively (see Hinds 1975 for a complete discussion).
Data adduced later will provide some of the feel for these observations: traditional narratives never employ the forms kare and kanojo, while participants in normal conversational interaction use them in accordance with their own perceptions of good usage.

2.3 Right-dislocated definite NP (Postposed NP)

Subsequent discussion does not include reference to right-dislocated NP (postposed NP), and so a brief explication of this phenomenon is offered here. Considerable effort has been spent in attempting to describe the characteristics of postposed constructions in Japanese (the most recent being Hinds 1982, chapter 7). It is an important construction since it appears to figure in typological word order change, creating either verb medial or verb initial typologies from verb final (see Givón this volume for additional comments).

The existence of this phenomenon has never been questioned. Peng 1977 claims, for instance, that postposing occurs in 9.2% of all conversational utterances. Shibamoto 1982 finds that females postpose elements 12.7% of the time, while males postpose 5% of the time. Clancy 1980:167, while not citing percentages, states:

    Another device, which appears to function at least partly to clarify cases of elliptical switch reference, is the use of a postposed subject . . .

The data base examined in this study, however, simply does not have enough instances of postposing to warrant detailed discussion or conjecture. There are no instances of postposing in the narrative Momotaro, understandable since this is a planned rather than spontaneous performance. In all, 567 clauses in three interactions have been examined, and only 5 postposed constructions (0.9%) were found. Of these, three involved reference to the speaker herself, and two others were at transition points, or boundaries, in the conversation.
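The postposing rate quoted above is a simple proportion, and the arithmetic can be checked with a few lines of Python. This is only an illustrative sketch: the two counts come from the text, while the helper name rate_percent and the constant names are hypothetical and not part of the original study.

```python
def rate_percent(count: int, total: int, digits: int = 1) -> float:
    """Return count/total as a percentage, rounded to `digits` decimal places."""
    if total <= 0:
        raise ValueError("total must be positive")
    return round(100 * count / total, digits)


# Counts reported in the text; the names are illustrative assumptions.
TOTAL_CLAUSES = 567  # clauses examined across the three interactions
POSTPOSED = 5        # postposed constructions found among them

postposing_rate = rate_percent(POSTPOSED, TOTAL_CLAUSES)
print(f"{postposing_rate}%")
```

The same helper can be applied to any other count drawn from the 567-clause data base, so that every reported percentage is computed over the same denominator.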
2.4 Scrambling

Another device which deserves mention, but which will not receive detailed examination, is "scrambling", the mutation of the basic SOV word order. Again, although this phenomenon has been insightfully discussed in the literature [Shibamoto 1980, 1982], the total percentage of scrambling in the current data base does not make it worth pursuing. Of 567 clauses, 8 (1.4%) evidenced a scrambled word order.

2.5 Postpositional particles

2.5.1 Subject/Topic Marking Particles

In this section I discuss a number of issues in the realm of highlighting, focussing on the contrast between noun phrases marked by wa and noun phrases marked by ga. I attempt to do this without becoming embroiled in terminological and theoretical matters which surround this issue, and which have little to do with how noun phrases marked by these particles function in topic continuance.

The clearest statement on the difference in meaning for noun phrases marked by these two particles comes from pedagogical grammars of Japanese. Alfonso 1966:973, for instance, states what he terms a 'fundamental rule' for the distinction between these two particles.

    Use WA when you introduce a topic or when the topic IS KNOWN ALREADY and you want to direct the other's attention to what FOLLOWS.8
    Use GA to mark the subject when WHAT FOLLOWS is already known and you want to draw attention to the SUBJECT ITSELF.

This contrast is illustrated in Jorden 1954:43 with the following types of examples.

(2) kore ga akai desu.
    this SM red cop
    THIS is red (tells which one is red).

(3) kore wa akai desu.
    this TM red cop
    This is RED (tells what color this is).

The distinction is frequently sp