ASSESSING GRAMMAR
Differing notions of ‘grammar’ for assessment
Introduction
The study of grammar has had a long and important role in the history of second language and foreign language teaching. For centuries, to learn another language, or what I will refer to generically as an L2, meant to know the grammatical structures of that language and to cite prescriptions for its use. Grammar was used to mean the analysis of a language system, and the study of grammar was not just considered an essential feature of language learning, but was thought to be sufficient for learners to actually acquire another language (Rutherford, 1988). Grammar in and of itself was deemed to be worthy of study – to the extent that in the Middle Ages in Europe, it was thought to be the foundation of all knowledge and the gateway to sacred and secular understanding (Hillocks and Smith, 1991). Thus, the central role of grammar in language teaching remained relatively uncontested until the late twentieth century. Even a few decades ago, it would have been hard to imagine language instruction without immediately thinking of grammar.
What is meant by ‘grammar’ in theories of language?
Grammar and linguistics
When most language teachers, second language acquisition (SLA) researchers and language testers think of ‘grammar’, they call to mind one of the many paradigms (e.g., ‘traditional grammar’ or ‘universal grammar’) available for the study and analysis of language. Such linguistic grammars are typically derived from data taken from native speakers and minimally constructed to describe well-formed utterances within an individual framework. These grammars strive for internal consistency and are mainly accessible to those who have been trained in that particular paradigm.
Since the 1950s, there have been many such linguistic theories – too numerous to list here – that have been proposed to explain language phenomena. Many of these theories have helped shape how L2 educators currently define grammar in educational contexts. Although it is beyond the purview of this book to provide a comprehensive review of these theories, it is, nonetheless, helpful to mention a few, considering both the impact they have had on L2 education and the role they play in helping define grammar for assessment purposes.
These two views of linguistic analysis have been instrumental in determining how grammar has been conceptualized in L2 classrooms in recent years. They have also influenced definitions of L2 grammar for assessment purposes. I will now provide a brief overview of some of the more influential linguistic theories that typify the syntactocentric and communicative views of language.
Form-based perspectives of language
One of the oldest theories to describe the structure of language is traditional grammar. Originally based on the study of Latin and Greek, traditional grammar drew on data from literary texts to provide rich and lengthy descriptions of linguistic form. Unlike some other syntactocentric theories, traditional grammar also revealed the linguistic meanings of these forms and provided information on their usage in a sentence (Celce-Murcia and Larsen-Freeman, 1999). Traditional grammar supplied an extensive set of prescriptive rules along with the exceptions. A typical rule in a traditional English grammar might be:
The first-person singular of the present tense verb ‘to be’ is ‘I am’. ‘Am’ is used with ‘I’ in all cases, except in first-person singular negative tag and yes/no questions, which are contracted. In this case, the verb ‘are’ is used instead of ‘am’. For example, ‘I’m in a real bind, aren’t I?’ or ‘Aren’t I trying my best?’
Form- and use-based perspectives of language
The three theories of linguistic analysis described thus far have provided insights to L2 educators on several grammatical forms. These insights provide information to explain what structures are theoretically possible in a language. Other linguistic theories, however, are better equipped to examine how speakers and writers actually exploit linguistic forms during language use. For example, if we wish to explain how seemingly similar structures like ‘I like to read’ and ‘I like reading’ connote different meanings, we might turn to those theories that study grammatical form and use interfaces. This would address questions such as: Why does a language need two or more structures that are similar in meaning? Are similar forms used to convey different specialized meanings? To what degree are similar forms a function of written versus spoken language, or to what degree are these forms characteristic of a particular social group or a specific situation? It is important for us to discuss these questions briefly if we ultimately wish to test grammatical forms along with their meanings and uses in context.
One approach to linguistic analysis that has contributed greatly to our understanding of the grammatical forms found in language use, as well as the contextual factors that influence the variability of these forms, is corpus linguistics. I will briefly describe corpus linguistics along with how findings from this approach can be useful for assessing grammar.
The common practice of compiling linguistic corpora, or large and principled collections of natural spoken and written texts, in order to analyze patterns of language use by computer across large databases of authentic texts has led to a relatively new field known as ‘corpus linguistics’. Not a theory of language per se, corpus linguistics embodies a suite of tools and methods designed to provide a source of evidence so that linguistic data can be analyzed distributionally – that is, to show how often and where a linguistic form occurs in spoken or written text. According to Biber, Conrad and Reppen (1998), these analyses typically focus on two concerns. One type of study examines the use of one linguistic feature (i.e., a lexical item or grammatical structure) in comparison with another. For example, corpus-based studies might examine the different uses of ‘would’. These studies might also compare the word ‘wish’ with that-clauses and to-infinitives, or they might examine a linguistic feature with a non-linguistic feature, such as gender, dialect or setting.
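To make the idea of a distributional analysis concrete, the following is a minimal sketch in Python of counting how often a form occurs in spoken versus written text. The four-sentence corpus, the register labels and the function names are all invented for illustration; they are assumptions, not data or tools from Biber, Conrad and Reppen (1998), and a real corpus study would draw on a large, principled collection of texts and far more careful tokenization.

```python
import re
from collections import Counter

# Hypothetical toy corpus: (register, text) pairs invented for illustration only.
TOY_CORPUS = [
    ("spoken", "I would go if I could. Would you like some tea?"),
    ("spoken", "She said she would call, but she never did."),
    ("written", "The committee would prefer a later date."),
    ("written", "Results were consistent across all three groups."),
]


def tokenize(text):
    """Lowercase a text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def distribution(corpus, form):
    """Return, for each register, the raw count of `form` and its rate per 1,000 tokens."""
    hits = Counter()
    totals = Counter()
    for register, text in corpus:
        tokens = tokenize(text)
        totals[register] += len(tokens)
        hits[register] += tokens.count(form.lower())
    return {
        register: {
            "count": hits[register],
            "per_1000": 1000 * hits[register] / totals[register],
        }
        for register in totals
    }


if __name__ == "__main__":
    # Report how often 'would' occurs in each register of the toy corpus.
    for register, stats in distribution(TOY_CORPUS, "would").items():
        print(register, stats["count"], round(stats["per_1000"], 1))
```

Normalizing to a rate per 1,000 tokens, rather than reporting raw counts alone, is what allows registers of different sizes to be compared. The sketch counts word forms only; it does not distinguish the different uses of ‘would’, which would require annotation or manual coding.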
Communication-based perspectives of language
Other theories have provided grammatical insights from a communication-based perspective. Such a perspective expresses the notion that language involves more than linguistic form. It moves beyond the view of language as patterns of morphosyntax observed within relatively decontextualized sentences or sentences found within naturally occurring corpora. Rather, a communication-based perspective views grammar as a set of linguistic norms, preferences and expectations that an individual invokes to convey a host of pragmatic meanings that are appropriate, acceptable and natural depending on the situation. The assumption here is that linguistic form has no absolute, fixed meaning in language use (as seen in sentences 1.5 and 1.7 above), but is mutable and open to interpretation by those who use it in a given circumstance. Grammar in this context is often co-terminous with language itself, and stands not only for form, but also for meaningfulness and pragmatic appropriacy, acceptability or naturalness – a topic I will return to later since I believe that a blurring of these concepts is misleading and potentially problematic for language educators.
What is pedagogical grammar?
A pedagogical grammar represents an eclectic but principled description of the target-language forms, created for the express purpose of helping teachers understand the linguistic resources of communication. These grammars provide information about how language is organized and offer relatively accessible ways of describing complex linguistic phenomena for pedagogical purposes.
Research on L2 grammar teaching, learning and assessment
Research on L2 teaching and learning
Over the years, several of the questions mentioned above have intrigued language teachers, inspiring them to experiment with different methods, approaches and techniques in the teaching of grammar. To determine if students had actually learned under the different conditions, teachers have used diverse forms of assessment and drawn their own conclusions about their students. In so doing, these teachers have acquired a considerable amount of anecdotal evidence on the strengths and weaknesses of using different practices to implement L2 grammar instruction. These experiences have led most teachers nowadays to subscribe to an eclectic approach to grammar instruction, whereby they draw upon a variety of different instructional techniques, depending on the individual needs, goals and learning styles of their students.
Comparative methods studies
The comparative methods studies sought to compare the effects of different language-teaching methods on the acquisition of an L2. These studies occurred principally in the 1960s and 1970s, and stemmed from a reaction to the grammar-translation method, which had dominated language instruction during the first half of the twentieth century. More generally, these studies were in reaction to form-focused instruction (referred to as ‘focus on forms’ by Long, 1991), which used a traditional structural syllabus of grammatical forms as the organizing principle for L2 instruction. According to Ellis (1997), form-focused instruction contrasts with meaning-focused instruction in that meaning-focused instruction emphasizes the communication of messages (e.g., the act of making a suggestion and the content of such a suggestion) while form-focused instruction stresses the learning of linguistic forms. These can be further contrasted with form-and-meaning focused instruction (referred to by Long (1991) as ‘focus-on-form’), where grammar instruction occurs in a meaning-based environment and where learners strive to communicate meaning while paying attention to form. (Note that Long’s version of ‘focus-on-form’ stresses a meaning orientation with an incidental focus on forms.) These comparative methods studies all shared the theoretical premise that grammar has a central place in the curriculum, and that successful learning depends on the teaching method and the degree to which that method promotes grammar processing.
Non-interventionist studies
While some language educators were examining different methods of teaching grammar in the 1960s, others were feeling a growing sense of dissatisfaction with the central role of grammar in the L2 curriculum. As a result, questions regarding the centrality of grammar were again raised by a small group of L2 teachers and syllabus designers who felt that the teaching of grammar in any form simply did not produce the desired classroom results. Newmark (1966), in fact, asserted that grammatical analysis and the systematic practice of grammatical forms were actually interfering with the process of L2 learning, rather than promoting it, and if left uninterrupted, second language acquisition, similar to first language acquisition, would proceed naturally.
At the same time, the role of grammar in the L2 curriculum was also being questioned by some SLA researchers (e.g., Dulay and Burt, 1973; Bailey, Madden and Krashen, 1974) who had been studying L2 learning in instructed and naturalistic settings. In their attempts to characterize the L2 learner’s interlanguage at one or more points along the path toward target-like proficiency, several researchers came to similar conclusions about L2 development. They found that instead of making incremental leaps in grammatical ability through an accumulation of grammatical forms, as presented in a traditional grammar syllabus, learners in both instructed and naturalistic settings acquired the target structures in a relatively fixed order (Ellis, 1994) regardless of when they were introduced. For example, Krashen (1977) claimed that, in general, ESL learners first acquire the -ing affix, plural markings and the copula (stage 1), and then the auxiliary and the articles (stage 2). This is followed by the irregular past verb forms (stage 3) and finally, the regular past, the third-person singular affix and the possessive -s affix (stage 4). While this information is interesting, research findings involve only a skeletal list of the possible grammar points that any typical curriculum would encompass. As a result, we might wonder how this order will change if other grammar points are investigated at the same time. Also, we have no idea how this order would hold for many other languages.
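The four-stage order attributed to Krashen (1977) above can be written down as a simple lookup table, which also makes its limited coverage easy to see. The sketch below is in Python; the feature labels and function names are hypothetical conveniences, and the table encodes only the claim as reported here, not a validated developmental scale.

```python
# Stages and features as reported in the text for Krashen (1977).
# The feature labels are illustrative shorthand, not standard terminology.
ACQUISITION_STAGES = {
    1: ["-ing affix", "plural marking", "copula"],
    2: ["auxiliary", "articles"],
    3: ["irregular past"],
    4: ["regular past", "third-person singular -s", "possessive -s"],
}


def stage_of(feature):
    """Return the claimed stage for a feature, or None if the order does not cover it."""
    for stage, features in ACQUISITION_STAGES.items():
        if feature in features:
            return stage
    return None


def claimed_earlier(first, second):
    """Return True if `first` is claimed to be acquired in an earlier stage than `second`.

    Returns None when either feature falls outside the list, since the
    order says nothing about features it does not include.
    """
    a, b = stage_of(first), stage_of(second)
    if a is None or b is None:
        return None
    return a < b


if __name__ == "__main__":
    print(stage_of("articles"))                       # feature covered by the order
    print(stage_of("modals"))                         # feature not covered
    print(claimed_earlier("copula", "regular past"))  # comparison across stages
```

Looking up a feature such as the modals returns nothing, which reflects the point made above: the list is skeletal relative to what a typical curriculum would encompass.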
Empirical studies in support of non-intervention
The non-interventionist position was examined empirically by Prabhu (1987) in a project known as the Communicational Teaching Project (CTP) in southern India. This study sought to demonstrate that the development of grammatical ability could be achieved through a task-based, rather than a form-focused, approach to language teaching, provided that the tasks required learners to engage in meaningful communication. In the CTP, Prabhu (1987) argued against the notion that the development of grammatical ability depended on a systematic presentation of grammar followed by planned practice. However, in an effort to evaluate the CTP, Beretta and Davies (1985) compared classes involved in the CTP with classes outside the project taught with a structural-oral-situational method. They administered a battery of tests to the students, and found that the CTP learners outperformed the control group on a task-based test, whereas the non-CTP learners did better on a traditional structure test. These results lent partial support to the non-interventionist position by showing that task-based classrooms based on meaningful communication can also be effective in promoting SLA. However, these results also showed again that students do best when they are taught and tested in similar ways.
Possible implications of fixed developmental order to language assessment
The notion that structures appear to be acquired in a fixed developmental order and in a fixed developmental sequence might conceivably have some relevance to the assessment of grammatical ability. First of all, these findings could give language testers an empirical basis for constructing grammar tests that would account for the variability inherent in a learner’s interlanguage. In other words, information on the acquisitional order of grammatical items could conceivably serve as a basis for selecting grammatical content for tests that aim to measure different levels of developmental progression, such as Chang (2002, 2004) did in examining the underlying structure of a test that attempted to measure knowledge of relative clauses. These findings also suggest a substantive approach to defining test tasks according to developmental order and sequence on the basis of how grammatical features are acquired over time (Ellis, 2001b). In other words, one task could potentially tap into developmental level one, while another taps into developmental level two, and so forth.
To illustrate, grammar tests targeting beginning English-language learners often include questions on the articles and the third-person singular -s affix, two features considered to be ‘very challenging’ from an acquisitional perspective. Since, according to these findings, no beginning learner would be expected to have target-like control of these particular grammatical items, the inclusion of these grammatical features in a beginning classroom achievement test might be questionable. However, the inclusion of these items in a placement test would be highly appropriate since the goal of placement assessment is to identify a wide range of ability levels so that developmentally homogeneous groups can be formed.
Problems with the use of developmental sequences as a basis for assessment
Although developmental sequence research offers an intuitively appealing complement to accuracy-based assessments in terms of interpreting test scores, I believe this method is fraught with a number of serious problems, and language educators should use extreme caution in applying this method to language testing. This is because our understanding of natural acquisitional sequences is incomplete and at too early a stage of research to be the basis for concrete assessment recommendations (Lightbown, 1985; Hudson, 1993). First, the number of grammatical sequences that show a fixed order of acquisition is very limited, far too limited for all but the most restricted types of grammar tests. For example, what is the order for acquiring the modals, the conditionals, or the infinitive or gerund complements? Second, much of the research on acquisitional sequences is based on data from naturalistic settings, where students are provided with considerable exposure to the language. We have yet to learn about how these sequences hold for students whose only exposure to a language is an L2 classroom. Furthermore, acquisitional sequences make reference only to linguistic forms; no reference is made to how these forms interact with the conveyance of literal and implied meanings associated with a specific context. Third, as the rate (not the route) of acquisition appears to be influenced by the learner’s first language and by exposure to other languages, we need to understand how these factors might impact on development rates and how we would reconcile this if we wished to test heterogeneous groups of language learners. Finally, as the developmental levels represent an ordering of grammatical rules during acquisition, this may or may not be on the same measurement scale as accuracy scores. 
Thus, until further research demonstrates the precise relationship between these scales, we should be careful about comparisons between proficiency levels based on accuracy scales and levels of interlanguage development. In the end, it is premature to apply the findings from acquisitional sequences research to language assessment given our current level of understanding of developmental sequences.
Interventionist studies
Not all L2 educators are in agreement with the non-interventionist position on grammar instruction. In fact, several (e.g., Schmidt, 1983; Swain, 1991) have maintained that although some L2 learners are successful in acquiring selected linguistic features without explicit grammar instruction, the majority fail to do so. Testimony to this is the large number of non-native speakers who emigrate to countries around the world, live there all their lives and fail to learn the target language, or fail to learn it well enough to realize their personal, social and long-term career goals. In these situations, language teachers affirm that formal grammar instruction of some sort can be of benefit. Furthermore, most language teachers would contend that explicit grammar instruction, including systematic error correction and other instructional techniques, contributes immensely to their students’ linguistic development. Finally, despite the non-interventionist recommendations toward grammar teaching, I believe grammar still plays an important role in most L2 classrooms around the world.
Empirical studies in support of intervention
Aside from anecdotal evidence, the non-interventionist position has come under intense attack on both theoretical and empirical grounds, with several SLA researchers affirming that efforts to teach L2 grammar typically result in the development of L2 grammatical ability. Hulstijn (1989) and Alanen (1995) investigated the effectiveness of L2 grammar instruction on SLA in comparison with no formal instruction. They found that when coupled with meaning-focused instruction, the formal instruction of grammar appears to be more effective than exposure to meaning or form alone. Long (1991) also argued for a focus on both meaning and form in classrooms that are organized around meaningful and sustained communicative interaction.
Research on instructional techniques and their effects on acquisition
Much of the recent research on teaching grammar has focused on four types of instructional techniques and their effects on acquisition. Although a complete discussion of teaching interventions is outside the purview of this book (see Ellis, 1997; Doughty and Williams, 1998), these techniques include form- or rule-based techniques, input-based techniques, feedback-based techniques and practice-based techniques (Norris and Ortega, 2000).
Grammar processing and second language development
In the grammar-learning process, explicit grammatical knowledge refers to a conscious knowledge of grammatical forms and their meanings. Explicit knowledge is usually accessed slowly, even when it is almost fully automatized (Ellis, 2001b). DeKeyser (1995) characterizes grammatical instruction as ‘explicit’ when it involves the explanation of a rule or the request to focus on a grammatical feature. Instruction can be explicitly deductive, where learners are given rules and asked to apply them, or explicitly inductive, where they are given samples of language from which to generate rules and make generalizations. Similarly, many types of language test tasks (e.g., gap-filling tasks) seem to measure explicit grammatical knowledge.
Implicit grammatical knowledge refers to ‘the knowledge of a language that is typically manifest in some form of naturally occurring language behavior such as conversation’ (Ellis, 2001b, p. 252). In terms of processing time, it is unconscious and is accessed quickly. DeKeyser (1995) classifies grammatical instruction as implicit when it does not involve rule presentation or a request to focus on form in the input; rather, implicit grammatical instruction involves semantic processing of the input with any degree of awareness of grammatical form. The hope, of course, is that learners will ‘notice’ the grammatical forms and identify form–meaning relationships so that the forms are recognized in the input and eventually incorporated into the interlanguage. This type of instruction occurs when learners are asked to listen to a passage containing a specific grammatical feature. They are then asked to answer comprehension questions, but not asked to attend to the feature. Similarly, language test tasks that require examinees to engage in interactive talk might also be said to measure implicit grammatical knowledge.
Implications for assessing grammar
The studies investigating the effects of teaching and learning on grammatical performance present a number of challenges for language assessment. First of all, the notion that grammatical knowledge structures can be differentiated according to whether they are fully automatized (i.e., implicit) or not (i.e., explicit) raises important questions for the testing of grammatical ability (Ellis, 2001b). Given the many purposes of assessment, we might wish to test explicit knowledge of grammar, implicit knowledge of grammar or both. For example, in certain classroom contexts, we might want to assess the learners’ explicit knowledge of one or more grammatical forms, and could, therefore, ask learners to answer multiple-choice or short-answer questions related to these forms.
The role of grammar in models of communicative language ability
The role of grammar in models of communicative competence
Every language educator who has ever attempted to measure a student’s communicative language ability has wondered: ‘What exactly does a student need to “know” in terms of grammar to be able to use it well enough for some real-world purpose?’ In other words, they have been faced with the challenge of defining grammar for communicative purposes. To complicate matters further, linguistic notions of grammar have changed over time, as we have seen, and this has significantly increased the number of components that could be called ‘grammar’. In short, definitions of grammar and grammatical knowledge have changed over time and across context, and I expect this will be no different in the future.
Rea-Dickins’ definition of grammar
In discussing more specifically how grammatical knowledge might be tested within a communicative framework, Rea-Dickins (1991) defined ‘grammar’ as the single embodiment of syntax, semantics and pragmatics. She argued against Canale and Swain’s (1980) and Bachman’s (1990b) multi-componential view of communicative competence on the grounds that componential representations overlook the interdependence and interaction between and among the various components. She further stated that in Canale and Swain’s (1980) model, the notion of grammatical competence was limited since it defined grammar as ‘structure’ on the one hand and as ‘structure and semantics’ on the other, but ignored the notion of ‘structure as pragmatics’. Similarly, she added that in Bachman’s (1990b) model, grammar was defined as structure at the sentence level and as cohesion at the suprasentential level, but this model failed to account for the pragmatic dimension of communicative grammar.
Larsen-Freeman’s definition of grammar
Another conceptualization of grammar that merits attention is Larsen-Freeman’s (1991, 1997) framework for the teaching of grammar in communicative language teaching contexts. Drawing on several linguistic theories and influenced by language teaching pedagogy, she has also characterized grammatical knowledge along three dimensions: linguistic form, semantic meaning and pragmatic use. Form is defined as both morphology, or how words are formed, and syntactic patterns, or how words are strung together. This dimension is primarily concerned with linguistic accuracy. The meaning dimension describes the inherent or literal message conveyed by a lexical item or a lexico-grammatical feature. This dimension is mainly concerned with the meaningfulness of an utterance. The use dimension refers to the lexico-grammatical choices a learner makes to communicate appropriately within a specific context. Pragmatic use describes when and why one linguistic feature is used in a given context instead of another, especially when the two choices convey a similar literal meaning. In this respect, pragmatic use is said to embody presuppositions about situational context, linguistic context, discourse context, and sociocultural context. This dimension is mainly concerned with making the right choice of forms in order to convey an appropriate message for the context.
What is meant by ‘grammar’ for assessment purposes?
Regardless of the assessment purpose, if we wish to make inferences about grammatical ability on the basis of a grammar test or some other form of assessment, it is important to know what we mean by ‘grammar’ when attempting to specify components of grammatical knowledge for measurement purposes. With this goal in mind, we need a definition of grammatical knowledge that is broad enough to provide a theoretical basis for the construction and validation of tests in a number of contexts. At the same time, we need our definition to be precise enough to distinguish it from other areas of language ability.
From a theoretical perspective, the main goal of language use is communication, whether it be used to transmit information, to perform transactions, to establish and maintain social relations, to construct one’s identity or to communicate one’s intentions, attitudes or hypotheses. Being the primary resource for communication, language knowledge consists of grammatical knowledge and pragmatic knowledge. Therefore, I propose a theoretical definition of language knowledge that consists of two distinct, but related, components.
Towards a definition of grammatical ability
Defining grammatical constructs
Although our basic underlying model of grammar will remain the same in all testing situations (i.e., grammatical form and meaning), what it means to ‘know’ grammar for different contexts will most likely change (see Chapelle, 1998). In other words, the type, range and scope of grammatical features required to communicate accurately and meaningfully will vary from one situation to another. For example, the type of grammatical knowledge needed to write a formal academic essay would be very different from that needed to make a train reservation. Given the many possible ways of interpreting what it means to ‘know’ grammar, it is important that we define what we mean by ‘grammatical knowledge’ for any given testing situation. A clear definition of what we believe it means to ‘know’ grammar for a particular testing context will then allow us to construct tests that measure grammatical ability.
One of the first steps in designing a test, aside from identifying the need for a test, its purpose and audience, is to provide a clear theoretical definition of the construct(s) to be measured. If we have a theoretically sound, as well as a clear and precise definition of grammatical knowledge, we can then design tasks to elicit performance samples of grammatical ability. By having the test-takers complete grammar tasks, we can observe – and score – their answers with relation to specific grammatical criteria for correctness. If these performance samples reflect the underlying grammatical constructs – an empirical question – we can then use the test results to make inferences about the test-takers’ grammatical ability. These inferences, in turn, may be used to make decisions about the test-takers (e.g., pass the course). However, we need first to provide evidence that the tasks on a test have measured the grammatical constructs we have designed them to measure (Messick, 1993). The process of providing arguments in support of this evidence is called validation, and this begins with a clear definition of the constructs.
What is ‘grammatical ability’ for assessment purposes?
The approach to the assessment of grammatical ability in this book is based on several specific definitions. First, grammar encompasses grammatical form and meaning, whereas pragmatics is a separate, but related, component of language. A second is that grammatical knowledge, along with strategic competence, constitutes grammatical ability. A third is that grammatical ability involves the capacity to realize grammatical knowledge accurately and meaningfully in test-taking or other language-use contexts. The capacity to access grammatical knowledge to understand and convey meaning is related to a person’s strategic competence. It is this interaction that enables examinees to implement their grammatical ability in language use. Next, in tests and other language-use contexts, grammatical ability may interact with pragmatic ability (i.e., pragmatic knowledge and strategic competence) on the one hand, and with a host of non-linguistic factors such as the test-taker’s topical knowledge, personal attributes, affective schemata and the characteristics of the task on the other. Finally, in cases where grammatical ability is assessed by means of an interactive test task involving two or more interlocutors, the way grammatical ability is realized will be significantly impacted by both the contextual and the interpretative demands of the interaction.
The components of grammatical knowledge
Knowledge of phonological or graphological form and meaning
Knowledge of phonological/graphological form enables us to understand and produce features of the sound or writing system, with the exception of meaning-based orthographies such as Chinese characters, as they are used to convey meaning in testing or language-use situations.
Knowledge of lexical form and meaning
Knowledge of lexical form enables us to understand and produce those features of words that encode grammar rather than those that reveal meaning. This includes words that mark gender (e.g., waitress), countability (e.g., people) or part of speech (e.g., relate, relation). For example, when the word think in English is followed by the preposition about before a noun, this is considered the grammatical dimension of lexis, representing a co-occurrence restriction with prepositions. One area of lexical form that poses a challenge to learners of some languages is word formation. This includes compounding in English with a noun + noun or a verb + particle pattern (e.g., fire escape; breakup) or derivational affixation in Italian (e.g., ragazzino ‘little kid’, ragazzone ‘big kid’). For example, a student who says ‘a teacher of chemistry’ instead of ‘chemistry teacher’ or ‘*this people’ would need further instruction in lexical form.
Knowledge of morphosyntactic form and meaning
Knowledge of morphosyntactic form permits us to understand and produce both the morphological and syntactic forms of the language. This includes the articles, prepositions, pronouns, affixes (e.g., -est), syntactic structures, word order, simple, compound and complex sentences, mood, voice and modality. A learner who knows the morphosyntactic form of the English conditionals would know that: (1) an if-clause sets up a condition and a result clause expresses the outcome; (2) both clauses can be in the sentence-initial position in English; (3) if can be deleted under certain conditions as long as the subject and operator are inverted; and (4) certain tense restrictions are imposed on if and result clauses.
Knowledge of cohesive form and meaning
Knowledge of cohesive form enables us to use the phonological, lexical and morphosyntactic features of the language in order to interpret and express cohesion on both the sentence and the discourse levels. Cohesive form is directly related to cohesive meaning through cohesive devices (e.g., she, this, here) which create links between cohesive forms and their referential meanings within the linguistic environment or the surrounding co-text. Halliday and Hasan (1976, 1989) list a number of grammatical forms for displaying cohesive meaning.
Knowledge of information management form and meaning
Knowledge of information management form allows us to use linguistic forms as a resource for interpreting and expressing the information structure of discourse. Some resources that help manage the presentation of information include, for example, prosody, word order, tense-aspect and parallel structures. These forms are used to create information management meaning.
Knowledge of interactional form and meaning
Knowledge of interactional form enables us to understand and use linguistic forms as a resource for understanding and managing talk-in-interaction. These forms include discourse markers and communication management strategies. Discourse markers consist of a set of adverbs, conjunctions and lexicalized expressions used to signal certain language functions.
Designing test tasks to measure L2 grammatical ability
How does test development begin?
Every grammar-test development project begins with a desire to obtain (and often provide) information about how well a student knows grammar in order to convey meaning in some situation where the target language is used. The information obtained from this assessment then forms the basis for decision-making. Those situations in which we use the target language to communicate in real life or in which we use it for instruction or testing are referred to as the target language use (TLU) situations (Bachman and Palmer, 1996). Within these situations, the tasks or activities requiring language to achieve a communicative goal are called the target language use tasks. A TLU task is one of many language-use tasks that test-takers might encounter in the target language use domain. It is to this domain that language testers would like to make inferences about language ability, or more specifically, about grammatical ability.
What do we mean by ‘task’?
The notion of ‘task’ in language-learning contexts has been conceptualized in many different ways over the years. Traditionally, ‘task’ has referred to any activity that requires students to do something for the express purpose of learning the target language. A task then is any activity (e.g., short answers, role-plays) as long as it involves a linguistic or non-linguistic (e.g., circling the answer) response to input. Traditional learning or teaching tasks are characterized as having an intended pedagogical purpose, which may or may not be made explicit; they have a set of instructions that control the kind of activity to be performed; they contain input (e.g., questions); and they elicit a response. More recently, learning tasks have been characterized more in terms of their communicative goals, their success in eliciting interaction and negotiation of meaning, and their ability to engage learners in complex meaning-focused activities (Nunan, 1989, 1993; Berwick, 1993; Skehan, 1998).
What are the characteristics of grammatical test tasks?
As the goal of grammar assessment is to provide as useful a measurement as possible of our students’ grammatical ability, we need to design test tasks in which the variability of our students’ scores is attributed to the differences in their grammatical ability, and not to uncontrolled or irrelevant variability resulting from the types of tasks or the quality of the tasks that we have put on our tests. As all language teachers know, the kinds of tasks we use in tests and their quality can greatly influence how students will perform. Therefore, given the role that the effects of task characteristics play on performance, we need to strive to manage (or at least understand) the effects of task characteristics so that they will function the way we designed them to – as measures of the constructs we want to measure (Douglas, 2000). In other words, specifically designed tasks will work to produce the types of variability in test scores that can be attributed to the underlying constructs given the contexts in which they were measured (Tarone, 1998). To understand the characteristics of test tasks better, we turn to Bachman and Palmer’s (1996) framework for analyzing target language use tasks and test tasks.
The Bachman and Palmer framework
Bachman and Palmer’s (1996) framework of task characteristics represents the most recent thinking in language assessment of the potential relationships between task characteristics and test performance. In this framework, they outline five general aspects of tasks, each of which is characterized by a set of distinctive features. These five aspects describe characteristics of (1) the setting, (2) the test rubrics, (3) the input, (4) the expected response and (5) the relationship between the input and response.
Describing grammar test tasks
When language teachers consider tasks for grammar tests, they call to mind a large repertoire of task types that have been commonly used in teaching and testing contexts. We now know that these holistic task types constitute collections of task characteristics for eliciting performance and that these holistic task types can vary on a number of dimensions. We also need to remember that the tasks we include on tests should strive to match the types of language-use tasks found in real-life or language instructional domains.
In designing grammar tests, we need to be familiar with a wide range of activities to elicit grammatical performance. In the rest of the chapter, I will describe several tasks in light of how they can be used to measure grammatical knowledge. I will use the Bachman and Palmer framework as a guide for task specification in this discussion.
Selected-response task types
Selected-response tasks present input in the form of an item, and test-takers are expected to select the response. Other than that, all other task characteristics can vary. For example, the form of the input can be language, non-language or both, and the length of the input can vary from a word to larger pieces of discourse. In terms of the response, selected-response tasks are intended to measure recognition or recall of grammatical form and/or meaning. They are usually scored right/wrong, based on one criterion for correctness; however, in some instances, partial-credit scoring may be useful, depending on how the construct is defined. Finally, selected-response tasks can vary in terms of reactivity, scope and directness.
Limited-production task types
Limited-production tasks present input in the form of an item with language and/or non-language information that can vary in length or topic. Unlike selected-response tasks, limited-production tasks elicit a response embodying a limited amount of language production. The length of this response can be anywhere from a word to a sentence. All task characteristics in limited-production tasks can vary with the exception of two: the type of input (always an ‘item’) and the type of expected response (always ‘limited-production’).
Limited-production tasks are intended to assess one or more areas of grammatical knowledge depending on the construct definition. Unlike selected-response items, which usually have only one possible answer, the range of possible answers for limited-production tasks can, at times, be large – even when the response involves a single word.
Developing tests to measure L2 grammatical ability
What makes a grammar test ‘useful’?
Score-based inferences from grammar tests can be used to make a variety of decisions. For example, classroom teachers use these scores as a basis for making inferences about learning or achievement. These inferences can then serve to provide feedback for learning and instruction, assign grades, promote students to the next level, or even award a certificate. They can also be used to help teachers or administrators make decisions about instruction or the curriculum.
The information derived from language tests, of which grammar tests are a subset, can be used to provide test-takers and other test-users with formative and summative evaluations. Formative evaluation relating to grammar assessment supplies information during a course of instruction or learning on how test-takers might increase their knowledge of grammar, or how they might improve their ability to use grammar in communicative contexts. It also provides teachers with information on how they might modify future instruction or fine-tune the curriculum. For example, feedback on an essay telling a student to review the passive voice would be formative in nature. Summative evaluation provides test stakeholders with an overall assessment of test-taker performance related to grammatical ability, typically at the end of a program of instruction. This is usually presented as a profile of one or more scores or as a single grade.
Score-based inferences from grammar tests can also be used to make, or contribute to, decisions about program placement. This information provides a basis for deciding how students might be placed into a level of a language program that best matches their knowledge base, or it might determine whether or not a student is eligible to be exempted from further L2 study. Finally, inferences about grammatical ability can make or contribute to other high-stakes decisions about an individual’s readiness for learning or promotion, their admission to a program of study, or their selection for a job.
Given the goals and uses of tests in general, and grammar tests in particular, it is fitting to ask how we might actually know if a test is, indeed, able to elicit scorable behaviors from which to make trustworthy and meaningful inferences about an individual’s ability. In other words, how do we know if a grammar test is ‘good’ or ‘useful’ for our particular context?
Many language testers (e.g., Harris, 1969; Lado, 1961) have addressed this question over the years. Most recently, Bachman and Palmer (1996) have proposed a framework of test usefulness by which all tests and test tasks can be judged, and which can inform test design, development and analysis. They consider a test ‘useful’ for any particular testing situation to the extent that it possesses a balance of the following six complementary qualities: reliability, construct validity, authenticity, interactiveness, impact and practicality. They further maintain that for a test to be ‘useful’, it needs to be developed with a specific purpose in mind, for a specific audience, and with reference to a specific target language use (TLU) domain.
Overview of grammar-test construction
Bachman and Palmer (1996) organize test development into three stages: design, operationalization and administration. I will discuss each of these stages in the process of describing grammar-test development.
Stage 1: Design
The design stage of test development involves the accumulation of information and making initial decisions about the entire test process. In tests involving one class, this may be a relatively informal process; however, in tests involving wider audiences, such as a joint final exam or a placement test, the decisions about test development must be discussed and negotiated with several stakeholders. The outcome of the design stage is a design statement. According to Bachman and Palmer (1996, p. 88), this document should contain the following components:
- a description of the purpose(s) of the test,
- a description of the TLU domains and task types,
- a description of the test-takers,
- a definition of the construct(s) to be measured,
- a plan for evaluating test usefulness, and
- a plan for dealing with resources.
Stage 2: Operationalization
The operationalization stage of grammar-test development describes how an entire test involving several grammar tasks is assembled, and how the individual tasks are specified, written and scored.
- Specifying the scoring method
- Scoring selected-response tasks
- Scoring extended-production tasks
- Using scoring rubrics
- Grading
Stage 3: Test administration and analysis
The final stage in the process of developing grammar tests involves the administration of the test to individual students or small groups, and then to a large group of examinees on a trial basis.
Illustrative tests of grammatical ability
The First Certificate in English Language Test (FCE)
Given the assessment purposes and the intended uses of the FCE, the FCE grammar assessments privilege construct validity, authenticity, interactiveness and impact. This is done by the way the construct of grammatical ability is defined. This is also done by the ways in which these abilities are tapped into, and the ways in which the task characteristics are likely to engage the examinee in using grammatical knowledge and other components of language ability in processing input to formulate responses. Finally, this is done by the way in which Cambridge ESOL has promoted public understanding of the FCE, its purpose and procedures, and has made available certain kinds of information on the test. These qualities may, however, have been stressed at the expense of reliability.
The Comprehensive English Language Test (CELT)
In terms of the purposes and intended uses of the CELT, the authors explicitly stated, ‘the CELT is designed to provide a series of reliable and easy-to-administer tests for measuring English language ability of nonnative speakers’ (Harris and Palmer, 1970b, p. 1). As a result, concerns for high reliability and ease of administration led the authors to make choices privileging reliability and practicality over other qualities of test usefulness. To maximize consistency of measurement, the authors used only selected-response task types throughout the test, allowing for minimal fluctuations in the scores due to characteristics of the test method. This allowed them to adopt ‘easy-to-administer’ and ‘easy-to-score’ procedures for maximum practicality and reliability. Reliability was also enhanced by pre-testing items with the goal of improving their psychometric characteristics.
Reliability might have been emphasized at the expense of other important test qualities, such as construct validity, authenticity, interactiveness and impact. For example, construct validity was severely compromised by the mismatch among the purpose of the test, the way the construct was defined and the types of tasks used to operationalize the constructs. In short, scores from discrete-point grammar tasks were used to make inferences about speaking ability rather than make interpretations about the test-takers’ explicit grammatical knowledge.
Finally, authenticity in the CELT was low due to the exclusive use of multiple-choice tasks and the lack of correspondence between these tasks and those one might encounter in the target language use domain. Interactiveness was also low due to the test’s inability to fully involve the test-takers’ grammatical ability in performing the tests. The impact of the CELT on stakeholders is not documented in the published manual.
In all fairness, the CELT was a product of its time, when emphasis was on discrete-point testing and reliability, and when language testers were not yet discussing qualities of test usefulness in terms of authenticity, interactiveness and impact.
The Community English Program (CEP) Placement Test
Given the purposes and the intended uses of the CEP Placement Test, the grammar section privileges authenticity, construct validity, reliability and practicality. Similar to tasks in the instruction, the theme-based test tasks all support the same overarching theme presented from different perspectives. Then, the construct of grammatical knowledge is defined in terms of the grammar used to express the theme. Given the multiple-choice format and the piloting of items, reliability is an important concern. Finally, the multiple-choice format is used over a limited-production format to maximize practicality. This compromise is certainly emphasized at the expense of construct validity and authenticity (of task).
Nonetheless, grammatical ability is also measured in the writing and speaking parts of the CEP Placement Test. These sections privilege construct validity, reliability, authenticity and interactiveness. In these tasks, students are asked to use grammatical resources to write about and discuss the theme they have been learning about during the test. In both the writing and speaking sections, grammatical ability is a separately scored part of the scoring rubric, and definitions of grammatical knowledge are derived from theory and from an examination of benchmark samples. Reliability is addressed by scoring all writing and speaking performance samples ‘blind’ by two raters. In terms of authenticity and interactiveness, these test sections seek to establish a strong correspondence between the test tasks and the type of tasks encountered in theme-based language instruction; that is, examinees listen to texts in which the theme is presented, they learn new grammar and use it to express ideas related to the theme, and they then read, write and speak about the theme. The writing and speaking sections require examinees to engage both language and topical knowledge to complete the tasks. In both cases, grammatical control and topical control are scored separately. Finally, while these test sections prioritize construct validity, reliability, authenticity and interactiveness, it is certainly at the expense of practicality and impact.
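The double, ‘blind’ rating described above can be illustrated with a small sketch. The source does not say how agreement between the two raters is checked or how their ratings are combined, so the exact-agreement proportion and the simple averaging shown here are assumptions made purely for illustration, as are the function names and the sample ratings.

```python
def exact_agreement(rater_a, rater_b):
    """Proportion of samples on which two raters gave the same score.

    Hypothetical helper: exact agreement is only one of several
    possible indices of rater consistency.
    """
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("Both raters must score the same non-empty set of samples")
    matches = sum(1 for a, b in zip(rater_a, rater_b) if a == b)
    return matches / len(rater_a)


def averaged_scores(rater_a, rater_b):
    """Combine two independent ratings by averaging (an assumed policy)."""
    return [(a + b) / 2 for a, b in zip(rater_a, rater_b)]


# Invented grammatical-control ratings for four writing samples.
rater_a = [3, 4, 2, 5]
rater_b = [3, 3, 2, 5]

agreement = exact_agreement(rater_a, rater_b)
final_scores = averaged_scores(rater_a, rater_b)
```

With these invented ratings, the raters agree on three of the four samples, and the second sample receives the midpoint of the two ratings.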
Learning-Oriented Assessments of Grammatical Ability
What is learning-oriented assessment of grammar?
Alternative assessment emphasizes an alternative to and rejection of selected-response, timed and one-shot approaches to assessment, whether they occur in large-scale or classroom assessment contexts. Alternative assessment encourages assessments in which students are asked to perform, create, produce or do meaningful tasks that both tap into higher-level thinking (e.g., problem-solving) and have real-world implications (Herman et al., 1992). Alternative assessments are scored by humans, not machines.
Similar to alternative assessment, authentic assessment stresses measurement practices which engage students’ knowledge and skills in ways similar to those one can observe while performing some real-life or ‘authentic’ task (O’Malley and Valdez-Pierce, 1996). It also encourages tasks that require students to perform some complex, extended-production activity, and emphasizes the need for assessment to be strictly aligned with classroom goals, curricula and instruction. Self-assessment is considered a key component of this approach.
Performance assessment refers to the evaluation of outcomes relevant to a domain of interest (e.g., grammatical ability), which are derived from the observation of students performing complex tasks that invoke real-world applications (Norris et al., 1998). As with most performance data, assessments are scored by human judges (Stiggins, 1987; Herman et al., 1992; Brown, 1998) according to a scoring rubric that describes what test-takers need to do in order to demonstrate knowledge or ability at a given performance level. Bachman (2002) characterized language performance assessment as typically: (1) involving more complex constructs than those measured in selected-response tasks; (2) utilizing more complex and authentic tasks; and (3) fostering greater interactions between the characteristics of the test-takers and the characteristics of the assessment tasks than in other types of assessments. Performance assessment encourages self-assessment by making explicit the performance criteria in a scoring rubric. In this way, students can then use the criteria to evaluate their performance and contribute proactively to their own learning.
Challenges and new directions in assessing grammatical ability
Challenge 1: Defining grammatical ability
One major challenge revolves around how grammatical ability has been defined both theoretically and operationally in language testing. As we saw in Chapters 3 and 4, in the 1960s and 1970s language teaching and language testing maintained a strong syntactocentric view of language rooted largely in linguistic structuralism. Moreover, models of language ability, such as those proposed by Lado (1961) and Carroll (1961), had a clear linguistic focus, and assessment concentrated on measuring language elements (defined in terms of morphosyntactic forms on the sentence level) while performing language skills. Grammatical knowledge was determined solely in terms of linguistic accuracy. This approach to testing led to examinations such as the CELT (Harris and Palmer, 1970a) and the English Proficiency Test battery (Davies, 1964).
Challenge 2: Scoring grammatical ability
A second challenge relates to scoring, as the specification of both form and meaning is likely to influence the ways in which grammar assessments are scored. As we discussed in Chapter 6, responses with multiple criteria for correctness may necessitate different scoring procedures. For example, the use of dichotomous scoring, even with certain selected-response items, might need to give way to partial-credit scoring, since some wrong answers may reflect partial development either in form or meaning. As a result, language educators might need to adapt their scoring procedures to reflect the two dimensions of grammatical knowledge. This might, in turn, require the use of measurement models that can accommodate both dichotomous and partial-credit data in calculating and analyzing test scores. Then, in scoring extended-production tasks for both form and meaning, descriptors on scoring rubrics might need to be adapted to reflect graded performance in the two dimensions of grammatical knowledge more clearly. It should also be noted that more complex scoring procedures will impact the resources it takes to mark responses or to program machine-scoring devices. It will also require a closer examination (and hopefully ongoing research) of how a wrong answer may be a reflection of interlanguage development. However, successfully meeting these challenges could provide a more valid assessment of the test-takers’ underlying grammatical ability.
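The contrast between dichotomous and partial-credit scoring can be made concrete with a short sketch. The record structure, the one-point-per-dimension scheme and the sample judgments below are all hypothetical; the source argues only that scoring may need to reflect form and meaning separately and does not prescribe a particular scheme.

```python
from dataclasses import dataclass


@dataclass
class ItemJudgment:
    """A scorer's judgment of one response (hypothetical structure)."""
    form_correct: bool
    meaning_correct: bool


def dichotomous_score(judgment: ItemJudgment) -> int:
    """Right/wrong scoring: credit only when form and meaning are both correct."""
    return 1 if judgment.form_correct and judgment.meaning_correct else 0


def partial_credit_score(judgment: ItemJudgment) -> int:
    """Assumed partial-credit scheme: one point per correct dimension (0, 1 or 2)."""
    return int(judgment.form_correct) + int(judgment.meaning_correct)


# Invented judgments for three responses.
responses = [
    ItemJudgment(form_correct=True, meaning_correct=True),
    ItemJudgment(form_correct=False, meaning_correct=True),   # meaning conveyed, form inaccurate
    ItemJudgment(form_correct=False, meaning_correct=False),
]

dichotomous_total = sum(dichotomous_score(j) for j in responses)
partial_credit_total = sum(partial_credit_score(j) for j in responses)
```

Under the dichotomous scheme the second response earns nothing, whereas the partial-credit scheme records that its meaning was conveyed, which is the kind of partial development the paragraph above refers to.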
Challenge 3: Assessing meanings
The third challenge revolves around ‘meaning’ and how ‘meaning’ in a model of communicative language ability can be defined and assessed. The ‘communicative’ in communicative language teaching, communicative language testing, communicative language ability, or communicative competence refers to the conveyance of ideas, information, feelings, attitudes and other intangible meanings (e.g., social status) through language. Therefore, while the grammatical resources used to communicate these meanings precisely are important, the notion of meaning conveyance in the communicative curriculum is critical. Consequently, in order to test something as intangible as meaning in second or foreign language use, we need to define what it is we are testing.
Challenge 4: Reconsidering grammar-test tasks
The fourth challenge relates to the design of test tasks that are capable of both measuring grammatical ability and providing authentic and engaging measures of grammatical performance. Since the early 1960s, language educators have associated grammar tests with discrete-point, multiple-choice tests of grammatical form. These and other ‘traditional’ test tasks (e.g., grammaticality judgments) have been severely criticized for lacking in authenticity, for not engaging test-takers in language use, and for promoting behaviors that are not readily consistent with communicative language teaching. Discrete-point testing methods may have even led some teachers to have reservations about testing grammar or to have uncertainties about how to test it communicatively.
Challenge 5: Assessing the development of grammatical ability
The fifth challenge revolves around the argument, made by some researchers, that grammatical assessments should be constructed, scored and interpreted with developmental proficiency levels in mind. This notion stems from the work of several SLA researchers (e.g. Clahsen, 1985; Pienemann and Johnson, 1987; Ellis, 2001b) who maintain that the principal finding from years of SLA research is that structures appear to be acquired in a fixed order and a fixed developmental sequence. Furthermore, instruction on forms in non-contiguous stages appears to be ineffective. As a result, the acquisitional development of learners, they argue, should be a major consideration in the L2 grammar testing
ASSESSING VOCABULARY
Chapter 1: The Place of Vocabulary in Language Assessment
At first glance, it may seem that assessing the vocabulary knowledge of second language learners is both necessary and reasonably straightforward. It is necessary in the sense that words are the basic building blocks of language, the units of meaning from which larger structures such as sentences, paragraphs and whole texts are formed. The widespread acceptance of the validity of these criticisms has led to the adoption, particularly in the major English-speaking countries, of the communicative approach to language testing. Today’s language proficiency tests do not set out to determine whether learners know the meaning of magazine or put on or approximate, or whether they can distinguish ship and sheep. Instead, the tests are based on tasks simulating communication activities that the learners are likely to be engaged in outside of the classroom.
Following Bachman’s (1990) earlier work, the authors see the purpose of language testing as being to allow us to make inferences about learners’ language ability, which consists of two components. One is language knowledge and the other is strategic competence. That is to say, learners need to know a lot about the vocabulary, grammar, sound system and spelling of the target language, but also need to be able to draw on that knowledge effectively for communicative purposes under normal time constraints.
Chapter 2: The Nature of Vocabulary
This chapter takes up the question of what we mean by vocabulary. We tend to think of it as consisting of individual words, as in the headwords of a dictionary; however, even the definition of a ‘word’ is by no means straightforward. It is also necessary to consider lexical units that are larger than single words, such as compound nouns, phrasal verbs, idioms and fixed expressions of various kinds. For assessment purposes, vocabulary is not just a set of linguistic units but also an attribute of individual language learners, in the form of vocabulary knowledge and the ability to access that knowledge for communicative purposes.
At the simplest level vocabulary consists of words, but even the concept of a word is challenging to define and classify. For a number of assessment purposes, it is important to clarify what is meant by a word if the correct conclusions are to be drawn from the test results. Regarding the construct, Chapelle’s work points the way toward a definition of vocabulary ability that covers a wider range of assessment purposes and at the same time is consistent with Bachman and Palmer’s general construct of language ability. Whereas a construct of vocabulary knowledge may be satisfactory as the basis for the design of discrete, selective and context-independent tests, Chapelle’s definition provides a better theoretical foundation for a construct that can incorporate embedded, comprehensive and context-dependent vocabulary measures as well.
Chapter 3: Research on Vocabulary Acquisition and Use
This chapter reviews the main lines of enquiry by researchers on second language vocabulary acquisition. Apart from the extensive work on methods of conscious vocabulary learning, researchers are investigating how acquisition of word knowledge occurs in a more incidental fashion through reading and listening activities. Other areas of interest are the ability of learners to guess the meaning of unknown words which they encounter in their reading, and the strategies they use to overcome gaps in their vocabulary knowledge when engaged in speaking and writing tasks.
Language acquisition research, both L1 and L2, makes use of vocabulary assessment to explore how language skill develops; in turn, research informs our testing constructs. The ensuing review of research on vocabulary acquisition studies is concise and well presented. Of particular interest is Read’s discussion of ‘incidental vocabulary learning’ and its relevance to the level of knowing a word that vocabulary tests tap. Read also notes that much of the research on vocabulary has been related to reading, leaving a gap in our knowledge of spoken language vocabulary.
Chapter 4: Research on Vocabulary Assessment
This chapter considers research in language testing that either has involved the investigation of vocabulary tests or has a bearing on vocabulary assessment. One issue in this area is whether the notion of a ‘pure’ vocabulary test is at all tenable. I trace the move away from discrete-point vocabulary tests and look in some detail at the extent to which the cloze procedure and its variants can be regarded as measures of vocabulary. Much recent work on vocabulary testing has focused on estimating how many words learners know (or their vocabulary size). A complementary perspective is provided by other studies that seek to assess the quality (or ‘depth’) of their vocabulary knowledge. Here, the previous threads are woven together with various types of vocabulary testing (e.g., vocabulary size, quality of vocabulary knowledge, cloze testing).
Chapter 5: Vocabulary Tests: Four Case Studies
This chapter presents case studies of four vocabulary tests:
- Nation's Vocabulary Levels Test;
- Meara and Jones's Eurocentres Vocabulary Size Test;
- Paribakht and Wesche's Vocabulary Knowledge Scale; and
- the vocabulary items in the Test of English as a Foreign Language (TOEFL).
In addition to being influential instruments in their own right, these tests exemplify several of the main currents in vocabulary testing discussed in the previous chapter. Practical issues in the design of vocabulary tests are discussed in.
Chapter 6: The Design of Discrete Vocabulary Tests
The chapter includes discussion of two specific examples of test design from my own experience. One looks at some typical items for classroom progress tests, and the other is an account of my efforts to develop a workable test to measure depth of vocabulary knowledge. The reader might assume, given Read’s framework, that discrete item testing would receive a negative review in this book, but that is not the case. Read argues for the appropriateness of the test to the purpose for which the test is used; for example, in assessing the progress of vocabulary learning in a classroom situation, the discrete test may be quite appropriate. Read lists the advantages of discrete vocabulary testing and gives practical examples of the difficulties involved with various test designs.
As noted previously, Read argues in Chapter 6 that the contrast between receptive and productive vocabularies may be misleading. Instead, Read suggests two dimensions of this contrast: recognition-recall and comprehension-use. Recognition is where the test-taker's understanding of the meaning of a word is assessed, whereas recall refers to the ability to remember, having encountered the word (such as in an experiment). Comprehension, of course, is the understanding of meanings encountered when listening or reading; use refers to the vocabulary that actually appears in speech or writing. Thus, recognition and comprehension are different aspects, or levels if you will, for testing receptive vocabulary, and recall and use are aspects of productive vocabulary. For the language teacher, Chapter 6 is perhaps the most practical part of the book, for it is this type of testing that will most likely be used in classroom situations.
Chapter 7: Comprehensive Measures of Vocabulary
The largest section of the chapter covers procedures that have been applied to the assessment of learners' writing. These include ‘objective’ counts of the relative proportions of different types of word in a composition, as well as ‘subjective’ rating scales. I also consider the application of comprehensive measures, such as readability formulas, to the analysis of input material for tests involving reading and listening tasks. This chapter also introduces the assessment of speech, noting the available studies in this area. Also included in this section is a rather general discussion of readability and of calculating lexical density.
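As an illustration of the kind of ‘objective’ count mentioned above, the sketch below computes two common statistics for a short text: the type-token ratio (distinct words divided by total words) and lexical density (content words divided by total words). The tiny function-word list, the tokenizer and the function names are assumptions made for this example only; they are not Read's procedure, and published studies rely on much fuller word lists and more careful tokenization.

```python
import re

# Hypothetical, deliberately small list of function words for the example.
FUNCTION_WORDS = {
    "a", "an", "the", "and", "or", "but", "of", "in", "on", "at", "to",
    "for", "with", "by", "is", "are", "was", "were", "be", "been", "it",
    "this", "that", "he", "she", "they", "we", "i", "you", "not",
}


def tokenize(text):
    """Lower-case the text and return its words (letters and apostrophes)."""
    return re.findall(r"[a-z']+", text.lower())


def type_token_ratio(tokens):
    """Number of distinct word forms divided by the total number of words."""
    return len(set(tokens)) / len(tokens) if tokens else 0.0


def lexical_density(tokens):
    """Proportion of tokens that are not in the function-word list."""
    if not tokens:
        return 0.0
    content_words = [t for t in tokens if t not in FUNCTION_WORDS]
    return len(content_words) / len(tokens)


sample = "The cat sat on the mat and the dog sat by the door"
tokens = tokenize(sample)
print(len(tokens), type_token_ratio(tokens), lexical_density(tokens))
```

Both figures are sensitive to text length and to how words are classified, so they are best compared only across texts of similar length that have been processed in the same way.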
Chapter 8: Further Developments in Vocabulary Assessment
This includes discussion of ways in which computer-based corpus research can contribute to the development of vocabulary measures. A second major theme is the need to broaden our view of the nature of vocabulary. More consideration should be given to the role of multi-word lexical items in language use. Another priority is to gain a better understanding of the vocabulary of speech, as distinct from written language. There should also be more focus on the social dimension of vocabulary use.
Read underlines throughout the book that much of the work on vocabulary has come from studies of reading, with little work on spoken vocabulary. There is a very real need for more work on spoken vocabulary and how to assess it. Read also notes that there is a need to assess longer lexical items, rather than the more traditional focus on single words. He also sees great promise from the increasing use of computers in second language testing. Another need is for a current frequency list of word use, which would also take into account current knowledge of specialized vocabularies and multiword items.
References:
Purpura, James. 2004. Assessing Grammar. Cambridge: Cambridge University Press.
Read, John. 2000. Assessing Vocabulary. Cambridge: Cambridge University Press.