Thursday, 14 May 2020

SUMMARY ASSESSING GRAMMAR AND ASSESSING VOCABULARY

ASSESSING GRAMMAR

Differing notions of ‘grammar’ for assessment

Introduction

The study of grammar has had a long and important role in the history of second language and foreign language teaching. For centuries, to learn another language, or what I will refer to generically as an L2, meant to know the grammatical structures of that language and to cite prescriptions for its use. Grammar was used to mean the analysis of a language system, and the study of grammar was not just considered an essential feature of language learning, but was thought to be sufficient for learners to actually acquire another language (Rutherford, 1988). Grammar in and of itself was deemed to be worthy of study – to the extent that in the Middle Ages in Europe, it was thought to be the foundation of all knowledge and the gateway to sacred and secular understanding (Hillocks and Smith, 1991). Thus, the central role of grammar in language teaching remained relatively uncontested until the late twentieth century. Even a few decades ago, it would have been hard to imagine language instruction without immediately thinking of grammar.

What is meant by ‘grammar’ in theories of language?

Grammar and linguistics

When most language teachers, second language acquisition (SLA) researchers and language testers think of ‘grammar’, they call to mind one of the many paradigms (e.g., ‘traditional grammar’ or ‘universal grammar’) available for the study and analysis of language. Such linguistic grammars are typically derived from data taken from native speakers and minimally constructed to describe well-formed utterances within an individual framework. These grammars strive for internal consistency and are mainly accessible to those who have been trained in that particular paradigm.

Since the 1950s, many such linguistic theories, too numerous to list here, have been proposed to explain language phenomena. Many of these theories have helped shape how L2 educators currently define grammar in educational contexts. A comprehensive review of these theories is beyond the purview of this book. It is nonetheless helpful to mention a few, because of the impact they have had on L2 education and the role they play in helping define grammar for assessment purposes.

Broadly, these theories reflect two views of linguistic analysis. The syntactocentric view focuses on linguistic form, and the communicative view focuses on language use. These two views have been instrumental in shaping how grammar has been conceptualized in L2 classrooms in recent years. They have also influenced definitions of L2 grammar for assessment purposes. I will now provide a brief overview of some of the more influential linguistic theories that typify the syntactocentric and communicative views of language.

Form-based perspectives of language

One of the oldest theories to describe the structure of language is traditional grammar. Originally based on the study of Latin and Greek, traditional grammar drew on data from literary texts to provide rich and lengthy descriptions of linguistic form. Unlike some other syntactocentric theories, traditional grammar also revealed the linguistic meanings of these forms and provided information on their usage in a sentence (Celce-Murcia and Larsen-Freeman, 1999). Traditional grammar supplied an extensive set of prescriptive rules along with the exceptions. A typical rule in a traditional English grammar might be:

The first-person singular of the present tense verb ‘to be’ is ‘I am’. ‘Am’ is used with ‘I’ in all cases, except in first-person singular negative tag and yes/no questions, which are contracted. In this case, the verb ‘are’ is used instead of ‘am’. For example, ‘I’m in a real bind, aren’t I?’ or ‘Aren’t I trying my best?’

Form- and use-based perspectives of language

The form-based theories of linguistic analysis described thus far, such as traditional grammar, have given L2 educators insights into several grammatical forms. These insights help explain which structures are theoretically possible in a language. Other linguistic theories, however, are better equipped to examine how speakers and writers actually exploit linguistic forms during language use. For example, seemingly similar structures like 'I like to read' and 'I like reading' convey different meanings. To explain how, we might turn to theories that study the interface between grammatical form and use. Such theories would address questions like these: Why does a language need two or more structures that are similar in meaning? Are similar forms used to convey different specialized meanings? To what degree are similar forms a function of written versus spoken language? To what degree are these forms characteristic of a particular social group or a specific situation? We need to discuss these questions briefly if we ultimately wish to test grammatical forms along with their meanings and uses in context.

One approach to linguistic analysis that has contributed greatly to our understanding of the grammatical forms found in language use, as well as the contextual factors that influence the variability of these forms, is corpus linguistics. I will briefly describe corpus linguistics along with how findings from this approach can be useful for assessing grammar.

Linguists now commonly compile corpora, that is, large and principled collections of natural spoken and written texts, so that patterns of language use can be analyzed by computer across large databases of authentic texts. This practice has led to a relatively new field known as 'corpus linguistics'. Corpus linguistics is not a theory of language per se. Rather, it embodies a suite of tools and methods designed to provide a source of evidence so that linguistic data can be analyzed distributionally, that is, to show how often and where a linguistic form occurs in spoken or written text. According to Biber, Conrad and Reppen (1998), these analyses typically focus on two concerns. One type of study compares the use of one linguistic feature (i.e., a lexical item or grammatical structure) with another. For example, corpus-based studies might examine the different uses of would, or compare the use of wish with that-clauses and with to-infinitives. The other type of study examines the association between a linguistic feature and a non-linguistic feature, such as gender, dialect or setting.
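The following minimal Python sketch makes distributional analysis concrete. It counts two competing forms, 'like to + verb' and 'like + -ing', across a non-linguistic variable (spoken versus written setting). It therefore touches both types of study described above. The corpus, the setting labels and the regular-expression patterns are all hypothetical and were invented for this illustration. Real corpus studies rely on large, principled collections and part-of-speech tagging rather than crude surface patterns.

```python
import re
from collections import Counter

# Hypothetical toy corpus of (setting, text) pairs, invented for illustration only.
# A real corpus study would use a large, principled collection of authentic texts.
CORPUS = [
    ("spoken", "I like reading at night. Do you like to read?"),
    ("spoken", "We like swimming, and they like swimming too."),
    ("written", "Many students like to read novels in the original language."),
    ("written", "Researchers like to compare forms across registers."),
]

# Crude surface patterns (an assumption of this sketch, not a real tagger):
# "like to <word>" versus "like <word ending in -ing>".
PATTERNS = {
    "like_to_infinitive": re.compile(r"\blike to \w+", re.IGNORECASE),
    "like_gerund": re.compile(r"\blike \w+ing\b", re.IGNORECASE),
}


def distribution(corpus):
    """Count each pattern per setting (linguistic feature x non-linguistic variable)."""
    counts = Counter()
    for setting, text in corpus:
        for name, pattern in PATTERNS.items():
            # Adding zero still creates the key, so every (setting, pattern) pair is reported.
            counts[(setting, name)] += len(pattern.findall(text))
    return counts


if __name__ == "__main__":
    for (setting, name), n in sorted(distribution(CORPUS).items()):
        print(f"{setting:8} {name:20} {n}")
```

The counts describe only the four invented sentences and say nothing about actual English usage. The surface pattern is also deliberately naive. It would, for instance, count 'like something' as a gerund, which is one reason corpus linguists work with tagged texts.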

Communication-based perspectives of language

Other theories have provided grammatical insights from a communication-based perspective. This perspective holds that language involves more than linguistic form. It moves beyond the view of language as patterns of morphosyntax observed in relatively decontextualized sentences or in sentences found in naturally occurring corpora. Instead, a communication-based perspective views grammar as a set of linguistic norms, preferences and expectations. An individual invokes these to convey a host of pragmatic meanings that are appropriate, acceptable and natural depending on the situation. The assumption here is that linguistic form has no absolute, fixed meaning in language use (as seen in sentences 1.5 and 1.7 above). Form is mutable and open to interpretation by those who use it in a given circumstance. Grammar in this context is often coterminous with language itself. It stands not only for form, but also for meaningfulness and for pragmatic appropriacy, acceptability or naturalness. I will return to this topic later, since I believe that blurring these concepts is misleading and potentially problematic for language educators.

What is pedagogical grammar?

A pedagogical grammar represents an eclectic, but principled description of the target-language forms, created for the express purpose of helping teachers understand the linguistic resources of communication. These grammars provide information about how language is organized and offer relatively accessible ways of describing complex, linguistic phenomena for pedagogical purposes.

Research on L2 grammar teaching, learning and assessment

Research on L2 teaching and learning

Over the years, several of the questions mentioned above have intrigued language teachers and inspired them to experiment with different methods, approaches and techniques in the teaching of grammar. To determine whether students had actually learned under the different conditions, teachers have used diverse forms of assessment and drawn their own conclusions about their students. In so doing, these teachers have gathered a considerable amount of anecdotal evidence on the strengths and weaknesses of different practices for L2 grammar instruction. These experiences have led most teachers today to subscribe to an eclectic approach to grammar instruction. They draw on a variety of instructional techniques, depending on the individual needs, goals and learning styles of their students.

Comparative methods studies

The comparative methods studies sought to compare the effects of different language-teaching methods on the acquisition of an L2. These studies took place mainly in the 1960s and 1970s. They arose as a reaction to the grammar-translation method, which had dominated language instruction during the first half of the twentieth century. More generally, they were a reaction to form-focused instruction (referred to as 'focus on forms' by Long, 1991), which used a traditional structural syllabus of grammatical forms as the organizing principle for L2 instruction. According to Ellis (1997), meaning-focused instruction emphasizes the communication of messages (e.g., the act of making a suggestion and the content of that suggestion), whereas form-focused instruction stresses the learning of linguistic forms. Both can be further contrasted with form-and-meaning-focused instruction (which Long (1991) calls 'focus-on-form'). Here, grammar instruction occurs in a meaning-based environment, and learners strive to communicate meaning while paying attention to form. (Note that Long's version of 'focus-on-form' stresses a meaning orientation with an incidental focus on forms.) All of these comparative methods studies shared a theoretical premise: grammar has a central place in the curriculum, and successful learning depends on the teaching method and on how well that method promotes grammar processing.

Non-interventionist studies

While some language educators were examining different methods of teaching grammar in the 1960s, others were growing dissatisfied with the central role of grammar in the L2 curriculum. As a result, a small group of L2 teachers and syllabus designers raised questions about the centrality of grammar once again. They felt that the teaching of grammar in any form simply did not produce the desired classroom results. Newmark (1966), in fact, asserted that grammatical analysis and the systematic practice of grammatical forms actually interfered with the process of L2 learning rather than promoting it. He argued that, if left uninterrupted, second language acquisition, like first language acquisition, would proceed naturally.

At the same time, some SLA researchers (e.g., Dulay and Burt, 1973; Bailey, Madden and Krashen, 1974) who had been studying L2 learning in instructed and naturalistic settings were also questioning the role of grammar in the L2 curriculum. These researchers tried to characterize the L2 learner's interlanguage at one or more points along the path toward target-like proficiency, and several came to similar conclusions about L2 development. Learners did not make incremental leaps in grammatical ability by accumulating grammatical forms as presented in a traditional grammar syllabus. Instead, learners in both instructed and naturalistic settings acquired the target structures in a relatively fixed order (Ellis, 1994), regardless of when those structures were introduced. For example, Krashen (1977) claimed that, in general, ESL learners first acquire the -ing affix, plural markings and the copula (stage 1), and then the auxiliary and the articles (stage 2). Next come the irregular past verb forms (stage 3) and finally the regular past, the third-person singular affix and the possessive -s affix (stage 4). This information is interesting, but these findings cover only a skeletal list of the grammar points that a typical curriculum would include. We might therefore wonder how this order would change if other grammar points were investigated at the same time. We also do not know whether this order holds for many other languages.

Empirical studies in support of non-intervention

Prabhu (1987) examined the non-interventionist position empirically in the Communicational Teaching Project (CTP) in southern India. This study sought to show that grammatical ability could be developed through a task-based approach to language teaching rather than a form-focused one, provided that the tasks required learners to engage in meaningful communication. In the CTP, Prabhu (1987) argued against the notion that the development of grammatical ability depended on a systematic presentation of grammar followed by planned practice. To evaluate the CTP, Beretta and Davies (1985) compared classes in the project with classes outside it that were taught with a structural-oral-situational method. They gave the students a battery of tests. The CTP learners outperformed the control group on a task-based test, whereas the non-CTP learners did better on a traditional structure test. These results lent partial support to the non-interventionist position: task-based classrooms built on meaningful communication can also promote SLA effectively. However, the results also suggested, once again, that students do best when they are taught and tested in similar ways.

Possible implications of fixed developmental order to language assessment

If structures are indeed acquired in a fixed developmental order and in a fixed developmental sequence, this might have some relevance to the assessment of grammatical ability. First, these findings could give language testers an empirical basis for constructing grammar tests that account for the variability inherent in a learner's interlanguage. In other words, information on the acquisitional order of grammatical items could serve as a basis for selecting grammatical content for tests that aim to measure different levels of developmental progression. Chang (2002, 2004) did this when examining the underlying structure of a test that attempted to measure knowledge of relative clauses. These findings also suggest a substantive approach to defining test tasks according to developmental order and sequence, based on how grammatical features are acquired over time (Ellis, 2001b). For example, one task could tap into developmental level one, another into developmental level two, and so forth.

To illustrate, grammar tests targeting beginning English-language learners often include questions on the articles and the third-person singular -s affix, two features considered to be ‘very challenging’ from an acquisitional perspective. Since, according to these findings, no beginning learner would be expected to have target-like control of these particular grammatical items, the inclusion of these grammatical features in a beginning classroom achievement test might be questionable. However, the inclusion of these items in a placement test would be highly appropriate since the goal of placement assessment is to identify a wide range of ability levels so that developmentally homogeneous groups can be formed.
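The following Python sketch is a rough illustration of how acquisitional order might be used to screen test content. It tags hypothetical test items with the Krashen (1977) stages listed earlier and flags items that target features acquired later than a chosen stage. Treating stage 1 as the ceiling for a beginning achievement test is an assumption made for this example. The item bank and the feature labels are also invented. The caveats raised in the next section apply in full.

```python
from dataclasses import dataclass

# Stage assignments follow the Krashen (1977) order summarized above; the
# feature labels are hypothetical keys chosen for this sketch.
KRASHEN_STAGE = {
    "-ing affix": 1,
    "plural": 1,
    "copula": 1,
    "auxiliary": 2,
    "articles": 2,
    "irregular past": 3,
    "regular past": 4,
    "third-person singular -s": 4,
    "possessive -s": 4,
}


@dataclass
class Item:
    item_id: str  # hypothetical test-item identifier
    feature: str  # grammatical feature the item targets (a key of KRASHEN_STAGE)


def flag_items(items, max_stage):
    """Return ids of items whose feature is acquired later than max_stage."""
    return [it.item_id for it in items if KRASHEN_STAGE[it.feature] > max_stage]


def stage_coverage(items):
    """Map each stage to the number of items targeting it, sorted by stage."""
    coverage = {}
    for it in items:
        stage = KRASHEN_STAGE[it.feature]
        coverage[stage] = coverage.get(stage, 0) + 1
    return dict(sorted(coverage.items()))


if __name__ == "__main__":
    # Hypothetical item bank for a beginning-level test.
    bank = [
        Item("Q1", "copula"),
        Item("Q2", "plural"),
        Item("Q3", "articles"),
        Item("Q4", "third-person singular -s"),
    ]
    print(flag_items(bank, max_stage=1))  # ['Q3', 'Q4']
    print(stage_coverage(bank))           # {1: 2, 2: 1, 4: 1}
```

A placement test needs a different check. It keeps the later-stage items and uses stage_coverage to confirm that the items span a range of stages.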

Problems with the use of development sequences as a basis for assessment

Although developmental sequence research offers an intuitively appealing complement to accuracy-based assessments in terms of interpreting test scores, I believe this method is fraught with a number of serious problems, and language educators should use extreme caution in applying this method to language testing. This is because our understanding of natural acquisitional sequences is incomplete and at too early a stage of research to be the basis for concrete assessment recommendations (Lightbown, 1985; Hudson, 1993). First, the number of grammatical sequences that show a fixed order of acquisition is very limited, far too limited for all but the most restricted types of grammar tests. For example, what is the order for acquiring the modals, the conditionals, or the infinitive or gerund complements? Second, much of the research on acquisitional sequences is based on data from naturalistic settings, where students are provided with considerable exposure to the language. We have yet to learn about how these sequences hold for students whose only exposure to a language is an L2 classroom. Furthermore, acquisitional sequences make reference only to linguistic forms; no reference is made to how these forms interact with the conveyance of literal and implied meanings associated with a specific context. Third, as the rate (not the route) of acquisition appears to be influenced by the learner’s first language and by exposure to other languages, we need to understand how these factors might impact on development rates and how we would reconcile this if we wished to test heterogeneous groups of language learners. Finally, as the developmental levels represent an ordering of grammatical rules during acquisition, this may or may not be on the same measurement scale as accuracy scores. 
Thus, until further research establishes the precise relationship between these scales, we should be careful when comparing proficiency levels based on accuracy scales with levels of interlanguage development. In the end, our understanding of developmental sequences is still limited, so it is premature to apply the findings of acquisitional sequences research to language assessment.

Interventionist studies

Not all L2 educators agree with the non-interventionist position on grammar instruction. In fact, several (e.g., Schmidt, 1983; Swain, 1991) have maintained that some L2 learners do acquire selected linguistic features without explicit grammar instruction, but the majority fail to do so. Evidence for this is the large number of non-native speakers who emigrate to countries around the world and live there all their lives. Many never learn the target language, or never learn it well enough to realize their personal, social and long-term career goals. In these situations, language teachers affirm that formal grammar instruction of some sort can help. Furthermore, most language teachers would contend that explicit grammar instruction, including systematic error correction and other instructional techniques, contributes immensely to their students' linguistic development. Finally, despite the non-interventionist recommendations on grammar teaching, I believe grammar still plays an important role in most L2 classrooms around the world.

Empirical studies in support of intervention

Beyond anecdotal evidence, the non-interventionist position has come under intense attack on both theoretical and empirical grounds. Several SLA researchers affirm that efforts to teach L2 grammar typically result in the development of L2 grammatical ability. Hulstijn (1989) and Alanen (1995) compared the effectiveness of L2 grammar instruction on SLA with no formal instruction. They found that formal grammar instruction, when coupled with meaning-focused instruction, appears to be more effective than exposure to meaning or form alone. Long (1991) also argued for a focus on both meaning and form in classrooms organized around meaningful and sustained communicative interaction.

Research on instructional techniques and their effects on acquisition

Much of the recent research on teaching grammar has focused on four types of instructional techniques and their effects on acquisition. Although a complete discussion of teaching interventions is outside the purview of this book (see Ellis, 1997; Doughty and Williams, 1998), these techniques include form- or rule-based techniques, input-based techniques, feedback-based techniques and practice-based techniques (Norris and Ortega, 2000).

Grammar processing and second language development

In the grammar-learning process, explicit grammatical knowledge refers to a conscious knowledge of grammatical forms and their meanings. Explicit knowledge is usually accessed slowly, even when it is almost fully automatized (Ellis, 2001b). DeKeyser (1995) characterizes grammatical instruction as 'explicit' when it involves the explanation of a rule or a request to focus on a grammatical feature. Explicit instruction can be deductive, where learners are given rules and asked to apply them. It can also be inductive, where learners are given samples of language from which to generate rules and make generalizations. Similarly, many types of language test tasks (e.g., gap-filling tasks) seem to measure explicit grammatical knowledge.

Implicit grammatical knowledge refers to 'the knowledge of a language that is typically manifest in some form of naturally occurring language behavior such as conversation' (Ellis, 2001b, p. 252). This knowledge is unconscious and, in terms of processing time, is accessed quickly. DeKeyser (1995) classifies grammatical instruction as implicit when it involves no rule presentation and no request to focus on form in the input. Instead, implicit grammatical instruction involves semantic processing of the input with any degree of awareness of grammatical form. The hope, of course, is that learners will 'notice' the grammatical forms and identify form–meaning relationships. The forms can then be recognized in the input and eventually incorporated into the interlanguage. In one example of this type of instruction, learners listen to a passage containing a specific grammatical feature. They then answer comprehension questions but are not asked to attend to the feature. Similarly, language test tasks that require examinees to engage in interactive talk might be said to measure implicit grammatical knowledge.

Implications for assessing grammar

The studies investigating the effects of teaching and learning on grammatical performance present a number of challenges for language assessment. First of all, the notion that grammatical knowledge structures can be differentiated according to whether they are fully automatized (i.e., implicit) or not (i.e., explicit) raises important questions for the testing of grammatical ability (Ellis, 2001b). Given the many purposes of assessment, we might wish to test explicit knowledge of grammar, implicit knowledge of grammar or both. For example, in certain classroom contexts, we might want to assess the learners’ explicit knowledge of one or more grammatical forms, and could, therefore, ask learners to answer multiple-choice or short-answer questions related to these forms.

The role of grammar in models of communicative language ability

The role of grammar in models of communicative competence

Every language educator who has ever tried to measure a student's communicative language ability has wondered: 'What exactly does a student need to "know" in terms of grammar to be able to use it well enough for some real-world purpose?' In other words, they have faced the challenge of defining grammar for communicative purposes. To complicate matters further, linguistic notions of grammar have changed over time, as we have seen, and this has significantly increased the number of components that could be called 'grammar'. In short, definitions of grammar and grammatical knowledge have changed over time and across contexts, and I expect this to continue in the future.

Rea-Dickins’ definition of grammar

In discussing more specifically how grammatical knowledge might be tested within a communicative framework, Rea-Dickins (1991) defined 'grammar' as the single embodiment of syntax, semantics and pragmatics. She argued against Canale and Swain's (1980) and Bachman's (1990b) multi-componential view of communicative competence on the grounds that componential representations overlook the interdependence and interaction between and among the various components. She further stated that in Canale and Swain's (1980) model, the notion of grammatical competence was limited since it defined grammar as 'structure' on the one hand and as 'structure and semantics' on the other, but ignored the notion of 'structure as pragmatics'. Similarly, she added that in Bachman's (1990b) model, grammar was defined as structure at the sentence level and as cohesion at the suprasentential level, but this model failed to account for the pragmatic dimension of communicative grammar.

Larsen-Freeman’s definition of grammar

Another conceptualization of grammar that merits attention is Larsen-Freeman's (1991, 1997) framework for the teaching of grammar in communicative language teaching contexts. Drawing on several linguistic theories and influenced by language teaching pedagogy, she has also characterized grammatical knowledge along three dimensions: linguistic form, semantic meaning and pragmatic use. Form is defined as both morphology, or how words are formed, and syntactic patterns, or how words are strung together. This dimension is primarily concerned with linguistic accuracy. The meaning dimension describes the inherent or literal message conveyed by a lexical item or a lexico-grammatical feature. This dimension is mainly concerned with the meaningfulness of an utterance. The use dimension refers to the lexico-grammatical choices a learner makes to communicate appropriately within a specific context. Pragmatic use describes when and why one linguistic feature is used in a given context instead of another, especially when the two choices convey a similar literal meaning. In this respect, pragmatic use is said to embody presuppositions about situational context, linguistic context, discourse context, and sociocultural context. This dimension is mainly concerned with making the right choice of forms in order to convey an appropriate message for the context.

What is meant by ‘grammar’ for assessment purposes?

Regardless of the assessment purpose, if we wish to make inferences about grammatical ability on the basis of a grammar test or some other form of assessment, it is important to know what we mean by ‘grammar’ when attempting to specify components of grammatical knowledge for measurement purposes. With this goal in mind, we need a definition of grammatical knowledge that is broad enough to provide a theoretical basis for the construction and validation of tests in a number of contexts. At the same time, we need our definition to be precise enough to distinguish it from other areas of language ability.

From a theoretical perspective, the main goal of language use is communication, whether it be used to transmit information, to perform transactions, to establish and maintain social relations, to construct one’s identity or to communicate one’s intentions, attitudes or hypotheses. Being the primary resource for communication, language knowledge consists of grammatical knowledge and pragmatic knowledge. Therefore, I propose a theoretical definition of language knowledge that consists of two distinct, but related, components.

Towards a definition of grammatical ability

Defining grammatical constructs

Although our basic underlying model of grammar will remain the same in all testing situations (i.e., grammatical form and meaning), what it means to ‘know’ grammar for different contexts will most likely change (see Chapelle, 1998). In other words, the type, range and scope of grammatical features required to communicate accurately and meaningfully will vary from one situation to another. For example, the type of grammatical knowledge needed to write a formal academic essay would be very different from that needed to make a train reservation. Given the many possible ways of interpreting what it means to ‘know’ grammar, it is important that we define what we mean by ‘grammatical knowledge’ for any given testing situation. A clear definition of what we believe it means to ‘know’ grammar for a particular testing context will then allow us to construct tests that measure grammatical ability.

One of the first steps in designing a test, aside from identifying the need for a test, its purpose and its audience, is to give a clear theoretical definition of the construct(s) to be measured. With a theoretically sound, clear and precise definition of grammatical knowledge, we can then design tasks to elicit performance samples of grammatical ability. When test-takers complete grammar tasks, we can observe and score their answers against specific grammatical criteria for correctness. Whether these performance samples reflect the underlying grammatical constructs is an empirical question. If they do, we can use the test results to make inferences about the test-takers' grammatical ability. These inferences, in turn, may be used to make decisions about the test-takers (e.g., whether they pass the course). However, we first need to provide evidence that the tasks on a test have measured the grammatical constructs they were designed to measure (Messick, 1993). The process of building arguments to support this evidence is called validation, and it begins with a clear definition of the constructs.

What is ‘grammatical ability’ for assessment purposes?

The approach to the assessment of grammatical ability in this book is based on several specific definitions. First, grammar encompasses grammatical form and meaning, whereas pragmatics is a separate, but related, component of language. A second is that grammatical knowledge, along with strategic competence, constitutes grammatical ability. A third is that grammatical ability involves the capacity to realize grammatical knowledge accurately and meaningfully in test-taking or other language-use contexts. The capacity to access grammatical knowledge to understand and convey meaning is related to a person’s strategic competence. It is this interaction that enables examinees to implement their grammatical ability in language use. Next, in tests and other language-use contexts, grammatical ability may interact with pragmatic ability (i.e., pragmatic knowledge and strategic competence) on the one hand, and with a host of non-linguistic factors such as the test-taker’s topical knowledge, personal attributes, affective schemata and the characteristics of the task on the other. Finally, in cases where grammatical ability is assessed by means of an interactive test task involving two or more interlocutors, the way grammatical ability is realized will be significantly impacted by both the contextual and the interpretative demands of the interaction.

The components of grammatical knowledge

Knowledge of phonological or graphological form and meaning

Knowledge of phonological/graphological form enables us to understand and produce features of the sound or writing system, with the exception of meaning-based orthographies such as Chinese characters, as they are used to convey meaning in testing or language-use situations.

Knowledge of lexical form and meaning

Knowledge of lexical form enables us to understand and produce those features of words that encode grammar rather than those that reveal meaning. This includes words that mark gender (e.g., waitress), countability (e.g., people) or part of speech (e.g., relate, relation). For example, when the word think in English is followed by the preposition about before a noun, this is considered the grammatical dimension of lexis, representing a co-occurrence restriction with prepositions. One area of lexical form that poses a challenge to learners of some languages is word formation. This includes compounding in English with a noun + noun or a verb + particle pattern (e.g., fire escape; breakup) or derivational affixation in Italian (e.g., ragazzino 'little kid', ragazzone 'big kid'). For example, a student who says 'a teacher of chemistry' instead of 'chemistry teacher' or '*this people' would need further instruction in lexical form.

Knowledge of morphosyntactic form and meaning

Knowledge of morphosyntactic form permits us to understand and produce both the morphological and syntactic forms of the language. This includes the articles, prepositions, pronouns, affixes (e.g., -est), syntactic structures, word order, simple, compound and complex sentences, mood, voice and modality. A learner who knows the morphosyntactic form of the English conditionals would know that: (1) an if-clause sets up a condition and a result clause expresses the outcome; (2) both clauses can be in the sentence-initial position in English; (3) if can be deleted under certain conditions as long as the subject and operator are inverted; and (4) certain tense restrictions are imposed on if and result clauses.

Knowledge of cohesive form and meaning

Knowledge of cohesive form enables us to use the phonological, lexical and morphosyntactic features of the language in order to interpret and express cohesion on both the sentence and the discourse levels. Cohesive form is directly related to cohesive meaning through cohesive devices (e.g., she, this, here) which create links between cohesive forms and their referential meanings within the linguistic environment or the surrounding co-text. Halliday and Hasan (1976, 1989) list a number of grammatical forms for displaying cohesive meaning.

Knowledge of information management form and meaning

Knowledge of information management form allows us to use linguistic forms as a resource for interpreting and expressing the information structure of discourse. Resources that help manage the presentation of information include prosody, word order, tense-aspect and parallel structures. These forms are used to create information management meaning.

Knowledge of interactional form and meaning

Knowledge of interactional form enables us to understand and use linguistic forms as a resource for understanding and managing talk-in-interaction. These forms include discourse markers and communication management strategies. Discourse markers consist of a set of adverbs, conjunctions and lexicalized expressions used to signal certain language functions.

Designing test tasks to measure L2 grammatical ability

How does test development begin?

Every grammar-test development project begins with a desire to obtain (and often provide) information about how well a student knows grammar in order to convey meaning in some situation where the target language is used. The information obtained from this assessment then forms the basis for decision-making. Those situations in which we use the target language to communicate in real life or in which we use it for instruction or testing are referred to as the target language use (TLU) situations (Bachman and Palmer, 1996). Within these situations, the tasks or activities requiring language to achieve a communicative goal are called the target language use tasks. A TLU task is one of many language-use tasks that test-takers might encounter in the target language use domain. It is to this domain that language testers would like to make inferences about language ability, or more specifically, about grammatical ability.

What do we mean by ‘task’?

The notion of ‘task’ in language-learning contexts has been conceptualized in many different ways over the years. Traditionally, ‘task’ has referred to any activity that requires students to do something for the express purpose of learning the target language. A task, then, is any activity (e.g., short answers, role-plays) as long as it involves a linguistic or non-linguistic (e.g., circling the answer) response to input. Traditional learning or teaching tasks are characterized as having an intended pedagogical purpose – which may or may not be made explicit; they have a set of instructions that control the kind of activity to be performed; they contain input (e.g., questions); and they elicit a response. More recently, learning tasks have been characterized more in terms of their communicative goals, their success in eliciting interaction and negotiation of meaning, and their ability to engage learners in complex meaning-focused activities (Nunan, 1989, 1993; Berwick, 1993; Skehan, 1998).

What are the characteristics of grammatical test tasks?

As the goal of grammar assessment is to provide as useful a measurement as possible of our students’ grammatical ability, we need to design test tasks in which the variability of our students’ scores is attributed to the differences in their grammatical ability, and not to uncontrolled or irrelevant variability resulting from the types of tasks or the quality of the tasks that we have put on our tests. As all language teachers know, the kinds of tasks we use in tests and their quality can greatly influence how students will perform. Therefore, given the role that the effects of task characteristics play on performance, we need to strive to manage (or at least understand) the effects of task characteristics so that they will function the way we designed them to – as measures of the constructs we want to measure (Douglas, 2000). In other words, specifically designed tasks will work to produce the types of variability in test scores that can be attributed to the underlying constructs given the contexts in which they were measured (Tarone, 1998). To understand the characteristics of test tasks better, we turn to Bachman and Palmer’s (1996) framework for analyzing target language use tasks and test tasks.

The Bachman and Palmer framework

Bachman and Palmer’s (1996) framework of task characteristics represents the most recent thinking in language assessment of the potential relationships between task characteristics and test performance. In this framework, they outline five general aspects of tasks, each of which is characterized by a set of distinctive features. These five aspects describe characteristics of (1) the setting, (2) the test rubrics, (3) the input, (4) the expected response and (5) the relationship between the input and response.

Describing grammar test tasks

When language teachers consider tasks for grammar tests, they call to mind a large repertoire of task types that have been commonly used in teaching and testing contexts. We now know that these holistic task types constitute collections of task characteristics for eliciting performance and that these holistic task types can vary on a number of dimensions. We also need to remember that the tasks we include on tests should strive to match the types of language-use tasks found in real-life or language instructional domains.

In designing grammar tests, we need to be familiar with a wide range of activities to elicit grammatical performance. In the rest of the chapter, I will describe several tasks in light of how they can be used to measure grammatical knowledge. I will use the Bachman and Palmer framework as a guide for task specification in this discussion.

Selected-response task types

Selected-response tasks present input in the form of an item, and test-takers are expected to select the response. Other than that, all other task characteristics can vary. For example, the form of the input can be language, non-language or both, and the length of the input can vary from a word to larger pieces of discourse. In terms of the response, selected-response tasks are intended to measure recognition or recall of grammatical form and/or meaning. They are usually scored right/wrong, based on one criterion for correctness; however, in some instances, partial-credit scoring may be useful, depending on how the construct is defined. Finally, selected-response tasks can vary in terms of reactivity, scope and directness.

Limited-production task types

Limited-production tasks present input in the form of an item with language and/or non-language information that can vary in length or topic. Unlike selected-response tasks, limited-production tasks elicit a response embodying a limited amount of language production. The length of this response can be anywhere from a word to a sentence. All task characteristics in limited-production tasks can vary with the exception of two: the type of input (always an ‘item’) and the type of expected response (always ‘limited-production’).

Limited-production tasks are intended to assess one or more areas of grammatical knowledge depending on the construct definition. Unlike selected-response items, which usually have only one possible answer, the range of possible answers for limited-production tasks can, at times, be large – even when the response involves a single word.

Developing tests to measure L2 grammatical ability

What makes a grammar test ‘useful’?

Score-based inferences from grammar tests can be used to make a variety of decisions. For example, classroom teachers use these scores as a basis for making inferences about learning or achievement. These inferences can then serve to provide feedback for learning and instruction, assign grades, promote students to the next level, or even award a certificate. They can also be used to help teachers or administrators make decisions about instruction or the curriculum.

The information derived from language tests, of which grammar tests are a subset, can be used to provide test-takers and other test-users with formative and summative evaluations. Formative evaluation relating to grammar assessment supplies information during a course of instruction or learning on how test-takers might increase their knowledge of grammar, or how they might improve their ability to use grammar in communicative contexts. It also provides teachers with information on how they might modify future instruction or fine-tune the curriculum. For example, feedback on an essay telling a student to review the passive voice would be formative in nature. Summative evaluation provides test stakeholders with an overall assessment of test-taker performance related to grammatical ability, typically at the end of a program of instruction. This is usually presented as a profile of one or more scores or as a single grade.

Score-based inferences from grammar tests can also be used to make, or contribute to, decisions about program placement. This information provides a basis for deciding how students might be placed into a level of a language program that best matches their knowledge base, or it might determine whether or not a student is eligible to be exempted from further L2 study. Finally, inferences about grammatical ability can make or contribute to other high-stakes decisions about an individual’s readiness for learning or promotion, their admission to a program of study, or their selection for a job.

Given the goals and uses of tests in general, and grammar tests in particular, it is fitting to ask how we might actually know if a test is, indeed, able to elicit scorable behaviors from which to make trustworthy and meaningful inferences about an individual’s ability. In other words, how do we know if a grammar test is ‘good’ or ‘useful’ for our particular context?

Many language testers (e.g., Harris, 1969; Lado, 1961) have addressed this question over the years. Most recently, Bachman and Palmer (1996) have proposed a framework of test usefulness by which all tests and test tasks can be judged, and which can inform test design, development and analysis. They consider a test ‘useful’ for any particular testing situation to the extent that it possesses a balance of the following six complementary qualities: reliability, construct validity, authenticity, interactiveness, impact and practicality. They further maintain that for a test to be ‘useful’, it needs to be developed with a specific purpose in mind, for a specific audience, and with reference to a specific target language use (TLU) domain.

Overview of grammar-test construction


Bachman and Palmer (1996) organize test development into three stages: design, operationalization and administration. I will discuss each of these stages in the process of describing grammar-test development.

Stage 1: Design

The design stage of test development involves the accumulation of information and making initial decisions about the entire test process. In tests involving one class, this may be a relatively informal process; however, in tests involving wider audiences, such as a joint final exam or a placement test, the decisions about test development must be discussed and negotiated with several stakeholders. The outcome of the design stage is a design statement. According to Bachman and Palmer (1996, p. 88), this document should contain the following components:
  1. a description of the purpose(s) of the test,
  2. a description of the TLU domains and task types,
  3. a description of the test-takers,
  4. a definition of the construct(s) to be measured,
  5. a plan for evaluating test usefulness, and
  6. a plan for dealing with resources.

Stage 2: Operationalization

The operationalization stage of grammar-test development describes how an entire test involving several grammar tasks is assembled, and how the individual tasks are specified, written and scored. With respect to scoring, it covers the following topics:
  1. Specifying the scoring method
  2. Scoring selected-response tasks
  3. Scoring extended-production tasks
  4. Using scoring rubrics
  5. Grading

Stage 3: Test administration and analysis

The final stage in the process of developing grammar tests involves the administration of the test to individual students or small groups, and then to a large group of examinees on a trial basis.

Illustrative tests of grammatical ability

The First Certificate in English Language Test (FCE)

Given the assessment purposes and the intended uses of the FCE, the FCE grammar assessments privilege construct validity, authenticity, interactiveness and impact. This is done by the way the construct of grammatical ability is defined. This is also done by the ways in which these abilities are tapped into, and the ways in which the task characteristics are likely to engage the examinee in using grammatical knowledge and other components of language ability in processing input to formulate responses. Finally, this is done by the way in which Cambridge ESOL has promoted public understanding of the FCE, its purpose and procedures, and has made available certain kinds of information on the test. These qualities may, however, have been stressed at the expense of reliability.

The Comprehensive English Language Test (CELT)

In terms of the purposes and intended uses of the CELT, the authors explicitly stated, ‘the CELT is designed to provide a series of reliable and easy-to-administer tests for measuring English language ability of nonnative speakers’ (Harris and Palmer, 1970b, p. 1). As a result, concerns for high reliability and ease of administration led the authors to make choices privileging reliability and practicality over other qualities of test usefulness. To maximize consistency of measurement, the authors used only selected-response task types throughout the test, allowing for minimal fluctuations in the scores due to characteristics of the test method. This allowed them to adopt ‘easy-to-administer’ and ‘easy-to-score’ procedures for maximum practicality and reliability. Reliability was also enhanced by pre-testing items with the goal of improving their psychometric characteristics.

Reliability might have been emphasized at the expense of other important test qualities, such as construct validity, authenticity, interactiveness and impact. For example, construct validity was severely compromised by the mismatch among the purpose of the test, the way the construct was defined and the types of tasks used to operationalize the constructs. In short, scores from discrete-point grammar tasks were used to make inferences about speaking ability rather than to make interpretations about the test-takers’ explicit grammatical knowledge.

Finally, authenticity in the CELT was low due to the exclusive use of multiple-choice tasks and the lack of correspondence between these tasks and those one might encounter in the target language use domain. Interactiveness was also low due to the test’s inability to fully involve the test-takers’ grammatical ability in performing the tests. The impact of the CELT on stakeholders is not documented in the published manual.

In all fairness, the CELT was a product of its time, when emphasis was on discrete-point testing and reliability, and when language testers were not yet discussing qualities of test usefulness in terms of authenticity, interactiveness and impact.

The Community English Program (CEP) Placement Test

Given the purposes and the intended uses of the CEP Placement Test, the grammar section privileges authenticity, construct validity, reliability and practicality. As in instruction, the theme-based test tasks all support the same overarching theme presented from different perspectives. The construct of grammatical knowledge is then defined in terms of the grammar used to express the theme. Given the multiple-choice format and the piloting of items, reliability is an important concern. Finally, the multiple-choice format is used instead of a limited-production format to maximize practicality. This compromise certainly comes at some expense to construct validity and task authenticity.

Nonetheless, grammatical ability is also measured in the writing and speaking parts of the CEP Placement Test. These sections privilege construct validity, reliability, authenticity and interactiveness. In these tasks, students are asked to use grammatical resources to write about and discuss the theme they have been learning about during the test. In both the writing and speaking sections, grammatical ability is a separately scored part of the scoring rubric, and definitions of grammatical knowledge are derived from theory and from an examination of benchmark samples. Reliability is addressed by scoring all writing and speaking performance samples ‘blind’ by two raters. In terms of authenticity and interactiveness, these test sections seek to establish a strong correspondence between the test tasks and the type of tasks encountered in theme-based language instruction – that is, examinees listen to texts in which the theme is presented, they learn new grammar and use it to express ideas related to the theme, they then read, write and speak about the theme. The writing and speaking sections require examinees to engage both language and topical knowledge to complete the tasks. In both cases, grammatical control and topical control are scored separately. Finally, while these test sections prioritize construct validity, reliability, authenticity and interactiveness, it is certainly at the expense of practicality and impact.

Learning-Oriented Assessments of Grammatical Ability

What is learning-oriented assessment of grammar?

Alternative assessment emphasizes an alternative to and rejection of selected-response, timed and one-shot approaches to assessment, whether they occur in large-scale or classroom assessment contexts. Alternative assessment encourages assessments in which students are asked to perform, create, produce or do meaningful tasks that both tap into higher-level thinking (e.g., problem-solving) and have real-world implications (Herman et al., 1992). Alternative assessments are scored by humans, not machines.

Similar to alternative assessment, authentic assessment stresses measurement practices which engage students’ knowledge and skills in ways similar to those one can observe while performing some real-life or ‘authentic’ task (O’Malley and Valdez-Pierce, 1996). It also encourages tasks that require students to perform some complex, extended-production activity, and emphasizes the need for assessment to be strictly aligned with classroom goals, curricula and instruction. Self-assessment is considered a key component of this approach.

Performance assessment refers to the evaluation of outcomes relevant to a domain of interest (e.g., grammatical ability), which are derived from the observation of students performing complex tasks that invoke real-world applications (Norris et al., 1998). As with most performance data, assessments are scored by human judges (Stiggins, 1987; Herman et al., 1992; Brown, 1998) according to a scoring rubric that describes what test-takers need to do in order to demonstrate knowledge or ability at a given performance level. Bachman (2002) characterized language performance assessment as typically: (1) involving more complex constructs than those measured in selected-response tasks; (2) utilizing more complex and authentic tasks; and (3) fostering greater interactions between the characteristics of the test-takers and the characteristics of the assessment tasks than in other types of assessments. Performance assessment encourages self-assessment by making the performance criteria explicit in a scoring rubric. In this way, students can use the criteria to evaluate their performance and contribute proactively to their own learning.

Challenges and new directions in assessing grammatical ability

Challenge 1: Defining grammatical ability

One major challenge revolves around how grammatical ability has been defined both theoretically and operationally in language testing. As we saw in Chapters 3 and 4, in the 1960s and 1970s language teaching and language testing maintained a strongly syntactocentric view of language rooted largely in linguistic structuralism. Moreover, models of language ability, such as those proposed by Lado (1961) and Carroll (1961), had a clear linguistic focus, and assessment concentrated on measuring language elements (defined in terms of morphosyntactic forms at the sentence level) while learners performed language skills. Grammatical knowledge was determined solely in terms of linguistic accuracy. This approach to testing led to examinations such as the CELT (Harris and Palmer, 1970a) and the English Proficiency Test battery (Davies, 1964).

Challenge 2: Scoring grammatical ability

A second challenge relates to scoring, as the specification of both form and meaning is likely to influence the ways in which grammar assessments are scored. As we discussed in Chapter 6, responses with multiple criteria for correctness may necessitate different scoring procedures. For example, the use of dichotomous scoring, even with certain selected-response items, might need to give way to partial-credit scoring, since some wrong answers may reflect partial development either in form or meaning. As a result, language educators might need to adapt their scoring procedures to reflect the two dimensions of grammatical knowledge. This might, in turn, require the use of measurement models that can accommodate both dichotomous and partial-credit data in calculating and analyzing test scores. Then, in scoring extended-production tasks for both form and meaning, descriptors on scoring rubrics might need to be adapted to reflect graded performance in the two dimensions of grammatical knowledge more clearly. It should also be noted that more complex scoring procedures will affect the resources needed to mark responses or to program machine-scoring devices. They will also require a closer examination (and, ideally, ongoing research) of how a wrong answer may reflect interlanguage development. However, successfully meeting these challenges could provide a more valid assessment of test-takers’ underlying grammatical ability.
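To make the contrast between dichotomous and partial-credit scoring concrete, here is a minimal sketch. It assumes, for illustration only, that a rater has already judged each response on two yes/no criteria (accurate form, appropriate meaning); the names `Judgement`, `dichotomous_score` and `partial_credit_score` are hypothetical and are not taken from the book.

```python
from dataclasses import dataclass


@dataclass
class Judgement:
    """Hypothetical rater judgement of one response on two dimensions."""
    form_ok: bool     # grammatical form is accurate
    meaning_ok: bool  # intended meaning is conveyed


def dichotomous_score(j: Judgement) -> int:
    """Right/wrong scoring: credit only when form and meaning are both correct."""
    return 1 if (j.form_ok and j.meaning_ok) else 0


def partial_credit_score(j: Judgement) -> int:
    """One point per dimension, so partial development still earns credit."""
    return int(j.form_ok) + int(j.meaning_ok)


responses = [
    Judgement(form_ok=True, meaning_ok=True),
    Judgement(form_ok=False, meaning_ok=True),   # meaning conveyed, form inaccurate
    Judgement(form_ok=True, meaning_ok=False),   # accurate form, wrong meaning
    Judgement(form_ok=False, meaning_ok=False),
]

print([dichotomous_score(r) for r in responses])     # [1, 0, 0, 0]
print([partial_credit_score(r) for r in responses])  # [2, 1, 1, 0]
```

Under dichotomous scoring the second and third learners look identical to the fourth, whereas partial credit preserves the information that each has partly developed one dimension. That preserved information is the kind of data a measurement model would then need to accommodate.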

Challenge 3: Assessing meanings

The third challenge revolves around ‘meaning’ and how ‘meaning’ in a model of communicative language ability can be defined and assessed. The ‘communicative’ in communicative language teaching, communicative language testing, communicative language ability, or communicative competence refers to the conveyance of ideas, information, feelings, attitudes and other intangible meanings (e.g., social status) through language. Therefore, while the grammatical resources used to communicate these meanings precisely are important, the notion of meaning conveyance in the communicative curriculum is critical. Consequently, in order to test something as intangible as meaning in second or foreign language use, we need to define what it is we are testing.

Challenge 4: Reconsidering grammar-test tasks

The fourth challenge relates to the design of test tasks that are capable of both measuring grammatical ability and providing authentic and engaging measures of grammatical performance. Since the early 1960s, language educators have associated grammar tests with discrete-point, multiple-choice tests of grammatical form. These and other ‘traditional’ test tasks (e.g., grammaticality judgments) have been severely criticized for lacking in authenticity, for not engaging test-takers in language use, and for promoting behaviors that are not readily consistent with communicative language teaching. Discrete-point testing methods may have even led some teachers to have reservations about testing grammar or to have uncertainties about how to test it communicatively.

Challenge 5: Assessing the development of grammatical ability

The fifth challenge revolves around the argument, made by some researchers, that grammatical assessments should be constructed, scored and interpreted with developmental proficiency levels in mind. This notion stems from the work of several SLA researchers (e.g., Clahsen, 1985; Pienemann and Johnson, 1987; Ellis, 2001b) who maintain that the principal finding from years of SLA research is that structures appear to be acquired in a fixed order and a fixed developmental sequence. Furthermore, instruction on forms in non-contiguous stages appears to be ineffective. As a result, they argue, the acquisitional development of learners should be a major consideration in L2 grammar testing.

ASSESSING VOCABULARY

Chapter 1: The Place of Vocabulary in Language Assessment

At first glance, it may seem that assessing the vocabulary knowledge of second language learners is both necessary and reasonably straightforward. It is necessary in the sense that words are the basic building blocks of language, the units of meaning from which larger structures such as sentences, paragraphs and whole texts are formed. However, discrete-point vocabulary tests have been criticized, and the widespread acceptance of the validity of these criticisms has led to the adoption, particularly in the major English-speaking countries, of the communicative approach to language testing. Today's language proficiency tests do not set out to determine whether learners know the meaning of ‘magazine’, ‘put on’ or ‘approximate’, or whether they can distinguish ‘ship’ from ‘sheep’. Instead, the tests are based on tasks simulating communication activities that the learners are likely to engage in outside the classroom.

Following Bachman's (1990) earlier work, the authors see the purpose of language testing as being to allow us to make inferences about learners' language ability, which consists of two components: language knowledge and strategic competence. That is to say, learners not only need to know a lot about the vocabulary, grammar, sound system and spelling of the target language, but also need to be able to draw on that knowledge effectively for communicative purposes under normal time constraints.

Chapter 2: The Nature of Vocabulary 

This chapter takes up the question of what we mean by vocabulary. We tend to think of it as consisting of individual words, as in the headwords of a dictionary; however, even the definition of a ‘word’ is by no means straightforward. It is also necessary to consider lexical units that are larger than single words, such as compound nouns, phrasal verbs, idioms and fixed expressions of various kinds. For assessment purposes, vocabulary is not just a set of linguistic units but also an attribute of individual language learners, in the form of vocabulary knowledge and the ability to access that knowledge for communicative purposes.

At the simplest level, vocabulary consists of words, but even the concept of a word is challenging to define and classify. For a number of assessment purposes, it is important to clarify what is meant by a word if the correct conclusions are to be drawn from the test results. In terms of construct definition, Chapelle's work points the way toward a definition of vocabulary ability that covers a wider range of assessment purposes and at the same time is consistent with Bachman and Palmer's general construct of language ability. Whereas a construct of vocabulary knowledge may be satisfactory as the basis for the design of discrete, selective and context-independent tests, Chapelle's definition provides a better theoretical foundation for a construct that can incorporate embedded, comprehensive and context-dependent vocabulary measures as well.

Chapter 3: Research on Vocabulary Acquisition and Use

This chapter reviews the main lines of enquiry by researchers on second language vocabulary acquisition. Apart from the extensive work on methods of conscious vocabulary learning, researchers are investigating how acquisition of word knowledge occurs in a more incidental fashion through reading and listening activities. Other areas of interest are the ability of learners to guess the meaning of unknown words which they encounter in their reading, and the strategies they use to overcome gaps in their vocabulary knowledge when engaged in speaking and writing tasks.

Language acquisition research, L1 and L2, makes use of vocabulary assessment to explore how language skill develops; in turn, research informs our testing constructs. The ensuing review of research on vocabulary acquisition studies is concise and well presented. Of particular interest is Read's discussion of ‘incidental vocabulary learning’ and its relevance to the level of knowing a word that vocabulary tests tap. Read also notes that much of the research on vocabulary has been related to reading, leaving a gap in our knowledge of spoken language vocabulary.

Chapter 4: Research on Vocabulary Assessment

This chapter considers research in language testing that either has involved the investigation of vocabulary tests or has a bearing on vocabulary assessment. One issue in this area is whether the notion of a ‘pure’ vocabulary test is at all tenable. The chapter traces the move away from discrete-point vocabulary tests and looks in some detail at the extent to which the cloze procedure and its variants can be regarded as measures of vocabulary. Much recent work on vocabulary testing has focused on estimating how many words learners know (their vocabulary size). A complementary perspective is provided by other studies that seek to assess the quality (or ‘depth’) of learners' vocabulary knowledge. Here, the threads of the previous chapters are tied to various types of vocabulary testing (e.g., vocabulary size, quality of vocabulary knowledge, cloze testing).
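The idea of estimating vocabulary size can be illustrated with simple arithmetic: sample words from a frequency list, find the proportion the learner knows, and scale that proportion up to the size of the list. The sketch below assumes a hypothetical 10,000-word list and uses the simplest correction for guessing in yes/no formats (hit rate minus the rate at which non-words are wrongly claimed as known). It is a generic illustration, not the scoring formula of any test discussed in this book; published tests may use different adjustments.

```python
def estimate_vocabulary_size(known_in_sample: int, sample_size: int,
                             list_size: int,
                             false_alarm_rate: float = 0.0) -> int:
    """Scale the proportion of sampled words a learner knows up to a word list.

    known_in_sample:  sampled real words the learner shows (or claims) they know
    sample_size:      number of real words sampled from the list
    list_size:        total number of words in the (hypothetical) frequency list
    false_alarm_rate: share of non-words wrongly claimed as known (yes/no
                      formats only), subtracted as a simple guessing correction
    """
    if sample_size <= 0:
        raise ValueError("sample_size must be positive")
    hit_rate = known_in_sample / sample_size
    adjusted = max(0.0, hit_rate - false_alarm_rate)  # never below zero
    return round(adjusted * list_size)


print(estimate_vocabulary_size(40, 50, 10_000))       # 8000
print(estimate_vocabulary_size(40, 50, 10_000, 0.1))  # 7000
```

The estimate is only as good as the sample: a small or unrepresentative sample of words from the list will give an unstable figure.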

Chapter 5: Vocabulary Tests: Four Case Studies

This chapter presents case studies of four vocabulary tests:
  • Nation's Vocabulary Levels Test;
  • Meara and Jones's Eurocentres Vocabulary Size Test;
  • Paribakht and Wesche's Vocabulary Knowledge Scale; and
  • the vocabulary items in the Test of English as a Foreign Language (TOEFL).


In addition to being influential instruments in their own right, these tests exemplify several of the main currents in vocabulary testing discussed in the previous chapter. Practical issues in the design of vocabulary tests are discussed in the next chapter.

Chapter 6: The Design of Discrete Vocabulary Tests

The chapter includes a discussion of two specific examples of test design from Read's own experience. One looks at some typical items for classroom progress tests, and the other is an account of his efforts to develop a workable test to measure depth of vocabulary knowledge. The reader might assume, given Read's framework, that discrete item testing would receive a negative review in this book, but that is not the case. Read argues for matching a test to the purpose for which it is used; for example, in assessing the progress of vocabulary learning in a classroom situation, a discrete test may be quite appropriate. Read lists the advantages of discrete vocabulary testing and gives practical examples of the difficulties involved with various test designs.

As noted previously, Read argues in Chapter 6 that the contrast between receptive and productive vocabularies may be misleading. Instead, Read suggests two dimensions of this contrast: recognition-recall and comprehension-use. Recognition is where test-takers are presented with the target word and asked to show that they understand its meaning, whereas recall is where they are given a stimulus designed to elicit the target word from memory. Comprehension is the understanding of meanings encountered when listening or reading; use refers to the vocabulary that actually appears in speech or writing. Thus, recognition and comprehension are different aspects, or levels if you will, for testing receptive vocabulary, and recall and use are aspects of productive vocabulary. For the language teacher, Chapter 6 is perhaps the most practical part of the book, for it is this type of testing that will most likely be used in classroom situations.

Chapter 7: Comprehensive Measures of Vocabulary

The largest section of the chapter covers procedures that have been applied to the assessment of learners' writing. These include ‘objective’ counts of the relative proportions of different types of word in a composition, as well as ‘subjective’ rating scales. I also consider the application of comprehensive measures, such as readability formulas, to the analysis of input material for tests involving reading and listening tasks. The chapter also introduces the assessment of speech, noting available studies in this area, and includes a rather general discussion of readability and of calculating lexical density.
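The lexical density mentioned above is often computed as the proportion of content (lexical) words among all running words in a text. The sketch below assumes that definition, a simple regex tokenizer, and a tiny hypothetical function-word list (`FUNCTION_WORDS`); a real study would use a much fuller list or a part-of-speech tagger, so the figures it produces are illustrative only.

```python
from __future__ import annotations

import re

# Hypothetical, deliberately tiny function-word list for illustration only;
# a real analysis would use a fuller list or a part-of-speech tagger.
FUNCTION_WORDS = {
    "a", "an", "the", "and", "or", "but", "of", "in", "on", "at", "to",
    "for", "with", "is", "are", "was", "were", "be", "it", "he", "she",
    "they", "we", "i", "you", "this", "that", "these", "those", "his", "her",
}


def tokenize(text: str) -> list[str]:
    """Lower-case the text and return runs of letters/apostrophes as tokens."""
    return re.findall(r"[a-z']+", text.lower())


def lexical_density(text: str) -> float:
    """Return (content words / total words); 0.0 for a text with no words."""
    tokens = tokenize(text)
    if not tokens:
        return 0.0
    content = [t for t in tokens if t not in FUNCTION_WORDS]
    return len(content) / len(tokens)


if __name__ == "__main__":
    sample = "The cat sat on the mat and watched the birds."
    print(round(lexical_density(sample), 2))
```

For the sample sentence, five of the ten tokens (cat, sat, mat, watched, birds) count as content words under this list, giving a density of 0.5.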

Chapter 8: Further Developments in Vocabulary Assessment

This includes discussion of ways in which computer-based corpus research can contribute to the development of vocabulary measures. A second major theme is the need to broaden our view of the nature of vocabulary. More consideration should be given to the role of multi-word lexical items in language use. Another priority is to gain a better understanding of the vocabulary of speech, as distinct from written language. There should also be more focus on the social dimension of vocabulary use.

Read emphasizes throughout the book that much of the work on vocabulary has come from studies of reading, with little work on spoken vocabulary. There is a very real need for more work on spoken vocabulary and how to assess it. Read also notes that there is a need to assess longer lexical items, rather than the more traditional focus on single words. He also sees great promise in the increasing use of computers in second language testing. Another need is for a current frequency list of word use, which would also take into account current knowledge of specialized vocabularies and multiword items.


References:

Purpura, James. 2004. Assessing Grammar. Cambridge: Cambridge University Press.

Read, John. 2000. Assessing Vocabulary. Cambridge: Cambridge University Press.

Saturday, 09 May 2020

SUMMARY ASSESSING READING AND ASSESSING WRITING

ASSESSING READING

In foreign language learning, reading is likewise a skill that teachers simply expect learners to acquire. Basic, beginning-level textbooks in a foreign language presuppose a student's reading ability if only because a book is the medium. Reading, arguably the most essential skill for success in all educational contexts, remains a skill of paramount importance as we create assessments of general language ability. Learners must clear two primary hurdles in order to become efficient readers. First, they need to be able to master fundamental bottom-up strategies for processing separate letters, words, and phrases, as well as top-down, conceptually driven strategies for comprehension. Second, as part of that top-down approach, second language readers must develop appropriate content and formal schemata (background information and cultural experience) to carry out those interpretations effectively.

The assessment of reading ability does not end with the measurement of comprehension. As we consider a number of different types or genres of written texts, the components of reading ability, and specific tasks that are commonly used in the assessment of reading, let's not forget the unobservable nature of reading. As with listening, one cannot see the process of reading, nor can one observe a specific product of reading. Once something is read and information from the written text is stored, no technology allows us to empirically measure exactly what is lodged in the brain. All assessment of reading must be carried out by inference.

TYPES (GENRES) OF READING 

Each type or genre of written text has its own set of governing rules and conventions. With an extraordinary number of genres present in any literate culture, the reader's ability to process texts must be very sophisticated. Genres of reading:
  • Academic reading
  • Job-related reading
  • Personal reading

The genre of a text enables readers to apply certain schemata that will assist them in extracting appropriate meaning. The content validity of an assessment procedure is largely established through the genre of a text.

MICROSKILLS, MACROSKILLS, AND STRATEGIES FOR READING

The micro- and macroskills of reading represent the spectrum of possibilities for objectives in the assessment of reading comprehension. Aside from simply testing the ultimate achievement of comprehension of a written text, it may be important in some contexts to assess one or more of a storehouse of classic reading strategies.

TYPES OF READING

In the case of reading, variety of performance is derived more from the multiplicity of types of texts (the genres listed above) than from the variety of overt types of performance. Nevertheless, for considering assessment procedures, several types of reading performance are typically identified, and these will serve as organizers of various assessment tasks. In keeping with the set of categories specified for listening comprehension, similar specifications are offered here, except with some differing terminology to capture the uniqueness of reading.
  1. Perceptive. Perceptive reading tasks involve attending to the components of larger stretches of discourse: letters, words, punctuation, and other graphemic symbols. Bottom-up processing is implied.
  2. Selective. This category is largely an artifact of assessment formats.
  3. Interactive. Included among interactive reading types are stretches of language of several paragraphs to one page or more in which the reader must, in a psycholinguistic sense, interact with the text.
  4. Extensive. Extensive reading, as discussed in this book, applies to texts of more than a page, up to and including professional articles, essays, technical reports, short stories, and books.

DESIGNING ASSESSMENT TASKS: PERCEPTIVE READING

Such tasks of perception are often referred to as literacy tasks, implying that the learner is in the early stages of becoming "literate." Assessment of literacy is no easy assignment, and if you are interested in this particularly challenging area, further reading beyond this book is advised (Harp, 1991; Farr & Tone, 1994; Genesee, 1994; Cooper, 1997). Assessment of basic reading skills may be carried out in a number of different ways.

Reading Aloud

The test-taker sees separate letters, words, and/or short sentences and reads them aloud, one by one, in the presence of an administrator. Since the assessment is of reading comprehension, any recognizable oral approximation of the target response is considered correct.

Written Response 

The same stimuli are presented, and the test-taker's task is to reproduce the probe in writing. Because of the transfer across different skills here, evaluation of the test-taker's response must be carefully treated.

Multiple-Choice 

Multiple-choice responses are not only a matter of choosing one of four or five possible answers. Other formats, some of which are especially useful at the low levels of reading, include same/different, circle the answer, true/false, choose the letter, and matching.

Picture-Cued Items 

Test-takers are shown a picture along with a written text and are given one of a number of possible tasks to perform. Alternatively, test-takers might see a word or phrase and then be directed to choose which of four pictures is being described, thus requiring them to transfer from a verbal to a nonverbal mode.

DESIGNING ASSESSMENT TASKS: SELECTIVE READING

Just above the rudimentary skill level of perception of letters and words is a category in which the test designer focuses on formal aspects of language (lexical, grammatical, and a few discourse features). This category includes what many incorrectly think of as testing "vocabulary and grammar." Here are some of the possible tasks you can use to assess lexical and grammatical aspects of reading ability.

Multiple-Choice (for Form-Focused Criteria) 

By far the most popular method of testing a reading knowledge of vocabulary and grammar is the multiple-choice format, mainly for reasons of practicality: it is easy to administer and can be scored quickly. The most straightforward multiple-choice items may have little context but might serve as a vocabulary or grammar check. Other items are embedded in a connected passage, so that answering one item may depend on understanding an earlier part of the text. While such dependencies offer greater authenticity to an assessment, they also add the potential problem of a test-taker's missing several later items because of an earlier comprehension error.

Matching Tasks 

At this selective level of reading, the test-taker's task is simply to respond correctly, which makes matching an appropriate format. The most frequently appearing criterion in matching procedures is vocabulary. Alderson (2000, p. 218) suggested matching procedures at an even more sophisticated level, where test-takers have to discern pragmatic interpretations of certain signs or labels such as "Freshly made sandwiches" and "Use before 10/23/02."

Matching tasks have the advantage of offering an alternative to traditional multiple-choice or fill-in-the-blank formats and are sometimes easier to construct than multiple-choice items, as long as the test designer has chosen the matches carefully.

Editing Tasks 

Editing for grammatical or rhetorical errors is a widely used test method for assessing linguistic competence in reading. The TOEFL® and many other tests employ this technique with the argument that it not only focuses on grammar but also introduces a simulation of the authentic task of editing, or discerning errors in written passages. Its authenticity may be supported if you consider proofreading as a real-world skill that is being tested.

Picture-Cued Tasks 

In the previous section we looked at picture-cued tasks for perceptive recognition of symbols and words. Several types of picture-cued methods are commonly used. 

  1. Test-takers read a sentence or passage and choose which of four pictures is being described. The sentences at this level are more complex than in perceptive tasks.
  2. Test-takers read a series of sentences or definitions, each describing a labeled part of a picture or diagram. Their task is to identify each labeled item.


Gap-Filling Tasks 

Many of the multiple-choice tasks described above can be converted into gap-filling, or "fill-in-the-blank," items in which the test-taker's response is to write a word or phrase. An extension of simple gap-filling tasks is to create sentence completion items where test-takers read part of a sentence and then complete it by writing a phrase. One drawback of such items is that they require writing as well as reading, which makes it harder to isolate reading as the sole criterion. Another drawback is scoring the variety of creative responses that are likely to appear. In a test of reading comprehension only, you must accept as correct any responses that demonstrate comprehension of the first part of the sentence. This alone indicates that such tasks are better categorized as integrative procedures.

DESIGNING ASSESSMENT TASKS: INTERACTIVE READING

Interactive tasks may imply a little more focus on top-down processing than on bottom-up processing. Texts are a little longer, from a paragraph to as much as a page or so in the case of ordinary prose. Charts, graphs, and other graphics may be somewhat complex in their format.

Cloze Tasks

In written language, a sentence with a word left out should have enough context that a reader can close that gap with a calculated guess, using linguistic expectancies (formal schemata), background experience (content schemata), and some strategic competence. Some research on second language acquisition (Oller, 1973, 1976, 1979) vigorously defends cloze testing as an integrative measure not only of reading ability but also of other language abilities.

Cloze tests are usually a minimum of two paragraphs in length in order to account for discourse expectancies. Typically every seventh word (plus or minus two) is deleted (known as fixed-ratio deletion), but many cloze test designers instead use a rational deletion procedure, choosing deletions according to the grammatical or discourse functions of the words.
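The fixed-ratio deletion procedure can be sketched in a few lines of Python. This is a minimal illustration, not a standard tool: the function name `fixed_ratio_cloze`, the `keep_first` parameter (for leaving an opening stretch of words intact), and the choice to keep punctuation outside the blank are all assumptions made for the example.

```python
from __future__ import annotations

import re

# Split a whitespace-delimited token into leading punctuation, word, trailing punctuation.
TOKEN = re.compile(r"^(\W*)(.*?)(\W*)$")


def fixed_ratio_cloze(text: str, n: int = 7, keep_first: int = 0,
                      blank: str = "____") -> tuple[str, list[str]]:
    """Hypothetical helper: blank out every nth word; return (cloze text, answer key).

    The first `keep_first` words are never deleted. Punctuation-only tokens are
    not counted as words, and attached punctuation stays outside the blank.
    """
    if n < 2:
        raise ValueError("deletion ratio n must be at least 2")
    out: list[str] = []
    answers: list[str] = []
    count = 0  # number of real words seen so far
    for token in text.split():
        lead, core, trail = TOKEN.match(token).groups()
        if not core:  # punctuation-only token: copy it unchanged
            out.append(token)
            continue
        count += 1
        if count > keep_first and (count - keep_first) % n == 0:
            answers.append(core)
            out.append(f"{lead}{blank}{trail}")
        else:
            out.append(token)
    return " ".join(out), answers


if __name__ == "__main__":
    passage = ("one two three four five six seven eight nine ten "
               "eleven twelve thirteen fourteen.")
    cloze, key = fixed_ratio_cloze(passage)
    print(cloze)
    print(key)
```

Rational deletion cannot be automated this simply, since it depends on the designer's judgment about the grammatical or discourse function of each word.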

Two approaches to the scoring of cloze tests are commonly used. The exact word method gives credit to test-takers only if they insert the exact word that was originally deleted. The second method, appropriate word scoring, credits the test-taker for supplying any word that is grammatically correct and that makes good sense in the context. In the C-test (Klein-Braley & Raatz, 1984; Klein-Braley, 1985; Dörnyei & Katona, 1992), the second half (according to the number of letters) of every other word is obliterated, and the test-taker must restore each word.
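Both the C-test deletion rule and exact-word scoring are mechanical enough to sketch in code. The functions below (`c_test`, `exact_word_score`) are hypothetical illustrations; the conventions that one-letter words are skipped, that the larger half of an odd-length word is removed, and that exact-word matching ignores letter case are assumptions, not rules stated in the text. Appropriate-word scoring is deliberately left out because it requires human judgment about grammaticality and sense.

```python
from __future__ import annotations

import re

# Split a whitespace-delimited token into leading punctuation, word, trailing punctuation.
TOKEN = re.compile(r"^(\W*)(.*?)(\W*)$")


def c_test(text: str, start: int = 2, gap: str = "_") -> tuple[str, list[str]]:
    """Hypothetical helper: remove the second half of every other word.

    Counting starts at the `start`-th eligible word. Assumptions: words shorter
    than two letters are left intact and not counted; for odd-length words the
    larger half is removed (only len // 2 letters are kept).
    """
    out: list[str] = []
    key: list[str] = []
    count = 0
    for token in text.split():
        lead, core, trail = TOKEN.match(token).groups()
        if len(core) < 2:
            out.append(token)
            continue
        count += 1
        if count >= start and (count - start) % 2 == 0:
            keep = len(core) // 2
            out.append(f"{lead}{core[:keep]}{gap * (len(core) - keep)}{trail}")
            key.append(core)
        else:
            out.append(token)
    return " ".join(out), key


def exact_word_score(responses: list[str], key: list[str]) -> int:
    """Exact-word scoring: one point per response identical to the deleted word.

    Case-insensitive comparison is an assumption made for this sketch.
    """
    if len(responses) != len(key):
        raise ValueError("need exactly one response per blank")
    return sum(r.strip().lower() == k.lower() for r, k in zip(responses, key))


if __name__ == "__main__":
    mutilated, answers = c_test("The cat sat on the warm mat.")
    print(mutilated)
    print(exact_word_score(["cat", "in", "Warm"], answers))
```

In this sketch a response such as "in" for "on" earns no credit even if a human rater might accept it, which is exactly the difference between exact-word and appropriate-word scoring.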

Another variant, the cloze-elide procedure, inserts intrusive words into a passage, and the test-taker must detect them. Two disadvantages are immediately apparent: (1) neither the words to insert nor the frequency of insertion appears to have any rationale; (2) fast and efficient readers are not adept at detecting the intrusive words, because good readers naturally weed out such potential interruptions.

Impromptu Reading Plus Comprehension Questions

Virtually every proficiency test uses this format, and one would rarely consider assessing reading without some component of the assessment involving impromptu reading and responding to questions. Typical questions follow the test specifications for TOEFL reading passages, which are derived from research on a variety of abilities good readers exhibit. Many of these specifications are consistent with strategies of effective reading: skimming for the main idea, scanning for details, guessing word meanings from context, inferencing, using discourse markers, etc.

Short-Answer Tasks

A reading passage is presented, and the test-taker reads questions that must be answered in a sentence or two. Questions might cover the same specifications indicated above for the TOEFL reading, but be worded in question form. Do not take lightly the design of questions. It can be difficult to make sure that they reach their intended criterion. You will also need to develop consistent specifications for acceptable student responses and be prepared to take the time necessary to accomplish their evaluation.

Editing (Longer Texts)

The previous section of this chapter (on selective reading) described editing tasks, but there the discussion was limited to a list of unrelated sentences, each presented with an error to be detected by the test-taker. Several advantages are gained in the longer format. First, authenticity is increased. Second, the task simulates proofreading one's own essay, where it is imperative to find and correct errors. Third, if the test is connected to a specific curriculum (such as placement into one of several writing courses), the errors can be chosen to match the grammatical and rhetorical content of those courses.

Scanning 

Scanning is a strategy used by all readers to find relevant information in a text. Among the variety of scanning objectives (for each of the genres named above), the test-taker must locate 
  • a date, name, or place in an article; 
  • the setting for a narrative or story; 
  • the principal divisions of a chapter; 
  • the principal research finding in a technical report; 
  • a result reported in a specified cell in a table;
  • the cost of an item on a menu; and 
  • specified data needed to fill out an application.

Ordering Tasks 

Students often enjoy the activity of receiving little strips of paper, each with a sentence on it, and assembling them into a story, sometimes called the "strip story" technique. Alderson et al. (1995, p. 53) warn, however, against assuming that there is only one logical order. Different acceptable sentence orders become an instructive point for subsequent discussion in class, and you thereby offer washback into students' understanding of how to connect sentences and ideas in a story or essay.

Information Transfer: Reading Charts, Maps, Graphs, Diagrams

Every educated person must be able to comprehend charts, maps, graphs, calendars, diagrams, and the like. Reading a map implies understanding the conventions of map graphics, but it is often accompanied by telling someone where to turn, how far to go, etc. All of these media presuppose the reader's appropriate schemata for interpreting them and often are accompanied by oral or written discourse in order to convey, clarify, question, argue, and debate, among other linguistic functions. To comprehend information in this medium (hereafter referred to simply as "graphics"), learners must be able to
  1. comprehend specific conventions of the various types of graphics; 
  2. comprehend labels, headings, numbers, and symbols; 
  3. comprehend the possible relationships among elements of the graphic; and 
  4. make inferences that are not presented overtly. 

The act of comprehending graphics includes the linguistic performance of oral or written interpretations, comments, questions, etc. This implies a process of information transfer from one skill to another: in this case, from reading verbal and/or nonverbal information to speaking or writing.

DESIGNING ASSESSMENT TASKS: EXTENSIVE READING

Extensive reading involves somewhat longer texts than we have been dealing with up to this point. The reason for placing such reading into a separate category is that reading of this type of discourse almost always involves a focus on meaning using mostly top-down processing, with only occasional use of a targeted bottom-up strategy. 

Another complication in assessing extensive reading is that the expected response from the reader is likely to involve as much written (or sometimes oral) performance as reading. Before examining a few tasks that have proved to be useful in assessing extensive reading, it is essential to note that a number of the tasks described in previous categories can apply here. Among them are
  • impromptu reading plus comprehension questions,
  • short-answer tasks, 
  • editing, 
  • scanning, 
  • ordering, 
  • information transfer, and 
  • interpretation (discussed under graphics).

Skimming Tasks 

Skimming is the process of rapid coverage of reading matter to determine its gist or main idea. It is a prediction strategy used to give a reader a sense of the topic and purpose of a text, the organization of the text, the perspective or point of view of the writer, its ease or difficulty, and/or its usefulness to the reader. Most assessments in the domain of skimming are informal and formative: they are grist for a more careful reading to follow or for an in-class discussion, and therefore their washback potential is good.

Summarizing and Responding 

One of the most common means of assessing extensive reading is to ask the test-taker to write a summary of the text. Evaluation of the reading comprehension criterion will of necessity remain somewhat subjective, because the teacher will need to determine degrees of fulfillment of the objective.

Two tasks should not be confused with each other: summarizing requires a synopsis or overview of the text, while responding asks the reader to provide his or her own opinion on the text as a whole or on some statement or issue within it. An attempt has been made here to underscore the reading component of summarizing and responding to reading, but it is crucial to consider the interactive relationship between reading and writing that is highlighted in these two tasks. As you direct students to engage in such integrative performance, it is advisable not to treat them as tasks for assessing reading alone.

Note-Taking and Outlining

Finally, a reader's comprehension of extensive texts may be assessed through an evaluation of a process of note-taking and/or outlining. Because of the difficulty of controlling the conditions and time frame for both these techniques, they rest firmly in the category of informal assessment. A teacher, perhaps in one-on-one conferences with students, can use student notes/outlines as indicators of the presence or absence of effective reading strategies, and thereby point the learners in positive directions.

ASSESSING WRITING

Not many centuries ago, writing was a skill that was the exclusive domain of scribes and scholars in educational or religious institutions. In the field of second language teaching, only a half-century ago experts were saying that writing was primarily a convention for recording speech and for reinforcing grammatical and lexical features of language. Since then, the teaching of writing has grown into a major undertaking, occupying the attention of papers, articles, dissertations, books, and even separate professional journals devoted exclusively to writing in a second language.

GENRES OF WRITTEN LANGUAGE

The classification scheme used for reading genres is reformulated here to include the most common genres that a second language writer might produce, within and beyond the requirements of a curriculum. Genres of writing:
  1. Academic writing
  2. Job-related writing
  3. Personal writing

TYPES OF WRITING PERFORMANCE

Each category resembles the categories defined for the other three skills, but these categories, as always, reflect the uniqueness of the skill area. 
  1. Imitative. This category includes the ability to spell correctly and to perceive phoneme-grapheme correspondences in the English spelling system.
  2. Intensive (controlled). Beyond the fundamentals of imitative writing are skills in producing appropriate vocabulary within a context, collocations and idioms, and correct grammatical features up to the length of a sentence.
  3. Responsive. Here, assessment tasks require learners to perform at a limited discourse level, connecting sentences into a paragraph and creating a logically connected sequence of two or three paragraphs.
  4. Extensive. Extensive writing implies successful management of all the processes and strategies of writing for all purposes, up to the length of an essay, a term paper, a major research project report, or even a thesis.

MICRO- AND MACROSKILLS OF WRITING

We turn once again to a taxonomy of micro- and macroskills that will assist you in defining the ultimate criterion of an assessment procedure. The earlier microskills apply more appropriately to imitative and intensive types of writing task, while the macroskills are essential for the successful mastery of responsive and extensive writing.

DESIGNING ASSESSMENT TASKS: IMITATIVE WRITING

It is tempting to assume that every English learner knows how to handwrite the Roman alphabet, but such is not the case. Many beginning-level English learners, from young children to older adults, need basic training in and assessment of imitative writing: the rudiments of forming letters, words, and simple sentences. We examine this level of writing first.

Tasks in [Hand] Writing Letters, Words, and Punctuation

A limited variety of types of tasks is commonly used to assess a person's ability to produce written letters and symbols. A few of the more common types are described here:
  1. Copying. There is nothing innovative or modern about directing a test-taker to copy letters or words.
  2. Listening cloze selection tasks. These tasks combine dictation with a written script that has a relatively frequent deletion ratio (every fourth or fifth word, perhaps).
  3. Picture-cued tasks. Familiar pictures are displayed, and test-takers are told to write the word that the picture represents.
  4. Form completion tasks. A variation on pictures is the use of a simple form (registration, application, etc.) that asks for name, address, phone number, and other data.
  5. Converting numbers and abbreviations to words. Some tests have a section on which numbers are written (for example, hours of the day, dates, or schedules), and test-takers are directed to write out the numbers.

Spelling Tasks and Detecting Phoneme-Grapheme Correspondences 

A number of task types are in popular use to assess the ability to spell words correctly and to process phoneme-grapheme correspondences. 
  1. Spelling tests. In a traditional, old-fashioned spelling test, the teacher dictates a simple list of words, one word at a time, followed by the word in a sentence, repeated again, with a pause for test-takers to write the word.
  2. Picture-cued tasks. Pictures are displayed with the objective of focusing on familiar words whose spelling may be unpredictable.
  3. Multiple-choice techniques. Presenting words and phrases in the form of a multiple-choice task risks crossing over into the domain of assessing reading, but if the items have a follow-up writing component, they can serve as formative reinforcement of spelling conventions.
  4. Matching phonetic symbols. If students have become familiar with the phonetic alphabet, they could be shown phonetic symbols and asked to write the correctly spelled word.

DESIGNING ASSESSMENT TASKS: INTENSIVE (CONTROLLED) WRITING

This next level of writing is what second language teacher training manuals have for decades called controlled writing. It may also be thought of as form-focused writing, grammar writing, or simply guided writing. A good deal of writing at this level is display writing as opposed to real writing: students produce language to display their competence in grammar, vocabulary, or sentence formation, and not necessarily to convey meaning for an authentic purpose.

Dictation and Dicto-Comp

A form of controlled writing related to dictation is a dicto-comp. Here, a paragraph is read at normal speed, usually two or three times, and the teacher then asks students to rewrite the paragraph as best they can from memory. In one of several variations of the dicto-comp technique, the teacher, after reading the passage, distributes a handout with key words from the paragraph, in sequence, as cues for the students. In either case, the dicto-comp is best classified as an intensive, if not a responsive, writing task.

Grammatical Transformation Tasks

In the heyday of structural paradigms of language teaching, with slot-filler techniques and slot substitution drills, the practice of making grammatical transformations, orally or in writing, was very popular. Numerous versions of the task are possible:
  1. Change the tenses in a paragraph. 
  2. Change full forms of verbs to reduced forms (contractions). 
  3. Change statements to yes/no or wh-questions. 
  4. Change questions into statements. 
  5. Combine two sentences into one using a relative pronoun. 
  6. Change direct speech to indirect speech. 
  7. Change from active to passive voice.

Picture-Cued Tasks

The main advantage of this technique is that it detaches the almost ubiquitous reading and writing connection and offers instead a nonverbal means to stimulate written responses.
  1. Short sentences. A drawing of some simple action is shown; the test taker writes a brief sentence.
  2. Picture description. A somewhat more complex picture may be presented showing, say, a person reading on a couch, a cat under a table, books and pencils on the table, chairs around the table, a lamp next to the couch, and a picture on the wall over the couch.
  3. Picture sequence description. A sequence of three to six pictures depicting a story line can provide a suitable stimulus for written production.

Vocabulary Assessment Tasks

The major techniques used to assess vocabulary are (a) defining and (b) using a word in a sentence. The latter is the more authentic, but even that task is constrained by a contrived situation in which the test-taker, usually in a matter of seconds, has to come up with an appropriate sentence, which may not indicate that the test-taker "knows" the word. Read (2000) suggested several types of items for assessment of basic knowledge of the meaning of a word, collocational possibilities, and derived morphological forms. Vocabulary assessment in such items is clearly form-focused, but the procedures are creatively linked by means of the target word, its collocations, and its morphological variants.

Ordering Tasks 

One task at the sentence level may appeal to those who are fond of word games and puzzles: ordering (or reordering) a scrambled set of words into a correct sentence. While this somewhat inauthentic task generates writing performance and may be said to tap into grammatical word-ordering rules, it presents a challenge to test takers whose learning styles do not dispose them to logical-mathematical problem solving.

Short-Answer and Sentence Completion Tasks 

Some types of short-answer tasks were discussed in Chapter 8 because of the heavy participation of reading performance in their completion. Such items range from very simple and predictable to somewhat more elaborate responses. The reading-writing connection is strong in items that require test-takers to read a passage or prompt before responding, but weaker in items where reading is needed only to understand the directions and is not crucial in creating sentences.

ISSUES IN ASSESSING RESPONSIVE AND EXTENSIVE WRITING 

Responsive writing creates the opportunity for test-takers to offer an array of possible creative responses within a pedagogical or assessment framework: test-takers are "responding" to a prompt or assignment. The genres of text that are typically addressed here are
  • short reports (with structured formats and conventions); 
  • responses to the reading of an article or story; 
  • summaries of articles or stories; 
  • brief narratives or descriptions; and 
  • interpretations of graphs, tables, and charts. 

It is here that writers become involved in the art (and science) of composing, or real writing, as opposed to display writing. Both responsive and extensive writing tasks are the subject of some classic, widely debated assessment issues that take on a distinctly different flavor from those at the lower-end production of writing. 

  1. Authenticity. Authenticity is a trait that is given special attention: if test takers are being asked to perform a task, its face and content validity need to be assured in order to bring out the best in the writer.
  2. Scoring. Scoring is the thorniest issue at these final two stages of writing.
  3. Time. Yet another assessment issue surrounds the unique nature of writing: it is the only skill in which the language producer is not necessarily constrained by time, which implies the freedom to process multiple drafts before the text becomes a finished product.

Nevertheless, a whole testing industry has based large-scale assessment of writing on the premise that the timed impromptu format is a valid method of assessing writing ability.

DESIGNING ASSESSMENT TASKS RESPONSIVE AND EXTENSIVE WRITING

Responsive and extensive writing tasks will be regarded here as a continuum of possibilities ranging from lower-end tasks whose complexity exceeds those in the previous category of intensive or controlled writing, through more open-ended tasks such as writing short reports, essays, summaries, and responses, up to texts of several pages or more.

Paraphrasing

The initial step in teaching paraphrasing is to ensure that learners understand the importance of paraphrasing: to say something in one's own words, to avoid plagiarizing, to offer some variety in expression. Paraphrasing is more often a part of informal and formative assessment than of formal, summative assessment, and therefore student responses should be viewed as opportunities for teachers and students to gain positive washback on the art of paraphrasing.

Guided Question and Answer

Another lower-order task in this type of writing is a guided question-and-answer format, in which the test administrator poses a series of questions that essentially serve as an outline of the emergent written text. This format has the pedagogical benefit of guiding learners without dictating the form of their output. Guided writing texts, which may be as long as two or three paragraphs, may be scored on either an analytic or a holistic scale (discussed below). Guided writing prompts of this kind are less likely to appear on a formal test and more likely to serve as a way to prompt initial drafts of writing. 

Paragraph Construction Tasks

Assessment of paragraph development takes on a number of different forms:
  1. Topic sentence writing.
  2. Topic development within a paragraph.
  3. Development of main and supporting ideas across paragraphs.

Strategic Options

A number of strategies are commonly taught to second language writers to help them accomplish their purposes. Aside from the strategies of free writing, outlining, drafting, and revising, writers need to be aware of the task that has been demanded and to focus on the genre of writing and the expectations of that genre.

  1. Attending to task. In responsive writing, the context is seldom completely open-ended: a task has been defined by the teacher or test administrator, and the writer must fulfill the criterion of the task.
  2. Attending to genre. Assessment of the more common genres may include the following criteria, along with chosen factors from the list in item #3 (main and supporting ideas) above: 

  • Reports (Lab Reports, Project Summaries, Article/Book Reports, etc.)
  • Summaries of Readings/Lectures/Videos
  • Responses to Readings/Lectures/Videos
  • Narration, Description, Persuasion/Argument, and Exposition
  • Interpreting Statistical, Graphic, or Tabular Data
  • Library Research Paper

TEST OF WRITTEN ENGLISH (TWE®) 

One of a number of internationally available standardized tests of writing ability is the Test of Written English. The TWE is a timed impromptu test: test takers work under a 30-minute time limit and cannot prepare ahead of time for the topic that will appear. Topics are prepared by a panel of experts following specifications for topics that represent commonly used discourse and thought patterns at the university level. Each point on the scoring scale is defined by a set of statements that address topic, organization and development, supporting ideas, facility (fluency, naturalness, appropriateness) in writing, and grammatical and lexical correctness and choice.

It is important to put tests like the TWE in perspective. Timed impromptu tests have obvious limitations if you are looking for an authentic sample of performance in a real-world context. The convenience of the TWE should not lull administrators into believing that TWEs and TOEFLs and the like are the only measures that should be applied to students. 

SCORING METHODS FOR RESPONSIVE AND EXTENSIVE WRITING

Three major approaches are used to score writing performance: holistic, primary trait, and analytic scoring. In the first method, holistic scoring, a single score is assigned to an essay, which represents a reader's general overall assessment. Primary trait scoring is a variation of the holistic method in that the achievement of the primary purpose, or trait, of an essay is the only factor rated.

Holistic Scoring

Advantages of holistic scoring include: 
  • fast evaluation, 
  • relatively high inter-rater reliability, 
  • the fact that scores represent "standards" that are easily interpreted by lay persons, 
  • the fact that scores tend to emphasize the writer's strengths (Cohen, 1994, p. 315), and
  • applicability to writing across many different disciplines. 

Its disadvantages must also be weighed into a decision on whether to use holistic scoring: 

  • One score masks differences across the subskills within each score. 
  • No diagnostic information is available (no washback potential). 
  • The scale may not apply equally well to all genres of writing. 
  • Raters need to be extensively trained to use the scale accurately.

In general, teachers and test designers lean toward holistic scoring only when it is expedient for administrative purposes. As long as trained evaluators are in place, differentiation across the six levels of a holistic scale such as the TWE's may be quite adequate for admission into an institution or placement into courses.

Primary Trait Scoring

This type of scoring emphasizes the task at hand and assigns a score based on how effectively the text achieves that one goal. For rating the primary trait of the text, Lloyd-Jones (1977) suggested a four-point scale ranging from 0 (no response or fragmented response) to 4 (the purpose is unequivocally accomplished in a convincing fashion). Depending on the genre, then, a primary trait score would assess 
  • the accuracy of the account of the original (summary), 
  • the clarity of the steps of the procedure and the final result (lab report), 
  • the description of the main features of the graph (graph description), and 
  • the expression of the writer's opinion (response to an article).


Analytic Scoring

Primary trait scoring focuses on the principal function of the text and therefore offers some feedback potential, but no washback for any of the aspects of the written production that enhance the ultimate accomplishment of the purpose. Analytic scoring may be more appropriately called analytic assessment in order to capture its closer association with classroom language instruction than with formal testing. 

The order in which the five categories (organization, logical development of ideas, grammar, punctuation/spelling/mechanics, and style and quality of expression) are listed may bias the evaluator toward giving greater importance to organization and logical development than to punctuation and style. Not all writing and assessment specialists agree with this weighting. You might, for example, consider the analytic scoring profile suggested by Jacobs et al. (1981), in which five slightly different categories were each assigned a point value. Analytic scoring of compositions offers writers a little more washback than a single holistic or primary trait score. Scores in five or six major elements will help to call the writers' attention to areas of needed improvement.

BEYOND SCORING: RESPONDING TO EXTENSIVE WRITING

To accomplish the mission of formal testing, designers of writing tests are charged with providing as "objective" a scoring procedure as possible, one that in many cases can be easily interpreted by agents beyond the learner. Yet beyond mathematically calculated scores lies a rich domain of assessment in which a developing writer is coached from stage to stage in a process of building a storehouse of writing skills.

To give the student the maximum benefit of assessment, it is important to consider (a) earlier stages (from free writing to the first draft or two) and (b) later stages (revising and finalizing) of producing a written text. A further factor in assessing writing is the involvement of self, peers, and teacher at appropriate steps in the process.

Assessing Initial Stages of the Process of Composing

Guidelines for assessing the initial stages (the first draft or two) of a written composition are generic for self, peer, and teacher responding. Each assessor will need to modify them according to the level of the learner, the context, and the purpose in responding. An early focus on overall structure and meaning will enable writers to clarify their purpose and plan, and will set a framework for the writers' later refinement of lexical and grammatical issues.

Assessing Later Stages of the Process of Composing

Once the writer has determined and clarified his or her purpose and plan, and has completed at least one or perhaps two drafts, the focus shifts toward "fine tuning" the expression with a view toward a final revision. Through all these stages it is assumed that peers and teacher are both responding to the writer through conferencing in person, electronic communication, or, at the very least, an exchange of papers. 

All those developmental stages may be the preparation that learners need both to function in creative real-world writing tasks and to successfully demonstrate their competence on a timed impromptu test. Holistic scores are, after all, generalizations of the various components of effective writing. If the hard work of successfully progressing through a semester or two of a challenging course in academic writing ultimately means that writers are ready to function in their real-world contexts, and to get a 5 or 6 on the TWE, then all the effort was worthwhile.



References:
Brown, H. Douglas. Language Assessment: Principles and Classroom Practices. Longman.







