Assessing Speaking

Dorry Kenyon
Center for Applied Linguistics

We will begin with an overview of issues in assessing second language speaking ability. Drawing on research and development of technology-based instruments at the Center for Applied Linguistics (CAL), we will then discuss issues in task design, emphasizing the importance of aligning task characteristics and scoring criteria with response expectations. Participants will have the opportunity to develop and critique their own tasks. The workshop will conclude with a discussion of some alternative approaches, developed by CAL, to training evaluators of speaking assessments.

[back]                                                                                                 [return to SCALAR Home]
 
 
A comparison of composing processes and written products in timed-essay tests across paper & pencil and computer modes

Young-ju Lee
University of California, Los Angeles

The present qualitative study attempted to discover plausible differences in composing processes (i.e., planning and text production) when ESL (English as a Second Language) students write timed essays on paper and on computer. There has been growing interest in the equivalence of computerized and paper-and-pencil tests, especially in the context of multiple-choice tests; however, the comparability of writing processes across the paper and computer modes has not been investigated thoroughly. For this study, six Korean undergraduate and graduate students with high proficiency in English were asked to write two timed essays, one in each mode, in response to two rhetorically comparable prompts. Upon completion of the test, participants reported on their composing processes in detail. Stimulated-recall questions (i.e., retrospective reports), videotapes of the composing sessions, and interviews were used to gather information about composing processes in timed-essay tests across the two modes. This study also investigated the extent to which computer attitude affects composing processes when participants write on the computer, and the ways in which the quality of the written product differs across the paper and computer modes.

The results of this study showed that planning and text production were more interwoven when composing on the computer than when composing on paper. The written product analysis indicated that participants produced significantly more words in the computer mode than in the paper mode. However, there was no significant difference in the number of sentences produced across modes. There were also no significant differences in scores across the modes, which suggests that the mode of composition does not affect scores.

The Foreign Language National Assessment of Educational Progress

Dorry Kenyon
Center for Applied Linguistics

What is the National Assessment of Educational Progress (NAEP), also known as "the Nation's Report Card"? For over 30 years, NAEP has been collecting data annually on what students in the United States know and can do in reading, math, writing, and other school subject areas. In 2003, the first NAEP for foreign languages will be administered. This presentation will explain what NAEP is and how it operates, and will give an overview of the design and procedures for the foreign language NAEP planned for 2003. Examples of assessment tasks in four areas will be presented and discussed: conversational tasks in the interpersonal communication mode; listening and reading tasks in the interpretive communication mode; and writing tasks in the presentational mode.

Validation of grammar, content, spelling, and text length
in English and Korean writing of elementary school students

Jungok Bae
UCLA Extension

This study evaluates the construct validity of four components of writing in Korean and in English, based on the responses produced by bilingual and monolingual children. Composition data were collected from students in grades 2-4 in a Korean/English dual language program enrolling two ethno-linguistic groups, and from peers in English classes in Los Angeles and Korean classes in Seoul. Tests consisting of letter-writing and story-writing tasks, equivalent in English and Korean, were developed. For each task, parallel picture prompts were administered in a counterbalanced design. Componential scoring evaluated four writing constructs (content, grammar, spelling, and text length) separately for Korean and English compositions, using common criteria for both languages.

A latent variable approach, implemented in EQS, was used to carry out the construct validation; interrelated hypotheses implied by both theory and writing assessment were tested sequentially. In the internal stages of the investigation, which began by defining the writing constructs, the instructional contexts, and the study participants, several alternative factor models were tested and compared. Subsequently, multiple-group analyses tested whether an identical factor pattern holds across grades, genders, groups defined by parallel picture use, and immersion/monolingual groups. Measurement properties (represented by factor loadings and intercepts) were checked for cross-group invariance to determine whether the measurement procedure had actually operated in the same way and to the same extent across these groups.

In the external stages of the investigation, the influence of student background characteristics on the writing factors was examined. Factor-mean analyses using covariance-structure MIMIC (multiple indicators, multiple causes) models with dummy-coded group predictors are discussed in terms of their pedagogical implications, and the utility of this approach is shown. A simultaneous analysis investigated the relations between the English writing factors and the corresponding Korean writing factors.

Results and implications are discussed sequentially. We hope the methods and findings in this study provide a heuristic for future research.

Input delivery, response collection, and automated
scoring in web-based language testing

Nathan T. Carr
University of California, Los Angeles

In the Web-based testing (WBT) environment, test input and test taker responses are centrally managed, offering operational advantages not necessarily available in a computer-based testing (CBT) environment. Three primary issues are involved in developing a WBT system: (1) creating the delivery system, which generally involves creating a series of web pages; (2) response collection, which requires coordinating the web pages with scripts that collect test taker responses either via e-mail messages or by direct input into a database; and (3) scoring, which is potentially the most challenging aspect of the process. At the same time, automated scoring offers the possibility of scoring open-ended responses cost-effectively. This presentation will provide an overview of how test developers can construct these three components of a WBT system, with examples taken from the ESL portion of the ongoing UCLA Web-Based Language Assessment System (WebLAS) Project.
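To make steps (2) and (3) more concrete, here is a minimal sketch of collecting responses directly into a database and scoring a short-answer item automatically. It is not the WebLAS implementation: it assumes an in-memory SQLite table standing in for the central database, and the table layout, the function names (`collect_response`, `score_response`, `score_all`) and the answer key are all hypothetical. The keyword-matching scorer is deliberately naive; scoring genuinely open-ended responses would require much more sophisticated methods.

```python
import re
import sqlite3

# Hypothetical answer key: item id -> key words a correct answer should contain.
ANSWER_KEY = {"q1": {"library", "books"}}


def create_store() -> sqlite3.Connection:
    """Create an in-memory response table (stands in for a central database)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE responses (taker_id TEXT, item_id TEXT, response TEXT)"
    )
    return conn


def collect_response(conn: sqlite3.Connection, taker_id: str,
                     item_id: str, response: str) -> None:
    """Store one submitted response (a web form handler would call this)."""
    conn.execute(
        "INSERT INTO responses VALUES (?, ?, ?)", (taker_id, item_id, response)
    )
    conn.commit()


def score_response(item_id: str, response: str) -> int:
    """Naive keyword scoring: one point per key word present, case-insensitive."""
    words = set(re.findall(r"[a-z']+", response.lower()))
    return len(ANSWER_KEY.get(item_id, set()) & words)


def score_all(conn: sqlite3.Connection) -> dict:
    """Score every stored response, keyed by (taker_id, item_id)."""
    rows = conn.execute(
        "SELECT taker_id, item_id, response FROM responses "
        "ORDER BY taker_id, item_id"
    )
    return {(t, i): score_response(i, r) for t, i, r in rows}


if __name__ == "__main__":
    conn = create_store()
    collect_response(conn, "t01", "q1", "She went to the library to borrow books.")
    collect_response(conn, "t02", "q1", "She went shopping.")
    print(score_all(conn))  # {('t01', 'q1'): 2, ('t02', 'q1'): 0}
```

In an actual WBT deployment, `collect_response` would be called by the server-side script that receives the submitted web page form rather than directly as here; keeping collection and scoring as separate steps lets responses be rescored later if the key changes.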

Speaking: Ability or activity? Where's the construct?

Lyle F. Bachman
University of California, Los Angeles

Speaking has historically been viewed as a skill, not only in language testing but in virtually all other areas of applied linguistics, and has been defined in different ways, depending on the particular perspective of the researcher. All of these definitions, however, view speaking as an aspect of language ability that is learned or acquired, on the one hand, and as something language users can do, on the other. In this paper I argue that to view speaking as a "skill" that can be acquired, used, and assessed confounds what language users have with what they do. This confounding of speaking as a theoretical construct ("has speaking ability") with speaking as a pragmatic ascription ("able to speak") leads to irresolvable ambiguities in the way we interpret the results of research into the nature of speaking. This confound is equally problematic for language testing practice, since these two views of speaking imply different kinds of score-based inferences that require different validation arguments to justify them. As a way of resolving this problem, I will propose that for both language testing and applied linguistics research and practice, speaking can be more fruitfully viewed as an activity in which language users realize their language ability.

On the scalability of the components of the reading
comprehension ability: A progress report

Hossein Farhady
with
Parisa Daftari Fard
Iran University of Science and Technology

Although extensive research has been conducted to explain the latent structure of reading comprehension (RC) ability, the controversy over the nature of RC skills has not yet been settled. Some researchers advocate a global view of RC and claim that there is a general reading competence (Rost, 1993), while others believe that RC rests on certain interrelated underlying abilities (Alderson, 2000). Given these disagreements about the nature of RC, Grabe (1997) suggests a 'reading components' perspective as an appropriate research direction.

The purpose of this paper is to report on the first phase of an ongoing research project designed to shed some light on the debate between these two perspectives. More specifically, the theoretical models offered to account for the nature of RC have been investigated in depth, and all common features mentioned by researchers have been systematically organized into a comprehensive model. Next, a reading comprehension test with varying item formats was developed, with at least three items for each of the features.

The test is being administered to around 600 university students in Iran. After data collection, various analyses will be used to answer the following three questions:
1. Could micro components be identified according to the theoretical model of RC?
2. Could the micro components be grouped into certain macro components?
3. Would the micro and macro components lend themselves to a particular scalability criterion?

A stakeholder approach to trialling test-tasks

Rama Mathew
Rachel Lalitha Eapen
CIEFL, Hyderabad, India

In this paper we report on our experience of trialling a set of tasks with different kinds of testees in India. The pilot testing of the tasks adopted a stakeholder approach to test development, in that teachers at different levels of education in different parts of the country were involved in administering the test and gathering test-taker feedback. This involvement was considered important for ensuring beneficial backwash from the test when it is launched for large-scale use. This presentation focuses only on the reading component of the test.

The study shows what kinds of tests do and do not work in our context, and why. The findings are especially significant given that pilot testing is a relatively new practice in India, although several proficiency tests are in use in the country. The study has implications not only for determining levels of English language proficiency in a multilingual context but also for teachers' ongoing professional development.
