Use of Automated Writing Evaluation (AWE) for placement tests: Can scores of AWE be criteria to place students into language courses?

Zhi Li, Hyejin Yang, Stephanie Link, and Volker Hegelheimer
Iowa State University
October 5-6, 2012, University of Illinois, Urbana-Champaign

English Placement Test
- Purpose: to place international students into appropriate ESL writing classes
- Practical needs: low-cost options, immediate scoring, and improved time management

Example: Time Management
- Fall 2012 at ISU: 500+ essays to score
- One human rater: 5 minutes per essay
- How long will it take the raters to score all the essays?
  - 18 raters: 28 essays each = 2.3 hours
  - Each essay is rated 2 to 3 times
  - Total: 4.6 hours (two ratings per essay)


What can the computer do?
- Human rating: 5 minutes per essay
- Computer rating: 1 minute per essay
- Essays per rater: 23
- 2.8 hours to rate 500 essays, vs. 4.6 hours
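The time estimates on the two slides above can be reproduced with a few lines of arithmetic. This is a minimal sketch: `rating_hours` is a hypothetical helper, and reading the 2.8-hour figure as "one human rating plus a 1-minute computer step per essay, 28 essays per rater" is an assumption, not something the slides state.

```python
import math

def rating_hours(n_essays: int, n_raters: int, minutes_per_essay: float, passes: int = 1) -> float:
    """Hours each rater spends if essays are split evenly across raters and
    every essay is read `passes` times."""
    essays_per_rater = math.ceil(n_essays / n_raters)
    return essays_per_rater * minutes_per_essay * passes / 60

# Figures from the slides: 500 essays, 18 raters, 5 minutes per human rating.
one_pass = rating_hours(500, 18, 5)              # 28 essays x 5 min = 140 min
two_human = rating_hours(500, 18, 5, passes=2)   # two human ratings per essay
# Assumption: one human rating (5 min) plus a 1-minute computer step per essay.
hybrid = rating_hours(500, 18, 5 + 1)
print(f"{one_pass:.1f} h, {two_human:.1f} h, {hybrid:.1f} h")
```

This prints 2.3 h, 4.7 h, and 2.8 h. The slide's 4.6 hours comes from doubling the already-rounded 2.3 hours; the unrounded two-rating figure is about 4.7 hours.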

Motivation and Purpose
- Low-cost options
- Immediate scoring
- Improved time management
- Purpose: to investigate whether Criterion scores can be used to help make placement decisions in an ESL program

AWE Validation Studies
- High level of correspondence with human scores:
  - e-rater: .73-.93 correlation; 87-97% exact agreement (Attali & Burstein, 2005; Burstein, Chodorow, & Leacock, 2004)
  - IntelliMetric: .50-.90 correlation; 56-88% exact agreement (Elliot, 2003; Vantage Learning, 1998, 1999, 2000, 2001)
  - Intelligent Essay Assessor (IEA): .81-.83 correlation (Landauer, Laham, & Foltz, 2003)

Computer Scoring for Placement
- Concerns about ACCUPLACER OnLine WritePlacer Plus (scored by IntelliMetric):
  - It is impersonal.
  - It distorts the nature of writing.
  - It discriminates according to length, grammar, and mechanics.
  - Weak correlations may be due to a lack of formal training or calibration of the human evaluators.
- Sources: Herrington & Moran, 2006; Jones, 2006; James, 2006

Computer Scoring for Placement
- Validity evidence for ACCUPLACER OnLine WritePlacer Plus (scored by IntelliMetric):
  - "not that much worse... than placement by readers" (p. 126)
  - Useful with spot-checking and retesting
  - A valid tool for assessing writing samples and placing students in composition courses
- Sources: Herrington & Moran, 2006; Jones, 2006; James, 2006

Research Questions
- RQ1. What is the relationship between Criterion output (holistic scores and trait feedback) and EPT placement decisions?
- RQ2. To what extent can Criterion holistic scores distinguish between different levels of ESL writing classes?

Participants
- 135 international undergraduate students
- Fall semester 2012 at ISU

Discipline        Number of participants
Engineering       48
LAS               37
Business          33
Design            10
Human Science      5
Agriculture        2
Total            135

Setting
- Paper-based English Placement Test (30 minutes)
- Topic: "Modern conveniences," taken from Criterion
  - Topic category: college level, first year
  - Topic mode: persuasive
- Prompt: "Modern conveniences such as fast food, automatic teller machines, and labor-saving appliances promise to make life easier. Do these products and services actually make our lives more convenient or do they simply create new problems? Explain your position with reasons and examples from your own experience, observations, or reading."

Rating Procedure
- Number of raters: 9 experienced, 9 new
- Rubric based on the ACTFL Proficiency Guidelines: General Description, Organization, Grammar & Vocabulary, Functional, Mechanics, and Comprehensibility
- Placement based on the agreement of two raters; a third rating for controversial papers
- Inter-rater reliability: 62% exact agreement
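The two-rater-plus-adjudicator procedure can be written as a small decision rule. This is a hypothetical sketch: the slide says only that placement rests on two raters' agreement with a third rating for controversial papers, so the majority-vote tie-break and the `None` result for a three-way split are assumptions.

```python
from collections import Counter
from typing import Optional

LEVELS = ("101B", "101C", "Pass")  # placement outcomes named in the curriculum

def place(rater1: str, rater2: str, rater3: Optional[str] = None) -> Optional[str]:
    """Hypothetical rule: two agreeing raters decide the placement; otherwise a
    third rating decides by majority. Returns None if all three ratings differ."""
    for r in (rater1, rater2, rater3):
        if r is not None and r not in LEVELS:
            raise ValueError(f"unknown level: {r}")
    if rater1 == rater2:
        return rater1
    if rater3 is None:
        raise ValueError("raters disagree: a third rating is required")
    level, votes = Counter([rater1, rater2, rater3]).most_common(1)[0]
    return level if votes >= 2 else None

print(place("101C", "101C"))          # agreement, no third rating needed
print(place("101B", "101C", "101C"))  # third rater sides with rater 2
```

With 62% exact agreement between the first two raters, roughly a third of papers would reach the third-rating branch under a rule like this.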

EPT Scoring Criteria
Advanced Mid (Pass)
- able to meet a range of work and/or academic writing needs
- able to narrate and describe with detail in all major time frames
- cohesive devices in texts up to several paragraphs
- good control of the most frequently used target-language syntactic structures and a range of general vocabulary
Advanced Low (101C/D)
- able to meet basic work and/or academic writing needs
- able to narrate and describe in major time frames
- a limited number of cohesive devices
- some redundancy and awkward repetition
- some additional effort may be required in the reading of the text
Intermediate High (101B)
- able to write compositions and simple summaries related to work and/or school experiences
- inconsistent in the use of appropriate major time markers, resulting in a loss of clarity
- ... correspond to those of the spoken language

Curriculum
- ESL writing curriculum placement decisions: Engl 101B, Engl 101C, or Pass (Engl 150)

Materials
- Writing samples from the EPT, drawn by stratified random sampling
- Verbatim transcription of the paper-based essays

EPT level   Two-rater samples   Three-rater samples   Mean word count
101B        30                  15                    259
101C        30                  15                    260
Pass        30                  15                    302
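Stratified random sampling means drawing a fixed number of essays at random inside each stratum, here each combination of EPT level and number of ratings. This is a minimal sketch with a hypothetical pool of essay IDs; the pool size of 60 per stratum and the seed are illustrative assumptions.

```python
import random

def stratified_sample(pool, sizes, seed=0):
    """Simple random sample without replacement inside each stratum.

    pool  -- dict mapping a stratum key to the list of essay IDs in it
    sizes -- dict mapping the same keys to the number of essays to draw
    """
    rng = random.Random(seed)  # seeded so the draw can be reproduced
    return {key: rng.sample(pool[key], sizes[key]) for key in sizes}

LEVELS = ("101B", "101C", "Pass")
# Hypothetical pool: strata are (EPT level, number of ratings) pairs.
pool = {
    (level, ratings): [f"{level}-{ratings}r-{i:03d}" for i in range(60)]
    for level in LEVELS
    for ratings in (2, 3)
}
sizes = {(level, 2): 30 for level in LEVELS}
sizes.update({(level, 3): 15 for level in LEVELS})
sample = stratified_sample(pool, sizes)
print(sum(len(v) for v in sample.values()))  # 135
```

Sampling within strata guarantees the 30/15 split per level shown in the table, which a single random draw from all essays would not.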

Data Collection
- Entering the essays into Criterion
- Data extraction:
  - Holistic scores
  - Trait feedback (error counts are normalized):
    - Grammar (subject-verb agreement, fragments, etc.)
    - Usage (wrong article, preposition errors, etc.)
    - Mechanics (spelling, missing comma, etc.)
    - Style (repetition of words, short sentences, etc.)
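The slide says error counts are normalized but not how. One common option, assumed here, is to express each trait's count per 100 words so that longer essays are not penalized simply for containing more text. `per_100_words` is a hypothetical helper and the counts are made up.

```python
def per_100_words(error_counts: dict[str, int], word_count: int) -> dict[str, float]:
    """Convert raw error counts to rates per 100 words (assumed normalization)."""
    if word_count <= 0:
        raise ValueError("word_count must be positive")
    rates = {trait: 100 * n / word_count for trait, n in error_counts.items()}
    rates["Total"] = 100 * sum(error_counts.values()) / word_count
    return rates

# Hypothetical trait counts for one 250-word essay.
counts = {"Grammar": 3, "Usage": 2, "Mechanics": 5, "Style": 10}
print(per_100_words(counts, 250))
```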

Criterion Scoring Rubric (score points 4 and 3)
Score 4:
- Slights some parts of the task
- Treats the topic simplistically or repetitively
- Is organized adequately, but you need to support your position more fully with discussion, reasons, or examples
- Shows that you can say what you mean, but you could use language more precisely or vigorously
- Demonstrates control in terms of grammar, usage, or sentence structure, but you may have some errors
Score 3:
- Neglects or misinterprets important parts of the topic or task
- Lacks focus or is simplistic or confused in interpretation
- Is not organized or developed carefully from point to point
- Provides examples without explanation, or generalizations without completely supporting them
- Uses mostly simple sentences or language that does not serve your meaning
- Demonstrates errors in grammar, usage, or sentence structure

Data Analysis
- RQ1 (Criterion output vs. human ratings): descriptive statistics, correlation, regression
- RQ2 (differences in Criterion output between EPT levels): ANOVA

RQ1: Criterion Output vs. Human Ratings
[Figure: Distribution of Criterion scores (1-5) over EPT levels: 101B (N=43), 101C (N=45), Pass (N=44)]
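A distribution like the one in this figure is a cross-tabulation of EPT level by Criterion score. This is a minimal sketch on hypothetical (level, score) pairs, not the study's data; `score_distribution` is an illustrative name.

```python
from collections import Counter

def score_distribution(records):
    """Count essays per (EPT level, Criterion score) pair."""
    table = {}
    for level, score in records:
        table.setdefault(level, Counter())[score] += 1
    return table

# Hypothetical (level, Criterion score) pairs, not the study's data.
records = [("101B", 2), ("101B", 3), ("101C", 3), ("101C", 3), ("Pass", 4), ("Pass", 3)]
dist = score_distribution(records)
for level in ("101B", "101C", "Pass"):
    counts = dist.get(level, Counter())
    print(level, [counts[s] for s in range(1, 6)])  # counts for scores 1..5
```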

RQ1: Criterion Output vs. Human Ratings
Correlation (Spearman's rho)
Columns: (1) EPT level, complete set (N=132); (2) EPT level, two-rater samples (N=89); (3) EPT level, three-rater samples (N=43); (4) Criterion score (N=132)

                   (1)       (2)       (3)       (4)
Criterion score    0.39**    0.47**    0.22      1
Word count         0.25**    0.31**    0.11      0.69**
Total errors      -0.40**   -0.48**   -0.21     -0.43**
Grammar           -0.25**   -0.33**   -0.07     -0.36**
Usage             -0.21*    -0.20     -0.22     -0.47**
Mechanics         -0.28**   -0.30**   -0.22     -0.34**
Style             -0.30**   -0.35**   -0.20     -0.26**
** significant at the 0.05 level

RQ1: Criterion Output vs. Human Ratings
Regression analysis (dependent variable: Criterion score; R² = 0.727)

Predictor        Beta      t         p
Constant                    7.963    0.000
Word count        0.604    11.933    0.000
Total errors      0.339     2.101    0.038
Grammar          -0.287    -4.914    0.000
Usage            -0.341    -6.330    0.000
Mechanics        -0.229    -3.117    0.002
Style            -0.383    -2.678    0.008
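Standardized coefficients (Beta) like those in these tables can be obtained by fitting ordinary least squares after z-scoring every predictor and the outcome. This is a minimal NumPy sketch on synthetic data; `standardized_betas` is a hypothetical helper, and the exact software and settings used in the study are not stated on the slides.

```python
import numpy as np

def standardized_betas(X: np.ndarray, y: np.ndarray) -> tuple[np.ndarray, float]:
    """OLS on z-scored predictors and outcome; returns (betas, R^2).

    X -- (n, k) matrix of predictors (e.g. word count and error rates)
    y -- (n,) outcome (e.g. Criterion score or coded EPT level)
    """
    Xz = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    yz = (y - y.mean()) / y.std(ddof=1)
    # After centering, the fitted intercept would be zero, so none is included.
    betas, *_ = np.linalg.lstsq(Xz, yz, rcond=None)
    residuals = yz - Xz @ betas
    r_squared = 1 - (residuals @ residuals) / (yz @ yz)
    return betas, float(r_squared)

# Synthetic illustration only: one positive and one negative predictor.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2))
y = 2.0 * X[:, 0] - 1.0 * X[:, 1] + rng.normal(scale=0.5, size=100)
betas, r2 = standardized_betas(X, y)
print(np.round(betas, 3), round(r2, 3))
```

Because all variables share one scale after z-scoring, the Betas can be compared across predictors measured in different units, such as words and error rates.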

RQ1: Criterion Output vs. Human Ratings
Regression analysis (dependent variable: EPT level; R² = 0.208)

Predictor        Beta (standardized coefficient)   t         p
Constant                                           6.039    0.000
Word count        0.117                            1.372    0.173
Total errors      0.384                            1.379    0.170
Grammar          -0.219                           -2.099    0.038
Usage            -0.186                           -1.988    0.049
Mechanics        -0.305                           -2.489    0.014
Style            -0.549                           -2.252    0.026

RQ2: Differences between EPT Levels
Post-hoc multiple comparisons in one-way ANOVA (N=135): mean differences

                   B-C        C-Pass     B-Pass
Criterion score   -0.139     -0.580*    -0.719*
Word count        -1.489    -41.978*   -43.467*
Total errors       3.02*      1.9        4.92*
Grammar            0.379      0.38       0.761*
Usage              0.088      0.474*     0.562*
Mechanics         -0.013      1.193*     1.180*
Style              2.740*     0.348      3.088*
* The mean difference is significant at the 0.05 level.
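The one-way ANOVA behind this table compares between-group and within-group variance, and the table's columns are differences of group means. This is a minimal sketch on hypothetical scores. The slides do not name the post-hoc procedure, so pairwise significance testing (for example, Tukey HSD) is not shown. Note that the mean differences are additive: B-Pass = (B-C) + (C-Pass).

```python
from statistics import fmean

def one_way_anova(groups: dict[str, list[float]]) -> tuple[float, int, int]:
    """Return (F, df_between, df_within) for a one-way ANOVA across the groups."""
    means = {name: fmean(vals) for name, vals in groups.items()}
    all_values = [v for vals in groups.values() for v in vals]
    grand_mean = fmean(all_values)
    ss_between = sum(len(vals) * (means[name] - grand_mean) ** 2
                     for name, vals in groups.items())
    ss_within = sum((v - means[name]) ** 2
                    for name, vals in groups.items() for v in vals)
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    return (ss_between / df_between) / (ss_within / df_within), df_between, df_within

def mean_differences(groups: dict[str, list[float]]) -> dict[str, float]:
    """Pairwise differences of group means, labeled like the B-C / C-Pass / B-Pass columns."""
    names = list(groups)
    return {f"{a}-{b}": fmean(groups[a]) - fmean(groups[b])
            for i, a in enumerate(names) for b in names[i + 1:]}

# Hypothetical Criterion scores for three small groups.
groups = {"B": [2, 3, 3], "C": [3, 3, 4], "Pass": [4, 4, 5]}
print(one_way_anova(groups))
print(mean_differences(groups))
```

For a cross-check, `scipy.stats.f_oneway(*groups.values())` returns the same F statistic together with its p-value.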

Discussion
- RQ1: relatively low correlations, possibly due to:
  - different grading rubrics
  - essay lengths
  - the level of the essay prompt on Criterion (college, first year)
- RQ2: Criterion output distinguished Pass from 101B/C
  - Can, because of: wide coverage of error categories
  - Cannot, because of: style (repetition and spelling)

Implications & Future Studies
- Potential use for distinguishing Pass from non-Pass, with placement confirmed through a diagnostic test
- Future studies on:
  - the effects of different essay topic categories and modes
  - the predictive evidence of Criterion output
  - paper-based vs. computer-based writing

References
- Attali, Y., & Burstein, J. (2006). Automated essay scoring with e-rater V.2. The Journal of Technology, Learning and Assessment, 4(3). Retrieved from http://www.jtla.org
- Fulcher, G. (1997). English language placement test: Issues in reliability and validity. Language Testing, 14(2), 113-139.
- James, C. L. (2006). Validating a computerized scoring system for assessing writing and placing students in composition courses. Assessing Writing, 11, 167-178.
- Ware, P. D., & Warschauer, M. (2006). Electronic feedback and second language writing. In K. Hyland & F. Hyland (Eds.), Feedback in second language writing: Contexts and issues (pp. 105-122). Cambridge: Cambridge University Press.

Thank you! Questions and comments?
Acknowledgements: Yoo Ree Chung
Zhi Li, zhili@iastate.edu
Hyejin Yang, hjyang@iastate.edu
Stephanie Link, smcross@iastate.edu
Volker Hegelheimer, volkerh@iastate.edu
Website: volkerh.public.iastate.edu/awe