Sunday, July 19, 2020

How does the TOEFL e-rater Work

A General Introduction to the e-rater and Teaching TOEFL Writing

Both TOEFL essays (independent and integrated) get a score from the e-rater and one human rater. These scores are averaged to produce a final score out of thirty points. It is important to remember that the e-rater and the human rater usually produce the same score when evaluating essays, so the averaging is somewhat irrelevant.

So where does the score come from? And how can we use knowledge of the e-rater to help students pass the TOEFL? In this article I want to describe:

- The main categories the e-rater uses to score essays, and how much of the writing score comes from each category
- The smaller sub-categories these are sometimes broken down into
- The sub-categories that most affect students on test day
- How this information can be used to teach students better

Some problems are worth mentioning here:

- Most of the published information about the e-rater is about the independent TOEFL essay. This article mostly refers to that essay type.
- My information is probably out of date. The e-rater is adjusted every year, so all articles inevitably describe old versions. It seems that changes are minor, though.

Note that at the end of the article there is a video version of everything.

Main e-rater Categories (Macrofeatures)

The e-rater gives students scores in specific categories, called macrofeatures. Each of these has a different weight when it comes to scoring. In 2010, the macrofeatures were listed as follows (with the number of points out of 30 that each weight works out to):

- Organization (32%, 9.6 points)
- Development (29%, 8.7 points)
- Mechanics (10%, 3 points)
- Usage (8%, 2.4 points)
- Grammar (7%, 2.1 points)
- Lexical Complexity: word length (7%, 2.1 points)
- Lexical Complexity: less frequent words (7%, 2.1 points)
- Style (3%, 0.9 points)

Source. Yes, the above adds up to more than 100%. Blame the source.

Problem: These figures are from 2010. Since then two more categories have been introduced.
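As an aside, the points figures in the macrofeature list above are simple arithmetic: each category's weight multiplied by the 30-point scale. A quick sketch, just to make the conversion explicit (the dictionary below only restates the published 2010 weights):

```python
# Converting the 2010 macrofeature weights into points out of 30:
# each figure is simply the category's weight multiplied by 30.
weights = {
    "Organization": 0.32,
    "Development": 0.29,
    "Mechanics": 0.10,
    "Usage": 0.08,
    "Grammar": 0.07,
    "Lexical Complexity (word length)": 0.07,
    "Lexical Complexity (less frequent words)": 0.07,
    "Style": 0.03,
}

for name, weight in weights.items():
    print(f"{name}: {weight * 30:.1f} points")

# The published weights add up to slightly more than 100%.
print(f"Total weight: {sum(weights.values()):.0%}")
```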
The two new categories are positive features and topic-specific vocabulary (source). My guess would be that each is weighted at about 3% and that the two lexical complexity categories have been deweighted, but that isn't based on any documented evidence.

Defining Macrofeatures

It is best to consider the macrofeatures as either technical stuff (specifically: grammar, usage, mechanics, style) or content stuff (specifically: organization, development, lexical complexity, positive features, topic-specific vocabulary). We can look at them one at a time.

Development (29% of score)

This is specifically defined by ETS as "background, thesis, main ideas, supporting ideas, and conclusion" (source). All of these need to be presented using a series of paragraphs. I teach my students to use a four paragraph model to achieve the desired organization in the independent essay. The background (which I call a hook) and thesis are contained in the introductory paragraph. Each of two body paragraphs contains a main idea (which I call a topic sentence) and supporting ideas (which I call elaboration sentences and personal examples). The model ends with a short conclusion. I also teach the use of certain phrases (templates) that ensure the e-rater knows these features are being included.

For the integrated task, my model has students indicate the background (topic) of the sources in the first line. As a thesis it indicates the relationship between the two sources (always casting doubt). As main and supporting ideas it presents the specific ways in which the lecture challenges the reading.

Problems: Old articles about the e-rater specifically state that a five paragraph structure is necessary (that is, three main arguments). As my students have gotten perfect scores with a four paragraph structure, I feel this specific requirement is somewhat out of date.

Organization (32% of score)

According to ETS, for the organization feature, "e-rater computes the average length of the discourse elements (in words) in an essay" (source). Obviously, then, a longer essay is better. However, writing a longer essay can cause students to make more mistakes and reduce their scores in the categories of grammar, usage, mechanics and style. It is important to find a sweet spot so that essay lengths match the abilities of specific students. I generally recommend about 400 words for the independent task and 300 words for the integrated task.

It isn't supported by any published articles, but I also believe that this feature requires long body paragraphs, a shorter introduction and an even shorter conclusion. Otherwise, the category would merely be the total overall word count.

Lexical Complexity: Less Frequent Words (perhaps 4% of score)

This assesses the level of the words used in the essay based on their frequency in "a large corpus of text" (source). Obviously, then, less-frequently used words are considered more advanced, and therefore higher-scoring. A chart of word frequency can be found online. To meet this requirement I generally encourage students to use more advanced vocabulary (within reason). More specifically, I encourage students to avoid using very common adjectives like "good" or "big". These can easily be replaced with something more infrequent.

Lexical Complexity: Word Length (perhaps 4% of score)

Students are rewarded for using longer words. This is fairly straightforward.

Positive Features (Collocations and Prepositions) (perhaps 3% of score)

First of all, students are rewarded for collocation use. The e-rater identifies "the number of good collocations [divided by] the total number of words" (source). So, more collocations equals a higher score.
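That collocation measure is just a density: collocations found in the essay divided by the total word count. A minimal sketch of the idea; the tiny collocation list and the function name below are my own inventions for illustration, not ETS's actual list or code:

```python
# A toy illustration of collocation density: good collocations found in the
# essay divided by total word count. The small collocation set is invented
# for the example; the real e-rater uses a large reference list.
GOOD_COLLOCATIONS = {"make a decision", "take a risk", "heavy traffic"}

def collocation_density(essay: str) -> float:
    words = essay.lower().split()
    if not words:
        return 0.0
    hits = 0
    # Slide two- and three-word windows over the essay and count matches.
    for size in (2, 3):
        for i in range(len(words) - size + 1):
            if " ".join(words[i:i + size]) in GOOD_COLLOCATIONS:
                hits += 1
    return hits / len(words)

essay = "Students who take a risk must first make a decision carefully"
print(collocation_density(essay))
```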
I guess there is a limit, however. A giant list of possible collocations can be found here. A more learner-friendly list can be found here.

Secondly, students are rewarded for preposition use. This is described as "the mean probability of the writer's prepositions" (source), but I am not quite sure what this means.

Topic-Specific Vocabulary (perhaps 3% of score)

This one is new. The vocabulary in the student's essay is compared to vocabulary used in high-scoring essays based on the same prompt (source). Obviously it is hard to prepare students for this, but if they are aware that it is a factor, they can be encouraged to use advanced words that are more closely related to the general theme of the given prompt. For example, if the prompt is related to university life I encourage them to use some advanced words related specifically to attending university.

Grammar (7% of score)

The grammar macrofeature is broken down into nine microfeatures. All of these features are weighted equally to produce the grammar score. Microfeature penalties are determined by dividing the number of related errors by the total number of words in the entire essay. The microfeatures are listed below. The number in parentheses is the percentage of students who received NO PENALTY during a study of ~95,000 TOEFL independent essays graded by the e-rater. A lower number here indicates a potential area of concern for students studying for the TOEFL. The source of all of these, and the rest of the macrofeatures on this page, is this article. Further descriptions of each microfeature can be found in this article.

- Sentence fragments (79.3)
- Run-on sentences (73.2)
- Garbled sentence: five or more errors (89.3)
- Subject-verb agreement (48.8)
- Ill-formed verb: the wrong verb tense for the given situation (61.3)
- Pronoun error (97)
- Possessive error: missing apostrophe (85.6)
- Wrong or missing word (95.4)
- "Proofread this!": errors that cannot be analyzed (76.6)

The biggest area of concern (verbs) should come as no surprise to teachers.

Usage (8% of score)

Usage works the same way. There are nine microfeatures that have equal weight. The number in parentheses is the percentage of students who received no penalty during the study period.

- Determiner-noun agreement: singular determiner with a plural noun and vice versa; also a/an errors (63)
- Article errors: wrong, missing and extraneous (9.5)
- Homophone errors (59.9)
- Verbs used as nouns (94.1)
- Faulty comparisons: errors with "more" and "most" (96.1)
- Preposition errors: missing, incorrect and extraneous (61.8)
- Nonstandard word usage: gonna, kinda, wanna (99.3)
- Double negatives (99.6)
- Wrong parts of speech (97.7)

It is no surprise that article errors (including determiner-noun agreement) are a problem for students. Likewise it is no surprise that prepositions are a problem area. I don't actually believe that homophone errors are such a problem, but that is what the study reported. Perhaps this is a weakness of the e-rater.

Mechanics (10% of score)

This works the same as the above categories. The microfeatures are:

- Spelling errors (2.6)
- Capitalization of proper nouns (79.7)
- Capitalization of the first word in a sentence (77.7)
- Missing question marks (95.8)
- Missing periods (86)
- Missing apostrophes (96.4)
- Missing commas (40.5)
- Missing hyphens, including in number constructions (96.1)
- Fused words: missing space between words (97.7)
- Compound word errors: two words that should be one (63)
- Duplicates: accidentally repeating words in a row (91.5)
- Extraneous commas (69)

Again, the problem areas for students are not too shocking. Spelling mistakes are a problem for almost everyone. Commas (both types of errors) are also hard. It is somewhat surprising that students are throwing away a lot of points on things like capitalization errors and missing periods.
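The penalty arithmetic used in the grammar, usage and mechanics categories above can be sketched in a few lines. This only illustrates the described formula (error count divided by total word count, with the microfeatures averaged at equal weight); the function names and error counts are invented, and this is not ETS's actual code:

```python
# Sketch of the per-microfeature penalty: errors of each type divided by
# the essay's total word count, with microfeatures weighted equally.
# The error counts below are invented for illustration.
def microfeature_penalties(error_counts: dict, total_words: int) -> dict:
    return {name: count / total_words for name, count in error_counts.items()}

def category_penalty(penalties: dict) -> float:
    # Equal weighting across the category's microfeatures, per the article.
    return sum(penalties.values()) / len(penalties)

errors = {"subject-verb agreement": 3, "run-on sentences": 1, "ill-formed verb": 2}
penalties = microfeature_penalties(errors, total_words=400)
print(penalties)
print(category_penalty(penalties))
```

Note that under this scheme a longer essay dilutes a fixed number of errors, which is exactly the trade-off discussed in the organization section.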
At first glance one would think that ETS is being generous by assigning ten percent of the total score to mechanics, but it isn't such a giveaway after all. I strongly encourage my students to proofread their essays for a minute or two.

Style (3% of score)

- Repetition of words (22.3)
- Inappropriate words, including expletives (99.8)
- Too many sentences beginning with a coordinate conjunction; "too many" is not defined (96.2)
- Too many short sentences: more than four sentences with fewer than 7 words (94.5)
- Too many long sentences: more than four sentences with more than 55 words (88.4)
- The use of "by" passives, defined as "sentences containing BE + past participle verb form, followed somewhere later in the sentence by the word by" (82.4)

The fact that repetition of words is the biggest problem really proves that vocabulary is critically important to a good TOEFL score. The value here might amount to 0.5 points (out of 30), while the lexical complexity categories make up another 2.4 points. Topic-specific vocabulary probably comes out to another 1.1 points. These categories can only be satisfied by using a wide range of words. Teach your students to vary their vocabulary as much as possible.

Does this mean anything?

Maybe. Here are a few things that inform my teaching of TOEFL writing:

- An essay with perfect organization and development can score 18 points, even if its grammar is abysmal. Indeed, I very rarely see score reports with a writing score of less than 17 points. Even the absolute lowest-level students can score that much. This is important to keep in mind when students have overall target scores (all sections) in the 70s or 80s.
- The various features related to vocabulary come out to about four or five points. As I said above, I always emphasize range of vocabulary when teaching TOEFL writing.
- I teach my students to proofread. They can make up for mistakes related to more difficult aspects of writing by fixing up easy punctuation and spelling errors.
- It is now possible to know which kinds of grammar mistakes students usually make on the test, although none of these should be surprising.
- I always emphasize to my students that all mistakes are equal. The sloppy punctuation errors they make hurt just as much as the perplexing verb errors.
- A longer essay can result in a higher score by diluting the mistakes, but it could obviously lead to more mistakes. It is necessary to work with individual students to discover the best length for them.
- Teachers have long held that the e-rater rewards the use of transitional adverbs. I don't know where they fit into the above categories, but I will continue to emphasize their use.

Beating the e-rater

Is it possible to beat the e-rater? Yes, of course. This was discussed in the New York Times some years ago. However, anything off-topic will be flagged by the human rater, so the techniques described by the researcher won't all work. Moreover, it is likely that only a student with advanced English can actually beat the e-rater. As the article says:

"E.T.S. officials say that [the researcher's] test prep advice is too complex for most students to absorb; if they can, they're using the higher level of thinking the test seeks to reward anyway. In other words, if they're smart enough to master such sophisticated test prep, they deserve a [high score]."

Video Version
