Assessing speaking: an ‘inexact science’

What’s the most common collocation: do you speak/understand/read/write English?

That’s right, people tend to focus on speaking when asking this general question about language proficiency. Think, too, of one of your classes and how students judge one another’s level: you can bet your bottom dollar that, as well as comparing exam results, they will go on how well their classmates seem to speak. This blog post, the third in our series on language assessment literacy, will focus on assessing speaking.

Practical considerations

Despite the obvious importance of speaking, it’s often the skill which is least assessed. The Spanish university entrance exam (PAU / selectividad) has no place for speaking, for example (or listening for that matter, with the honourable exceptions of Cataluña and Galicia). Both time and logistical issues are factors here: a speaking test is often administered individually or in pairs, whereas the other three skills can be tested with the whole class at the same time. As a result, speaking is often left off achievement tests in schools as well. However, we are doing our learners a disservice if we do not assess them on the skill which is arguably most closely associated with being proficient in a language.

What makes a good speaker at a particular level?

This is a central question to answer if we want to come up with a speaking assessment, and there are numerous factors we might take into account. Grammar and vocabulary, the ‘enabling skills’ or ‘building blocks’, are necessary to get a message across, and we can assess both the range and accuracy that learners possess in these areas. Clear pronunciation is key to being intelligible, and fluency (which among other factors can have to do with speed and hesitation) is important for effective delivery. Interactive skills are taken into account (can a candidate initiate discourse, take turns or move the discussion along?), as is appropriacy – following rules and conventions, adapting your language to different contexts and using the right register. We’ll also need to apply a relative weighting to all these factors (e.g. what’s more important: accuracy or fluency?).
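To make the idea of relative weighting concrete, here is a minimal sketch in Python. The criterion names, the weights and the band scale are all assumptions made up for illustration, not values taken from any exam board; the point is only that each criterion's band score is multiplied by its weight before the scores are combined.

```python
# Hypothetical criteria and weights, for illustration only.
# A real rubric would set these to reflect the test's purpose.
WEIGHTS = {
    "grammar_vocabulary": 0.25,
    "pronunciation": 0.20,
    "fluency": 0.20,
    "interaction": 0.20,
    "appropriacy": 0.15,
}


def weighted_score(band_scores, weights=WEIGHTS):
    """Combine per-criterion band scores into one weighted average."""
    if set(band_scores) != set(weights):
        raise ValueError("band_scores must cover exactly the weighted criteria")
    total_weight = sum(weights.values())
    # Dividing by the total weight keeps the result on the same band scale
    # even if the weights do not add up to exactly 1.
    return sum(band_scores[c] * w for c, w in weights.items()) / total_weight


# Example: a candidate who is stronger on grammar and vocabulary
# than on the other criteria (bands are assumed to run from 0 to 5).
candidate = {
    "grammar_vocabulary": 5,
    "pronunciation": 3,
    "fluency": 3,
    "interaction": 3,
    "appropriacy": 3,
}
overall = weighted_score(candidate)
```

Raising the weight on fluency relative to grammar and vocabulary, say, would reward a hesitant but accurate speaker less, which is exactly the kind of decision the rubric designer has to make.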

Marking concerns

Once we’ve decided on the criteria and created a rubric for what constitutes adequate performance at a particular level, we’ll need standardisation training for assessors to reduce differences in how these criteria are applied. This concerns both intra-rater reliability (is an assessor treating all candidates the same?) and inter-rater reliability (are different assessors applying the criteria in the same way?). Given everything we need to take into account and the potential pitfalls, it’s unsurprising that Brown and Abeywickrama (2010) have referred to assessing speaking as an ‘inexact science’. One way of making it more reliable in the high-stakes arena is to record the exam and have it sent off to be marked, as this reduces the chance of bias and allows for a second listen. Recording students in class also makes sense: it’s a great way to demonstrate progress.
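One way to put a number on inter-rater reliability after a standardisation session is to have two assessors mark the same recordings and compare their bands. The sketch below, in Python, computes simple percentage agreement and Cohen's kappa, which corrects that agreement for chance. The function names and the sample marks are invented for illustration, and unweighted kappa is only one possible choice: it treats every disagreement as equally serious, so for ordered bands a weighted variant may be more appropriate.

```python
from collections import Counter


def percent_agreement(rater_a, rater_b):
    """Share of candidates to whom both raters gave the same band."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must mark the same non-empty set of candidates")
    matches = sum(a == b for a, b in zip(rater_a, rater_b))
    return matches / len(rater_a)


def cohens_kappa(rater_a, rater_b):
    """Unweighted Cohen's kappa: agreement corrected for chance."""
    observed = percent_agreement(rater_a, rater_b)
    n = len(rater_a)
    counts_a = Counter(rater_a)
    counts_b = Counter(rater_b)
    # Chance agreement: probability that both raters pick the same band
    # if each simply followed their own overall distribution of bands.
    expected = sum(counts_a[band] * counts_b[band] for band in counts_a) / (n * n)
    if expected == 1.0:
        # Both raters used a single identical band throughout; kappa is
        # undefined here, so 1.0 is returned as a convention (an assumption).
        return 1.0
    return (observed - expected) / (1 - expected)


# Made-up bands given by two assessors to the same eight recordings.
assessor_1 = [3, 4, 4, 2, 5, 3, 4, 3]
assessor_2 = [3, 4, 3, 2, 5, 3, 4, 4]
agreement = percent_agreement(assessor_1, assessor_2)
kappa = cohens_kappa(assessor_1, assessor_2)
```

The same comparison can be run on one assessor's marks for the same recordings on two different occasions to get a rough picture of intra-rater consistency.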

What makes a good test?

Let’s consider further what we mean by speaking. When we open our mouths to say something there’s (usually!) a reason for it – it might be to explain or to complain, to ask for or offer help, to agree, disagree or suggest. So another way to look at testing someone’s speaking ability is to test their ability to perform these real-life language functions involved in communication – after all, we’re after a communicative test.

Task types

A number of task types will be very familiar to most teachers. Picture-cued tasks can be used to have learners describe, compare or hypothesise, and such tasks can be found in both the PTE General suite of exams and the Cambridge main suite. At lower levels we may often find an information gap activity, where two students (or a student and an examiner) are in possession of different pieces of information and must complete their notes by asking questions and giving explanations. Oral interviews with one or two candidates will often include personal information questions (Where are you from? Why are you learning English?) to get a general sense of a student’s discourse competence. Such questions are also often used as a follow-up to another task (for example, development questions on the theme of a picture-cued task).

Other task types which teachers often use in class but which are less common in exams include role plays and discussions. Role plays are popular in communicative language teaching as they test whether a student can accomplish a real-life task (such as getting a refund in a shop or arranging to meet up). Although the interaction can be guided by the interviewer, the student still has room to be creative in how they accomplish the task. Discussions provide the chance for real authenticity and spontaneity in a speaking assessment. We can also specify that a candidate takes an opposing view to the interlocutor and has to argue their case, which is a real-life communicative skill. Both these tasks are present on the PTE General exam.

Paired or individual format?

Both test formats are common, with test providers like Pearson and Trinity opting for an individual interview and Cambridge using a paired format. Proponents of the paired format may point to interaction between ‘equals’ or more relaxed candidates, though the dominance, hesitance or level of a partner can also impact how relaxed a candidate feels. Reliability can be a concern in the paired format, with Isaacs (2016) noting that interlocutor variables (i.e. test-taker characteristics) on the paired interaction task ‘could affect the quality of the interaction that is generated and, consequently, test takers’ individual and collective performance and scoring outcomes.’ Both test formats have advantages and disadvantages.

Conclusion: teaching to the test and washback

Many teachers involved in preparing students for language proficiency exams complain of having to teach to the test – perhaps spending time on developing strategies and tricks at the expense of teaching language. This is an example of negative washback: test preparation having a negative effect on teaching and learning. Of course, a test should replicate the work done in class and vice versa, so the more real-life, communicative tasks we can get in our test, the better, as this will encourage good practice in the classroom – positive washback. This is something to think about when making a decision about the type of speaking test or test tasks we want our learners to do.  

To read the first post in our series on assessment literacy, click here

To read the second post in our series on assessment literacy, click here


Brown and Abeywickrama (2010). Language Assessment: Principles and Classroom Practices (2nd edition). Pearson Longman.

Isaacs (2016). Handbook of Second Language Assessment, chapter 9. De Gruyter Mouton.

