ASSESSMENT TOOLS FOR MEASURING ENGLISH  LANGUAGE SKILLS IN SECONDARY SCHOOLS

Nozima Abduxoliqova Qosimjon qizi

doi:10.71337/inlibrary.uz.pedagogs.133003

Mualliflar

Nozima Abduxoliqova Qosimjon qizi

DOI:

https://doi.org/10.71337/inlibrary.uz.pedagogs.133003

Kalit so‘zlar:

Keywords: English language assessment secondary schools language testing validity and reliability formative assessment washback listening assessment reading assessment vocabulary testing speaking assessment writing assessment

Annotasiya

Abstract This article explores assessment tools for measuring English language skills in secondary schools, emphasizing their purposes, theoretical foundations, and practical applications. It discusses the historical evolution of assessment from structuralist and discrete-point testing to communicative and performance-based approaches. Key considerations of validity and reliability are highlighted, with a focus on ensuring fair and accurate interpretations of test scores. The article examines assessment methods for listening, reading, vocabulary, speaking, and writing, using established research to explain appropriate task types and scoring approaches. Additionally, the integration of assessment with instruction is discussed, highlighting the benefits of formative and diagnostic approaches, positive washback, and alternative assessment methods such as portfolios. Recommendations include adopting balanced frameworks, ensuring construct validity, providing rater training, integrating assessment with instruction, and using alternative tools to support learner growth.

“PEDAGOGS”

international research journal ISSN:

2181-3027

_SJIF:

5.449

https://scientific-jl.com/ped

Volume-87, Issue-1, August -2025

30

ASSESSMENT TOOLS FOR MEASURING ENGLISH

LANGUAGE SKILLS IN SECONDARY SCHOOLS

Nozima Abduxoliqova Qosimjon qizi

Webster University in Tashkent

Abstract

This article explores assessment tools for measuring English language

skills in secondary schools, emphasizing their purposes, theoretical foundations, and
practical applications. It discusses the historical evolution of assessment from
structuralist and discrete-point testing to communicative and performance-based
approaches. Key considerations of validity and reliability are highlighted, with a focus
on ensuring fair and accurate interpretations of test scores. The article examines
assessment methods for listening, reading, vocabulary, speaking, and writing, using
established research to explain appropriate task types and scoring approaches.
Additionally, the integration of assessment with instruction is discussed, highlighting
the benefits of formative and diagnostic approaches, positive washback, and alternative
assessment methods such as portfolios. Recommendations include adopting balanced
frameworks, ensuring construct validity, providing rater training, integrating
assessment with instruction, and using alternative tools to support learner growth.

Keywords:

English language assessment; secondary schools; language testing;

validity and reliability; formative assessment; washback; listening assessment; reading
assessment; vocabulary testing; speaking assessment; writing assessment

Introduction Assessment plays a central role in English language education

because it provides evidence about learners’ abilities and progress while guiding
teachers’ instructional decisions. In secondary schools, where students are expected to
achieve proficiency for both academic and real-world communication, assessment is
not simply a process of giving tests but a systematic approach to evaluating
performance and informing learning. Modern educational practice differentiates
between

testing

and

assessment

. Testing is typically summative, conducted at fixed

times using standardized instruments to measure achievement against predefined
criteria. Assessment, however, is broader and includes formative approaches that focus
on continuous feedback and instructional improvement (Tsagari & Banerjee, 2014).
Historically, assessment in language education has evolved from early forms of
examinations focused on discrete grammar and vocabulary points to communicative
and performance-based assessments designed to reflect real language use. The
purposes of assessment have also diversified. They now include achievement testing,
which measures learning outcomes of a particular curriculum; proficiency testing,
which evaluates general language competence independent of specific courses;

“PEDAGOGS”

international research journal ISSN:

2181-3027

_SJIF:

5.449

https://scientific-jl.com/ped

Volume-87, Issue-1, August -2025

31

aptitude testing, which predicts language learning potential; and diagnostic assessment,
which identifies learner strengths and weaknesses (Hamp-Lyons, 2016). These shifts
demonstrate that assessment is not only about ranking students but also about
supporting learning and ensuring fairness in decision-making.In the context of
secondary school English language education, assessment must balance different
objectives. It must be valid, meaning that it accurately measures what it intends to
measure, and reliable, ensuring consistent results across contexts and raters (Chapelle,
2021). Additionally, assessment must have positive educational impact or

washback

so

that testing encourages effective teaching and learning rather than restricting
instruction to test preparation (Tsagari & Banerjee, 2014). For English, where skills in
listening, reading, writing, speaking, and vocabulary are interdependent, assessment
tools need to be both skill-specific and integrated, providing a comprehensive picture
of learner ability.

Pedagogical implications

Validity and reliability are fundamental concepts in

language assessment because they determine whether test results can be interpreted
accurately and used for fair decision-making. Validity addresses the question of
whether an assessment measures what it claims to measure, while reliability concerns
the consistency and stability of test scores across different administrations, raters, or
tasks.Modern perspectives view validity as a unified concept. Messick’s framework,
widely adopted in educational measurement, defines validity as “an overall evaluative
judgment of the degree to which empirical evidence and theoretical rationales support
the adequacy and appropriateness of interpretations and actions based on test scores”
(as cited in Chapelle, 2021, p. 12). This perspective moves beyond older models that
separated content, criterion, and construct validity into distinct types, instead
emphasizing the need for a holistic evaluation of score interpretations and test
uses.Construct validity is central to language assessment because it ensures that tests
reflect the complex nature of language proficiency. For example, a speaking test must
elicit authentic oral interaction and be scored using criteria that reflect not only
pronunciation and grammar but also pragmatic and strategic competence (O’Sullivan,
2012). Similarly, a writing test must capture organizational, rhetorical, and linguistic
elements rather than simply checking for grammar errors (Weigle, 2012).Reliability is
equally important. Inconsistent scoring or task design undermines the fairness of
assessment decisions. Brown (2004) notes that “multiple measures will always give
you a more reliable and valid assessment than a single measure” (p. 117), emphasizing
that assessments should be triangulated through multiple tasks and contexts to
minimize the effects of temporary performance factors such as anxiety or fatigue.
Moreover, validity and reliability considerations have ethical implications. Chapelle
(2021) highlights that “validation needs to take into account issues of relevance and
utility, value implications, and the social consequences of testing” (p. 13). In secondary

“PEDAGOGS”

international research journal ISSN:

2181-3027

_SJIF:

5.449

https://scientific-jl.com/ped

Volume-87, Issue-1, August -2025

32

schools, this means that assessments should not create unnecessary barriers for learners
and should be aligned with educational goals, supporting both achievement
measurement and language learning.

Listening

is a receptive skill that is challenging to assess because its processes

are internal and not directly observable. As Brown (2004) explains, “you cannot
observe the actual act of listening or reading, nor can you see or hear an ‘actual
product’… all assessment of receptive performance must be made by inference” (p.
118). This means that listening assessments often rely on spoken or written responses
to stimuli, such as multiple-choice items, note-taking tasks, or summarization
exercises, to infer comprehension ability.Historically, listening assessment evolved
through three main approaches: discrete-point, integrative, and communicative. The
discrete-point approach, influenced by structuralist linguistics and behaviorist
psychology, emphasized isolated elements of language such as phoneme recognition
or minimal pair discrimination. Buck (2001) notes that “the basic idea of the discrete-
point approach is that it is possible to identify and isolate the separate bits, or elements,
of language – the units of linguistic knowledge – and test each one of these separately”
(p. 61).Later, integrative approaches emerged, using tasks like cloze tests and dictation
to assess comprehension of connected speech. These were intended to capture overall
processing ability rather than individual linguistic points. However, they still did not
fully replicate real-world listening contexts. Communicative approaches, by contrast,
attempt to assess listening as it is used in authentic communication, incorporating
realistic tasks such as information gap activities, classroom instructions, and
interaction-based comprehension checks (Buck, 2001).An important consideration is
that listening tasks must be aligned with learners’ expected language use. For example,
secondary school learners often need to comprehend classroom instructions, short
lectures, and conversational exchanges. Brown (2004) emphasizes that listening
assessments should be “embedded within classroom activities and linked to realistic
language tasks” (p. 119).Task design also affects fairness and reliability. Since
listening performance can be influenced by anxiety, unfamiliar accents, or audio
quality, multiple tasks and repeated measures are recommended. Buck (2009) points
out the “challenges and constraints in language test development,” noting that “context,
delivery, and task design can have profound effects on test performance” (p. 170).
Therefore, assessment tools for secondary schools should balance authenticity with
practicality, ensuring that tests are accessible and yield interpretable results.

Reading

is one of the most important academic skills for secondary school

learners because it enables access to knowledge in all subject areas. As Brown (2004)
notes, “reading, arguably the most essential skill for success in all educational contexts,
remains a skill of paramount importance as we create assessments of general language
ability” (p. 185).Similar to listening, reading is an internal process and cannot be

“PEDAGOGS”

international research journal ISSN:

2181-3027

_SJIF:

5.449

https://scientific-jl.com/ped

Volume-87, Issue-1, August -2025

33

observed directly. Therefore, assessment relies on observable responses, such as
answering comprehension questions, summarizing texts, or performing information-
transfer tasks (Brown, 2004). The complexity of reading lies in its multiple
components: decoding, vocabulary knowledge, syntactic processing, and
comprehension strategies. Assessments must therefore consider

both

bottom-up

processes (recognition of words and structures) and

top-down

processes (use of background knowledge and inferencing).Different genres and
purposes also influence reading assessment. Brown (2004) categorizes reading into

academic

,

job

-

related

, and

personal

genres, emphasizing that “the genre of a text

enables readers to apply certain schemata that will assist them in extracting appropriate
meaning” (p. 187). For example, reading comprehension in a science class might
involve interpreting charts and procedural texts, whereas personal reading may focus
on narrative understanding.Effective reading assessments often focus on both

microskills

and

macroskills

. Microskills include recognizing word forms,

understanding grammatical relationships, and interpreting cohesive devices, while
macroskills involve identifying main ideas, distinguishing literal and implied
meanings, and applying strategies like skimming and scanning (Brown, 2004, pp. 188–
189). These skills can be tested through varied tasks such as multiple-choice questions,
short-answer tasks, and authentic reading projects.Reading assessment also intersects
with vocabulary knowledge. Without sufficient vocabulary, comprehension is
impaired. As Read (2012) states, “vocabulary knowledge is a core component of
competence in a second language… and conscious study and memorizing of words is
an indispensable means of building the vocabulary knowledge learners need” (p. 258).
For this reason, reading tests often incorporate vocabulary-focused tasks or require
inferencing word meaning from context.In secondary schools, reading assessment
should therefore move beyond simple recall questions and include tasks that assess
strategic and inferential processing. Such assessments help teachers identify whether
learners struggle due to limited vocabulary, unfamiliar text structures, or inadequate
reading strategies, enabling more targeted instruction.

Vocabulary

knowledge is essential for all language skills because it directly

influences listening comprehension, reading fluency, writing complexity, and speaking
accuracy. Read (2012) notes that “vocabulary knowledge is a core component of
competence in a second language” (p. 258), highlighting its centrality in both academic
and communicative contexts. For secondary school learners, an adequate vocabulary
base is particularly important because it supports both subject learning and everyday
communication.Assessing vocabulary involves complex considerations about what
counts as a “word” and how knowledge should be measured. Traditional tests often
focus on isolated word meanings, but modern approaches view vocabulary as including

word families, collocations, multiword units, and idiomatic expressions

(Read, 2012).

“PEDAGOGS”

international research journal ISSN:

2181-3027

_SJIF:

5.449

https://scientific-jl.com/ped

Volume-87, Issue-1, August -2025

34

As Read explains, “it is important to keep in mind this wider perspective on vocabulary
as comprising multiword units as well as individual word forms” (p. 257).Vocabulary
assessment can measure both

breadth

(how many words a learner knows) and

depth

(how well the learner knows those words, including connotations, collocations, and
register). Breadth is often tested using quick recognition tasks or multiple-choice items,
while depth may be assessed through productive tasks, such as providing definitions,
using words in sentences, or identifying collocations.The importance of vocabulary
assessment has shifted over time. During the height of the communicative approach in
the 1980s, explicit vocabulary testing was often dismissed because “it was argued that
a test of learners’ knowledge of individual words was not very informative because
what really counted was the ability to process words rapidly as an integral part of
carrying out authentic comprehension or production activities” (Read, 2012, p. 258).
However, there has been a “decisive comeback” of vocabulary testing since the 1990s
because students and teachers recognize that “conscious study and memorizing of
words is an indispensable means of building the vocabulary knowledge they need”
(Read, 2012, p. 259).

In secondary schools, vocabulary assessment should be integrated into reading

and writing tasks but can also include standalone tests for diagnostic purposes. For
example, teachers may use vocabulary size tests to identify whether a learner knows
high-frequency word families essential for academic success. Similarly, assessing
learners’ ability to use collocations and idiomatic expressions can provide insight into
their readiness for advanced language tasks, such as essay writing or oral presentations.

Speaking

assessment is widely considered one of the most challenging aspects of

language testing because oral communication involves real-time processing,
interaction, and performance factors that can affect reliability. O’Sullivan (2012) notes
that “tests of spoken language ability are the most difficult to develop and administer”
due to issues related to task design, interlocutor effects, and scoring consistency (p.
234).A key challenge is defining the construct of speaking. Spoken competence
involves not only pronunciation, vocabulary, and grammar but also pragmatic abilities
and interactional strategies. As O’Sullivan explains, “the rating scale should be
explicitly linked to what we are trying to say about the test taker – it should link our
definition of the construct and the test task” (2012, p. 236). Therefore, speaking
assessment must specify whether it focuses on transactional communication (e.g.,
giving instructions), interactional ability (e.g., holding a conversation), or academic
discourse (e.g., presenting an argument).

Speaking tests often use one of three formats:
1.

One-to-one interviews

– traditional and widely used in schools.

2.

Interactive pair/group tasks

– focusing on collaborative communication.

“PEDAGOGS”

international research journal ISSN:

2181-3027

_SJIF:

5.449

https://scientific-jl.com/ped

Volume-87, Issue-1, August -2025

35

3.

Integrated speaking tasks

– combining listening and speaking (e.g.,

summarizing audio input).

Task design significantly affects performance. O’Sullivan (2012) highlights

research showing that planning time, topic familiarity, and interlocutor characteristics
can influence how students perform: “if we add [planning time], performance
improves; remove it or reduce it, and performance worsens” (p. 235). This means that
reliability can be improved by carefully standardizing task conditions and training
examiners to reduce variability in scoring.Scoring speaking performance is often done
using analytic rubrics that separately evaluate fluency, pronunciation, grammar,
vocabulary, and interactional competence. Rater training is critical because, as
O’Sullivan (2012) points out, “the rater will (we hope) be mostly influenced by the test
taker’s performance… however, he/she may also be influenced by his/her affective
reaction to the task, to the rating scale, and to the test taker” (p. 235).In secondary
schools, speaking assessment should not only measure accuracy but also encourage
meaningful use of English. Tasks such as role plays, interviews, debates, and
presentations are effective because they replicate real-world communication.
Moreover, speaking tests should be used for both formative and summative purposes,
providing feedback that helps students improve their interactive competence while also
producing reliable scores for achievement reporting.

Writing

is a productive skill that reveals a learner’s ability to organize ideas, use

appropriate vocabulary and grammar, and adapt language to specific purposes and
audiences. Hughes (2002) argues that “the best way to test people’s writing ability is
to get them to write… even professional testing institutions are unable to construct
indirect tests that measure writing ability accurately” (p. 83). This supports the
preference for

direct assessment

, in which students produce texts under controlled

conditions.Defining the construct of writing ability is a key step in assessment. Weigle
(2012) emphasizes that writing assessment must address whether the focus is on
language accuracy or on broader rhetorical and organizational skills. She notes that
“language proficiency and writing ability are highly interrelated and often
inseparable… but frequently a student shows a fluent command of the second
language… without being able to organize their writing or address a writing task
adequately” (p. 218). This means that assessment tasks must reflect both micro-level
(grammar and vocabulary) and macro-level (content development and coherence)
abilities.Writing tasks vary in complexity and format depending on their purpose.
Classroom-based assessments often include

short responses, summaries, and essays

,

whereas high-stakes tests may include

academic writing tasks

such as argumentative

or analytical essays. Hughes (2002) stresses that test tasks should be “properly
representative of the population of tasks that we should expect the students to be able
to perform” (p. 83). For example, secondary school writing assessment should reflect

“PEDAGOGS”

international research journal ISSN:

2181-3027

_SJIF:

5.449

https://scientific-jl.com/ped

Volume-87, Issue-1, August -2025

36

typical academic needs, including writing reports, narratives, and responses to
literature.Scoring writing tasks is often the most challenging aspect. Holistic scoring
provides a single score based on overall quality, while analytic scoring separates
aspects such as grammar, vocabulary, organization, and content. Weigle (2012)
highlights that scoring reliability requires clear rubrics and rater training: “Even very
experienced teachers often have questions about assessing their students’ writing…
these questions include… how accurately a writing test really represents how well my
students can write” (p. 218).Importantly, writing assessment should be authentic and
constructive. Timed writing tests, though common, may not fully reflect students’
ability because “most real-world writing is not done under timed conditions” (Weigle,
2012, p. 220). Therefore, secondary schools are encouraged to include

process-based

writing assessment

, such as portfolios, which capture students’ development over time

and reduce the pressure of single high-stakes performances.

Conclusion and Recommendations Assessment of English language skills in

secondary schools is a complex but essential process that supports both teaching and
learning. Modern perspectives emphasize that assessment should be

valid

,

reliable

,

and

educationally beneficial

. As Chapelle (2021) highlights, “validation seeks evidence for

the construct meaning of the test score… and must also take into account issues of
relevance and utility, value implications, and the social consequences of testing” (p.
13). This holistic view ensures that assessment tools not only measure language
proficiency accurately but also promote fairness and positive educational impact.Each
language skill requires specific assessment approaches: listening tests must infer
comprehension through observable responses (Brown, 2004), reading tests must
integrate micro- and macro-skills along with vocabulary knowledge (Read, 2012),
vocabulary tests must measure both breadth and depth of knowledge (Read, 2012),
speaking tests must handle interlocutor and rater variability (O’Sullivan, 2012), and
writing tests must capture both linguistic accuracy and rhetorical organization (Weigle,
2012; Hughes, 2002). These assessments need to be supported by

clear scoring rubrics

and

rater training

to ensure reliability and fairness.Importantly, assessment must not

exist in isolation from instruction. As Tsagari and Banerjee (2014) emphasize,
integrating assessment with classroom teaching “enhanc[es] student involvement,
incorporat[es] special language and other needs, and improv[es] teacher literacy in
assessment, as ways of improving good practice in the field” (p. 340). Approaches such
as formative assessment, diagnostic feedback, and alternative assessment tools (e.g.,
portfolios and project-based tasks) encourage positive washback and learner
motivation (Hamp-Lyons, 2016).

“PEDAGOGS”

international research journal ISSN:

2181-3027

_SJIF:

5.449

https://scientific-jl.com/ped

Volume-87, Issue-1, August -2025

37

Recommendations:

1.

Adopt a balanced assessment framework combining summative and formative

tools.
2.

Ensure construct validity by designing tasks that reflect real-world language use.

3.

Provide rater training and clear rubrics

to improve scoring reliability, especially

for speaking and writing tasks.
4.

Integrate assessment with instruction

to

promote continuous learning and

positive washback.
5.

Use alternative assessment methods (e.g., portfolios, peer feedback) to capture

long-term development and encourage learner autonomy.
By applying these principles, secondary schools can create assessment systems that not
only measure English proficiency effectively but also support student growth, teacher
development, and curriculum goals.
References

Brown, H. D. (2004).

Language assessment: Principles and classroom practices

.

Longman.
Buck,

G.

(2001).

Assessing

listening

.

Cambridge

University

Press.

https://doi.org/10.1017/CBO9780511732959
Buck, G. (2009). Challenges and constraints in language test development. In J. C.
Alderson (Ed.),

The politics of language education: Individuals and institutions

(pp.

166–184). Multilingual Matters.
Chapelle, C. A. (2021). Validity in language assessments. In S. M. Gass & A. Mackey
(Eds.),

The Routledge handbook of second language acquisition and language testing

(pp. 11–20). Routledge.
Hamp-Lyons, L. (2016). Purposes of assessment. In D. Tsagari & J. Banerjee (Eds.),

Handbook of second language assessment

(pp. 13–27). De Gruyter, Inc.

Hughes, A. (2002). Testing writing. In

Testing for language teachers

(pp. 83–87).

Cambridge University Press.
O’Sullivan, B. (2012). Assessing speaking. In C. Coombe, P. Davidson, B. O’Sullivan,
& S. Stoynoff (Eds.),

The Cambridge guide to second language assessment

(pp. 234–

246). Cambridge University Press.
Read, J. (2012). Assessing vocabulary. In C. Coombe, P. Davidson, B. O’Sullivan, &
S. Stoynoff (Eds.),

The Cambridge guide to second language assessment

(pp. 257–

263). Cambridge University Press.
Tsagari, D., & Banerjee, J. (2014). Language assessment in the educational context. In
M. Bigelow & J. Ennser-Kananen (Eds.),

The Routledge handbook of educational

linguistics

(pp. 339–352). Taylor & Francis.

Weigle, S. C. (2012). Assessing writing. In C. Coombe, P. Davidson, B. O’Sullivan, &
S. Stoynoff (Eds.),

The Cambridge guide to second language assessment

(pp. 218–

224). Cambridge University Press.

Bibliografik manbalar

Adopt a balanced assessment framework combining summative and formative tools.

Ensure construct validity by designing tasks that reflect real-world language use.

Provide rater training and clear rubrics to improve scoring reliability, especially for speaking and writing tasks.

Integrate assessment with instruction to promote continuous learning and positive washback.

Use alternative assessment methods (e.g., portfolios, peer feedback) to capture long-term development and encourage learner autonomy.

By applying these principles, secondary schools can create assessment systems that not only measure English proficiency effectively but also support student growth, teacher development, and curriculum goals.

References

Brown, H. D. (2004). Language assessment: Principles and classroom practices. Longman.

Buck, G. (2001). Assessing listening. Cambridge University Press. https://doi.org/10.1017/CBO9780511732959

Buck, G. (2009). Challenges and constraints in language test development. In J. C. Alderson (Ed.), The politics of language education: Individuals and institutions (pp. 166–184). Multilingual Matters.

Chapelle, C. A. (2021). Validity in language assessments. In S. M. Gass & A. Mackey (Eds.), The Routledge handbook of second language acquisition and language testing (pp. 11–20). Routledge.

Hamp-Lyons, L. (2016). Purposes of assessment. In D. Tsagari & J. Banerjee (Eds.), Handbook of second language assessment (pp. 13–27). De Gruyter, Inc.

Hughes, A. (2002). Testing writing. In Testing for language teachers (pp. 83–87). Cambridge University Press.

O’Sullivan, B. (2012). Assessing speaking. In C. Coombe, P. Davidson, B. O’Sullivan, & S. Stoynoff (Eds.), The Cambridge guide to second language assessment (pp. 234–246). Cambridge University Press.

Read, J. (2012). Assessing vocabulary. In C. Coombe, P. Davidson, B. O’Sullivan, & S. Stoynoff (Eds.), The Cambridge guide to second language assessment (pp. 257–263). Cambridge University Press.

Tsagari, D., & Banerjee, J. (2014). Language assessment in the educational context. In M. Bigelow & J. Ennser-Kananen (Eds.), The Routledge handbook of educational linguistics (pp. 339–352). Taylor & Francis.

Weigle, S. C. (2012). Assessing writing. In C. Coombe, P. Davidson, B. O’Sullivan, & S. Stoynoff (Eds.), The Cambridge guide to second language assessment (pp. 218–224). Cambridge University Press.