Norm-Referenced vs Criterion-Referenced Tests for Speech-Language Evaluations

When we set out to test students and clients to determine whether they have a speech-language impairment, we have many choices in how to conduct the assessment. It is best practice to use a combination of formal measures, informal measures, and dynamic assessment. But which published test tools do we use, if any? And how do we choose between norm-referenced tests and criterion-referenced tests?

When we do a speech-language evaluation, we have a set amount of time to make a decision, and it never seems like quite enough. Often, we use published tools, such as norm-referenced tests or criterion-referenced tests, to help us efficiently collect speech and language information from our clients. Two critically important things to remember in this process are:

No test is perfect.
Assessment tools do not diagnose speech and language disorders–speech-language pathologists do.

That said, we want to use the tools we have in the best way we can. We also want to be mindful of whether our students have cultural or linguistic differences that may affect how we interpret their results on the norm-referenced or criterion-referenced tests we use.

Let’s start with the definitions of norm-referenced and criterion-referenced tests.

What are Norm-Referenced Tests?

Norm-referenced tests are tests that have been developed and administered to a (hopefully) large group of individuals. The performance of each age group is evaluated, and subsequent test takers are compared to their age group (which could span a 3-month interval, a 6-month interval, a year, or another interval). For most language tests, the performance of the people in the standardization sample forms a bell curve, also called a normal curve.

Standard scores tell us how far one falls from the mean

Most people fall in the middle of the curve. Depending on the guidelines you use, scores within +/- 1 standard deviation, +/- 1.5 standard deviations, or +/- 2 standard deviations from the mean are considered average. These guidelines vary by state, city, school district, clinic, insurance company, and so on. Truth be told, we really should use sensitivity and specificity guidelines to make decisions about cut-off points. But that’s a completely separate blog post!
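To make the cut-off and sensitivity/specificity ideas above a little more concrete, here is a minimal Python sketch. It assumes standard scores with a mean of 100 and a standard deviation of 15 (a common convention, but check the manual for the test you actually use), and the scores and diagnoses in `cases` are entirely made up for illustration. The helper names `z_score`, `below_cutoff`, and `sensitivity_specificity` are hypothetical, not part of any published scoring software.

```python
# Hypothetical example, NOT real test data. Standard scores are assumed to
# have a mean of 100 and an SD of 15; check the manual for the actual test.
MEAN, SD = 100.0, 15.0

# (standard score, reference-standard diagnosis) pairs; True = disorder present.
cases = [(68, True), (74, True), (79, True), (83, True), (88, True),
         (80, False), (86, False), (92, False), (97, False),
         (104, False), (110, False)]


def z_score(standard_score: float) -> float:
    """How many SDs the score falls above (+) or below (-) the mean."""
    return (standard_score - MEAN) / SD


def below_cutoff(standard_score: float, cutoff_sd: float) -> bool:
    """True if the score falls more than `cutoff_sd` SDs below the mean."""
    return z_score(standard_score) < -cutoff_sd


def sensitivity_specificity(cases: list[tuple[float, bool]],
                            cutoff_sd: float) -> tuple[float, float]:
    """Sensitivity: share of children WITH a disorder flagged by the cut-off.
    Specificity: share of children WITHOUT a disorder not flagged."""
    tp = sum(1 for s, d in cases if d and below_cutoff(s, cutoff_sd))
    fn = sum(1 for s, d in cases if d and not below_cutoff(s, cutoff_sd))
    tn = sum(1 for s, d in cases if not d and not below_cutoff(s, cutoff_sd))
    fp = sum(1 for s, d in cases if not d and below_cutoff(s, cutoff_sd))
    return tp / (tp + fn), tn / (tn + fp)


for cutoff in (1.0, 1.5, 2.0):
    sens, spec = sensitivity_specificity(cases, cutoff)
    print(f"cut-off -{cutoff} SD (score < {MEAN - cutoff * SD}): "
          f"sensitivity {sens:.2f}, specificity {spec:.2f}")
```

With these invented numbers, the -1 SD cut-off (scores below 85) flags 4 of the 5 children with a disorder (sensitivity 0.80) but also flags one child without a disorder, while the stricter -1.5 and -2 SD cut-offs flag no one without a disorder but catch only 2 and 1 of the 5 children with a disorder. The point is the trade-off, not the specific values.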

Norm-referenced tests allow for comparison to a reference group

After we administer a standardized, norm-referenced test to a child, we score it and obtain a standard score that tells us how the child performed relative to same-age peers in the normative sample. The standard score tells us how far above or below the mean the child performed and which percentile they fell in (i.e., what percentage of the normative sample scored at or below the child’s score).
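When the normative scores really do follow a normal curve, a standard score maps directly onto a percentile rank. The sketch below shows that conversion in Python, assuming a mean of 100 and a standard deviation of 15; `percentile_rank` is a hypothetical helper name, and real tests publish their own conversion tables, which should always take precedence.

```python
from statistics import NormalDist


def percentile_rank(standard_score: float, mean: float = 100.0,
                    sd: float = 15.0) -> float:
    """Percent of a normally distributed reference group scoring at or
    below this score. Assumes the norms follow a normal curve with the
    given mean and SD (an assumption, not a property of every test)."""
    return 100 * NormalDist(mu=mean, sigma=sd).cdf(standard_score)


for score in (70, 85, 87, 100, 115):
    print(score, round(percentile_rank(score), 1))
```

Under these assumptions, a score of 100 sits exactly at the 50th percentile, a score of 85 (1 SD below the mean) lands near the 16th, and a score of 87 lands a little above the 19th.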

Do all tests have a normal curve?

Before we move on to criterion-referenced tests, we should discuss standardized tests whose scores do not follow a normal curve, or bell curve. Sometimes standardization samples yield skewed distributions. Articulation tests, in particular, yield skewed distributions because articulation skills are mastered at a relatively young age. Thus, if we test a group of children aged 3 to 11, the older groups of children will get almost every item correct. That results in a negatively skewed distribution: scores pile up near the maximum, with a long tail toward the low end.

For more information on skewed distributions versus normal distributions, see the blog post:

Articulation testing: Why don’t the percentiles line up with the standard scores the way they are supposed to?
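A small Python sketch can show why percentiles drift away from what a normal curve would predict when the distribution is skewed. The raw scores below are invented for one hypothetical older age group on a hypothetical 40-item articulation test; they are not from any real standardization sample. The sketch compares the actual percentile rank in that sample with the percentile you would get by pretending the same scores formed a normal curve.

```python
from statistics import NormalDist, fmean, pstdev

# Hypothetical raw scores (out of 40) for one older age group: most
# children score at or near the maximum, so the distribution is
# negatively skewed (long tail toward low scores).
sample = [40] * 8 + [39] * 4 + [38] * 2 + [37, 36, 34, 31, 27, 22]


def empirical_percentile(raw: float, scores: list[float]) -> float:
    """Percent of the sample actually scoring at or below `raw`."""
    return 100 * sum(s <= raw for s in scores) / len(scores)


def normal_percentile(raw: float, scores: list[float]) -> float:
    """Percentile implied if the sample were treated as a normal curve
    with the sample's own mean and (population) standard deviation."""
    curve = NormalDist(mu=fmean(scores), sigma=pstdev(scores))
    return 100 * curve.cdf(raw)


for raw in (22, 31, 36, 40):
    print(raw, round(empirical_percentile(raw, sample)),
          round(normal_percentile(raw, sample)))
```

In this made-up sample, a perfect score of 40 is at or above every child in the group (100th percentile), yet the normal-curve model places it only around the 74th percentile, and a raw score of 36 is at the 25th percentile in the sample but around the 42nd under the normal model. That mismatch is the kind of gap the linked post discusses.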

What are Criterion-Referenced Tests?

A criterion-referenced test is a test that has a pre-determined set of criteria against which an individual is measured. When measuring language skills, criterion-referenced tests are organized into narrow age ranges, each listing skills that children typically acquire within that range. The criteria are drawn from research in the field rather than from a single standardization sample.

What is an example of a criterion-referenced test for language?

An example of a criterion-referenced language test is the Rossetti Infant-Toddler Language Scale. The Rossetti is a criterion-referenced tool that evaluates Language Comprehension, Language Expression, and Pragmatics, as well as Gesture and Play, in 3-month intervals containing roughly 3 to 12 skills per category. The results show areas of strength, areas of weakness, and emerging skills.

What are strengths and weaknesses of norm-referenced vs. criterion-referenced tests?

Strengths

Norm-Referenced Tests	Criterion-Referenced Tests
Allow comparison to same-age peers	Draw on a broad range of developmental norms
Are often efficient to administer	Make it easy to see which goals come next
Provide standard scores and percentiles	Give a quick view of strengths and weaknesses
Provide specific administration instructions	Are well suited to tracking an individual’s skill gains
Provide stimulus materials	Allow more flexible administration

Weaknesses

Norm-Referenced Tests	Criterion-Referenced Tests
The normative sample is not all-inclusive	Do not easily allow ranking of individuals
Bias exists in all norm-referenced tests	Bias exists in all criterion-referenced tests
Have rigid administration procedures	Do not yield the standard scores many agencies want
Are not designed to track progress	Are not designed for comparison to peers

Additional tips for making solid diagnostic decisions with published testing tools

Test below the basal and above the ceiling (for standardized, norm-referenced tests).
Test below the child’s age range and continue above it until you reach one full age range of zero scores (for criterion-referenced tests).
Analyze missed items by considering whether influence from cultural or linguistic differences could have impacted the response.
Conduct a dynamic assessment (a short teaching session) on skills the child did not demonstrate. For more information on dynamic assessment, see: DA
Know that there is error in every test you administer. This is why it is so important to use multiple sources of input for your speech-language assessment.
Remember that when we use norm-referenced tools, we always want to consider who the test was normed on. Is the standardization sample representative of the student or client you are testing? This is important because we know that test bias is one of the main reasons that students from diverse cultural and linguistic backgrounds are over-referred for special education services. So we need to be sure that we use tools in ways that do not penalize students from different cultural or linguistic backgrounds.

Why is there error in tests?

Language is a broad construct that is not defined in the same way by each person. Think about it. We are coming up with a single number to represent a very broad construct.

“He got an 87 on the Receptive Language subtest.”

Let’s think about what receptive language encompasses. At the most basic level, we want to know: does the child understand? We can break receptive language down further into language form and language content. And we can break those categories down further into questions like: Is the child able to identify objects? What about categories and word relationships? Are they able to follow directions? And then we can take that many steps further. Can they follow one-step, two-step, and three-step directions? Can they follow instructions that include concepts of inclusion and exclusion, spatial concepts, and quantitative concepts? Does the student understand complex sentences? Can they answer questions about a story? And so on.

So, you can see that, when we are working with a very broad concept like language, it is really impossible to boil it down into a single number without an element of error.

Can I use both a norm-referenced tool and a criterion-referenced tool?

Yes, absolutely. You should use whatever combination of tools and techniques leads you to your most confident diagnostic decision! The choice between a norm-referenced and a criterion-referenced test comes down to thinking about the child and choosing the tool that will best describe her communication. In many instances, we need both styles of evaluation to get the complete picture.

Check out this article by Renaissance.com for more information.