Can We Use English Standardized Tests For Assessing Students From Diverse Backgrounds?

Assessing students from diverse backgrounds is something that every speech-language pathologist needs to be able to do well. There is not a single speech-language pathologist who only tests those whose cultural, linguistic, and educational backgrounds match their own. Not one single one.

Students from diverse backgrounds are over-represented in special education programs

Educational research has indicated for decades that students from diverse backgrounds are overrepresented in special education programs in the United States. According to that research, many factors play into this, including how students from diverse backgrounds are assessed.

“Many school professionals lack the appropriate understanding of testing tools to properly assess and evaluate [students from diverse backgrounds].” (Becker & Deris, 2019)

“When teachers do not know what to do with students due to cultural mismatch… [they] are more likely to refer them for special education.” (Williams, 2008)

“Once referred to an assessment team, the student has a greater than 50% chance of being identified as disabled.” (Becker & Deris, 2019)

Kreskow (2013) found that test bias was among the top reasons that assessing students from diverse backgrounds leads to over-referral for special education services.

The last time I wrote about overrepresentation of students from diverse backgrounds in special education, several people reached out and commented that students often have difficulty getting special education services when they need them. That is another problem (one that impacted many in the State of Texas) that I won’t address in this post, but here is an excellent resource for those who are denied special education services.

In this post I want to focus on: 

  • Bilingual assessment issues and how they contribute to the problem of overrepresentation of students from diverse backgrounds in special education programs
  • Things that evaluators need to be aware of when using assessment tools with students whose backgrounds do not match those of the normative sample
  • Ways that evaluators can use assessment tools wisely and still get good results when assessing students from diverse backgrounds

Considerations in Assessing Students from Diverse Backgrounds

How does one’s ethnicity, race, gender, or language affect test performance? 

Every child we test shows up at the table with different experiences. They have different cultural experiences, different linguistic experiences, and different educational experiences. 

We have a handful of speech-language assessment tools that we use to test students. 

We know that the child’s prior experiences influence the way they will respond to different items. With this in mind, we need to recognize that we cannot just look at a static moment in time and a single tool to get the information that we need.

How can we adjust our testing practices to ensure that we are not over-referring students for special education services?

Go beyond the test to gather information for your speech-language evaluation.

If tests show that a child is below average, we have work to do beyond the test.

We need to gather parent information. We need to gather teacher information. We need to look at informal measures and how those look compared to the formal measures. And then we need to incorporate dynamic assessment. 

Things evaluators need to be aware of when assessing students from diverse backgrounds:

No matter what testing tool we use, we need to look beyond the item score of 1 or 0 and ask:

  • “What were their answers?” 
  • “Could they have gotten that wrong because of their linguistic experiences?” 
  • “Could they have gotten that wrong because of a cultural difference?”

We need to go beyond the expected answers written in the test manual and think about whether the child’s answer is reasonable given their experiences, not our experiences.

Pay Attention to the Purpose of Each Test Item When Making Scoring Decisions

Let me give you an example from a bilingual assessment tool for young children that I used recently. There is an item that asks students to repair semantic absurdities that gives a sentence about a boy eating a car. My student responded, “There’s no way a boy could eat a car. Now, Iron Man, he could eat a car.” 

Now, it is not at all surprising that the test developers did not anticipate that response. It was not in the list of acceptable responses. In the manual, a correct response would be to change “car” to something that could be eaten by a boy. So, do we count this as correct or incorrect? 

Let’s think about the purpose of the item. What is this item intended to measure? It is intended to measure an understanding of words and how they can go together. It is intended to tell if a child can identify something that is absurd or unreasonable in the words we present to them.

Could my student do that? Yes! Does Iron Man actually eat cars? I have no idea but I don’t think that really matters. My student understood that boys don’t eat cars.

Tools are designed to help us make diagnostic decisions but they do not make diagnostic decisions for us.

Don’t Use Scores When Using English Tests for Assessing Bilingual Students

In many cases, when we use a test that was not designed for students with the cultural and linguistic background of the student we are testing, we cannot use the scores. This is certainly the case when we translate a test from English to another language.

I want to walk you through the impact of translating a test. I’m not saying don’t translate a test, but I am saying that if you do, the test statistics are no longer valid or meaningful and should not be used to make a diagnostic decision.

What if we are testing students from diverse backgrounds who speak a language we do not have a test for? Can we translate the items on an English test? 

When we translate a test, Peña (2007) suggests that we need to think about:

  • linguistic equivalence
  • functional equivalence
  • cultural equivalence 
  • metric equivalence (is the difficulty level of the items the same in both languages?)

Lack of equivalence in any of these areas can threaten the content validity of the test (Rogler, 1999). Remember that content validity estimates how well a test measures what it purports to measure. For more on that, see Why Should We Care About the Reliability and Validity of Speech-Language Tests.

Let’s take a look at a test that is used across the United States. There is a Spanish version but no set of norms for the Spanish version. In a training the publisher did for the State of Texas, they said that because it is a direct translation, the English norms can be used…with caution, of course. I couldn’t disagree more.

Examples of Why Test Statistics Do Not Stay the Same When a Test Is Translated

Let’s take a look at two receptive language items and two expressive language items to illustrate why the items are not equivalent when translated from English to Spanish.

[Image: receptive item pair, English “out” and “on” versus Spanish “fuera” and “sobre”]

Consider the English prepositions “out” and “on” and their Spanish counterparts “fuera” and “sobre.” These items are not phonetically equivalent (VC and VC in English, but CCVCV and CVCCV in Spanish), and they do not occur in their languages with equal frequency: “out” and “on” are far more frequent than “fuera” and “sobre.”

[Image: non-equivalent items for testing bilingual students]

And that doesn’t even account for the fact that the English item uses two words while the Spanish item uses four or five. If we compare syllables, we see 4 in English versus 7 in Spanish. That is definitely not an equivalent pair of items.

Let’s move on to a couple of expressive items.

[Image: expressive item translated for bilingual assessment, showing lack of equivalence between languages]

If you’ve ever tried to learn a second language with a grammatical gender system, like Spanish, you know that it doesn’t happen overnight. It’s a complex system, and when one version of an item expects young children to have mastered noun gender while the other version doesn’t, we definitely do not have test equivalence.

In all of these examples, the items would be more difficult for Spanish speakers than for English speakers. Let’s think about what that means if we use the scores: those who take the items in Spanish get more items wrong, so their scores are lower and they are more likely to be diagnosed with a language disorder.

What happens with basal and ceiling rules?

The items on most of our tests are ordered from easiest to hardest, which is what allows us to use basal and ceiling rules effectively. When we translate a test, the item difficulty levels change and the basal and ceiling rules no longer work, because the items are no longer ordered from easiest to hardest. As evaluators, we can test below the basal and above the ceiling to get a fuller picture of our students’ skills.
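For readers who like to see the mechanics, here is a minimal sketch of why a ceiling rule breaks when translation reshuffles item difficulty. The items, difficulty values, and the specific ceiling rule (stop after 3 consecutive errors) are all hypothetical, not taken from any published test:

```python
def administer(items, child_ability, ceiling_misses=3):
    """Administer items in order; stop after `ceiling_misses` consecutive errors.

    This toy model assumes a child passes any item whose difficulty is at or
    below their ability level.
    """
    passed = []
    consecutive_misses = 0
    for name, difficulty in items:
        if difficulty <= child_ability:
            passed.append(name)
            consecutive_misses = 0
        else:
            consecutive_misses += 1
            if consecutive_misses == ceiling_misses:
                break  # ceiling reached: remaining items are never given
    return passed

# Items ordered by their ENGLISH difficulty (1 = easiest).
english_order = [("A", 1), ("B", 2), ("C", 3), ("D", 4), ("E", 5), ("F", 6)]

# The same items in the same order, but with their difficulty FOR A SPANISH
# SPEAKER after translation: items B-D became much harder, E-F easier.
spanish_difficulty = [("A", 1), ("B", 6), ("C", 7), ("D", 8), ("E", 2), ("F", 3)]

ability = 4
print(administer(english_order, ability))       # ['A', 'B', 'C', 'D']
print(administer(spanish_difficulty, ability))  # ['A'] -- ceiling hit before E and F
```

In the translated ordering, the child hits the ceiling on the reshuffled hard items and is never shown items E and F, which they could have passed. Their raw score drops from 4 to 1 even though their ability never changed.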

Can I use standard scores when I translate a test?

We can’t use standard scores when we translate a test. The difficulty level of the items changes, and once the item difficulty levels have changed we no longer have valid basals and ceilings. We simply cannot use the scores; they are no longer meaningful. We can, however, use qualitative information. We can identify strengths and needs in content and form, and that is what drives our diagnostic decision making.

Adding Dynamic Assessment to Probe Areas of Need When Assessing Students from Diverse Backgrounds

We can use the information from our formal tests to add a component of dynamic assessment or trial therapy. This helps us go beyond the static moment in time and look at learning potential. It helps us reduce test bias by ruling out difficulties caused by a lack of understanding of the task or a lack of exposure to the content.

If I give my student a little bit of support and they are able to get it, that tells me that they have the skills I am evaluating. This is an important element because, if we think about testing, it’s a snapshot. The powerful tool of dynamic assessment allows us to take that snapshot and add to it, to ask: what is this child’s learning potential?

Dynamic assessment takes us from a snapshot to a video of what a child can do with support.

Resources to support you in your bilingual assessment process

Bilingual Speech-Language Evaluation Sample

Bilinguistics Difference or Disorders Essentials Pack (online trainings with the DOD book)

Culturally responsive test administration

Written by: Ellen Kester

6 Comments on “Can We Use English Standardized Tests For Assessing Students From Diverse Backgrounds?”

  1. October 16, 2021 at 1:33 pm

    If you give a standardized language test to a multilingual student and the score places him/her in the average range for monolingual English speakers, could you use (report) that information to rule out a language delay? Especially if the observations and other data support this finding?

    • October 18, 2021 at 7:56 am

      Absolutely. This happens all the time. After they’ve been speaking English long enough, they perform well enough on the test. We always still do informal testing and probe the second language to get an idea of proficiency, but it sounds like he is ready for dismissal or shouldn’t qualify.

  2. November 1, 2021 at 9:16 am

    There is also the situation where you are testing a bilingual child who is not yet proficient in English but has lost facility with their native language, and they score low on BOTH tests. We could say that is evidence of a language disorder or delay, but you can’t know that. Using other measures, like parent and teacher interviews and classroom and playground observations, and triangulating ALL the data allows you to get a fuller picture. My question: I have reported scores on English monolingual tests for future reference against scores on the same test. (I DO state that those scores are not valid because of the norming population, and I do analyze the responses for patterns and report those.) And I use the growth values (CELF-5) for future comparison of which areas have seen growth even though the age-based score may not have changed. Is this valid or problematic? I suspect the latter but would love to get feedback on this.

    • November 3, 2021 at 11:32 am

      Hi Pam,
      Yes, definitely important to take information from many sources to make our diagnostic decisions. With respect to Growth Scores on the Pearson tests, the publisher does state that Growth Scores can be used to track the development of language skills, determine mastery of a skill, and measure the efficacy of intervention programs addressing those skills. They are not norm-based. As far as the approach of reporting scores for bilingual students on monolingual tests to look at change over time, norms are involved, and I think that presents a problem. I can see using the tools informally (qualitatively) to look at skills they have gained over time, but comparing standard scores over time in that case doesn’t give us the information we need to make any confident analysis about development.

  3. November 2, 2022 at 9:34 am

    Can you test a child from a Spanish-speaking home who was tested by a Spanish-speaking evaluator who now suggests you test them with the same test in English to compare the scores on the psychoeducational evaluations? Is this valid, and what would I gain? Is there not an easier way to find which language he is more proficient in?

    • November 13, 2022 at 8:31 am

      Hi Mel,
      This is a common question, as the psych folks test very differently. They compare language strengths and often test in the stronger language. I will say that I think their field is evolving to be more like ours, where we test cumulatively. The key here is cumulatively, not comparatively. We fully test English, then fully test the other language if needed, and we add together all the abilities. For example, a child may have correct pronoun use in one language and correct vocabulary in the other. That’s fine. We write goals around abilities that are missing in both languages. One thing to watch, though: with vocabulary, you only give credit for each concept. So if they say dog, perro, cat, gato, that is only 2 vocabulary words, because dog and perro mean the same thing, just in Spanish and English.
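      That conceptual scoring rule is simple enough to sketch in a few lines. This is purely illustrative; the translation pairs here are a made-up example, not a published word list:

```python
# Hypothetical translation map: each Spanish word points to the English label
# for the same underlying concept.
translation_pairs = {"perro": "dog", "gato": "cat", "leche": "milk"}

def conceptual_score(responses):
    """Map each response to one concept label, then count unique concepts.

    A concept named in both languages (e.g., dog AND perro) is credited once.
    """
    concepts = {translation_pairs.get(word, word) for word in responses}
    return len(concepts)

print(conceptual_score(["dog", "perro", "cat", "gato"]))  # 2, not 4
print(conceptual_score(["dog", "gato", "leche"]))         # 3 distinct concepts
```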

      So to answer your two questions: What would you gain? I guess, in theory, if you knew concretely which language they are more proficient in, you would trust your results more and maybe test less??? But I think it is risky not to at least probe in the second language.

      “Is there not an easier way to find what language he is more proficient in?” Totally: have them tell the same story from a wordless picture book in both languages and compare for complexity and errors. This takes 10 minutes; you have exact examples of errors to write goals on; you know which formal test to use first, if a second one is needed at all; you have examples of past, present, and future tense; and you know whether they can include narrative elements (person/place/problem/solution)… So much rich information from so little effort.