Assessing students from diverse backgrounds is something that every speech-language pathologist needs to be able to do well. There is not a single speech-language pathologist who only tests those whose cultural, linguistic, and educational backgrounds match their own. Not one single one.
Students from diverse backgrounds are overrepresented in special education programs
Educational research has indicated for decades that students from diverse backgrounds are overrepresented in special education programs in the United States. According to the research, many factors contribute to this, including how students from diverse backgrounds are assessed.
“Many school professionals lack the appropriate understanding of testing tools to properly assess and evaluate [students from diverse backgrounds].” (Becker & Deris, 2019)
“When teachers do not know what to do with students due to cultural mismatch… [they] are more likely to refer them for special education.” (Williams, 2008)
“Once referred to an assessment team, the student has a greater than 50% chance of being identified as disabled.” (Becker & Deris, 2019)
Kreskow (2013) found that test bias was one of the top reasons that assessing students from diverse backgrounds leads to over-referral for special education services.
The last time I wrote about overrepresentation of students from diverse backgrounds in special education, several people reached out to me and commented that oftentimes students have difficulty getting special education services when they need them. This is another problem (one that impacted many in the State of Texas) that I won’t address in this post but here is an excellent resource for those who are denied special education services.
In this post I want to focus on:
- Bilingual assessment issues and how they contribute to the problem of overrepresentation of students from diverse backgrounds in special education programs
- Things that evaluators need to be aware of when using assessment tools with students whose backgrounds do not match those of the normative sample
- Ways that evaluators can use assessment tools wisely and still get good results when assessing students from diverse backgrounds.
Considerations in Assessing Students from Diverse Backgrounds
How does one’s ethnicity, race, gender, or language affect test performance?
Every child we test shows up at the table with different experiences. They have different cultural experiences, different linguistic experiences, and different educational experiences.
We have a handful of speech-language assessment tools that we use to test students.
We know that the child’s prior experiences influence the way they will respond to different items. With this in mind, we need to recognize that we cannot just look at a static moment in time and a single tool to get the information that we need.
How can we adjust our testing practices to ensure that we are not over-referring students for special education services?
Go beyond the test to gather information for your speech-language evaluation.
If tests show that a child is below average, we have work to do beyond the test.
We need to gather parent information. We need to gather teacher information. We need to look at informal measures and how those look compared to the formal measures. And then we need to incorporate dynamic assessment.
Things evaluators need to be aware of when assessing students from diverse backgrounds:
No matter what testing tool we use, we need to look beyond the student’s item score of 1 or 0 and ask:
- “What were their answers?”
- “Could they have gotten that wrong because of their linguistic experiences?”
- “Could they have gotten that wrong because of a cultural difference?”
We need to go beyond the expected answers written in the test manual and think about whether the child’s answer is reasonable given their experiences, not ours.
Pay Attention to the Purpose of Each Test Item When Making Scoring Decisions
Let me give you an example from a bilingual assessment tool for young children that I used recently. There is an item that asks students to repair semantic absurdities that gives a sentence about a boy eating a car. My student responded, “There’s no way a boy could eat a car. Now, Iron Man, he could eat a car.”
Now, it is not at all surprising that the test developers did not anticipate that response. It was not in the list of acceptable responses. In the manual, a correct response would be to change “car” to something that could be eaten by a boy. So, do we count this as correct or incorrect?
Let’s think about the purpose of the item. What is this item intended to measure? It is intended to measure an understanding of words and how they can go together. It is intended to tell if a child can identify something that is absurd or unreasonable in the words we present to them.
Could my student do that? Yes! Does Iron Man actually eat cars? I have no idea but I don’t think that really matters. My student understood that boys don’t eat cars.
Tools are designed to help us make diagnostic decisions but they do not make diagnostic decisions for us.
Don’t Use Scores When Using English Tests for Assessing Bilingual Students
In many cases, when we are using a test that was not designed for a student with the cultural and linguistic background of the student we are testing, we cannot use the scores. This is certainly the case when we translate a test from English to another language.
I want to walk you through the impact of translating a test. I’m not saying don’t translate a test, but I am saying that if you translate a test, the test statistics are no longer valid or meaningful and should not be used in making a diagnostic decision.
What if we are testing students from diverse backgrounds who speak a language we do not have a test for? Can we translate the items on an English test?
When we translate a test, Peña (2007) suggests that we need to think about:
- linguistic equivalence
- functional equivalence
- cultural equivalence
- metric equivalence (is the difficulty level of the items the same in both languages?)
Lack of equivalence in any of these areas can threaten the content validity of the test (Rogler, 1999). Remember that content validity estimates how well the test measures what it purports to measure. For more on that see Why Should We Care About the Reliability and Validity of Speech-Language Tests.
Let’s take a look at a test that is used across the United States. There is a Spanish version but not a set of norms for the Spanish version. In a training the publisher did for the State of Texas, they said that because it is a direct translation the English norms can be used…with caution, of course. I couldn’t disagree more.
Examples of Why Test Statistics Do Not Stay the Same When a Test Is Translated
Let’s take a look at two receptive language items and two expressive language items to illustrate why the items are not equivalent when translated from English to Spanish.
Further, they are not phonetically equivalent (VC, VC in English but CCVCV and CVCCV in Spanish). They also do not occur in the languages with equal frequency. Out and on have much higher levels of frequency than fuera and sobre do.
And that doesn’t even mention the fact that in English we are using two words and in Spanish 4 or 5 words. If we compare syllables, we see 4 in English versus 7 in Spanish. That is definitely not an equivalent pair of items.
Let’s move on to a couple of expressive items.
If you’ve ever tried to learn a second language that includes a gender system like Spanish, you’ll know that it doesn’t happen overnight. It’s a complex system and when we are testing young children and one is expected to have mastered noun genders and the other isn’t, we definitely do not have test equivalence.
In all of these examples, the items would be harder for Spanish speakers than for English speakers. Let’s think about what that means if we use the scores. Those who take the items in Spanish get more items incorrect. Thus, their scores are lower and they are more likely to be diagnosed with a language disorder.
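To see how a few harder items can shift a norm-referenced score, here is a minimal Python sketch. It assumes the common convention of standard scores with a mean of 100 and a standard deviation of 15. The normative mean, the normative standard deviation, the raw scores, and the function name `standard_score` are all hypothetical, chosen only to illustrate the arithmetic; they do not come from any published test.

```python
def standard_score(raw, norm_mean, norm_sd, scale_mean=100, scale_sd=15):
    """Convert a raw score to a standard score against a normative sample.

    Assumes a mean-100, SD-15 scale. All normative values passed in below
    are hypothetical, not taken from any real test manual.
    """
    z = (raw - norm_mean) / norm_sd
    return scale_mean + scale_sd * z

# Hypothetical English normative sample: mean raw score 20, SD 4.
NORM_MEAN, NORM_SD = 20, 4

# A child who matches the normative sample and gets 20 items correct.
english_score = standard_score(20, NORM_MEAN, NORM_SD)

# The same skill level, but 4 translated items were harder and were missed.
translated_score = standard_score(16, NORM_MEAN, NORM_SD)

print(english_score)     # 100.0
print(translated_score)  # 85.0
```

In this made-up case, four items that became harder in translation move the score a full standard deviation, even though nothing about the child’s underlying language skills changed.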
What happens with basal and ceiling rules?
The items on most of our tests are ordered from easiest to hardest, which is what allows us to effectively use basal and ceiling rules. When we translate a test and the item difficulty levels change, the basal and ceiling rules no longer work because the items are not ordered from easiest to hardest. As evaluators, we can test below the basal and above the ceiling to get a fuller picture of our student’s skills.
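The effect of reordered difficulty on a ceiling rule can be sketched in a few lines of Python. This is only an illustration, not any publisher’s scoring procedure: the ceiling of three consecutive errors, the function name `raw_score_with_ceiling`, and both response patterns are hypothetical.

```python
def raw_score_with_ceiling(responses, ceiling=3):
    """Count correct items, stopping once `ceiling` consecutive errors occur.

    `responses` is a list of 1 (correct) / 0 (incorrect) in administration
    order. The ceiling of 3 is a hypothetical value; real tests define
    their own rules in the manual.
    """
    score = 0
    consecutive_errors = 0
    for response in responses:
        if response == 1:
            score += 1
            consecutive_errors = 0
        else:
            consecutive_errors += 1
            if consecutive_errors == ceiling:
                break  # ceiling reached: testing stops here
    return score

# Hypothetical child who can answer 7 of the 10 items.
# Original order: easiest to hardest, so the errors cluster at the end.
ordered = [1, 1, 1, 1, 1, 1, 1, 0, 0, 0]
# Translated order: three items that became harder now sit early in the test.
reordered = [1, 1, 0, 0, 0, 1, 1, 1, 1, 1]

print(raw_score_with_ceiling(ordered))    # 7
print(raw_score_with_ceiling(reordered))  # 2
print(sum(reordered))                     # 7 when every item is administered
```

In the reordered case the ceiling is hit after the fifth item, so the five items the child could have answered are never given. Testing past the ceiling recovers them.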
Can I use standard scores when I translate a test?
We can’t use standard scores when we translate a test. The difficulty level of the items changes. When our item difficulty levels have changed and we no longer have basals and ceilings, we simply cannot use the scores. They are no longer meaningful. We can, however, use qualitative information. We can identify strengths and needs in content and form. And that is what drives our diagnostic decision making.
Adding Dynamic Assessment to Probe Areas of Need When Assessing Students from Diverse Backgrounds
We can use the information from our formal tests to add a component of dynamic assessment or trial therapy. This helps us go beyond the static moment in time and look at learning potential. It helps us reduce test bias by ruling out difficulties because of a lack of understanding of the task or lack of exposure to the content.
If I give my student a little bit of support and they are able to get it, that tells me that they have the skills I am evaluating. This is an important element because if we think about testing, it’s a snapshot. This powerful tool of dynamic assessment allows us to take that snapshot and add to it, to say, what is this child’s learning potential?
Dynamic assessment takes us from a snapshot to a video of what a child can do with support.
Resources to support you in your bilingual assessment process
Bilingual Speech-Language Evaluation Sample
Bilinguistics Difference or Disorders Essentials Pack (online trainings with the DOD book)
If you give a standardized language test to a multilingual student and the score places him/her in the average range for monolingual English speakers, could you use (report) that information to rule out a language delay? Especially if the observations and other data support this finding?
Absolutely. This happens all the time. After they have been speaking English long enough, they perform well enough on the test. We always still do informal testing and probe the second language to get an idea of proficiency, but it sounds like he is ready for dismissal or shouldn’t qualify.
There is also the situation where you are testing a bilingual child who is not yet proficient in English but has lost facility with their native language, and they score low on BOTH tests. We could say that that is evidence of a language disorder or delay, but you can’t know that. Using other measurements, like parent and teacher interviews and classroom and playground observations, and triangulating ALL the data allows you to get a fuller picture. My question: I have reported scores on English monolingual tests for future comparison with scores on the same test. (I DO state that those scores are not valid because of the norming population, and I do analyze the responses for patterns and report those.) And I use the growth values (CELF-5) for future comparison of what areas have seen growth even though the age-based score may not have changed. Is this valid or problematic? I suspect the latter but would love to get feedback on this.
Hi Pam,
Yes, definitely important to take information from many sources to make our diagnostic decisions. With respect to Growth Scores on the Pearson tests, the publisher does state that Growth Scores can be used to track the development of language skills, determine mastery of a skill, and measure the efficacy of intervention programs addressing those skills. They are not norm-based. As far as the approach of reporting scores for bilingual students on monolingual tests to look at change over time, norms are involved, and I think that presents a problem. I can see using the tools informally (qualitatively) to look at skills they have gained over time, but comparing standard scores over time in that case doesn’t give us the information we need to make any confident analysis about development.
Hi! This would also be the case the other way around, right?
I mean, I know Spanish-speaking SLPs who would use a bilingual test (The Receptive and Expressive One-Word Picture Vocabulary Test-4th Edition: Spanish Bilingual (ROWPVT-4:SB)) to assess monolingual Spanish speakers.
The same would apply: scores cannot be used.
Thanks in advance!
Hi Alejandra,
I don’t use vocabulary tests in my language assessment process because they are very tied to SES and life experiences. I’m not sure of the normative samples of the EOWPVT-4 SB or the ROWPVT-4 SB. That would tell you whether there are norms for monolingual Spanish speakers.
I have a question: could I give both the CELF-5 in English and the CELF-4 Spanish to assess a bilingual speaker? Or are the tests similar enough that that would impact reliability of the results? Thank you!