Interpreting Sensitivity and Specificity

As speech-language pathologists, we often use speech and language tests to help determine whether someone has a speech or language disorder, so we need to consider the diagnostic accuracy of these tools. In other words, how accurately do these tools discriminate between people with and without language impairments, or with and without speech impairments? Sensitivity and specificity are measures of diagnostic accuracy that help us understand how successful tools are at differentiating between those who have language disorders and those who don’t.

Sensitivity and Specificity Definitions

On the whole, sensitivity and specificity are measures of diagnostic accuracy that tell us how accurately tools classify people on the condition being diagnosed. In our case, we are diagnosing language disorders and speech disorders. Thus:

Sensitivity tells us how well a test identifies those WITH language disorders or speech disorders.
Specificity tells us how well a test identifies those WITHOUT language disorders or speech disorders.

These numbers are based on how many individuals are classified correctly.

How to Calculate Sensitivity and Specificity

Fortunately, calculating sensitivity and specificity is probably something you’ll never have to do, BUT knowing how they are calculated helps us understand the values associated with them.

These are calculations that test developers run to evaluate their tests. The way it works is that they have a group of individuals who have been classified on the variable of interest prior to taking the test. In our case, we’ll use language disorder as the variable of interest to illustrate our point. A group of individuals, some of whom have been diagnosed with a language disorder and others who have been identified as not having a language disorder, take the test that has been developed. The publisher conducts this study to show how accurate their tool is.

There are four possible outcomes when using a language test.

The test can classify someone with a language disorder as having a language disorder (True Positive).
The test can classify someone without a language disorder as having a language disorder (False Positive).
The test can classify someone with a language disorder as not having a language disorder (False Negative).
The test can classify someone without a language disorder as not having a language disorder (True Negative).

We’ll use the table below to walk through the calculations. Boxes A, B, C, and D correspond to the four options discussed above. A = True Positive, B = False Positive, C = False Negative, D = True Negative.

The publishers want to determine how many of the individuals with language disorders were classified by their test as language disordered (the number of true positives, Box A) and how many without language disorders were classified as not having language disorders (the number of true negatives, Box D). Those are accurate classifications.

What’s left over are the individuals who did not have a language disorder but were classified by the test as having a language disorder (the number of false positives, Box B), and the individuals who had a language disorder but were classified as not having a language disorder (the number of false negatives, Box C). Those are inaccurate classifications.
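To make the four boxes concrete, here is a minimal sketch that sorts a handful of people into Boxes A through D. The sample is made up for illustration, and the function name classify_outcome is our own, not something from a test manual.

```python
from collections import Counter

def classify_outcome(has_disorder: bool, test_positive: bool) -> str:
    """Return which of the four boxes (A-D) one person falls into."""
    if has_disorder and test_positive:
        return "A (true positive)"
    if not has_disorder and test_positive:
        return "B (false positive)"
    if has_disorder and not test_positive:
        return "C (false negative)"
    return "D (true negative)"

# Hypothetical validation sample: (prior diagnosis, test result) pairs.
sample = [
    (True, True), (True, True), (True, False),
    (False, False), (False, False), (False, True),
]

# Tally how many people land in each box.
counts = Counter(classify_outcome(d, t) for d, t in sample)
for box, n in sorted(counts.items()):
    print(box, n)
```

Boxes A and D hold the accurate classifications; Boxes B and C hold the inaccurate ones.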

Sensitivity Formula

Sensitivity is calculated as Box A divided by A+C. In other words, the number of individuals with language disorders who were correctly classified divided by all of the individuals who actually have a language disorder (true positives plus false negatives).
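Using the box labels defined earlier (A = True Positive, C = False Negative), sensitivity is the share of people who truly have a language disorder that the test catches. Here is a small sketch of that arithmetic; the counts are hypothetical and only there to show the calculation.

```python
def sensitivity(true_pos: int, false_neg: int) -> float:
    """Share of people WITH the disorder that the test flags: A / (A + C)."""
    return true_pos / (true_pos + false_neg)

# Hypothetical counts, for illustration only.
box_a = 45  # true positives: have a disorder, test says disorder
box_c = 5   # false negatives: have a disorder, test says no disorder

print(f"Sensitivity: {sensitivity(box_a, box_c):.2f}")
```

With these made-up numbers, 45 of the 50 individuals with a language disorder were identified by the test.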

Specificity Formula

Specificity is calculated as Box D divided by B+D. In other words, the number of individuals without language disorders who were correctly classified divided by all of the individuals who actually do not have a language disorder (true negatives plus false positives).
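With the same box labels (D = True Negative, B = False Positive), specificity is the share of people without a language disorder that the test correctly clears. The counts in this sketch are again hypothetical.

```python
def specificity(true_neg: int, false_pos: int) -> float:
    """Share of people WITHOUT the disorder that the test clears: D / (B + D)."""
    return true_neg / (true_neg + false_pos)

# Hypothetical counts, for illustration only.
box_d = 92  # true negatives: no disorder, test says no disorder
box_b = 8   # false positives: no disorder, test says disorder

print(f"Specificity: {specificity(box_d, box_b):.2f}")
```

With these made-up numbers, 92 of the 100 individuals without a language disorder were correctly classified.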

Now, an important thing for us to realize is that the publishers can recommend the cut-point (score at which one is classified one way or the other) that yields the highest levels of sensitivity and specificity. The pictures below illustrate this nicely.

The test developer controls the cut-point. If the cut-point is set where sensitivity is 100% (all individuals with language disorders are classified as language disordered), specificity drops. On the other hand, if we set the cut-point at 100% specificity, then sensitivity drops.

So, the best cut-point is the one at which the fewest individuals are misclassified. Test developers give you this information in the manual. I know, that information is largely ignored in our field and arbitrary cut-points like 1.5 standard deviations below the mean are used. But we would have higher diagnostic accuracy if we paid attention to the sensitivity and specificity rates in the test manuals.
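The trade-off can be seen by trying several cut scores on the same group. The sketch below uses invented standard scores (not data from any published test) and assumes that a score below the cut is classified as disordered. It then picks the cut with the fewest misclassifications.

```python
# Hypothetical standard scores; not data from any published test.
disordered_scores = [68, 72, 75, 79, 82, 84, 88]
typical_scores = [78, 84, 87, 89, 95, 101, 108, 112]

def accuracy_at(cut: float):
    """Classify scores below `cut` as disordered and tally the results."""
    true_pos = sum(s < cut for s in disordered_scores)
    true_neg = sum(s >= cut for s in typical_scores)
    sens = true_pos / len(disordered_scores)
    spec = true_neg / len(typical_scores)
    false_neg = len(disordered_scores) - true_pos
    false_pos = len(typical_scores) - true_neg
    return sens, spec, false_neg + false_pos

candidate_cuts = [70, 75, 80, 85, 90]
for cut in candidate_cuts:
    sens, spec, missed = accuracy_at(cut)
    print(f"cut {cut}: sensitivity {sens:.2f}, "
          f"specificity {spec:.2f}, misclassified {missed}")

# The "best" cut here is simply the one with the fewest misclassifications.
best_cut = min(candidate_cuts, key=lambda c: accuracy_at(c)[2])
print("Fewest misclassifications at cut score", best_cut)
```

In this toy sample, the highest cut reaches 100% sensitivity at the cost of specificity, a low cut reaches 100% specificity at the cost of sensitivity, and the fewest errors fall in between.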

Looking at Sensitivity and Specificity for a Speech Assessment Tool

Take a look at the sensitivity and specificity measures for the Bilingual Articulation and Phonology Assessment app.

You can see here that for the Bilinguals in Spanish, when the cut score is 1 standard deviation below the mean (85), sensitivity and specificity are .94 and .90, but when 1.5 standard deviations below the mean (77.5) is used, sensitivity drops to .72 and specificity goes up a little to .92. For the purposes of maximizing diagnostic accuracy, we would want to use a cut score of 1 standard deviation for that group. For the Bilinguals in English, 1.5 standard deviations yields better diagnostic accuracy.
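The standard scores in parentheses follow from a scale with a mean of 100 and a standard deviation of 15, which is what the 85 and 77.5 figures imply. A quick sketch of the conversion, with that scale stated as an assumption:

```python
MEAN = 100.0  # assumed standard-score mean
SD = 15.0     # assumed standard deviation

def cut_score(sd_below_mean: float) -> float:
    """Convert 'N standard deviations below the mean' to a standard score."""
    return MEAN - sd_below_mean * SD

for sd_units in (1.0, 1.5):
    print(f"{sd_units} SD below the mean -> standard score {cut_score(sd_units)}")
```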

Sensitivity and Specificity for a Language Assessment Tool

One widely used test, the CELF-5, reports that 1.3 standard deviations below the mean (Standard Score of 80) is the cut score at which diagnostic accuracy is the highest for the Core Language Score, the Receptive Language Index and the Expressive Language Index. Yet, many people use a different cut score as their guideline, based on their district’s eligibility guidelines. Why? Good question, isn’t it?

What are considered good levels of sensitivity and specificity?

The sensitivity and specificity table below was created based on Plante and Vance’s (1994) guidelines for acceptable levels of sensitivity and specificity in preschool language tests.

Sensitivity and Specificity Values	Acceptability
≥ 90%	Good
80-89%	Adequate
Below 80%	Unacceptable
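If you want to check a value from a test manual against these bands, a small helper like the one below will do it. The function name rate_accuracy is our own, and how to treat values that fall between bands (such as 89.5%) is an assumption: anything under .90 is counted as Adequate at best.

```python
def rate_accuracy(value: float) -> str:
    """Map a sensitivity or specificity value (0-1) to an acceptability band."""
    if value >= 0.90:
        return "Good"
    if value >= 0.80:
        return "Adequate"
    return "Unacceptable"

# The sensitivity and specificity values discussed earlier in this post.
for value in (0.94, 0.90, 0.72, 0.92):
    print(f"{value:.2f} -> {rate_accuracy(value)}")
```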

Positive Predictive Power and Negative Predictive Power

I can’t write a post on diagnostic accuracy without discussing positive predictive power and negative predictive power. I think the best way to describe these measures is to say that positive predictive power is like an estimate of sensitivity that takes into account the expected number of individuals with a language disorder. More precisely, it is the proportion of individuals who test positive who truly have the disorder. For example, if one is conducting a universal screening of a kindergarten class, we might expect 8-10% of students to have a language disorder. On the other hand, if we are testing only individuals suspected of having a language disorder (e.g., those referred for evaluation by their teachers), we would expect that percentage to be much higher. Thus, Positive Predictive Power takes into account that number, referred to as the base rate. Similarly, Negative Predictive Power is like specificity that takes the base rate into account: it is the proportion of individuals who test negative who truly do not have the disorder.
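To see how much the base rate matters, here is a rough sketch that computes both measures for the same test at two base rates. The sensitivity and specificity of .90 and the 50% base rate for a referred group are hypothetical; the 10% figure is the upper end of the screening estimate above.

```python
def predictive_power(sens: float, spec: float, base_rate: float):
    """Return (positive, negative) predictive power for a given base rate."""
    true_pos = sens * base_rate
    false_pos = (1 - spec) * (1 - base_rate)
    true_neg = spec * (1 - base_rate)
    false_neg = (1 - sens) * base_rate
    ppp = true_pos / (true_pos + false_pos)  # of those who test positive
    npp = true_neg / (true_neg + false_neg)  # of those who test negative
    return ppp, npp

# Hypothetical test with sensitivity and specificity both at .90.
for label, base_rate in (("universal screening", 0.10),
                         ("referred group (assumed)", 0.50)):
    ppp, npp = predictive_power(0.90, 0.90, base_rate)
    print(f"{label}: PPP {ppp:.2f}, NPP {npp:.2f}")
```

The same test produces a much lower positive predictive power in the screening scenario, because most of the children being tested do not have a language disorder.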

Some publishers report only sensitivity and specificity or only positive and negative predictive power, and some report both sets of measures.

Resources to Make Utilizing Diagnostic Accuracy Measures Easier

I want to close with the fact that I know how busy speech-language pathologists are in the field. We have high caseloads and too much work! And going to dig through test manuals after completing an evaluation is not always at the top of the list. So, I will leave you with two resources that should make that process shorter for you.

One is a sensitivity and specificity table I created that includes many of the frequently used measures in our field. The other is a link to one of our courses on this topic if you want to spend an hour learning more about it. For SLP Impact Members, the course is free in your library.

Diagnostic Accuracy of Select Language Assessment Measures

You can also access this resource in our free resources on the Bilinguistics website.

Here’s an ASHA CEU course on Improving Diagnostic Accuracy (SLP Impact members can access it free inside of SLP Impact).

4 Comments

Michael Sáenz on September 30, 2021 at 11:42 am

Thank you so much for this invaluable discussion—and most of all for the linked PDF with the ranges of cutoff scores for common language tests. It will be a real help in my write-ups of students whose test score indicates a potential disorder, but whose other data suggest not.
- Ellen Kester on September 30, 2021 at 3:07 pm
  
  Thanks, Michael. I’m so glad you found the discussion and the pdf helpful. That’s our goal!
Linda Carozza on October 1, 2021 at 5:09 am

Can you send a re print I could use for teaching purposes?
- Ellen Kester on October 1, 2021 at 2:42 pm
  
  Hi Linda,
  I’m not sure exactly what you are looking for.
