Constancy and Repeatability in Hip Evaluations
Some questions have been raised about how long a PennHIP reading, and a two-year-old OFA result (or a one-year-old
“a” stamp, OVC, or GDC result), remain valid.
The terms “precision” and “accuracy” may be used by your PennHIP vet. By the former, he means that there is
general agreement among “scrutineers” (readers) on the diagnosis; by the latter, he means how close that
diagnosis comes to the truth, which depends on additional factors. Imagine a sharpshooter’s bull’s-eye target
and half a dozen riflemen who shoot at it. All of their bullet holes are clustered somewhere in the upper-left
quadrant of the target, some distance from the bull’s-eye but all about 2 inches from each other. The people at
Penn liken that to evaluators using the hip-extended view, who agree fairly well with one another but miss many
cases of HD in that position.
Now imagine equally talented shooters with better rifles (scopes, longer barrels, etc.) who not only cluster
their holes in the bull’s-eye (“accurate”) but also place them within one-sixteenth inch of each other
(“precise”). Further, imagine those with the better guns being able to repeat that performance at every match
(“reliable”). Accuracy, however, is not well defined in the context of making genetic change toward better hips.
For that, you need to add the effect of heritability. The measure of the hip phenotype with the highest
heritability should be considered the most accurate, and the DI and percentile scores reflect that phenotype most
accurately: the distraction method has a much higher heritability than older methods of viewing hips.
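To make the target analogy concrete, here is a small Python sketch that separates the two ideas: accuracy as the distance from a group's center to the bull's-eye, and precision as the average scatter of shots around their own center. The shot coordinates (in inches) and the function names `center`, `accuracy_error`, and `precision_spread` are hypothetical illustrations, not anything published by Penn or the OFA.

```python
import math
from statistics import fmean

# Hypothetical shot coordinates in inches; the bull's-eye is at (0, 0).
# Off-center group: holes roughly 1.4 to 2 inches apart, up in the upper-left quadrant.
OFF_CENTER = [(-5.0, 4.0), (-3.0, 4.0), (-4.0, 5.0), (-4.0, 3.0)]
# On-center group: holes within about 1/16 inch of each other, on the bull's-eye.
ON_CENTER = [(0.03, 0.0), (-0.03, 0.0), (0.0, 0.03), (0.0, -0.03)]


def center(shots):
    """Average position of a group of shots."""
    return fmean(x for x, _ in shots), fmean(y for _, y in shots)


def accuracy_error(shots):
    """How far the group's center sits from the bull's-eye (a systematic miss)."""
    cx, cy = center(shots)
    return math.hypot(cx, cy)


def precision_spread(shots):
    """Average distance of each shot from the group's own center (scatter)."""
    cx, cy = center(shots)
    return fmean(math.hypot(x - cx, y - cy) for x, y in shots)


for name, shots in (("off-center", OFF_CENTER), ("on-center", ON_CENTER)):
    print(f"{name}: accuracy error {accuracy_error(shots):.2f} in, "
          f"spread {precision_spread(shots):.2f} in")
```

Here the off-center group is fairly tight (every hole 1 inch from its center), yet its center sits about 5.7 inches from the bull's-eye, while the on-center group scores near zero on both measures. Reliability in the text's sense would mean getting similar numbers match after match; heritability, which the text says genetic accuracy also requires, is outside what this sketch models.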
Penn’s claim that OFA is not the best method breeders have for reducing HD in their line or breed rests on the
accuracy factor. They call our attention to the many dogs (usually of certain breeds) that do not develop DJD but
are assessed by OFA as dysplastic because of laxity at two years of age. Even more important is the greater
number of dogs that were judged “normal” at two years but later developed DJD or, if never re-radiographed,
produced an unacceptably high percentage of dysplastic descendants. This led to the conclusion that the accuracy
of OFA’s method is gravely flawed. Even if reliability (usually meaning repeatability) were high from younger
ages up to the two-year qualifying age for OFA certification, and I do not think it is, the lack of accuracy is
worrisome to breeders and diminishes the importance of the reliability figures published in 1997.
Remember the difference between reliability and accuracy described above. As an example, Penn cites a 1996 JAAHA
longitudinal study of military dogs evaluated by the OFA-type method, in which all the dogs with “normal” hips at
two years had mild degenerative changes by nine years of age. At the same time, 22 of 52 dogs that had been
judged “positive” for HD at two had similar changes by nine years! The conclusion is that the OFA-type evaluation
at two years gives a relatively high rate of misdiagnosis and blurs the distinction between true positive (HD)
and true negative (no HD) diagnoses, even at the supposedly “safe” age of two years. Admittedly, the OFA hedges
its emphasis on laxity a little by using the phrase “normal for age and breed” when grading radiographs; in this
way it does allow for some differences between Saint Bernards’ and Borzois’ hips, though the evaluation is still
entirely subjective.
Some people who wrote to me wanted an explanation of OFA-normal dogs (at two years) going bad later in life. Read
that study: Banfield CM, Bartels JE, Hudson J. A retrospective study of canine hip dysplasia in 116 military
working dogs; Angle measurements and OFA grading. J Am Anim Hosp Assoc 1996;32:413-422. Further reading: Corley
EA, Keller GC, et al. Reliability of early radiographic evaluations for canine hip dysplasia obtained from the
standard ventrodorsal radiographic projection. JAVMA 1997;211:1142-1146. Along with that, you should also read
the letter to the editor on pp. 487-488 of JAVMA Volume 212 of …
By comparison, the actual testing error of PennHIP (as determined by repeated tests of the same dog at one point
in time) is extremely small (<0.05 DI units). The biological variability of a given dog’s hip laxity over time,
however, could be much larger. There is variation in anything biological: hip laxity measured and scored at one
value on a given day may score at a different value on some future day.
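One way to picture the difference between testing error and biological variability is a simple additive model. This is an illustrative assumption, not a model published by PennHIP: $D_t$ is a single DI reading on day $t$, $\mu$ is the dog's long-run laxity, $b_t$ is the biological deviation on that day, and $e_t$ is the testing error (the part the text reports as under 0.05 DI units). The two deviation terms are assumed independent, with mean zero and variances $\sigma_b^2$ and $\sigma_e^2$.

```latex
% Illustrative additive model (assumption): one DI reading on day t
D_t = \mu + b_t + e_t

% With independent, zero-mean terms, the spread of a single reading is
\operatorname{Var}(D_t) = \sigma_b^2 + \sigma_e^2

% Averaging n readings taken on separate (independent) days
\bar{D}_n = \frac{1}{n}\sum_{t=1}^{n} D_t,
\qquad
\operatorname{Var}(\bar{D}_n) = \frac{\sigma_b^2 + \sigma_e^2}{n}
```

Under this model, even a very small $\sigma_e$ still leaves $\sigma_b$ in every single reading, which is why a DI taken on another day can differ even though the test itself is precise; averaging readings from separate days shrinks both sources together.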
Consider three points. First, biological variability is evident in all measured biological parameters, e.g.,
serum cholesterol, blood pressure, and heart rate, and even in hip laxity, to a smaller degree. Second, if a
breeder believes the naysayers and feels distraction radiography should not be used because it is perceived to
have too much variability (error), he should realize that in all studies that compared the DI with the OFA score,
the OFA diagnostic test was found to have even more error when evaluated longitudinally. The data are clear on
this issue. Many people have been lulled into believing that, since their dogs receive one OFA score at 2 years of
age, the score is absolute and will not change. If they took the time and spent the money to have repeated OFA
testing done, they would find a troubling amount of error, more than with PennHIP.
Third, to get around the uncertainty that inherent biological variation introduces into any diagnostic test, it
is wise to average many observations rather than rely fully on any single one. The average of multiple tests
moves toward the dog’s true mean, giving a truer measure of the phenotype, the same way you get closer to 50%
“heads” the more often you flip a coin, or approach 12.5% “monorchids” the more often you breed two “carriers”.
In short, if a breeder argues that PennHIP should not be used because it has too much “error”, then by the same
reasoning the OFA method should also be abandoned, because its error has been shown to be even greater.
Complaining that a biological measurement lacks mathematical precision is simply wrong-headed. Incidentally, if
there are differences in DI readings taken at different times in a dog’s life, I suspect that the “margin of
error” is greater at the higher end of the scale. It has been my observation that the looser the joint, the
greater the variation in readings.
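The averaging argument can be checked with a quick simulation. This Python sketch assumes each reading is a true DI plus independent, normally distributed biological and testing noise; `TRUE_DI`, `BIO_SD`, and `TEST_SD` are hypothetical numbers chosen for illustration, not measured PennHIP data, and the helper names are likewise made up. It also works out the 12.5% figure, assuming monorchidism behaves as a simple autosomal recessive trait that shows only in males.

```python
import random
from fractions import Fraction
from statistics import fmean

# Hypothetical values for illustration only (not PennHIP data).
TRUE_DI = 0.40   # the dog's assumed long-run distraction index
BIO_SD = 0.05    # assumed day-to-day biological variation in laxity
TEST_SD = 0.02   # assumed testing error on a single radiograph


def one_reading(rng):
    """Simulate one DI reading: true value + biological deviation + testing error."""
    return TRUE_DI + rng.gauss(0, BIO_SD) + rng.gauss(0, TEST_SD)


def average_of(n, rng):
    """Average of n readings taken on separate (independent) occasions."""
    return fmean(one_reading(rng) for _ in range(n))


def mean_abs_error(n, rng, trials=2000):
    """Typical distance between an n-reading average and the true DI."""
    return fmean(abs(average_of(n, rng) - TRUE_DI) for _ in range(trials))


def monorchid_fraction():
    """Expected share of affected pups from a carrier x carrier mating.

    Assumes an autosomal recessive trait expressed only in males:
    1/4 of pups are homozygous recessive and 1/2 of pups are male.
    """
    return Fraction(1, 4) * Fraction(1, 2)


rng = random.Random(1)  # fixed seed so runs are repeatable
for n in (1, 4, 16, 64):
    print(f"n={n:>2}: mean absolute error = {mean_abs_error(n, rng):.4f}")
print("expected monorchid share:", monorchid_fraction())  # prints 1/8
```

With these made-up inputs, the mean absolute error should shrink roughly as one over the square root of n, so 64 readings cut it by about a factor of eight; the exact printed values depend on the random seed.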