Abstract
Objective: To assess the internal consistency and inter-rater reliability of a clinical evaluation exercise (CEX) format designed to be easy to use yet sufficiently detailed to achieve uniform recording of the observed examination.
Design: A comparison of 128 CEXs conducted for 32 internal medicine interns by full-time faculty. Alpha coefficients are reported as measures of internal consistency, along with several measures of inter-rater reliability.
Setting: A university internal medicine program. Observations were conducted at the end of the internship year.
Participants: Participants were 32 interns; observers were 12 full-time faculty in the department of medicine. The entire intern group was included to maximize the spectrum of abilities represented. Patients were recruited by the chief resident from the inpatient medical service based on their ability and willingness to participate.
Intervention: Each intern was observed twice, with two examiners present during each CEX. The examiners received standardized preparation and used a format developed over five years of pilot studies.
Measurements and main results: The format had excellent internal consistency; alpha coefficients ranged from 0.79 to 0.99. However, multiple methods of determining inter-rater reliability yielded similarly low results: intraclass correlations ranged from 0.23 to 0.50, and generalizability coefficients ranged from a low of 0.00 for the overall rating of the CEX to a high of 0.61 for the physical examination section. Transforming scores to eliminate rater effects and dichotomizing results into pass/fail did not enhance reliability.
Conclusions: Although the CEX is a valuable didactic tool, its psychometric properties preclude reliable assessment of clinical skills from a one-time observation.
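The alpha coefficients reported above are measures of internal consistency (Cronbach's alpha). As a minimal illustration of how such a coefficient is computed, the sketch below applies the standard formula alpha = k/(k-1) * (1 - sum of item variances / variance of total scores) to an invented score matrix; the numbers are hypothetical and are not data from this study.

```python
# Illustrative computation of Cronbach's alpha for a subjects-by-items
# score matrix. The demo scores below are invented, not study data.

def cronbach_alpha(scores):
    """scores: list of rows (one per subject), columns are rated items."""
    k = len(scores[0])                      # number of items
    def var(xs):                            # population variance
        m = sum(xs) / len(xs)
        return sum((x - m) ** 2 for x in xs) / len(xs)
    item_vars = [var([row[j] for row in scores]) for j in range(k)]
    total_var = var([sum(row) for row in scores])
    # alpha = k/(k-1) * (1 - sum of item variances / variance of totals)
    return (k / (k - 1)) * (1 - sum(item_vars) / total_var)

# Hypothetical ratings of 4 interns on 3 items of a CEX form
demo = [[7, 8, 7], [5, 5, 6], [9, 9, 8], [4, 5, 4]]
print(round(cronbach_alpha(demo), 2))  # prints 0.97
```

High alpha, as in this invented example, indicates only that the items move together across subjects; as the study's results show, it says nothing about whether two raters observing the same encounter would agree.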
Additional information
Received from the Division of General Internal Medicine, University of Pittsburgh, Pittsburgh, Pennsylvania.
Supported by a grant from the American Board of Internal Medicine.
Cite this article
Kroboth, F.J., Hanusa, B.H., Parker, S. et al. The inter-rater reliability and internal consistency of a clinical evaluation exercise. J Gen Intern Med 7, 174–179 (1992). https://doi.org/10.1007/BF02598008