Abstract
Purpose
Berry et al.’s (J Appl Psychol 96:881–906, 2011) meta-analysis of cognitive ability test validity data across employment, college admissions, and military domains demonstrated that validity is lower for Black and Hispanic subgroups than for Asian and White subgroups. However, Berry et al. relied on observed test-criterion correlations and it is therefore not clear whether validity differences generalize beyond observed validities. The present study investigates the roles that range restriction and criterion contamination play in differential validity.
Design/Methodology/Approach
A large dataset (N > 140,000) containing SAT scores and college grades of Asian, Black, Hispanic, and White test takers was used. Within-race corrections for multivariate range restriction were applied. Differential validity analyses were carried out using freshman GPA versus individual course grades as criteria to control for the contaminating influence of individual differences between students in course choice.
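To illustrate the kind of adjustment involved, the univariate (Thorndike Case II) correction for direct range restriction can be sketched as follows. This is a simplified sketch with hypothetical numbers; the study itself applied multivariate corrections within each racial/ethnic subgroup, which this single-predictor formula does not capture.

```python
import math

def correct_for_range_restriction(r_restricted, u):
    """Thorndike Case II correction for direct range restriction.

    r_restricted : observed predictor-criterion correlation in the
                   range-restricted (selected) sample
    u            : ratio of the unrestricted to the restricted predictor
                   standard deviation (u > 1 under range restriction)
    """
    numerator = r_restricted * u
    denominator = math.sqrt(1 + r_restricted ** 2 * (u ** 2 - 1))
    return numerator / denominator

# Hypothetical example: an observed SAT-FGPA correlation of .35 in an
# admitted sample whose SAT standard deviation is two-thirds that of
# the applicant pool (u = 1.5). The corrected validity is larger.
r_corrected = correct_for_range_restriction(0.35, 1.5)
print(round(r_corrected, 3))  # → 0.489
```

Because the amount of range restriction can differ across subgroups, applying such corrections within each subgroup can change the size of subgroup validity differences, which is the logic the study's multivariate corrections follow.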
Findings
Observed validities understated the magnitude of validity differences between subgroups: when range restriction and criterion contamination were controlled, subgroup validity differences grew larger rather than smaller. Analyses also demonstrated that these validity differences would translate into larger regression slope differences (i.e., differential prediction).
Implications
Subgroup differences in range restriction and/or individual differences in course choice cannot account for the lower validity of the SAT for Black and Hispanic subgroups; controlling for these factors increased subgroup validity differences. Future research must look to other explanations for subgroup validity differences.
Originality
The present study is the first differential validity study to simultaneously control for range restriction and individual differences in course choice, and answers a call to investigate potential causes of differential validity.
Notes
The matched course datasets included only those SAT-ICG correlations drawn from courses with both minority and White students. This was the most appropriate way to control for differential course-taking patterns because most courses White students took did not include minority students. However, this greatly reduced the number of courses that would have otherwise been included for the White sample. To investigate the influence of this decision rule, we ran parallel analyses including all SAT-ICG correlations, regardless of whether they included both minority and White students. Validity estimates remained very similar and study conclusions were unchanged. Detailed results are available upon request from the first author.
Parallel ICG analyses were run using subgroup-specific estimates of number of courses taken in the freshman year (which ranged from 9.50 to 10.01 across subgroups) and intercorrelation between ICGs (which ranged between .38 and .49). No results changed by more than 3 correlation points, and the general pattern of results (i.e., Asian and White validities essentially equivalent with Black and Hispanic validities being lower) did not change.
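The sensitivity check in this note can be illustrated with the standard formula for the correlation between a predictor and an equally weighted composite of k criterion measures. The sketch below uses hypothetical inputs (a nominal SAT-ICG validity of .30) rather than the authors' actual values, varying k and the ICG intercorrelation across the ranges reported above.

```python
import math

def composite_validity(r_xy, k, r_yy):
    """Correlation between a predictor and the sum of k criterion
    measures (e.g., individual course grades), where r_xy is the
    average predictor-criterion correlation and r_yy the average
    intercorrelation among the criterion measures."""
    return (k * r_xy) / math.sqrt(k + k * (k - 1) * r_yy)

# Hypothetical SAT-ICG validity of .30, with the number of freshman
# courses (k) and ICG intercorrelation varied across the ranges
# reported in the note (k = 9.5 to 10, r_yy = .38 to .49).
low = composite_validity(0.30, 9.5, 0.49)
high = composite_validity(0.30, 10.0, 0.38)
print(round(low, 3), round(high, 3))  # → 0.407 0.451
```

The computation shows how the assumed number of courses and their intercorrelation feed into the composite validity, and hence why the authors checked that subgroup-specific values for these parameters left the pattern of results unchanged.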
References
Aguinis, H., Culpepper, S. A., & Pierce, C. A. (2010). Revival of test bias research in preemployment testing. Journal of Applied Psychology, 95, 648–680.
Aguinis, H., & Smith, M. A. (2007). Understanding the impact of test validity and bias on selection errors and adverse impact in human resource selection. Personnel Psychology, 60, 165–199.
Beaujean, A. A., Firmin, M. W., Knoop, A. J., Michonski, J. D., Berry, T. P., & Lowrie, R. E. (2006). Validation of the Frey and Detterman (2004) IQ prediction equations using the Reynolds intellectual assessment scales. Personality and Individual Differences, 41, 353–357.
Berry, C. M. (2007). Toward an understanding of evidence of differential validity of cognitive ability tests for racial/ethnic subgroups. Unpublished doctoral dissertation, University of Minnesota.
Berry, C. M., Clark, M. A., & McClure, T. K. (2011). Racial/ethnic differences in the criterion-related validity of cognitive ability tests: A qualitative and quantitative review. Journal of Applied Psychology, 96, 881–906.
Berry, C. M., & Sackett, P. R. (2009). Individual differences in course choice result in underestimation of the validity of college admissions systems. Psychological Science, 20, 822–830.
Breland, H. M. (1979). Population validity and college entrance measures (College Board Research and Development Report RDR 78–79 No. 2). Princeton, NJ: Educational Testing Service.
Bridgeman, B., McCamley-Jenkins, L., & Ervin, N. (2000). Predictions of freshman grade-point average from the revised and recentered SAT I: Reasoning test (College Board Research Report No. 2000-1; Educational Testing Service Research Report No. 00–1). New York: College Entrance Examination Board.
Campbell, J. P. (1990). An overview of the Army selection and classification project (Project A). Personnel Psychology, 43, 231–239.
Elliott, R., & Strenta, A. C. (1988). Effects of improving the reliability of the GPA on prediction generally and on comparative predictions for gender and race particularly. Journal of Educational Measurement, 25, 333–347.
Frey, M. C., & Detterman, D. K. (2004). Scholastic assessment or g? The relationship between the scholastic assessment test and general cognitive ability. Psychological Science, 15, 373–378.
Ghiselli, E. E., Campbell, J. P., & Zedeck, S. (1981). Measurement theory for the behavioral sciences. San Francisco, CA: W. H. Freeman & Co.
Gulliksen, H. (1950). Theory of mental tests. New York: Wiley.
Hunter, J. E. (1980). Validity generalization for 12,000 jobs: An application of synthetic validity and validity generalization to the general aptitude test battery (GATB). Washington, D.C.: U.S. Department of Labor, Employment Service.
Hunter, J. E., & Schmidt, F. L. (2004). Methods of meta-analysis: Correcting error and bias in research findings (2nd ed.). Thousand Oaks, CA: Sage.
Hunter, J. E., Schmidt, F. L., & Hunter, R. (1979). Differential validity of employment tests by race: A comprehensive review and analysis. Psychological Bulletin, 86, 721–735.
Hunter, J. E., Schmidt, F. L., & Le, H. (2006). Implications of direct and indirect range restriction for meta-analysis methods and findings. Journal of Applied Psychology, 91, 594–612.
Johnson, J. W., Carter, G. W., Davison, H. K., & Oliver, D. H. (2001). A synthetic validity approach to testing differential prediction hypotheses. Journal of Applied Psychology, 86, 774–780.
Keef, S. P., & Roberts, L. A. (2004). The meta-analysis of partial effect sizes. British Journal of Mathematical and Statistical Psychology, 57, 97–129.
Kuncel, N. R., Crede, M., & Thomas, L. L. (2005). The validity of self-reported grade point averages, class ranks, and test scores: A meta-analysis and review of the literature. Review of Educational Research, 75, 63–82.
Kuncel, N. R., Hezlett, S. A., & Ones, D. S. (2004). Academic performance, career potential, creativity, and job performance: Can one construct predict them all? Journal of Personality and Social Psychology, 86, 148–161.
Lautenschlager, G. J., & Mendoza, J. L. (1986). A step-down hierarchical multiple regression analysis for examining hypotheses about test bias in prediction. Applied Psychological Measurement, 10, 133–139.
Linn, R. L. (1978). Single-group validity, differential validity, and differential predictions. Journal of Applied Psychology, 63, 507–514.
Linn, R. L. (1983). Pearson selection formulas: Implications for studies of predictive bias and estimates of educational effects in selected samples. Journal of Educational Measurement, 20, 1–15.
Mattern, K. D., Patterson, B. F., Shaw, E. J., Kobrin, J. L., & Barbuti, S. M. (2008). Differential validity and prediction of the SAT (College Board Research Report No. 2008-4). New York: College Board.
McDaniel, M. A., Kepes, S., & Banks, G. C. (2011). The Uniform Guidelines are a detriment to the field of personnel selection. Industrial and Organizational Psychology: Perspectives on Science and Practice, 4, 494–514.
Mendoza, J. L., Bard, D. E., Mumford, M. D., & Ang, S. C. (2004). Criterion-related validity in multiple-hurdle designs: Estimation and bias. Organizational Research Methods, 7, 418–441.
Morgan, R. (1990). Analyses of predictive validity within student categorizations. In W. W. Willingham, C. Lewis, R. Morgan, & L. Ramist (Eds.), Predicting college grades: An analysis of institutional trends over two decades (pp. 225–238). Princeton, NJ: Educational Testing Service.
Naylor, J. C., & Shine, L. C. (1965). A table for determining the increase in mean criterion score obtained by using a selection device. Journal of Industrial Psychology, 3, 33–42.
Ramist, L., Lewis, C., & McCamley, L. (1990). Implications of using freshman GPA as the criterion for the predictive validity of the SAT. In W. W. Willingham, C. Lewis, R. Morgan, & L. Ramist (Eds.), Predicting college grades: An analysis of institutional trends over two decades (pp. 253–288). Princeton, NJ: Educational Testing Service.
Rigdon, J. L., Shen, W., Kuncel, N. R., Sackett, P. R., Beatty, A. S., & Kiger, T. B. (in press). The role of socioeconomic status in SAT-Freshman grade relationships across gender and racial subgroups. Journal of Educational Measurement.
Rotundo, M., & Sackett, P. R. (1999). Effect of rater race on conclusions regarding differential prediction in cognitive ability tests. Journal of Applied Psychology, 84, 815–822.
Ryan, A. M., McFarland, L., Baron, H., & Page, R. (1999). An international look at selection practices: Nation and culture as explanations for variability in practice. Personnel Psychology, 52, 359–391.
Sackett, P. R., & Yang, H. (2000). Correction for range restriction: An expanded typology. Journal of Applied Psychology, 85, 112–118.
Schmidt, F. L., & Hunter, J. E. (1998). The validity and utility of selection methods in personnel psychology: Practical and theoretical implications of 85 years of research findings. Psychological Bulletin, 124, 262–274.
Schmidt, F. L., Pearlman, K., & Hunter, J. E. (1980). The validity and fairness of employment and educational tests for Hispanic Americans: A review and analysis. Personnel Psychology, 33, 705–724.
Society for Industrial and Organizational Psychology, Inc. (2003). Principles for the validation and use of personnel selection procedures (4th ed.). Bowling Green, OH: SIOP.
Trattner, M. H., & O’Leary, B. S. (1980). Sample sizes for specified statistical power in testing for differential validity. Journal of Applied Psychology, 65, 127–134.
Young, J. W., & Kobrin, J. L. (2001). Differential validity, differential prediction, and college admission testing: A comprehensive review and analysis (College Board Research Report No. 2001-6). New York: College Board.
Acknowledgments
This project was supported in part by the Meredith P. Crawford Fellowship from the Human Resources Research Organization. We thank the College Board for providing the data for this project. We also thank Richard Landers and Haoyu Yu for invaluable computer programming support.
Berry, C.M., Sackett, P.R. & Sund, A. The Role of Range Restriction and Criterion Contamination in Assessing Differential Validity by Race/Ethnicity. J Bus Psychol 28, 345–359 (2013). https://doi.org/10.1007/s10869-012-9284-3