The ubiquity of common method variance: The case of the Big Five

https://doi.org/10.1016/j.jrp.2011.05.001

Abstract

The factor structures of the International Personality Item Pool (IPIP) and NEO-FFI Big Five questionnaires were examined via confirmatory factor analyses. Analyses of IPIP data for five samples and NEO data for one sample showed that a CFA model with three method bias factors (one influencing all items, one influencing negatively worded items, and one influencing positively worded items) fit the data significantly better than models without method factors or models with only one method factor. With the method factors estimated, our results indicated that the Big Five dimensions may be more nearly orthogonal than previously demonstrated. Implications of the presence of method variance in Big Five scales are discussed.

Highlights

  • CFA models with method factors fit Big Five questionnaire data significantly better than models without method factors.

  • Models with a general method factor and method factors specific to item wording provided the best fit.

  • Big Five dimensions in models estimating method factors were more nearly orthogonal than Big Five scale scores.

  • Relationships to other variables suggest that the method factors represent substantive personality characteristics.

Introduction

In the past 30 years there has been a resurgence in the study of personality in psychology, due primarily to the discovery of a common factor structure underlying measures of personality characteristics. The dominant taxonomy is a lexically based five-factor structure originally developed within countries that use Northern European languages (e.g., Saucier & Goldberg, 2003). Most popularly known as the Big Five, this framework includes the traits of Extraversion (E), Agreeableness (A), Conscientiousness (C), Emotional Stability (S, often measured as Neuroticism), and Openness to Experience (O, sometimes measured as Intellect).

Despite the wide acceptance and application of this personality framework, several measurement-related issues have continued to challenge personality researchers. In particular, although the Big Five are conceived of as orthogonal dimensions of personality, summated scale scores on most Big Five personality tests are generally moderately positively correlated (e.g., Digman, 1997; Mount et al., 2005). There are at least two explanations for this. The first is that the five factors commonly estimated are themselves correlated and perhaps indicators of higher order factors. More specifically, it has been suggested that the Big Five factors are indicators of the higher order factors of stability (indicated by agreeableness, conscientiousness, and the inverse of neuroticism) and plasticity (indicated by openness and extraversion) (DeYoung et al., 2001; Digman, 1997). Others have suggested instead that there may be one overriding personality factor, labeled evaluation (Goldberg & Somer, 2000; Saucier, 1997), the “Big One” (Musek, 2007), or the general factor of personality (GFP) (Rushton et al., 2008; Van der Linden et al., 2010).

A second explanation for the commonly identified positive relationships among Big Five scale scores is that there is a separate source of influence that affects responses to all items in these questionnaires, and that this influence is somehow distinct from that of the Big Five factors themselves. Often this type of shared influence across scores collected using a specific method is referred to as common method bias (Podsakoff, MacKenzie, Lee, & Podsakoff, 2003). The word bias in this context refers to the individual differences that become manifest when the same method is used across multiple personality scales. Associated with this common bias is the notion of common method variance, which, in the present context, can be understood as variance in Big Five scale item responses throughout a measure that is due to the influence of common method bias.
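To make this idea concrete, the influence of common method bias on an item response can be written as an additive latent-variable decomposition. The following is a minimal sketch in our own notation, not taken from the article; the symbols λ, γ, and the subscripting conventions are our labels.

```latex
% Response of person i to item j, where t(j) indexes the Big Five trait
% measured by item j:
%   \lambda_j : loading of item j on its trait factor
%   \gamma_j  : loading of item j on the common method factor
%   M_i       : common method bias (an individual difference) for person i
%   e_{ij}    : item-specific error
x_{ij} = \lambda_j \, T_{i,\,t(j)} + \gamma_j \, M_i + e_{ij}
```

Under such a decomposition, person-level variance in M contributes positively to the covariance of every pair of items, including pairs drawn from different Big Five scales, which is one way scale scores can become positively correlated even when the trait factors themselves are orthogonal.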

The existence of common method variance has been recognized in questionnaire research for many years (e.g., Cote and Buckley, 1987, Doty and Glick, 1998). Most research on this topic has been based on analyses of multitrait–multimethod data from a single measure, usually an isolated scale or domain score, per trait-method combination. Although helpful in highlighting the potential impact of common method bias, such investigations have not made it possible to separate within-dimension covariance from between-dimension covariance (Tomas, Hontangas, & Oliver, 2000). Indeed, until recently, the study of common method variance based on analyses of individual items or representative item parcels has been neglected. This is unfortunate, given that such analyses are necessary to properly estimate and compare within- and between-dimension variability (Marsh, Scalas, & Nagengast, 2010).

One of the first studies to permit this type of variability separation using Big Five measure data was Schmit and Ryan (1993), in which analyses of multiple item composites from each dimension revealed the potential for measures of the Big Five traits to include common method variance. Schmit and Ryan factor analyzed responses to item composites of the NEO-FFI (Costa & McCrae, 1989) within a work context using applicant and non-applicant samples. An exploratory factor analysis (EFA) of the non-applicant sample demonstrated the expected five-factor solution, but in the applicant sample, a six-factor solution fit the data best. Schmit and Ryan labeled this sixth factor an “ideal employee” factor, noting that it “included a conglomerate of item composites from across four of the five subscales of the NEO-FFI” (Schmit & Ryan, 1993, p. 971). Interestingly, items from all five NEO-FFI subscales loaded on this factor, suggesting that the “ideal employee” factor represented a form of common method bias.

Beginning in the late 1990s, confirmatory factor analyses (CFAs) of questionnaire items or parcels were conducted to identify and study method biases. These studies included analyses of data collected with the Rosenberg Self-Esteem scale (Marsh, 1996; Marsh et al., 2010; Motl & DiStefano, 2002; Tomás & Oliver, 1999) and investigations of the possibility that common method bias may represent or reflect respondent faking or socially desirable responding in certain situations (e.g., Biderman & Nguyen, 2004; Bäckström, 2007; Bäckström et al., 2009; Cellar et al., 1996; Klehe, 2011; Ziegler & Buehner, 2009). In these experimental studies, participants were typically asked to respond to Big Five measure items under faking and no-faking conditions. In the faking conditions, variance common to all items was represented by a single latent variable similar to what Podsakoff et al. (2003) labeled an “unmeasured method” effect. However, with the exception of Bäckström (2007) and Bäckström et al. (2009), the study of common method variance in Big Five questionnaires under non-applicant, honest-response conditions has received little attention. This is problematic, given that in most scenarios participants are instructed to do precisely that – respond honestly.

This limitation of previous research, combined with the fact that most personality assessment relies on self-report completion of personality inventories, leaves a major deficit in our understanding of what is actually being assessed when we use common personality measures such as those designed to capture the Big Five traits. Further complicating matters is a common recommendation for developing and/or choosing items for self-report measures: that both positively worded items (e.g., “I am the life of the party”) and negatively worded items (e.g., “I don’t talk a lot”) be included in a single scale. The logic behind including both types of items is that their presence might reduce the effects of participant response tendencies such as acquiescence (DeVellis, 1991; Nunnally, 1978; Spector, 1998). This recommendation has been so widely shared that the practice of using negatively worded items to presumably counteract respondents’ acquiescence can be found throughout most areas of organizational research, including personality assessment (e.g., Paulhus, 1991), leadership behavior (e.g., Schriesheim & Eisenbach, 1995; Schriesheim & Hill, 1981), role stress (Rizzo, House, & Lirtzman, 1970), job characteristics (Harvey, Billings, & Nilan, 1985), and organizational commitment (e.g., Meyer & Allen, 1984).

Unfortunately, the negatively worded items that were introduced to counter individuals’ response tendencies have been found to increase systematic and perhaps construct-irrelevant variance in scale scores in studies: (a) of self-esteem (e.g., Hensley & Roberts, 1976; Marsh, 1996; Marsh et al., 2010; Motl & DiStefano, 2002; Tomás & Oliver, 1999), (b) using Rizzo, House, and Lirtzman’s (1970) role conflict and role ambiguity scale (McGee, Ferguson, & Seers, 1989), (c) using Meyer and Allen’s (1984) Organizational Commitment scale (Magazine, Williams, & Williams, 1996), (d) using Spector’s (1988) Work Locus of Control Scale, and (e) using Hackman and Oldham’s (1975) Job Diagnostic Survey (Idaszak & Drasgow, 1987). In addition to the “noise” interjected by such items and the potential multidimensionality they introduce, the inclusion of negatively worded items in leadership behavior measures has been shown to decrease a scale’s reliability and validity (Schriesheim & Eisenbach, 1995; Schriesheim & Hill, 1981).

Recently, Marsh et al. (2010), using confirmatory factor analyses, provided evidence for two conclusions regarding the factorial structure of questionnaires employing negatively worded items. First, Marsh et al. found that a model with two method factors (one influencing only positively worded items and the other influencing only negatively worded items) fit the Rosenberg Self-Esteem (RSE) scale data better than models without method factors and better than models with only one wording-type factor. Although other researchers had found item-wording influences associated with negatively worded items (e.g., DiStefano & Motl, 2006), Marsh et al.’s results provided evidence for analogous influences associated with positively worded items. Second, based on longitudinal models, Marsh et al. found that the positive- and negative-wording influences were not sporadic and spontaneous but substantive and stable over time. These two findings, coupled with other studies in which method factors have been implicated (Cote & Buckley, 1987; Doty & Glick, 1998), suggest that method effects, including item-wording-specific method effects, may be influential whenever personality is assessed using self-report questionnaires.

Given the mounting evidence for the prevalence of common method variance in personality assessment and the increasing use of personality assessments in organizational research and practice, it is surprising that few attempts have been made to examine the effects of general method bias and item-wording biases on the factor structure of Big Five measures. As mentioned previously, studies estimating a common method factor have for the most part focused on identifying socially desirable responding. Apart from the above-mentioned studies, there have been no published CFA models of Big Five questionnaire data that included item-wording factors. For all of these reasons, the main purpose of the present study was to closely examine the factor structures of two commonly used Big Five questionnaires, the IPIP and the NEO-FFI, to assess the extent to which responses to items in these questionnaires are influenced by a general method factor and/or wording-specific method factors. This was done in a fashion similar to that of Marsh et al. (2010), by comparing CFA models with different assumptions concerning general method factors and wording-specific method factors.

The specific models compared in this study are presented in Fig. 1. Model 1 is a basic CFA of a Big Five questionnaire with correlated trait factors but no method factor. If a common method influence affected all 50 items of the instrument, that influence could be accommodated in this model only through increased positive correlations among the factors (Paglis & William, 1996; Williams & Brown, 1994). In Model 2, a single method factor, M, has been added to the basic CFA of Model 1 (e.g., Bäckström, 2007; Bäckström et al., 2009; Cellar et al., 1996). M is defined as an “unmeasured” method factor in that it has no unique indicators; rather, it is estimated from the indicators of the Big Five factors. M is not a higher order factor but a first order factor whose indicators are items that also serve as indicators of the Big Five factors. This type of model, in which observed variables indicate multiple factors, has been called a bifactor model (e.g., Chen, West, & Sousa, 2006).
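As an illustration, a minimal sketch of Models 1 and 2 follows in Python using the semopy package, which accepts lavaan-style model syntax (the sketch assumes semopy’s lavaan-like support for fixed parameter values via the 0* prefix). Everything here is hypothetical scaffolding: the item names (e1, a1, …) and the two-items-per-trait layout stand in for the 50 IPIP items, and the data file name is a placeholder.

```python
import pandas as pd
from semopy import Model, calc_stats  # semopy: SEM for Python

# Model 1: correlated Big Five trait factors, no method factor.
# (Hypothetical two-item-per-trait layout; the study used 10 items per trait.
# Trait factors covary by default; add explicit E ~~ A, ... lines if needed.)
MODEL1 = """
E =~ e1 + e2
A =~ a1 + a2
C =~ c1 + c2
S =~ s1 + s2
O =~ o1 + o2
"""

# Model 2: Model 1 plus an 'unmeasured' method factor M that loads on
# every item and is constrained to be orthogonal to the trait factors.
MODEL2 = MODEL1 + """
M =~ e1 + e2 + a1 + a2 + c1 + c2 + s1 + s2 + o1 + o2
M ~~ 0*E
M ~~ 0*A
M ~~ 0*C
M ~~ 0*S
M ~~ 0*O
"""

df = pd.read_csv("big5_items.csv")  # hypothetical item-level data file

for name, desc in [("Model 1", MODEL1), ("Model 2", MODEL2)]:
    m = Model(desc)
    m.fit(df)
    print(name)
    print(calc_stats(m).T)  # chi-square, df, CFI, RMSEA, etc.
```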

Model 3 is a model with two separate first order method factors – one influencing only positively worded items and the other influencing only negatively worded items. This model is analogous to the one that Marsh et al. (2010) found to best represent the RSE data mentioned previously. In that study, Marsh et al. had to restrict the two method factors to orthogonality because the RSE assesses only one trait dimension. Because Big Five questionnaires assess five trait dimensions, it is possible to apply a slightly more elaborate model than Marsh et al.’s and estimate the covariance between the two item-wording factors. Note that Model 2 is a special case of Model 3 in which the correlation between the two item-wording factors is set to one.
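Continuing the hypothetical sketch above, Model 3 replaces the single M with two wording-specific factors whose covariance is freely estimated; which items load on Mp versus Mn would be determined by the actual wording key, which is invented here.

```python
# Model 3: separate method factors for positively (Mp) and negatively (Mn)
# worded items, with their covariance freely estimated.  The assignment of
# items to Mp and Mn below is invented for illustration.
MODEL3 = MODEL1 + """
Mp =~ e1 + a1 + c1 + s1 + o1
Mn =~ e2 + a2 + c2 + s2 + o2
Mp ~~ Mn
"""
# Model 2 is the special case of Model 3 in which corr(Mp, Mn) = 1,
# i.e., the two wording factors collapse into a single method factor M.
```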

Models 4–6 are generalizations of Model 2 that include both an overall common method factor and one or more method factors specific to a wording type: a positive-wording method factor in Model 4, a negative-wording method factor in Model 5, or both in Model 6. In these models, the correlations between method factors are restricted to zero. Because of this orthogonality restriction, Models 4–6 are generalizations of Model 2 but not of Model 3; consequently, chi-square difference tests can be used to compare them with Model 2 but not with Model 3. The last model in the series, Model 7, is a simple generalization of Model 6 in which the correlation between Mp and Mn is estimated; only Mp and Mn were allowed to covary, and both were constrained to be orthogonal to M. Because Model 7 is a generalization of Model 3, chi-square difference tests could be used to compare the fit of Model 7 with that of Model 3.
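Because Models 4–6 nest Model 2, and Model 7 nests Model 3, those pairs can be compared with chi-square difference tests. A minimal sketch of the test follows, assuming the chi-square statistic and degrees of freedom of each fitted model are already in hand; the numeric values are invented for illustration only.

```python
from scipy.stats import chi2


def chi_square_difference(chi2_restricted, df_restricted,
                          chi2_general, df_general):
    """Chi-square (likelihood-ratio) difference test for nested CFA models.

    The restricted model (e.g., Model 2) must be a special case of the
    general model (e.g., Model 4); it has the larger chi-square and df.
    """
    d_chi2 = chi2_restricted - chi2_general
    d_df = df_restricted - df_general
    p = chi2.sf(d_chi2, d_df)  # survival function = 1 - CDF
    return d_chi2, d_df, p


# Invented fit values, for illustration only:
d, ddf, p = chi_square_difference(2510.3, 1165, 2430.8, 1115)
print(f"delta chi-square = {d:.1f}, delta df = {ddf}, p = {p:.4g}")
```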

To be clear, the conceptualizations underlying Model 3 and Models 4 through 7 are quite different. In Model 3, each of the two method biases influences only one type of item – Mp influences only positively worded items and Mn influences only negatively worded items. This acknowledges the distinction between positive and negative wordings but does not account for any general influence that might operate on all the items, such as faking or socially desirable responding. Models 4 through 7, however, include a general influence while allowing for a second, wording-specific influence on positively worded items, negatively worded items, or both. Thus, if there are individual differences in a personal characteristic that affects all the items, such as socially desirable responding, those differences would be represented by M; individual differences specific to item wording would be represented by Mp or Mn.

In addition to investigating the need to incorporate method bias factors in the analysis of Big Five questionnaire data, we addressed an implication of the presence of method bias for estimated relationships involving Big Five dimensions. Specifically, we investigated the extent to which the Big Five dimensions remain correlated after common method variance is taken into account. Whether the resulting correlations are essentially zero or even negative is a key issue for higher order factor theories of the Big Five (Digman, 1997; Musek, 2007). If the dimensions remain positively correlated, this leaves open the possibility that higher order factors could account for those correlations. Alternatively, uncorrelated Big Five dimensions after accounting for method variance would be evidence against the possibility that there are substantively meaningful higher order dimensions of which the Big Five factors are indicators.

Based on the above discussion, we entertained the following hypotheses.

  • H1: Adding a common method factor will significantly improve the measurement model fit.

  • H2: Models with separate item wording method factors will have significantly better fit than a model estimating only one method factor.

Although the results of studies of socially desirable responding suggest that common variance increases when instructions or incentives to fake are present, we have no evidence indicating whether such an increase is due to the influence of a single bias factor, M, or to the joint influences of Mp and Mn. For this reason, we had no specific basis for expecting differences between Model 3 and Models 4 through 7; no hypotheses concerning differences in fit between them are presented, and the analyses comparing them are treated as exploratory.

As argued above, method-induced interitem correlations would affect estimates of the correlations between the Big Five latent variables in models without common method factors. Failure to account for the influences of such factors would positively bias estimates of the correlations between the Big Five dimensions. Because it is likewise not possible to account for method factors when correlating scale scores, we would expect correlations between scale scores to be similarly positively biased. Accounting for interitem correlations with method factors, however, should reduce the positive bias in estimates of the correlations between the Big Five dimensions. For this reason we propose the following hypothesis.

  • H3: Correlations between factors in models that include common method factors will be less positive than the correlations between raw scale scores.

On the assumption that H3 would be supported, we explored the extent to which the estimates of Big Five dimensions changed when those estimates were obtained within the context of method bias models.
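One way to operationalize the comparison in H3 is to correlate summed scale scores and set those correlations beside the latent trait correlations estimated by a method-factor model. A minimal sketch follows, under the same hypothetical item layout and placeholder file name used in the sketches above.

```python
import pandas as pd

# Hypothetical layout: two (already reverse-keyed) items per trait.
SCALES = {
    "E": ["e1", "e2"], "A": ["a1", "a2"], "C": ["c1", "c2"],
    "S": ["s1", "s2"], "O": ["o1", "o2"],
}

df = pd.read_csv("big5_items.csv")  # hypothetical item-level data file

# Raw scale scores: simple sums of item responses per trait.
scores = pd.DataFrame({t: df[cols].sum(axis=1) for t, cols in SCALES.items()})

# H3 predicts these correlations exceed the latent trait correlations
# obtained from a model that also estimates method factors (e.g., Model 7).
print(scores.corr().round(2))
```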

Section snippets

Participants

Data for this research came from five separate samples.

Results

Space limitations prevent the presentation of covariance matrices of all individual items, although those are available from the first author. To provide some indication of the comparability of these five datasets with others, Table 1 presents correlations of Big Five scale scores for all six administrations of questionnaires – IPIP questionnaires from all five samples and the NEO from one sample – along with means, standard deviations, and reliability coefficients for these scales. Inspection

Discussion

The objectives of the present study were to investigate the extent to which the data of two widely used Big Five personality trait questionnaires were affected by common method biases and to examine the nature and implications of such biases. Results suggest that the measurement model of the Big Five should take into account two types of method bias – one general bias factor influencing all items and a second type of bias factor influencing items worded either positively or negatively. The

Conclusions

Common method variance is an aspect of responses to personality and other questionnaires of which investigators have long been aware but which has nevertheless been long neglected. Now may be the time to begin to understand it and to investigate the extent to which it affects other aspects of behavior. The payoff may be an understanding of characteristics of personality that have been hidden in personality questionnaires all along.

References (74)

  • Bäckström, M. (2007). Higher-order factors in a five-factor personality inventory and its relation to social desirability. European Journal of Psychological Assessment.

  • Biderman, M. D., & Nguyen, N. T. (2004). Structural equation models of faking ability in repeated measures designs. In...

  • Biderman, M. D., & Nguyen, N. T. (2009). Measuring faking propensity. In Paper presented at the 24th annual conference...

  • Biderman, M. D., Nguyen, N. T., Mullins, B., & Luna, J. (2008). A method factor predictor of performance ratings. In...

  • Biderman, M. D., Nguyen, N. T., & Cunningham, C. L. (2009). Common method variance in NEO-FFI and IPIP personality...

  • Biderman, M. D., Nguyen, N. T., & Cunningham, C. L. (2011). A method factor measure of self-concept. In Paper accepted...

  • Bollen, K. A., et al. (1998). Detection and determinants of bias in subjective measures. American Sociological Review.

  • Cellar, D. F., et al. (1996). Comparison of factor structures and criterion-related validity coefficients for two measures of personality based on the five factor model. Journal of Applied Psychology.

  • Chen, F. F., et al. (2006). A comparison of bifactor and second-order models of quality of life. Multivariate Behavioral Research.

  • Cordery, J. L., et al. (1993). Responses to the original and revised Job Diagnostic Survey: Is education a factor in responses to negatively worded items? Journal of Applied Psychology.

  • Costa, P. T., et al. (1989). The NEO PI/FFI manual supplement.

  • Costello, C. G., et al. (1967). Scales for measuring depression and anxiety. The Journal of Psychology.

  • Cote, J. A., et al. (1987). Estimating trait, method, and error variance: Generalizing across 70 construct validation studies. Journal of Marketing Research.

  • Cunningham, C. J. L. (2007). Need for recovery and ineffective self-management. Dissertation Abstracts International: Section B: The Sciences and Engineering.

  • Damron, J. (2004). An examination of the fakeability of personality questionnaires: Faking for specific jobs....

  • DeVellis, R. F. (1991). Scale development: Theory and applications.

  • DeYoung, C. G., et al. (2001). Higher-order factors of the Big Five predict conformity: Are there neuroses of health? Personality and Individual Differences.

  • Digman, J. M. (1997). Higher order factors of the Big Five. Journal of Personality and Social Psychology.

  • DiStefano, C., et al. (2006). Further investigating method effects associated with negatively worded items on self-report surveys. Structural Equation Modeling.

  • Doty, D. H., et al. (1998). Common methods bias: Does common methods variance really bias results? Organizational Research Methods.

  • Goldberg, L. R. A broad-bandwidth, public domain, personality inventory measuring the lower-level facets of several five-factor models.

  • Goldberg, L. R., et al. (2000). The hierarchical structure of common Turkish person-descriptive adjectives. European Journal of Personality.

  • Grice, J. (2001). Computing and evaluating factor scores. Psychological Methods.

  • Hackman, J. R., et al. (1975). Development of the job diagnostic survey. Journal of Applied Psychology.

  • Hall, R. J., et al. (1999). Item parceling strategies in SEM: Investigating the subtle effects of unmodeled secondary constructs. Organizational Research Methods.

  • Harvey, R. J., et al. (1985). Confirmatory factor analysis of the Job Diagnostic Survey: Good news and bad news. Journal of Applied Psychology.

  • Hensley, W. E., et al. (1976). Dimensions of Rosenberg’s self-esteem scale. Psychological Reports.