The Controversy over Null Hypothesis Significance Testing Revisited
Abstract
Null hypothesis significance testing (NHST) is one of the most widely used methods for testing hypotheses in psychological research, yet it has remained shrouded in controversy throughout the almost seventy years of its existence. The present article reviews both the main criticisms of the method and the alternatives that have been put forward to complement or replace it. It focuses primarily on those alternatives whose use is recommended by the Task Force on Statistical Inference (TFSI) of the APA (Wilkinson and TFSI, 1999) in the interests of improving researchers' practices of statistical analysis and data interpretation. In addition, the arguments used to rebut each of the criticisms levelled against NHST are reviewed, and the main problems with each of the alternatives are pointed out. It is concluded that rigorous research activity requires the use of NHST in the appropriate context, the complementary use of other methods that provide information about aspects not addressed by NHST, and adherence to a series of recommendations that promote its rational use in psychological research.
References
Abelson, R.P. (1995). Statistics as principled argument. Hillsdale, NJ: Erlbaum.
Abelson, R.P. (1997). A retrospective on the significance test ban of 1999 (if there were no significance tests, they would be invented). In L.L. Harlow, S.A. Mulaik, & J.H. Steiger (Eds.), What if there were no significance tests? (pp. 117–144). Hillsdale, NJ: Erlbaum.
Allen, M., & Preiss, R. (1993). Replication and meta-analysis: A necessary connection. Journal of Social Behavior and Personality, 8(6), 9–20.
American Psychological Association (2001). Publication manual of the American Psychological Association (5th ed.). Washington, DC: Author.
Bakan, D. (1966). The test of significance in psychological research. Psychological Bulletin, 66, 423–437.
Baril, G.L., & Cannon, J.T. (1995). What is the probability that null hypothesis testing is meaningless? American Psychologist, 50, 1098–1099.
Berger, J.O., & Sellke, T. (1987). Testing a point null hypothesis: The irreconcilability of P values and evidence. Journal of the American Statistical Association, 82, 112–122.
Berkson, J. (1938). Some difficulties of interpretation encountered in the application of the χ² test. Journal of the American Statistical Association, 33, 526–542.
Binder, A. (1963). Further considerations on testing the null hypothesis and the strategy and tactics of investigating theoretical models. Psychological Review, 70, 107–115.
Bleymüller, J., Gehlert, G., & Gülicher, H. (1988). Statistik für Wirtschaftswissenschaften (5. Aufl.). München: Vahlen.
Bracey, G.W. (1991). Sense, non-sense, and statistics. Phi Delta Kappan, 73, 335–.
Brandstätter, E. (1999). Confidence intervals as an alternative to significance testing. Methods of Psychological Research Online, 4(2), 33–46.
Brewer, J.K. (1985). Behavioral statistics textbooks: Source of myths and misconceptions? Journal of Educational Statistics, 10, 252–268.
Carver, R.P. (1978). The case against statistical significance testing. Harvard Educational Review, 48, 378–399.
Carver, R.P. (1993). The case against statistical significance testing, revisited. Journal of Experimental Education, 61, 287–292.
Chow, S.L. (1987). Experimental psychology: Rationale, procedures and issues. Calgary, Alberta, Canada: Detselig Enterprises.
Chow, S.L. (1988). Significance test or effect size? Psychological Bulletin, 103, 105–110.
Chow, S.L. (1989). Significance tests and deduction: Reply to Folger (1989). Psychological Bulletin, 106, 161–165.
Chow, S.L. (1991). Some reservations about power analysis. American Psychologist, 46, 1088–1089.
Chow, S.L. (1996). Statistical significance: Rationale, validity, and utility. Beverly Hills, CA: Sage.
Chow, S.L. (1998a). Précis of statistical significance: Rationale, validity, and utility. Behavioral and Brain Sciences, 21, 169–239.
Chow, S.L. (1998b). What statistical significance means. Theory and Psychology, 8, 323–330.
Cleveland, W.S. (1993). Visualizing data. Summit, NJ: Hobart.
Cleveland, W.S., & McGill, M.E. (Eds.) (1988). Dynamic graphics for statistics. Belmont, CA: Wadsworth.
Cohen, J. (1962). The statistical power of abnormal-social psychological research: A review. Journal of Abnormal and Social Psychology, 65, 145–153.
Cohen, J. (1987). Statistical power analysis for the behavioral sciences (rev. ed.). Hillsdale, NJ: Erlbaum.
Cohen, J. (1988). Statistical power analysis for the behavioral sciences (2nd ed.). Hillsdale, NJ: Erlbaum.
Cohen, J. (1990). Things I have learned (so far). American Psychologist, 45, 1304–1312.
Cohen, J. (1994). The earth is round (p < .05). American Psychologist, 49, 997–1003.
Cook, T.D., & Campbell, D.T. (1979). Quasi-experimentation: Design and analysis issues for field settings. Chicago: Rand McNally.
Cooper, H.M. (1979). Statistically combining independent studies: A meta-analysis of sex differences in conformity research. Journal of Personality and Social Psychology, 37, 131–146.
Cooper, H.M., & Rosenthal, R. (1980). Statistical versus traditional procedures for summarizing research findings. Psychological Bulletin, 87, 442–449.
Cortina, J.M., & Dunlap, W.P. (1997). On the logic and purpose of significance testing. Psychological Methods, 2, 161–172.
Cowles, M. (1989). Statistics in psychology: An historical perspective. Hillsdale, NJ: Erlbaum.
Cowles, M., & Davis, C. (1982). On the origins of the .05 level of statistical significance. American Psychologist, 37, 553–558.
Cox, D.R. (1977). The role of significance tests. Scandinavian Journal of Statistics, 4, 49–70.
Cronbach, L.J. (1975). Beyond the two disciplines of scientific psychology. American Psychologist, 30, 116–127.
Cronbach, L.J., & Snow, R.E. (1977). Aptitudes and instructional methods: A handbook for research on interactions. New York: Irvington.
Crow, E.L. (1991). Response to Rosenthal's comment "How are we doing in soft psychology?" American Psychologist, 46, 1083–.
Cumming, G., & Finch, S. (2001). A primer on the understanding, use, and calculation of confidence intervals that are based on central and noncentral distributions. Educational and Psychological Measurement, 61, 532–574.
Dar, R. (1987). Another look at Meehl, Lakatos, and the scientific practices of psychologists. American Psychologist, 42, 145–151.
Dar, R., Serlin, R.C., & Omer, H. (1994). Misuse of statistical tests in three decades of psychotherapy research. Journal of Consulting and Clinical Psychology, 62, 75–82.
Dixon, P. (1998). Why scientists value p values. Psychonomic Bulletin and Review, 5, 390–396.
Dooling, D., & Danks, J.H. (1975). Going beyond tests of significance: Is psychology ready? Bulletin of the Psychonomic Society, 5, 15–17.
Edwards, W. (1965). Tactical note on the relation between scientific and statistical hypotheses. Psychological Bulletin, 63, 400–402.
Erwin, E. (1998). The logic of null hypothesis testing. Behavioral and Brain Sciences, 21, 197–198.
Falk, R. (1986). Misconceptions of statistical significance. Journal of Structural Learning, 9, 83–96.
Falk, R., & Greenbaum, C.W. (1995). Significance tests die hard: The amazing persistence of a probabilistic misconception. Theory and Psychology, 5, 75–98.
Fidler, F. (2002). The fifth edition of the APA publication manual: Why its statistics recommendations are so controversial. Educational and Psychological Measurement, 62(5), 749–770.
Fidler, F., & Thompson, B. (2001). Computing correct confidence intervals for ANOVA fixed- and random-effects effect sizes. Educational and Psychological Measurement, 61, 575–604.
Finch, S., Cumming, G., & Thomason, N. (2001). Reporting of statistical inference in the Journal of Applied Psychology: Little evidence of reform. Educational and Psychological Measurement, 61, 181–210.
Fisher, R.A. (1925). Statistical methods for research workers. London: Oliver & Boyd.
Fisher, R.A. (1931). Introduction. In J.R. Airey (Ed.), Table of Hh functions (pp. xxvi–xxxv). London: British Association.
Fisher, R.A. (1935). The design of experiments. London: Oliver & Boyd.
Folger, R. (1989). Significance tests and the duplicity of binary decisions. Psychological Bulletin, 106, 155–160.
Frick, R.W. (1995). Accepting the null hypothesis. Memory & Cognition, 23(1), 132–138.
Frick, R.W. (1996). The appropriate use of null hypothesis testing. Psychological Methods, 1, 379–390.
Gigerenzer, G. (1993). The Superego, the Ego, and the Id in statistical reasoning. In G. Keren & C. Lewis (Eds.), A handbook for data analysis in the behavioral sciences: Volume 1. Methodological issues (pp. 311–339). Hillsdale, NJ: Erlbaum.
Gigerenzer, G., & Murray, D.J. (1987). Cognition as intuitive statistics. Hillsdale, NJ: Erlbaum.
Gigerenzer, G., Swijtink, Z., Porter, T., Daston, L., Beatty, J., & Krüger, L. (1989). The empire of chance: How probability changed science and everyday life. Cambridge, UK: Cambridge University Press.
Glass, G.V. (1976). Primary, secondary and meta-analysis of research. Educational Researcher, 5, 3–8.
Glass, G.V., McGaw, B., & Smith, M.L. (1981). Meta-analysis in social research. Beverly Hills, CA: Sage.
Gorsuch, R.L. (1991). Things learned from another perspective (so far). American Psychologist, 46, 1089–1090.
Grant, D.A. (1962). Testing the null hypothesis and the strategy and tactics of investigating theoretical models. Psychological Review, 69, 54–61.
Greenland, S. (1998). Meta-analysis. In K. Rothman & S. Greenland (Eds.), Modern epidemiology. Philadelphia: Lippincott-Raven.
Greenwald, A.G. (1975). Consequences of prejudice against the null hypothesis. Psychological Bulletin, 82, 1–20.
Greenwald, A.G. (1993). Consequences of prejudice against the null hypothesis. In G. Keren & C. Lewis (Eds.), A handbook for data analysis in the behavioral sciences: Volume 1. Methodological issues (pp. 419–448). Hillsdale, NJ: Erlbaum.
Greenwald, A.G., Gonzalez, R., Harris, R.J., & Guthrie, D. (1996). Effect sizes and p values: What should be reported and what should be replicated? Psychophysiology, 33, 175–183.
Guttman, L. (1985). The illogic of statistical inference for cumulative science. Applied Stochastic Models and Data Analysis, 1, 3–10.
Hagen, R.L. (1997). In praise of the null hypothesis statistical test. American Psychologist, 52(1), 15–24.
Haller, H., & Krauss, S. (2002). Misinterpretations of significance: A problem students share with their teachers? Methods of Psychological Research Online, 7(1), 1–20.
Harris, R.J. (1991). Significance tests are not enough: The role of effect-size estimation in theory corroboration. Theory and Psychology, 1, 375–382.
Hayes, A.F. (1998). Reconnecting data analysis and research designs: Who needs a confidence interval? Behavioral and Brain Sciences, 21, 203–204.
Hays, W.L. (1963). Statistics for psychologists. New York: Holt, Rinehart & Winston.
Hays, W.L. (1994). Statistics (4th ed.). New York: Holt, Rinehart & Winston.
Howard, G.S., Maxwell, S.E., & Fleming, K.J. (2000). The proof of the pudding: An illustration of the relative strengths of null hypothesis, meta-analysis, and Bayesian analysis. Psychological Methods, 5, 315–332.
Hubbard, R. (1995). The Earth is highly significantly round (p < .001). American Psychologist, 50, 1098–.
Hubbard, R., & Armstrong, J.S. (1994). Replications and extensions in marketing: Rarely published but quite contrary. International Journal of Research in Marketing, 11, 233–248.
Hubbard, R., Parsa, A.R., & Luthy, M.R. (1997). The spread of statistical significance testing in psychology: The case of the Journal of Applied Psychology, 1917–1994. Theory and Psychology, 7, 545–554.
Hubbard, R., & Ryan, P.A. (2000). The historical growth of statistical significance testing in psychology and its future prospects. Educational and Psychological Measurement, 60, 661–681.
Huberty, C.J. (1987). On statistical testing. Educational Researcher, 16(8), 4–9.
Hunter, J.E. (1997). Needed: A ban on the significance test. Psychological Science, 8, 3–7.
Hunter, J.E., & Schmidt, F.L. (1990). Methods of meta-analysis: Correcting error and bias in research findings. Newbury Park, CA: Sage.
Jeffreys, H. (1934). Probability and scientific method. Proceedings of the Royal Society of London, Series A, 146, 9–16.
Johnson, D.H. (1999). The insignificance of statistical significance testing. Journal of Wildlife Management, 63, 763–772.
Kazdin, A.E., & Bass, D. (1989). Power to detect differences between alternative treatments in comparative psychotherapy outcome research. Journal of Consulting and Clinical Psychology, 57, 138–147.
Kirk, R.E. (1996). Practical significance: A concept whose time has come. Educational and Psychological Measurement, 56, 746–759.
Kirk, R.E. (2001). Promoting good statistical practices: Some suggestions. Educational and Psychological Measurement, 61, 213–218.
Krueger, J. (2001). Null hypothesis significance testing: On the survival of a flawed method. American Psychologist, 56, 16–26.
Kupfersmid, J. (1988). Improving what is published: A model in search of an editor. American Psychologist, 43, 635–642.
Levin, J.R. (1998). To test or not to test H0? Educational and Psychological Measurement, 58, 313–333.
Lindgren, B.W. (1976). Statistical theory (3rd ed.). New York: Macmillan.
Lindley, D.V. (1957). A statistical paradox. Biometrika, 44, 187–192.
Lindsay, R.M., & Ehrenberg, A.S.C. (1993). The design of replicated studies. American Statistician, 47, 217–228.
Loftus, G.R. (1991). On the tyranny of hypothesis testing in the social sciences. Contemporary Psychology, 36, 102–105.
Loftus, G.R. (1993). A picture is worth a thousand p values: On the irrelevance of hypothesis testing in the microcomputer age. Behavior Research Methods, Instruments and Computers, 25, 250–256.
Loftus, G.R. (1995). Data analysis as insight: Reply to Morrison and Weaver. Behavior Research Methods, Instruments and Computers, 27, 57–59.
Loftus, G.R. (1996). Psychology will be a much better science when we change the way we analyze data. Current Directions in Psychological Science, 5, 161–171.
Loftus, G.R., & Masson, M.E. (1994). Using confidence intervals in within-subject designs. Psychonomic Bulletin and Review, 1, 476–490.
Lykken, D. (1968). Statistical significance in psychological research. Psychological Bulletin, 70, 151–159.
Markus, K.A. (2001). The converse inequality argument against tests of statistical significance. Psychological Methods, 6, 147–160.
McGraw, K.O. (1991). Problems with the BESD: A comment on Rosenthal's "How are we doing in soft psychology?" American Psychologist, 46(10), 1084–1086.
McGraw, K.O. (1995). Determining false alarm rates in null hypothesis testing research. American Psychologist, 50, 1099–1100.
Meehl, P.E. (1967). Theory testing in psychology and physics: A methodological paradox. Philosophy of Science, 34, 103–115.
Meehl, P.E. (1978). Theoretical risks and tabular asterisks: Sir Karl, Sir Ronald, and the slow progress of soft psychology. Journal of Consulting and Clinical Psychology, 46, 806–834.
Meehl, P.E. (1990a). Appraising and amending theories: The strategy of Lakatosian defence and two principles that warrant it. Psychological Inquiry, 1, 108–141.
Meehl, P.E. (1990b). Why summaries of research on psychological theories are often uninterpretable. Psychological Reports, 66, 195–244.
Meehl, P.E. (1991). Why summaries of research on psychological theories are often uninterpretable. In R.E. Snow & D.E. Wiley (Eds.), Improving inquiry in social science: A volume in honor of Lee J. Cronbach (pp. 13–59). Hillsdale, NJ: Erlbaum.
Meehl, P.E. (1997). The problem is epistemology, not statistics: Replace significance tests by confidence intervals and quantify accuracy of risky numerical predictions. In L.L. Harlow, S.A. Mulaik, & J.H. Steiger (Eds.), What if there were no significance tests? (pp. 391–423). Hillsdale, NJ: Erlbaum.
Morrison, D.E., & Henkel, R.E. (Eds.) (1970). The significance test controversy: A reader. Chicago: Aldine.
Murphy, K.R. (1990). If the null hypothesis is impossible, why test it? American Psychologist, 45, 403–404.
Murphy, K.R., & Myors, B. (1999). Testing the hypothesis that treatments have negligible effects: Minimum-effect tests in the general linear model. Journal of Applied Psychology, 84, 234–248.
Neyman, J., & Pearson, E.S. (1928a). On the use and interpretation of certain test criteria for purposes of statistical inference: Part I. Biometrika, 20A, 175–263.
Neyman, J., & Pearson, E.S. (1928b). On the use and interpretation of certain test criteria for purposes of statistical inference: Part II. Biometrika, 20A, 264–294.
Neyman, J., & Pearson, E.S. (1933). On the testing of statistical hypotheses in relation to probabilities a priori. Proceedings of the Cambridge Philosophical Society, 28, 492–.
Nickerson, R.S. (2000). Null hypothesis significance testing: A review of an old and continuing controversy. Psychological Methods, 5, 241–301.
Nunnally, J. (1960). The place of statistics in psychology. Educational and Psychological Measurement, 20, 641–650.
Oakes, M. (1986). Statistical inference: A commentary for social and behavioral sciences. New York: Wiley.
Parker, S. (1995). The "difference of means" may not be the "effect size." American Psychologist, 50, 1101–1102.
Pearson, K. (1900). On the criterion that a given system of deviations from the probable in the case of a correlated system of variables is such that it can be reasonably supposed to have arisen from random sampling. Philosophical Magazine, 50, 157–175.
Pearson, E., & Hartley, H. (1972). Biometrika tables for statisticians (Vol. 2). Cambridge, UK: Cambridge University Press.
Pollard, P. (1993). How significant is "significance"? In G. Keren & C. Lewis (Eds.), A handbook for data analysis in the behavioral sciences: Volume 1. Methodological issues. Hillsdale, NJ: Erlbaum.
Popper, K.R. (1959). The logic of scientific discovery. New York: Basic Books.
Robinson, D., & Levin, J. (1997). Reflections on statistical and substantive significance, with a slice of replication. Educational Researcher, 26(5), 21–26.
Robinson, D.H., & Wainer, H. (2001). On the past and future of null hypothesis significance testing. Princeton: Statistics & Research Division.
Rosenthal, R. (1983). Assessing the statistical and social importance of the effects of psychotherapy. Journal of Consulting and Clinical Psychology, 51, 4–13.
Rosenthal, R. (1984). Meta-analytic procedures for social research. Beverly Hills, CA: Sage.
Rosenthal, R. (1993). Cumulating evidence. In G. Keren & C. Lewis (Eds.), A handbook of data analysis in the behavioral sciences: Volume 1. Methodological issues (pp. 519–559). Hillsdale, NJ: Erlbaum.
Rosenthal, R., & Rubin, D.B. (1994). The counternull value of an effect size: A new statistic. Psychological Science, 5, 329–334.
Rosnow, R.L., & Rosenthal, R. (1989). Statistical procedures and the justification of knowledge in psychological science. American Psychologist, 44, 1276–1284.
Rossi, J.S. (1990). Statistical power of psychological research: What have we gained in 20 years? Journal of Consulting and Clinical Psychology, 58, 646–656.
Rossi, J.S. (1997). A case study in the failure of psychology as a cumulative science: The spontaneous recovery of verbal learning. In L.L. Harlow, S.A. Mulaik, & J.H. Steiger (Eds.), What if there were no significance tests? (pp. 175–197). Hillsdale, NJ: Erlbaum.
Rouanet, H. (1996). Bayesian methods for assessing importance of effects. Psychological Bulletin, 119, 149–158.
Rozeboom, W.W. (1960). The fallacy of the null hypothesis significance test. Psychological Bulletin, 57, 416–428.
Schafer, W.D. (1993). Interpreting statistical significance and nonsignificance. Journal of Experimental Education, 61, 383–387.
Schmidt, F.L. (1992). What do data really mean? American Psychologist, 47, 1173–1181.
Schmidt, F.L. (1996). Statistical significance testing and cumulative knowledge in psychology: Implications for training of researchers. Psychological Methods, 1, 115–129.
Schmidt, F.L. (2002). Are there benefits from NHST? American Psychologist, 57, 65–71.
Schmidt, F.L., & Hunter, J.E. (1997). Eight common but false objections to the discontinuation of significance testing in the analysis of research data. In L.L. Harlow, S.A. Mulaik, & J.H. Steiger (Eds.), What if there were no significance tests? (pp. 37–64). Hillsdale, NJ: Erlbaum.
Sedlmeier, P., & Gigerenzer, G. (1989). Do studies of statistical power have an effect on the power of studies? Psychological Bulletin, 105, 309–316.
Serlin, R.C., & Lapsley, D.K. (1985). Rationality in psychological research: The good-enough principle. American Psychologist, 40, 73–83.
Serlin, R.C., & Lapsley, D.K. (1993). Rational appraisal of psychological research and the good-enough principle. In G. Keren & C. Lewis (Eds.), A handbook for data analysis in the behavioral sciences: Volume 1. Methodological issues (pp. 199–228). Hillsdale, NJ: Erlbaum.
Shafer, G. (1982). Lindley's paradox. Journal of the American Statistical Association, 77, 325–334.
Shaver, J. (1985). Chance and nonsense: A conversation about interpreting tests of statistical significance. Phi Delta Kappan, 67(1), 138–141.
Shaver, J. (1993). What statistical significance testing is, and what it is not. Journal of Experimental Education, 61, 293–316.
Smithson, M. (2001). Correct confidence intervals for various regression effect sizes and parameters: The importance of noncentral distributions in computing intervals. Educational and Psychological Measurement, 61, 605–632.
Snow, R.E. (1998). Inductive strategy and statistical tactics. Behavioral and Brain Sciences, 21, 219–.
Snyder, P., & Lawson, S. (1993). Evaluating results using corrected and uncorrected effect size estimates. Journal of Experimental Education, 61, 334–349.
Steiger, J.H., & Fouladi, R.T. (1997). Noncentrality interval estimation and the evaluation of statistical models. In L.L. Harlow, S.A. Mulaik, & J.H. Steiger (Eds.), What if there were no significance tests? (pp. 221–258). Hillsdale, NJ: Erlbaum.
Strahan, R.F. (1991). Remarks on the binomial effect size display. American Psychologist, 46, 1083–1084.
Student [W.S. Gosset] (1908). The probable error of a mean. Biometrika, 6, 1–25.
Thompson, B. (1992). Two and one-half decades of leadership in measurement and evaluation. Journal of Counseling and Development, 70, 434–438.
Thompson, B. (1993). The use of statistical significance tests in research: Bootstrap and other alternatives. Journal of Experimental Education, 61, 361–377.
Thompson, B. (1994). Guidelines for authors. Educational and Psychological Measurement, 54, 837–847.
Thompson, B. (1996). AERA editorial policies regarding statistical significance testing: Three suggested reforms. Educational Researcher, 25(2), 26–30.
Thompson, B. (1997). Editorial policies regarding statistical significance tests: Further comments. Educational Researcher, 26(5), 29–32.
Thompson, B. (2002). "Statistical," "practical," and "clinical": How many kinds of significance do counselors need to consider? Journal of Counseling and Development, 80, 64–71.
Thompson, B., & Snyder, P.A. (1998). Statistical significance and reliability analyses in recent Journal of Counseling & Development research articles. Journal of Counseling and Development, 76, 436–441.
Tryon, W.W. (2001). Evaluating statistical difference, equivalence, and indeterminacy using inferential confidence intervals: An integrated alternative method of conducting null hypothesis statistical tests. Psychological Methods, 6, 371–386.
Tufte, E.R. (1983). The visual display of quantitative information. Cheshire, CT: Graphics Press.
Tufte, E.R. (1990). Envisioning information. Cheshire, CT: Graphics Press.
Tukey, J.W. (1962). The future of data analysis. Annals of Mathematical Statistics, 33, 1–67.
Tukey, J.W. (1969). Analyzing data: Sanctification or detective work? American Psychologist, 24, 83–91.
Tukey, J.W. (1977). Exploratory data analysis. Reading, MA: Addison-Wesley.
Tukey, J.W. (1991). The philosophy of multiple comparisons. Statistical Science, 6, 100–116.
Tversky, A., & Kahneman, D. (1971). Belief in the law of small numbers. Psychological Bulletin, 76, 105–110.
Wainer, H., & Thissen, D. (1993). Combining multiple-choice and constructed-response test scores: Toward a Marxist theory of test construction. Applied Measurement in Education, 6(2), 103–118.
Weitzman, R.A. (1984). Seven treacherous pitfalls of statistics, illustrated. Psychological Reports, 54, 355–363.
Wilkinson, L., & the Task Force on Statistical Inference (1999). Statistical methods in psychology journals: Guidelines and explanations. American Psychologist, 54, 594–604.
Wilson, W., Miller, H.L., & Lower, J.S. (1967). Much ado about the null hypothesis. Psychological Bulletin, 68, 188–196.