
Genetic risk assessment of the joint effect of several genes: Critical appraisal

Russian Journal of Genetics

Abstract

When assessing the combined action of genes on a quantitative or qualitative phenotype, we encounter a phenomenon that could be named the “paradox of the risk score summation.” It arises when the search for risk alleles and the assessment of their combined action are performed on the same dataset. This methodological error occurs all too often in calculations of the so-called genetic risk score (GRS), the total number of disease-associated alleles carried by an individual. We consider examples from numerous published genetic association studies in which the claimed statistically significant effects can be attributed to the “risk score summation paradox.” In the second section of the review, we discuss current modifications of multiple regression analysis that address the so-called “p ≫ n problem” (the number of observations n is much smaller than the number of possible predictors p). Various algorithms for model selection (searching for significant combinations of predictors) are considered, from the common marginal screening of the “top” predictors to LASSO and other modern compressed sensing algorithms.
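The paradox can be illustrated with a small simulation. The sketch below is not taken from the article: the sample size, number of SNPs, number of selected "top" SNPs, allele frequency, random seed, and all function names are assumptions chosen for illustration. Genotypes are generated independently of case/control status, so no SNP has a true effect. Risk alleles are nevertheless chosen by marginal screening, and the GRS is then compared between cases and controls twice: on the dataset used for the selection and on an independent one.

```python
import random

def simulate_genotypes(n, p, rng):
    """n individuals x p SNPs; each entry is an allele count 0/1/2 (allele frequency 0.5)."""
    return [[rng.randint(0, 1) + rng.randint(0, 1) for _ in range(p)] for _ in range(n)]

def mean_gap(values, status):
    """Mean value in cases minus mean value in controls."""
    cases = [v for v, s in zip(values, status) if s == 1]
    controls = [v for v, s in zip(values, status) if s == 0]
    return sum(cases) / len(cases) - sum(controls) / len(controls)

def select_risk_alleles(geno, status, k):
    """Marginal screening: keep the k SNPs with the largest |case-control gap|.
    The sign of the gap records which allele is labelled the 'risk' allele."""
    p = len(geno[0])
    gaps = [(mean_gap([row[j] for row in geno], status), j) for j in range(p)]
    gaps.sort(key=lambda t: abs(t[0]), reverse=True)
    return [(j, 1 if g >= 0 else -1) for g, j in gaps[:k]]

def grs(row, selected):
    """Genetic risk score: total number of 'risk' alleles over the selected SNPs."""
    return sum(row[j] if sign > 0 else 2 - row[j] for j, sign in selected)

def grs_gap(geno, status, selected):
    """Mean GRS in cases minus mean GRS in controls."""
    return mean_gap([grs(row, selected) for row in geno], status)

rng = random.Random(2016)      # arbitrary seed, for reproducibility only
n, p, k = 200, 200, 10         # illustrative sizes, not from the article
status = [1] * (n // 2) + [0] * (n // 2)   # status is unrelated to genotype

discovery = simulate_genotypes(n, p, rng)
replication = simulate_genotypes(n, p, rng)

selected = select_risk_alleles(discovery, status, k)
in_sample_gap = grs_gap(discovery, status, selected)        # same data as selection
out_of_sample_gap = grs_gap(replication, status, selected)  # independent data

print(f"GRS gap on the selection dataset:   {in_sample_gap:.2f}")
print(f"GRS gap on an independent dataset:  {out_of_sample_gap:.2f}")
```

Because each selected allele is oriented by the sign of its own in-sample gap, the in-sample GRS gap is a sum of the k largest absolute gaps and is positive by construction, even though the data contain no signal. On the independent dataset the same score should show a gap close to zero.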



Author information


Corresponding author

Correspondence to A. V. Rubanovich.

Additional information

Original Russian Text © A.V. Rubanovich, N.N. Khromov-Borisov, 2016, published in Genetika, 2016, Vol. 52, No. 7, pp. 865–878.


Cite this article

Rubanovich, A.V., Khromov-Borisov, N.N. Genetic risk assessment of the joint effect of several genes: Critical appraisal. Russ J Genet 52, 757–769 (2016). https://doi.org/10.1134/S1022795416070073


  • DOI: https://doi.org/10.1134/S1022795416070073
