Abstract
Doubly truncated data arise when an individual is potentially observed only if its failure time lies within a certain interval, unique to that individual. In this paper, we consider the pseudo-observations approach for estimating regression coefficients when data are subject to double truncation. The pseudo-observations generated from the nonparametric maximum likelihood estimate (NPMLE) of the survival function are used as response variables in a generalized estimating equation to estimate the parameters of the model. We consider two estimators for regression parameters of survival probabilities, based on two ways of defining pseudo-observations: simple pseudo-observations (SPO) and stopped pseudo-observations (STPO). We establish asymptotic properties of the two estimators under some conditions. Simulation results show that the proportion of failed estimations is smaller for STPO than for SPO. The estimator based on STPO performs adequately in finite samples, whereas the estimator based on SPO can be very unstable when the sample size is not large enough.
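To make the pseudo-observation idea concrete, the following is a minimal sketch of the usual leave-one-out (jackknife) construction, n Ŝ(t) − (n − 1) Ŝ^{-i}(t), as introduced by Andersen et al. (2003). It is an illustration under stated assumptions, not the estimator studied here: the plain empirical survival function is used as a stand-in for the NPMLE under double truncation, and the function names `empirical_survival` and `jackknife_pseudo_observations` are hypothetical.

```python
import numpy as np


def empirical_survival(x, t):
    """Stand-in estimator (assumption): fraction of observations exceeding t.

    The paper uses the NPMLE under double truncation instead.
    """
    return np.mean(np.asarray(x) > t)


def jackknife_pseudo_observations(x, t, estimator=empirical_survival):
    """Return n * S_hat(t) - (n - 1) * S_hat^{-i}(t) for each observation i."""
    x = np.asarray(x, dtype=float)
    n = x.size
    full = estimator(x, t)  # estimate from the full sample
    pseudo = np.empty(n)
    for i in range(n):
        # estimate with the i-th observation left out
        leave_one_out = estimator(np.delete(x, i), t)
        pseudo[i] = n * full - (n - 1) * leave_one_out
    return pseudo


# Toy, untruncated failure times (illustrative values only).
x = np.array([1.2, 3.4, 0.7, 5.1, 2.8])
po = jackknife_pseudo_observations(x, t=2.0)
```

With this stand-in estimator the pseudo-observations reduce to the indicators I(X_i > t), which is why they can act as responses in a generalized estimating equation. Passing a different `estimator` callable, such as an NPMLE for doubly truncated data, yields pseudo-observations that in general are no longer simple indicators.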
References
Andersen, P. K., Klein, J. P., & Rosthøj, S. (2003). Generalised linear models for correlated pseudo-observations, with applications to multi-state models. Biometrika, 90, 15–27.
Andersen, P. K., Hansen, M. G., & Klein, J. P. (2004). Regression analysis of restricted mean survival time based on pseudo-observations. Lifetime Data Analysis, 10, 335–350.
Bennett, S. (1983). Analysis of survival data by the proportional odds model. Statistics in Medicine, 2, 273–277.
Cheng, S. C., Wei, L. J., & Ying, Z. (1995). Analysis of transformation models with censored data. Biometrika, 82, 835–845.
Cox, D. R. (1972). Regression models and life tables (with discussion). Journal of the Royal Statistical Society B, 34, 187–220.
de Uña-Álvarez, J., & Van Keilegom, I. (2021). Efron-Petrosian integrals for doubly truncated data with covariates: An asymptotic analysis. Bernoulli, 27, 249–273.
Dörre, A., & Emura, T. (2019). Analysis of doubly truncated data: An introduction. Singapore: Springer Nature.
Dudley, R. M., & Norvaiša, R. (2011). Concrete functional calculus. New York: Springer.
Efron, B., & Petrosian, V. (1999). Nonparametric methods for doubly truncated data. Journal of the American Statistical Association, 94, 824–834.
Emura, T., Konno, Y., & Michimae, H. (2015a). Statistical inference based on the nonparametric maximum likelihood estimator under double-truncation. Lifetime Data Analysis, 21, 397–418.
Emura, T., Hu, Y.-H., & Konno, Y. (2015b). Asymptotic inference for maximum likelihood estimators under the special exponential family with double-truncation. Statistical Papers, 58, 877–909.
Frank, G., Chae, M., & Kim, Y. (2019). Additive time-dependent hazard model with doubly truncated data. Journal of the Korean Statistical Society, 48, 179–193.
Graw, F., Gerds, T. A., & Schumacher, M. (2009). On pseudo-values for regression analysis in competing risks models. Lifetime Data Analysis, 15, 241–255.
Grand, M. K., Putter, H., Allignol, A., & Andersen, P. K. (2019). A note on pseudo-observations and left-truncation. Biometrical Journal, 61, 290–298.
Han, S., Andrei, A.-C., & Tsui, K.-W. (2014). A semiparametric regression method for interval-censored data. Communications in Statistics-Simulation and Computation, 43, 18–30.
Hu, Y.-H., & Emura, T. (2015). Maximum likelihood estimation for a special exponential family under random double-truncation. Computational Statistics, 30, 1199–1229.
Jacobsen, M., & Martinussen, T. (2016). A note on the large sample properties of estimators based on generalized linear models for correlated pseudo-observations. Scandinavian Journal of Statistics, 43, 845–862.
Kalbfleisch, J. D., & Lawless, J. F. (1989). Inferences based on retrospective ascertainment: An analysis of data on transfusion-related AIDS. Journal of the American Statistical Association, 84, 360–372.
Kaplan, E. L., & Meier, P. (1958). Nonparametric estimation from incomplete observations. Journal of the American Statistical Association, 53, 457–481.
Kim, S., & Kim, Y.-J. (2016). Regression analysis of interval censored competing risk data using a pseudo-value approach. Communications for Statistical Applications and Methods, 23, 555–562.
Mandel, M., de Uña-Álvarez, J., Simon, D. K., & Betensky, R. A. (2018). Inverse probability weighted Cox regression for doubly truncated data. Biometrics, 74, 481–487.
Medley, G. F., Anderson, R. M., Cox, D. R., & Billard, L. (1987). Incubation period of AIDS in patients infected via blood transfusion. Nature, 328, 719–721.
Medley, G. F., Billard, L., Cox, D. R., & Anderson, R. M. (1988). The distribution of the incubation period for the Acquired Immunodeficiency Syndrome (AIDS). Proceedings of the Royal Society of London, Ser. B, 233, 367–377.
Moreira, C., & de Uña-Álvarez, J. (2010a). Bootstrapping the NPMLE for doubly truncated data. Journal of Nonparametric Statistics, 22(5), 567–583.
Moreira, C., & de Uña-Álvarez, J. (2010b). A semiparametric estimator of survival for doubly truncated data. Statistics in Medicine, 29(30), 3147–3159.
Moreira, C., de Uña-Álvarez, J., & Crujeiras, R. M. (2010). DTDA: An R package to analyze randomly truncated data. Journal of Statistical Software, 37(7), 1–20.
Moreira, C., & Van Keilegom, I. (2013). Bandwidth selection for kernel density estimation with doubly truncated data. Computational Statistics and Data Analysis, 61, 107–123.
Murphy, S. A., Rossini, A. J., & van der Vaart, A. W. (1997). Maximum likelihood estimation in the proportional odds model. Journal of the American Statistical Association, 92, 968–976.
Overgaard, M., Parner, E. T., & Pedersen, J. (2017). Asymptotic theory of generalized estimating equations based on jack-knife pseudo-observations. The Annals of Statistics, 45, 1988–2015.
Overgaard, M., Parner, E. T., & Pedersen, J. (2018). Estimating the variance in a pseudo-observation scheme with competing risks. Scandinavian Journal of Statistics, 45, 923–940.
Rennert, L., & Xie, S. X. (2018). Cox regression model with doubly truncated data. Biometrics, 74, 725–733.
Shen, P.-S. (2003). The product-limit estimate as an inverse-probability-weighted average. Communications in Statistics-Theory and Methods, 32, 1119–1133.
Shen, P.-S. (2010a). Nonparametric analysis of doubly truncated data. Annals of the Institute of Statistical Mathematics, 62(5), 835–853.
Shen, P.-S. (2010b). Semiparametric analysis of doubly truncated data. Communications in Statistics-Theory and Methods, 39, 3178–3190.
Shen, P.-S. (2013). Regression analysis of interval censored and doubly truncated data with linear transformation models. Computational Statistics, 28, 581–596.
Shen, P.-S. (2016). Analysis of transformation models with doubly truncated data. Statistical Methodology, 30, 15–30.
Shen, P.-S., & Hsu, H. (2019). Conditional maximum likelihood estimation for semiparametric transformation models with doubly truncated data. Computational Statistics and Data Analysis (accepted). https://doi.org/10.1016/j.csda.2019.106862.
Shen, P.-S., & Liu, Y. (2019a). Pseudo maximum likelihood estimation for the Cox model with doubly truncated data. Statistical Papers, 60, 1207–1224.
Shen, P.-S., & Liu, Y. (2019b). Pseudo MLE for semiparametric transformation model with doubly truncated data. Journal of the Korean Statistical Society, 48, 384–395.
Tsai, W.-Y., Jewell, N. P., & Wang, M.-C. (1987). A note on the product-limit estimator under right censoring and left truncation. Biometrika, 74, 883–886.
Woodroofe, M. (1985). Estimating a distribution function with truncated data. Annals of Statistics, 13, 163–177.
Ying, Z., Yu, W., Zha, Z., & Zheng, M. (2019). Regression analysis of doubly truncated data. Journal of the American Statistical Association. https://doi.org/10.1080/01621459.2019.1585252.
Zhang, Z., Sun, L., Zhao, X., & Sun, J. (2005). Regression analysis of interval-censored failure time data with linear transformation models. Canadian Journal of Statistics, 33, 61–70.
Zhang, X. (2015). Nonparametric inference for an inverse-probability-weighted estimator with doubly truncated data. Communications in Statistics: Simulation and Computation, 44, 489–504.
Appendix:
Proof of Theorem 1
Since \({{\mathcal {E}}}_n\) is a vector of empirical functions, condition (3.5) of Overgaard et al. (2017) holds, i.e., \(||{{\mathcal {E}}}_n-{{\mathcal {E}}}||_p=o_p(n^{-\lambda })\) for some \(\lambda \in [{1\over 4},{1\over 2})\) and \(p\in [1,2)\). Here \(||f||_p=\sup \sum _{i=2}^{k}|f(t_{i-1})-f(t_{i})|^p+||f||_{\infty }\), where the supremum is taken over all partitions \(\zeta _1\le t_1<\dots <t_k\le \zeta _2\) of the interval \([\zeta _1,\zeta _2]\) and \(||\cdot ||_{\infty }\) is the supremum norm. Let
and
Under (C1) and model (3), we have
By (3.42) of Overgaard et al. (2017), it follows that
for \(\lambda \in [1/4,1/2)\), where
where \({{\mathcal {E}}}_{n,s}={{\mathcal {E}}}+s({{\mathcal {E}}}_n-{{\mathcal {E}}})\) and \({{\mathcal {E}}}_{n,s}^{-i}={{\mathcal {E}}}+s({{\mathcal {E}}}_n^{-i}-{{\mathcal {E}}})\). Thus, \(n^{-1/2}{\hat{U}}_{n}(\beta _{t,0})\) and \(n^{-1/2}{\hat{U}}_{n}^{*}(\beta _{t,0})\) are asymptotically equivalent. Furthermore, \({\hat{U}}_{n}^{*}(\beta _{t,0})\) can be expressed as
The factor n aside, this is a U-statistic of order 2. It follows by Theorem 3.3 of Overgaard et al. (2017) that \(n^{-1/2}{\hat{U}}_{n}(\beta _{t,0})\) converges in distribution to \(N(0,\Sigma (\beta _{t,0}))\), where \(\Sigma (\beta _{t,0}) =E[h({X}_1,Z_1,{X}_2,Z_2)h({X}_1,Z_1,{X}_3,Z_3)^T]\). Under assumptions (A1)–(A5), it follows that \(\sqrt{n}({\hat{\beta }}_{t}-\beta _{t,0})\) converges in distribution to \(N(0,M(\beta _{t,0})^{-1}\Sigma (\beta _{t,0})\{M(\beta _{t,0})^{-1}\}^{T})\) as \(n\rightarrow \infty\). The proof is complete.
Proof of Theorem 2
Let
and
By (3.42) of Overgaard et al. (2017), it follows that
for \(\lambda \in [1/4,1/2)\), where
Thus, \(n_t^{-1/2}{\tilde{U}}(\beta _{t,0})\) and \(n_t^{-1/2}{\tilde{U}}^{*}(\beta _{t,0})\) are asymptotically equivalent. Furthermore, \({\tilde{U}}^{*}(\beta _{t,0})\) can be expressed as
It follows that \(n_t^{-1/2}{\tilde{U}}(\beta _{t,0})\) converges in distribution to \(N(0,\Sigma _d(\beta _{t,0}))\). Under assumptions (A1)–(A5), \(\sqrt{n_t}({\tilde{\beta }}_{t}-\beta _{t,0})\) converges in distribution to \(N(0,M(\beta _{t,0})^{-1}\Sigma _d(\beta _{t,0})\{M(\beta _{t,0})^{-1}\}^{T})\) as \(n_t\rightarrow \infty\). The proof is complete.
Cite this article
Shen, Ps. Regression analysis of doubly truncated data based on pseudo-observations. J. Korean Stat. Soc. 50, 1197–1218 (2021). https://doi.org/10.1007/s42952-021-00113-9