Regression analysis of doubly truncated data based on pseudo-observations

  • Research Article
  • Journal of the Korean Statistical Society

Abstract

Doubly truncated data arise when an individual is potentially observed only if its failure time lies within a certain interval, unique to that individual. In this paper, we consider the pseudo-observations approach for estimating regression coefficients when data are subject to double truncation. The pseudo-observations generated from the nonparametric maximum likelihood estimate (NPMLE) of the survival function are used as response variables in a generalized estimating equation to estimate the parameters of the model. We look at two estimators for regression parameters of survival probabilities based on different ways of defining pseudo-observations, namely, the simple pseudo-observations (SPO) and stopped pseudo-observations (STPO). We establish asymptotic properties of the two estimators under regularity conditions. Simulation results show that the proportion of failed estimations is smaller with STPO than with SPO. The estimator based on STPO performs adequately in finite samples, whereas the estimator based on SPO can be very unstable when the sample size is not large enough.
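
The first two steps described above (NPMLE of the survival function, then pseudo-observations) can be sketched as follows for the simple pseudo-observations. This is a minimal illustration under stated assumptions, not the author's code: it assumes the Efron and Petrosian (1999) self-consistency iteration for the NPMLE, takes SPO to be the usual jackknife values \(n\hat{S}(t)-(n-1)\hat{S}^{(-i)}(t)\) in the sense of Andersen et al. (2003), and uses hypothetical function names, starting values and a convergence tolerance of our own choosing. The stopped version (STPO) is not reproduced here. The R package DTDA (Moreira et al., 2010) provides an implementation of the NPMLE for doubly truncated data.

```python
import numpy as np


def ep_npmle(x, u, v, tol=1e-10, max_iter=10_000):
    """Efron-Petrosian self-consistency iteration for the NPMLE under double truncation.

    x, u, v: failure times and truncation limits with u[i] <= x[i] <= v[i].
    Returns the distinct observed times and the probability mass at each one.
    """
    x, u, v = (np.asarray(a, dtype=float) for a in (x, u, v))
    support, counts = np.unique(x, return_counts=True)
    # J[i, j] = 1 when support point j lies in subject i's truncation interval [u_i, v_i]
    J = ((u[:, None] <= support) & (support <= v[:, None])).astype(float)
    f = counts / counts.sum()  # start from the empirical masses
    for _ in range(max_iter):
        F = J @ f  # F_i: estimated probability mass inside [u_i, v_i]
        f_new = counts / (J.T @ (1.0 / F))  # self-consistency update
        f_new /= f_new.sum()  # the update is scale-free, so renormalise
        converged = np.max(np.abs(f_new - f)) < tol
        f = f_new
        if converged:
            break
    return support, f


def survival_at(t, support, f):
    """NPMLE of S(t) = P(X > t)."""
    return float(f[support > t].sum())


def simple_pseudo_obs(t, x, u, v):
    """Jackknife pseudo-observations n*S_hat(t) - (n-1)*S_hat^(-i)(t) (assumed SPO form)."""
    x, u, v = (np.asarray(a, dtype=float) for a in (x, u, v))
    n = len(x)
    s_full = survival_at(t, *ep_npmle(x, u, v))
    pseudo = np.empty(n)
    for i in range(n):
        keep = np.arange(n) != i  # leave subject i out and refit the NPMLE
        s_loo = survival_at(t, *ep_npmle(x[keep], u[keep], v[keep]))
        pseudo[i] = n * s_full - (n - 1) * s_loo
    return pseudo


if __name__ == "__main__":
    # Hypothetical toy data: failure time x is observed only if u <= x <= v
    x = [2.0, 3.5, 1.2, 4.8, 2.9]
    u = [0.5, 1.0, 0.0, 2.0, 1.5]
    v = [4.0, 6.0, 3.0, 7.0, 5.0]
    print(simple_pseudo_obs(2.5, x, u, v))
```

Each leave-one-out value refits the NPMLE, so the cost is roughly n times that of a single fit. Without truncation (all intervals covering every time), the NPMLE reduces to the empirical survival function and each pseudo-observation equals the indicator \(1\{X_i>t\}\).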

Fig. 1

References

  • Andersen, P. K., Klein, J. P., & Rosthøj, S. (2003). Generalised linear models for correlated pseudo-observations, with applications to multi-state models. Biometrika, 90, 15–27.

  • Andersen, P. K., Hansen, M. G., & Klein, J. P. (2004). Regression analysis of restricted mean survival time based on pseudo-observations. Lifetime Data Analysis, 10, 335–350.

  • Bennett, S. (1983). Analysis of survival data by the proportional odds model. Statistics in Medicine, 2, 273–277.

  • Cheng, S. C., Wei, L. J., & Ying, Z. (1995). Analysis of transformation models with censored data. Biometrika, 82, 835–845.

  • Cox, D. R. (1972). Regression models and life tables (with Discussion). Journal of the Royal Statistical Society, Series B, 34, 187–220.

  • de Uña-Álvarez, J., & Van Keilegom, I. (2021). Efron-Petrosian integrals for doubly truncated data with covariates: An asymptotic analysis. Bernoulli, 27, 249–273.

  • Dörre, A., & Emura, T. (2019). Analysis of doubly truncated data: An introduction. Singapore: Springer.

  • Dudley, R. M., & Norvaiša, R. (2011). Concrete Functional Calculus. New York: Springer.

  • Efron, B., & Petrosian, V. (1999). Nonparametric methods for doubly truncated data. Journal of the American Statistical Association, 94, 824–834.

  • Emura, T., Konno, Y., & Michimae, H. (2015a). Statistical inference based on the nonparametric maximum likelihood estimator under double-truncation. Lifetime Data Analysis, 21, 397–418.

  • Emura, T., Hu, Y.-H., & Konno, Y. (2015b). Asymptotic inference for maximum likelihood estimators under the special exponential family with double-truncation. Statistical Papers, 58, 877–909.

  • Frank, G., Chae, M., & Kim, Y. (2019). Additive time-dependent hazard model with doubly truncated data. Journal of the Korean Statistical Society, 48, 179–193.

  • Graw, F., Gerds, T. A., & Schumacher, M. (2009). On pseudo-values for regression analysis in competing risks models. Lifetime Data Analysis, 15, 241–255.

  • Grand, M. K., Putter, H., Allignol, A., & Andersen, P. K. (2019). A note on pseudo-observations and left-truncation. Biometrical Journal, 61, 290–298.

  • Han, S., Andrei, A.-C., & Tsui, K.-W. (2014). A semiparametric regression method for interval-censored data. Communications in Statistics-Simulation and Computation, 43, 18–30.

  • Hu, Y.-H., & Emura, T. (2015). Maximum likelihood estimation for a special exponential family under random double-truncation. Computational Statistics, 30, 1199–1229.

  • Jacobsen, M., & Martinussen, T. (2016). A note on the large sample properties of estimators based on generalized linear models for correlated pseudo-observations. Scandinavian Journal of Statistics, 43, 845–862.

  • Kalbfleisch, J. D., & Lawless, J. F. (1989). Inferences based on retrospective ascertainment: An analysis of data on transfusion-related AIDS. Journal of the American Statistical Association, 84, 360–372.

  • Kaplan, E. L., & Meier, P. (1958). Nonparametric estimation from incomplete observations. Journal of the American Statistical Association, 53, 457–481.

  • Kim, S., & Kim, Y.-J. (2016). Regression analysis of interval censored competing risk data using a pseudo-value approach. Communications for Statistical Applications and Methods, 23, 555–562.

  • Mandel, M., de Uña-Álvarez, J., Simon, D. K., & Betensky, R. A. (2018). Inverse probability weighted Cox regression for doubly truncated data. Biometrics, 74, 481–487.

  • Medley, G. F., Anderson, R. M., Cox, D. R., & Billard, L. (1987). Incubation period of AIDS in patients infected via blood transfusion. Nature, 328, 719–721.

  • Medley, G. F., Billard, L., Cox, D. R., & Anderson, R. M. (1988). The distribution of the incubation period for the Acquired Immunodeficiency Syndrome (AIDS). Proceedings of the Royal Society of London, Ser. B, 233, 367–377.

  • Moreira, C., & de Uña-Álvarez, J. (2010a). Bootstrapping the NPMLE for doubly truncated data. Journal of Nonparametric Statistics, 22(5), 567–583.

  • Moreira, C., & de Uña-Álvarez, J. (2010b). A semiparametric estimator of survival for doubly truncated data. Statistics in Medicine, 29(30), 3147–3159.

  • Moreira, C., de Uña-Álvarez, J., & Crujeiras, R. M. (2010). DTDA: An R package to analyze randomly truncated data. Journal of Statistical Software, 37(7), 1–20.

  • Moreira, C., & Van Keilegom, I. (2013). Bandwidth selection for kernel density estimation with doubly truncated data. Computational Statistics and Data Analysis, 61, 107–123.

  • Murphy, S. A., Rossini, A. J., & van der Vaart, A. W. (1997). Maximum likelihood estimation in the proportional odds model. Journal of the American Statistical Association, 92, 968–976.

  • Overgaard, M., Parner, E. T., & Pedersen, J. (2017). Asymptotic theory of generalized estimating equations based on jack-knife pseudo-observations. The Annals of Statistics, 45, 1988–2015.

  • Overgaard, M., Parner, E. T., & Pedersen, J. (2018). Estimating the variance in a pseudo-observation scheme with competing risks. Scandinavian Journal of Statistics, 45, 923–940.

  • Rennert, L., & Xie, S. X. (2018). Cox regression model with doubly truncated data. Biometrics, 74, 725–733.

  • Shen, P.-S. (2003). The product-limit estimate as an inverse-probability-weighted average. Communications in Statistics-Theory and Methods, 32, 1119–1133.

  • Shen, P.-S. (2010a). Nonparametric analysis of doubly truncated data. Annals of the Institute of Statistical Mathematics, 62(5), 835–853.

  • Shen, P.-S. (2010b). Semiparametric analysis of doubly truncated data. Communications in Statistics-Theory and Methods, 39, 3178–3190.

  • Shen, P.-S. (2013). Regression analysis of interval censored and doubly truncated data with linear transformation models. Computational Statistics, 28, 581–596.

  • Shen, P.-S. (2016). Analysis of transformation models with doubly truncated data. Statistical Methodology, 30, 15–30.

  • Shen, P.-S., & Hsu, H. (2019). Conditional maximum likelihood estimation for semiparametric transformation models with doubly truncated data. Computational Statistics and Data Analysis (accepted). https://doi.org/10.1016/j.csda.2019.106862.

  • Shen, P.-S., & Liu, Y. (2019a). Pseudo maximum likelihood estimation for the Cox model with doubly truncated data. Statistical Papers, 60, 1207–1224.

  • Shen, P.-S., & Liu, Y. (2019b). Pseudo MLE for semiparametric transformation model with doubly truncated data. Journal of the Korean Statistical Society, 48, 384–395.

  • Tsai, W.-Y., Jewell, N. P., & Wang, M.-C. (1987). A note on the product-limit estimator under right censoring and left truncation. Biometrika, 74, 883–886.

  • Woodroofe, M. (1985). Estimating a distribution function with truncated data. Annals of Statistics, 13, 163–177.

  • Ying, Z., Yu, W., Zha, Z., & Zheng, M. (2019). Regression analysis of doubly truncated data. Journal of the American Statistical Association. https://doi.org/10.1080/01621459.2019.1585252.

  • Zhang, Z., Sun, L., Zhao, X., & Sun, J. (2005). Regression analysis of interval-censored failure time data with linear transformation models. Canadian Journal of Statistics, 33, 61–70.

  • Zhang, X. (2015). Nonparametric inference for an inverse-probability-weighted estimator with doubly truncated data. Communications in Statistics-Simulation and Computation, 44, 489–504.

Author information

Correspondence to Pao-sheng Shen.

Appendix:

Proof of Theorem 1

Since \({{\mathcal {E}}}_n\) is a vector of empirical functions, condition (3.5) of Overgaard et al. (2017) holds, i.e., \(||{{\mathcal {E}}}_n-{{\mathcal {E}}}||_p=o_p(n^{-\lambda })\) for some \(\lambda \in [{1\over 4},{1\over 2})\) and \(p\in [1,2)\), where \(||f||_p=\left( \sup \sum _{i=2}^{k}|f(t_{i})-f(t_{i-1})|^p\right) ^{1/p}+||f||_{\infty }\), the supremum being taken over all partitions \(\zeta _1\le t_1<\dots <t_k\le \zeta _2\) of the interval \([\zeta _1,\zeta _2]\), and \(||\cdot ||_{\infty }\) is the supremum norm. Let

$$\begin{aligned} {\hat{S}}_{n,i}^{*}(t)=\Psi ({{\mathcal {E}}})+{\dot{\Psi }}({X}_i) +{1\over {n-1}}\sum _{i^{'}\ne i}\ddot{\Psi }({X}_i,{X}_{i^{'}}) \end{aligned}$$

and

$$\begin{aligned} {\hat{U}}_{n}^{*}(\beta _{t,0})=\sum _{i=1}^{n} A(\beta _{t,0};Z_i)^T \left( {\hat{S}}_{n,i}^{*}(t)-E[{\hat{S}}_{n,i}^{*}(t)|Z_i]\right) . \end{aligned}$$
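
The norm \(||f||_p\) in condition (3.5) above is built from two quantities: the supremum over partitions of the summed \(p\)-th powers of increments, and the supremum norm. The following is a minimal sketch of how both might be computed for a function observed on a finite grid; the function names are hypothetical. Restricting partitions to grid points gives the exact value for functions that are constant between grid points, and only a lower bound otherwise.

```python
import numpy as np


def p_variation_sum(values, p):
    """sup over partitions of sum_k |f(t_k) - f(t_{k-1})|**p, for f observed on a grid.

    Dynamic programming: best[j] is the largest sum over partitions whose last
    point is grid point j; O(m^2) time for m grid points.
    """
    f = np.asarray(values, dtype=float)
    best = np.zeros(len(f))
    for j in range(1, len(f)):
        best[j] = np.max(best[:j] + np.abs(f[j] - f[:j]) ** p)
    return float(best.max())


def sup_norm(values):
    """Supremum norm ||f||_inf over the grid."""
    return float(np.max(np.abs(np.asarray(values, dtype=float))))


if __name__ == "__main__":
    # Hypothetical check: empirical CDF minus the Uniform(0, 1) CDF on a grid of [0, 1]
    rng = np.random.default_rng(1)
    sample = np.sort(rng.uniform(size=200))
    grid = np.linspace(0.0, 1.0, 401)
    diff = np.searchsorted(sample, grid, side="right") / sample.size - grid
    print(p_variation_sum(diff, 1.5), sup_norm(diff))
```

For a monotone function and \(p\ge 1\), the supremum is attained by the single increment between the endpoints, which gives a quick sanity check of the recursion.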

Under (C1) and model (3), we have

$$\begin{aligned}&E[{\hat{S}}_{n,i}^{*}(t)|Z_i]=\Psi ({{\mathcal {E}}})+E[{\dot{\Psi }}({X}_i)|Z_i] \\&\quad =\Psi ({{\mathcal {E}}})+E[{\hat{S}}_n(t)|Z_i]-S(t)=E[{\hat{S}}_n(t)|Z_i]=\phi ^{-1}(\beta _{t,0}^T Z_i)+o(n^{-1/2}). \end{aligned}$$

It follows from (3.42) of Overgaard et al. (2017) that

$$\begin{aligned}&n^{-1/2}\left| {\hat{U}}_{n}(\beta _{t,0})-{\hat{U}}_{n}^{*}(\beta _{t,0})\right| =n^{-1/2}\left| \sum _{i=1}^{n} A(\beta _{t,0};Z_i)^T \left( [{\hat{S}}_{n,i}(t)-{\hat{S}}_{n,i}^{*}(t)]+o(n^{-1/2})\right) \right| \\&\quad \le \max _{i}|{{\mathcal {R}}}_{ij}|\,n^{-1/2}\sum _{i=1}^{n}\left| A(\beta _{t,0};Z_i)\right| +o(1)= o_p(n^{1/2-2\lambda })+ o(1), \end{aligned}$$

for \(\lambda \in [1/4,1/2)\), where

$$\begin{aligned} {{\mathcal {R}}}_{ij}= & {} {1\over 2}\Psi _{{{\mathcal {E}}}}^{''}({{\mathcal {E}}}_n-{{\mathcal {E}}},{{\mathcal {E}}}_n-{{\mathcal {E}}})+ {1\over {2(n-1)}}\Psi _{{{\mathcal {E}}}}^{''}(\delta _{{X}_i}-{{\mathcal {E}}}_n,\delta _{{X}_i}-{{\mathcal {E}}}_n) \\&+\Psi _{{{\mathcal {E}}}}^{''}({{\mathcal {E}}}_n-{{\mathcal {E}}},{{\mathcal {E}}}_{n}^{-i}-{{\mathcal {E}}})+ \int _{0}^{1}(1-s)(\Psi _{{{\mathcal {E}}}_{n,s}}^{''}- \Psi _{{{\mathcal {E}}}}^{''})({{\mathcal {E}}}_n-{{\mathcal {E}}},{{\mathcal {E}}}_n-{{\mathcal {E}}})ds \\&+\int _{0}^{1}(1-s)(n-1)(\Psi _{{{\mathcal {E}}}_{n,s}}^{''}- \Psi _{{{\mathcal {E}}}_{n,s}^{-i}}^{''})({{\mathcal {E}}}_n-{{\mathcal {E}}},{{\mathcal {E}}}_n-{{\mathcal {E}}})ds \\&+\int _{0}^{1}(1-s)(\Psi _{{{\mathcal {E}}}_{n,s}^{-i}}^{''} -\Psi _{{{\mathcal {E}}}}^{''})({{\mathcal {E}}}_n-{{\mathcal {E}}},\delta _{{X}_i}-{{\mathcal {E}}}_n)ds \\&+\int _{0}^{1}(1-s)(\Psi _{{{\mathcal {E}}}_{n,s}^{-i}}^{''} -\Psi _{{{\mathcal {E}}}}^{''})(\delta _{{X}_i}-{{\mathcal {E}}}_n,{{\mathcal {E}}}_n^{-i}-{{\mathcal {E}}})ds, \end{aligned}$$

where \({{\mathcal {E}}}_{n,s}={{\mathcal {E}}}+s({{\mathcal {E}}}_n-{{\mathcal {E}}})\) and \({{\mathcal {E}}}_{n,s}^{-i}={{\mathcal {E}}}+s({{\mathcal {E}}}_n^{-i}-{{\mathcal {E}}})\). Since \(1/2-2\lambda \le 0\) for \(\lambda \in [1/4,1/2)\), the bound is \(o_p(1)\); thus, \(n^{-1/2}{\hat{U}}_{n}(\beta _{t,0})\) and \(n^{-1/2}{\hat{U}}_{n}^{*}(\beta _{t,0})\) are asymptotically equivalent. Furthermore, \({\hat{U}}_{n}^{*}(\beta _{t,0})\) can be expressed as

$$\begin{aligned} {\hat{U}}_{n}^{*}(\beta _{t,0})=n{1\over {n\atopwithdelims ()2}}\sum _{i=1}^{n}\sum _{i^{'}<i} {1\over 2}h({X}_i,Z_i,{X}_{i^{'}},Z_{i^{'}}). \end{aligned}$$

The factor n aside, this is a U-statistic of order 2. It follows from Theorem 3.3 of Overgaard et al. (2017) that \(n^{-1/2}{\hat{U}}_{n}(\beta _{t,0})\) converges in distribution to \(N(0,\Sigma (\beta _{t,0}))\), where \(\Sigma (\beta _{t,0}) =E[h({X}_1,Z_1,{X}_2,Z_2)h({X}_1,Z_1,{X}_3,Z_3)^T]\). Under assumptions (A1)–(A5), it follows that \(\sqrt{n}({\hat{\beta }}_{t}-\beta _{t,0})\) converges in distribution to \(N(0,M(\beta _{t,0})^{-1}\Sigma (\beta _{t,0})M(\beta _{t,0})^{-1})\) as \(n\rightarrow \infty\). The proof is complete.
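
The step from the U-statistic representation to the stated limits is the standard Hoeffding-projection argument, sketched below. It assumes that \(h\) is symmetric in its two arguments, has finite second moments and mean zero; it writes \(W_i=(X_i,Z_i)\), so that \(h(W_1,W_2)\) stands for \(h({X}_1,Z_1,{X}_2,Z_2)\); and it takes \(M(\beta _{t,0})\) to be the limit of \(-n^{-1}\partial {\hat{U}}_{n}(\beta )/\partial \beta ^T\) at \(\beta _{t,0}\), which is our reading of notation defined in the main text.

$$\begin{aligned} n^{-1/2}{\hat{U}}_{n}^{*}(\beta _{t,0})&=n^{1/2}{n\atopwithdelims ()2}^{-1}\sum _{i=1}^{n}\sum _{i^{'}<i}{1\over 2}h(W_i,W_{i^{'}}) =n^{-1/2}\sum _{i=1}^{n}h_1(W_i)+o_p(1),\qquad h_1(w)=E[h(w,W_2)],\\ E[h_1(W_1)h_1(W_1)^T]&=E\left[ E\{h(W_1,W_2)\mid W_1\}\,E\{h(W_1,W_3)\mid W_1\}^T\right] =E[h(W_1,W_2)h(W_1,W_3)^T]=\Sigma (\beta _{t,0}),\\ \sqrt{n}({\hat{\beta }}_{t}-\beta _{t,0})&=M(\beta _{t,0})^{-1}n^{-1/2}{\hat{U}}_{n}(\beta _{t,0})+o_p(1)\overset{d}{\longrightarrow } N\left( 0,\,M(\beta _{t,0})^{-1}\Sigma (\beta _{t,0})\{M(\beta _{t,0})^{-1}\}^T\right) . \end{aligned}$$

The first line uses the Hoeffding decomposition (the degenerate remainder is \(O_p(n^{-1})\), hence \(o_p(1)\) after multiplication by \(n^{1/2}\)); the second uses that \(W_2\) and \(W_3\) are independent given \(W_1\); the third is a first-order Taylor expansion of \({\hat{U}}_{n}({\hat{\beta }}_{t})=0\) around \(\beta _{t,0}\). When \(M(\beta _{t,0})\) is symmetric, the covariance is \(M^{-1}\Sigma M^{-1}\).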

Proof of Theorem 2

Let

$$\begin{aligned} {\tilde{S}}_{n,i}^{*}(t)=\Psi ({{\mathcal {E}}})+{\dot{\Psi }}({X}_i) +{1\over {n_t-1}}\sum _{i^{'}\ne i,\, i^{'}\in {{\mathcal {C}}}_d(t)}\ddot{\Psi }({X}_i,{X}_{i^{'}}) \end{aligned}$$

and

$$\begin{aligned} {\tilde{U}}^{*}(\beta _{t,0})=\sum _{i\in {{\mathcal {C}}}_d(t)}A(\beta _{t,0};Z_i)^T \left( {\tilde{S}}_{n,i}^{*}(t)-E[{\tilde{S}}_{n,i}^{*}(t)|Z_i]\right) . \end{aligned}$$

It follows from (3.42) of Overgaard et al. (2017) that

$$\begin{aligned}&n_t^{-1/2}\left| {\tilde{U}}(\beta _{t,0})-{\tilde{U}}^{*}(\beta _{t,0})\right| =n_t^{-1/2}\left| \sum _{i\in {{\mathcal {C}}}_d(t)} A(\beta _{t,0};Z_i)^T \left( [{\tilde{S}}_{n,i}(t)-{\tilde{S}}_{n,i}^{*}(t)]+o(n_t^{-1/2})\right) \right| \\&\quad \le \max _{i\in {{\mathcal {C}}}_d(t)}|{{\mathcal {D}}}_{ij}|\,n_t^{-1/2}\sum _{i\in {{\mathcal {C}}}_d(t)}\left| A(\beta _{t,0};Z_i)\right| +o(1)= o_p(n_t^{1/2-2\lambda })+ o(1), \end{aligned}$$

for \(\lambda \in [1/4,1/2)\), where

$$\begin{aligned} {{\mathcal {D}}}_{ij}= & {} {1\over 2}\Psi _{{{\mathcal {E}}}}^{''}({{\mathcal {E}}}_n-{{\mathcal {E}}},{{\mathcal {E}}}_n-{{\mathcal {E}}})+ {{n_t-1}\over {2(n-1)^2}}\Psi _{{{\mathcal {E}}}}^{''}(\delta _{{X}_i}-{{\mathcal {E}}}_n,\delta _{{X}_i}-{{\mathcal {E}}}_n) \\&+{{n_t-1}\over {n-1}}\Psi _{{{\mathcal {E}}}}^{''}({{\mathcal {E}}}_n-{{\mathcal {E}}},{{\mathcal {E}}}_{n}^{-i}-{{\mathcal {E}}})+{{n_t-1}\over {n-1}} \int _{0}^{1}(1-s)(\Psi _{{{\mathcal {E}}}_{n,s}}^{''}- \Psi _{{{\mathcal {E}}}}^{''})({{\mathcal {E}}}_n-{{\mathcal {E}}},{{\mathcal {E}}}_n-{{\mathcal {E}}})ds \\&+\int _{0}^{1}(1-s)(n_t-1)(\Psi _{{{\mathcal {E}}}_{n,s}}^{''}- \Psi _{{{\mathcal {E}}}_{n,s}^{-i}}^{''})({{\mathcal {E}}}_n-{{\mathcal {E}}},{{\mathcal {E}}}_n-{{\mathcal {E}}})ds \\&+{{n_t-1}\over {n-1}}\int _{0}^{1}(1-s)(\Psi _{{{\mathcal {E}}}_{n,s}^{-i}}^{''} -\Psi _{{{\mathcal {E}}}}^{''})({{\mathcal {E}}}_n-{{\mathcal {E}}},\delta _{{X}_i}-{{\mathcal {E}}}_n)ds \\&+{{n_t-1}\over {n-1}}\int _{0}^{1}(1-s)(\Psi _{{{\mathcal {E}}}_{n,s}^{-i}}^{''} -\Psi _{{{\mathcal {E}}}}^{''})(\delta _{{X}_i}-{{\mathcal {E}}}_n,{{\mathcal {E}}}_n^{-i}-{{\mathcal {E}}})ds. \end{aligned}$$

Thus, \(n_t^{-1/2}{\tilde{U}}(\beta _{t,0})\) and \(n_t^{-1/2}{\tilde{U}}^{*}(\beta _{t,0})\) are asymptotically equivalent. Furthermore, \({\tilde{U}}^{*}(\beta _{t,0})\) can be expressed as

$$\begin{aligned} {\tilde{U}}^{*}(\beta _{t,0})=n_t{1\over {n_t\atopwithdelims ()2}}\sum _{i\in {{\mathcal {C}}}_d(t)}\sum _{i^{'}<i, i^{'}\in {{\mathcal {C}}}_d(t)}{1\over 2}h({X}_i,Z_i,{X}_{i^{'}},Z_{i^{'}}). \end{aligned}$$

It follows, again by Theorem 3.3 of Overgaard et al. (2017), that \(n_t^{-1/2}{\tilde{U}}(\beta _{t,0})\) converges in distribution to \(N(0,\Sigma _d(\beta _{t,0}))\). Under assumptions (A1)–(A5), \(\sqrt{n_t}({\tilde{\beta }}_{t}-\beta _{t,0})\) converges in distribution to \(N(0,M(\beta _{t,0})^{-1}\Sigma _d(\beta _{t,0})M(\beta _{t,0})^{-1})\) as \(n_t\rightarrow \infty\). The proof is complete.
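
Theorems 1 and 2 concern the estimator obtained by solving the pseudo-observation estimating equation. A minimal sketch of that solving step follows. It assumes an independence working covariance and, purely for illustration, the complementary log-log link \(\phi (s)=\log (-\log s)\) in model (3); the paper's own choice of \(\phi\) and of \(A(\beta ;Z)\) may differ, and `fit_pseudo_gee` is a hypothetical name. The returned covariance is the naive sandwich \({\hat{M}}^{-1}{\hat{\Sigma }}{\hat{M}}^{-1}/n\), which treats the pseudo-observations as independent and therefore need not be consistent for the U-statistic variance \(\Sigma (\beta _{t,0})\) of Theorem 1; Jacobsen and Martinussen (2016) and Overgaard et al. (2017) study this issue. For STPO, the same routine would be applied to the pseudo-observations indexed by \({{\mathcal {C}}}_d(t)\), with \(n_t\) in place of \(n\).

```python
import numpy as np


def inv_link(eta):
    """Assumed link: complementary log-log, phi(s) = log(-log s), so S = exp(-exp(eta))."""
    return np.exp(-np.exp(eta))


def d_inv_link(eta):
    """Derivative of inv_link with respect to eta."""
    return -np.exp(eta) * inv_link(eta)


def fit_pseudo_gee(pseudo, Z, tol=1e-10, max_iter=100):
    """Solve sum_i A_i (pseudo_i - phi^{-1}(beta^T Z_i)) = 0 with A_i = d phi^{-1} / d beta.

    Independence working covariance, Gauss-Newton iterations. Returns beta_hat and
    the naive sandwich covariance M^{-1} Sigma M^{-1} / n (pseudo-observations
    treated as independent).
    """
    y = np.asarray(pseudo, dtype=float)
    Z = np.asarray(Z, dtype=float)
    n, k = Z.shape
    beta = np.zeros(k)
    for _ in range(max_iter):
        eta = Z @ beta
        A = d_inv_link(eta)[:, None] * Z  # row i is A(beta; Z_i)
        step = np.linalg.solve(A.T @ A, A.T @ (y - inv_link(eta)))
        beta = beta + step
        if np.max(np.abs(step)) < tol:
            break
    eta = Z @ beta
    A = d_inv_link(eta)[:, None] * Z
    scores = A * (y - inv_link(eta))[:, None]  # per-subject estimating-function terms
    M = A.T @ A / n
    Sigma = scores.T @ scores / n
    M_inv = np.linalg.inv(M)
    return beta, M_inv @ Sigma @ M_inv / n


if __name__ == "__main__":
    # Hypothetical responses standing in for pseudo-observations at a fixed time t
    rng = np.random.default_rng(0)
    z = rng.normal(size=200)
    Z = np.column_stack([np.ones_like(z), z])
    pseudo = inv_link(Z @ np.array([0.2, -0.5])) + rng.normal(scale=0.1, size=200)
    beta_hat, cov = fit_pseudo_gee(pseudo, Z)
    print(beta_hat, np.sqrt(np.diag(cov)))
```

Pseudo-observations may fall outside \([0,1]\); the least-squares form of the estimating equation accommodates this without modification.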

Cite this article

Shen, P.-S. Regression analysis of doubly truncated data based on pseudo-observations. J. Korean Stat. Soc. 50, 1197–1218 (2021). https://doi.org/10.1007/s42952-021-00113-9

  • DOI: https://doi.org/10.1007/s42952-021-00113-9
