Regression analysis of doubly truncated data based on pseudo-observations

Shen, Pao-sheng

doi:10.1007/s42952-021-00113-9

Research Article
Published: 16 March 2021

Volume 50, pages 1197–1218, (2021)
Journal of the Korean Statistical Society

Pao-sheng Shen¹

Abstract

Doubly truncated data arise when an individual is potentially observed only if its failure time lies within a certain interval unique to that individual. In this paper, we consider the pseudo-observations approach for estimating regression coefficients when data are subject to double truncation. The pseudo-observations generated from the nonparametric maximum likelihood estimate (NPMLE) of the survival function are used as response variables in a generalized estimating equation to estimate the parameters of the model. We consider two estimators for regression parameters of survival probabilities based on different ways of defining pseudo-observations, namely, the simple pseudo-observations (SPO) and the stopped pseudo-observations (STPO). We establish asymptotic properties of the two estimators under some conditions. Simulation results show that the proportion of failed estimations based on STPO is smaller than that based on SPO. The estimator based on STPO performs adequately for finite samples, while the estimator based on SPO can be very unstable when the sample size is not large enough.
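As a rough illustration of how pseudo-observations can be generated from an NPMLE of the survival function, the following Python sketch uses the self-consistency iteration of Efron and Petrosian (1999) to obtain the NPMLE masses and then applies the usual leave-one-out (jackknife) construction $n{\hat{S}}_n(t)-(n-1){\hat{S}}_n^{-i}(t)$ of Andersen et al. (2003). The function names `ep_npmle`, `survival` and `pseudo_observations` are hypothetical, the stopping rule is an arbitrary choice, and the construction may differ in detail from the SPO and STPO definitions used in the paper.

```python
def ep_npmle(x, u, v, tol=1e-10, max_iter=5000):
    """Self-consistency iteration for the NPMLE masses under double truncation.

    Assumes x[i] was observed only because u[i] <= x[i] <= v[i].
    Returns probability masses f[j] placed on the observed x[j].
    """
    n = len(x)
    # J[i][j] = 1 if x[j] lies inside the truncation interval of subject i.
    J = [[1.0 if u[i] <= x[j] <= v[i] else 0.0 for j in range(n)]
         for i in range(n)]
    f = [1.0 / n] * n
    for _ in range(max_iter):
        # F[i]: current estimate of the probability of subject i's interval.
        F = [sum(J[i][j] * f[j] for j in range(n)) for i in range(n)]
        # Score equation 1/f_j = sum_i J_ij / F_i, followed by normalisation.
        new = [1.0 / sum(J[i][j] / F[i] for i in range(n)) for j in range(n)]
        total = sum(new)
        new = [w / total for w in new]
        done = max(abs(a - b) for a, b in zip(new, f)) < tol
        f = new
        if done:
            break
    return f


def survival(x, u, v, t):
    """Plug-in estimate of S(t) = P(X > t) from the NPMLE masses."""
    f = ep_npmle(x, u, v)
    return sum(fj for xj, fj in zip(x, f) if xj > t)


def pseudo_observations(x, u, v, t):
    """Jackknife pseudo-observations n*S_hat(t) - (n-1)*S_hat^{-i}(t)."""
    n = len(x)
    full = survival(x, u, v, t)
    out = []
    for i in range(n):
        keep = [k for k in range(n) if k != i]
        loo = survival([x[k] for k in keep],
                       [u[k] for k in keep],
                       [v[k] for k in keep], t)
        out.append(n * full - (n - 1) * loo)
    return out
```

When there is no truncation (every interval covers the whole line), the NPMLE is the empirical distribution and the pseudo-observations reduce to the indicators $I(X_i>t)$, which gives a convenient sanity check for the sketch.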

References

Andersen, P. K., Klein, J. P., & Rosthøj, S. (2003). Generalised linear models for correlated pseudo-observations, with applications to multi-state models. Biometrika, 90, 15–27.
Andersen, P. K., Hansen, M. G., & Klein, J. P. (2004). Regression analysis of restricted mean survival time based on pseudo-observations. Lifetime Data Analysis, 10, 335–350.
Bennett, S. (1983). Analysis of survival data by the proportional odds model. Statistics in Medicine, 2, 273–277.
Cheng, S. C., Wei, L. J., & Ying, Z. (1995). Analysis of transformation models with censored data. Biometrika, 82, 835–845.
Cox, D. (1972). Regression models and life tables (with Discussion). Journal of the Royal Statistical Society B, 34, 187–220.
de Uña-Álvarez, J., & Van Keilegom, I. (2021). Efron-Petrosian integrals for doubly truncated data with covariates: An asymptotic analysis. Bernoulli, 27, 249–273.
Dörre, A., & Emura, T. (2019). Analysis of doubly truncated data: An introduction. Berlin: Springer Nature Singapore Pte Ltd.
Dudley, R. M., & Norvaiša, R. (2011). Concrete Functional Calculus. New York: Springer.
Efron, B., & Petrosian, V. (1999). Nonparametric methods for doubly truncated data. Journal of the American Statistical Association, 94, 824–834.
Emura, T., Konno, Y., & Michimae, H. (2015a). Statistical inference based on the nonparametric maximum likelihood estimator under double-truncation. Lifetime Data Analysis, 21, 397–418.
Emura, T., Hu, Y.-H., & Konno, Y. (2015b). Asymptotic inference for maximum likelihood estimators under the special exponential family with double-truncation. Statistical Papers, 58, 877–909.
Frank, G., Chae, M., & Kim, Y. (2019). Additive time-dependent hazard model with doubly truncated data. Journal of the Korean Statistical Society, 48, 179–193.
Graw, F., Gerds, T. A., & Schumacher, M. (2009). On pseudo-values for regression analysis in competing risks models. Lifetime Data Analysis, 15, 241–255.
Grand, M. K., Putter, H., Allignol, A., & Andersen, P. K. (2019). A note on pseudo-observations and left-truncation. Biometrical Journal, 61, 290–298.
Han, S., Andrei, A.-C., & Tsui, K.-W. (2014). A semiparametric regression method for interval-censored data. Communications in Statistics-Simulation and Computation, 43, 18–30.
Hu, Y.-H., & Emura, T. (2015). Maximum likelihood estimation for a special exponential family under random double-truncation. Computational Statistics, 30, 1199–1229.
Jacobsen, M., & Martinussen, T. (2016). A note on the large sample properties of estimators based on generalized linear models for correlated pseudo-observations. Scandinavian Journal of Statistics, 43, 845–862.
Kalbfleisch, J. D., & Lawless, J. F. (1989). Inferences based on retrospective ascertainment: An analysis of data on transfusion-related AIDS. Journal of the American Statistical Association, 84, 360–372.
Kaplan, E. L., & Meier, P. (1958). Nonparametric estimation from incomplete observations. Journal of the American Statistical Association, 53, 457–481.
Kim, S., & Kim, Y.-J. (2016). Regression analysis of interval censored competing risk data using a pseudo-value approach. Communications for Statistical Applications and Methods, 23, 555–562.
Mandel, M., de Uña-Álvarez, J., Simon, D. K., & Betensky, R. A. (2018). Inverse probability weighted Cox regression for doubly truncated data. Biometrics, 74, 481–487.
Medley, G. F., Anderson, R. M., Cox, D. R., & Billard, L. (1987). Incubation period of AIDS in patients infected via blood transfusion. Nature, 328, 719–721.
Medley, G. F., Billard, L., Cox, D. R., & Anderson, R. A. (1988). The distribution of the incubation period for the Acquired Immunodeficiency Syndrome (AIDS). Proceedings of the Royal Society of London, Ser. B, 233, 367–377.
Moreira, C., & de Uña-Álvarez, J. (2010a). Bootstrapping the NPMLE for doubly truncated data. Journal of Nonparametric Statistics, 22(5), 567–583.
Moreira, C., & de Uña-Álvarez, J. (2010b). A semiparametric estimator of survival for doubly truncated data. Statistics in Medicine, 29(30), 3147–3159.
Moreira, C., de Uña-Álvarez, J., & Crujeiras, R. M. (2010). DTDA: An R package to analyze randomly truncated data. Journal of Statistical Software, 37(7), 1–20.
Moreira, C., & Van Keilegom, I. (2013). Bandwidth selection for kernel density estimation with doubly truncated data. Computational Statistics and Data Analysis, 61, 107–123.
Murphy, S. A., Rossini, A. J., & van der Vaart, A. W. (1997). Maximum likelihood estimation in the proportional odds model. Journal of the American Statistical Association, 92, 968–976.
Overgaard, M., Parner, E. T., & Pedersen, J. (2017). Asymptotic theory of generalized estimating equations based on jack-knife pseudo-observations. The Annals of Statistics, 45, 1988–2015.
Overgaard, M., Parner, E. T., & Pedersen, J. (2018). Estimating the variance in a pseudo-observation scheme with competing risks. Scandinavian Journal of Statistics, 45, 923–940.
Rennert, L., & Xie, S. X. (2018). Cox regression model with doubly truncated data. Biometrics, 74, 725–733.
Shen, P.-S. (2003). The product-limit estimate as an inverse-probability-weighted average. Communications in Statistics-Theory and Methods, 32, 1119–1133.
Shen, P.-S. (2010a). Nonparametric analysis of doubly truncated data. Annals of the Institute of Statistical Mathematics, 62(5), 835–853.
Shen, P.-S. (2010b). Semiparametric analysis of doubly truncated data. Communications in Statistics-Theory and Methods, 39, 3178–3190.
Shen, P.-S. (2013). Regression analysis of interval censored and doubly truncated data with linear transformation models. Computational Statistics, 28, 581–596.
Shen, P.-S. (2016). Analysis of transformation models with doubly truncated data. Statistical Methodology, 30, 15–30.
Shen, P.-S. & Hsu, H. (2019). Conditional maximum likelihood estimation for semiparametric transformation models with doubly truncated data. Computational Statistics and Data Analysis, (accepted) https://doi.org/10.1016/j.csda.2019.106862.
Shen, P.-S., & Liu, Y. (2019a). Pseudo maximum likelihood estimation for the Cox model with doubly truncated data. Statistical Papers, 60, 1207–1224.
Shen, P.-S., & Liu, Y. (2019b). Pseudo MLE for semiparametric transformation model with doubly truncated data. Journal of the Korean Statistical Society, 48, 384–395.
Tsai, W.-Y., Jewell, N. P., & Wang, M.-C. (1987). A note on the product-limit estimator under right censoring and left truncation. Biometrika, 74, 883–886.
Woodroofe, M. (1985). Estimating a distribution function with truncated data. Annals of Statistics, 13, 163–177.
Ying, Z., Yu, W., Zha, Z., & Zheng, M. (2019). Regression analysis of doubly truncated data. Journal of the American Statistical Association. https://doi.org/10.1080/01621459.2019.1585252.
Zhang, Z., Sun, L., Zhao, X., & Sun, J. (2005). Regression analysis of interval-censored failure time data with linear transformation models. Canadian Journal of Statistics, 33, 61–70.
Zhang, X. (2015). Nonparametric inference for an inverse-probability-weighted estimator with doubly truncated data. Communications in Statistics: Simulation and Computation, 44, 489–504.

Author information

Authors and Affiliations

Department of Statistics, Tunghai University, Taichung, 40704, Taiwan
Pao-sheng Shen

Corresponding author

Correspondence to Pao-sheng Shen.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendix:

Proof of Theorem 1

Since ${{\mathcal {E}}}_n$ is a vector of empirical functions, condition (3.5) of Overgaard et al. (2017) holds, i.e., $||{{\mathcal {E}}}_n-{{\mathcal {E}}}||_p=o_p(n^{-\lambda })$ for some $\lambda \in [{1\over 4},{1\over 2})$ and $p\in [1,2)$. Here $||f||_p=\sup \sum _{i=2}^{k}|f(t_{i-1})-f(t_{i})|^p+||f||_{\infty }$, with the supremum taken over all partitions $\zeta _1\le t_1<\dots <t_k\le \zeta _2$ of the interval $[\zeta _1,\zeta _2]$, and $||\cdot ||_{\infty }$ denotes the supremum norm. Let

$$\begin{aligned} {\hat{S}}_{n,i}^{*}(t)=\Psi ({{\mathcal {E}}})+{\dot{\Psi }}({X}_i) +{1\over {n-1}}\sum _{i^{'}\ne i}\ddot{\Psi }({X}_i,{X}_{i^{'}}) \end{aligned}$$

and

$$\begin{aligned} {\hat{U}}_{n}^{*}(\beta _{t,0})=\sum _{i=1}^{n} A(\beta _{t,0};Z_i)^T \left( {\hat{S}}_{n,i}^{*}(t)-E[{\hat{S}}_{n,i}^{*}(t)|Z_i]\right) . \end{aligned}$$

Under (C1) and model (3), we have

$$\begin{aligned}&E[{\hat{S}}_{n,i}^{*}(t)|Z_i]=\Psi ({{\mathcal {E}}})+E[{\dot{\Psi }}({X}_i)|Z_i] \\&\quad =\Psi ({{\mathcal {E}}})+E[{\hat{S}}_n(t)|Z_i]-S(t)=E[{\hat{S}}_n(t)|Z_i]=\phi ^{-1}(\beta _{t}^T Z_i)+o(n^{-1/2}). \end{aligned}$$

It follows by (3.42) of Overgaard et al. (2017) that

$$\begin{aligned}&n^{-1/2}|{\hat{U}}_{n}(\beta _{t,0})-{\hat{U}}_{n}^{*}(\beta _{t,0})| \le n^{-1/2}\sum _{i=1}^{n}\left| A(\beta _{t,0};Z_i)^T \left( [{\hat{S}}_{n,i}(t)-{\hat{S}}_{n,i}^{*}(t)]+o(n^{-1/2})\right) \right| \\&\quad \le \max _{i}|{{\mathcal {R}}}_{ij}|n^{-1/2}\sum _{i=1}^{n}\left| A(\beta _{t,0};Z_i)^T\right| +o(1) = o_p(n^{1/2-2\lambda })+ o(1), \end{aligned}$$

for $\lambda \in [1/4,1/2)$, where

$$\begin{aligned} {{\mathcal {R}}}_{ij}= & {} {1\over 2}\Psi _{{{\mathcal {E}}}}^{''}({{\mathcal {E}}}_n-{{\mathcal {E}}},{{\mathcal {E}}}_n-{{\mathcal {E}}})+ {1\over {2(n-1)}}\Psi _{{{\mathcal {E}}}}^{''}(\delta _{{X}_i}-{{\mathcal {E}}}_n,\delta _{{X}_i}-{{\mathcal {E}}}_n) \\&+\Psi _{{{\mathcal {E}}}}^{''}({{\mathcal {E}}}_n-{{\mathcal {E}}},{{\mathcal {E}}}_{n}^{-i}-{{\mathcal {E}}})+ \int _{0}^{1}(1-s)(\Psi _{{{\mathcal {E}}}_{n,s}}^{''}- \Psi _{{{\mathcal {E}}}}^{''})({{\mathcal {E}}}_n-{{\mathcal {E}}},{{\mathcal {E}}}_n-{{\mathcal {E}}})ds \\&+\int _{0}^{1}(1-s)(n-1)(\Psi _{{{\mathcal {E}}}_{n,s}}^{''}- \Psi _{{{\mathcal {E}}}_{n,s}^{-i}}^{''})({{\mathcal {E}}}_n-{{\mathcal {E}}},{{\mathcal {E}}}_n-{{\mathcal {E}}})ds \\&+\int _{0}^{1}(1-s)(\Psi _{{{\mathcal {E}}}_{n,s}^{-i}}^{''} -\Psi _{{{\mathcal {E}}}}^{''})({{\mathcal {E}}}_n-{{\mathcal {E}}},\delta _{{X}_i}-{{\mathcal {E}}}_n)ds \\&+\int _{0}^{1}(1-s)(\Psi _{{{\mathcal {E}}}_{n,s}^{-i}}^{''} -\Psi _{{{\mathcal {E}}}}^{''})(\delta _{{X}_i}-{{\mathcal {E}}}_n,{{\mathcal {E}}}_n^{-i}-{{\mathcal {E}}})ds, \end{aligned}$$

where ${{\mathcal {E}}}_{n,s}={{\mathcal {E}}}+s({{\mathcal {E}}}_n-{{\mathcal {E}}})$ and ${{\mathcal {E}}}_{n,s}^{-i}={{\mathcal {E}}}+s({{\mathcal {E}}}_n^{-i}-{{\mathcal {E}}})$. Thus, $n^{-1/2}{\hat{U}}_{n}(\beta _{t,0})$ and $n^{-1/2}{\hat{U}}_{n}^{*}(\beta _{t,0})$ are asymptotically equivalent. Furthermore, ${\hat{U}}_{n}^{*}(\beta _{t,0})$ can be expressed as

$$\begin{aligned} {\hat{U}}_{n}^{*}(\beta _{t,0})=n{1\over {n\atopwithdelims ()2}}\sum _{i=1}^{n}\sum _{i^{'}<i} {1\over 2}h({X}_i,Z_i,{X}_{i^{'}},Z_{i^{'}}). \end{aligned}$$

Apart from the factor $n$, this is a U-statistic of order 2. It follows by Theorem 3.3 of Overgaard et al. (2017) that $n^{-1/2}{\hat{U}}_{n}(\beta _{t,0})$ converges in distribution to $N(0,\Sigma (\beta _{t,0}))$, where $\Sigma (\beta _{t,0}) =E[h({X}_1,Z_1,{X}_2,Z_2)h({X}_1,Z_1,{X}_3,Z_3)^T]$. Under assumptions (A1)–(A5), it follows that $\sqrt{n}({\hat{\beta }}_{t}-\beta _{t,0})$ converges in distribution to $N(0,M(\beta _{t,0})^{-1}\Sigma (\beta _{t,0})M(\beta _{t,0}))$ as $n\rightarrow \infty$. The proof is complete.
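To make the role of the estimating function concrete, the following Python sketch solves an equation of the form $\sum _i A(\beta ;Z_i)^T({\hat{S}}_{n,i}(t)-\phi ^{-1}(\beta ^TZ_i))=0$ at a single time point by Fisher scoring. Several choices here are assumptions made only for illustration and are not taken from the paper: a logit link for $\phi$, a single scalar covariate with an intercept, and $A(\beta ;Z_i)=\partial \phi ^{-1}(\beta ^TZ_i)/\partial \beta$ (identity working variance). The function names `expit` and `solve_gee` are hypothetical.

```python
import math


def expit(eta):
    """Inverse logit link, mu = 1 / (1 + exp(-eta))."""
    return 1.0 / (1.0 + math.exp(-eta))


def solve_gee(z, y, n_iter=50, tol=1e-12):
    """Solve sum_i A_i * (y_i - mu_i) = 0 for beta = (b0, b1).

    Assumptions: mu_i = expit(b0 + b1 * z_i) and A_i = d mu_i / d beta.
    y holds the pseudo-observations; they need not lie in [0, 1].
    """
    b0, b1 = 0.0, 0.0
    for _ in range(n_iter):
        u0 = u1 = m00 = m01 = m11 = 0.0
        for zi, yi in zip(z, y):
            mu = expit(b0 + b1 * zi)
            d = mu * (1.0 - mu)      # d mu / d eta for the logit link
            r = yi - mu              # residual of the pseudo-observation
            u0 += d * r              # estimating function, intercept part
            u1 += d * zi * r         # estimating function, slope part
            m00 += d * d             # entries of M = sum_i A_i^T A_i
            m01 += d * d * zi
            m11 += d * d * zi * zi
        det = m00 * m11 - m01 * m01
        # Scoring step beta <- beta + M^{-1} U, with the 2x2 inverse written out.
        s0 = (m11 * u0 - m01 * u1) / det
        s1 = (m00 * u1 - m01 * u0) / det
        b0, b1 = b0 + s0, b1 + s1
        if max(abs(s0), abs(s1)) < tol:
            break
    return b0, b1
```

In practice `y` would be the vector of pseudo-observations at time $t$ and `z` the covariate values; with at least two distinct covariate values the matrix $M$ in the scoring step is invertible.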

Proof of Theorem 2

Let

$$\begin{aligned} {\tilde{S}}_{n,i}^{*}(t)=\Psi ({{\mathcal {E}}})+{\dot{\Psi }}({X}_i) +{1\over {n_t-1}}\sum _{i\ne i^{'},i,i^{'}\in {{\mathcal {C}}}_d(t)}\ddot{\Psi }({X}_i,{ X}_{i^{'}}) \end{aligned}$$

and

$$\begin{aligned} {\tilde{U}}^{*}(\beta _{t,0})=\sum _{i\in {{\mathcal {C}}}_d(t)}A(\beta _{t,0};Z_i)^T \left( {\tilde{S}}_{n,i}^{*}(t)-E[{\tilde{S}}_{n,i}^{*}(t)|Z_i]\right) . \end{aligned}$$

It follows by (3.42) of Overgaard et al. (2017) that

$$\begin{aligned}&n_t^{-1/2}|{\tilde{U}}(\beta _{t,0})-{\tilde{U}}^{*}(\beta _{t,0})| \le n_t^{-1/2}\sum _{i\in {{\mathcal {C}}}_d(t)}\left| A(\beta _{t,0};Z_i)^T \left( [{\tilde{S}}_{n,i}(t)-{\tilde{S}}_{n,i}^{*}(t)]+o(n_t^{-1/2})\right) \right| \\&\quad \le \max _{i\in {{\mathcal {C}}}_d(t)}|{{\mathcal {D}}}_{ij}|n_t^{-1/2}\sum _{i\in {{\mathcal {C}}}_d(t)}\left| A(\beta _{t,0};Z_i)^T\right| +o(1) = o_p(n_t^{1/2-2\lambda })+ o(1), \end{aligned}$$

for $\lambda \in [1/4,1/2)$, where

$$\begin{aligned} {{\mathcal {D}}}_{ij}= & {} {1\over 2}\Psi _{{{\mathcal {E}}}}^{''}({{\mathcal {E}}}_n-{{\mathcal {E}}},{{\mathcal {E}}}_n-{{\mathcal {E}}})+ {{n_t-1}\over {2(n-1)^2}}\Psi _{{{\mathcal {E}}}}^{''}(\delta _{\mathbf{X}_i}-{{\mathcal {E}}}_n,\delta _{\mathbf{X}_i}-{{\mathcal {E}}}_n) \\&+{{n_t-1}\over {n-1}}\Psi _{{{\mathcal {E}}}}^{''}({{\mathcal {E}}}_n-{{\mathcal {E}}},{{\mathcal {E}}}_{n}^{-i}-{{\mathcal {E}}})+{{n_t-1}\over {n-1}} \int _{0}^{1}(1-s)(\Psi _{{{\mathcal {E}}}_{n,s}}^{''}- \Psi _{{{\mathcal {E}}}}^{''})({{\mathcal {E}}}_n-{{\mathcal {E}}},{{\mathcal {E}}}_n-{{\mathcal {E}}})ds \\&+\int _{0}^{1}(1-s)(n_t-1)(\Psi _{{{\mathcal {E}}}_{n,s}}^{''}- \Psi _{{{\mathcal {E}}}_{n,s}^{-i}}^{''})({{\mathcal {E}}}_n-{{\mathcal {E}}},{{\mathcal {E}}}_n-{{\mathcal {E}}})ds \\&+{{n_t-1}\over {n-1}}\int _{0}^{1}(1-s)(\Psi _{{{\mathcal {E}}}_{n,s}^{-i}}^{''} -\Psi _{{{\mathcal {E}}}}^{''})({{\mathcal {E}}}_n-{{\mathcal {E}}},\delta _{\mathbf{X}_i}-{{\mathcal {E}}}_n)ds \\&+{{n_t-1}\over {n-1}}\int _{0}^{1}(1-s)(\Psi _{{{\mathcal {E}}}_{n,s}^{-i}}^{''} -\Psi _{{{\mathcal {E}}}}^{''})(\delta _{\mathbf{X}_i}-{{\mathcal {E}}}_n,{{\mathcal {E}}}_n^{-i}-{{\mathcal {E}}})ds. \end{aligned}$$

Thus, $n_t^{-1/2}{\tilde{U}}(\beta _{t,0})$ and $n_t^{-1/2}{\tilde{U}}^{*}(\beta _{t,0})$ are asymptotically equivalent. Furthermore, ${\tilde{U}}^{*}(\beta _{t,0})$ can be expressed as

$$\begin{aligned} {\tilde{U}}^{*}(\beta _{t,0})=n_t{1\over {n_t\atopwithdelims ()2}}\sum _{i\in {{\mathcal {C}}}_d(t)}\sum _{i^{'}<i, i^{'}\in {{\mathcal {C}}}_d(t)}{1\over 2}h({X}_i,Z_i,{X}_{i^{'}},Z_{i^{'}}). \end{aligned}$$

It follows that $n_t^{-1/2}{\tilde{U}}(\beta _{t,0})$ converges in distribution to $N(0,\Sigma _d(\beta _{t,0}))$. Under assumptions (A1)–(A5), $\sqrt{n_t}({\tilde{\beta }}_{t}-\beta _{t,0})$ converges in distribution to $N(0,M(\beta _{t,0})^{-1}\Sigma _d(\beta _{t,0})M(\beta _{t,0}))$ as $n_t\rightarrow \infty$. The proof is complete.

About this article

Cite this article

Shen, Ps. Regression analysis of doubly truncated data based on pseudo-observations. J. Korean Stat. Soc. 50, 1197–1218 (2021). https://doi.org/10.1007/s42952-021-00113-9

Received: 11 March 2020
Accepted: 04 March 2021
Published: 16 March 2021
Issue Date: December 2021
DOI: https://doi.org/10.1007/s42952-021-00113-9
