Abstract
Doubly-truncated data often appear in lifetime data analysis, where samples are collected under certain time constraints. Nonparametric methods for doubly-truncated data have been well studied in the literature. This paper instead considers parametric inference when samples are subject to double-truncation. Efron and Petrosian (J Am Stat Assoc 94:824–834, 1999) proposed fitting a parametric family, called the special exponential family, to doubly-truncated data. However, non-trivial technical aspects, such as the parameter space, the support of the density, and computational algorithms, have not been discussed in the literature. This paper fills this gap by addressing these technical aspects, including adequate choices of the parameter space and the support, and reliable computational algorithms. Simulations are conducted to verify the suggested techniques, and real data are used for illustration.
References
Akaike H (1973) Information theory and an extension of the maximum likelihood principle. In: Petrov BN, Csaki F (eds) Proceedings of the 2nd international symposium on information theory. Akadémiai Kiadó, Budapest, pp 267–281
Andersen PK, Keiding N (2002) Multi-state models for event history analysis. Stat Methods Med Res 11:91–115
Balakrishnan N, Basu AP (1996) The exponential distribution: theory, methods and applications. Taylor & Francis Ltd, USA
Burden RL, Faires JD (2011) Numerical analysis. Cengage Learning, Boston
Chen YH (2009) Weighted Breslow-type and maximum likelihood estimation in semiparametric transformation models. Biometrika 96:235–251
Cohen AC (1991) Truncated and censored samples. Marcel Dekker, New York
Casella G, Berger RL (2002) Statistical inference. Duxbury Thomson Learning, Australia
Castillo JD (1994) The singly truncated normal distribution: a non-steep exponential family. Ann Inst Stat Math 46:57–66
Commenges D (2002) Inference for multi-state models from interval-censored data. Stat Methods Med Res 11:167–182
Efron B, Petrosian R (1999) Nonparametric methods for doubly truncated data. J Am Stat Assoc 94:824–834
Emura T, Konno Y (2012a) Multivariate normal distribution approaches for dependently truncated data. Stat Pap 53:133–149
Emura T, Konno Y (2012b) A goodness-of-fit test for parametric models based on dependently truncated data. Comput Stat Data Anal 56:2237–2250
Emura T, Konno Y, Michimae H (2014) Statistical inference based on the nonparametric maximum likelihood estimator under double-truncation. Lifetime Data Anal. doi:10.1007/s10985-014-9297-5
Knight K (2000) Mathematical statistics. Chapman and Hall, Boca Raton
Lagakos SW, Barraj LM, De Gruttola V (1988) Non-parametric analysis of truncated survival data with application to AIDS. Biometrika 75:515–523
Long TH, Emura T (2014) A control chart using copula-based Markov chain models. J Chin Stat Assoc 52:466–496
Mandrekar SJ, Mandrekar JN (2003) Are our data symmetric? Stat Methods Med Res 12:505–513
Moreira C, de Uña-Álvarez J (2010) Bootstrapping the NPMLE for doubly truncated data. J Nonparametric Stat 22:567–583
Moreira C, de Uña-Álvarez J, Van Keilegom I (2014) Goodness-of-fit tests for a semiparametric model under random double truncation. Comput Stat. doi:10.1007/s00180-014-0496-z
Moreira C, de Uña-Álvarez J (2012) Kernel density estimation with doubly-truncated data. Electron J Stat 6:501–521
Moreira C, Van Keilegom I (2013) Bandwidth selection for kernel density estimation with doubly truncated data. Comput Stat Data Anal 61:107–123
R Development Core Team (2014) R: a language and environment for statistical computing. R Foundation for Statistical Computing, R version 3:2
Robertson HT, Allison DB (2012) A novel generalized normal distribution for human longevity and other negatively skewed data. PLoS One 7:e37025
Sankaran PG, Sunoj SM (2004) Identification of models using failure rate and mean residual life of doubly truncated random variables. Stat Pap 45:97–109
Shen PS (2010) Nonparametric analysis of doubly truncated data. Ann Inst Stat Math 62:835–853
Stovring H, Wang MC (2007) A new approach of nonparametric estimation of incidence and lifetime risk based on birth rates and incidence events. BMC Med Res Methodol 7:53
Strzalkowska-Kominiak E, Stute W (2013) Empirical copulas for consecutive survival data: copulas in survival analysis. TEST 22:688–714
Stute W, González-Manteiga W, Quindimil MP (1993) Bootstrap based goodness-of-fit-tests. Metrika 40:243–256
Zhu H, Wang MC (2012) Analyzing bivariate survival data with interval sampling and application to cancer epidemiology. Biometrika 99:345–361
Acknowledgments
We are grateful for the comments and suggestions from the associate editor and two anonymous referees, which greatly improved the manuscript. This work is supported by research grants from the National Science Council of Taiwan (NSC 101-2118-M-008-002-MY2) and the Ministry of Science and Technology of Taiwan (MOST 103-2118-M-008-MY2).
Appendix: Data generation for simulations
1.1 Data generation for one-parameter SEF
First, consider the case \(\eta >0\). We let
In this case, data are generated by using inverse transformations
where \(W_1, W_2, W_3 \sim U(0, 1)\). Then, the inclusion probability is
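Setting \(\eta _u =0\) yields \(P(-\infty <Y\le V)=\eta _v / (\eta _v +\eta )\), and setting \(\eta _v =\infty \) yields \(P(U\le Y<\tau _2 )=\eta / (\eta +\eta _u )\). One can get \(P(U\le Y\le V)\approx 0.5\) by making \(P(U\le Y)=0.75\) and \(P(Y\le V)=0.75\). Indeed, \(P(U\le Y\le V)=P(U\le Y)+P(Y\le V)-1+P(V<Y<U)\), so the inclusion probability is at least 0.5 and exceeds it only by \(P(V<Y<U)\).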
For instance, with fixed \(\eta =3\), we find \(\eta _u \) and \(\eta _v \) as follows:
1. Set \(P(U\le Y)=\eta / (\eta +\eta _u )=0.75\), and then obtain \(\eta _u =1\).
2. Set \(P(Y\le V)=\eta _v / (\eta _v +\eta )=0.75\), and then obtain \(\eta _v =9\).
Accordingly, the inclusion probability becomes \(P(U\le Y\le V)=0.5031447\).
Another case is \(\eta <0\), where the range of \(Y\) is \(y\in [\tau _1, \infty )\). We consider
In this case, data are generated by using inverse transformations
where \(W_1, W_2, W_3 \sim U(0, 1)\). Then, the inclusion probability is
Setting \(\eta _u =-\infty \) yields \(P(0\le Y\le V)=\eta /(\eta +\eta _v )\), and setting \(\eta _v =0\) yields \(P(U\le Y<\infty )=\eta _u /(\eta +\eta _u )\). One can get \(P(U\le Y\le V)\approx 0.5\) by making \(P(U\le Y)=0.75\) and \(P(Y\le V)=0.75\).
For instance, with fixed \(\eta =-1\), we find \(\eta _u \) and \(\eta _v \) as follows:
1. Set \(P(U\le Y)=\eta _u /(\eta +\eta _u )=0.75\), and then obtain \(\eta _u =-3\).
2. Set \(P(Y\le V)=\eta /(\eta +\eta _v )=0.75\), and then obtain \(\eta _v =-1/3\).
Accordingly, the inclusion probability becomes \(P(U\le Y\le V)=0.5235602\).
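The two worked examples above each reduce to solving one linear equation per truncation variable. The following is a minimal Python sketch of that arithmetic, assuming only the expressions for \(P(U\le Y)\) and \(P(Y\le V)\) quoted in the text. The function name `truncation_rates` and the argument `p` are illustrative and do not come from the paper.

```python
from fractions import Fraction


def truncation_rates(eta, p=Fraction(3, 4)):
    """Return (eta_u, eta_v) such that P(U <= Y) = P(Y <= V) = p.

    Expressions taken from the text:
      eta > 0: P(U <= Y) = eta / (eta + eta_u),   P(Y <= V) = eta_v / (eta_v + eta)
      eta < 0: P(U <= Y) = eta_u / (eta + eta_u), P(Y <= V) = eta / (eta + eta_v)
    """
    eta = Fraction(eta)
    if eta == 0:
        raise ValueError("eta must be non-zero")
    odds = p / (1 - p)  # equals 3 when p = 0.75
    if eta > 0:
        return eta / odds, eta * odds
    return eta * odds, eta / odds


# The two settings used in the text.
eta_u_pos, eta_v_pos = truncation_rates(3)    # eta_u = 1,  eta_v = 9
eta_u_neg, eta_v_neg = truncation_rates(-1)   # eta_u = -3, eta_v = -1/3
print(eta_u_pos, eta_v_pos, eta_u_neg, eta_v_neg)
```

Exact rational arithmetic is used here only so that the recovered values can be compared directly with those listed in the two numbered steps.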
1.2 Data generation for two-parameter SEF
We consider \(U\sim N(\mu _u, 1)\), \(V\sim N(\mu _v, 1)\) and \(Y\sim N(\mu , 1)\). One can obtain \(P(U\le Y\le V)\approx 0.5\) by making the left-truncation percentage equal to the right-truncation percentage. Since the normal distribution is symmetric, we set \(\mu _u =\mu -\Delta \) and \(\mu _v =\mu +\Delta \). Then,
The desired value \(\Delta >0\) is chosen numerically. For instance, with fixed \(\mu =5\), the desired value is \(\Delta =0.91\), which makes \(P(U\le Y\le V)\approx 0.5\) (Fig. 8). Then, we choose \(\mu _u =4.09\) and \(\mu _v =5.91\). With this setting, we have \(P(U\le Y\le V)=0.5076142\).
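One way to choose \(\Delta \) numerically is sketched below in Python. It rests on an assumption not stated in this appendix, namely that \(U\), \(Y\) and \(V\) are mutually independent. Under that assumption, \(P(U\le Y\le V)=\int \varphi (z)\,\Phi (z+\Delta )\,\Phi (\Delta -z)\,dz\) with \(z=y-\mu \), which does not depend on \(\mu \). The names `inclusion_prob` and `solve_delta`, the integration range and the bisection bracket are all illustrative choices.

```python
from statistics import NormalDist

STD = NormalDist()  # standard normal distribution


def inclusion_prob(delta, half_width=8.0, steps=4000):
    """P(U <= Y <= V) for independent U~N(mu-delta,1), Y~N(mu,1), V~N(mu+delta,1).

    Integrates pdf(z) * cdf(z + delta) * cdf(delta - z) over z = y - mu
    with the composite Simpson rule on [-half_width, half_width].
    `steps` must be even.
    """
    h = 2.0 * half_width / steps
    total = 0.0
    for i in range(steps + 1):
        z = -half_width + i * h
        weight = 1 if i in (0, steps) else (4 if i % 2 else 2)
        total += weight * STD.pdf(z) * STD.cdf(z + delta) * STD.cdf(delta - z)
    return total * h / 3.0


def solve_delta(target=0.5, lo=0.0, hi=5.0, tol=1e-8):
    """Bisection on delta; inclusion_prob is increasing in delta."""
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if inclusion_prob(mid) < target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


delta = solve_delta()
print(round(delta, 3), round(inclusion_prob(delta), 6))
```

Under the independence assumption, the root should land close to the value \(\Delta =0.91\) quoted above. A Monte Carlo search over a grid of \(\Delta \) values would be an equally reasonable alternative.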
1.3 Data generation for cubic SEF
For the cubic SEF with \(\eta _3 >0\), we consider \(U\sim N(\mu _u , 1)\), \(V\sim N(\mu _v, 1)\) and
\[f_{\varvec{\eta }}(y)=\exp \{\eta _1 y+\eta _2 y^{2}+\eta _3 y^{3}-\phi (\varvec{\eta })\},\quad y\in \mathcal {Y}=(-\infty , \tau _2 ],\]
where \(\phi (\varvec{\eta })=\log \{\int _{\mathcal {Y}} {\exp (\eta _1 y+\eta _2 y^{2}+\eta _3 y^{3})\, dy} \}\). Data are generated by an inverse transformation, which numerically solves \(1-S_\eta (y)=W\), where \(W\sim U(0, 1)\). One can get \(P(U\le Y\le V)\approx 0.5\) by equalizing the left-truncation and right-truncation percentages, that is, by setting \(\mu _u =\eta _1 -\Delta \) and \(\mu _v =\eta _1 +\Delta \). Then,
The desired value \(\Delta >0\) is chosen numerically. For instance, for fixed \(\eta _1 =5\), \(\eta _2 =-0.5\), \(\eta _3 =0.005\) and \(\tau _2 =8\), the value of \(\Delta \) is 1.01 (Fig. 9). Hence, we choose \(\mu _u =3.99\) and \(\mu _v =6.01\). Accordingly, the inclusion probability becomes \(P(U\le Y\le V)=0.5035228\).
The other case \( \eta _3 <0\) is similar. Under \(\eta _1 =5\), \(\eta _2 =-0.5\), \(\eta _3 =-0.005\), and \(\tau _1 =2\), the desired value is \(\Delta =0.91\) (see Fig. 9). Hence, we choose \(\mu _u =4.09\) and \(\mu _v =5.91\). Accordingly, the inclusion probability becomes \(P(U\le Y\le V)=0.5027334\).
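The numerical inverse transformation for the cubic SEF can be sketched in Python as follows. This is only one possible scheme, not necessarily the authors' algorithm: it tabulates the distribution function \(1-S_\eta (y)\) on a grid and solves \(1-S_\eta (y)=W\) by linear interpolation. The name `make_cubic_sef_sampler`, the grid size and the finite lower limit `lower=-5.0` (standing in for \(-\infty \) in the case \(\eta _3 >0\)) are assumptions made for illustration.

```python
import bisect
import math
import random


def make_cubic_sef_sampler(eta1, eta2, eta3, lower, upper, n_grid=20001):
    """Inverse-transform sampler for the density proportional to
    exp(eta1*y + eta2*y**2 + eta3*y**3) on [lower, upper]."""
    step = (upper - lower) / (n_grid - 1)
    ys = [lower + i * step for i in range(n_grid)]
    expo = [eta1 * y + eta2 * y**2 + eta3 * y**3 for y in ys]
    top = max(expo)  # subtract the largest exponent to avoid overflow
    dens = [math.exp(e - top) for e in expo]

    # Cumulative trapezoid rule, then normalise so the last value is 1.
    cdf = [0.0]
    for i in range(1, n_grid):
        cdf.append(cdf[-1] + 0.5 * (dens[i - 1] + dens[i]) * step)
    total = cdf[-1]
    cdf = [c / total for c in cdf]

    def draw(w):
        """Solve F(y) = w: find the grid cell containing w, then interpolate."""
        j = bisect.bisect_left(cdf, w)
        j = min(max(j, 1), n_grid - 1)
        gap = cdf[j] - cdf[j - 1]
        frac = 0.0 if gap == 0.0 else (w - cdf[j - 1]) / gap
        return ys[j - 1] + frac * step

    return draw


# Setting from the text for eta_3 > 0: eta = (5, -0.5, 0.005), tau_2 = 8.
rng = random.Random(2015)  # fixed seed, for reproducibility only
draw = make_cubic_sef_sampler(5.0, -0.5, 0.005, lower=-5.0, upper=8.0)
sample = [draw(rng.random()) for _ in range(10000)]
print(min(sample), max(sample), sum(sample) / len(sample))
```

For the case \(\eta _3 <0\), the same sketch applies with `lower` set to \(\tau _1 \) and `upper` set to a large finite value beyond which the density is negligible.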
Cite this article
Hu, YH., Emura, T. Maximum likelihood estimation for a special exponential family under random double-truncation. Comput Stat 30, 1199–1229 (2015). https://doi.org/10.1007/s00180-015-0564-z
DOI: https://doi.org/10.1007/s00180-015-0564-z