Maximum likelihood estimation for a special exponential family under random double-truncation

Hu, Ya-Hsuan; Emura, Takeshi

doi:10.1007/s00180-015-0564-z

Original Paper
Published: 12 February 2015

Volume 30, pages 1199–1229, (2015)

Computational Statistics


Abstract

Doubly-truncated data often arise in lifetime data analysis, where samples are collected under certain time constraints. Nonparametric methods for doubly-truncated data have been well studied in the literature. As an alternative, this paper considers parametric inference when samples are subject to double-truncation. Efron and Petrosian (J Am Stat Assoc 94:824–834, 1999) proposed fitting a parametric family, called the special exponential family, to doubly-truncated data. However, non-trivial technical aspects, such as the parameter space, the support of the density, and computational algorithms, have not been discussed in the literature. This paper fills this gap by addressing these technical aspects, including adequate choices of the parameter space and the support, and reliable computational algorithms. Simulations are conducted to verify the suggested techniques, and real data are used for illustration.


References

Akaike H (1973) Information theory and an extension of the maximum likelihood principle. In: Petrov BN, Csaki F (eds) Proceedings of the 2nd international symposium on information theory. Akadémiai Kiadó, Budapest, pp 267–281
Andersen PK, Keiding N (2002) Multi-state models for event history analysis. Stat Methods Med Res 11:91–115
Balakrishnan N, Basu AP (1996) The exponential distribution: theory, methods and applications. Taylor & Francis Ltd, USA
Burden RL, Faires JD (2011) Numerical analysis. Cengage Learning, Boston
Chen YH (2009) Weighted Breslow-type and maximum likelihood estimation in semiparametric transformation models. Biometrika 96:235–251
Cohen AC (1991) Truncated and censored samples. Marcel Dekker, New York
Casella G, Berger RL (2002) Statistical inference. Duxbury Thomson Learning, Australia
Castillo JD (1994) The singly truncated normal distribution: a non-steep exponential family. Ann Inst Stat Math 46:57–66
Commenges D (2002) Inference for multi-state models from interval-censored data. Stat Methods Med Res 11:167–182
Efron B, Petrosian R (1999) Nonparametric methods for doubly truncated data. J Am Stat Assoc 94:824–834
Emura T, Konno Y (2012a) Multivariate normal distribution approaches for dependently truncated data. Stat Pap 53:133–149
Emura T, Konno Y (2012b) A goodness-of-fit tests for parametric models based on dependently truncated data. Comput Stat Data Anal 56:2237–2250
Emura T, Konno Y, Michimae H (2014) Statistical inference based on the nonparametric maximum likelihood estimator under double-truncation. Lifetime Data Anal. doi:10.1007/s10985-014-9297-5
Knight K (2000) Mathematical statistics. Chapman and Hall, Boca Raton
Lagakos SW, Barraj LM, De Gruttola V (1988) Non-parametric analysis of truncated survival data with application to AIDS. Biometrika 75:515–523
Long TH, Emura T (2014) A control chart using copula-based Markov chain models. J Chin Stat Assoc 52:466–496
Mandrekar SJ, Mandrekar JN (2003) Are our data symmetric? Stat Methods Med Res 12:505–513
Moreira C, de Uña-Álvarez J (2010) Bootstrapping the NPMLE for doubly truncated data. J Nonparametr Stat 22:567–583
Moreira C, de Uña-Álvarez J, Van Keilegom I (2014) Goodness-of-fit tests for a semiparametric model under random double truncation. Comput Stat. doi:10.1007/s00180-014-0496-z
Moreira C, de Uña-Álvarez J (2012) Kernel density estimation with doubly-truncated data. Electron J Stat 6:501–521
Moreira C, Van Keilegom I (2013) Bandwidth selection for kernel density estimation with doubly truncated data. Comput Stat Data Anal 61:107–123
R Development Core Team (2014) R: a language and environment for statistical computing. R Foundation for Statistical Computing, R version 3:2
Robertson HT, Allison DB (2012) A novel generalized normal distribution for human longevity and other negatively skewed data. PLoS One 7:e37025
Sankaran PG, Sunoj SM (2004) Identification of models using failure rate and mean residual life of doubly truncated random variables. Stat Pap 45:97–109
Shen PS (2010) Nonparametric analysis of doubly truncated data. Ann Inst Stat Math 62:835–853
Stovring H, Wang MC (2007) A new approach of nonparametric estimation of incidence and lifetime risk based on birth rates and incidence events. BMC Med Res Methodol 7:53
Strzalkowska-Kominiak E, Stute W (2013) Empirical copulas for consecutive survival data: copulas in survival analysis. TEST 22:688–714
Stute W, González-Manteiga W, Quindimil MP (1993) Bootstrap based goodness-of-fit-tests. Metrika 40:243–256
Zhu H, Wang MC (2012) Analyzing bivariate survival data with interval sampling and application to cancer epidemiology. Biometrika 99:345–361


Acknowledgments

We are grateful for the comments and suggestions from the associate editor and two anonymous referees, which greatly improved the manuscript. This work is supported by the research grant funded by the National Science Council of Taiwan (NSC 101-2118-M-008-002-MY2) and the Ministry of Science and Technology of Taiwan (MOST 103-2118-M-008-MY2).

Author information

Authors and Affiliations

Graduate Institute of Statistics, National Central University, Taoyuan, Taiwan
Ya-Hsuan Hu & Takeshi Emura


Corresponding author

Correspondence to Takeshi Emura.

Appendix: Data generation for simulations

1.1 Data generation for one-parameter SEF

First, consider the case $\eta >0$. We let

$$\begin{aligned} U\sim f_U (u)&= \eta _u \exp \{\eta _u (u-\tau _2 )\}, -\infty <u<\tau _2, \eta _u >0, \\ V\sim f_V (v)&= \eta _v \exp \{\eta _v (v-\tau _2 )\}, -\infty <v<\tau _2, \eta _v >0, \\ Y\sim f_Y (y)&= \eta \exp \{\eta (y-\tau _2 )\}, -\infty <y<\tau _2, \eta >0. \end{aligned}$$

In this case, data are generated by using inverse transformations

$$\begin{aligned} U=\tau _2 +\frac{1}{\eta _u }\log (W_1), \quad V=\tau _2 +\frac{1}{\eta _v }\log (W_2), \quad Y=\tau _2 +\frac{1}{\eta }\log (W_3), \end{aligned}$$

where $W_1, W_2, W_3 \sim U(0, 1)$. Then, the inclusion probability is

$$\begin{aligned} P(U\le Y\le V)\!=\!\int \limits _{-\infty }^{\tau _2 } {\int \limits _{-\infty }^v {\int \limits _{-\infty }^y {f_Y (y)f_V (v)} } } f_U (u) dudydv =\frac{\eta \cdot \eta _v }{(\eta +\eta _u )(\eta +\eta _u +\eta _v )}. \end{aligned}$$

Letting $\eta _u \rightarrow 0$ yields $P(-\infty <Y\le V)=\eta _v / (\eta _v +\eta )$, and letting $\eta _v \rightarrow \infty $ yields $P(U\le Y<\tau _2 )=\eta / (\eta +\eta _u )$. One can get $P(U\le Y\le V)\approx 0.5$ by setting $P(U\le Y)=0.75$ and $P(Y\le V)=0.75$.
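
The closed form for the inclusion probability can be checked by integrating from the inside out. The steps below are a worked verification that uses only the three densities defined above; they are not taken from the authors' own derivation.

$$\begin{aligned} \int \limits _{-\infty }^{y} f_U (u)\, du&= \exp \{\eta _u (y-\tau _2 )\}, \\ \int \limits _{-\infty }^{v} f_Y (y)\exp \{\eta _u (y-\tau _2 )\}\, dy&= \frac{\eta }{\eta +\eta _u }\exp \{(\eta +\eta _u )(v-\tau _2 )\}, \\ \int \limits _{-\infty }^{\tau _2 } f_V (v)\frac{\eta }{\eta +\eta _u }\exp \{(\eta +\eta _u )(v-\tau _2 )\}\, dv&= \frac{\eta \cdot \eta _v }{(\eta +\eta _u )(\eta +\eta _u +\eta _v )}. \end{aligned}$$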

For instance, with fixed $\eta =3$, we find $\eta _u $ and $\eta _v $ as follows:

1.
Set $P(U\le Y)=\eta / (\eta +\eta _u )=0.75$, and then obtain $\eta _u =1$.
2.
Set $P(Y\le V)=\eta _v / (\eta _v +\eta )=0.75$, and then obtain $\eta _v =9$.

Accordingly, the inclusion probability becomes $P(U\le Y\le V)=0.5031447$.
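
The two steps and the inverse transformations for the case $\eta >0$ can be sketched in a few lines of Python. This is a minimal illustration, not the authors' simulation code: the function names, the seed, and the value $\tau _2 =0$ are assumptions made only for the example, and $1-W$ is used in place of $W$ (both are uniform) so that the logarithm never receives zero.

```python
import math
import random


def rates_for_targets_pos(eta, p_left=0.75, p_right=0.75):
    """Solve eta/(eta+eta_u) = p_left and eta_v/(eta_v+eta) = p_right (eta > 0)."""
    eta_u = eta * (1.0 - p_left) / p_left
    eta_v = eta * p_right / (1.0 - p_right)
    return eta_u, eta_v


def inclusion_prob_pos(eta, eta_u, eta_v):
    """Closed-form P(U <= Y <= V) for the eta > 0 case."""
    return eta * eta_v / ((eta + eta_u) * (eta + eta_u + eta_v))


def sample_pos(n, eta, eta_u, eta_v, tau2, seed=0):
    """Keep drawing (U, Y, V) until n triplets satisfy U <= Y <= V."""
    rng = random.Random(seed)
    kept, n_draws = [], 0
    while len(kept) < n:
        # Inverse transformation; 1 - rng.random() lies in (0, 1].
        u = tau2 + math.log(1.0 - rng.random()) / eta_u
        v = tau2 + math.log(1.0 - rng.random()) / eta_v
        y = tau2 + math.log(1.0 - rng.random()) / eta
        n_draws += 1
        if u <= y <= v:
            kept.append((u, y, v))
    return kept, n_draws


eta = 3.0
tau2 = 0.0  # hypothetical upper support limit, chosen only for illustration
eta_u, eta_v = rates_for_targets_pos(eta)
p_exact = inclusion_prob_pos(eta, eta_u, eta_v)
sample, n_draws = sample_pos(2000, eta, eta_u, eta_v, tau2)
p_hat = len(sample) / n_draws  # empirical acceptance rate
print(eta_u, eta_v, p_exact, p_hat)
```

For these settings the closed form evaluates to $27/52\approx 0.519$, which is close to, but not identical with, the figure quoted above, so a simulated acceptance rate should be expected to land near $0.52$ rather than exactly at $0.5$.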

Another case is $\eta <0$, where the range of $Y$ is $y\in [\tau _1, \infty )$. We consider

$$\begin{aligned} U\sim f_U (u)&= -\eta _u \exp \{\eta _u (u-\tau _1 )\}, \tau _1 <u<\infty , \eta _u <0, \\ V\sim f_V (v)&= -\eta _v \exp \{\eta _v (v-\tau _1 )\}, \tau _1 <v<\infty , \eta _v <0, \\ Y\sim f_Y (y)&= -\eta \exp \{\eta (y-\tau _1 )\}, \tau _1 <y<\infty , \eta <0. \end{aligned}$$

In this case, data are generated by using inverse transformations

$$\begin{aligned} U=\tau _1 +\frac{1}{\eta _u }\log (1-W_1), \quad V=\tau _1 +\frac{1}{\eta _v }\log (1-W_2), \quad Y=\tau _1 +\frac{1}{\eta }\log (1-W_3), \end{aligned}$$

where $W_1, W_2, W_3 \sim U(0, 1)$. Then, the inclusion probability is

$$\begin{aligned} P(U\le Y\le V)=\int \limits _{\tau _1 }^\infty {\int \limits _u^\infty {\int \limits _y^\infty {f_Y (y)f_V (v)} } } f_U (u) dvdydu =\frac{\eta \cdot \eta _u }{(\eta +\eta _v )(\eta +\eta _u +\eta _v )}. \end{aligned}$$

Letting $\eta _u \rightarrow -\infty $ yields $P(\tau _1 \le Y\le V)=\eta /(\eta +\eta _v )$, and letting $\eta _v \rightarrow 0$ yields $P(U\le Y<\infty )=\eta _u /(\eta +\eta _u )$. One can get $P(U\le Y\le V)\approx 0.5$ by setting $P(U\le Y)=0.75$ and $P(Y\le V)=0.75$.

For instance, with fixed $\eta =-1$, we find $\eta _u $ and $\eta _v $ as follows:

1.
Set $P(U\le Y)=\eta _u /(\eta +\eta _u )=0.75$, and then obtain $\eta _u =-3$.
2.
Set $P(Y\le V)=\eta /(\eta +\eta _v )=0.75$, and then obtain $\eta _v =-1/3$.

Accordingly, the inclusion probability becomes $P(U\le Y\le V)=0.5235602$.
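
The same recipe for the case $\eta <0$ can be sketched as follows. As before, this is an illustrative sketch under stated assumptions rather than the authors' code: the function names, the seed, and the value $\tau _1 =0$ are hypothetical choices for the example.

```python
import math
import random


def rates_for_targets_neg(eta, p_left=0.75, p_right=0.75):
    """Solve eta_u/(eta+eta_u) = p_left and eta/(eta+eta_v) = p_right (eta < 0)."""
    eta_u = p_left * eta / (1.0 - p_left)
    eta_v = eta * (1.0 - p_right) / p_right
    return eta_u, eta_v


def inclusion_prob_neg(eta, eta_u, eta_v):
    """Closed-form P(U <= Y <= V) for the eta < 0 case."""
    return eta * eta_u / ((eta + eta_v) * (eta + eta_u + eta_v))


def sample_neg(n, eta, eta_u, eta_v, tau1, seed=0):
    """Keep drawing (U, Y, V) until n triplets satisfy U <= Y <= V."""
    rng = random.Random(seed)
    kept, n_draws = [], 0
    while len(kept) < n:
        # log(1 - W) <= 0 and the rates are negative, so each draw is >= tau1.
        u = tau1 + math.log(1.0 - rng.random()) / eta_u
        v = tau1 + math.log(1.0 - rng.random()) / eta_v
        y = tau1 + math.log(1.0 - rng.random()) / eta
        n_draws += 1
        if u <= y <= v:
            kept.append((u, y, v))
    return kept, n_draws


eta = -1.0
tau1 = 0.0  # hypothetical lower support limit, chosen only for illustration
eta_u, eta_v = rates_for_targets_neg(eta)
p_exact = inclusion_prob_neg(eta, eta_u, eta_v)
sample, n_draws = sample_neg(2000, eta, eta_u, eta_v, tau1)
p_hat = len(sample) / n_draws  # empirical acceptance rate
print(eta_u, eta_v, p_exact, p_hat)
```

With $\eta =-1$, $\eta _u =-3$ and $\eta _v =-1/3$, the closed form again evaluates to $27/52\approx 0.519$, so the empirical acceptance rate from a run of this kind should fall near that value.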

1.2 Data generation for two-parameter SEF

We consider $U\sim N(\mu _u, 1)$, $V\sim N(\mu _v, 1)$ and $Y\sim N(\mu , 1)$. One can obtain $P(U\le Y\le V)\approx 0.5$ by making the left-truncation percentage equal to the right-truncation percentage. Since the normal distribution is symmetric, we set $\mu _u =\mu -\Delta $ and $\mu _v =\mu +\Delta $. Then,

$$\begin{aligned} P(U\le Y\le V)=\int \limits _{-\infty }^\infty {\varphi (y-\mu )\cdot \Phi (y-\mu +\Delta )\cdot \{1-\Phi (y-\mu -\Delta )\}} \, dy. \end{aligned}$$

The desired value $\Delta >0$ is chosen numerically. For instance, with fixed $\mu =5$, the desired value is $\Delta =0.91$, which makes $P(U\le Y\le V)\approx 0.5$ (Fig. 8). Then, we choose $\mu _u =4.09$ and $\mu _v =5.91$. With this setting, we have $P(U\le Y\le V)=0.5076142$.
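
One way to choose $\Delta $ numerically is to evaluate the integral above by quadrature and then solve for the target probability by bisection. The sketch below does this with the standard library only; the composite Simpson rule, the integration range, the bracket $[0, 3]$, and all function names are assumptions of this example, and the source does not state which numerical method the authors used.

```python
import math


def phi(x):
    """Standard normal density."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def Phi(x):
    """Standard normal distribution function via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def inclusion_prob_normal(delta, half_width=8.0, n_steps=2000):
    """P(U <= Y <= V) for U ~ N(mu - delta, 1), Y ~ N(mu, 1), V ~ N(mu + delta, 1).

    The integrand depends on y only through z = y - mu, so mu drops out.
    Composite Simpson rule on [-half_width, half_width]; n_steps must be even.
    """
    h = 2.0 * half_width / n_steps
    total = 0.0
    for i in range(n_steps + 1):
        z = -half_width + i * h
        weight = 1.0 if i in (0, n_steps) else (4.0 if i % 2 else 2.0)
        total += weight * phi(z) * Phi(z + delta) * (1.0 - Phi(z - delta))
    return total * h / 3.0


def solve_delta(target=0.5, lo=0.0, hi=3.0, tol=1e-8):
    """Bisection; valid because the inclusion probability increases with delta."""
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if inclusion_prob_normal(mid) < target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


mu = 5.0
delta = solve_delta()
mu_u, mu_v = mu - delta, mu + delta
print(delta, mu_u, mu_v, inclusion_prob_normal(delta))
```

Because both $\Phi (y-\mu +\Delta )$ and $1-\Phi (y-\mu -\Delta )$ increase with $\Delta $, the inclusion probability is monotone in $\Delta $, which is what makes a simple bracketing search sufficient here.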

1.3 Data generation for cubic SEF

For the cubic SEF with $ \eta _3 >0$, we consider $U\sim N(\mu _u , 1)$, $V\sim N(\mu _v, 1)$ and

$$\begin{aligned} Y\sim f_{\varvec{\eta }} (y)=\exp [\eta _1 y+\eta _2 y^{2}+\eta _3 y^{3}-\phi (\varvec{\eta })],\quad y\in {\mathcal {Y}}=(-\infty , \tau _2 ], \end{aligned}$$

where $\phi (\varvec{\eta })=\log \{\int _{\mathcal {Y}} {\exp (\eta _1 y+\eta _2 y^{2}+\eta _3 y^{3})\, dy} \}$. Data are generated by an inverse transformation, which numerically solves $1-S_\eta (y)=W$, where $S_\eta $ is the survival function of $Y$ and $W\sim U(0, 1)$. One can get $P(U\le Y\le V)\approx 0.5$ by making the left-truncation and right-truncation percentages equal, that is, by setting $\mu _u =\eta _1 -\Delta $ and $\mu _v =\eta _1 +\Delta $. Then,

$$\begin{aligned} P(U\le Y\le V)=\int \limits _{-\infty }^{\tau _2 } {\{1-\Phi (y-\eta _1 -\Delta )\}\Phi (y-\eta _1 +\Delta )} f_\eta (y)\, dy. \end{aligned}$$

The desired value $\Delta >0$ is chosen numerically. For instance, for fixed $\eta _1 =5$, $\eta _2 =-0.5$, $\eta _3 =0.005$ and $\tau _2 =8$, the value of $\Delta $ is 1.01 (Fig. 9). Hence, we choose $\mu _u =3.99$ and $\mu _v =6.01$. Accordingly, the inclusion probability becomes $P(U\le Y\le V)=0.5035228$.
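
The numerical inverse transformation and the inclusion probability for the cubic SEF with $\eta _3 >0$ can be sketched as follows. This is a rough illustration under stated assumptions, not the authors' algorithm: the distribution function is tabulated on an equally spaced grid and inverted by linear interpolation, the finite lower limit `lower = -5` stands in for $-\infty $, and the function names, grid size, and seed are hypothetical.

```python
import bisect
import math
import random


def Phi(x):
    """Standard normal distribution function via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def build_cubic_sef_grid(eta1, eta2, eta3, tau2, lower, n_grid=4000):
    """Tabulate the normalised density and CDF of the cubic SEF on [lower, tau2]."""
    h = (tau2 - lower) / n_grid
    ys = [lower + i * h for i in range(n_grid + 1)]
    log_g = [eta1 * y + eta2 * y * y + eta3 * y ** 3 for y in ys]
    shift = max(log_g)  # subtract the maximum to avoid overflow in exp()
    g = [math.exp(v - shift) for v in log_g]
    cdf = [0.0]
    for i in range(1, n_grid + 1):
        cdf.append(cdf[-1] + 0.5 * h * (g[i - 1] + g[i]))  # trapezoid rule
    total = cdf[-1]
    return ys, [v / total for v in g], [c / total for c in cdf]


def draw_y(ys, cdf, rng):
    """Solve F(y) = W on the grid by linear interpolation between nodes."""
    w = rng.random()
    j = min(max(bisect.bisect_left(cdf, w), 1), len(cdf) - 1)
    c0, c1 = cdf[j - 1], cdf[j]
    frac = 0.0 if c1 == c0 else (w - c0) / (c1 - c0)
    return ys[j - 1] + frac * (ys[j] - ys[j - 1])


def inclusion_prob_cubic(ys, dens, mu_u, mu_v):
    """Trapezoid-rule value of the integral of {1 - Phi(y - mu_v)} Phi(y - mu_u) f(y)."""
    h = ys[1] - ys[0]
    vals = [d * Phi(y - mu_u) * (1.0 - Phi(y - mu_v)) for y, d in zip(ys, dens)]
    return h * (sum(vals) - 0.5 * (vals[0] + vals[-1]))


eta1, eta2, eta3, tau2 = 5.0, -0.5, 0.005, 8.0
delta = 1.01
ys, dens, cdf = build_cubic_sef_grid(eta1, eta2, eta3, tau2, lower=-5.0)
p_incl = inclusion_prob_cubic(ys, dens, eta1 - delta, eta1 + delta)
rng = random.Random(0)
sample = [draw_y(ys, cdf, rng) for _ in range(1000)]
print(p_incl, sum(sample) / len(sample))
```

A grid-based inversion trades some accuracy for simplicity; a root-finder applied to the distribution function for each draw would be a natural alternative when higher precision is needed.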

The other case $ \eta _3 <0$ is similar. Under $\eta _1 =5$, $\eta _2 =-0.5$, $\eta _3 =-0.005$, and $\tau _1 =2$, the desired value is $\Delta =0.91$ (see Fig. 9). Hence, we choose $\mu _u =4.09$ and $\mu _v =5.91$. Accordingly, the inclusion probability becomes $P(U\le Y\le V)=0.5027334$.


About this article

Cite this article

Hu, YH., Emura, T. Maximum likelihood estimation for a special exponential family under random double-truncation. Comput Stat 30, 1199–1229 (2015). https://doi.org/10.1007/s00180-015-0564-z


Received: 23 June 2014
Accepted: 28 January 2015
Published: 12 February 2015
Issue Date: December 2015
DOI: https://doi.org/10.1007/s00180-015-0564-z
