Maximum likelihood estimation for a special exponential family under random double-truncation

  • Original Paper
Computational Statistics

Abstract

Doubly-truncated data often appear in lifetime data analysis, where samples are collected under certain time constraints. Nonparametric methods for doubly-truncated data have been well studied in the literature. Alternatively, this paper considers parametric inference when samples are subject to double-truncation. Efron and Petrosian (J Am Stat Assoc 94:824–834, 1999) proposed fitting a parametric family, called the special exponential family, to doubly-truncated data. However, non-trivial technical aspects, such as the parameter space, the support of the density, and computational algorithms, have not been discussed in the literature. This paper fills this gap by providing these technical details, including adequate choices of the parameter space and the support, and reliable computational algorithms. Simulations are conducted to verify the suggested techniques, and real data are used for illustration.

References

  • Akaike H (1973) Information theory and an extension of the maximum likelihood principle. In: Petrov BN, Csaki F (eds) Proceedings of the 2nd international symposium on information theory. Akadémiai Kiadó, Budapest, pp 267–281

  • Andersen PK, Keiding N (2002) Multi-state models for event history analysis. Stat Methods Med Res 11:91–115

  • Balakrishnan N, Basu AP (1996) The exponential distribution: theory, methods and applications. Taylor & Francis Ltd, USA

  • Burden RL, Faires JD (2011) Numerical analysis. Cengage Learning, Boston

  • Chen YH (2009) Weighted Breslow-type and maximum likelihood estimation in semiparametric transformation models. Biometrika 96:235–251

  • Cohen AC (1991) Truncated and censored samples. Marcel Dekker, New York

  • Casella G, Berger RL (2002) Statistical inference. Duxbury Thomson Learning, Australia

  • Castillo JD (1994) The singly truncated normal distribution: a non-steep exponential family. Ann Inst Stat Math 46:57–66

  • Commenges D (2002) Inference for multi-state models from interval-censored data. Stat Methods Med Res 11:167–182

  • Efron B, Petrosian R (1999) Nonparametric methods for doubly truncated data. J Am Stat Assoc 94:824–834

  • Emura T, Konno Y (2012a) Multivariate normal distribution approaches for dependently truncated data. Stat Pap 53:133–149

  • Emura T, Konno Y (2012b) A goodness-of-fit tests for parametric models based on dependently truncated data. Comput Stat Data Anal 56:2237–2250

  • Emura T, Konno Y, Michimae H (2014) Statistical inference based on the nonparametric maximum likelihood estimator under double-truncation. Lifetime Data Anal. doi:10.1007/s10985-014-9297-5

  • Knight K (2000) Mathematical statistics. Chapman and Hall, Boca Raton

  • Lagakos SW, Barraj LM, De Gruttola V (1988) Non-parametric analysis of truncated survival data with application to AIDS. Biometrika 75:515–523

  • Long TH, Emura T (2014) A control chart using copula-based Markov chain models. J Chin Stat Assoc 52:466–496

  • Mandrekar SJ, Mandrekar JN (2003) Are our data symmetric? Stat Methods Med Res 12:505–513

  • Moreira C, de Uña-Álvarez J (2010) Bootstrapping the NPMLE for doubly truncated data. J Nonparametr Stat 22:567–583

  • Moreira C, de Uña-Álvarez J, Van Keilegom I (2014) Goodness-of-fit tests for a semiparametric model under random double truncation. Comput Stat. doi:10.1007/s00180-014-0496-z

  • Moreira C, de Uña-Álvarez J (2012) Kernel density estimation with doubly-truncated data. Electron J Stat 6:501–521

  • Moreira C, Van Keilegom I (2013) Bandwidth selection for kernel density estimation with doubly truncated data. Comput Stat Data Anal 61:107–123

  • R Development Core Team (2014) R: a language and environment for statistical computing. R Foundation for Statistical Computing, R version 3:2

  • Robertson HT, Allison DB (2012) A novel generalized normal distribution for human longevity and other negatively skewed data. PLoS One 7:e37025

  • Sankaran PG, Sunoj SM (2004) Identification of models using failure rate and mean residual life of doubly truncated random variables. Stat Pap 45:97–109

  • Shen PS (2010) Nonparametric analysis of doubly truncated data. Ann Inst Stat Math 62:835–853

  • Stovring H, Wang MC (2007) A new approach of nonparametric estimation of incidence and lifetime risk based on birth rates and incidence events. BMC Med Res Methodol 7:53

  • Strzalkowska-Kominiak E, Stute W (2013) Empirical copulas for consecutive survival data: copulas in survival analysis. TEST 22:688–714

  • Stute W, González-Manteiga W, Quindimil MP (1993) Bootstrap based goodness-of-fit-tests. Metrika 40:243–256

  • Zhu H, Wang MC (2012) Analyzing bivariate survival data with interval sampling and application to cancer epidemiology. Biometrika 99:345–361

Acknowledgments

We are grateful for the comments and suggestions from the associate editor and two anonymous referees, which greatly improved the manuscript. This work is supported by research grants funded by the National Science Council of Taiwan (NSC 101-2118-M-008-002-MY2) and the Ministry of Science and Technology of Taiwan (MOST 103-2118-M-008-MY2).

Author information

Correspondence to Takeshi Emura.

Appendix: Data generation for simulations

1.1 Data generation for the one-parameter SEF

First, consider the case \(\eta >0\). We let

$$\begin{aligned} U\sim f_U (u)&= \eta _u \exp \{\eta _u (u-\tau _2 )\}, -\infty <u<\tau _2, \eta _u >0, \\ V\sim f_V (v)&= \eta _v \exp \{\eta _v (v-\tau _2 )\}, -\infty <v<\tau _2, \eta _v >0, \\ Y\sim f_Y (y)&= \eta \exp \{\eta (y-\tau _2 )\}, -\infty <y<\tau _2, \eta >0. \end{aligned}$$

In this case, data are generated by using inverse transformations

$$\begin{aligned} U=\tau _2 +\frac{1}{\eta _u }\log (W_1), \quad V=\tau _2 +\frac{1}{\eta _v }\log (W_2), \quad Y=\tau _2 +\frac{1}{\eta }\log (W_3), \end{aligned}$$

where \(W_1, W_2, W_3\) are independent \(U(0, 1)\) random variables. Then, the inclusion probability is

$$\begin{aligned} P(U\le Y\le V)\!=\!\int \limits _{-\infty }^{\tau _2 } {\int \limits _{-\infty }^v {\int \limits _{-\infty }^y {f_Y (y)f_V (v)} } } f_U (u) dudydv =\frac{\eta \cdot \eta _v }{(\eta +\eta _u )(\eta +\eta _u +\eta _v )}. \end{aligned}$$
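As a numerical check, here is a minimal Python sketch (standard library only) assuming \(U\), \(V\) and \(Y\) are drawn independently: it applies the inverse transformations above and compares the Monte Carlo frequency of \(\{U\le Y\le V\}\) with the closed form. The helper names, the seed and \(\tau _2 =10\) are illustrative assumptions; the closed form does not involve \(\tau _2\), so any value works.

```python
import math
import random


def draw_left_exp(rate: float, tau2: float, rng: random.Random) -> float:
    """Inverse transform for the density rate * exp{rate * (x - tau2)}, x < tau2, rate > 0."""
    w = 1.0 - rng.random()  # W in (0, 1], so log(W) is finite
    return tau2 + math.log(w) / rate


def inclusion_prob_pos(eta: float, eta_u: float, eta_v: float) -> float:
    """Closed-form P(U <= Y <= V) when eta, eta_u and eta_v are all positive."""
    return eta * eta_v / ((eta + eta_u) * (eta + eta_u + eta_v))


def monte_carlo_pos(eta, eta_u, eta_v, tau2=10.0, n=100_000, seed=2015):
    """Fraction of independent (U, Y, V) draws that satisfy U <= Y <= V."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n):
        u = draw_left_exp(eta_u, tau2, rng)
        y = draw_left_exp(eta, tau2, rng)
        v = draw_left_exp(eta_v, tau2, rng)
        hits += u <= y <= v  # True counts as 1
    return hits / n


if __name__ == "__main__":
    print(inclusion_prob_pos(3.0, 1.0, 9.0), monte_carlo_pos(3.0, 1.0, 9.0))
```

With the values of the example that follows (\(\eta =3\), \(\eta _u =1\), \(\eta _v =9\)), the closed form equals \(27/52\approx 0.519\), and with \(n=100{,}000\) draws the Monte Carlo frequency should agree with it to roughly two decimal places.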

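Letting \(\eta _u \to 0\) (no left truncation) yields \(P(-\infty <Y\le V)=\eta _v / (\eta _v +\eta )\), and letting \(\eta _v \to \infty \) (no right truncation) yields \(P(U\le Y<\tau _2 )=\eta / (\eta +\eta _u )\); these are the marginal probabilities \(P(Y\le V)\) and \(P(U\le Y)\), respectively. One can get \(P(U\le Y\le V)\approx 0.5\) by making \(P(U\le Y)=0.75\) and \(P(Y\le V)=0.75\).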

For instance, with fixed \(\eta =3\), we find \(\eta _u \) and \(\eta _v \) as follows:

  1. Set \(P(U\le Y)=\eta / (\eta +\eta _u )=0.75\), which gives \(\eta _u =1\).
  2. Set \(P(Y\le V)=\eta _v / (\eta _v +\eta )=0.75\), which gives \(\eta _v =9\).

Accordingly, the inclusion probability becomes \(P(U\le Y\le V)=0.5031447\).
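The two calibration steps invert \(\eta /(\eta +\eta _u )=p\) and \(\eta _v /(\eta _v +\eta )=p\), giving \(\eta _u =\eta (1-p)/p\) and \(\eta _v =\eta p/(1-p)\). The Python sketch below assumes that a triple is retained only when \(U\le Y\le V\) (the usual meaning of random double truncation) and turns these choices into a doubly truncated sample by rejection; the function names, seed and \(\tau _2 =10\) are hypothetical. The ratio of kept to drawn triples estimates the inclusion probability, whose closed form at \(\eta =3\), \(\eta _u =1\), \(\eta _v =9\) is \(27/52\approx 0.519\); a ratio from a finite run scatters around that value.

```python
import math
import random


def calibrate_pos(eta: float, p_left: float = 0.75, p_right: float = 0.75):
    """Solve eta/(eta+eta_u) = p_left and eta_v/(eta_v+eta) = p_right for eta > 0."""
    eta_u = eta * (1.0 - p_left) / p_left
    eta_v = eta * p_right / (1.0 - p_right)
    return eta_u, eta_v


def doubly_truncated_sample(eta, eta_u, eta_v, n, tau2=10.0, seed=1):
    """Draw independent (U, Y, V) triples, keeping those with U <= Y <= V until n are kept.

    Returns the kept triples and the empirical acceptance rate (kept / drawn).
    """
    rng = random.Random(seed)

    def draw(rate):
        # inverse transform for rate * exp{rate * (x - tau2)} on (-inf, tau2)
        return tau2 + math.log(1.0 - rng.random()) / rate

    kept, tried = [], 0
    while len(kept) < n:
        tried += 1
        u, y, v = draw(eta_u), draw(eta), draw(eta_v)
        if u <= y <= v:
            kept.append((u, y, v))
    return kept, n / tried


if __name__ == "__main__":
    eta_u, eta_v = calibrate_pos(3.0)
    sample, rate = doubly_truncated_sample(3.0, eta_u, eta_v, n=500)
    print(eta_u, eta_v, rate)
```

Rejection keeps the retained triples exactly distributed as \((U, Y, V)\) given \(U\le Y\le V\), at the cost of discarding roughly half of the draws at this calibration.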

Another case is \(\eta <0\), where the range of \(Y\) is \(y\in [\tau _1, \infty )\). We consider

$$\begin{aligned} U\sim f_U (u)&= -\eta _u \exp \{\eta _u (u-\tau _1 )\}, \tau _1 <u<\infty , \eta _u <0, \\ V\sim f_V (v)&= -\eta _v \exp \{\eta _v (v-\tau _1 )\}, \tau _1 <v<\infty , \eta _v <0, \\ Y\sim f_Y (y)&= -\eta \exp \{\eta (y-\tau _1 )\}, \tau _1 <y<\infty , \eta <0. \end{aligned}$$

In this case, data are generated by using inverse transformations

$$\begin{aligned} U=\tau _1 +\frac{1}{\eta _u }\log (1-W_1), \quad V=\tau _1 +\frac{1}{\eta _v }\log (1-W_2), \quad Y=\tau _1 +\frac{1}{\eta }\log (1-W_3), \end{aligned}$$

where \(W_1, W_2, W_3\) are independent \(U(0, 1)\) random variables. Then, the inclusion probability is

$$\begin{aligned} P(U\le Y\le V)=\int \limits _{\tau _1 }^\infty {\int \limits _u^\infty {\int \limits _y^\infty {f_Y (y)f_V (v)} } } f_U (u)\, dv\, dy\, du =\frac{\eta \cdot \eta _u }{(\eta +\eta _v )(\eta +\eta _u +\eta _v )}. \end{aligned}$$
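The \(\eta <0\) scheme can be checked the same way. This Python sketch assumes independent draws, takes \(\tau _1 =0\) purely for illustration (the closed form does not involve \(\tau _1\)), and uses hypothetical helper names. Dividing \(\log (1-W)\le 0\) by a negative rate keeps every draw at or above \(\tau _1\).

```python
import math
import random


def draw_right_exp(rate: float, tau1: float, rng: random.Random) -> float:
    """Inverse transform for the density -rate * exp{rate * (x - tau1)}, x > tau1, rate < 0."""
    w = rng.random()  # W in [0, 1), so log(1 - W) is finite
    return tau1 + math.log(1.0 - w) / rate


def inclusion_prob_neg(eta: float, eta_u: float, eta_v: float) -> float:
    """Closed-form P(U <= Y <= V) when eta, eta_u and eta_v are all negative."""
    return eta * eta_u / ((eta + eta_v) * (eta + eta_u + eta_v))


def monte_carlo_neg(eta, eta_u, eta_v, tau1=0.0, n=100_000, seed=7):
    """Fraction of independent (U, Y, V) draws that satisfy U <= Y <= V."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n):
        u = draw_right_exp(eta_u, tau1, rng)
        y = draw_right_exp(eta, tau1, rng)
        v = draw_right_exp(eta_v, tau1, rng)
        hits += u <= y <= v  # True counts as 1
    return hits / n


if __name__ == "__main__":
    print(inclusion_prob_neg(-1.0, -3.0, -1 / 3), monte_carlo_neg(-1.0, -3.0, -1 / 3))
```

For \(\eta =-1\), \(\eta _u =-3\), \(\eta _v =-1/3\) from the example that follows, the closed form again evaluates to \(27/52\approx 0.519\).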

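Letting \(\eta _u \to -\infty \) (no left truncation) yields \(P(\tau _1 \le Y\le V)=\eta /(\eta +\eta _v )\), and letting \(\eta _v \to 0\) (no right truncation) yields \(P(U\le Y<\infty )=\eta _u /(\eta +\eta _u )\); these are the marginal probabilities \(P(Y\le V)\) and \(P(U\le Y)\), respectively. As before, one can get \(P(U\le Y\le V)\approx 0.5\) by making \(P(U\le Y)=0.75\) and \(P(Y\le V)=0.75\).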

For instance, with fixed \(\eta =-1\), we find \(\eta _u \) and \(\eta _v \) as follows:

  1. Set \(P(U\le Y)=\eta _u /(\eta +\eta _u )=0.75\), which gives \(\eta _u =-3\).
  2. Set \(P(Y\le V)=\eta /(\eta +\eta _v )=0.75\), which gives \(\eta _v =-1/3\).

Accordingly, the inclusion probability becomes \(P(U\le Y\le V)=0.5235602\).
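Solving the two calibration equations for \(\eta <0\) gives \(\eta _u =p\eta /(1-p)\) and \(\eta _v =\eta (1-p)/p\). Substituting these (or their \(\eta >0\) counterparts \(\eta _u =\eta (1-p)/p\), \(\eta _v =\eta p/(1-p)\)) into the corresponding closed-form inclusion probability, both cases reduce to \(p^{3}/(1-p+p^{2})\), so the result depends on \(p\) only; at \(p=0.75\) it equals \(27/52\approx 0.519\) for every \(\eta\). The Python sketch below, with hypothetical function names, checks this numerically for \(\eta <0\).

```python
def calibrate_neg(eta: float, p_left: float = 0.75, p_right: float = 0.75):
    """Solve eta_u/(eta+eta_u) = p_left and eta/(eta+eta_v) = p_right when eta < 0."""
    if eta >= 0:
        raise ValueError("this calibration is for the case eta < 0")
    eta_u = p_left * eta / (1.0 - p_left)
    eta_v = eta * (1.0 - p_right) / p_right
    return eta_u, eta_v


def inclusion_prob_neg(eta: float, eta_u: float, eta_v: float) -> float:
    """Closed-form P(U <= Y <= V) when eta, eta_u and eta_v are all negative."""
    return eta * eta_u / ((eta + eta_v) * (eta + eta_u + eta_v))


def equal_marginals_value(p: float) -> float:
    """p^3 / (1 - p + p^2): inclusion probability when P(U <= Y) = P(Y <= V) = p."""
    return p**3 / (1.0 - p + p**2)


if __name__ == "__main__":
    for eta in (-0.5, -1.0, -4.0):
        eta_u, eta_v = calibrate_neg(eta)
        print(eta, eta_u, eta_v, inclusion_prob_neg(eta, eta_u, eta_v), equal_marginals_value(0.75))
```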

1.2 Data generation for the two-parameter SEF

We consider \(U\sim N(\mu _u, 1)\), \(V\sim N(\mu _v, 1)\) and \(Y\sim N(\mu , 1)\). One can obtain \(P(U\le Y\le V)\approx 0.5\) by making the left-truncation percentage equal to the right-truncation percentage. Since the normal distribution is symmetric, we set \(\mu _u =\mu -\Delta \) and \(\mu _v =\mu +\Delta \). Then,

$$\begin{aligned} P(U\le Y\le V)=\int \limits _{-\infty }^\infty {\varphi (y-\mu )\cdot \Phi (y-\mu +\Delta )\cdot \{1-\Phi (y-\mu -\Delta )\}} \, dy. \end{aligned}$$

The desired value \(\Delta >0\) is chosen numerically. For instance, with \(\mu =5\) fixed, the desired value is \(\Delta =0.91\), which makes \(P(U\le Y\le V)\approx 0.5\) (Fig. 8). We then choose \(\mu _u =4.09\) and \(\mu _v =5.91\). With this setting, we have \(P(U\le Y\le V)=0.5076142\).

Fig. 8 An example of how to choose the value \(\Delta \) under the two-parameter SEF
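The integral above depends on \(\Delta\) only (substitute \(z=y-\mu\)), so \(\Delta\) can be found once and then shifted to any \(\mu\). It also increases with \(\Delta\), because both \(\Phi (y-\mu +\Delta )\) and \(1-\Phi (y-\mu -\Delta )\) do, so bisection applies. The Python sketch below is one way to do the numerical search, assuming a trapezoidal rule on \(\mu \pm 10\) and bisection on \([0, 5]\) (our choices, not necessarily the authors'), with `statistics.NormalDist` supplying \(\varphi\) and \(\Phi\). At \(\Delta =0\) the integral equals \(1/6\), a handy check; at the root it should return a value close to the \(\Delta =0.91\) used in the text.

```python
from statistics import NormalDist

STD = NormalDist()  # standard normal: pdf = phi, cdf = Phi


def inclusion_prob(delta: float, mu: float = 5.0, half_width: float = 10.0, n_grid: int = 2000) -> float:
    """Trapezoidal rule for the integral of phi(y-mu) * Phi(y-mu+delta) * {1 - Phi(y-mu-delta)} dy."""
    lo = mu - half_width
    h = 2.0 * half_width / n_grid
    total = 0.0
    for i in range(n_grid + 1):
        z = lo + i * h - mu
        term = STD.pdf(z) * STD.cdf(z + delta) * (1.0 - STD.cdf(z - delta))
        total += term / 2 if i in (0, n_grid) else term  # half weight at the end points
    return total * h


def solve_delta(target: float = 0.5, tol: float = 1e-6) -> float:
    """Bisection on delta in [0, 5]; the inclusion probability increases with delta."""
    lo, hi = 0.0, 5.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if inclusion_prob(mid) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


if __name__ == "__main__":
    delta = solve_delta()
    print(f"Delta = {delta:.4f}, mu_u = {5 - delta:.2f}, mu_v = {5 + delta:.2f}")
```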

1.3 Data generation for the cubic SEF

For the cubic SEF with \( \eta _3 >0\), we consider \(U\sim N(\mu _u , 1)\), \(V\sim N(\mu _v, 1)\) and

$$\begin{aligned} Y\sim f_{\varvec{\eta }} (y)=\exp [\eta _1 y+\eta _2 y^{2}+\eta _3 y^{3}-\phi (\varvec{\eta })],\quad y\in {\mathcal {Y}}=(-\infty , \tau _2 ], \end{aligned}$$

where \(\phi (\varvec{\eta })=\log \{\int _{\mathcal {Y}} {\exp (\eta _1 y+\eta _2 y^{2}+\eta _3 y^{3})\, dy} \}\). Data are generated by the inverse transformation, which numerically solves \(1-S_{\varvec{\eta }} (y)=W\) for \(y\), where \(W\sim U(0, 1)\) and \(S_{\varvec{\eta }}\) is the survival function of \(Y\). One can get \(P(U\le Y\le V)\approx 0.5\) by setting \(\mu _u =\eta _1 -\Delta \) and \(\mu _v =\eta _1 +\Delta \), which aims to make the left- and right-truncation percentages equal. Then,

$$\begin{aligned} P(U\le Y\le V)=\int \limits _{-\infty }^{\tau _2 } {\{1-\Phi (y-\eta _1 -\Delta )\}\Phi (y-\eta _1 +\Delta )} f_\eta (y)\, dy. \end{aligned}$$
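One simple way to solve \(1-S_{\varvec{\eta }} (y)=W\) numerically, sketched in Python under our own assumptions: tabulate the unnormalized density on a fine grid, accumulate it with the trapezoidal rule to obtain both \(\phi (\varvec{\eta })\) and the CDF, and invert the CDF by binary search plus linear interpolation. Replacing the infinite end of \(\mathcal {Y}\) by a point 20 units from \(\tau _2\), and the grid size, are assumptions that suit the parameter values used here; the function names are hypothetical, and a root-finder on an accurately integrated CDF would be an equally valid alternative.

```python
import bisect
import math
import random


def tabulate_cubic_sef(eta1, eta2, eta3, lower, upper, n_grid=20_000):
    """Tabulate the CDF F(y) = 1 - S(y) of the cubic SEF on a grid over [lower, upper].

    The finite end point stands in for the infinite end of the support.
    Returns the grid, the CDF values, and phi(eta) = log of the normalizing integral.
    """
    h = (upper - lower) / n_grid
    ys = [lower + i * h for i in range(n_grid + 1)]
    expo = [eta1 * y + eta2 * y**2 + eta3 * y**3 for y in ys]
    m = max(expo)  # subtract the largest exponent to avoid overflow in exp()
    g = [math.exp(e - m) for e in expo]
    cum = [0.0]
    for i in range(n_grid):  # cumulative trapezoidal rule
        cum.append(cum[-1] + 0.5 * h * (g[i] + g[i + 1]))
    total = cum[-1]
    return ys, [c / total for c in cum], m + math.log(total)


def inverse_cdf(ys, cdf, w):
    """Solve F(y) = w by bracketing w in the tabulated CDF and interpolating linearly."""
    j = min(max(bisect.bisect_left(cdf, w), 1), len(ys) - 1)
    c0, c1 = cdf[j - 1], cdf[j]
    t = 0.0 if c1 == c0 else (w - c0) / (c1 - c0)
    return ys[j - 1] + t * (ys[j] - ys[j - 1])


if __name__ == "__main__":
    tau2 = 8.0
    ys, cdf, phi = tabulate_cubic_sef(5.0, -0.5, 0.005, tau2 - 20.0, tau2)
    rng = random.Random(1)
    sample = [inverse_cdf(ys, cdf, rng.random()) for _ in range(10_000)]
    print(f"phi(eta) = {phi:.4f}, largest draw = {max(sample):.3f}")
```

With \(\eta _3 =0\) and \(\eta _2 =-0.5\) the family reduces to \(N(\eta _1, 1)\), which gives a convenient correctness check: \(\phi\) should then equal \(\eta _1^{2}/2+\frac{1}{2}\log (2\pi )\).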

The desired value \(\Delta >0\) is chosen numerically. For instance, for fixed \(\eta _1 =5\), \(\eta _2 =-0.5\), \(\eta _3 =0.005\) and \(\tau _2 =8\), the value of \(\Delta \) is 1.01 (Fig. 9). Hence, we choose \(\mu _u =3.99\) and \(\mu _v =6.01\). Accordingly, the inclusion probability becomes \(P(U\le Y\le V)=0.5035228\).

Fig. 9 An example of how to choose the value \(\Delta \) under the cubic SEF

The other case, \(\eta _3 <0\), where \({\mathcal {Y}}=[\tau _1, \infty )\), is handled similarly. Under \(\eta _1 =5\), \(\eta _2 =-0.5\), \(\eta _3 =-0.005\) and \(\tau _1 =2\), the desired value is \(\Delta =0.91\) (see Fig. 9). Hence, we choose \(\mu _u =4.09\) and \(\mu _v =5.91\). Accordingly, the inclusion probability becomes \(P(U\le Y\le V)=0.5027334\).
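For the cubic SEF the same bisection on \(\Delta\) applies, except that the integral is taken against the normalized \(f_{\varvec{\eta }}\). This Python sketch assumes the support \((-\infty , \tau _2 ]\) when \(\eta _3 >0\) and \([\tau _1, \infty )\) when \(\eta _3 <0\), replaces the infinite end by a point 20 units away, and uses hypothetical helper names; the trapezoidal weights are folded into per-point masses that sum to one. For the two settings above it should land in the vicinity of the reported \(\Delta =1.01\) and \(\Delta =0.91\), with the exact digits depending on these numerical choices.

```python
import math
from statistics import NormalDist

STD = NormalDist()  # standard normal cdf used for U and V


def density_grid(eta, tau, width=20.0, n_grid=4000):
    """Quadrature nodes and masses (summing to 1) for the normalized cubic SEF density.

    Assumed support: (-inf, tau] if eta3 > 0, [tau, inf) if eta3 < 0, cut to length `width`.
    """
    eta1, eta2, eta3 = eta
    lower, upper = (tau - width, tau) if eta3 > 0 else (tau, tau + width)
    h = (upper - lower) / n_grid
    ys = [lower + i * h for i in range(n_grid + 1)]
    expo = [eta1 * y + eta2 * y * y + eta3 * y**3 for y in ys]
    m = max(expo)  # shift exponents to avoid overflow
    g = [math.exp(e - m) for e in expo]
    wts = [h / 2 if i in (0, n_grid) else h for i in range(n_grid + 1)]  # trapezoid weights
    total = sum(w * gi for w, gi in zip(wts, g))
    return ys, [w * gi / total for w, gi in zip(wts, g)]


def inclusion_prob(delta, eta1, ys, masses):
    """P(U <= Y <= V) with U ~ N(eta1 - delta, 1) and V ~ N(eta1 + delta, 1)."""
    return sum(
        q * STD.cdf(y - eta1 + delta) * (1.0 - STD.cdf(y - eta1 - delta))
        for y, q in zip(ys, masses)
    )


def solve_delta(eta, tau, target=0.5, tol=1e-6):
    """Bisection on delta in [0, 5]; the inclusion probability increases with delta."""
    ys, masses = density_grid(eta, tau)
    lo, hi = 0.0, 5.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if inclusion_prob(mid, eta[0], ys, masses) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


if __name__ == "__main__":
    for eta, tau in [((5.0, -0.5, 0.005), 8.0), ((5.0, -0.5, -0.005), 2.0)]:
        print(f"eta3 = {eta[2]:+.3f}: Delta = {solve_delta(eta, tau):.3f}")
```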

About this article

Cite this article

Hu, YH., Emura, T. Maximum likelihood estimation for a special exponential family under random double-truncation. Comput Stat 30, 1199–1229 (2015). https://doi.org/10.1007/s00180-015-0564-z

  • DOI: https://doi.org/10.1007/s00180-015-0564-z
