Penalized kernel quantile regression for varying coefficient models

https://doi.org/10.1016/j.jspi.2021.07.003

Highlights

  • Propose a method that identifies the partially linear structure of the varying coefficient model.

  • Develop an efficient algorithm using the proximal alternating direction method of multipliers with a convergence guarantee.

  • Derive a novel plug-in bandwidth selector using high-dimensional kernel theory.

Abstract

In nonparametric models, numerous penalization methods based on nonparametric series estimators have been developed for model selection and estimation. In contrast, penalization combined with kernel smoothing remains poorly understood. This can be attributed to intrinsic technical and computational difficulties, which call for treatments different from those developed for penalized series estimators. Kernel smoothing is a popular and useful nonparametric estimation method; thus, it is desirable to establish theoretical and computational analyses for penalized kernel smoothing. In this paper, we develop a novel penalized kernel quantile regression with attractive theoretical and computational properties in varying coefficient models. We show that the proposed method consistently identifies the partially linear structure of the varying coefficient model even when the number of covariates is allowed to increase with the sample size. We develop an efficient algorithm based on the alternating direction method of multipliers with a computational convergence guarantee. We derive a plug-in bandwidth selector using high-dimensional kernel regression theory, and the penalty parameter is selected by a proposed Bayesian Information Criterion. These developments require novel high-dimensional kernel regression and computational analyses. Extensive simulations and real data analyses demonstrate the effectiveness of the proposed method and numerically verify its theoretical properties.

Introduction

As a useful nonparametric modeling method for exploring dynamic changes in data, varying coefficient regression has been used in various applications. There are extensive references in the literature, for example, the studies of Hastie and Tibshirani (1993), Chen and Tsay (1993), Fan and Zhang (1999), Cai et al. (2000b), Fan and Zhang (2000), Park et al. (2017), and Park and He (2017), which consider various settings, including longitudinal and time series data, as well as extensions to generalized and quantile regression models. Overviews of the literature are given in Fan and Zhang (2008) and Park et al. (2015). One popular estimation approach for varying coefficient models is kernel smoothing, which allows intuition and insight to be carried over directly from linear regression. In particular, it is well developed in the context of inferring smooth nonparametric coefficient changes (e.g., Fan and Zhang, 2000, Cai et al., 2000a, Fan and Huang, 2005, Xia et al., 2004, Wang and Zhu, 2009, Zhou and Liang, 2009, Lee et al., 2012b, Lee et al., 2012a, Chen and Hong, 2012, Cheng et al., 2013, Wu and Zhou, 2017, Lee et al., 2018).

Penalization functions, such as the least absolute shrinkage and selection operator (LASSO, Tibshirani, 1996) and the smoothly clipped absolute deviation (SCAD, Fan and Li, 2001), have become effective tools for variable selection in regression analysis. For an overview of variable selection via penalization, see Fan and Lv (2010), Lee et al. (2019), and the references therein. In recent decades, the success of penalization methods has been thoroughly investigated in linear regression, and extensions to semiparametric and nonparametric models have become a focal point of research. There is considerable literature on estimation and variable selection for semiparametric and nonparametric models; among these, numerous penalized sieve estimators have been well established (e.g., Wang et al., 2008, Ravikumar et al., 2009, Meier et al., 2009, Huang et al., 2010, Raskutti et al., 2012, Wei et al., 2011, Xue and Qu, 2012, Cheng et al., 2014, Noh and Lee, 2014, Klopp and Pensky, 2015, Honda et al., 2019).

In contrast, there is a paucity of literature on penalized kernel smoothing: for finite-dimensional models, see, for example, Wang and Xia (2009a), Kai et al. (2011), Hu and Xia (2012), and Wang and Kulasekera (2012); for high-dimensional models, only Lee and Mammen (2016) was found, to the best of our knowledge. The limited development of penalized kernel smoothing might be due to the computational challenges of localization in kernel smoothing, as discussed by Lee and Mammen (2016), as well as the inherent methodological and technical issues involved. Specifically, in penalized kernel smoothing, the dimension of the optimization problem can be large because of the numerous grid points selected over the interval [0,1], even when the number of covariates is moderate; for instance, a local linear fit of p coefficient functions evaluated at M grid points involves 2pM parameters. Therefore, it is highly desirable to provide a computationally efficient penalized algorithm that works in combination with kernel smoothing, particularly in high-dimensional settings.

In this paper, we develop penalized kernel smoothing techniques that work theoretically and computationally for varying coefficient (VC) quantile models with a fixed or diverging number of variables. In VC quantile regression, some coefficient functions can be zero or invariant, i.e., nonzero constant functions. In this case, directly fitting the full VC model results in a loss of estimation efficiency for the invariant coefficients. Identifying the underlying semiparametric model structure is therefore an important statistical task for more accurate estimation. See Section 2.1 for a review of the VC quantile model.

A main contribution of this paper is a novel penalized local linear quantile regression that simultaneously identifies zero, invariant, and varying coefficient functions, achieving joint structural identification and selection. Notably, the proposed penalization scheme has complexity similar to that of other penalized methods for nonparametric models. Note that every nonparametric penalization, including the one proposed in this study, involves two tuning parameters: the penalty parameter and an additional nonparametric tuning parameter, namely the bandwidth for kernel methods or the number of basis functions for sieve methods. In this work, we develop tuning parameter selection in a computationally efficient manner. Specifically, we derive a plug-in bandwidth selector for the proposed method from high-dimensional kernel theory, and we propose a Bayesian Information Criterion for selecting the penalty parameter. Consequently, our algorithm requires only one tuning parameter λ without additional computational cost. Regarding statistical properties, we demonstrate that the proposed method identifies the underlying partially linear VC model with probability tending to one, which asymptotically yields the same estimators for the nonzero coefficients as the estimator using knowledge of the true model, referred to as the "oracle regularized estimator". A schematic form of the penalized objective is sketched below.
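To fix ideas, the estimation problem can be written in the following schematic form; this is a sketch under our own notational assumptions (grid points $u_1,\dots,u_M$ over $[0,1]$, a generic folded penalty $p_\lambda$, and group norms taken over the grid), not the paper's exact construction:

```latex
\min_{\{a_{jk},\,b_{jk}\}}\;
\sum_{k=1}^{M}\sum_{i=1}^{n}
\rho_\tau\!\Big(Y_i-\sum_{j=1}^{p}\big\{a_{jk}+b_{jk}(U_i-u_k)\big\}X_i^{(j)}\Big)K_h(U_i-u_k)
\;+\;\sum_{j=1}^{p}p_\lambda\big(\|a_{j\cdot}\|\big)
\;+\;\sum_{j=1}^{p}p_\lambda\big(\|a_{j\cdot}-\bar a_j\mathbf{1}\|\big)
```

Here $\rho_\tau(r)=r\{\tau-\mathbf{1}(r<0)\}$ is the check loss, $K_h(\cdot)=K(\cdot/h)/h$ is a kernel with bandwidth $h$, and $a_{jk}\approx g_j(u_k)$ with grid average $\bar a_j$. The first group penalty can shrink an entire coefficient function to zero, while the second shrinks its deviation from a constant, which identifies invariant coefficients.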

For this asymptotic oracle property, we allow the number of covariates p to increase with n, i.e., p approaches infinity, which requires different technical approaches. To the best of our knowledge, no penalized kernel quantile regression exists for such simultaneous model identification and estimation, even for a fixed dimension. Furthermore, the proposed theoretical development is novel in that it uses technical arguments different from traditional kernel approaches, such as the sub-gradient argument from the convex optimization literature. In particular, the high-dimensional kernel quantile regression theory in this paper is among the first of its kind and requires a novel empirical process analysis on a space of function tuples of increasing dimension; see the Supplementary material.

Another main contribution of this paper is a new, efficient computational method for implementing the proposed approach. We develop the penalized kernel smoothing algorithm using the alternating direction method of multipliers (ADMM, Boyd et al., 2011) with a computational convergence guarantee; see Theorem 4. ADMM approaches have been widely used for the efficient computation of numerous penalized methods. To the best of our knowledge, computation using ADMM with a convergence guarantee has not previously been considered in the area of penalized kernel smoothing. Each step of the proposed ADMM has a closed-form expression and can be efficiently computed via a parallel implementation. This leads to a significant reduction in computational time (see Section 5.3 in the Supplementary material), which is especially desirable in large dimensions. The proposed ADMM idea can also be easily extended to other kernel settings (e.g., Wang and Xia, 2009a, Kai et al., 2011, Zhang and Wu, 2014, Wu and Zhou, 2017, Park et al., 2021). A minimal sketch of the proximal-ADMM template is given below.
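The paper's actual ADMM operates on the localized kernel objective with group penalties; as a minimal, self-contained illustration of the same proximal-ADMM template, the following sketch solves a simplified L1-penalized quantile regression (the splitting, function names, and step constants are our own assumptions):

```python
import numpy as np

def soft_threshold(v, t):
    # Elementwise soft-thresholding: proximal operator of t * ||.||_1.
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def prox_check_loss(v, tau, sigma):
    # Closed-form proximal operator of the quantile check loss rho_tau / sigma.
    return np.where(v > tau / sigma, v - tau / sigma,
                    np.where(v < (tau - 1.0) / sigma, v - (tau - 1.0) / sigma, 0.0))

def admm_quantile_lasso(X, y, tau=0.5, lam=0.1, sigma=1.0, n_iter=500):
    # Sketch: min_beta rho_tau(y - X beta) + lam * ||beta||_1,
    # via the splitting X beta + r = y with dual variable u.
    n, p = X.shape
    beta, r, u = np.zeros(p), y.copy(), np.zeros(n)
    eta = 1.01 * sigma * np.linalg.norm(X, 2) ** 2  # majorizes sigma * ||X||_2^2
    for _ in range(n_iter):
        # Linearized (proximal) beta-update: one gradient step + soft-thresholding.
        grad = sigma * X.T @ (X @ beta + r - y + u / sigma)
        beta = soft_threshold(beta - grad / eta, lam / eta)
        # Closed-form r-update via the check-loss proximal operator.
        r = prox_check_loss(y - X @ beta - u / sigma, tau, sigma)
        # Dual ascent on the constraint X beta + r = y.
        u = u + sigma * (X @ beta + r - y)
    return beta
```

Each update is elementwise or a single matrix-vector product, so, as in the paper's setting, the steps parallelize naturally across coordinates or grid points.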

The remainder of this paper is organized as follows. In Section 2, we introduce the VC quantile regression model and propose penalized kernel smoothing for simultaneous structural identification and estimation in the VC quantile model. In Section 3, we present the theoretical properties of the proposed method. In Section 4, we detail the proposed ADMM algorithm, its computational convergence property, and the tuning parameter selection. In Section 5, we present simulation studies evaluating the finite sample properties of the proposed method. In Section 6, we apply the proposed method to real data sets to illustrate its usefulness. Section 7 concludes the paper. Technical proofs and some additional tables are presented in the Supplementary material.

Section snippets

VC quantile model

Given any quantile level $\tau$ ($0<\tau<1$), consider the $\tau$-th VC quantile regression
$$Y=\sum_{j=1}^{p} g_{j,\tau}(U)\,X^{(j)}+\varepsilon, \tag{1}$$
where $U\in[0,1]$ is an index variable, $g_\tau(u)=(g_{1,\tau}(u),\ldots,g_{p,\tau}(u))^{\top}$ is a vector of coefficient functions, and the conditional $\tau$-th quantile of the random error $\varepsilon$ given $(U,\{X^{(j)}\}_{1\le j\le p})$ is zero. The data $\{(Y_i,X_i,U_i):1\le i\le n\}$ are independent and identically distributed copies of $(Y,X,U)$ from the VC quantile model (1). Hereafter, we omit the dependence on $\tau$ in $g_{j,\tau}(u)$, $1\le j\le p$, and $g_\tau$ for notational simplicity.
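For concreteness, a minimal simulation from a model of the form (1) is sketched below; the coefficient functions, covariate distribution, and error law are our own illustrative choices, not the paper's designs:

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)
n, p, tau = 500, 3, 0.5

U = rng.uniform(0.0, 1.0, n)          # index variable on [0, 1]
X = rng.normal(size=(n, p))           # covariates

# Coefficient functions: one varying, one invariant (nonzero constant), one zero.
g = np.column_stack([np.sin(2.0 * np.pi * U),   # g_1: varying
                     np.full(n, 1.5),           # g_2: invariant
                     np.zeros(n)])              # g_3: zero

# Error whose conditional tau-th quantile is zero: shift a normal by its tau-quantile.
eps = rng.standard_normal(n) - norm.ppf(tau)

Y = np.sum(g * X, axis=1) + eps
```

A structural identification method should recover exactly this partition of the coefficients: $g_1$ varying, $g_2$ constant, and $g_3$ zero.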

Theoretical properties

Herein, we discuss the theoretical properties of the proposed method when the number of covariates p is allowed to increase with n. First, we establish the asymptotic properties of the oracle regularized estimator, which can be obtained when the underlying semiparametric partially linear VC model is known. Next, we show that the proposed estimator exhibits the oracle properties; that is, it achieves model selection consistency and is asymptotically equivalent to the oracle regularized estimator. All technical proofs are provided in the Supplementary material.
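In schematic notation of our own (the index sets below are illustrative symbols, not the paper's), the model selection consistency part of the oracle property reads

```latex
\Pr\big(\hat{\mathcal{V}}=\mathcal{V}_0,\;\hat{\mathcal{C}}=\mathcal{C}_0,\;\hat{\mathcal{Z}}=\mathcal{Z}_0\big)\longrightarrow 1
\quad\text{as } n\to\infty,
```

where $\mathcal{V}_0$, $\mathcal{C}_0$, and $\mathcal{Z}_0$ denote the true index sets of varying, invariant, and zero coefficients; in addition, the proposed estimator coincides with the oracle regularized estimator with probability tending to one.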

Numerical implementation

In this section, we provide the numerical details of the proposed method. In Sections 4.1 and 4.2, we present a computational algorithm with a computational convergence property. In Section 4.3, we propose a detailed strategy for choosing the tuning parameters of the proposed method. For efficient computation, we reformulate the proposed penalized kernel smoothing so as to apply the ADMM algorithm (Boyd et al., 2011), where each step is updated via parallel computation.
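As a schematic illustration of a BIC-type choice of the penalty parameter (the exact criterion and degrees-of-freedom measure in Section 4.3 may differ; the formula below is a common quantile-regression BIC used here as an assumption):

```python
import numpy as np

def check_loss(resid, tau):
    # Quantile check loss rho_tau summed over residuals.
    return np.sum(resid * (tau - (resid < 0)))

def select_lambda_bic(fit, y, tau, lambdas):
    # fit(lam) is any user-supplied routine returning (fitted values, df),
    # where df counts the selected nonzero/varying coefficient functions.
    n = len(y)
    best_lam, best_bic = None, np.inf
    for lam in lambdas:
        y_hat, df = fit(lam)
        bic = np.log(check_loss(y - y_hat, tau) / n) + df * np.log(n) / (2.0 * n)
        if bic < best_bic:
            best_lam, best_bic = lam, bic
    return best_lam
```

With the bandwidth fixed by the plug-in rule, this reduces the tuning to a one-dimensional search over λ.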

Numerical studies

To investigate the finite sample performance of the proposed method, we conducted various simulations. In Sections 5.1 and 5.2, we consider homoscedastic and heteroscedastic settings, respectively. Throughout the simulations, we demonstrate that it is advantageous to select constant coefficients via the second penalty term of the proposed method.

For comparison, we consider three methods: the proposed simultaneous penalized quantile regression (SPQR), kernel quantile …

Real data applications

In this section, we apply the proposed SPQR to two real data sets to illustrate its usefulness.

Summary and discussions

In this paper, we developed a novel penalized kernel smoothing methodology for VC quantile regression that attains simultaneous structural identification of zeros and invariant coefficients in a single estimation step. We established the asymptotic oracle properties of the proposed estimator even when p is allowed to increase with n, i.e., p approaches infinity, which requires novel technical developments. For the numerical implementation, we computed the penalized kernel smoothing estimators via an efficient ADMM algorithm with a computational convergence guarantee.

Acknowledgments

Eun Ryung Lee was supported by a National Research Foundation of Korea (NRF) grant funded by the Korea government (MSIT) (No. NRF-2019R1F1A1062795). Jinwoo Cho was supported by an NRF grant funded by the Korea government (MSIP) (No. NRF-2016R1C1B1011874). Seyoung Park was supported by an NRF grant funded by the Korea government (MSIP) (No. NRF-2019R1C1C1003805).

References (64)

  • Boyd, S., et al. (2011). Distributed optimization and statistical learning via the alternating direction method of multipliers. Found. Trends Mach. Learn.

  • Cai, Z., et al. (2000). Efficient estimation and inferences for varying-coefficient models. J. Amer. Statist. Assoc.

  • Cai, Z., et al. (2000). Functional-coefficient regression models for nonlinear time series. J. Amer. Statist. Assoc.

  • Cai, Z., et al. (2008). Nonparametric quantile estimations for dynamic smooth coefficient models. J. Amer. Statist. Assoc.

  • Chen, B., et al. (2012). Testing for smooth structural changes in time series models via nonparametric regression. Econometrica.

  • Chen, R., et al. (1993). Functional-coefficient autoregressive models. J. Amer. Statist. Assoc.

  • Cheng, M., et al. (2013). Local linear regression on manifolds and its geometric interpretation. J. Amer. Statist. Assoc.

  • Cheng, M.Y., et al. (2014). Nonparametric independence screening and structural identification for ultra-high dimensional longitudinal data. Ann. Statist.

  • Fan, J., et al. (2005). Profile likelihood inferences on semiparametric varying-coefficient partially linear models. Bernoulli.

  • Fan, J., et al. (2001). Variable selection via nonconcave penalized likelihood and its oracle properties. J. Amer. Statist. Assoc.

  • Fan, J., et al. (2010). A selective overview of variable selection in high dimensional feature space. Statist. Sinica.

  • Fan, J., et al. (1999). Statistical estimation in varying coefficient models. Ann. Statist.

  • Fan, J., et al. (2000). Functional linear models for longitudinal data. J. R. Stat. Soc. Ser. B Stat. Methodol.

  • Fan, J., et al. (2008). Statistical methods with varying coefficient models. Stat. Interface.

  • Gu, Y., et al. (2018). ADMM for high-dimensional sparse penalized quantile regression. Technometrics.

  • Hastie, T., et al. (1993). Varying-coefficient models. J. R. Stat. Soc.

  • Honda, T., et al. (2019). Adaptively weighted group lasso for semiparametric quantile regression models. Bernoulli.

  • Hu, T., et al. (2012). Adaptive semi-varying coefficient model selection. Statist. Sinica.

  • Huang, J., et al. (2010). Variable selection in nonparametric additive models. Ann. Statist.

  • Kai, B., et al. (2011). New efficient estimation and variable selection methods for semiparametric varying-coefficient partially linear models. Ann. Statist.

  • Kim, M.O. (2007). Quantile regression with varying coefficients. Ann. Statist.

  • Klopp, O., et al. (2015). Sparse high-dimensional varying coefficient model: non-asymptotic minimax study. Ann. Statist.