
A second-order cone programming formulation for twin support vector machines

Abstract

Second-order cone programming (SOCP) formulations have received increasing attention as robust optimization schemes for Support Vector Machine (SVM) classification. These formulations consider the worst-case setting for the class-conditional densities, potentially yielding better classification performance than the standard SVM formulation. In this work we propose an SOCP extension for Twin SVM, a recently developed approach that constructs two nonparallel classifiers. The linear and kernel-based SOCP formulations for Twin SVM are derived, and the duality analysis reveals interesting geometrical properties of the proposed method. Experiments on benchmark datasets demonstrate the virtues of our approach in terms of classification performance compared with alternative SVM methods.

References

  1. Alizadeh F, Goldfarb D (2003) Second-order cone programming. Math Program 95:3–51

  2. Alvarez F, López J, Ramírez C H (2010) Interior proximal algorithm with variable metric for second-order cone programming: applications to structural optimization and support vector machines. Optim Methods Softw 25(6):859–881

  3. Bache K, Lichman M (2013) UCI machine learning repository. http://archive.ics.uci.edu/ml

  4. Bai L, Wang Z, Shao Y H, Deng N Y (2014) A novel feature selection method for twin support vector machine. Knowl-Based Syst 59(0):1–8

  5. Bennett K, Bredensteiner E (2000) Duality and geometry in SVM classifiers. In: Proc. 17th International Conf. on Machine Learning, Morgan Kaufmann, pp 57–64

  6. Bosch P, López J, Ramírez H, Robotham H (2013) Support vector machine under uncertainty: An application for hydroacoustic classification of fish-schools in Chile. Expert Syst Appl 40(10):4029–4034

  7. Chang C C, Lin C J (2011) LIBSVM: A library for support vector machines. ACM Trans Intell Syst Technol 2:1–27. Software available at http://www.csie.ntu.edu.tw/~cjlin/libsvm

  8. Cortes C, Vapnik V (1995) Support-vector networks. Mach Learn 20:273–297

  9. De Maesschalck R, Jouan-Rimbaud D, Massart D (2000) The Mahalanobis distance. Chemom Intell Lab Syst 50:1–18

  10. Debnath R, Muramatsu M, Takahashi H (2005) An efficient support vector machine learning method with second-order cone programming for large-scale problems. Appl Intell 23:219–239

  11. Geng X, Zhan D C, Zhou Z H (2005) Supervised nonlinear dimensionality reduction for visualization and classification. IEEE Trans Syst Man Cybern B Cybern 35(6):1098–1107

  12. Goldfarb D, Iyengar G (2003) Robust convex quadratically constrained programs. Math Program 97(3):495–515

  13. Horn R A, Johnson C R (1990) Matrix Analysis, 1st edn. Cambridge University Press, New York

  14. Jayadeva, Khemchandani R, Chandra S (2007) Twin support vector machines for pattern classification. IEEE Trans Pattern Anal Mach Intell 29(5):905–910

  15. Lanckriet G, Ghaoui L, Bhattacharyya C, Jordan M (2003) A robust minimax approach to classification. J Mach Learn Res 3:555–582

  16. Li C, Liu K, Wang H (2011) The incremental learning algorithm with support vector machine based on hyperplane-distance. Appl Intell 34:19–27

  17. Lobo M, Vandenberghe L, Boyd S, Lebret H (1998) Applications of second-order cone programming. Linear Algebra Appl 284:193–228

  18. López J, Maldonado S, Carrasco M (2015) A novel multi-class SVM model using second-order cone constraints. Appl Intell, in press

  19. Maldonado S, López J (2014a) Alternative second-order cone programming formulations for support vector classification. Inform Sci 268:328–341

  20. Maldonado S, López J (2014b) Imbalanced data classification using second-order cone programming support vector machines. Pattern Recogn 47:2070–2079

  21. Maldonado S, Famili F, Weber R (2014) Feature selection for high-dimensional class-imbalanced data sets using support vector machines. Inform Sci 286:228–246

  22. Mangasarian O L (1994) Nonlinear Programming. Classics in Applied Mathematics, Society for Industrial and Applied Mathematics

  23. Mercer J (1909) Functions of positive and negative type, and their connection with the theory of integral equations. Philos Trans R Soc Lond A 209:415–446

  24. Nath S, Bhattacharyya C (2007) Maximum margin classifiers with specified false positive and false negative error rates. In: Proceedings of the SIAM International Conference on Data Mining

  25. Peng X (2011) Building sparse twin support vector machine classifiers in primal space. Inform Sci 181(18):3967–3980

  26. Qi Z, Tian Y, Shi Y (2013) Robust twin support vector machine for pattern classification. Pattern Recogn 46(1):305–316

  27. Schölkopf B, Smola A J (2002) Learning with Kernels. MIT Press

  28. Shao Y, Zhang C, Wang X, Deng N (2011) Improvements on twin support vector machines. IEEE Trans Neural Netw 22(6):962–968

  29. Shao Y H, Chen W J, Deng N Y (2014) Nonparallel hyperplane support vector machine for binary classification problems. Inform Sci 263(1):22–35

  30. Sokolova M, Japkowicz N (2006) Beyond accuracy, F-score and ROC: A family of discriminant measures for performance evaluation. In: Advances in Artificial Intelligence. Springer, Berlin Heidelberg, pp 1015–1021

  31. Sturm J (1999) Using SeDuMi 1.02, a MATLAB toolbox for optimization over symmetric cones. Optim Methods Softw 11(12):625–653. Special issue on Interior Point Methods (CD supplement with software)

  32. Trafalis T B, Alwazzi S A (2010) Support vector machine classification with noisy data: a second order cone programming approach. Int J Gen Syst 39(7):757–781

  33. Xie J, Hone K, Xie W, Gao X, Shi Y, Liu X (2013) Extending twin support vector machine classifier for multi-category classification problems. Intelligent Data Analysis 17(4):649–664

  34. Zhong P, Fukushima M (2007) Second-order cone programming formulations for robust multiclass classification. Neural Comput 19:258–282

Acknowledgments

The first author was supported by FONDECYT project 1140831, the second author was funded by CONICYT Anillo ACT1106, and the third author was supported by FONDECYT project 1130905. Support from the Chilean “Instituto Sistemas Complejos de Ingeniería” (ICM: P-05-004-F, CONICYT: FB016, www.sistemasdeingenieria.cl) is gratefully acknowledged.

Corresponding author

Correspondence to Sebastián Maldonado.

Appendix A: Dual formulation for twin SOCP-SVM

A.1 Proof of Theorem 1

Proof

The Lagrangian function associated with Problem (19) is given by

$$\begin{array}{@{}rcl@{}} L(\textbf{w}_{1},b_{1},\lambda)&=&\frac{1}{2}\left\| A\textbf{w}_{1}+\textbf{e}_{1}b_{1} \right\|^{2}+\frac{\theta_{1}}{2}\left( \|\textbf{w}_{1}\|^{2}+{b_{1}^{2}}\right)\\ & &+\lambda(\textbf{w}_{1}^{\top} {\boldsymbol{\mu}}_{2}+b_{1}+1+\kappa_{2}\|S_{2}^{\top}\textbf{w}_{1}\|), \end{array} $$

where $\lambda\ge 0$. Since $\|\textbf{v}\|=\max_{\|\textbf{u}\|\le 1}\textbf{u}^{\top}\textbf{v}$ holds for any $\textbf{v}\in\Re^{n}$, we can rewrite the Lagrangian as follows:

$$L(\textbf{w}_{1},b_{1},\lambda)=\max_{\textbf{u}}\{L_{1}(\textbf{w}_{1},b_{1},\lambda,\textbf{u}):\|\textbf{u}\|\le 1\}, $$

with $L_{1}$ given by

$$\begin{array}{@{}rcl@{}} L_{1}(\textbf{w}_{1},b_{1},\lambda,\textbf{u})&=&\frac{1}{2}\left\| A\textbf{w}_{1}+\textbf{e}_{1}b_{1} \right\|^{2}+\frac{\theta_{1}}{2}\left( \|\textbf{w}_{1}\|^{2}+{b_{1}^{2}}\right)\\ &&+\lambda(\textbf{w}_{1}^{\top} {\boldsymbol{\mu}}_{2}+b_{1}+1\,+\,\kappa_{2}\textbf{w}_{1}^{\top} S_{2}\textbf{u}). \end{array} $$
(A.1)
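As a side note (not part of the original derivation), the dual-norm identity used above is easy to verify numerically. The following minimal numpy sketch, on an arbitrary random vector, checks that the maximum of $\textbf{u}^{\top}\textbf{v}$ over the unit ball is attained at $\textbf{u}=\textbf{v}/\|\textbf{v}\|$ and equals $\|\textbf{v}\|$; all names and dimensions are illustrative.

```python
import numpy as np

# Illustrative check of ||v|| = max_{||u|| <= 1} u^T v on a random vector.
rng = np.random.default_rng(0)
v = rng.normal(size=5)

# The maximizer over the unit ball is u* = v / ||v||, so u*^T v = ||v||.
u_star = v / np.linalg.norm(v)
print(u_star @ v, np.linalg.norm(v))  # the two printed values coincide
```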

Thus, problem (19) can be written equivalently as:

$$\min_{\textbf{w}_{1},b_{1} }\max_{\textbf{u},\lambda}\{L_{1}(\textbf{w}_{1},b_{1},\lambda,\textbf{u}):\|\textbf{u}\|\le 1,\lambda\ge0\}. $$

Hence, the dual problem of (19) is given by:

$$ \max_{\textbf{u},\lambda}\min_{\textbf{w}_{1},b_{1} }\{L_{1}(\textbf{w}_{1},b_{1},\lambda,\textbf{u}):\|\textbf{u}\|\le 1,\lambda\ge0\}. $$
(A.2)

The above expression allows the construction of the dual formulation; a detailed description of this procedure can be found in [22]. Computing the first-order conditions of the inner optimization task (the minimization problem) yields

$$\begin{array}{@{}rcl@{}} \nabla_{\textbf{w}_{1}}L_{1}&=&A^{\top}(A\textbf{w}_{1}+\textbf{e}_{1}b_{1})+\theta_{1}\textbf{w}_{1}+\lambda({\boldsymbol{\mu}}_{2}+\kappa_{2} S_{2}\textbf{u})=0,\qquad \end{array} $$
(A.3)
$$\begin{array}{@{}rcl@{}} \nabla_{b_{1}}L_{1}&=&\textbf{e}_{1}^{\top}(A\textbf{w}_{1}+\textbf{e}_{1}b_{1})+\theta_{1}b_{1}+\lambda=0. \end{array} $$
(A.4)

Let us denote $\hat{\textbf{z}}=[\textbf{z}^{\top},1]^{\top}\in\Re^{n+1}$, with $\textbf{z}={\boldsymbol{\mu}}_{2}+\kappa_{2}S_{2}\textbf{u}\in\Re^{n}$. Then the relations (A.3)–(A.4) can be written compactly as

$$(H^{\top} H+\theta_{1}I) \textbf{v}_{1}+\lambda\hat{\textbf{z}}=0. $$

Since the symmetric matrix $H^{\top}H+\theta_{1}I$ is positive definite for any $\theta_{1}>0$, one has

$$ \textbf{v}_{1}=-\lambda(H^{\top} H+\theta_{1}I)^{-1}\hat{\textbf{z}}. $$
(A.5)
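For readers who wish to reproduce (A.5) numerically, the sketch below is a minimal numpy illustration: it builds $H=[A\ \textbf{e}_{1}]$ from a random data matrix and recovers $\textbf{v}_{1}=[\textbf{w}_{1}^{\top},b_{1}]^{\top}$ through a linear solve rather than an explicit inverse. The dimensions, $\theta_{1}$, $\lambda$ and $\hat{\textbf{z}}$ are placeholders, not values from the paper.

```python
import numpy as np

# Hypothetical small instance: A stores the m1 class-1 points as rows.
rng = np.random.default_rng(1)
m1, n = 8, 3
A = rng.normal(size=(m1, n))
e1 = np.ones((m1, 1))
H = np.hstack([A, e1])                      # H = [A  e1] (cf. (23))
theta1, lam = 0.5, 1.0
z_hat = np.append(rng.normal(size=n), 1.0)  # z_hat = [z; 1]

# (A.5): v1 = -lambda (H^T H + theta1 I)^{-1} z_hat, computed via a linear solve.
v1 = -lam * np.linalg.solve(H.T @ H + theta1 * np.eye(n + 1), z_hat)
w1, b1 = v1[:-1], v1[-1]
```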

On the other hand, by replacing (A.3)–(A.4) in (A.1) and using the relations (21) and (A.5), the dual problem can be stated as follows:

$$ \begin{array}{llll} &&\max_{\textbf{z},\textbf{u},\lambda } -\frac{1}{2}\lambda^{2}\hat{\textbf{z}}^{\top}(H^{\top} H+\theta_{1}I)^{-1}\hat{\textbf{z}}+\lambda \\ &&\text{s.t.}\, \textbf{z}= {\boldsymbol{\mu}}_{2}+\kappa_{2} S_{2}\textbf{u}, \ \|\textbf{u}\|\le1, \\ &&\, \lambda\ge0. \end{array} $$
(A.6)

Notice that the objective function of the dual problem (A.6) is concave with respect to $\lambda$: writing $c=\hat{\textbf{z}}^{\top}(H^{\top}H+\theta_{1}I)^{-1}\hat{\textbf{z}}>0$, the function $-\frac{1}{2}\lambda^{2}c+\lambda$ has derivative $1-\lambda c$, which vanishes at $\lambda=1/c\ge 0$. Hence the maximum over $\lambda\ge0$ is attained at

$$ \lambda^{*}=\frac{1}{\hat{\textbf{z}}^{\top}(H^{\top} H+\theta_{1}I)^{-1}\hat{\textbf{z}}}, $$
(A.7)

with optimal value

$$\displaystyle\frac{1}{2}\frac{1}{\hat{\mathbf{z}}^{\top}(H^{\top} H+\theta_{1}I)^{-1}\hat{\textbf{z}}}.$$

Then, by using (A.7) the dual problem of (19) can be stated as follows:

$$ \begin{array}{lll} \min_{\textbf{z},\textbf{u} } & \ \frac{1}{2}\left( \textbf{z}^{\top} \ 1 \right) (H^{\top} H+\theta_{1}I)^{-1}\left( \begin{array}{c} \textbf{z}\\ 1 \end{array}\right) \\ \text{s.t.}\, &\, \textbf{z}\in \textbf{B}({\boldsymbol{\mu}}_{2},S_{2},\kappa_{2}), \end{array} $$
(A.8)

where

$$\textbf{B}({\boldsymbol{\mu}},S,\kappa)=\{\textbf{z}\in\Re^{n}:{\mathbf{z}}={\boldsymbol{\mu}}+\kappa S\textbf{u},\|\textbf{u}\|\le1\}. $$
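The dual problem (A.8) is a convex quadratic program over an ellipsoid, and an interior-point solver such as SeDuMi [31] handles problems of this type. As a free-standing illustration only, the sketch below solves (A.8) with CVXPY on synthetic placeholder data ($A$, $\boldsymbol{\mu}_{2}$, $S_{2}$, $\kappa_{2}$ and $\theta_{1}$ are all hypothetical), modelling the quadratic objective through a Cholesky factor of $(H^{\top}H+\theta_{1}I)^{-1}$ and the membership $\textbf{z}\in\textbf{B}(\boldsymbol{\mu}_{2},S_{2},\kappa_{2})$ as a norm constraint.

```python
import cvxpy as cp
import numpy as np

# Placeholder problem data (not from the paper).
rng = np.random.default_rng(2)
m1, n = 8, 3
A = rng.normal(size=(m1, n))
H = np.hstack([A, np.ones((m1, 1))])        # H = [A  e1]
theta1, kappa2 = 0.5, 1.0
mu2 = rng.normal(size=n)
S2 = rng.normal(size=(n, n))                # any square-root factor of Sigma_2

M = np.linalg.inv(H.T @ H + theta1 * np.eye(n + 1))
Lchol = np.linalg.cholesky(M)               # M = Lchol Lchol^T

z = cp.Variable(n)
u = cp.Variable(n)
z_hat = cp.hstack([z, np.ones(1)])          # [z; 1]

# min 0.5 * z_hat^T M z_hat  s.t.  z = mu2 + kappa2 S2 u,  ||u|| <= 1
objective = cp.Minimize(0.5 * cp.sum_squares(Lchol.T @ z_hat))
constraints = [z == mu2 + kappa2 * (S2 @ u), cp.norm(u, 2) <= 1]
cp.Problem(objective, constraints).solve()
print(z.value)
```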

Similarly, since the symmetric matrix $G^{\top}G+\theta_{2}I$ is positive definite for any $\theta_{2}>0$, one can prove that the dual of problem (20) is given by:

$$ \begin{array}{lllll} \min_{\textbf{p},\textbf{u} } & \ \frac{1}{2}\left( \textbf{p}^{\top} \ 1 \right) (G^{\top} G+\theta_{2}I)^{-1}\left( \begin{array}{c} \textbf{p}\\ 1 \end{array}\right) \\ \text{s.t.}\, &\, \textbf{p}\in \textbf{B}({\boldsymbol{\mu}}_{1},S_{1},\kappa_{1}). \end{array} $$
(A.9)

A.2 Proof of Proposition 1

Proof

Let us denote the objective function of Problem (26) by

$$f(\textbf{z})= \frac{1}{2}\left( \textbf{z}^{\top} \ 1 \right) (H^{\top} H+\theta_{1}I)^{-1}\left( \begin{array}{c} \textbf{z}\\ 1 \end{array}\right). $$

Since the symmetric matrix $H^{\top}H+\theta_{1}I=\left(\begin{array}{cc} A^{\top}A+\theta_{1}I & A^{\top}\textbf{e}_{1}\\ \textbf{e}_{1}^{\top}A & \textbf{e}_{1}^{\top}\textbf{e}_{1}+\theta_{1} \end{array}\right)$ is positive definite for each $\theta_{1}>0$, where $H=[A\ \textbf{e}_{1}]\in\mathbb{R}^{m_{1}\times(n+1)}$ (cf. (23)), Theorem 7.7.6 of [13] implies that the matrix $C_{s}(\theta_{1})=A^{\top}A+\theta_{1}I-\frac{1}{m_{1}+\theta_{1}}A^{\top}\textbf{e}_{1}\textbf{e}_{1}^{\top}A$ is invertible, and that

$$\begin{array}{@{}rcl@{}} (H^{\top} H+\theta_{1}I)^{-1}&=&\left( \begin{array}{cc} I& 0\\ -\frac{1}{m_{1}+\theta_{1}}\textbf{e}_{1}^{\top} A& 1 \end{array} \right)\!\left( \begin{array}{cc} C_{s}(\theta_{1})^{-1} & 0\\ 0& \frac{1}{m_{1}+\theta_{1}} \end{array} \right)\\&&\times\left( \begin{array}{cc} I& -\frac{1}{m_{1}+\theta_{1}}A^{\top}\textbf{e}_{1}\\ 0& 1 \end{array} \right). \end{array} $$
(A.10)
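As an aside, the block factorization (A.10) can be checked numerically on a random instance; the numpy sketch below (with hypothetical dimensions) compares the three-factor product against a direct inverse of $H^{\top}H+\theta_{1}I$.

```python
import numpy as np

rng = np.random.default_rng(3)
m1, n = 8, 3
A = rng.normal(size=(m1, n))
e1 = np.ones((m1, 1))
theta1 = 0.5
H = np.hstack([A, e1])

# Schur complement C_s(theta1) = A^T A + theta1 I - A^T e1 e1^T A / (m1 + theta1)
Cs = A.T @ A + theta1 * np.eye(n) - (A.T @ e1 @ e1.T @ A) / (m1 + theta1)

left = np.block([[np.eye(n), np.zeros((n, 1))],
                 [-(e1.T @ A) / (m1 + theta1), np.ones((1, 1))]])
mid = np.block([[np.linalg.inv(Cs), np.zeros((n, 1))],
                [np.zeros((1, n)), np.array([[1.0 / (m1 + theta1)]])]])
right = np.block([[np.eye(n), -(A.T @ e1) / (m1 + theta1)],
                  [np.zeros((1, n)), np.ones((1, 1))]])

# The product reproduces (H^T H + theta1 I)^{-1}.
print(np.allclose(left @ mid @ right,
                  np.linalg.inv(H.T @ H + theta1 * np.eye(n + 1))))  # True
```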

By using the first equality of (16) and carrying out the matrix products, we have that

$$\begin{array}{@{}rcl@{}} f(\textbf{z})&=&\frac{1}{2}\left( \left( \textbf{z}^{\top}-\frac{m_{1}}{m_{1}+\theta_{1}}\hat{\boldsymbol{\mu}}_{1}^{\top}\right)C_{s} (\theta_{1})^{-1}\right.\\&&\times\left.\left( \textbf{z}-\frac{m_{1}}{m_{1}+\theta_{1}}\hat{\boldsymbol{\mu}}_{1}\right)+\frac{1}{m_{1}+\theta_{1}}\right). \end{array} $$

Thus, (31) follows. Formulation (32) can be derived in a similar way.

Now suppose that $\theta_{1}=0$ and that $H^{\top}H$ is positive definite. Since

$$\begin{array}{@{}rcl@{}} C_{s}(0)&=&A^{\top} \left( I-\frac{1}{m_{1}} \textbf{e}_{1}\textbf{e}_{1}^{\top} \right)A=A^{\top} \left( I-\frac{1}{m_{1}} \textbf{e}_{1}\textbf{e}_{1}^{\top} \right)\\&&\times \left( I-\frac{1}{m_{1}} \textbf{e}_{1}\textbf{e}_{1}^{\top} \right)A=m_{1}S_{1}S_{1}^{\top}=m_{1}\hat{\Sigma}_{1}, \end{array} $$

where the last two equalities follow from (16) and (17), the expression (A.10) can be written as

$$(H^{\top} H)^{-1}=\frac{1}{m_{1}}\left( \begin{array}{cc} I& 0\\ -\hat{\boldsymbol{\mu}}_{1}^{\top}& 1 \end{array} \right)\left( \begin{array}{cc} \hat{\Sigma}_{1}^{-1}& 0\\ 0& 1 \end{array} \right)\left( \begin{array}{cc} I& -\hat{\boldsymbol{\mu}}_{1}\\ 0& 1 \end{array} \right). $$

Then,

$$\begin{array}{@{}rcl@{}} f(\textbf{z})&=&\frac{1}{2m_{1}}\left( (\textbf{z}^{\top}-\hat{\boldsymbol{\mu}}_{1}^{\top})\hat{\Sigma}_{1}^{-1}(\textbf{z}-\hat{\boldsymbol{\mu}}_{1})+1\right)\\&=& \frac{1}{2m_{1}}\left( \|\hat{\Sigma}_{1}^{-1/2}(\textbf{z}-\hat{\boldsymbol{\mu}}_{1})\|^{2}+1\right). \end{array} $$

Hence, (33) holds. Formulation (34) can be derived in a similar way. □
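As a final sanity check (a supplementary illustration, not part of the proof), the snippet below compares the direct evaluation of $f(\textbf{z})$ with $\theta_{1}=0$ against the Mahalanobis form above, assuming the usual empirical estimates $\hat{\boldsymbol{\mu}}_{1}=\frac{1}{m_{1}}A^{\top}\textbf{e}_{1}$ and $\hat{\Sigma}_{1}=\frac{1}{m_{1}}A^{\top}(I-\frac{1}{m_{1}}\textbf{e}_{1}\textbf{e}_{1}^{\top})A$ consistent with (16)–(17).

```python
import numpy as np

rng = np.random.default_rng(4)
m1, n = 20, 3                     # m1 > n so that H^T H is positive definite
A = rng.normal(size=(m1, n))
e1 = np.ones((m1, 1))
H = np.hstack([A, e1])
z = rng.normal(size=n)

mu1 = (A.T @ e1).ravel() / m1                           # empirical class-1 mean
Sigma1 = A.T @ (np.eye(m1) - e1 @ e1.T / m1) @ A / m1   # empirical covariance

z_hat = np.append(z, 1.0)
f_direct = 0.5 * z_hat @ np.linalg.solve(H.T @ H, z_hat)
f_mahala = ((z - mu1) @ np.linalg.solve(Sigma1, z - mu1) + 1) / (2 * m1)
print(np.isclose(f_direct, f_mahala))                   # True
```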

Cite this article

Maldonado, S., López, J. & Carrasco, M. A second-order cone programming formulation for twin support vector machines. Appl Intell 45, 265–276 (2016). https://doi.org/10.1007/s10489-016-0764-4
