ABSTRACT
Maximum margin discriminant analysis (MMDA) applies the margin idea to feature extraction and often outperforms traditional methods such as kernel principal component analysis (KPCA) and kernel Fisher discriminant analysis (KFD). However, as in other kernel methods, its time complexity is cubic in the number of training points m, making it computationally inefficient on massive data sets. In this paper, we propose a (1+ε)²-approximation algorithm for obtaining the MMDA features by extending the core vector machines. The resultant time complexity is only linear in m, while its space complexity is independent of m. Extensive comparisons with the original MMDA, KPCA, and KFD on a number of large data sets show that the proposed feature extractor improves classification accuracy and is faster than these kernel-based methods by more than an order of magnitude.
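The core vector machine approach cited above rests on the Bǎdoiu–Clarkson (1+ε)-approximate minimum enclosing ball (MEB) core-set iteration, whose cost per step is linear in the number of points and whose iteration count depends only on ε. As a minimal sketch (in plain Euclidean space, not the kernel-induced feature space the paper actually works in; `approx_meb` and its arguments are illustrative names, not from the paper):

```python
import numpy as np

def approx_meb(points, eps):
    """(1+eps)-approximate minimum enclosing ball via the
    Badoiu-Clarkson core-set iteration: repeatedly pull the
    center toward the current farthest point.  After O(1/eps^2)
    iterations the ball centered at c with the returned radius
    covers all points, and the radius is within (1+eps) of optimal."""
    c = points[0].copy()                      # arbitrary starting center
    iters = int(np.ceil(1.0 / eps**2))        # iteration count depends only on eps
    for i in range(1, iters + 1):
        d = np.linalg.norm(points - c, axis=1)
        far = points[np.argmax(d)]            # farthest point joins the core-set
        c = c + (far - c) / (i + 1)           # shrinking step toward it
    radius = np.linalg.norm(points - c, axis=1).max()
    return c, radius
```

Each iteration touches every point once (to find the farthest), so the total work is O(m/ε²), which is the source of the linear-in-m complexity claimed in the abstract; the core-set itself (one point per iteration) has size independent of m.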
- M. Bǎdoiu and K. L. Clarkson. Optimal core-sets for balls. In DIMACS Workshop on Computational Geometry, 2002.
- T. Friess, N. Cristianini, and C. Campbell. The kernel-adatron: a fast and simple learning procedure for support vector machines. In Proceedings of the Fifteenth International Conference on Machine Learning, pages 188--196, 1998.
- W. Kienzle and B. Schölkopf. Training support vector machines with multiple equality constraints. In Proceedings of the European Conference on Machine Learning, 2005.
- H.-C. Kim, S. Pang, H.-M. Je, D. Kim, and S. Bang. Constructing support vector machine ensemble. Pattern Recognition, 36(12):2757--2767, 2003.
- A. Kocsor, K. Kovács, and C. Szepesvári. Margin maximizing discriminant analysis. In Proceedings of the 15th European Conference on Machine Learning, pages 227--238, Pisa, Italy, Sept. 2004.
- O. Mangasarian and E. Wild. Multisurface proximal support vector machine classification via generalized eigenvalues. IEEE Transactions on Pattern Analysis and Machine Intelligence, 28(1):69--74, 2006.
- S. Mika, G. Rätsch, J. Weston, B. Schölkopf, and K.-R. Müller. Fisher discriminant analysis with kernels. In Y.-H. Hu, J. Larsen, E. Wilson, and S. Douglas, editors, Neural Networks for Signal Processing IX, pages 41--48, 1999.
- J. Platt. Fast training of support vector machines using sequential minimal optimization. In B. Schölkopf, C. Burges, and A. Smola, editors, Advances in Kernel Methods - Support Vector Learning, pages 185--208. MIT Press, Cambridge, MA, 1999.
- B. Schölkopf and A. Smola. Learning with Kernels. MIT Press, Cambridge, MA, 2002.
- I. W. Tsang, J. T. Kwok, and P.-M. Cheung. Core vector machines: Fast SVM training on very large data sets. Journal of Machine Learning Research, 6:363--392, 2005.
- I. W. Tsang, J. T. Kwok, and K. T. Lai. Core vector regression for very large regression problems. In Proceedings of the Twenty-Second International Conference on Machine Learning, pages 913--920, Bonn, Germany, Aug. 2005.