research-article

Isolation-Based Anomaly Detection

Authors:
Fei Tony Liu

Monash University

Monash University
View Profile

,
Kai Ming Ting

Monash University

Monash University
View Profile

,
Zhi-Hua Zhou

Nanjing University

Nanjing University
View Profile

ACM Transactions on Knowledge Discovery from Data Volume 6 Issue 1Article No.: 3pp 1–39https://doi.org/10.1145/2133360.2133363

Published:01 March 2012Publication History

ACM Transactions on Knowledge Discovery from Data

Abstract

Anomalies are data points that are few and different. As a result of these properties, we show that, anomalies are susceptible to a mechanism called isolation. This article proposes a method called Isolation Forest (iForest), which detects anomalies purely based on the concept of isolation without employing any distance or density measure---fundamentally different from all existing methods.

As a result, iForest is able to exploit subsampling (i) to achieve a low linear time-complexity and a small memory-requirement and (ii) to deal with the effects of swamping and masking effectively. Our empirical evaluation shows that iForest outperforms ORCA, one-class SVM, LOF and Random Forests in terms of AUC, processing time, and it is robust against masking and swamping effects. iForest also works well in high dimensional problems containing a large number of irrelevant attributes, and when anomalies are not available in training sample.

References

Abe, N., Zadrozny, B., and Langford, J. 2006. Outlier detection by active learning. In Proceedings of the 12th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM, 504--509. Google ScholarDigital Library
Angiulli, F. and Fassetti, F. 2009. Dolphin: An efficient algorithm for mining distance-based outliers in very large datasets. ACM Trans. Knowl. Discov. Data 3, 1, 1--57. Google ScholarDigital Library
Asuncion, A. and Newman, D. 2007. UCI machine learning repository. http://mlearn.ics.ucl.edu/MLRepository.html.Google Scholar
Bay, S. D. and Schwabacher, M. 2003. Mining distance-based outliers in near linear time with randomization and a simple pruning rule. In Proceedings of the 9th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM, 29--38. Google ScholarDigital Library
Breiman, L. 2001. Random forests. Mach. Learn. 45, 1, 5--32. Google ScholarDigital Library
Breunig, M. M., Kriegel, H.-P., Ng, R. T., and Sander, J. 2000. LOF: Identifying density-based local outliers. ACM SIGMOD Record 29, 2, 93--104. Google ScholarDigital Library
Caputo, B., Sim, K., Furesjo, F., and Smola, A. 2002. Appearance-based object recognition using SVMs: Which kernel should I use? In Proceedings of the NIPS Workshop on Statitsical Methods for Computational Experiments in Visual Processing and Computer Vision.Google Scholar
Chandola, V., Banerjee, A., and Kumar, V. 2009. Anomaly detection: A survey. ACM Comput. Surv. 41, 3, 1--58. Google ScholarDigital Library
Chaudhary, A., Szalay, A. S., Szalay, E. S., and Moore, A. W. 2002. Very fast outlier detection in large multidimensional datasets. In Proceedings of the ACM SIGMOD Workshop on Research Issues on Data Mining and Knowledge Discovery. ACM.Google Scholar
Fan, H., Zaïane, O. R., Foss, A., and Wu, J. 2006. A nonparametric outlier detection for effectively discovering top-n outliers from engineering data. In Proceedings of the 10th Pacific-Asia Conference on Knowledge Discovery and Data Mining (PAKDD). Lecture Notes in Computer Science, vol. 3918, Springer, 557--566. Google ScholarDigital Library
Ghoting, A., Otey, M. E., and Parthasarathy, S. 2004. Loaded: Link-based outlier and anomaly detection in evolving datasets. In Proceedings of the 4th IEEE International Conference on Data Mining (ICDM’04). IEEE Computer Society Press, 387--390. Google ScholarDigital Library
Hand, D. J. and Till, R. J. 2001. A simple generalisation of the area under the roc curve for multiple class classification problems. Mach. Learn. 45, 2, 171--186. Google ScholarDigital Library
He, Z., Xu, X., and Deng, S. 2003. Discovering cluster-based local outliers. Patt. Recog. Lett. 24, 9--10, 1641--1650. Google ScholarDigital Library
He, Z., Deng, S., and Xu, X. 2005. A unified subspace outlier ensemble framework for outlier detection. In Proceedings of the International Conference on Web-Age Information Management (WAIM). W. Fan, Z. Wu, and J. Yang Eds., Lecture Notes in Computer Science, vol. 3739, Springer, 632--637. Google ScholarDigital Library
Joanes, D. N. and Gill, C. A. 1998. Comparing measures of sample skewness and kurtosis. J. Roy. Stat. Soc. (Series D): Statistician 47, 1, 183--189.Google ScholarCross Ref
Knorr, E. M. and Ng, R. T. 1998. Algorithms for mining distance-based outliers in large datasets. In Proceedings of the 24th International Conference on Very Large Data Bases (VLDB’98). Morgan Kaufmann, 392--403. Google ScholarDigital Library
Knorr, E. M., Ng, R. T., and Tucakov, V. 2000. Distance-based outliers: Algorithms and applications. VLDB J. 8, 3--4, 237--253. Google ScholarDigital Library
Knuth, D. E. 2006. Art of Computer Programming, Volume 4, Fascicle 4: Generating All Trees. Addison-Wesley Professional. Google ScholarDigital Library
Lazarevic, A. and Kumar, V. 2005. Feature bagging for outlier detection. In Proceedings of the 11th ACM SIGKDD International Conference on Knowledge Discovery in Data Mining (KDD’05). ACM, New York, 157--166. Google ScholarDigital Library
Liu, F. T., Ting, K. M., and Zhou, Z.-H. 2008a. Isolation forest. In Proceedings of the 8th IEEE International Conference on Data Mining (ICDM’08). IEEE Computer Society Press, 413--422. Google ScholarDigital Library
Liu, F. T., Ting, K. M., and Zhou, Z.-H. 2008b. Spectrum of variable-random trees. J. Artif. Intel. Res. 32, 355--384. Google ScholarDigital Library
Liu, F. T., Ting, K. M., and Zhou, Z.-H. 2010a. Can isolation-based anomaly detectors handle arbitrary multi-modal patterns in data? Tech. rep. TR2010/2, Monash University.Google Scholar
Liu, F. T., Ting, K. M., and Zhou, Z.-H. 2010b. On detecting clustered anomalies using SCiForest. In Proceedings of the European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases (ECML PKDD’10). 274--290. Google ScholarDigital Library
Liu, R. Y., Parelius, J. M., and Singh, K. 1999. Multivariate analysis by data depth: Descriptive statistics, graphics and inference. Ann. Stat. 27, 3, 783--840.Google ScholarCross Ref
Murphy, R. B. 1951. On tests for outlying observations. Ph.D. dissertation, Princeton University.Google Scholar
Papadimitriou, S., Kitagawa, H., Gibbons, P., and Faloutsos, C. 2003. Loci: Fast outlier detection using the local correlation integral. In Proceedings of the 19th International Conference on Data Engineering. 315--326.Google Scholar
Preiss, B. R. 1999. Data Structures and Algorithms with Object-Oriented Design Patterns in Java. Wiley. Google ScholarDigital Library
Quinlan, J. R. 1993. C4.5: Programs for Machine Learning 1st Ed. (Series in Machine Learning), Morgan Kaufmann. Google ScholarDigital Library
Ramaswamy, S., Rastogi, R., and Shim, K. 2000. Efficient algorithms for mining outliers from large datasets. In Proceedings of the ACM SIGMOD International Conference on Management of Data. ACM, 427--438. Google ScholarDigital Library
Rocke, D. M. and Woodruff, D. L. 1996. Identification of outliers in multivariate data. J. Amer. Statist. Soc. 91, 435, 1047--1061.Google Scholar
Rousseeuw, P. J. and Leroy, A. M. 1987. Robust Regression and Outlier Detection. Wiley, New York. Google ScholarDigital Library
Ruskey, F. 1980. On the average shape of binary trees. SIAM J. Alg. Disc. Meth. 1, 1, 43--50.Google ScholarCross Ref
Schölkopf, B., Platt, J. C., Shawe-Taylor, J. C., Smola, A. J., and Williamson, R. C. 2001. Estimating the support of a high-dimensional distribution. Neural Comput. 13, 7, 1443--1471. Google ScholarDigital Library
Shi, T. and Horvath, S. 2006. Unsupervised learning with random forest predictors. J. Computat. Graph. Stat. 15, 1, 118--138.Google ScholarCross Ref
Sing, T., Sander, O., Beerenwinkel, N., and Lengauer, T. 2005. ROCR: Visualizing classifier performance in r. Bioinformatics 21, 20, 3940--3941. Google ScholarDigital Library
Song, X., Wu, M., Jermaine, C., and Ranka, S. 2007. Conditional anomaly detection. IEEE Trans. Knowl. Data Eng. 19, 5, 631--645. Google ScholarDigital Library
Tan, P.-N., Steinbach, M., and Kumar, V. 2005. Introduction to Data Mining. Addison-Wesley. Google ScholarDigital Library
Tang, J., Chen, Z., Fu, A. W.-C., and Cheung, D. W.-L. 2002. Enhancing effectiveness of outlier detections for low density patterns. In Proceedings of the 6th Pacific-Asia Conference on Advances in Knowledge Discovery and Data Mining (PAKDD). Springer-Verlag, 535--548. Google ScholarDigital Library
Tao, Y., Xiao, X., and Zhou, S. 2006. Mining distance-based outliers from large databases in any metric space. In Proceedings of the 12th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD). ACM, New York, 394--403. Google ScholarDigital Library
Tax, D. M. J. and Duin, R. P. W. 2004. Support vector data description. Mach. Learn. 54, 1, 45--66. Google ScholarDigital Library
Tukey, J. W. 1977. Exploratory Data Analysis. Addison-Wesley.Google Scholar
Williams, G., Baxter, R., He, H., Hawkins, S., and Gu, L. 2002. A comparative study of rnn for outlier detection in data mining. In Proceedings of the IEEE International Conference on Data Mining (ICDM’02). IEEE Computer Society Press, 709--712. Google ScholarDigital Library
Wu, M. and Jermaine, C. 2006. Outlier detection by sampling with accuracy guarantees. In Proceedings of the 12th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM, 767--772. Google ScholarDigital Library
Yamanishi, K., Takeuchi, J.-I., Williams, G., and Milne, P. 2000. On-line unsupervised outlier detection using finite mixtures with discounting learning algorithms. In Proceedings of the 6th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM, 320--324. Google ScholarDigital Library
Yu, X., Tang, L. A., and Han, J. 2009. Filtering and refinement: A two-stage approach for efficient and effective anomaly detection. In Proceedings of the 9th IEEE International Conference on Data Mining (ICDM). IEEE Computer Society Press, 617--626. Google ScholarDigital Library

Index Terms

Isolation-Based Anomaly Detection
1. Computing methodologies
  1. Machine learning
2. Information systems
  1. Information systems applications
    1. Data mining

Recommendations

Fuzzy Isolation Forest for Anomaly Detection
Abstract
Anomaly detection is nowadays a key data mining task. Anomaly detection methods generally look for patterns of ”normal” profile and then identify data points that do not match that profile. One outstanding method, Isolation Forest, showed high ...
Read More
Sparse random projection isolation forest for outlier detection
Highlights
- We analyzed the isolation-forest-based methods’ problem of lacking efficacy in selecting suitable hyperplanes to split data.
Graphical abstract

Display Omitted

Abstract
Isolation Forest has a low computational complexity, hence has been widely applied to detect outliers in large-scale data. However, it suffers from the artifacts caused by the hyperplanes chosen, thereby failing to detect outliers in ...
Read More
Isolation Forest
ICDM '08: Proceedings of the 2008 Eighth IEEE International Conference on Data Mining

Most existing model-based approaches to anomaly detection construct a profile of normal instances, then identify instances that do not conform to the normal profile as anomalies. This paper proposes a fundamentally different model-based method that ...
Read More

Comments

Login options

Check if you have access through your login credentials or your institution to get full access on this article.

Full Access

Get this Article

Published in

ACM Transactions on Knowledge Discovery from Data Volume 6, Issue 1
March 2012
137 pages
ISSN:1556-4681
EISSN:1556-472X
DOI:10.1145/2133360
Issue’s Table of Contents

Copyright © 2012 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]
Sponsors
In-Cooperation
Publisher
Association for Computing Machinery
New York, NY, United States
Publication History
- Published: 1 March 2012
- Accepted: 1 February 2011
- Revised: 1 August 2010
- Received: 1 March 2010
Published in tkdd Volume 6, Issue 1

Permissions
Request permissions about this article.
Request Permissions

Check for updates
Author Tags
Anomaly detection
binary tree
ensemble methods
isolation
isolation forest
outlier detection
random tree ensemble
Qualifiers
- research-article
- Research
- Refereed
Conference
Funding Sources
Other Metrics
View Article Metrics

Article Metrics
- 1,123
  Total Citations
  View Citations
- 7,549
  Total Downloads
- Downloads (Last 12 months)1,127
- Downloads (Last 6 weeks)157
Other Metrics
View Author Metrics
Cited By
View all

PDF Format

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Isolation-Based Anomaly Detection

ACM Transactions on Knowledge Discovery from Data

Abstract

References

Cited By

Index Terms

Recommendations

Fuzzy Isolation Forest for Anomaly Detection

Sparse random projection isolation forest for outlier detection

Isolation Forest

Comments

Login options

Full Access

Published in

Sponsors

In-Cooperation

Publisher

Publication History

Permissions

Check for updates

Author Tags

Qualifiers

Conference

Funding Sources

Other Metrics

Article Metrics

Other Metrics

Cited By

PDF Format

eReader

Digital Edition

Caption

Isolation-Based Anomaly Detection

ACM Transactions on Knowledge Discovery from Data

Abstract

References

Cited By

Index Terms

Recommendations

Fuzzy Isolation Forest for Anomaly Detection

Sparse random projection isolation forest for outlier detection

Isolation Forest

Comments

Login options

Full Access

Published in

Sponsors

In-Cooperation

Publisher

Publication History

Permissions

Check for updates

Author Tags

Qualifiers

Conference

Funding Sources

Article Metrics

Other Metrics

PDF Format

eReader

Digital Edition

Share this Publication link

Share on Social Media