Abstract
This paper presents a model of 3D object recognition motivated by the robust properties of the human visual system (HVS). The HVS shows the best efficiency and robustness in object identification tasks. Its robust properties are visual attention, contrast mechanisms, feature binding, multi-resolution processing, size tuning, and part-based representation. In addition, the HVS combines bottom-up and top-down information cooperatively. Based on these facts, we propose a plausible computational model that integrates these properties using a Monte Carlo optimization technique. In this scheme, object recognition is regarded as a parameter optimization problem. The bottom-up process initializes the parameters in a discriminative way; the top-down process then optimizes them in a generative way. Experimental results show that the proposed recognition model is feasible for 3D object identification and pose estimation in visible and infrared band images.
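The division of labor described above (bottom-up initialization, then top-down refinement by Monte Carlo optimization) can be illustrated with a minimal sketch. This is not the authors' model: it assumes a toy 2D point-set "object" with known point correspondences, uses centroid alignment as a stand-in for the discriminative bottom-up step, and uses a plain Metropolis random walk as a stand-in for the Monte Carlo optimizer. All function names and parameter values (`bottom_up_init`, `metropolis_refine`, `step_size`, `temperature`) are hypothetical.

```python
import math
import random


def transform(points, theta, tx, ty):
    """Rotate 2D model points by theta (radians), then translate by (tx, ty)."""
    c, s = math.cos(theta), math.sin(theta)
    return [(c * x - s * y + tx, s * x + c * y + ty) for x, y in points]


def energy(model, observed, pose):
    """Top-down (generative) cost: squared error between the model rendered
    at `pose` and the observed points (correspondences assumed known)."""
    rendered = transform(model, *pose)
    return sum((rx - ox) ** 2 + (ry - oy) ** 2
               for (rx, ry), (ox, oy) in zip(rendered, observed))


def bottom_up_init(model, observed):
    """Assumed bottom-up step: a cheap data-driven guess that aligns the
    centroids and sets the rotation to zero."""
    n = len(model)
    mx = sum(x for x, _ in model) / n
    my = sum(y for _, y in model) / n
    ox = sum(x for x, _ in observed) / n
    oy = sum(y for _, y in observed) / n
    return (0.0, ox - mx, oy - my)


def metropolis_refine(model, observed, pose, steps=5000,
                      step_size=0.05, temperature=0.01, seed=0):
    """Assumed top-down step: Metropolis random walk over (theta, tx, ty).
    Returns the lowest-energy pose visited and its energy."""
    rng = random.Random(seed)
    current, current_e = pose, energy(model, observed, pose)
    best, best_e = current, current_e
    for _ in range(steps):
        proposal = tuple(p + rng.gauss(0.0, step_size) for p in current)
        e = energy(model, observed, proposal)
        # Always accept improvements; accept worse poses with
        # probability exp(-(e - current_e) / temperature).
        if e <= current_e or rng.random() < math.exp((current_e - e) / temperature):
            current, current_e = proposal, e
            if e < best_e:
                best, best_e = proposal, e
    return best, best_e


if __name__ == "__main__":
    model = [(-1.0, -0.5), (1.0, -0.5), (1.0, 0.5), (-1.0, 0.5), (0.0, 1.0)]
    observed = transform(model, 0.4, 2.0, -1.0)  # synthetic, noise-free scene
    init = bottom_up_init(model, observed)
    pose, cost = metropolis_refine(model, observed, init)
    print("initial energy:", energy(model, observed, init))
    print("refined pose:", pose, "energy:", cost)
```

In this sketch the bottom-up guess only places the sampler near a plausible pose; the generative energy then drives the refinement, mirroring the initialize-then-optimize split stated in the abstract.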
Acknowledgements
This research was supported by the Korean Ministry of Science and Technology under the National Research Laboratory Program (Grant number M1-0302-00-0064).
Cite this article
Kim, S., Jang, G. & Kweon, I.S. An effective 3D target recognition model imitating robust methods of the human visual system. Pattern Anal Applic 8, 211–226 (2005). https://doi.org/10.1007/s10044-005-0001-y
DOI: https://doi.org/10.1007/s10044-005-0001-y