ABSTRACT
This paper presents the techniques employed in our team's submissions to the 2015 Emotion Recognition in the Wild contest, for the sub-challenge of Static Facial Expression Recognition in the Wild. The objective of this sub-challenge is to classify the emotions expressed by the primary human subject in static images extracted from movies. We follow a transfer learning approach for deep Convolutional Neural Network (CNN) architectures. Starting from a network pre-trained on the generic ImageNet dataset, we perform supervised fine-tuning in a two-stage process: first on datasets relevant to facial expressions, then on the contest's dataset. Experimental results show that this cascading fine-tuning approach achieves better results than a single-stage fine-tuning on the combined datasets. Our best submission achieved an overall accuracy of 48.5% on the validation set and 55.6% on the test set, which compares favorably to the respective 35.96% and 39.13% of the challenge baseline.
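The cascading fine-tuning described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: it uses a tiny stand-in CNN instead of an actual ImageNet-pretrained network, synthetic tensors in place of the auxiliary facial-expression data and the contest's training split, and illustrative hyperparameters.

```python
import torch
import torch.nn as nn

NUM_EMOTIONS = 7  # basic-emotion classes, as in the static sub-challenge

def make_backbone():
    # Stand-in for a network "pre-trained on ImageNet".
    return nn.Sequential(
        nn.Conv2d(3, 8, 3, padding=1), nn.ReLU(),
        nn.AdaptiveAvgPool2d(1), nn.Flatten(),
    )

def fine_tune(model, data, epochs=1, lr=1e-3):
    # Supervised fine-tuning: a standard cross-entropy training loop.
    opt = torch.optim.SGD(model.parameters(), lr=lr)
    loss_fn = nn.CrossEntropyLoss()
    for _ in range(epochs):
        for x, y in data:
            opt.zero_grad()
            loss_fn(model(x), y).backward()
            opt.step()
    return model

# Pretrained backbone plus a fresh classification head for the emotion classes.
model = nn.Sequential(make_backbone(), nn.Linear(8, NUM_EMOTIONS))

# Synthetic stand-ins for (1) an auxiliary facial-expression dataset and
# (2) the contest's own training split.
aux_data = [(torch.randn(4, 3, 32, 32), torch.randint(0, NUM_EMOTIONS, (4,)))]
contest_data = [(torch.randn(4, 3, 32, 32), torch.randint(0, NUM_EMOTIONS, (4,)))]

# Stage 1: fine-tune on the expression-relevant auxiliary dataset.
fine_tune(model, aux_data)
# Stage 2: fine-tune the resulting network on the contest dataset.
fine_tune(model, contest_data)

logits = model(torch.randn(1, 3, 32, 32))
print(tuple(logits.shape))  # (1, 7): one score per emotion class
```

The single-stage alternative the paper compares against would instead call `fine_tune` once on the concatenation of both datasets; the abstract reports that the staged variant performs better.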
Index Terms
- Deep Learning for Emotion Recognition on Small Datasets using Transfer Learning