ABSTRACT
This paper presents the techniques employed in our team's submissions to the 2015 Emotion Recognition in the Wild contest, for the sub-challenge of Static Facial Expression Recognition in the Wild. The objective of this sub-challenge is to classify the emotions expressed by the primary human subject in static images extracted from movies. We follow a transfer learning approach for deep Convolutional Neural Network (CNN) architectures. Starting from a network pre-trained on the generic ImageNet dataset, we perform supervised fine-tuning in a two-stage process: first on datasets relevant to facial expressions, then on the contest's dataset. Experimental results show that this cascading fine-tuning approach achieves better results than a single-stage fine-tuning on the combined datasets. Our best submission achieved an overall accuracy of 48.5% on the validation set and 55.6% on the test set, which compares favorably to the respective 35.96% and 39.13% of the challenge baseline.
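The cascading fine-tuning described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: it uses a tiny stand-in CNN instead of an actual ImageNet-pretrained network, synthetic tensors in place of the auxiliary facial-expression data and the contest's training split, and illustrative hyperparameters.

```python
import torch
import torch.nn as nn

NUM_EMOTIONS = 7  # basic-emotion classes, as in the static sub-challenge

def make_backbone():
    # Stand-in for a network "pre-trained on ImageNet".
    return nn.Sequential(
        nn.Conv2d(3, 8, 3, padding=1), nn.ReLU(),
        nn.AdaptiveAvgPool2d(1), nn.Flatten(),
    )

def fine_tune(model, data, epochs=1, lr=1e-3):
    # Supervised fine-tuning: a standard cross-entropy training loop.
    opt = torch.optim.SGD(model.parameters(), lr=lr)
    loss_fn = nn.CrossEntropyLoss()
    for _ in range(epochs):
        for x, y in data:
            opt.zero_grad()
            loss_fn(model(x), y).backward()
            opt.step()
    return model

# Pretrained backbone plus a fresh classification head for the emotion classes.
model = nn.Sequential(make_backbone(), nn.Linear(8, NUM_EMOTIONS))

# Synthetic stand-ins for (1) an auxiliary facial-expression dataset and
# (2) the contest's own training split.
aux_data = [(torch.randn(4, 3, 32, 32), torch.randint(0, NUM_EMOTIONS, (4,)))]
contest_data = [(torch.randn(4, 3, 32, 32), torch.randint(0, NUM_EMOTIONS, (4,)))]

# Stage 1: fine-tune on the expression-relevant auxiliary dataset.
fine_tune(model, aux_data)
# Stage 2: fine-tune the resulting network on the contest dataset.
fine_tune(model, contest_data)

logits = model(torch.randn(1, 3, 32, 32))
print(tuple(logits.shape))  # (1, 7): one score per emotion class
```

The single-stage alternative the paper compares against would instead call `fine_tune` once on the concatenation of both datasets; the abstract reports that the staged variant performs better.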
Index Terms
- Deep Learning for Emotion Recognition on Small Datasets using Transfer Learning