Speech Features Analysis for Tone Language Speaker Discrimination Systems

Edoho, Mercy; Ekpenyong, Moses; Inyang, Udoinyang

doi:10.1007/978-3-319-77028-4_57

Part of the book series: Advances in Intelligent Systems and Computing (AISC, volume 738)

Abstract

This paper proposes a speech pattern analysis framework for tone language speaker discrimination systems. We hypothesize that speech feature variability is an efficient means of discriminating speakers. To test this, we exploit prosody-related acoustic features (pitch, intensity and glottal pulse) of corpus recordings obtained from male and female speakers in four age categories: children (0–15), youths (16–30), adults (31–50) and seniors (above 50). The recordings were captured under suboptimal conditions. The speaker dataset was split into training, validation and test sets in the ratio 70%, 15% and 15%, respectively. A 41 × 14 self-organizing map (SOM) architecture was then used to model the speech features and thereby determine the relationship between the speech features, segments and patterns. The speech pattern analysis indicated wider F0 (fundamental frequency) variability among child speakers than among other speakers; this gap closes as speakers age. Further, intensity variability was similar across all speaker categories, while glottal pulse varied significantly among the different speaker categories. SOM feature visualization confirmed high inter-speaker variability (between speakers) and low intra-speaker variability (within speakers).
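
The data split and the map size described in the abstract can be illustrated with a short sketch. This is not the authors' implementation: the abstract does not state the training procedure, so the code below assumes the classic online SOM update with a Gaussian neighbourhood and linearly decaying learning rate and radius. The feature matrix is random placeholder data standing in for per-recording pitch, intensity and glottal pulse values, and the helper names (`split_indices`, `train_som`, `best_matching_unit`) and all hyperparameters are hypothetical.

```python
import numpy as np


def split_indices(n, rng, ratios=(0.70, 0.15, 0.15)):
    """Shuffle n sample indices and cut them into train/validation/test parts."""
    order = rng.permutation(n)
    n_train = int(round(ratios[0] * n))
    n_val = int(round(ratios[1] * n))
    return order[:n_train], order[n_train:n_train + n_val], order[n_train + n_val:]


def best_matching_unit(weights, x):
    """Return the (row, col) of the map node whose weight vector is closest to x."""
    dist = np.linalg.norm(weights - x, axis=-1)
    return np.unravel_index(np.argmin(dist), dist.shape)


def train_som(data, rows=41, cols=14, epochs=3, lr0=0.5, seed=0):
    """Train a rows x cols SOM with the online update (assumed, not from the paper)."""
    rng = np.random.default_rng(seed)
    weights = rng.random((rows, cols, data.shape[1]))
    # Grid coordinates of every node, used by the neighbourhood function.
    grid = np.stack(
        np.meshgrid(np.arange(rows), np.arange(cols), indexing="ij"), axis=-1
    )
    sigma0 = max(rows, cols) / 2.0
    n_steps = epochs * len(data)
    step = 0
    for _ in range(epochs):
        for x in data[rng.permutation(len(data))]:
            frac = step / n_steps
            lr = lr0 * (1.0 - frac)                   # decaying learning rate
            sigma = max(sigma0 * (1.0 - frac), 0.5)   # shrinking neighbourhood radius
            bmu = best_matching_unit(weights, x)
            d2 = np.sum((grid - np.array(bmu)) ** 2, axis=-1)
            h = np.exp(-d2 / (2.0 * sigma ** 2))      # Gaussian neighbourhood
            weights += lr * h[..., None] * (x - weights)
            step += 1
    return weights


# Placeholder data: 200 recordings x 3 features (pitch, intensity, glottal pulse).
rng = np.random.default_rng(42)
features = rng.normal(size=(200, 3))
lo, hi = features.min(axis=0), features.max(axis=0)
features = (features - lo) / (hi - lo)                # min-max scale to [0, 1]

train_idx, val_idx, test_idx = split_indices(len(features), rng)
som = train_som(features[train_idx])
bmu = best_matching_unit(som, features[test_idx][0])
print(len(train_idx), len(val_idx), len(test_idx), som.shape)
```

Mapping each speaker's recordings to their best matching units is one way to inspect the variability the abstract reports: recordings from one speaker would be expected to cluster on nearby nodes, while different speakers would occupy separate regions of the map.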

Author information

Authors and Affiliations

Department of Computer Science, University of Uyo, Uyo, Nigeria
Mercy Edoho, Moses Ekpenyong & Udoinyang Inyang

Corresponding author

Correspondence to Moses Ekpenyong.

Editor information

Editors and Affiliations

Department of Electrical & Computer Engineering, University of Nevada, Las Vegas, Las Vegas, Nevada, USA
Shahram Latifi

About this paper

Cite this paper

Edoho, M., Ekpenyong, M., Inyang, U. (2018). Speech Features Analysis for Tone Language Speaker Discrimination Systems. In: Latifi, S. (eds) Information Technology - New Generations. Advances in Intelligent Systems and Computing, vol 738. Springer, Cham. https://doi.org/10.1007/978-3-319-77028-4_57

DOI: https://doi.org/10.1007/978-3-319-77028-4_57
Published: 13 April 2018
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-77027-7
Online ISBN: 978-3-319-77028-4
eBook Packages: Engineering, Engineering (R0)
