Speech Recognition with Weighted Finite-State Transducers

Mohri, Mehryar; Pereira, Fernando; Riley, Michael

doi:10.1007/978-3-540-49127-9_28

Mehryar Mohri Prof.⁴,
Fernando Pereira Ph.D⁵ &
Michael Riley Ph.D⁶

Part of the book series: Springer Handbooks ((SHB))

9046 Accesses
96 Citations

Abstract

This chapter describes a general representation and algorithmic framework for speech recognition based on weighted finite-state transducers. These transducers provide a common and natural representation for major components of speech recognition systems, including hidden Markov models (HMMs), context-dependency models, pronunciation dictionaries, statistical grammars, and word or phone lattices. General algorithms for building and optimizing transducer models are presented, including composition for combining models, weighted determinization and minimization for optimizing time and space requirements, and a weight pushing algorithm for redistributing transition weights optimally for speech recognition. The application of these methods to large-vocabulary recognition tasks is explained in detail, and experimental results are given, in particular for the North American Business News (NAB) task, in which these methods were used to combine HMMs, full cross-word triphones, a lexicon of 40000 words, and a large trigram grammar into a single weighted transducer that is only somewhat larger than the trigram word grammar and that runs NAB in real time on a very simple decoder. Another example demonstrates that the same methods can be used to optimize lattices for second-pass recognition.

This is a preview of subscription content, log in via an institution to check access.

Access this chapter

Log in via an institution

Chapter: USD 29.95; Price excludes VAT (USA)

eBook: USD 579.00; Price excludes VAT (USA)

Hardcover Book: USD 729.00; Price excludes VAT (USA)

Tax calculation will be finalised at checkout

Purchases are for personal use only

Institutional subscriptions

Abbreviations

ASR:: automatic speech recognition
DARPA:: Defense Advanced Research Projects Agency
DFA:: deterministic finite automata
HMM:: hidden Markov models
MLLR:: maximum-likelihood linear regression
NAB:: North American Business News
NFA:: nondeterministic finite automata
WFSA:: weighted finite-state acceptors
WFST:: weighted finite-state transducer

References

F. Pereira, R. Wright: Finite-state approximation of phrase-structure grammars. In: Finite-State Language Processing, ed. by E. Roche, Y. Schabes (MIT Press, Cambridge, MA 1997) pp. 149-173
Google Scholar
M-J Nederhof: Practical experiments with regular approximation of context-free languages, Computational Linguistics, 26(1) (2000)
Google Scholar
M. Mohri, M.-J. Nederhof: Regular approximation of context-free grammars through transformation. In: Robustness in Language and Speech Technology, ed. by J.C. Junqua, G. van Noord (Kluwer Academic, The Netherlands 2001) pp. 153-163
Chapter Google Scholar
M. Mohri, F. Pereira, M. Riley: The design principles of a weighted finite-state transducer library, Theoret. Comput. Sci. 231, 17-32 (2000)
Article MathSciNet MATH Google Scholar
F. Pereira, M. Riley: Finite State Language Processing, chapter Speech Recognition by Composition of Weighted Finite Automata (MIT Press, Cambridge, MA 1997)
Google Scholar
M. Mohri: Finite-state transducers in language and speech processing, Computational Linguistics 23(2) (1997)
Google Scholar
M. Mohri, F. Pereira, Michael Riley: Weighted automata in text and speech processing, ECAI-96 Workshop, Budapest ECAI (1996)
Google Scholar
M. Mohri, M. Riley: Network optimizations for large vocabulary speech recognition, Speech Commun. 25(3) (1998)
Google Scholar
M. Mohri, M. Riley, D. Hindle, A. Ljolje, F. Pereira: full expansion of context-dependent networks in large vocabulary speech recognition, Procs. ICASSP (ICASSP ʼ98), Seattle (1998)
Google Scholar
M. Mohri, M. Riley: Integrated context-dependent networks in very large vocabulary speech recognition, Proc. 6th Europ. Conf. Speech Commun. Technol. (Eurospeech ʼ99), Budapest (1999)
Google Scholar
S. Ortmanns, H. Ney, A. Eiden: Language-model look-ahead for large vocabulary speech recognition, Proc. Int. Conf. Spoken Lang. Proces. (ICSLPʼ96), University of Delaware and Alfred I. duPont Institute (1996) pp. 2095-2098
Google Scholar
A. Salomaa, M. Soittola: Automata-Theoretic Aspects of Formal Power Series (Springer, New York 1978)
Book MATH Google Scholar
J. Berstel, C. Reutenauer: Rational Series and Their Languages (Springer, Heidelberg 1988)
Book MATH Google Scholar
W. Kuich, A. Salomaa: Semirings, Automata, Languages, EATCS Monographs on Theoretical Computer Science, Vol. 5 (Springer, Berlin, Heidelberg 1986)
Google Scholar
C. Allauzen, M. Riley, J. Schalkwyk, W. Skut, M. Mohri: OpenFST - a General and Efficient Weighted Finite-State Transducer Library, Proceedings of the International Conference on Implementation and Application of Automata, Prague (2007)
Google Scholar
J.E. Hopcroft, J.D. Ullman: Introduction to Automata Theory, Languages, and Computation (Addison Wesley, Reading 1979)
MATH Google Scholar
A.V. Aho, R. Sethi, J.D. Ullman: Compilers, Principles, Techniques and Tools (Addison Wesley, Reading 1986)
MATH Google Scholar
A.V. Aho, J.E. Hopcroft, J.D. Ullman: The Design and Analysis of Computer Algorithms (Addison Wesley, Reading 1974)
MATH Google Scholar
D. Revuz: Minimisation of acyclic deterministic automata in linear time, Theoret. Comput. Sci. 92, 181-189 (1992)
Article MathSciNet MATH Google Scholar
M. Mohri: Semiring frameworks and algorithms for shortest-distance problems, J. Automata Lang. Combinat. 7(3), 321-350 (2002)
MathSciNet MATH Google Scholar
M. Mohri, M. Riley: A weight pushing algorithm for large vocabulary speech recognition, Proc. 7th Europ. Conf. Speech Commun. Technol. (Eurospeech ʼ01), Aalborg (2001)
Google Scholar
D.J. Lehmann: Algebraic structures for transitive closures, Theoret. Comput. Sci. 4, 59-76 (1977)
Article MathSciNet MATH Google Scholar
S. Eilenberg: Automata, Languages and Machines (Academic, San Diego 1974-1976) vol. A-B
Google Scholar
J. Berstel: Transductions and Context-Free Languages (Teubner Studienbucher, Stuttgart 1979)
Book MATH Google Scholar
C. Allauzen, M. Mohri: Efficient algorithms for testing the twins property, J. Automata Languages Combinat. 8(2), 117-144 (2003)
MathSciNet MATH Google Scholar
C. Allauzen, M. Mohri: An optimal pre-determinization algorithm for weighted transducers, Theor. Comput. Sci. 328(1-2), 3-18 (2004)
Article MathSciNet MATH Google Scholar
M. Mohri: Statistical natural language processing. In Applied Combinatorics on Words, ed. by M. Lothaire (Cambridge Univ. Press Cambridge 2005)
Google Scholar
S.M. Katz: Estimation of probabilities from sparse data for the language model component of a speech recogniser, IEEE Trans Acoust. Speech Signal Process. 35(3), 400-401 (1987)
Article Google Scholar
C. Allauzen, M. Mohri, B. Roark, M. Riley: A generalized construction of integrated speech recognition transducers, Proc. ICASSP (ICASSP 2004), Montréal (2004)
Google Scholar
M. Riley, F. Pereira, M. Mohri: Transducer composition for context-dependent network expansion, Proc. Eurospeechʼ97, Rhodes (1997)
Google Scholar
M. Mohri, F. C.N. Pereira: Dynamic compilation of weighted context-free grammars, 36th Annual Meeting ACL and 17th Int. Conf. Computat. Linguist., vol. 2 (1998) p. 891-897
Google Scholar
J.W. Carlyle, A. Paz: Realizations by stochastic finite automaton, J. Comput. Syst. Sci. 5, 26-40 (1971)
Article MATH Google Scholar
K. Seymore, R. Rosenfeld: Scalable backoff language models, Proc. ICSLP, Philadelphia (1996)
Google Scholar
S. Kanthak, H. Ney, M. Riley, M. Mohri: A comparison of two LVR search optimization techniques, Proc. Int. Conf. Spoken Lang. Proces. 2002 (ICSLP ʼ02), Denver (2002)
Google Scholar
M. Saraclar, M. Riley, E. Bocchieri, V. Goffin: Towards automatic closed captioning : Low latency real time broadcast news transcription, In Proceedings of the International Conference on Spoken Language Processing (ICSLPʼ02) (2002)
Google Scholar
A. Ljolje, F. Pereira, M. Riley: Efficient general lattice generation and rescoring, Proc. Europ. Conf. Speech Commun. Technol. (Eurospeech ʼ99), Budapest (1999)
Google Scholar
C. Leggetter, P. Woodland: Maximum likelihood linear regession for speaker adaptation of continuous density HMMs, Comput. Speech Lang. 9(2), 171-186 (1995)
Article Google Scholar
M. Riley, W. Byrne, M. Finke, S. Khudanpur, A. Ljolje, J. McDonough, H. Nock, M. Saraclar, C. Wooters, G. Zavaliagkos: Stochastic pronunciation modelling form hand-labelled phonetic corpora, Speech Commun. 29, 209-224 (1999)
Article Google Scholar
R. Sproat: Multilingual text analysis for text-to-speech synthesis, J. Nat. Lang. Eng. 2(4), 369-380 (1997)
Article Google Scholar
C. Allauzen, M. Mohri, M. Riley: Statistical modeling for unit selection in speech synthesis, In 42nd Meeting of the Association for Computational Linguistics (ACL 2004), Proceedings of the Conference, Barcelona (2004)
Google Scholar
R. M. Kaplan, M. Kay: Regular models of phonological rule systems, Computational Linguistics 20(3) (1994)
Google Scholar
L. Karttunen: The replace operator, In 33rd Meeting of the Association for Computational Linguistics (ACL 95), Proceedings of the Conference, MIT, Cambridge ACL (1995)
Google Scholar
M. Crochemore, W. Rytter: Text Algorithms (Oxford Univ. Press, Oxford 1994)
MATH Google Scholar
K.I.I. Culik, J. Kari: Digital images and formal languages. In: Handbook of Formal Languages, ed. by G. Rozenberg, A Salomaa (Springer, Berlin, Heidelberg 1997) pp. 599-616
Chapter Google Scholar

Download references

Author information

Authors and Affiliations

Courant Institute of Mathematical Sciences, 251 Mercer Street, 10012, New York, NY, USA
Mehryar Mohri Prof.
Department of Computer and Information Science, University of Pennsylvania, 305 Levine Hall, 3330 Walnut Street, 19104, Philadelphia, PA, USA
Fernando Pereira Ph.D
Google, Inc., Research, 111 Eighth AV, 10011, New York, NY, USA
Michael Riley Ph.D

Authors

Mehryar Mohri Prof.
View author publications
You can also search for this author in PubMed Google Scholar
Fernando Pereira Ph.D
View author publications
You can also search for this author in PubMed Google Scholar
Michael Riley Ph.D
View author publications
You can also search for this author in PubMed Google Scholar

Corresponding authors

Correspondence to Mehryar Mohri Prof. , Fernando Pereira Ph.D or Michael Riley Ph.D .

Editor information

Editors and Affiliations

INRS-EMT, University of Quebec, 800 de la Gauchetiere Ouest, H5A 1K6, Montreal, Quebec, Canada
Jacob Benesty Dr.
Avayalabs Research, 233 Mount Airy Road, 07920, Basking Ridge, NJ, USA
M. Mohan Sondhi Ph.D.
Alcatel-Lucent, Bell Laboratories, 600 Mountain Avenue, 07974, Murray Hill, NJ, USA
Yiteng Arden Huang Dr.

Rights and permissions

Reprints and permissions

Copyright information

About this chapter

Cite this chapter

Mohri, M., Pereira, F., Riley, M. (2008). Speech Recognition with Weighted Finite-State Transducers. In: Benesty, J., Sondhi, M.M., Huang, Y.A. (eds) Springer Handbook of Speech Processing. Springer Handbooks. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-49127-9_28

Download citation

DOI: https://doi.org/10.1007/978-3-540-49127-9_28
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-49125-5
Online ISBN: 978-3-540-49127-9
eBook Packages: EngineeringEngineering (R0)

Publish with us

Policies and ethics