Abstract
This chapter describes a general representation and algorithmic framework for speech recognition based on weighted finite-state transducers. These transducers provide a common and natural representation for major components of speech recognition systems, including hidden Markov models (HMMs), context-dependency models, pronunciation dictionaries, statistical grammars, and word or phone lattices. General algorithms for building and optimizing transducer models are presented, including composition for combining models, weighted determinization and minimization for optimizing time and space requirements, and a weight pushing algorithm for redistributing transition weights optimally for speech recognition. The application of these methods to large-vocabulary recognition tasks is explained in detail, and experimental results are given, in particular for the North American Business News (NAB) task, in which these methods were used to combine HMMs, full cross-word triphones, a lexicon of 40000 words, and a large trigram grammar into a single weighted transducer that is only somewhat larger than the trigram word grammar and that runs NAB in real time on a very simple decoder. Another example demonstrates that the same methods can be used to optimize lattices for second-pass recognition.
Access this chapter
Tax calculation will be finalised at checkout
Purchases are for personal use only
Abbreviations
- ASR:
-
automatic speech recognition
- DARPA:
-
Defense Advanced Research Projects Agency
- DFA:
-
deterministic finite automata
- HMM:
-
hidden Markov models
- MLLR:
-
maximum-likelihood linear regression
- NAB:
-
North American Business News
- NFA:
-
nondeterministic finite automata
- WFSA:
-
weighted finite-state acceptors
- WFST:
-
weighted finite-state transducer
References
F. Pereira, R. Wright: Finite-state approximation of phrase-structure grammars. In: Finite-State Language Processing, ed. by E. Roche, Y. Schabes (MIT Press, Cambridge, MA 1997) pp. 149-173
M-J Nederhof: Practical experiments with regular approximation of context-free languages, Computational Linguistics, 26(1) (2000)
M. Mohri, M.-J. Nederhof: Regular approximation of context-free grammars through transformation. In: Robustness in Language and Speech Technology, ed. by J.C. Junqua, G. van Noord (Kluwer Academic, The Netherlands 2001) pp. 153-163
M. Mohri, F. Pereira, M. Riley: The design principles of a weighted finite-state transducer library, Theoret. Comput. Sci. 231, 17-32 (2000)
F. Pereira, M. Riley: Finite State Language Processing, chapter Speech Recognition by Composition of Weighted Finite Automata (MIT Press, Cambridge, MA 1997)
M. Mohri: Finite-state transducers in language and speech processing, Computational Linguistics 23(2) (1997)
M. Mohri, F. Pereira, Michael Riley: Weighted automata in text and speech processing, ECAI-96 Workshop, Budapest ECAI (1996)
M. Mohri, M. Riley: Network optimizations for large vocabulary speech recognition, Speech Commun. 25(3) (1998)
M. Mohri, M. Riley, D. Hindle, A. Ljolje, F. Pereira: full expansion of context-dependent networks in large vocabulary speech recognition, Procs. ICASSP (ICASSP ʼ98), Seattle (1998)
M. Mohri, M. Riley: Integrated context-dependent networks in very large vocabulary speech recognition, Proc. 6th Europ. Conf. Speech Commun. Technol. (Eurospeech ʼ99), Budapest (1999)
S. Ortmanns, H. Ney, A. Eiden: Language-model look-ahead for large vocabulary speech recognition, Proc. Int. Conf. Spoken Lang. Proces. (ICSLPʼ96), University of Delaware and Alfred I. duPont Institute (1996) pp. 2095-2098
A. Salomaa, M. Soittola: Automata-Theoretic Aspects of Formal Power Series (Springer, New York 1978)
J. Berstel, C. Reutenauer: Rational Series and Their Languages (Springer, Heidelberg 1988)
W. Kuich, A. Salomaa: Semirings, Automata, Languages, EATCS Monographs on Theoretical Computer Science, Vol. 5 (Springer, Berlin, Heidelberg 1986)
C. Allauzen, M. Riley, J. Schalkwyk, W. Skut, M. Mohri: OpenFST - a General and Efficient Weighted Finite-State Transducer Library, Proceedings of the International Conference on Implementation and Application of Automata, Prague (2007)
J.E. Hopcroft, J.D. Ullman: Introduction to Automata Theory, Languages, and Computation (Addison Wesley, Reading 1979)
A.V. Aho, R. Sethi, J.D. Ullman: Compilers, Principles, Techniques and Tools (Addison Wesley, Reading 1986)
A.V. Aho, J.E. Hopcroft, J.D. Ullman: The Design and Analysis of Computer Algorithms (Addison Wesley, Reading 1974)
D. Revuz: Minimisation of acyclic deterministic automata in linear time, Theoret. Comput. Sci. 92, 181-189 (1992)
M. Mohri: Semiring frameworks and algorithms for shortest-distance problems, J. Automata Lang. Combinat. 7(3), 321-350 (2002)
M. Mohri, M. Riley: A weight pushing algorithm for large vocabulary speech recognition, Proc. 7th Europ. Conf. Speech Commun. Technol. (Eurospeech ʼ01), Aalborg (2001)
D.J. Lehmann: Algebraic structures for transitive closures, Theoret. Comput. Sci. 4, 59-76 (1977)
S. Eilenberg: Automata, Languages and Machines (Academic, San Diego 1974-1976) vol. A-B
J. Berstel: Transductions and Context-Free Languages (Teubner Studienbucher, Stuttgart 1979)
C. Allauzen, M. Mohri: Efficient algorithms for testing the twins property, J. Automata Languages Combinat. 8(2), 117-144 (2003)
C. Allauzen, M. Mohri: An optimal pre-determinization algorithm for weighted transducers, Theor. Comput. Sci. 328(1-2), 3-18 (2004)
M. Mohri: Statistical natural language processing. In Applied Combinatorics on Words, ed. by M. Lothaire (Cambridge Univ. Press Cambridge 2005)
S.M. Katz: Estimation of probabilities from sparse data for the language model component of a speech recogniser, IEEE Trans Acoust. Speech Signal Process. 35(3), 400-401 (1987)
C. Allauzen, M. Mohri, B. Roark, M. Riley: A generalized construction of integrated speech recognition transducers, Proc. ICASSP (ICASSP 2004), Montréal (2004)
M. Riley, F. Pereira, M. Mohri: Transducer composition for context-dependent network expansion, Proc. Eurospeechʼ97, Rhodes (1997)
M. Mohri, F. C.N. Pereira: Dynamic compilation of weighted context-free grammars, 36th Annual Meeting ACL and 17th Int. Conf. Computat. Linguist., vol. 2 (1998) p. 891-897
J.W. Carlyle, A. Paz: Realizations by stochastic finite automaton, J. Comput. Syst. Sci. 5, 26-40 (1971)
K. Seymore, R. Rosenfeld: Scalable backoff language models, Proc. ICSLP, Philadelphia (1996)
S. Kanthak, H. Ney, M. Riley, M. Mohri: A comparison of two LVR search optimization techniques, Proc. Int. Conf. Spoken Lang. Proces. 2002 (ICSLP ʼ02), Denver (2002)
M. Saraclar, M. Riley, E. Bocchieri, V. Goffin: Towards automatic closed captioning : Low latency real time broadcast news transcription, In Proceedings of the International Conference on Spoken Language Processing (ICSLPʼ02) (2002)
A. Ljolje, F. Pereira, M. Riley: Efficient general lattice generation and rescoring, Proc. Europ. Conf. Speech Commun. Technol. (Eurospeech ʼ99), Budapest (1999)
C. Leggetter, P. Woodland: Maximum likelihood linear regession for speaker adaptation of continuous density HMMs, Comput. Speech Lang. 9(2), 171-186 (1995)
M. Riley, W. Byrne, M. Finke, S. Khudanpur, A. Ljolje, J. McDonough, H. Nock, M. Saraclar, C. Wooters, G. Zavaliagkos: Stochastic pronunciation modelling form hand-labelled phonetic corpora, Speech Commun. 29, 209-224 (1999)
R. Sproat: Multilingual text analysis for text-to-speech synthesis, J. Nat. Lang. Eng. 2(4), 369-380 (1997)
C. Allauzen, M. Mohri, M. Riley: Statistical modeling for unit selection in speech synthesis, In 42nd Meeting of the Association for Computational Linguistics (ACL 2004), Proceedings of the Conference, Barcelona (2004)
R. M. Kaplan, M. Kay: Regular models of phonological rule systems, Computational Linguistics 20(3) (1994)
L. Karttunen: The replace operator, In 33rd Meeting of the Association for Computational Linguistics (ACL 95), Proceedings of the Conference, MIT, Cambridge ACL (1995)
M. Crochemore, W. Rytter: Text Algorithms (Oxford Univ. Press, Oxford 1994)
K.I.I. Culik, J. Kari: Digital images and formal languages. In: Handbook of Formal Languages, ed. by G. Rozenberg, A Salomaa (Springer, Berlin, Heidelberg 1997) pp. 599-616
Author information
Authors and Affiliations
Corresponding authors
Editor information
Editors and Affiliations
Rights and permissions
Copyright information
© 2008 Springer-Verlag Berlin Heidelberg
About this chapter
Cite this chapter
Mohri, M., Pereira, F., Riley, M. (2008). Speech Recognition with Weighted Finite-State Transducers. In: Benesty, J., Sondhi, M.M., Huang, Y.A. (eds) Springer Handbook of Speech Processing. Springer Handbooks. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-49127-9_28
Download citation
DOI: https://doi.org/10.1007/978-3-540-49127-9_28
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-49125-5
Online ISBN: 978-3-540-49127-9
eBook Packages: EngineeringEngineering (R0)