Skip to main content

Speech Recognition with Weighted Finite-State Transducers

  • Chapter
Book cover Springer Handbook of Speech Processing

Part of the book series: Springer Handbooks ((SHB))

Abstract

This chapter describes a general representation and algorithmic framework for speech recognition based on weighted finite-state transducers. These transducers provide a common and natural representation for major components of speech recognition systems, including hidden Markov models (HMMs), context-dependency models, pronunciation dictionaries, statistical grammars, and word or phone lattices. General algorithms for building and optimizing transducer models are presented, including composition for combining models, weighted determinization and minimization for optimizing time and space requirements, and a weight pushing algorithm for redistributing transition weights optimally for speech recognition. The application of these methods to large-vocabulary recognition tasks is explained in detail, and experimental results are given, in particular for the North American Business News (NAB) task, in which these methods were used to combine HMMs, full cross-word triphones, a lexicon of 40000 words, and a large trigram grammar into a single weighted transducer that is only somewhat larger than the trigram word grammar and that runs NAB in real time on a very simple decoder. Another example demonstrates that the same methods can be used to optimize lattices for second-pass recognition.

This is a preview of subscription content, log in via an institution to check access.

Access this chapter

Chapter
USD 29.95
Price excludes VAT (USA)
  • Available as PDF
  • Read on any device
  • Instant download
  • Own it forever
eBook
USD 579.00
Price excludes VAT (USA)
  • Available as EPUB and PDF
  • Read on any device
  • Instant download
  • Own it forever
Hardcover Book
USD 729.00
Price excludes VAT (USA)
  • Durable hardcover edition
  • Dispatched in 3 to 5 business days
  • Free shipping worldwide - see info

Tax calculation will be finalised at checkout

Purchases are for personal use only

Institutional subscriptions

Abbreviations

ASR:

automatic speech recognition

DARPA:

Defense Advanced Research Projects Agency

DFA:

deterministic finite automata

HMM:

hidden Markov models

MLLR:

maximum-likelihood linear regression

NAB:

North American Business News

NFA:

nondeterministic finite automata

WFSA:

weighted finite-state acceptors

WFST:

weighted finite-state transducer

References

  1. F. Pereira, R. Wright: Finite-state approximation of phrase-structure grammars. In: Finite-State Language Processing, ed. by E. Roche, Y. Schabes (MIT Press, Cambridge, MA 1997) pp. 149-173

    Google Scholar 

  2. M-J Nederhof: Practical experiments with regular approximation of context-free languages, Computational Linguistics, 26(1) (2000)

    Google Scholar 

  3. M. Mohri, M.-J. Nederhof: Regular approximation of context-free grammars through transformation. In: Robustness in Language and Speech Technology, ed. by J.C. Junqua, G. van Noord (Kluwer Academic, The Netherlands 2001) pp. 153-163

    Chapter  Google Scholar 

  4. M. Mohri, F. Pereira, M. Riley: The design principles of a weighted finite-state transducer library, Theoret. Comput. Sci. 231, 17-32 (2000)

    Article  MathSciNet  MATH  Google Scholar 

  5. F. Pereira, M. Riley: Finite State Language Processing, chapter Speech Recognition by Composition of Weighted Finite Automata (MIT Press, Cambridge, MA 1997)

    Google Scholar 

  6. M. Mohri: Finite-state transducers in language and speech processing, Computational Linguistics 23(2) (1997)

    Google Scholar 

  7. M. Mohri, F. Pereira, Michael Riley: Weighted automata in text and speech processing, ECAI-96 Workshop, Budapest ECAI (1996)

    Google Scholar 

  8. M. Mohri, M. Riley: Network optimizations for large vocabulary speech recognition, Speech Commun. 25(3) (1998)

    Google Scholar 

  9. M. Mohri, M. Riley, D. Hindle, A. Ljolje, F. Pereira: full expansion of context-dependent networks in large vocabulary speech recognition, Procs. ICASSP (ICASSP ʼ98), Seattle (1998)

    Google Scholar 

  10. M. Mohri, M. Riley: Integrated context-dependent networks in very large vocabulary speech recognition, Proc. 6th Europ. Conf. Speech Commun. Technol. (Eurospeech ʼ99), Budapest (1999)

    Google Scholar 

  11. S. Ortmanns, H. Ney, A. Eiden: Language-model look-ahead for large vocabulary speech recognition, Proc. Int. Conf. Spoken Lang. Proces. (ICSLPʼ96), University of Delaware and Alfred I. duPont Institute (1996) pp. 2095-2098

    Google Scholar 

  12. A. Salomaa, M. Soittola: Automata-Theoretic Aspects of Formal Power Series (Springer, New York 1978)

    Book  MATH  Google Scholar 

  13. J. Berstel, C. Reutenauer: Rational Series and Their Languages (Springer, Heidelberg 1988)

    Book  MATH  Google Scholar 

  14. W. Kuich, A. Salomaa: Semirings, Automata, Languages, EATCS Monographs on Theoretical Computer Science, Vol. 5 (Springer, Berlin, Heidelberg 1986)

    Google Scholar 

  15. C. Allauzen, M. Riley, J. Schalkwyk, W. Skut, M. Mohri: OpenFST - a General and Efficient Weighted Finite-State Transducer Library, Proceedings of the International Conference on Implementation and Application of Automata, Prague (2007)

    Google Scholar 

  16. J.E. Hopcroft, J.D. Ullman: Introduction to Automata Theory, Languages, and Computation (Addison Wesley, Reading 1979)

    MATH  Google Scholar 

  17. A.V. Aho, R. Sethi, J.D. Ullman: Compilers, Principles, Techniques and Tools (Addison Wesley, Reading 1986)

    MATH  Google Scholar 

  18. A.V. Aho, J.E. Hopcroft, J.D. Ullman: The Design and Analysis of Computer Algorithms (Addison Wesley, Reading 1974)

    MATH  Google Scholar 

  19. D. Revuz: Minimisation of acyclic deterministic automata in linear time, Theoret. Comput. Sci. 92, 181-189 (1992)

    Article  MathSciNet  MATH  Google Scholar 

  20. M. Mohri: Semiring frameworks and algorithms for shortest-distance problems, J. Automata Lang. Combinat. 7(3), 321-350 (2002)

    MathSciNet  MATH  Google Scholar 

  21. M. Mohri, M. Riley: A weight pushing algorithm for large vocabulary speech recognition, Proc. 7th Europ. Conf. Speech Commun. Technol. (Eurospeech ʼ01), Aalborg (2001)

    Google Scholar 

  22. D.J. Lehmann: Algebraic structures for transitive closures, Theoret. Comput. Sci. 4, 59-76 (1977)

    Article  MathSciNet  MATH  Google Scholar 

  23. S. Eilenberg: Automata, Languages and Machines (Academic, San Diego 1974-1976) vol. A-B

    Google Scholar 

  24. J. Berstel: Transductions and Context-Free Languages (Teubner Studienbucher, Stuttgart 1979)

    Book  MATH  Google Scholar 

  25. C. Allauzen, M. Mohri: Efficient algorithms for testing the twins property, J. Automata Languages Combinat. 8(2), 117-144 (2003)

    MathSciNet  MATH  Google Scholar 

  26. C. Allauzen, M. Mohri: An optimal pre-determinization algorithm for weighted transducers, Theor. Comput. Sci. 328(1-2), 3-18 (2004)

    Article  MathSciNet  MATH  Google Scholar 

  27. M. Mohri: Statistical natural language processing. In Applied Combinatorics on Words, ed. by M. Lothaire (Cambridge Univ. Press Cambridge 2005)

    Google Scholar 

  28. S.M. Katz: Estimation of probabilities from sparse data for the language model component of a speech recogniser, IEEE Trans Acoust. Speech Signal Process. 35(3), 400-401 (1987)

    Article  Google Scholar 

  29. C. Allauzen, M. Mohri, B. Roark, M. Riley: A generalized construction of integrated speech recognition transducers, Proc. ICASSP (ICASSP 2004), Montréal (2004)

    Google Scholar 

  30. M. Riley, F. Pereira, M. Mohri: Transducer composition for context-dependent network expansion, Proc. Eurospeechʼ97, Rhodes (1997)

    Google Scholar 

  31. M. Mohri, F. C.N. Pereira: Dynamic compilation of weighted context-free grammars, 36th Annual Meeting ACL and 17th Int. Conf. Computat. Linguist., vol. 2 (1998) p. 891-897

    Google Scholar 

  32. J.W. Carlyle, A. Paz: Realizations by stochastic finite automaton, J. Comput. Syst. Sci. 5, 26-40 (1971)

    Article  MATH  Google Scholar 

  33. K. Seymore, R. Rosenfeld: Scalable backoff language models, Proc. ICSLP, Philadelphia (1996)

    Google Scholar 

  34. S. Kanthak, H. Ney, M. Riley, M. Mohri: A comparison of two LVR search optimization techniques, Proc. Int. Conf. Spoken Lang. Proces. 2002 (ICSLP ʼ02), Denver (2002)

    Google Scholar 

  35. M. Saraclar, M. Riley, E. Bocchieri, V. Goffin: Towards automatic closed captioning : Low latency real time broadcast news transcription, In Proceedings of the International Conference on Spoken Language Processing (ICSLPʼ02) (2002)

    Google Scholar 

  36. A. Ljolje, F. Pereira, M. Riley: Efficient general lattice generation and rescoring, Proc. Europ. Conf. Speech Commun. Technol. (Eurospeech ʼ99), Budapest (1999)

    Google Scholar 

  37. C. Leggetter, P. Woodland: Maximum likelihood linear regession for speaker adaptation of continuous density HMMs, Comput. Speech Lang. 9(2), 171-186 (1995)

    Article  Google Scholar 

  38. M. Riley, W. Byrne, M. Finke, S. Khudanpur, A. Ljolje, J. McDonough, H. Nock, M. Saraclar, C. Wooters, G. Zavaliagkos: Stochastic pronunciation modelling form hand-labelled phonetic corpora, Speech Commun. 29, 209-224 (1999)

    Article  Google Scholar 

  39. R. Sproat: Multilingual text analysis for text-to-speech synthesis, J. Nat. Lang. Eng. 2(4), 369-380 (1997)

    Article  Google Scholar 

  40. C. Allauzen, M. Mohri, M. Riley: Statistical modeling for unit selection in speech synthesis, In 42nd Meeting of the Association for Computational Linguistics (ACL 2004), Proceedings of the Conference, Barcelona (2004)

    Google Scholar 

  41. R. M. Kaplan, M. Kay: Regular models of phonological rule systems, Computational Linguistics 20(3) (1994)

    Google Scholar 

  42. L. Karttunen: The replace operator, In 33rd Meeting of the Association for Computational Linguistics (ACL 95), Proceedings of the Conference, MIT, Cambridge ACL (1995)

    Google Scholar 

  43. M. Crochemore, W. Rytter: Text Algorithms (Oxford Univ. Press, Oxford 1994)

    MATH  Google Scholar 

  44. K.I.I. Culik, J. Kari: Digital images and formal languages. In: Handbook of Formal Languages, ed. by G. Rozenberg, A Salomaa (Springer, Berlin, Heidelberg 1997) pp. 599-616

    Chapter  Google Scholar 

Download references

Author information

Authors and Affiliations

Authors

Corresponding authors

Correspondence to Mehryar Mohri Prof. , Fernando Pereira Ph.D or Michael Riley Ph.D .

Editor information

Editors and Affiliations

Rights and permissions

Reprints and permissions

Copyright information

© 2008 Springer-Verlag Berlin Heidelberg

About this chapter

Cite this chapter

Mohri, M., Pereira, F., Riley, M. (2008). Speech Recognition with Weighted Finite-State Transducers. In: Benesty, J., Sondhi, M.M., Huang, Y.A. (eds) Springer Handbook of Speech Processing. Springer Handbooks. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-49127-9_28

Download citation

  • DOI: https://doi.org/10.1007/978-3-540-49127-9_28

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-49125-5

  • Online ISBN: 978-3-540-49127-9

  • eBook Packages: EngineeringEngineering (R0)

Publish with us

Policies and ethics