Neural Networks

Volume 21, Issues 2–3, March–April 2008, Pages 511–523

2008 Special Issue
Compact hardware liquid state machines on FPGA for real-time speech recognition

https://doi.org/10.1016/j.neunet.2007.12.009

Abstract

Hardware implementations of Spiking Neural Networks are numerous because they are well suited to digital and analog hardware and can computationally outperform classic neural networks. This work presents an application-driven exploration of digital hardware in which we implement real-time, isolated digit speech recognition using a Liquid State Machine: a recurrent neural network of spiking neurons in which only the output layer is trained. First we test two existing hardware architectures, which we improve and extend, but which turn out to be faster than needed, and thus unnecessarily area-consuming, for this application. Next, we present a scalable, serialized architecture that allows a very compact implementation of spiking neural networks while remaining fast enough for real-time processing. All architectures support leaky integrate-and-fire membranes with exponential synapse models. This work shows that there is a large design space for Spiking Neural Network hardware, of which existing architectures have only spanned a part.

Introduction

Spiking Neural Networks (SNNs), neural network models that use spikes to communicate, (1) have been shown theoretically (Maass, 1997) and practically (Booij, 2004, Schrauwen and Van Campenhout, 2006) to computationally outperform analog neural networks, (2) are biologically more plausible, (3) have an intrinsic temporal nature that can be used to solve temporal problems, and (4) are well suited to implementation in digital and analog hardware. SNNs have been applied with success to several applications such as face detection (Delorme & Thorpe, 2001), lipreading (Booij, 2004), speech recognition (Verstraeten, Schrauwen, Stroobandt, & Van Campenhout, 2005) and speaker identification (Verstraeten, 2004), autonomous robot control (Floreano et al., 2004, Roggen et al., 2003) and several UCI benchmarks (Bohte, Poutré, & Kok, 2002).

The main drawback of SNNs is that they are difficult to train in a supervised fashion, mainly because the hard thresholding present in all simple spiking neuron models makes the calculation of gradients very error-prone, which deteriorates the learning rule’s performance (Bohte et al., 2002, Schrauwen and Van Campenhout, 2004, Schrauwen and Van Campenhout, 2006). One way to circumvent this is by keeping the network parameters fixed. This is what is embodied by the Liquid State Machine (LSM) concept (Maass, Natschläger, & Markram, 2002), which is conceptually identical to Echo State Networks (Jaeger & Haas, 2004) and is generally termed Reservoir Computing (Verstraeten, Schrauwen, D’Haene, & Stroobandt, 2007). Here a recurrent network of spiking neurons is constructed in which all the network parameters (interconnection topology, weights, delays, etc.) are fixed and randomly chosen. This network, the so-called liquid or reservoir, typically exhibits complex non-linear dynamics in its high-dimensional internal state. This state is excited by the network input and is expected to capture and expose the relevant information embedded in it. As with kernel methods, this information can then be extracted by processing the network states with simple linear techniques to obtain the actual regression or classification output.
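To make the reservoir computing principle concrete, the following minimal sketch (in Python with NumPy, rather than the Matlab toolbox used in this work; the network size, the tanh nonlinearity and the toy task are illustrative assumptions, since the reservoirs in this paper consist of spiking neurons) builds a fixed random recurrent network and trains only a linear readout on its collected states:

import numpy as np

rng = np.random.default_rng(0)

# Fixed, randomly chosen reservoir: only the readout weights are trained.
N_in, N_res = 4, 100                                # illustrative sizes
W_in  = rng.uniform(-0.5, 0.5, (N_res, N_in))
W_res = rng.uniform(-0.5, 0.5, (N_res, N_res))
W_res *= 0.9 / max(abs(np.linalg.eigvals(W_res)))   # scale spectral radius

def run_reservoir(u):
    """Drive the reservoir with input sequence u (T x N_in); collect states."""
    x, states = np.zeros(N_res), []
    for u_t in u:
        x = np.tanh(W_in @ u_t + W_res @ x)   # analog stand-in for spiking dynamics
        states.append(x.copy())
    return np.array(states)

def train_readout(X, Y, ridge=1e-6):
    """Linear readout via ridge regression on the collected states."""
    return np.linalg.solve(X.T @ X + ridge * np.eye(X.shape[1]), X.T @ Y)

u = rng.normal(size=(200, N_in))
y = np.roll(u[:, 0], 1)                       # toy target: delayed first input
X = run_reservoir(u)
W_out = train_readout(X, y)
print("training MSE:", np.mean((X @ W_out - y) ** 2))

The essential point is that no gradients ever flow through the recurrent network: training reduces to a linear least-squares problem on the collected states.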

Neural networks can be, and often are, implemented through simulation on sequential computers. This approach obviously limits the speed of operation of the network. Direct parallel hardware implementations of neural networks have been a fruitful research area because their intrinsically parallel nature allows very large speedups compared to sequential implementations. Several general overview publications on digital hardware implementations of neural networks have been published (Anguita and Valle, 2001, Aybay et al., 1996, Burr, 1991, Moerland and Fiesler, 1997, Schoenauer et al., 1998), some of them specific to implementations on Field Programmable Gate Arrays (FPGAs, reconfigurable digital hardware components) (Girau, 2000, Omondi and Rajapakse, 2006, Zhu and Sutton, 2003), to digital spiking neural networks (Jahnke et al., 1995, Jahnke et al., 1997, Preis et al., 2001, Schäfer et al., 2002), or to neuromorphic analog VLSI (Smith & Hamilton, 1998). In this work we focus on digital implementations on FPGAs because they offer an intermediate between the programmability of classic processors and the parallel nature and high speed of ASICs. FPGAs also allow a much faster implementation cycle (minutes instead of months), and for ‘small’ production volumes (currently fewer than about 100,000 chips) they are cheaper than ASICs.

The main reasons for looking at spiking neuron hardware implementations are that: (1) information is represented in time instead of in space, allowing one to better exploit the intrinsic high speed of digital hardware; (2) a spiking communication channel can carry more information than an analog (rate-coded) communication channel (Softky, 1996), allowing lower-bandwidth interconnection links in hardware; (3) due to the on/off nature of spikes, no multiplications have to be performed at the input of a neuron; and (4) a broad range of architectures is possible, spanning a large continuum in design space, which allows the user to make an area/speed trade-off depending on the application. Several digital implementations of SNNs have already been published, ranging from very simple neurons implemented on 8-bit microcontrollers (Floreano et al., 2002, Nielsen and Lund, 2003) to somewhat larger FPGA-based systems (Bellis et al., 2004, Girau and Torres-Huitzil, 2006, Pearson et al., 2005, Roggen et al., 2003, Upegui et al., 2005) and finally very large systems comprising several FPGAs or even ASICs (de Garis et al., 2000, Glackin et al., 2005, Grassmann and Anlauf, 2002, Hartmann et al., 1997, Hellmich et al., 2005, Jahnke et al., 1996, Mehrtash et al., 2003, Ros et al., 2005, Schoenauer et al., 1998, Waldemark et al., 2000). A taxonomy of these architectures is presented in Table 1. We subdivide the designs according to the simulation principle, the hardware platform, the number of Processing Elements (PEs), and whether they time-share PEs (multiple neurons processed on the same hardware) or not.

The LSM architecture has many properties that are advantageous for hardware implementation: weights are fixed and chosen at random (so high weight quantization noise can be taken into account a priori), and the interconnection topology can be very sparse and even ‘small world’ (many local connections, few global ones), which allows easy wiring that matches the intrinsic FPGA routing capabilities well. Also, many LSM applications require quite large networks of spiking neurons (up to 1000) to be simulated under hard real-time constraints, which is difficult in software or with event-based simulation techniques. Multiple outputs can be generated from the same reservoir, allowing a generic hardware reservoir component that can serve different applications and multiple outputs at once. And due to the intrinsic robustness of the LSM, even faulty FPGAs can be used, or a higher ASIC yield is possible (Schürmann, Meier, & Schemmel, 2005).

Recently a very convincing engineering application of the Liquid State Machine was presented: isolated spoken digit recognition (Verstraeten et al., 2005). When optimally tuned (Verstraeten, Schrauwen, & Stroobandt, 2006), it can outperform state-of-the-art Hidden Markov Model based recognizers. The system is biologically motivated: a model of the human inner ear is used to preprocess the audio data, next an LSM is constructed with biologically correct settings and interconnections (Maass et al., 2002), and a simple linear classifier performs the actual classification.

In this paper we present an application-oriented design flow for LSM-based hardware implementation. The goal is real-time, single-channel speech recognition at the lowest hardware cost. To attain it we implement the speech task on two existing hardware architectures for SNNs: a design that processes synapses serially and uses parallel arithmetic (Roggen et al., 2003, Upegui et al., 2005), and a design that processes the synapses in parallel but does the arithmetic serially (Girau and Torres-Huitzil, 2006, Schrauwen and Van Campenhout, 2006). As we will show, these architectures are always much faster than real-time and thus waste chip area. For digital implementations in general, a broad range of architectures is possible, with very different properties with respect to chip area, memory usage and computation time. These properties are usually in conflict, and therefore an area/time trade-off has to be made. We present a new architecture that uses both serial synapse processing and serial arithmetic. With it we are able to process just fast enough for real-time operation with a very limited amount of hardware. Without much extra hardware cost, this design scales easily from a single PE that performs slow serial processing of all neurons to multiple PEs that each process part of the network at increased speed. The design space for hardware SNNs has thus been drastically enlarged. All presented designs were implemented at our lab and run on actual hardware. The LSM-hardware idea (but with threshold logic neurons) had previously been implemented in analog VLSI by Schürmann et al. (2005).

All above-mentioned hardware implementations of SNNs implement only very simple Leaky Integrate and Fire (LIF) spiking neurons (or the even simpler Integrate and Fire). This is the simplest first-order, single-compartment spiking neuron model. Maass (1997) theoretically showed that second-order models (which include a model of the synaptic dynamics) are computationally superior to first-order spiking neurons, and the proof that SNNs are more powerful (in the sense of computational efficiency) than ANNs (Maass, 1997) is based on second-order SNNs. This is why we chose to include second-order neurons in all implementation architectures discussed in this work.
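To make the distinction between model orders concrete, here is a minimal discrete-time sketch (Python; the time constants, threshold and reset behaviour are illustrative assumptions, not the exact parameters of our designs) of a second-order neuron: an exponential synapse model feeding a leaky integrate-and-fire membrane.

import numpy as np

def simulate_second_order_lif(spikes_in, w, dt=1e-3,
                              tau_syn=5e-3, tau_mem=20e-3, v_th=1.0):
    """Second-order neuron: exponential synapses feeding a LIF membrane.

    spikes_in: (T, n_syn) binary array of input spike trains
    w:         (n_syn,)   fixed synaptic weights
    Returns the output spike train (length T).
    """
    d_syn = np.exp(-dt / tau_syn)   # per-step synaptic decay factor
    d_mem = np.exp(-dt / tau_mem)   # per-step membrane decay factor
    i_syn = np.zeros(spikes_in.shape[1])
    v, out = 0.0, []
    for s_t in spikes_in:
        i_syn = d_syn * i_syn + s_t          # first order: synaptic current
        v = d_mem * v + dt * (w @ i_syn)     # second order: leaky membrane
        if v >= v_th:                        # hard threshold and reset
            out.append(1); v = 0.0
        else:
            out.append(0)
    return np.array(out)

A first-order (plain LIF) neuron is recovered by replacing the filtered current i_syn with the raw input spikes; the extra exponential state per synapse is exactly what the second-order hardware must add.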

For all hardware architectures studied in this publication, network topologies can be automatically generated by Matlab code that is part of the Reservoir Computing toolbox presented in Verstraeten et al. (2007). This way, network structures can first be explored in software and, once a good topology is found, easily and automatically exported to a structured hardware description that fits in an automated design flow.

This contribution, an extended version of a paper presented at IJCNN 2007 (Schrauwen et al., 2007), is structured as follows. We start in Sections 2 and 3 by presenting the LSM-based isolated digit speech recognition application and a methodology for porting the software application to hardware, based on a freely available Matlab toolbox for RC simulations. Section 4 presents the class of neuron models that will be implemented in hardware, and Section 5 shows how these models can be efficiently approximated digitally in a time-step-based simulation. An overview of the different possible forms of parallelism and interconnection is given in Section 6. Three implementations, each with very specific spatial and temporal characteristics, are presented in Sections 7 and 8. Section 9 compares their properties in terms of the important network and neuron parameters. Section 10 briefly describes the complete hardware system developed to perform the speech recognition. We conclude and point out future directions of research in Section 11.

Section snippets

Application: Isolated digit speech recognition

The isolated digit speech recognition application that will be implemented in hardware is organized as follows: a widely used subset of the TI46 isolated digit corpus, consisting of 10 digits uttered 10 times by five female speakers, was preprocessed using Lyon’s passive ear model (a model of the human inner ear) (Lyon, 1982), which generates 88 frequency channels. The multiple channels of this preprocessing step are converted to spikes using BSA (Schrauwen & Van Campenhout, 2003), a fast spike
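The exact BSA algorithm is specified in Schrauwen and Van Campenhout (2003); the sketch below (Python) follows its commonly cited error-minimization form, emitting a spike whenever subtracting the decoding FIR filter from the signal reduces the residual error. The filter shape and threshold are illustrative assumptions, not the settings used in this work.

import numpy as np

def bsa_encode(signal, h, threshold=0.955):
    """BSA-style spike coding sketch: spike wherever subtracting the
    decoding FIR filter h from the signal reduces the residual error."""
    s = np.asarray(signal, dtype=float).copy()
    spikes = np.zeros(len(s), dtype=int)
    M = len(h)
    for t in range(len(s) - M):
        window = s[t:t + M]
        err_spike = np.abs(window - h).sum()   # error if we spike now
        err_skip  = np.abs(window).sum()       # error if we stay silent
        if err_spike <= err_skip - threshold:
            spikes[t] = 1
            s[t:t + M] -= h                    # remove the decoded pulse
    # Decoding: convolving `spikes` with h approximately reconstructs the signal.
    return spikes

# Illustrative use on one stand-in cochlear channel:
t = np.linspace(0, 1, 1000)
channel = 0.5 * (1 + np.sin(2 * np.pi * 5 * t))
h = np.hamming(24) * 0.2                       # illustrative decoding filter
print("spike count:", bsa_encode(channel, h).sum())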

Hardware-oriented RC design flow: RC Matlab toolbox

To generate the hardware in this work, we used the following rough guidelines for tackling an engineering problem with hardware Reservoir Computing. We use the RC toolbox presented in Verstraeten et al. (2007), which offers a user-friendly environment for thoroughly exploring parts of the parameter space and for investigating optimal parameter settings in software before making the transition to hardware. The following steps are advisable:

  • Generic

Neuron models

A multitude of biologically inspired neuron models exist (see Izhikevich (2004) for a good overview of different neuron models), ranging from complex compartmental structures using ion channel models, to the very simple integrate-and-fire model. Most of these neuron models can be approximated very well by means of the Spike Response Model (SRM) (Gerstner & Kistler, 2002): a neuron is modeled by a superposition of pre-synaptically induced time-dependent input ‘kernels’ (pulse response functions
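In the standard notation of Gerstner and Kistler (2002), the membrane potential of neuron $i$ under the SRM is exactly such a superposition:

$$u_i(t) = \eta(t - \hat{t}_i) + \sum_j w_{ij} \sum_f \varepsilon\big(t - t_j^{(f)}\big),$$

where $\hat{t}_i$ is the last firing time of neuron $i$, $\eta$ is the reset/refractory kernel, $t_j^{(f)}$ are the firing times of pre-synaptic neuron $j$, $w_{ij}$ is the synaptic weight, and $\varepsilon$ is the post-synaptic pulse response kernel; the neuron fires when $u_i(t)$ crosses the threshold $\vartheta$.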

Digital approximations of exponential decay

Exponential decay is the key ingredient for the implementation of the LIF-type membranes with Dirac and/or exponential synapse models mentioned above. Even short- and long-term adaptation rules such as Spike Timing Dependent Plasticity (Markram, Lübke, Frotscher, & Sakmann, 1997), Dynamic Synapses (Maass & Zador, 1999) and Intrinsic Plasticity (Triesch, 2004) are all based on exponential decay. In this section we will further investigate how we can efficiently and compactly implement
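A classic compact digital realization of exponential decay, shown here as a sketch of the general technique (not necessarily the exact circuit used in our designs), avoids the multiplier altogether: on a fixed-point register, v ← v − (v ≫ k) multiplies v by (1 − 2⁻ᵏ) each time-step using only a hard-wired shift and one subtraction.

def decay_step(v, k):
    """One time-step of exponential decay on a fixed-point value:
    v <- v - (v >> k), i.e. multiplication by (1 - 2**-k),
    using only a shift and a subtraction (no multiplier)."""
    return v - (v >> k)

# Example: k = 4 gives a per-step decay factor of 1 - 1/16 = 0.9375,
# i.e. a time constant of roughly -1/ln(0.9375), about 15.5 time-steps.
v, trace = 1 << 15, []
for _ in range(50):
    v = decay_step(v, 4)
    trace.append(v)
print(trace[:5])    # [30720, 28800, 27000, 25313, 23731]

The shift amount k thus directly sets the time constant, in powers of two; intermediate time constants require either a larger word width or combining several shifted terms.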

Overview of possible hardware implementations

We will now give an overview of the different possibilities for implementing an SNN processor in digital hardware. This is split into two parts: how the neuron PE is implemented, and how the PEs are interconnected.

Existing compact hardware architectures for SNNs

As pointed out in the introduction, various implementation architectures for SNNs have already been investigated. In this section we briefly discuss two previously published architectures which we re-implemented. The first is an approach inspired by classic processor design, where processing is serial and arithmetic is parallel; the second takes advantage of several FPGA-specific properties: it uses distributed memory and parallel processing, but with serial arithmetic to limit size. We

Multiple PEs, serial processing, serial arithmetic

Because both previously published architectures perform much faster than real-time on the speech recognition task, they use more hardware than needed. We now present a novel architecture for processing SNNs that allows slower but scalable operation at a highly reduced hardware cost. The architecture processes all synapses serially and also does all arithmetic serially (SPSA). This results in a very small PE implementation (only 4 4-LUTs!) but in longer
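To illustrate what serial arithmetic means here, the sketch below (Python; purely a software mimic of the time behaviour, since the actual PE is a small bit-serial FPGA circuit) accumulates synaptic contributions with an LSB-first serial adder: one full-adder evaluation per clock cycle, so adding one B-bit word costs B cycles, and a PE serving its share of neurons needs on the order of (synapses per neuron) × B cycles per neuron per time-step.

def serial_add(a_bits, b_bits):
    """Bit-serial addition, LSB first: one full-adder evaluation per
    clock cycle, as a single-bit FPGA datapath would do it."""
    carry, out = 0, []
    for a, b in zip(a_bits, b_bits):
        out.append(a ^ b ^ carry)              # sum bit
        carry = (a & b) | (carry & (a ^ b))    # carry bit
    return out                                 # final carry-out is dropped

def to_bits(x, width):
    return [(x >> i) & 1 for i in range(width)]    # LSB-first

def from_bits(bits):
    return sum(b << i for i, b in enumerate(bits))

# One PE accumulating three synaptic contributions serially, B bits each:
B = 10
acc = to_bits(0, B)
for contribution in (37, 12, 5):               # illustrative weight values
    acc = serial_add(acc, to_bits(contribution, B))
print(from_bits(acc))                          # 54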

Comparison

The three architectures have different area, memory and time scaling properties. An approximation of the requirements of the three designs is given in Table 2, Table 3, Table 4. To compare the designs, Fig. 7 shows the number of 4-LUTs per PE, the memory usage (RAM and FF combined) and the number of clock cycles per time-step for each of the three architectures as a function of the number of inputs and the number of neurons. The other settings are I=12, N=200, B=10, S=1, T=2. The

System architecture

The speech recognition application needs more than just the simulation of the SNN. The incoming speech signal is preprocessed by the Lyon cochlear ear model, the spikes emitted by all neurons in the reservoir are filtered by a first-order low-pass filter and down-sampled, and the 10 weighted sums of the readouts need to be computed. The result is further postprocessed to arrive at a single class for a given speech sample. This complete setup has been implemented on the ML401
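A software sketch of that readout chain (Python; the filter coefficient, down-sampling factor and random test data are illustrative assumptions, while the chain itself follows the description above): low-pass filter the reservoir spike trains, down-sample, compute the 10 weighted sums per retained time-step, and reduce the per-step class scores to a single decision, here by averaging over the utterance.

import numpy as np

def classify_utterance(spikes, W_out, alpha=0.9, subsample=10):
    """spikes: (T, N) reservoir spike trains; W_out: (N, 10) readout weights.
    First-order low-pass filter, down-sample, 10 weighted sums per step,
    then average the scores over the utterance and pick the best class."""
    filt, states = np.zeros(spikes.shape[1]), []
    for t, s_t in enumerate(spikes):
        filt = alpha * filt + (1 - alpha) * s_t    # first-order low-pass
        if t % subsample == 0:
            states.append(filt.copy())             # down-sampled state
    scores = np.array(states) @ W_out              # (T', 10) class scores
    return int(np.argmax(scores.mean(axis=0)))     # one class per utterance

# Illustrative use with random data:
rng = np.random.default_rng(1)
spikes = (rng.random((500, 200)) < 0.05).astype(float)
W_out = rng.normal(size=(200, 10))
print("predicted digit:", classify_utterance(spikes, W_out))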

Conclusions and future work

In this work we show that real-time speech recognition is possible on limited FPGA hardware using an LSM. To attain this we first explored existing hardware architectures (which we re-implemented and improved) for the compact implementation of SNNs. These designs are however more than 200 times faster than real-time, which is not desirable because many hardware resources are spent on speed that is not needed. We present a novel hardware architecture based on serial processing of dendritic trees

Acknowledgements

The first author’s work is partially funded by the FWO Flanders project G.0317.05. David Verstraeten and Michiel D’Haene are sponsored by the Institute for the Promotion of Innovation through Science and Technology in Flanders (IWT-Vlaanderen).

References (59)

  • Booij, O. (2004). Temporal pattern classification using spiking neural networks. Master’s Thesis, University of...
  • Burr, J. B. Digital neural network implementations.
  • Delorme, A., et al. (2001). Face identification using one spike per neuron: Resistance to image degradations. Neural Networks.
  • Floreano, D., et al. Evolutionary bits’n’spikes.
  • Floreano, D., et al. (2004). From wheels to wings with evolutionary spiking neurons. Artificial Life.
  • Gerstner, W., et al. (2002). Spiking neuron models.
  • Girau, B. (2000). Neural networks on FPGAs: A survey. In Proceeding of second ICSC symposium on neural...
  • Girau, B., & Torres-Huitzil, C. (2006). FPGA implementation of an integrate-and-fire LEGION model for image...
  • Glackin, B., McGinnity, T. M., Maguire, L. P., Wu, Q. X., & Belatreche, A. (2005). A novel approach for the...
  • Grassmann, C., & Anlauf, J. (2002). RACER—A rapid prototyping accelerator for pulsed neural networks. In Proceedings of...
  • Hartmann, G., Frank, G., Schäfer, M., & Wolff, C. (1997). SPIKE128K—An accelerator for dynamic simulation of large...
  • Hellmich, H., Geike, M., Griep, P., Mahr, P., Rafanelli, M., & Klar, H. (2005). Emulation engine for spiking neurons...
  • Izhikevich, E. (2004). Which model to use for cortical spiking neurons? IEEE Transactions on Neural Networks.
  • Jaeger, H. (2002). Tutorial on training recurrent neural networks, covering BPTT, RTRL, EKF and the echo state network...
  • Jaeger, H., et al. (2004). Harnessing nonlinearity: Predicting chaotic systems and saving energy in wireless telecommunication. Science.
  • Jahnke, A., Roth, U., & Klar, H. (1995). Towards efficient hardware for spike-processing neural networks. In...
  • Jahnke, A., Roth, U., & Klar, H. (1996). A SIMD/dataflow architecture for a neurocomputer for spike-processing neural...
  • Jahnke, A., et al. Simulation of spiking neural networks on different hardware platforms.
  • Legenstein, R., et al. (2005). New directions in statistical signal processing: From systems to brain.

An abbreviated version of some portions of this article appeared in Schrauwen, D’Haene, Verstraeten, and Van Campenhout (2007) as part of the IJCNN 2007 Conference Proceedings, published under IEEE copyright.
