2008 Special Issue
Compact hardware liquid state machines on FPGA for real-time speech recognition☆
Introduction
Spiking Neural Networks (SNNs), neural network models that use spikes to communicate, (1) have been shown theoretically (Maass, 1997) and practically (Booij, 2004, Schrauwen and Van Campenhout, 2006) to computationally outperform analog neural networks, (2) are biologically more plausible, (3) have an intrinsic temporal nature that can be used to solve temporal problems, and (4) are well suited to be implemented on digital and analog hardware. SNNs have been applied with success to several applications such as face detection (Delorme & Thorpe, 2001), lipreading (Booij, 2004), speech recognition (Verstraeten, Schrauwen, Stroobandt, & Van Campenhout, 2005) and speaker identification (Verstraeten, 2004), autonomous robot control (Floreano et al., 2004, Roggen et al., 2003) and several UCI benchmarks (Bohte, Poutré, & Kok, 2002).
The main drawback of SNNs is that they are difficult to train in a supervised fashion, mainly because the hard thresholding present in all simple spiking neuron models makes the calculation of gradients very error-prone, which deteriorates the learning rule’s performance (Bohte et al., 2002, Schrauwen and Van Campenhout, 2004, Schrauwen and Van Campenhout, 2006). One way to circumvent this is to use fixed parameters. This is what is embodied by the Liquid State Machine (LSM) concept (Maass, Natschläger, & Markram, 2002), which is conceptually identical to Echo State Networks (Jaeger & Haas, 2004) and is now generally termed Reservoir Computing (Verstraeten, Schrauwen, D’Haene, & Stroobandt, 2007). Here a recurrent network of spiking neurons is constructed in which all the network parameters (interconnection, weights, delays, etc.) are fixed and randomly chosen. This network, the so-called liquid or reservoir, typically exhibits complex non-linear dynamics in its high-dimensional internal state. This state is excited by the network input, and is expected to capture and expose the relevant information embedded in that input. As with kernel methods, it is possible to extract this information by processing the network states with simple linear techniques to obtain the actual regression or classification output.
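The reservoir-plus-linear-readout principle can be illustrated with a minimal sketch. Note this uses analog tanh neurons in the echo-state style rather than the spiking neurons of this paper, and all sizes, scalings and names are illustrative: the recurrent weights are drawn once at random and never trained, and only the linear readout is fitted, e.g. by ridge regression on the collected states.

```python
import numpy as np

rng = np.random.default_rng(0)

N = 100                                           # reservoir size (illustrative)
W = rng.normal(size=(N, N))
W *= 0.9 / np.max(np.abs(np.linalg.eigvals(W)))   # fix spectral radius at 0.9
W_in = rng.normal(scale=0.5, size=N)              # fixed random input weights

def run_reservoir(u):
    """Drive the fixed random network with input u and collect the states."""
    x = np.zeros(N)
    states = []
    for u_t in u:
        x = np.tanh(W @ x + W_in * u_t)
        states.append(x.copy())
    return np.array(states)

def train_readout(states, targets, ridge=1e-6):
    """Fit only the linear readout, by ridge regression on the states."""
    A = states.T @ states + ridge * np.eye(states.shape[1])
    return np.linalg.solve(A, states.T @ targets)

u = rng.normal(size=200)
X = run_reservoir(u)            # states: (timesteps, neurons)
w_out = train_readout(X, u)     # readout weights: (neurons,)
```

The key design choice, as in the text, is that only `train_readout` involves learning; the reservoir itself is a fixed random dynamical system.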
Neural networks can be, and often are, implemented through simulation on sequential computers. This approach obviously limits the speed of operation of the network. Direct parallel hardware implementations of neural networks have been a fruitful research area due to their intrinsic parallel nature, which allows very large speedups compared to sequential implementations. Several general overview publications on digital hardware implementations of neural networks have been published (Anguita and Valle, 2001, Aybay et al., 1996, Burr, 1991, Moerland and Fiesler, 1997, Schoenauer et al., 1998), some of them specific to implementations on Field Programmable Gate Arrays (FPGAs, reconfigurable digital hardware components) (Girau, 2000, Omondi and Rajapakse, 2006, Zhu and Sutton, 2003), some specific to digital spiking neural networks (Jahnke et al., 1995, Jahnke et al., 1997, Preis et al., 2001, Schäfer et al., 2002), and some to neuromorphic analog VLSI (Smith & Hamilton, 1998). In this work we focus on digital implementations on FPGAs because they offer an intermediate between the programmability of classic processors and the parallel nature and high speed of ASICs. FPGAs also allow a much faster implementation cycle (minutes instead of months), and for ‘small’ production volumes (currently fewer than roughly 100,000 chips) FPGAs are cheaper than ASICs.
The main reasons for looking at spiking neuron hardware implementations are that: (1) information is represented in time instead of in space, allowing better use of the intrinsic high speed of digital hardware, (2) a spiking communication channel can carry more information than an analog (rate coded) communication channel (Softky, 1996), allowing lower bandwidth interconnection links in hardware, (3) due to the on/off nature of spikes, no multiplications have to be performed at the input of a neuron, and (4) a broad range of architectures is possible, spanning a large continuum in design space, which allows the user to make an area/speed trade-off depending on the application. Several digital implementations of SNNs have already been published, ranging from very simple neurons implemented on 8-bit microcontrollers (Floreano et al., 2002, Nielsen and Lund, 2003) to somewhat larger FPGA-based systems (Bellis et al., 2004, Girau and Torres-Huitzil, 2006, Pearson et al., 2005, Roggen et al., 2003, Upegui et al., 2005) and finally very large systems comprising several FPGAs or even ASICs (de Garis et al., 2000, Glackin et al., 2005, Grassmann and Anlauf, 2002, Hartmann et al., 1997, Hellmich et al., 2005, Jahnke et al., 1996, Mehrtash et al., 2003, Ros et al., 2005, Schoenauer et al., 1998, Waldemark et al., 2000). A taxonomy of these architectures is presented in Table 1. We subdivide the designs according to the simulation principle, the hardware platform, the number of Processing Elements (PEs), and whether they use time sharing of PEs (multiple neurons processed on the same hardware).
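Point (3) can be made concrete with a toy sketch (names are illustrative): because a spike is either present or absent, the usual multiply-accumulate at a neuron input degenerates into a conditional addition of the synaptic weight, so no hardware multiplier is needed.

```python
def accumulate_inputs(spikes, weights):
    """Weighted input sum for binary spikes: each active input simply
    adds its synaptic weight, so an adder replaces the multiplier."""
    total = 0
    for s, w in zip(spikes, weights):
        if s:                # spike present -> just add the weight
            total += w
    return total
```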
The LSM architecture has many properties that are advantageous for hardware implementations: weights are fixed and chosen at random (high weight quantization noise can be taken into account a priori), and the interconnection topology can be very sparse and even ‘small world’ (many local connections, few global ones), which allows easy wireability that matches the intrinsic FPGA wiring capabilities well. Also, for many LSM applications quite large networks of spiking neurons (up to 1000) need to be simulated under hard real-time constraints, which is difficult in software or when using event-based simulation techniques. Multiple outputs can be generated from the same reservoir, allowing a generic hardware reservoir component that can serve different applications and multiple outputs. And due to the intrinsic robustness of the LSM, even faulty FPGAs can be used, or a higher yield of ASICs is possible (Schürmann, Meier, & Schemmel, 2005).
Recently a very convincing engineering application for the Liquid State Machine was presented: isolated spoken digit recognition (Verstraeten et al., 2005). When optimally tweaked (Verstraeten, Schrauwen, & Stroobandt, 2006), it can outperform state-of-the-art Hidden Markov Model based recognizers. The system is biologically motivated: a model of the human inner ear is used to preprocess the audio data, next an LSM is constructed with biologically correct settings and interconnection (Maass et al., 2002), and a simple linear classifier is used to perform the actual classification.
In this paper we present an application-oriented design flow for LSM-based hardware implementation. Real-time, single-channel speech recognition at the lowest hardware cost is desired. To attain this goal we implement the speech task on two existing hardware architectures for SNNs: a design that processes synapses serially and uses parallel arithmetic (Roggen et al., 2003, Upegui et al., 2005), and a design that processes the synapses in parallel but does the arithmetic serially (Girau and Torres-Huitzil, 2006, Schrauwen and Van Campenhout, 2006). As we will show, these architectures are always much faster than real-time, and thus waste chip area. For digital implementations in general, a broad range of architectures is possible, with very different properties with respect to chip area, memory usage and computing time. Most of the time these properties are contradictory, and therefore an area/time trade-off has to be made. We present a new architecture that uses both serial synapse processing and serial arithmetic. With this option we are able to process just fast enough for real-time operation with a very limited amount of hardware. Without much extra hardware cost this design scales easily between a single PE that performs slow serial processing of all neurons and multiple PEs that each process part of the network at increased speed. The design space for hardware SNNs has thus been drastically enlarged. All presented designs were implemented at our lab and run on actual hardware. The LSM-hardware idea (but with threshold logic neurons) was previously implemented in analog VLSI in Schürmann et al. (2005).
All above-mentioned hardware implementations of SNNs only implement very simple Leaky Integrate and Fire (LIF) spiking neurons (or even the simpler Integrate and Fire). This is the simplest first-order or single-compartment spiking neuron model. It was theoretically shown by Maass (1997) that second-order models (which have a model for synaptic dynamics) are computationally superior to first-order spiking neurons, and the proof that SNNs are more powerful (in the sense of computational efficiency) than ANNs (Maass, 1997) is based on second-order SNNs. This is why we chose to include second-order neurons in all implementation architectures discussed in this work.
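As a rough illustration of what “second-order” means here (a behavioural sketch; the parameter values, reset rule and units are illustrative, not the paper’s exact model): each input spike first charges an exponentially decaying synaptic state, which in turn drives the exponentially leaky membrane, giving two cascaded first-order dynamics instead of one.

```python
import math

def simulate_lif2(spike_times, dt=1e-4, tau_m=20e-3, tau_s=5e-3,
                  w=1.5, threshold=1.0, T=0.2):
    """Second-order LIF: a spike charges an exponential synapse
    current (first kernel), which then drives a leaky membrane
    (second kernel). Returns the output spike times."""
    steps = int(T / dt)
    spike_steps = {int(round(t / dt)) for t in spike_times}
    syn = 0.0     # synaptic state (first-order dynamics)
    v = 0.0       # membrane potential (second-order response)
    out = []
    for k in range(steps):
        syn *= math.exp(-dt / tau_s)              # synaptic decay
        if k in spike_steps:
            syn += w                              # Dirac input to the synapse
        v = v * math.exp(-dt / tau_m) + syn * dt / tau_m
        if v >= threshold:                        # threshold crossing
            out.append(k * dt)
            v = 0.0                               # reset after firing
    return out
```

A first-order (plain LIF) model would feed the weight directly into `v`; the extra `syn` state is exactly the synaptic dynamics that Maass’s result relies on.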
For all hardware architectures studied in this publication, network topologies can be automatically generated by the Matlab code which is part of the Reservoir Computing toolbox 3 presented in Verstraeten et al. (2007). This way, network structures can first be explored in software, and when a good topology is found, it can easily and automatically be exported to a structured hardware description which fits in an automated design flow.
This contribution, which is an extended version of a paper presented at IJCNN 2007 (Schrauwen et al., 2007), is structured as follows. We start in Sections 2 and 3 by presenting the LSM-based isolated digit speech recognition application, and a methodology to port the software application to hardware based on a freely available Matlab toolbox for RC simulations. Section 4 presents the class of neuron models that will be implemented in hardware, and Section 5 shows how these models can be efficiently approximated digitally in a time-step-based simulation. An overview of the different possible forms of parallelism and interconnection is given in Section 6. Three implementations, each with very specific spatial and temporal characteristics, are presented in Sections 7 and 8. Section 9 compares their properties in terms of the important network and neuron parameters. Section 10 briefly gives an overview of the complete hardware system that is developed to perform the speech recognition. We conclude and point out future directions of research in Section 11.
Application: Isolated digit speech recognition
The isolated digit speech recognition application that will be implemented in hardware is organized as follows: a much-used subset of the TI46 isolated digit corpus, consisting of 10 digits uttered 10 times by five female speakers, was preprocessed using Lyon’s passive ear model (a model of the human inner ear) (Lyon, 1982), which generates 88 frequency channels. The multiple channels of this preprocessing step are converted to spikes using BSA (Schrauwen & Van Campenhout, 2003), a fast spike-coding scheme
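The BSA scheme cited here can be outlined as a greedy deconvolution: at every sample it emits a spike whenever subtracting a fixed FIR kernel from the signal lowers the reconstruction error by more than a threshold, so that convolving the spike train with the same kernel approximately recovers the analog channel. A minimal sketch (the kernel shape and threshold value are illustrative, not the paper’s settings):

```python
import numpy as np

def bsa_encode(signal, fir, threshold=0.95):
    """Greedy BSA-style spike encoding: spike wherever subtracting the
    FIR kernel reduces the residual error by more than the threshold."""
    s = signal.astype(float).copy()
    spikes = np.zeros(len(s), dtype=int)
    M = len(fir)
    for t in range(len(s) - M):
        err_spike = np.abs(s[t:t + M] - fir).sum()   # error if we spike here
        err_skip = np.abs(s[t:t + M]).sum()          # error if we do not
        if err_spike <= err_skip - threshold:
            spikes[t] = 1
            s[t:t + M] -= fir                        # greedy deconvolution
    return spikes
```

Decoding is then simply `np.convolve(spikes, fir)`, which is what makes the scheme attractive for cheap hardware readouts.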
Hardware-oriented RC design flow: RC Matlab toolbox
To generate the hardware in this work, we used the following rough guidelines on how to tackle an engineering problem using hardware Reservoir Computing. To do this, we use the RC toolbox presented in Verstraeten et al. (2007) which offers a user-friendly environment to do a thorough exploration of certain areas of the parameter space, and to investigate some optimal parameter settings in a software environment before making the transition to hardware. The following steps are advisable:
- Generic
Neuron models
A multitude of biologically inspired neuron models exist (see Izhikevich (2004) for a good overview of different neuron models), ranging from complex compartmental structures using ion channel models, to the very simple integrate-and-fire model. Most of these neuron models can be approximated very well by means of the Spike Response Model (SRM) (Gerstner & Kistler, 2002): a neuron is modeled by a superposition of pre-synaptically induced time-dependent input ‘kernels’ (pulse response functions
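The SRM superposition can be sketched directly (the kernel shape and time constants are illustrative): the membrane potential is a weighted sum of a causal response kernel evaluated at the time elapsed since each presynaptic spike.

```python
import numpy as np

def srm_potential(t, spike_trains, weights, tau_m=20e-3, tau_s=5e-3):
    """SRM: membrane potential as a weighted superposition of response
    kernels, one contribution per presynaptic spike time t_f."""
    def eps(s):
        # difference-of-exponentials kernel, causal (zero for s <= 0)
        return np.where(s > 0, np.exp(-s / tau_m) - np.exp(-s / tau_s), 0.0)
    u = 0.0
    for w, train in zip(weights, spike_trains):
        for t_f in train:
            u += w * eps(t - t_f)
    return float(u)
```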
Digital approximations of exponential decay
Exponential decay is the key ingredient for the implementation of the LIF-type membranes with Dirac and/or exponential synapse models mentioned above. Even short- and long-term adaptation rules such as Spike Timing Dependent Plasticity (Markram, Lübke, Frotscher, & Sakmann, 1997), Dynamic Synapses (Maass & Zador, 1999) and Intrinsic Plasticity (Triesch, 2004) are all based on exponential decay. In this section we will further investigate how we can efficiently and compactly implement
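A standard hardware approximation of exponential decay, which we assume is the kind of implementation meant here, is to multiply the state by (1 − 2⁻ᵏ) each time step: in integer arithmetic this costs a single shift and a subtract, v ← v − (v ≫ k), and yields an equivalent time constant τ = −Δt / ln(1 − 2⁻ᵏ).

```python
import math

def decay_step(v, k):
    """One time step of decay using only a shift and a subtract:
    v <- v - (v >> k), i.e. v <- v * (1 - 2**-k) in integers."""
    return v - (v >> k)

def time_constant(k, dt=1.0):
    """Equivalent time constant of the shift-based decay."""
    return -dt / math.log(1.0 - 2.0 ** -k)

v = 1 << 15
for _ in range(100):
    v = decay_step(v, 4)   # k = 4 gives tau of about 15.5 time steps
```

Note that integer truncation makes the decay slightly slower than the ideal exponential and stalls it once v drops below 2**k, a quantization effect that must be budgeted for when choosing word lengths.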
Overview of possible hardware implementations
We will now give an overview of the different possibilities for implementing an SNN processor in digital hardware. This is split into two parts: how the neuron PE is implemented, and how the PEs are interconnected.
Existing compact hardware architectures for SNNs
As pointed out in the introduction, various implementation architectures for SNNs have already been investigated. In this section we briefly discuss two previously published architectures which we re-implemented. The first is an approach inspired by classic processor design, where processing is serial and arithmetic is parallel; the second takes advantage of several FPGA-specific properties: it uses distributed memory and parallel processing, but with serial arithmetic to limit size. We
Multiple PEs, serial processing, serial arithmetic
Because both previously published architectures run much faster than real-time on the speech recognition task, they use more hardware than needed. We now present a novel architecture for the processing of SNNs that allows slower but scalable operation at a highly reduced hardware cost. The architecture processes all synapses serially and also performs all arithmetic serially (SPSA). This results in a very small implementation of the PE (only 4 4-LUTs!) but in longer
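The “serial arithmetic” in such a PE means bit-serial computation: operands stream through least-significant-bit first, one bit per clock cycle, so an adder reduces to a single full adder plus one carry flip-flop, which is how the PE can shrink to a handful of LUTs. A behavioural sketch (Python stands in for the hardware; helper names are illustrative):

```python
def bit_serial_add(a_bits, b_bits):
    """LSB-first bit-serial addition: one full adder and one carry
    register, consuming one bit pair per clock cycle."""
    carry = 0
    out = []
    for a, b in zip(a_bits, b_bits):
        s = a ^ b ^ carry                        # sum bit of the full adder
        carry = (a & b) | (carry & (a ^ b))      # carry into the next cycle
        out.append(s)
    return out

def to_bits(x, width):
    """Unsigned integer to an LSB-first bit list."""
    return [(x >> i) & 1 for i in range(width)]

def from_bits(bits):
    """LSB-first bit list back to an unsigned integer."""
    return sum(b << i for i, b in enumerate(bits))
```

An n-bit addition thus takes n clock cycles instead of one, which is exactly the area/time trade-off this architecture exploits.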
Comparison
The three architectures have different area, memory and time scaling properties. An approximation of the requirements for the three designs is given in Table 2, Table 3, Table 4. To compare the designs, Fig. 7 shows the number of 4-LUTs per PE, the memory usage (RAM and FF combined) and the number of clock cycles per time-step needed for each of the three architectures with respect to the number of inputs and the number of neurons. The other settings are held fixed.
System architecture
The speech recognition application needs more than just the simulation of the SNN. The incoming speech signal is preprocessed by the Lyon cochlear ear model; the spikes emitted by all neurons in the reservoir are filtered by a first-order low-pass filter and down-sampled, and the 10 weighted sums, one for each readout, need to be computed. The result is further postprocessed to arrive at a single class for a given speech sample. This complete setup has been implemented on the ML401
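The post-reservoir part of this pipeline can be sketched functionally as follows (the filter coefficient, downsampling factor, array shapes and function names are illustrative placeholders, not the implemented fixed-point hardware):

```python
import numpy as np

def classify(spikes, W_out, alpha=0.99, subsample=10):
    """Low-pass filter each neuron's spike train, downsample, apply the
    10 linear readouts, and pick the class with the largest summed
    output over the utterance."""
    T, N = spikes.shape
    filtered = np.zeros((T, N))
    state = np.zeros(N)
    for t in range(T):                 # first-order low-pass per neuron
        state = alpha * state + (1 - alpha) * spikes[t]
        filtered[t] = state
    x = filtered[::subsample]          # downsample the smoothed traces
    y = x @ W_out                      # one weighted sum per readout/class
    return int(np.argmax(y.sum(axis=0)))
```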
Conclusions and future work
In this work we show that real-time speech recognition is possible on limited FPGA hardware using an LSM. To attain this we first explored existing hardware architectures (which we re-implemented and improved) for compact implementation of SNNs. These designs are however more than 200 times faster than real-time, which is not desired because lots of hardware resources are spent on speed that is not needed. We present a novel hardware architecture based on serial processing of dendritic trees
Acknowledgements
The first author’s work is partially funded by the FWO Flanders project G.0317.05. David Verstraeten and Michiel D’Haene are sponsored by the Institute for the Promotion of Innovation through Science and Technology in Flanders (IWT-Vlaanderen).
References

- Bohte, S. M., Poutré, H. L., & Kok, J. N. (2002). Error-backpropagation in temporally encoded networks of spiking neurons. Neurocomputing.
- de Garis, H., et al. (2000). Building an artificial brain using an FPGA based CAM-Brain Machine. Applied Mathematics and Computation.
- Maass, W. (1997). Networks of spiking neurons: The third generation of neural network models. Neural Networks.
- Softky, W. (1996). Fine analog coding minimizes information transmission. Neural Networks.
- Upegui, A., et al. (2005). An FPGA platform for on-line topology exploration of spiking neural networks. Microprocessors and Microsystems.
- Verstraeten, D., Schrauwen, B., D’Haene, M., & Stroobandt, D. (2007). A unifying comparison of reservoir computing methods. Neural Networks.
- Verstraeten, D., Schrauwen, B., Stroobandt, D., & Van Campenhout, J. (2005). Isolated word recognition with the liquid state machine: A case study. Information Processing Letters.
- Anguita, D., & Valle, M. (2001). Perspectives on dedicated hardware implementations. In Proceedings of the European...
- Aybay, I., et al. (1996). Classification of neural network hardware.
- Bellis, S., Razeeb, K. M., Saha, C., Delaney, K., O’Mathuna, C., & Pounds-Cornish, A., et al. (2004). FPGA...
- Burr, J. (1991). Digital neural network implementations.
- Delorme, A., & Thorpe, S. (2001). Face identification using one spike per neuron: Resistance to image degradations. Neural Networks.
- Floreano, D., et al. (2002). Evolutionary bits’n’spikes.
- Floreano, D., et al. (2004). From wheels to wings with evolutionary spiking neurons. Artificial Life.
- Gerstner, W., & Kistler, W. (2002). Spiking neuron models.
- Izhikevich, E. M. (2004). Which model to use for cortical spiking neurons? IEEE Transactions on Neural Networks.
- Jaeger, H., & Haas, H. (2004). Harnessing nonlinearity: Predicting chaotic systems and saving energy in wireless telecommunication. Science.
- Jahnke, A., et al. Simulation of spiking neural networks on different hardware platforms.
- New directions in statistical signal processing: From systems to brain.
☆ An abbreviated version of some portions of this article appeared in Schrauwen, D’Haene, Verstraeten, and Van Campenhout (2007) as part of the IJCNN 2007 Conference Proceedings, published under IEEE copyright.