Re-visiting the echo state property
Highlights
- A widely used criterion is shown to be insufficient for the echo state property.
- Novel algebraic conditions are provided for the echo state property.
- Users can benefit from the simple recipes provided for the echo state property.
- A new definition for the echo state property is provided.
- The scaling of the spectral radius is discussed for end-users.
Introduction
Echo state networks (ESNs) (Jaeger, 2001; Jaeger & Haas, 2004) provide an architecture and a supervised learning principle for recurrent neural networks (RNNs). The main idea is (i) to drive a random, large, fixed recurrent neural network with the input signal, thereby inducing in each neuron within this "reservoir" network a nonlinear response signal, and (ii) to obtain a desired output signal as a trainable linear combination of all of these response signals. The internal weights of the underlying reservoir network are not changed by the learning; only the reservoir-to-output connections are trained.
This basic functional principle is shared with Liquid State Machines (LSM), which were developed independently from and simultaneously with ESNs by Maass, Natschläger, and Markram (2002). An earlier precursor is a biological neural learning mechanism investigated by Peter F. Dominey in the context of modeling sequence processing in mammalian brains (Dominey, 1995). Increasingly often, LSMs, ESNs and some other related methods are subsumed under the name of reservoir computing (introduction: Jaeger (2007), survey of current trends: Lukoševičius and Jaeger (2009)). Today, reservoir computing has established itself as one of the standard approaches to supervised RNN training.
A crucial, enabling precondition for ESN learning algorithms to function is that the underlying reservoir network possesses the echo state property (ESP). Roughly speaking, the ESP is a condition of asymptotic state convergence of the reservoir network under the influence of driving input. The ESP is connected to algebraic properties of the reservoir weight matrix and to properties of the driving input, and it is a rather subtle mathematical concept. Often the ESP is violated if the spectral radius of the weight matrix exceeds unity. Conversely, under rather general conditions, the ESP is obtained most of the time when the spectral radius is smaller than unity. This combination of facts has led to a widespread misconception that all one has to do in order to obtain the ESP is to scale the reservoir weight matrix to a spectral radius below unity. We witness that a significant fraction, possibly even a majority, of "end-users" of reservoir computing fall prey to this misconception. In fact, neither does a spectral radius below unity generally ensure the ESP, nor does a spectral radius above unity generally destroy it. In numerous applications, depending on the nature of the driving input and of the desired readout signal, a spectral radius well above unity serves best. The widespread practice of scaling the spectral radius to below unity thus leads to an under-exploitation of the learning and modeling capacities of reservoirs.
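The common practice criticized above, scaling a random reservoir matrix to a target spectral radius, can be sketched as follows (a minimal NumPy sketch; the matrix size, seed and target value are illustrative, and, as argued above, this scaling alone does not guarantee the ESP):

```python
import numpy as np

rng = np.random.default_rng(42)

def scale_to_spectral_radius(W, rho_target):
    """Rescale W so that its spectral radius equals rho_target.

    Note: this is the widespread recipe, NOT a guarantee of the ESP."""
    rho = max(abs(np.linalg.eigvals(W)))
    return W * (rho_target / rho)

W_raw = rng.uniform(-1.0, 1.0, size=(100, 100))  # raw random reservoir
W = scale_to_spectral_radius(W_raw, 0.95)

print(round(max(abs(np.linalg.eigvals(W))), 6))  # → 0.95
```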
Here we re-visit the ESP, with the general aim to illuminate this concept from several sides for the practical benefit of reservoir computing practice. Besides this didactic goal, the technical contribution of this article is twofold. First, after summarizing the standard formalism and ESP definition in Section 2, we present a bifurcation analysis to show in detail how the ESP can be lost even for spectral radii below unity (Section 3). Second, we derive a new, convenient-to-use formulation of a sufficient algebraic criterion for the ESP (Section 4). Then, in Section 5, we comment on situations where the ESP is obtained for spectral radii exceeding unity, which are of significant practical importance. We conclude with a short appreciation of the entire subject in a final discussion section.
Section snippets
Echo state networks
In this section we define the standard ESN and the echo state property.
The standard discrete-time ESN, which we denote shortly by (W, W_in, W_fb, W_out), is defined as follows:

x(n+1) = f(W x(n) + W_in u(n+1) + W_fb y(n)),
y(n) = g(W_out x(n)),

where W is the internal weight matrix of the reservoir, W_in is the input matrix, W_fb is the feedback matrix, W_out is the output matrix, and x(n), u(n) and y(n) are the internal, input and output vectors at time n, respectively (see Fig. 1).
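The reservoir update defined in this section can be sketched in a few lines (a minimal NumPy sketch assuming a tanh nonlinearity and omitting the output feedback term; the sizes, seed and input signal are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
N, K = 50, 1                               # reservoir and input sizes (illustrative)
W = 0.1 * rng.uniform(-1, 1, (N, N))       # internal (reservoir) weight matrix
W_in = rng.uniform(-1, 1, (N, K))          # input weight matrix

def update(x, u):
    # x(n+1) = tanh(W x(n) + W_in u(n+1)); output feedback omitted for brevity
    return np.tanh(W @ x + W_in @ u)

x = np.zeros(N)
for u in rng.standard_normal((200, K)):    # drive the reservoir with random input
    x = update(x, u)
print(x.shape)  # → (50,)
```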
Bifurcations in 2-dim echo state networks
Here we investigate ESNs with internal weight matrix W and a spectral radius ρ(W) < 1 where the network does not have the echo state property. We will constrain our analysis to the constant zero-input case, that is, u(n) = 0 for all n, because this basic case already supports the present arguments. In other words, we are interested in matrices W with ρ(W) < 1 for which the system does not have the echo state property. In particular, we investigate some bifurcation types which yield systems with
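The zero-input convergence at stake here is easy to probe empirically: run two trajectories of x(n+1) = tanh(W x(n)) from different initial states and check whether they merge. A minimal NumPy sketch with an illustrative 2-dim matrix (a numerical probe, not a proof; for the ESP-losing matrices analyzed in this section, the distance would not shrink):

```python
import numpy as np

def distance_after(W, steps=100):
    """Distance between two zero-input trajectories x(n+1) = tanh(W x(n))
    started from different initial states; a value near zero is consistent
    with the ESP at zero input (numerical probe, not a proof)."""
    x_a = np.array([0.9, -0.9])
    x_b = np.array([-0.9, 0.9])
    for _ in range(steps):
        x_a = np.tanh(W @ x_a)
        x_b = np.tanh(W @ x_b)
    return np.linalg.norm(x_a - x_b)

# Illustrative 2-dim weight matrix: its Frobenius norm is below 1, so the
# zero-input map is a global contraction and the trajectories must merge.
W = np.array([[0.4, 0.3],
              [-0.2, 0.1]])
print(distance_after(W) < 1e-6)  # → True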
New sufficient conditions for the echo state property
In this section we provide sufficient conditions for the echo state property of the standard and leaky integrator ESNs. These sufficient conditions are important because, in practice, less restrictive conditions are typically used which do not guarantee the echo state property. For the standard ESNs (Eq. (1)), one usually samples a random internal weight matrix W and subsequently scales this connectivity matrix to ensure that its spectral radius is less than unity, ρ(W) < 1. In Section 3, we
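A stricter but genuinely sufficient alternative to the spectral-radius recipe is to bound the largest singular value: if σ_max(W) < 1, the tanh reservoir map is a contraction in the state for any input, which is the classical sufficient condition for the ESP. A minimal NumPy sketch of this more conservative scaling (the target value 0.95 and matrix size are illustrative):

```python
import numpy as np

rng = np.random.default_rng(7)

def scale_to_max_singular_value(W, target=0.95):
    """Scale W so that sigma_max(W) equals target < 1; tanh(W x + ...) is then
    a contraction in x for any input, a sufficient condition for the ESP."""
    return W * (target / np.linalg.norm(W, 2))  # ord=2: largest singular value

W = scale_to_max_singular_value(rng.uniform(-1.0, 1.0, size=(100, 100)))
print(round(np.linalg.norm(W, 2), 6))  # → 0.95
```

This condition is stricter than ρ(W) < 1, since ρ(W) ≤ σ_max(W) always holds; the price is a more strongly damped reservoir.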
A new definition for the echo state property
So far, we have investigated how the ESP can be lost for a spectral radius below unity, in the case of zero input. Furthermore, for zero input a spectral radius not exceeding unity is a necessary condition for the ESP. Obviously, in practical applications one will usually have nonzero input. For nonzero input, the ESP can be obtained even with a spectral radius exceeding unity. A very simple example is the one-dimensional "reservoir" x(n+1) = tanh(w x(n) + c) with |w| > 1, driven by a sufficiently strong constant input c. A quick look
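The one-dimensional case is easy to simulate. A minimal sketch with illustrative values w = 2 (so the "spectral radius" exceeds unity) and constant input c = 3: two very different initial states merge within a few steps, because the strong input pins the state into the saturated, contracting region of tanh:

```python
import math

w, c = 2.0, 3.0          # illustrative: "spectral radius" w > 1, constant input c
x1, x2 = -0.99, 0.99     # two very different initial states
for _ in range(50):
    x1 = math.tanh(w * x1 + c)
    x2 = math.tanh(w * x2 + c)
print(abs(x1 - x2) < 1e-9)  # → True: the input washes out the initial condition
```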
Discussion
In this article we discussed, from various angles, the echo state property (ESP) and the closely related issue of the spectral radius of the reservoir weight matrix. The main technical contribution is a detailed analysis of how the ESP is lost for specific weight patterns even when the spectral radius is below unity. Furthermore, we provided a novel algebraic criterion which is sufficient for the ESP for any input in reservoirs whose nonlinearity has a derivative bounded in [−1, 1] (such as
References (24)
- On discrete-time diagonal and D-stability. Linear Algebra and its Applications (1993)
- Optimization and applications of echo state networks with leaky-integrator neurons. Neural Networks (2007)
- Computation at the edge of chaos: phase transitions and emergent computation. Physica D (1990)
- Reservoir computing approaches to recurrent neural network training. Computer Science Review (2009)
- The complex structured singular value. Automatica (1993)
- A neurodynamical model for working memory. Neural Networks (2011)
- Real-time computation at the edge of chaos in recurrent neural networks. Neural Computation (2004)
- At the edge of chaos: real-time computations and self-organized criticality in recurrent neural networks
- Connectivity, dynamics, and memory in reservoir computing with binary and analog neurons. Neural Computation (2010)
- A tighter bound for the echo state property. IEEE Transactions on Neural Networks (2006)
- Complex sensory-motor sequence learning based on recurrent state representation and reinforcement learning. Biological Cybernetics
- Global stability of a class of discrete-time recurrent neural networks. IEEE Transactions on Circuits and Systems I: Fundamental Theory and Applications
Cited by (407)
- Euler State Networks: Non-dissipative Reservoir Computing. Neurocomputing (2024)
- Multi-reservoir echo state network with five-elements cycle. Information Sciences (2024)
- Echo state network structure optimization algorithm based on correlation analysis. Applied Soft Computing (2024)
- Reservoir computing with error correction: Long-term behaviors of stochastic dynamical systems. Physica D: Nonlinear Phenomena (2023)
- Continual adaptation of federated reservoirs in pervasive environments. Neurocomputing (2023)