Re-visiting the echo state property
Highlights
- A widely used criterion is shown to be insufficient for the echo state property.
- Novel algebraic conditions are provided for the echo state property.
- Users can benefit from the simple recipes provided for the echo state property.
- A new definition for the echo state property is provided.
- The scaling of the spectral radius is discussed for end-users.
Introduction
Echo state networks (ESNs) (Jaeger, 2001; Jaeger & Haas, 2004) provide an architecture and a supervised learning principle for recurrent neural networks (RNNs). The main idea is (i) to drive a random, large, fixed recurrent neural network with the input signal, thereby inducing in each neuron within this "reservoir" network a nonlinear response signal, and (ii) to obtain a desired output signal as a trainable linear combination of all of these response signals. The internal weights of the underlying reservoir network are not changed by the learning; only the reservoir-to-output connections are trained.
This basic functional principle is shared with Liquid State Machines (LSM), which were developed independently from and simultaneously with ESNs by Maass, Natschläger, and Markram (2002). An earlier precursor is a biological neural learning mechanism investigated by Peter F. Dominey in the context of modeling sequence processing in mammalian brains (Dominey, 1995). Increasingly often, LSMs, ESNs and some other related methods are subsumed under the name of reservoir computing (introduction: Jaeger (2007), survey of current trends: Lukoševičius and Jaeger (2009)). Today, reservoir computing has established itself as one of the standard approaches to supervised RNN training.
A crucial, enabling precondition for ESN learning algorithms to function is that the underlying reservoir network possesses the echo state property (ESP). Roughly speaking, the ESP is a condition of asymptotic state convergence of the reservoir network under the influence of driving input. The ESP is connected to algebraic properties of the reservoir weight matrix and to properties of the driving input, and it is a rather subtle mathematical concept. Often the ESP is violated if the spectral radius of the weight matrix exceeds unity. Conversely, under rather general conditions, the ESP is obtained most of the time when the spectral radius is smaller than unity. This combination of facts has led to a widespread misconception that all one has to do in order to obtain the ESP is to scale the reservoir weight matrix to a spectral radius below unity. We witness that a significant fraction, possibly even a majority, of "end-users" of reservoir computing fall prey to this misconception. In fact, neither does a spectral radius below unity generally ensure the ESP, nor does a spectral radius above unity generally destroy it. In numerous applications, depending on the nature of the driving input and of the desired readout signal, a spectral radius well above unity serves best. The widespread practice of scaling the spectral radius to below unity thus leads to an under-exploitation of the learning and modeling capacities of reservoirs.
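The common practice criticized above, scaling a random reservoir matrix to a target spectral radius, can be sketched as follows (a minimal NumPy sketch; the matrix size, seed and target value are illustrative, and, as argued above, this scaling alone does not guarantee the ESP):

```python
import numpy as np

rng = np.random.default_rng(42)

def scale_to_spectral_radius(W, rho_target):
    """Rescale W so that its spectral radius equals rho_target.

    Note: this is the widespread recipe, NOT a guarantee of the ESP."""
    rho = max(abs(np.linalg.eigvals(W)))
    return W * (rho_target / rho)

W_raw = rng.uniform(-1.0, 1.0, size=(100, 100))  # raw random reservoir
W = scale_to_spectral_radius(W_raw, 0.95)

print(round(max(abs(np.linalg.eigvals(W))), 6))  # → 0.95
```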
Here we re-visit the ESP, with the general aim to illuminate this concept from several sides for the practical benefit of reservoir computing practice. Besides this didactic goal, the technical contribution of this article is twofold. First, after summarizing the standard formalism and ESP definition in Section 2, we present a bifurcation analysis to show in detail how the ESP can be lost even for spectral radii below unity (Section 3). Second, we derive a new, convenient-to-use formulation of a sufficient algebraic criterion for the ESP (Section 4). Then, in Section 5, we comment on situations where the ESP is obtained for spectral radii exceeding unity, which are of significant practical importance. We conclude with a short appreciation of the entire subject in a final discussion section.
Section snippets
Echo state networks
In this section we define the standard ESN and the echo state property.
The standard discrete-time ESN, which we denote shortly by (W, W_in, W_fb, W_out), is defined as follows:

x(n+1) = f(W x(n) + W_in u(n+1) + W_fb y(n)),
y(n) = g(W_out x(n)),

where W is the internal weight matrix of the reservoir, W_in is the input matrix, W_fb is the feedback matrix, W_out is the output matrix, and x(n), u(n) and y(n) are the internal, input and output vectors at time n, respectively (see Fig. 1).
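The reservoir update defined in this section can be sketched in a few lines (a minimal NumPy sketch assuming a tanh nonlinearity and omitting the output feedback term; the sizes, seed and input signal are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
N, K = 50, 1                               # reservoir and input sizes (illustrative)
W = 0.1 * rng.uniform(-1, 1, (N, N))       # internal (reservoir) weight matrix
W_in = rng.uniform(-1, 1, (N, K))          # input weight matrix

def update(x, u):
    # x(n+1) = tanh(W x(n) + W_in u(n+1)); output feedback omitted for brevity
    return np.tanh(W @ x + W_in @ u)

x = np.zeros(N)
for u in rng.standard_normal((200, K)):    # drive the reservoir with random input
    x = update(x, u)
print(x.shape)  # → (50,)
```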
Bifurcations in 2-dim echo state networks
Here we investigate ESNs with internal weight matrix W and a spectral radius ρ(W) < 1 where the network does not have the echo state property. We will constrain our analysis to the constant zero-input case, that is, u(n) = 0 for all n, because this basic case already supports the present arguments. In other words, we are interested in matrices W with ρ(W) < 1 for which the system does not have the echo state property. In particular, we investigate some bifurcation types which yield systems with
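The zero-input convergence at stake here is easy to probe empirically: run two trajectories of x(n+1) = tanh(W x(n)) from different initial states and check whether they merge. A minimal NumPy sketch with an illustrative 2-dim matrix (a numerical probe, not a proof; for the ESP-losing matrices analyzed in this section, the distance would not shrink):

```python
import numpy as np

def distance_after(W, steps=100):
    """Distance between two zero-input trajectories x(n+1) = tanh(W x(n))
    started from different initial states; a value near zero is consistent
    with the ESP at zero input (numerical probe, not a proof)."""
    x_a = np.array([0.9, -0.9])
    x_b = np.array([-0.9, 0.9])
    for _ in range(steps):
        x_a = np.tanh(W @ x_a)
        x_b = np.tanh(W @ x_b)
    return np.linalg.norm(x_a - x_b)

# Illustrative 2-dim weight matrix: its Frobenius norm is below 1, so the
# zero-input map is a global contraction and the trajectories must merge.
W = np.array([[0.4, 0.3],
              [-0.2, 0.1]])
print(distance_after(W) < 1e-6)  # → True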
New sufficient conditions for the echo state property
In this section we provide sufficient conditions for the echo state property of the standard and leaky integrator ESNs. These sufficient conditions are important because, in practice, less restrictive conditions are typically used which do not guarantee the echo state property. For the standard ESNs (Eq. (1)), one usually samples a random internal weight matrix W and subsequently scales this connectivity matrix to ensure that its spectral radius is less than unity, ρ(W) < 1. In Section 3, we
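A stricter but genuinely sufficient alternative to the spectral-radius recipe is to bound the largest singular value: if σ_max(W) < 1, the tanh reservoir map is a contraction in the state for any input, which is the classical sufficient condition for the ESP. A minimal NumPy sketch of this more conservative scaling (the target value 0.95 and matrix size are illustrative):

```python
import numpy as np

rng = np.random.default_rng(7)

def scale_to_max_singular_value(W, target=0.95):
    """Scale W so that sigma_max(W) equals target < 1; tanh(W x + ...) is then
    a contraction in x for any input, a sufficient condition for the ESP."""
    return W * (target / np.linalg.norm(W, 2))  # ord=2: largest singular value

W = scale_to_max_singular_value(rng.uniform(-1.0, 1.0, size=(100, 100)))
print(round(np.linalg.norm(W, 2), 6))  # → 0.95
```

This condition is stricter than ρ(W) < 1, since ρ(W) ≤ σ_max(W) always holds; the price is a more strongly damped reservoir.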
A new definition for the echo state property
So far, we have investigated how the ESP can be lost for a spectral radius below unity, in the case of zero input. Furthermore, for zero input a spectral radius not exceeding unity is a necessary condition for the ESP. Obviously, in practical applications one will usually have nonzero input. For nonzero input, the ESP can be obtained even with a spectral radius exceeding unity. A very simple example is the one-dimensional "reservoir" x(n+1) = tanh(w x(n) + c) with |w| > 1, driven by a sufficiently strong constant input c. A quick look
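The one-dimensional case is easy to simulate. A minimal sketch with illustrative values w = 2 (so the "spectral radius" exceeds unity) and constant input c = 3: two very different initial states merge within a few steps, because the strong input pins the state into the saturated, contracting region of tanh:

```python
import math

w, c = 2.0, 3.0          # illustrative: "spectral radius" w > 1, constant input c
x1, x2 = -0.99, 0.99     # two very different initial states
for _ in range(50):
    x1 = math.tanh(w * x1 + c)
    x2 = math.tanh(w * x2 + c)
print(abs(x1 - x2) < 1e-9)  # → True: the input washes out the initial condition
```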
Discussion
In this article we discussed, from various angles, the echo state property (ESP) and the closely related issue of the spectral radius of the reservoir weight matrix. The main technical contribution is a detailed analysis of how the ESP is lost for specific weight patterns even when the spectral radius is below unity. Furthermore, we provided a novel algebraic criterion which is sufficient for the ESP for any input in reservoirs whose nonlinearity has a derivative bounded in [−1, 1] (such as
References (24)
- On discrete-time diagonal and D-stability. Linear Algebra and its Applications (1993)
- Optimization and applications of echo state networks with leaky-integrator neurons. Neural Networks (2007)
- Computation at the edge of chaos: phase transitions and emergent computation. Physica D (1990)
- Reservoir computing approaches to recurrent neural network training. Computer Science Review (2009)
- The complex structured singular value. Automatica (1993)
- A neurodynamical model for working memory. Neural Networks (2011)
- Real-time computation at the edge of chaos in recurrent neural networks. Neural Computation (2004)
- At the edge of chaos: real-time computations and self-organized criticality in recurrent neural networks
- Connectivity, dynamics, and memory in reservoir computing with binary and analog neurons. Neural Computation (2010)
- A tighter bound for the echo state property. IEEE Transactions on Neural Networks (2006)
- Complex sensory-motor sequence learning based on recurrent state representation and reinforcement learning. Biological Cybernetics
- Global stability of a class of discrete-time recurrent neural networks. IEEE Transactions on Circuits and Systems I: Fundamental Theory and Applications
Cited by (407)
- Euler State Networks: Non-dissipative Reservoir Computing. Neurocomputing (2024)
- Multi-reservoir echo state network with five-elements cycle. Information Sciences (2024)
- Echo state network structure optimization algorithm based on correlation analysis. Applied Soft Computing (2024)
- Reservoir computing with error correction: Long-term behaviors of stochastic dynamical systems. Physica D: Nonlinear Phenomena (2023)
- Continual adaptation of federated reservoirs in pervasive environments. Neurocomputing (2023)