Echo State Networks

What are Echo State Networks?

Echo State Networks (ESNs) are a kind of recurrent neural network (RNN) with a sparsely connected hidden layer (it usually has less than 10% connectivity). They provide both an architecture and a supervised learning principle for RNNs and are part of the reservoir computing framework.

The connectivity and weights of the hidden layer's (reservoir) neurons are randomly assigned and then fixed (not trainable). Only the weights of the output neurons are trainable, and they are learned so that the network can produce or reproduce particular temporal patterns.

The reservoir creates a nonlinear embedding of the input. This embedding is then connected to the desired output, and only these final output weights are trained.

So, the aim of Echo State Networks is to drive a big, random, fixed RNN with the input signal, thus inducing a nonlinear response signal in every neuron in the reservoir, and then to connect these response signals to a desired output signal using a trainable linear combination of all of them.
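
The idea of driving a fixed random reservoir with an input signal can be sketched in a few lines of NumPy. This is only an illustrative sketch: the names `make_reservoir` and `run_reservoir`, the uniform weight ranges, the 10% density, and the scaling to a spectral radius of 0.9 are assumptions chosen for the example, not values prescribed by the text.

```python
import numpy as np

def make_reservoir(n_inputs, n_reservoir, density=0.1, spectral_radius=0.9, seed=0):
    """Build fixed random input and reservoir weights (hypothetical helper)."""
    rng = np.random.default_rng(seed)
    W_in = rng.uniform(-0.5, 0.5, size=(n_reservoir, n_inputs))
    W = rng.uniform(-0.5, 0.5, size=(n_reservoir, n_reservoir))
    # Keep only about `density` of the recurrent connections (sparse reservoir).
    W *= rng.random((n_reservoir, n_reservoir)) < density
    # Rescale so the largest absolute eigenvalue equals `spectral_radius`.
    W *= spectral_radius / np.max(np.abs(np.linalg.eigvals(W)))
    return W_in, W

def run_reservoir(W_in, W, inputs):
    """Drive the reservoir with `inputs` (shape: time x n_inputs) and collect states."""
    x = np.zeros(W.shape[0])
    states = np.empty((len(inputs), W.shape[0]))
    for t, u in enumerate(inputs):
        # Nonlinear response of every reservoir neuron to the input and the previous state.
        x = np.tanh(W_in @ u + W @ x)
        states[t] = x
    return states

u = np.sin(0.2 * np.arange(200))[:, None]   # toy input signal, one input channel
W_in, W = make_reservoir(n_inputs=1, n_reservoir=100)
states = run_reservoir(W_in, W, u)
print(states.shape)  # (200, 100)
```

Each row of `states` is the nonlinear embedding of the input history at one time step; a trainable linear readout would be fitted on top of these rows.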

The main concept behind Echo State Networks is closely tied to Liquid State Machines (LSMs), which Wolfgang Maass developed independently of, and at the same time as, echo state networks. Liquid state machines, echo state networks, and the more recently researched Backpropagation Decorrelation learning rule for recurrent neural networks are commonly grouped under the umbrella of reservoir computing.

Figure: echo state network (ESN). Source: ResearchGate

What is the Echo State Property?

For the Echo State Network to work, the reservoir needs to have the Echo State Property (ESP). This property relates asymptotic properties of the excited reservoir dynamics to the driving signal.

The property states that the reservoir asymptotically washes out any information from initial conditions, so that after enough time the reservoir state depends only on the history of the input.

This property is guaranteed for additive-sigmoid neuron reservoirs, when the reservoir weight matrix and the leaking rates fulfill specific algebraic conditions in terms of singular values.

In reservoirs that have a tanh sigmoid, the Echo State Property is violated for zero input if the spectral radius of the reservoir weight matrix is larger than unity. Conversely, it is empirically observed that the ESP holds for any input if this spectral radius is smaller than unity. This is a rule of thumb rather than a guarantee, so a spectral radius below 1 should not be treated as identical to the ESP.
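
The washing-out of initial conditions can be checked numerically. The sketch below is an assumption-laden illustration: it scales the reservoir matrix so that its largest singular value is 0.9 (a conservative sufficient condition, which also bounds the spectral radius by 0.9), starts two copies of the reservoir from different random states, and drives both with the same input. All sizes, seeds, and variable names are made up for the example.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 50

# Sparse random reservoir, rescaled so its largest singular value is 0.9.
W = rng.normal(size=(n, n)) * (rng.random((n, n)) < 0.1)
W *= 0.9 / np.linalg.norm(W, 2)
W_in = rng.uniform(-1, 1, size=n)
u = rng.uniform(-1, 1, size=300)          # shared driving signal

# Two different initial reservoir states.
x_a = rng.uniform(-1, 1, size=n)
x_b = rng.uniform(-1, 1, size=n)
gap_start = np.linalg.norm(x_a - x_b)

for u_t in u:
    x_a = np.tanh(W_in * u_t + W @ x_a)
    x_b = np.tanh(W_in * u_t + W @ x_b)

gap_end = np.linalg.norm(x_a - x_b)
spectral_radius = np.max(np.abs(np.linalg.eigvals(W)))
print(gap_start, gap_end, spectral_radius)
```

Because tanh never stretches distances and the matrix shrinks them by at least a factor of 0.9 per step, the gap between the two trajectories decays geometrically, which is exactly the "echo" of the initial state fading away.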

Why should you use Echo State Networks?

Echo State Networks do not suffer from the vanishing/exploding gradient problem, which causes the parameters in the hidden layers either to barely change or to become numerically unstable and behave chaotically. The reservoir weights are never trained, so no gradients are propagated through them.

While traditional neural networks are computationally expensive to train, ESNs tend to be fast because there is no backpropagation phase on the reservoir.

Echo State Networks are effective at handling chaotic time series and are not disrupted by bifurcations, unlike traditional neural networks.

Before echo state networks were introduced, recurrent neural networks were hardly ever used in practice. Adjusting their connections was complex because of the lack of autodifferentiation and the susceptibility to vanishing and exploding gradients.

RNN training algorithms used to be very slow and were often vulnerable to issues such as branching errors, so convergence could not be guaranteed. Echo state network training, in contrast, does not have a problem with branching and is easy to implement. In early studies, echo state networks demonstrated that they could perform rather well on time series prediction tasks from synthetic datasets.

How do echo state networks work?

The echo state network makes use of a very sparsely connected hidden layer (often with around 1% connectivity). The connectivity and weights of hidden neurons are fixed and are assigned on a random basis. The weights of output neurons can be learned, enabling the network to produce or reproduce specific temporal patterns. The most interesting part of this network is that, in spite of its behaviour being non-linear, the only weights that are modified during training are those of the synapses that connect the hidden neurons to the output neurons. Because of this, the error function is quadratic with respect to the parameter vector, and setting its derivative to zero yields a linear system that is easy to solve.
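
Because the error is quadratic in the output weights, they can be obtained by solving one linear system. The sketch below assumes a ridge-regularized least-squares readout on a toy one-step-ahead sine prediction task; the regularization strength `ridge`, the `washout` length, the reservoir size, and all variable names are illustrative assumptions rather than recommended settings.

```python
import numpy as np

rng = np.random.default_rng(0)
n_res, washout, ridge = 100, 50, 1e-6

# Fixed random reservoir (never trained).
W_in = rng.uniform(-0.5, 0.5, size=n_res)
W = rng.uniform(-0.5, 0.5, size=(n_res, n_res)) * (rng.random((n_res, n_res)) < 0.1)
W *= 0.9 / np.max(np.abs(np.linalg.eigvals(W)))

# Toy task: predict the next value of a sine wave.
signal = np.sin(0.2 * np.arange(501))
u, y = signal[:-1], signal[1:]

# Collect reservoir states while driving the network with the input.
x = np.zeros(n_res)
states = np.empty((len(u), n_res))
for t, u_t in enumerate(u):
    x = np.tanh(W_in * u_t + W @ x)
    states[t] = x

# Drop the initial transient and append a constant bias feature.
X = np.hstack([states[washout:], np.ones((len(u) - washout, 1))])
Y = y[washout:]

# Setting the gradient of ||X w - Y||^2 + ridge * ||w||^2 to zero gives
# the linear system (X^T X + ridge * I) w = X^T Y.
W_out = np.linalg.solve(X.T @ X + ridge * np.eye(X.shape[1]), X.T @ Y)

pred = X @ W_out
rel_mse = np.mean((pred - Y) ** 2) / np.var(Y)
print(rel_mse)
```

Only `W_out` is learned here; `W_in` and `W` stay exactly as they were drawn.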

The main approach that an echo state network follows is to first drive a large, random, fixed recurrent neural network with the input signal, thus inducing a nonlinear response signal in each neuron within this "reservoir" network, and then to connect these response signals to a desired output signal by a trainable linear combination of all of them.

Echo state networks also support autonomous operation in prediction. If you train the echo state network with an input that is a backshifted version of the output, you can then use it for signal generation or prediction by feeding the previous output back in as the next input.
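
A minimal sketch of this autonomous mode, under illustrative assumptions: the readout is first trained to predict the next sample of a sine wave from the current one (the input is the output shifted back by one step), and the network is then run on its own predictions. The helper `step`, the ridge value, and all sizes are hypothetical choices for the example.

```python
import numpy as np

rng = np.random.default_rng(0)
n_res, washout, ridge = 100, 100, 1e-6

W_in = rng.uniform(-0.5, 0.5, size=n_res)
W = rng.uniform(-0.5, 0.5, size=(n_res, n_res)) * (rng.random((n_res, n_res)) < 0.1)
W *= 0.9 / np.max(np.abs(np.linalg.eigvals(W)))

def step(x, u_t):
    """One reservoir update for a scalar input (hypothetical helper)."""
    return np.tanh(W_in * u_t + W @ x)

signal = np.sin(0.2 * np.arange(1001))

# Training pass: the input at time t is the true signal, the target is signal[t + 1].
x = np.zeros(n_res)
states = np.empty((1000, n_res))
for t in range(1000):
    x = step(x, signal[t])
    states[t] = x

X, Y = states[washout:], signal[washout + 1:]
W_out = np.linalg.solve(X.T @ X + ridge * np.eye(n_res), X.T @ Y)

# Autonomous pass: feed each prediction back in as the next input.
generated = np.empty(200)
y_t = x @ W_out                  # prediction of signal[1000]
for k in range(200):
    generated[k] = y_t
    x = step(x, y_t)
    y_t = x @ W_out
```

How long `generated` keeps tracking the true signal depends on the reservoir size, the regularization, and the signal itself, so the free-running output should be checked against held-out data rather than assumed to be accurate.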

What are the variants of echo state networks?

There are several ways in which echo state networks can be built. You can set them up with or without directly trainable input-to-output connections, with or without output-to-reservoir feedback, with different neuron types, different internal reservoir connectivity patterns, etc. The output weights can be computed with any linear regression algorithm, whether online or offline. Besides least-squares solutions, margin maximization criteria, as used in training support vector machines, are also used to determine the output weights.
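
As one example of an online alternative to the offline least-squares solution, the readout can be updated sample by sample. The sketch below uses recursive least squares (RLS), which is one possible choice among several online linear regression algorithms; the initial scale `delta`, the forgetting factor, and the toy sine task are assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n_res = 50

W_in = rng.uniform(-0.5, 0.5, size=n_res)
W = rng.uniform(-0.5, 0.5, size=(n_res, n_res)) * (rng.random((n_res, n_res)) < 0.1)
W *= 0.9 / np.max(np.abs(np.linalg.eigvals(W)))

signal = np.sin(0.2 * np.arange(1001))
forgetting, delta = 1.0, 1e-2      # no forgetting; P starts as I / delta

w_out = np.zeros(n_res)            # readout weights, updated online
P = np.eye(n_res) / delta          # running inverse correlation estimate
x = np.zeros(n_res)
errors = np.empty(1000)

for t in range(1000):
    x = np.tanh(W_in * signal[t] + W @ x)
    err = signal[t + 1] - w_out @ x            # error before the update
    Px = P @ x
    gain = Px / (forgetting + x @ Px)
    w_out = w_out + gain * err                 # correct the readout weights
    P = (P - np.outer(gain, Px)) / forgetting  # update the inverse correlation
    errors[t] = err

print(np.mean(np.abs(errors[:100])), np.mean(np.abs(errors[-100:])))
```

The reservoir update is unchanged; only the way the output weights are estimated differs from the offline case.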

Other variants of echo state networks aim to alter the formulation to better match common models of physical systems like those typically defined by differential equations. Work in this direction even includes echo state networks which partially include physical models, hybrid echo state networks, as well as continuous-time echo state networks.
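
One common formulation that connects the discrete update to differential equations is the leaky-integrator reservoir, which can be read as a simple discretization of a continuous-time state equation. The sketch below is an illustration under assumptions: the function name `leaky_step`, the leak rates, and the reservoir settings are chosen for the example and are not taken from the text.

```python
import numpy as np

def leaky_step(x, u_t, W_in, W, leak_rate):
    """Leaky-integrator update: blend the old state with the new tanh response."""
    return (1.0 - leak_rate) * x + leak_rate * np.tanh(W_in * u_t + W @ x)

rng = np.random.default_rng(0)
n_res = 30
W_in = rng.uniform(-0.5, 0.5, size=n_res)
W = rng.uniform(-0.5, 0.5, size=(n_res, n_res)) * (rng.random((n_res, n_res)) < 0.2)
W *= 0.9 / np.max(np.abs(np.linalg.eigvals(W)))

u = np.sin(0.2 * np.arange(200))
x_plain = np.zeros(n_res)   # standard update, written out directly
x_full = np.zeros(n_res)    # leaky update with leak_rate = 1.0
x_slow = np.zeros(n_res)    # leaky update with leak_rate = 0.1
for u_t in u:
    x_plain = np.tanh(W_in * u_t + W @ x_plain)
    x_full = leaky_step(x_full, u_t, W_in, W, leak_rate=1.0)
    x_slow = leaky_step(x_slow, u_t, W_in, W, leak_rate=0.1)
```

With a leak rate of 1 the update reduces to the standard reservoir update, while smaller leak rates make each neuron integrate its input more slowly.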

The fixed recurrent neural network serves as a random, nonlinear medium whose dynamic response, the "echo", is used as a signal basis. The linear combination of this basis is then trained to reconstruct the desired output by minimizing some error criterion.