## Embedded chimera states in recurrent neural networks

### Chimera state

In order to train an RNN to display a chimera, we first selected a suitable supervisor. Following ref. ^{4}, we used a system of Kuramoto oscillators consisting of two coupled populations of *n* = 3 identical oscillators (Fig. 1, Methods). Each oscillator is coupled with strength *μ* within its subpopulation and *ν* between subpopulations, with *μ* > *ν* (Fig. 1B). With chimera-like initial conditions (see Supplementary Fig. 1), a chimera state is easily achieved in which one subpopulation synchronizes while the other oscillates asynchronously (schematically represented in Fig. 1A by six coupled metronomes). The phases generated by the oscillators, *θ*_{i}(*t*) (orange/synchronous population) and *ϕ*_{i}(*t*) (purple/asynchronous population), will be used to construct the supervisor for the RNN (Fig. 1C–E and Supplementary Movie 1).
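The two-population model can be sketched in a few lines. The following is a minimal illustration, not the authors' code: the phase-lag parameter `alpha` and the values of `mu`, `nu`, and `omega` are assumptions (standard for this class of model), and simple Euler integration is used.

```python
import numpy as np

def two_pop_kuramoto(theta0, phi0, mu=0.6, nu=0.4, alpha=1.45,
                     omega=1.0, dt=0.01, steps=5000):
    """Euler-integrate two coupled populations of n identical Kuramoto
    oscillators, with intra-coupling mu stronger than inter-coupling nu."""
    theta, phi = np.array(theta0, dtype=float), np.array(phi0, dtype=float)
    n = theta.size
    traj = np.empty((steps, 2 * n))
    for t in range(steps):
        # pairwise sine coupling within and between the two populations
        dtheta = omega \
            + (mu / n) * np.sin(theta[None, :] - theta[:, None] - alpha).sum(1) \
            + (nu / n) * np.sin(phi[None, :] - theta[:, None] - alpha).sum(1)
        dphi = omega \
            + (mu / n) * np.sin(phi[None, :] - phi[:, None] - alpha).sum(1) \
            + (nu / n) * np.sin(theta[None, :] - phi[:, None] - alpha).sum(1)
        theta = (theta + dt * dtheta) % (2 * np.pi)
        phi = (phi + dt * dphi) % (2 * np.pi)
        traj[t] = np.concatenate([theta, phi])
    return traj
```

Chimera-like initial conditions correspond to starting one population nearly in phase (`theta0`) and the other spread out (`phi0`).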

Prior to RNN training, we confirmed that the simulated network was indeed displaying a chimera with two well-known measures: the mean phase velocity^{42}, which captures the frequency of the oscillators over a given time period, and the order parameter *R*^{2,4}, which characterizes the synchronization of the system: *R* = 1 for fully synchronized systems and 0 < *R* < 1 for asynchronous systems (Methods). The mean phase velocity (Fig. 1F) shows distinct profiles for the two groups: nodes from the asynchronous group oscillate faster than nodes from the synchronized group. The order parameter (Fig. 1G) also confirms the distinct behaviors of the asynchronous group [0.4 < *R*(*t*) < 0.8] and the synchronous group [*R*(*t*) = 1]. Thus, the two-population Kuramoto network displays a chimera state, which becomes the supervisor for the RNN.
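Both diagnostics reduce to short computations on the phase time series; a minimal sketch (function names are ours):

```python
import numpy as np

def order_parameter(phases):
    """Kuramoto order parameter R(t) for a (T, n) array of phases:
    R = 1 under full synchrony, R < 1 otherwise."""
    return np.abs(np.exp(1j * phases).mean(axis=1))

def mean_phase_velocity(phases, dt):
    """Time-averaged frequency of each oscillator, computed from the
    unwrapped phases over the whole window."""
    unwrapped = np.unwrap(phases, axis=0)
    return (unwrapped[-1] - unwrapped[0]) / ((phases.shape[0] - 1) * dt)
```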

### Recurrent neural networks can be trained to display chimeras

With the chimera solution in hand, we investigated whether it could serve as an *m*-dimensional supervisor to train an RNN to collectively display a chimera. First, since the oscillators' phases are discontinuous, wrapped around the interval [0, 2*π*), the unit-circle transformations (\(\cos\theta_i\), \(\sin\theta_i\)) and (\(\cos\phi_i\), \(\sin\phi_i\)) were applied to the phases. With 2*n* oscillators, this results in an *m* = 4*n*-dimensional supervisor for the RNN (throughout the paper we use *n* = 3 and *n* = 25). We used the FORCE method^{30} to train an RNN to autonomously mimic the time series produced by the supervisor. The training is considered successful if the network alone, unguided by the supervisor, can produce an identical chimera.
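A sketch of the supervisor construction (the column ordering is our choice, not specified in the text):

```python
import numpy as np

def build_supervisor(theta, phi):
    """Map (T, n) phase arrays onto the unit circle, giving a continuous
    (T, 4n) supervisor: [cos(theta), sin(theta), cos(phi), sin(phi)]."""
    return np.hstack([np.cos(theta), np.sin(theta),
                      np.cos(phi), np.sin(phi)])
```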

The RNN is an *N*-dimensional autonomous dynamical system:

$$\tau \boldsymbol{z}^{\prime} = -\boldsymbol{z} + \boldsymbol{\omega}^{0}\boldsymbol{r}$$

(1)

$$\boldsymbol{r} = \tanh(\boldsymbol{z}).$$

(2)
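Equations (1) and (2) can be integrated directly; a minimal Euler sketch (time step, time constant, and weight statistics below are illustrative assumptions, not the paper's values):

```python
import numpy as np

def simulate_rnn(omega0, z0, tau=1.0, dt=0.05, steps=200):
    """Euler integration of tau z' = -z + omega0 r with r = tanh(z);
    returns the firing rates r over time."""
    z = np.asarray(z0, dtype=float).copy()
    rates = np.empty((steps, z.size))
    for t in range(steps):
        r = np.tanh(z)
        z = z + (dt / tau) * (-z + omega0 @ r)
        rates[t] = r
    return rates
```

With i.i.d. Gaussian weights of standard deviation \(g/\sqrt{N}\) and gain *g* > 1, such a network sits in the chaotic regime that serves as the pre-training substrate.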

The network is initially sparsely connected with a set of static weights *ω*^{0} which initiates the neurons' currents *z*_{i}(*t*) into a high-dimensional chaotic regime^{43,44,45} (Fig. 2A). During learning, a low-rank perturbation *η***d**^{⊤} is added to the weight matrix *ω*^{0}, where *η* and **d** are both *N* × *m* (Fig. 2B). The matrix **d** is used to decode the output of the network, with \(\mathbf{d}^{\top}\boldsymbol{r} = \hat{\boldsymbol{s}}\) (Fig. 2C). This output is simultaneously fed back into the network:

$$\tau \boldsymbol{z}^{\prime} = -\boldsymbol{z} + (\boldsymbol{\omega}^{0} + \boldsymbol{\eta}\mathbf{d}^{\top})\boldsymbol{r} = -\boldsymbol{z} + \boldsymbol{\omega}^{0}\boldsymbol{r} + \boldsymbol{\eta}\hat{\boldsymbol{s}}$$

(3)

The components of *η* are fixed and random, whereas the components of **d** are learned with Recursive Least Squares (RLS), which minimizes the sum-squared difference between the network output \(\hat{\boldsymbol{s}}\) and the chimera supervisor **s** (see Supplementary Fig. 2). Training is considered successful when a constant value of the matrix **d** allows the network to mimic the dynamics of the chimera.

We found that an RNN with *N* = 1500 could indeed autonomously display chimera dynamics, i.e., the chimera state with *n* = 3 shown in Fig. 2D, G (see also Supplementary Movie 2). Smaller networks (*N* = 500 and *N* = 1000) did not learn this chimera state, while a larger one (*N* = 2000) also succeeded. Note that the necessary *N* changes depending on the supervisor's size (*n* = 3 or *n* = 25) and on the constraints we enforce on the RNN (see section "FORCE method robustness" for the specific *N* used in each case). This is in line with ref. ^{30}, where more complex supervisors required more neurons in the RNN for accurate dynamical fitting.

We refer to the chimera dynamics displayed by the RNN (e.g., Fig. 2D, G and Supplementary Movie 2) as an embedded chimera within the RNN. The chimera is embedded in the sense that a linear decoder (**d**) is used to extract the chimera state from the network as a whole with a linear combination of the firing rates (Fig. 2C, D). The phases of the individual oscillators can also be reconstituted with a Hilbert Transform (Fig. 2E). To confirm the network indeed learned the chimera shown in Fig. 1, we computed both the mean phase velocity and the order parameter (Fig. 2F, G), which are identical to those in Fig. 1.

### FORCE method robustness

Next, we investigated whether chimera states could be generically embedded within RNNs that enforce specific biological constraints. We considered three different constraints. We first enforced Dale's law, which states that a neuron can be either excitatory or inhibitory, not both. In our model this translates to the final learned synaptic weights *ω*^{1} = *ω*^{0} + *η***d**^{⊤} being constrained by the excitatory/inhibitory nature of each neuron. If neuron *i* is excitatory (inhibitory), all of its outgoing connections will be positive (negative): \({\omega}_{1i}^{1}, \cdots, {\omega}_{Ni}^{1} > 0\) (\({\omega}_{1i}^{1}, \cdots, {\omega}_{Ni}^{1} < 0\)) (Fig. 3A, B). See Methods for more details. We found that the chimera from Figs. 1 and 2 could be reliably trained with Dale's law enforced (Fig. 3C) using an RNN with *N* = 3500. This was confirmed with the mean phase velocity (Fig. 3D) and the Kuramoto order parameter (Fig. 3E). Thus, chimeras can be generically embedded by an RNN respecting Dale's law.
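A minimal way to impose the sign constraint is to clip the outgoing weights of each neuron to that neuron's sign after every update; the Methods may use a different scheme, so this is only illustrative:

```python
import numpy as np

def enforce_dale(omega1, is_excitatory):
    """Clip column i (the outgoing weights of neuron i) to be non-negative
    for excitatory neurons and non-positive for inhibitory ones."""
    w = omega1.copy()
    exc = np.asarray(is_excitatory, dtype=bool)
    w[:, exc] = np.clip(w[:, exc], 0.0, None)
    w[:, ~exc] = np.clip(w[:, ~exc], None, 0.0)
    return w
```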

Second, biological circuits are often sparsely connected^{43,44,46,47}. This constraint is not immediately satisfied by FORCE training because the perturbation to *ω*^{0} is low rank, which yields all-to-all connectivity. To constrain the network to sparse connectivity, we enforced sparsity on the post-learning weight matrix *ω*^{1} (Fig. 3F) as in ref. ^{33} and tested whether a sparsely coupled RNN could still learn the chimera state. In particular, the matrices *η* and **d** were kept at a sparsity value of 90% during and after learning (see Methods for more details on the implementation). Post learning, both weight matrices *ω*^{0} and *ω*^{1} had the same Gaussian-like distribution, and *ω*^{1} was successfully kept 82% sparse (Fig. 3G). The sparse RNN with *N* = 3000 was still able to learn how to embed a chimera (Fig. 3H–J).
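One simple way to keep *η* and **d** sparse through learning is to fix a binary mask at initialization and reapply it elementwise after every update; a sketch (the paper's exact implementation is in Methods and ref. ^{33}):

```python
import numpy as np

def sparse_mask(shape, sparsity=0.9, rng=None):
    """Binary mask with a `sparsity` fraction of zeros; multiplying eta
    and d by fixed masks keeps them sparse during and after learning."""
    if rng is None:
        rng = np.random.default_rng()
    return (rng.random(shape) >= sparsity).astype(float)
```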

Finally, we investigated whether an RNN could learn more complex chimeras. The supervisor was changed to a larger chimera from two populations of 25 identically coupled Kuramoto oscillators (Fig. 4A and Supplementary Movie 3). The RNN could again successfully reproduce the chimera (Fig. 4B and Supplementary Movie 4), as confirmed with the mean phase velocity and Kuramoto order parameter for this higher-dimensional supervisor (Fig. 4C, D). Collectively, these results imply that different chimeras may be generically embedded into RNNs while satisfying some of the main biological constraints of real neural circuits.

### Characteristics of the recurrent neural network

With RNNs successfully and robustly trained to display an embedded chimera, we next investigated whether such a chimera can be detected from the micro to the macro scale. To analyze the underlying dynamics of the RNN for chimera signatures, a Fast Fourier Transform (FFT) was applied to both the network output and the original chimera in the Kuramoto system, which allowed us to extract the fundamental frequencies. We observed identical frequencies in the Kuramoto network (Fig. 5A, B) and in the RNN output (Fig. 5C, D). They correspond to the frequency of the synchronous population, *f*_{1} = 0.021, and to the main frequencies of the asynchronous population (Fig. 5A, B), *f*_{1} = 0.021, *f*_{2} = 0.059, and *f*_{3} = 0.096, which can be qualitatively identified in Fig. 1C.
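Extracting the fundamental frequencies amounts to ranking FFT bins; a sketch (the peak picking here is a plain argsort, with no interpolation or peak-width handling):

```python
import numpy as np

def dominant_frequencies(x, dt, n_peaks=3):
    """Return the n_peaks strongest frequencies of a real signal,
    ignoring the DC component."""
    spec = np.abs(np.fft.rfft(x - x.mean()))
    freqs = np.fft.rfftfreq(len(x), d=dt)
    top = np.argsort(spec)[::-1][:n_peaks]
    return np.sort(freqs[top])
```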

Next, we searched for these fundamental frequencies in experimentally accessible quantities across different scales. First, we considered a common reduction of high-dimensional neural data: Principal Component Analysis (PCA). This was applied to the set of individual firing rates in the network, with the 1st, 3rd, and 5th principal components shown in Fig. 5E. The three main frequencies found in the original Kuramoto system appear in the FFTs of these principal components (marked with dashed lines, Fig. 5F). Note that the 2nd, 4th, and 6th components are orthogonal phase shifts of the 1st, 3rd, and 5th components, since both the cosine and the sine of the oscillator phases were used as supervisors.
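The PCA projection itself is a standard SVD of the mean-centered rate matrix; a minimal sketch:

```python
import numpy as np

def pca_components(rates, n_components=6):
    """Scores (T, n_components) of a (T, N) firing-rate matrix on its
    leading principal components, via SVD of the centered data."""
    X = rates - rates.mean(axis=0)
    U, S, Vt = np.linalg.svd(X, full_matrices=False)
    return U[:, :n_components] * S[:n_components]
```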

The three main frequencies can also be obtained through the FFT of the firing rates of single neurons in our RNN (Fig. 5G, H). Depending on the neuron, either all three frequencies or only a subset of them are present. This implies that individual neurons cannot generally be separated into a synchronized and an unsynchronized subgroup, and that characteristics of the embedded chimera permeate the different scales. It also suggests a possible origin of chimera states in the brain: state switching of neural components, in which firing rates change dynamically between low and high values while collectively implementing a decodable embedded chimera.

Next, we considered whether truly macroscopic quantities yield information about the embedded chimera. To that end, the mean of the firing rates was computed; the resulting time series is highly irregular. Once again, the three main frequencies of the Kuramoto system were identified through the FFT of this spatial mean of the firing rates (Fig. 5I, J). These results show that if a chimera is embedded into an RNN, it may be detected from observations made at the micro scale of single neurons to the macro scale of average activities.

### Switching chimera state

Inspired by uni-hemispheric sleep observed in some aquatic mammals and birds^{48,49}, where one hemisphere is awake, showing asynchronous electroencephalographic (EEG) activity, while the other is asleep, showing synchronized EEG activity, we sought to determine whether the RNN could be trained to switch the synchronized and unsynchronized populations with an input. In uni-hemispheric sleep, the two hemispheres switch states due to an external input^{48,49} (Fig. 6A). The RNN was trained to learn this switching chimera state: any time an external pulse *c* was provided to the network (Fig. 6, gray line), the synchronized group (in the network output) changed to unsynchronized and the unsynchronized group to synchronized. The supervisor was generated with homogeneous Kuramoto oscillators (as in Fig. 1) and the pulse timing was random (with some constraints, see Methods). The input pulse is nominally *c* = 0 and changes randomly to *c* = 10 or *c* = −10 to switch the synchronized and unsynchronized populations in the embedded chimera (Fig. 6B; note that we only show the positive pulse). In particular, for *c* = 10, the embedded chimera would change \(\hat{\boldsymbol{\theta}}\) to the synchronized state and \(\hat{\boldsymbol{\phi}}\) to the unsynchronized one, and vice versa for *c* = −10. Two subsequent pulses could not have the same sign (see Methods for details).
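The pulse input can be sketched as follows; the gap bounds are assumptions standing in for the constraints detailed in Methods, while the alternating-sign rule and the ±10 amplitude follow the text:

```python
import numpy as np

def pulse_train(steps, min_gap, max_gap, amplitude=10.0, rng=None):
    """Input c(t): zero except at random pulse times, where it takes the
    value +/-amplitude; two consecutive pulses never share a sign."""
    if rng is None:
        rng = np.random.default_rng()
    c = np.zeros(steps)
    sign = rng.choice([-1.0, 1.0])
    t = int(rng.integers(min_gap, max_gap))
    while t < steps:
        c[t] = sign * amplitude
        sign = -sign                    # alternate, never repeat a sign
        t += int(rng.integers(min_gap, max_gap))
    return c
```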

The network was successfully trained to switch the synchrony profiles (Fig. 6B and Supplementary Movie 5) in response to the external pulse; thus, a sufficiently strong perturbation to the system can induce a transition from one low-dimensional attractor (i.e., a low-dimensional projection of the RNN's multidimensional attractor) to another. The RNN trained to switch chimeras showed evidence of only two different low-dimensional attractors (for more information about the underlying dynamics, or attractor, of the trained RNN see Supplementary Note 2 and Supplementary Fig. 3). One attractor was defined by \(\hat{\boldsymbol{\theta}}\) being synchronized and \(\hat{\boldsymbol{\phi}}\) being unsynchronized, and the other by \(\hat{\boldsymbol{\theta}}\) being unsynchronized and \(\hat{\boldsymbol{\phi}}\) being synchronized (Supplementary Fig. 4). However, we found that for some pulses the embedded chimera did not switch, indicating that switching success may depend on the specific state of the system at the time of a pulse^{50}. For those pulses, the following pulse switched the chimera state regardless of the state of \(\hat{\boldsymbol{\theta}}\) and \(\hat{\boldsymbol{\phi}}\). In other words, any time an unsuccessful pulse occurred, the −10 pulse would act as a +10 pulse, and the +10 pulse as a −10 pulse (Supplementary Fig. 5). We tested the RNN with only positive pulses and with only negative pulses, and it switched states independently of the pulse sign (Supplementary Fig. 6). Altogether, this demonstrates that it is the magnitude rather than the sign of the pulse that leads to a switch in the chimera state.

Finally, we sought to determine how the low-dimensional projection of the RNN's multidimensional attractor (see Supplementary Fig. 3) changes in the case of the switching chimera (Supplementary Fig. 4). We analyzed the underlying activity by computing the PCA of the RNN firing rates (Fig. 6C, D). The neuronal dynamics themselves readily display a change after the external input. At the network level, this change of dynamics is manifested in the temporal evolution of the 3rd and 5th principal components. The first principal component does not change after the input; we believe this is because it accounts for the main frequency of both groups (the synchronized and unsynchronized populations), which is identical for both, as shown in the FFT of the embedded chimera state (Fig. 5D).