## Embedded chimera states in recurrent neural networks

### Chimera state

In order to train an RNN to display a chimera, we first selected a suitable supervisor. Following ref. ^{4}, we used a system of Kuramoto oscillators consisting of two coupled populations of *n* = 3 identical oscillators (Fig. 1, Methods). Each oscillator is coupled with strength *μ* within its subpopulation and *ν* between subpopulations, with *μ* > *ν* (Fig. 1B). With chimera-like initial conditions (see Supplementary Fig. 1), a chimera state is easily achieved in which one subpopulation synchronizes while the other oscillates asynchronously (schematically represented in Fig. 1A by six coupled metronomes). The phases generated by the oscillators, *θ*_{i}(*t*) (orange/synchronous population) and *ϕ*_{i}(*t*) (purple/asynchronous population), will be used to construct the supervisor for the RNN (Fig. 1C–E and Supplementary Movie 1).
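The two-population model can be sketched in a few lines. The following is a minimal illustration, not the authors' code: the phase-lag parameter `alpha` and the values of `mu`, `nu`, and `omega` are assumptions (standard for this class of model), and simple Euler integration is used.

```python
import numpy as np

def two_pop_kuramoto(theta0, phi0, mu=0.6, nu=0.4, alpha=1.45,
                     omega=1.0, dt=0.01, steps=5000):
    """Euler-integrate two coupled populations of n identical Kuramoto
    oscillators, with intra-coupling mu stronger than inter-coupling nu."""
    theta, phi = np.array(theta0, dtype=float), np.array(phi0, dtype=float)
    n = theta.size
    traj = np.empty((steps, 2 * n))
    for t in range(steps):
        # pairwise sine coupling within and between the two populations
        dtheta = omega \
            + (mu / n) * np.sin(theta[None, :] - theta[:, None] - alpha).sum(1) \
            + (nu / n) * np.sin(phi[None, :] - theta[:, None] - alpha).sum(1)
        dphi = omega \
            + (mu / n) * np.sin(phi[None, :] - phi[:, None] - alpha).sum(1) \
            + (nu / n) * np.sin(theta[None, :] - phi[:, None] - alpha).sum(1)
        theta = (theta + dt * dtheta) % (2 * np.pi)
        phi = (phi + dt * dphi) % (2 * np.pi)
        traj[t] = np.concatenate([theta, phi])
    return traj
```

Chimera-like initial conditions correspond to starting one population nearly in phase (`theta0`) and the other spread out (`phi0`).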

Prior to RNN training, we confirmed that the simulated network was indeed displaying a chimera with two well-known measures: the mean phase velocity^{42}, which captures the frequency of the oscillators over a given time period, and the order parameter *R*^{2,4}, which characterizes the synchronization of the system: *R* = 1 for fully synchronized systems and 0 < *R* < 1 for asynchronous systems (Methods). The mean phase velocity (Fig. 1F) shows distinct profiles for the two groups: nodes from the asynchronous group oscillate faster than nodes from the synchronized group. The order parameter (Fig. 1G) also confirms the distinct behaviors of the asynchronous group [0.4 < *R*(*t*) < 0.8] and the synchronous group [*R*(*t*) = 1]. Thus, the two-population Kuramoto network displays a chimera state, which becomes the supervisor for the RNN.
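Both diagnostics reduce to short computations on the phase time series; a minimal sketch (function names are ours):

```python
import numpy as np

def order_parameter(phases):
    """Kuramoto order parameter R(t) for a (T, n) array of phases:
    R = 1 under full synchrony, R < 1 otherwise."""
    return np.abs(np.exp(1j * phases).mean(axis=1))

def mean_phase_velocity(phases, dt):
    """Time-averaged frequency of each oscillator, computed from the
    unwrapped phases over the whole window."""
    unwrapped = np.unwrap(phases, axis=0)
    return (unwrapped[-1] - unwrapped[0]) / ((phases.shape[0] - 1) * dt)
```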

### Recurrent neural networks can be trained to display chimeras

With the chimera solution in hand, we investigated whether it could serve as an *m*-dimensional supervisor to train an RNN to collectively display a chimera. First, since the oscillators' phases are discontinuous, wrapped around the interval [0, 2*π*), the unit-circle transformations (\(\cos\theta_i\), \(\sin\theta_i\)) and (\(\cos\phi_i\), \(\sin\phi_i\)) were applied to the phases. With 2*n* oscillators, this results in an *m* = 4*n*-dimensional supervisor for the RNN (throughout the paper we use *n* = 3 and *n* = 25). We used the FORCE method^{30} to train an RNN to autonomously mimic the time series produced by the supervisor. The training is considered successful if the network alone, unguided by the supervisor, can produce an identical chimera.
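A sketch of the supervisor construction (the column ordering is our choice, not specified in the text):

```python
import numpy as np

def build_supervisor(theta, phi):
    """Map (T, n) phase arrays onto the unit circle, giving a continuous
    (T, 4n) supervisor: [cos(theta), sin(theta), cos(phi), sin(phi)]."""
    return np.hstack([np.cos(theta), np.sin(theta),
                      np.cos(phi), np.sin(phi)])
```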

The RNN is an *N*-dimensional autonomous dynamical system:

$$\tau \boldsymbol{z}^{\prime} = -\boldsymbol{z} + \boldsymbol{\omega}^{0}\boldsymbol{r}$$

(1)

$$\boldsymbol{r} = \tanh(\boldsymbol{z}).$$

(2)
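Equations (1) and (2) can be integrated directly; a minimal Euler sketch (time step, time constant, and weight statistics below are illustrative assumptions, not the paper's values):

```python
import numpy as np

def simulate_rnn(omega0, z0, tau=1.0, dt=0.05, steps=200):
    """Euler integration of tau z' = -z + omega0 r with r = tanh(z);
    returns the firing rates r over time."""
    z = np.asarray(z0, dtype=float).copy()
    rates = np.empty((steps, z.size))
    for t in range(steps):
        r = np.tanh(z)
        z = z + (dt / tau) * (-z + omega0 @ r)
        rates[t] = r
    return rates
```

With i.i.d. Gaussian weights of standard deviation \(g/\sqrt{N}\) and gain *g* > 1, such a network sits in the chaotic regime that serves as the pre-training substrate.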

The network is initially sparsely connected with a set of static weights *ω*^{0} which initiates the neurons' currents *z*_{i}(*t*) into a high-dimensional chaotic regime^{43,44,45} (Fig. 2A). During learning, a low-rank perturbation *η***d**^{⊤} is added to the weight matrix *ω*^{0}, where *η* and **d** are both *N* × *m* (Fig. 2B). The matrix **d** is used to decode the output of the network, with \(\mathbf{d}^{\top}\boldsymbol{r} = \hat{\boldsymbol{s}}\) (Fig. 2C). This output is simultaneously fed back into the network:

$$\tau \boldsymbol{z}^{\prime} = -\boldsymbol{z} + (\boldsymbol{\omega}^{0} + \boldsymbol{\eta}\mathbf{d}^{\top})\boldsymbol{r} = -\boldsymbol{z} + \boldsymbol{\omega}^{0}\boldsymbol{r} + \boldsymbol{\eta}\hat{\boldsymbol{s}}$$

(3)

The components of *η* are fixed and random, whereas the components of **d** are learned with Recursive Least Squares (RLS), which minimizes the sum-squared difference between the network output \(\hat{\boldsymbol{s}}\) and the chimera supervisor **s** (see Supplementary Fig. 2). Training is considered successful when a constant value of the matrix **d** allows the network to mimic the dynamics of the chimera.

We found that an RNN with *N* = 1500 could indeed autonomously display chimera dynamics, i.e., the chimera state with *n* = 3 shown in Fig. 2D, G (see also Supplementary Movie 2). Smaller networks (*N* = 500 and *N* = 1000) did not learn this chimera state, while a larger one (*N* = 2000) also succeeded. Note that the necessary *N* changes depending on the supervisor's size (*n* = 3 or *n* = 25) and on the constraints we enforce on the RNN (see section "FORCE method robustness" for the specific *N* used in each case). This is in line with ref. ^{30}, where more complex supervisors required more neurons in the RNN for accurate dynamical fitting.

We refer to the chimera dynamics displayed by the RNN (e.g., Fig. 2D, G and Supplementary Movie 2) as an embedded chimera within the RNN. The chimera is embedded in the sense that a linear decoder (**d**) is used to extract the chimera state from the network as a whole with a linear combination of the firing rates (Fig. 2C, D). The phases of the individual oscillators can also be reconstituted with a Hilbert Transform (Fig. 2E). To confirm the network indeed learned the chimera shown in Fig. 1, we computed both the mean phase velocity and the order parameter (Fig. 2F, G), which are identical to those in Fig. 1.

### FORCE method robustness

Next, we investigated whether chimera states could be generically embedded within RNNs that enforce specific biological constraints. We considered three different constraints. We first enforced Dale's law, which states that a neuron can be either excitatory or inhibitory, not both. In our model this translates to the final learned synaptic weights *ω*^{1} = *ω*^{0} + *η***d**^{⊤} being constrained by the excitatory/inhibitory nature of each neuron. If neuron *i* is excitatory (inhibitory), all of its outgoing connections will be positive (negative): \({\omega}_{1i}^{1}, \cdots, {\omega}_{Ni}^{1} > 0\) (\({\omega}_{1i}^{1}, \cdots, {\omega}_{Ni}^{1} < 0\)) (Fig. 3A, B). See Methods for more details. We found that the chimera from Figs. 1 and 2 could be reliably trained with Dale's law enforced (Fig. 3C) using an RNN with *N* = 3500. This was confirmed with the mean phase velocity (Fig. 3D) and the Kuramoto order parameter (Fig. 3E). Thus, chimeras can be generically embedded by an RNN respecting Dale's law.
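A minimal way to impose the sign constraint is to clip the outgoing weights of each neuron to that neuron's sign after every update; the Methods may use a different scheme, so this is only illustrative:

```python
import numpy as np

def enforce_dale(omega1, is_excitatory):
    """Clip column i (the outgoing weights of neuron i) to be non-negative
    for excitatory neurons and non-positive for inhibitory ones."""
    w = omega1.copy()
    exc = np.asarray(is_excitatory, dtype=bool)
    w[:, exc] = np.clip(w[:, exc], 0.0, None)
    w[:, ~exc] = np.clip(w[:, ~exc], None, 0.0)
    return w
```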

Second, biological circuits are often sparsely connected^{43,44,46,47}. This constraint is not immediately satisfied by FORCE training because the perturbation to *ω*^{0} is low rank, which yields all-to-all connectivity. To constrain the network to sparse connectivity, we enforced sparsity on the post-learning weight matrix *ω*^{1} (Fig. 3F) as in ref. ^{33} and tested whether a sparsely coupled RNN could still learn the chimera state. In particular, the matrices *η* and **d** were kept at a sparsity value of 90% during and after learning (see Methods for more details on the implementation). Post learning, both weight matrices *ω*^{0} and *ω*^{1} had the same Gaussian-like distribution, and *ω*^{1} was successfully kept 82% sparse (Fig. 3G). The sparse RNN with *N* = 3000 was still able to learn how to embed a chimera (Fig. 3H–J).
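One simple way to keep *η* and **d** sparse through learning is to fix a binary mask at initialization and reapply it elementwise after every update; a sketch (the paper's exact implementation is in Methods and ref. ^{33}):

```python
import numpy as np

def sparse_mask(shape, sparsity=0.9, rng=None):
    """Binary mask with a `sparsity` fraction of zeros; multiplying eta
    and d by fixed masks keeps them sparse during and after learning."""
    if rng is None:
        rng = np.random.default_rng()
    return (rng.random(shape) >= sparsity).astype(float)
```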

Finally, we investigated whether an RNN could learn more complex chimeras. The supervisor was changed to a larger chimera from two populations of 25 identically coupled Kuramoto oscillators (Fig. 4A and Supplementary Movie 3). The RNN could again successfully reproduce the chimera (Fig. 4B and Supplementary Movie 4), as confirmed with the mean phase velocity and Kuramoto order parameter for this higher-dimensional supervisor (Fig. 4C, D). Collectively, these results imply that different chimeras may be generically embedded into RNNs while satisfying some of the main biological constraints of real neural circuits.

### Characteristics of the recurrent neural network

With RNNs successfully and robustly trained to display an embedded chimera, we next investigated whether such a chimera can be detected from the micro to the macro scale. To analyze the underlying dynamics of the RNN for chimera signatures, a Fast Fourier Transform (FFT) was applied to both the network output and the original chimera in the Kuramoto system, which allowed us to extract the fundamental frequencies. We observed identical frequencies in the Kuramoto network (Fig. 5A, B) and in the RNN output (Fig. 5C, D). They correspond to the frequency of the synchronous population, *f*_{1} = 0.021, and to the main frequencies of the asynchronous population (Fig. 5A, B), *f*_{1} = 0.021, *f*_{2} = 0.059, and *f*_{3} = 0.096, which can be qualitatively identified in Fig. 1C.
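Extracting the fundamental frequencies amounts to ranking FFT bins; a sketch (the peak picking here is a plain argsort, with no interpolation or peak-width handling):

```python
import numpy as np

def dominant_frequencies(x, dt, n_peaks=3):
    """Return the n_peaks strongest frequencies of a real signal,
    ignoring the DC component."""
    spec = np.abs(np.fft.rfft(x - x.mean()))
    freqs = np.fft.rfftfreq(len(x), d=dt)
    top = np.argsort(spec)[::-1][:n_peaks]
    return np.sort(freqs[top])
```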

Next, we searched for these fundamental frequencies in experimentally accessible quantities across different scales. First, we considered a common reduction of high-dimensional neural data: Principal Component Analysis (PCA). This was applied to the set of individual firing rates in the network, with the 1st, 3rd, and 5th principal components shown in Fig. 5E. The three main frequencies found in the original Kuramoto system appear in the FFTs of these principal components (marked with dashed lines, Fig. 5F). Note that the 2nd, 4th, and 6th components are orthogonal phase shifts of the 1st, 3rd, and 5th components, since both the cosine and the sine of the oscillator phases were used as supervisors.
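The PCA projection itself is a standard SVD of the mean-centered rate matrix; a minimal sketch:

```python
import numpy as np

def pca_components(rates, n_components=6):
    """Scores (T, n_components) of a (T, N) firing-rate matrix on its
    leading principal components, via SVD of the centered data."""
    X = rates - rates.mean(axis=0)
    U, S, Vt = np.linalg.svd(X, full_matrices=False)
    return U[:, :n_components] * S[:n_components]
```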

The three main frequencies can also be obtained through the FFT of the firing rates of single neurons in our RNN (Fig. 5G, H). Depending on the neuron, either all three frequencies or only a subset of them are present. This implies that individual neurons cannot generally be separated into a synchronized and an unsynchronized subgroup, and that characteristics of the embedded chimera permeate the different scales. It also suggests a possible origin of chimera states in the brain: state switching of neural components, in which firing rates change dynamically between low and high values while collectively implementing a decodable embedded chimera.

Next, we considered whether truly macroscopic quantities yield information about the embedded chimera. To that end, the mean of the firing rates was computed; the resulting time series is highly irregular. Once again, the three main frequencies of the Kuramoto system were identified through the FFT of this spatial mean of the firing rates (Fig. 5I, J). These results show that if a chimera is embedded into an RNN, it may be detected from observations made at the micro scale of single neurons to the macro scale of average activities.

### Switching chimera state

Inspired by uni-hemispheric sleep observed in some aquatic mammals and birds^{48,49}, where one hemisphere is awake, showing asynchronous electroencephalographic (EEG) activity, while the other is asleep, showing synchronized EEG activity, we sought to determine whether the RNN could be trained to switch the synchronized and unsynchronized populations with an input. In uni-hemispheric sleep, the two hemispheres switch states due to an external input^{48,49} (Fig. 6A). The RNN was trained to learn this switching chimera state: any time an external pulse *c* was provided to the network (Fig. 6, gray line), the synchronized group (in the network output) changed to unsynchronized and the unsynchronized group to synchronized. The supervisor was generated with homogeneous Kuramoto oscillators (as in Fig. 1) and the pulse timing was random (with some constraints, see Methods). The input pulse is nominally *c* = 0 and changes randomly to *c* = 10 or *c* = −10 to switch the synchronized and unsynchronized populations in the embedded chimera (Fig. 6B; note that we only show the positive pulse). In particular, for *c* = 10, the embedded chimera would change \(\hat{\boldsymbol{\theta}}\) to the synchronized state and \(\hat{\boldsymbol{\phi}}\) to the unsynchronized one, and vice versa for *c* = −10. Two subsequent pulses could not have the same sign (see Methods for details).
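The pulse input can be sketched as follows; the gap bounds are assumptions standing in for the constraints detailed in Methods, while the alternating-sign rule and the ±10 amplitude follow the text:

```python
import numpy as np

def pulse_train(steps, min_gap, max_gap, amplitude=10.0, rng=None):
    """Input c(t): zero except at random pulse times, where it takes the
    value +/-amplitude; two consecutive pulses never share a sign."""
    if rng is None:
        rng = np.random.default_rng()
    c = np.zeros(steps)
    sign = rng.choice([-1.0, 1.0])
    t = int(rng.integers(min_gap, max_gap))
    while t < steps:
        c[t] = sign * amplitude
        sign = -sign                    # alternate, never repeat a sign
        t += int(rng.integers(min_gap, max_gap))
    return c
```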

The network was successfully trained to switch the synchrony profiles (Fig. 6B and Supplementary Movie 5) in response to the external pulse; thus, a sufficiently strong perturbation to the system can induce a transition from one low-dimensional attractor (i.e., a low-dimensional projection of the RNN's multidimensional attractor) to another. The RNN trained to switch chimeras showed evidence of only two different low-dimensional attractors (for more information about the underlying dynamics, or attractor, of the trained RNN see Supplementary Note 2 and Supplementary Fig. 3). One attractor was defined by \(\hat{\boldsymbol{\theta}}\) being synchronized and \(\hat{\boldsymbol{\phi}}\) being unsynchronized, and the other by \(\hat{\boldsymbol{\theta}}\) being unsynchronized and \(\hat{\boldsymbol{\phi}}\) being synchronized (Supplementary Fig. 4). However, we found that for some pulses the embedded chimera did not switch, indicating that switching success may depend on the specific state of the system at the time of a pulse^{50}. For those pulses, the following pulse switched the chimera state regardless of the state of \(\hat{\boldsymbol{\theta}}\) and \(\hat{\boldsymbol{\phi}}\). In other words, any time an unsuccessful pulse occurred, the −10 pulse would act as a +10 pulse, and the +10 pulse as a −10 pulse (Supplementary Fig. 5). We tested the RNN with only positive pulses and with only negative pulses, and it switched states independently of the pulse sign (Supplementary Fig. 6). Altogether, this demonstrates that it is the magnitude rather than the sign of the pulse that leads to a switch in the chimera state.

Finally, we sought to determine how the low-dimensional projection of the RNN's multidimensional attractor (see Supplementary Fig. 3) changes in the case of the switching chimera (Supplementary Fig. 4). We analyzed the underlying activity by computing the PCA of the RNN firing rates (Fig. 6C, D). The neuronal dynamics themselves readily display a change after the external input. At the network level, this change of dynamics is manifested in the temporal evolution of the 3rd and 5th principal components. The first principal component does not change after the input; we believe this is because it accounts for the main frequency of both groups (the synchronized and unsynchronized populations), which is identical for both, as shown in the FFT of the embedded chimera state (Fig. 5D).