## Discrimination reveals reconstructability of multiplex networks from partial observations

### Framework for reconstructing multiplex layer structures

We will first introduce the notations by denoting the adjacency matrix of layer *α* in a multiplex network **M** by *M*^{α} (*α* = 1, 2, ⋯ , *L*), and \({M}_{ij}^{\alpha }=1\) if there is an edge between nodes *i*, *j* in layer *α* and vice versa (*i*, *j* = 1, 2, ⋯ , *N*). It is hypothesized that each multiplex network **M** is generated by some process such that the probability of generating a multiplex network with adjacency matrices *M*^{α} is ∏_{α}*P*(*M*^{α}∣*θ*), where *θ* represents the parameters of such a process. An aggregate mechanism is a mapping from a multiplex network **M** to a monoplex (single-layer) topology \({A}^{{{{{{{{\mathcal{O}}}}}}}}}\), where \({A}^{{{{{{{{\mathcal{O}}}}}}}}}\in {{\mathbb{R}}}^{N\times N}\) describes the adjacency matrix. In this article, for illustration, we describe the framework using multiplex networks aggregated by the OR mechanism, which is the most common case ranging from biological networks to social networks (see Supplementary Note 1 for other aggregate mechanisms). Then, we have

$${A}_{ij}^{{{{{{{{\mathcal{O}}}}}}}}}=1-\mathop{\prod }\limits_{\alpha =1}^{L}(1-{M}_{ij}^{\alpha }).$$

(1)

Partial observation Γ indicates a set that contains the observed edges in the multiplex network, where \({{\Gamma }}=\{(i,j,\alpha )| {A}_{ij}^{{{{{{{{\mathcal{O}}}}}}}}}=1,{{{{{{{\rm{and}}}}}}}}\,{M}_{ij}^{\alpha }{{\,{{{{{\rm{is}}}}}}\,{{{{{\rm{observed}}}}}}}}\}\). Supposing that we have the aggregate topology \({A}^{{{{{{{{\mathcal{O}}}}}}}}}\) and the partial observation Γ from a multiplex network **M** with *L* layers, our goal is to predict whether there is a link between any two nodes *i*, *j* in layer *α* when (*i*, *j*, *α*) ∉ Γ (Fig. 2a). Once given an aggregate topology \({A}^{{{{{{{{\mathcal{O}}}}}}}}}\), there are in total \(L\cdot | {A}^{{{{{{{{\mathcal{O}}}}}}}}}|\) links are expected to predict, where \(| {A}^{{{{{{{{\mathcal{O}}}}}}}}}|\) indicates the number of edge in the aggregate topology \({A}^{{{{{{{{\mathcal{O}}}}}}}}}\). Notice that, among the \(L\cdot | {A}^{{{{{{{{\mathcal{O}}}}}}}}}|\) potential links, we have already observed *c**a**r**d*(Γ) links, where *card*(⋅) indicates the elements number of a set. Thus, we denote the fraction of partial observations by *c* (0 ≤ *c* < 1), indicating the ratio between the number of observed edges in layers and all edges in the multiplex network, i.e., \(c=\frac{{card}({{\Gamma }})}{L\cdot | {A}^{{{{{{{{\mathcal{O}}}}}}}}}| }\).

The first step of the reconstruction is to find the most probable value of *θ* by maximizing the posterior probability \(P(\theta | {A}^{{{{{{{{\mathcal{O}}}}}}}}},{{\Gamma }})\), where *θ* is the network model parameter. Notice that the probability here provides a general description for any form of the network parameter *θ* (see Supplementary Note 2 for specific forms with a certain network model). Since there is no prior knowledge about the parameter *θ*, we assume it to have a uniform distribution, i.e., *P*(*θ*) = *const*.^{33}. In this case, based on the Bayesian rule, the maximum posterior estimate is equivalent to maximizing the log-likelihood function

$$l(\theta )=\ln P({A}^{{{{{{{{\mathcal{O}}}}}}}}},{{\Gamma }}| \theta ),$$

(2)

which performs the maximum likelihood estimation (MLE). The second step is to reconstruct the multiplex structure **M**, inferring the probability for each possible structure specifically. Since many potential layer structures can produce the same multiplex aggregate topology, we denote the probability distribution for all multiplex structures by *Q*(**M**), where ∑_{M}*Q*(**M**) = 1. Then, the estimated parameter *θ* can reconstruct the multiplex structure by calculating the posterior distribution

$$Q({{{{{{{\bf{M}}}}}}}})=P({{{{{{{\bf{M}}}}}}}}| {A}^{{{{{{{{\mathcal{O}}}}}}}}},{{\Gamma }},\theta ).$$

(3)

However, there is a gap between the observations and network model parameters *θ*, since the multiplex structure **M** is a hidden variable in *l*(*θ*), where \(l(\theta )=\ln {\sum }_{{{{{{{{\bf{M}}}}}}}}}P({A}^{{{{{{{{\mathcal{O}}}}}}}}},{{\Gamma }},{{{{{{{\bf{M}}}}}}}}| \theta )\). Notice that the sums over **M** here are expected to be over **M** that are consistent with \({A}^{{{{{{{{\mathcal{O}}}}}}}}}\) and Γ. For any distribution *Q*(**M**), by employing the Jensen’s inequality, we have

$$l(\theta )=\ln \mathop{\sum}\limits_{{{{{{{{\bf{M}}}}}}}}}P({A}^{{{{{{{{\mathcal{O}}}}}}}}},{{\Gamma }},{{{{{{{\bf{M}}}}}}}}| \theta )\ge \mathop{\sum}\limits_{{{{{{{{\bf{M}}}}}}}}}Q({{{{{{{\bf{M}}}}}}}})\ln \frac{P({A}^{{{{{{{{\mathcal{O}}}}}}}}},{{\Gamma }},{{{{{{{\bf{M}}}}}}}}| \theta )}{Q({{{{{{{\bf{M}}}}}}}})},$$

(4)

where the equality holds if and only if the Eq. (3) is satisfied. Thus, *l*(*θ*) and *Q*(**M**) are interdependent, and we perform an iterative process to obtain the MLE of the parameter *θ* and the posterior distribution *Q*(**M**) as follows. Given an arbitrary initial value *θ*^{(0)}, we find the optimized posterior distribution *Q*^{(k)}(**M**) by Eq. (3). Then, we update the parameters *θ*^{(k)} that maximize the right-hand side of Eq. (4) by posterior distribution *Q*^{(k−1)}(**M**), which performs a coordinate ascent to maximize the log-likelihood function (Fig. 2b). The iterations above are derived from the expectation-maximization (EM) algorithm^{34} (details in the “Methods” section), and a toy example is shown (Fig. 2c). Note that if there is any prior on the parameters *θ*, the proposed framework above can be improved by maximizing the product of the likelihood function \(P({A}^{{{{{{{{\mathcal{O}}}}}}}}},{{\Gamma }}| \theta )\) and the prior *P*(*θ*), i.e., the so-called maximum a posterior estimation (MAP).

In estimation and statistics theory, an unbiased estimator is called efficient if the variance of the estimator reaches Cramer-Rao lower bound (CRLB)^{35}. Fortunately, the proposed framework yields a maximum likelihood estimation, which is an unbiased estimator, and performs asymptotic normality indicating the estimator converges in distribution to a normal distribution^{36}. With this, we prove that the variance of the estimator designed in our framework decreases as the fraction of partial observations *c* increases, and further reaches the CRLB when the network size *N* approaches infinity (see Supplementary Note 3 and Supplementary Fig. 4 for more details).

### Evaluations for the performance of the reconstruction

We now analyze the performance of reconstruction on various real-world multiplex networks. Notice that the framework works for any given analytical generative model, such as Erdos-Rényi random network model and stochastic block model. For illustration, we employ the configuration model as our network model, which exploits the parameter set *D* to describe model parameters. For a multiplex network composed by *L* layers, specifically, the parameter set *D* contains *L* vectors \(\overrightarrow{{d}^{\alpha }}\in {{\mathbb{R}}}^{N}\) (*α* = 1, 2, ⋯ , *L*), encoding the degree sequences in each layer, where the component *d*^{α}(*i*) describes the degree of node *i* in layer *α*. The configuration model can significantly reduce the complexity from exponential to polynomial by exploiting the independence of each link and this model has been widely applied to analyze the relationship between structure and function of complex networks^{37,38,39}.

As we mentioned in the last section, once a certain network model is determined, we can conduct specific derivations to find the most probable values of all parameters in *D* by maximizing the likelihood \(P({A}^{{{{{{{{\mathcal{O}}}}}}}}},{{\Gamma }}| D)\) (see Supplementary Note 2 for detailed derivations). After estimating degree sequences *D*, the posterior probability \({Q}_{ij}^{\alpha }\) can be calculated by

$${Q}_{ij}^{\alpha }=P({M}_{ij}^{\alpha }=1| {A}^{{{{{{{{\mathcal{O}}}}}}}}},{{\Gamma }},D),$$

(5)

which is called here link reliability, measuring the probability that a link exists between node *i* and node *j* in layer *α* (see Supplementary Note 2 for complete algorithm and complexity analysis in this case). We examine the reliability of all links in the testing set *E*^{T} consisting of potential links except those of the partial observations, i.e., \({E}^{T}=\{(i,j,\alpha )\ \notin\ {{\Gamma }}| {A}_{ij}^{{{{{{{{\mathcal{O}}}}}}}}}=1\}\) (see Supplementary Fig. 5 for a schematic illustration). For this purpose, we calculate the TP (true positive rate) \(P({M}_{ij}^{\alpha }=1| {Q}_{ij}^{\alpha }\ > \ q)\), FP (false positive rate) \(P({M}_{ij}^{\alpha }=0| {Q}_{ij}^{\alpha }\ > \ q)\), TN (true negative rate) \(P({M}_{ij}^{\alpha }=0| {Q}_{ij}^{\alpha }\ < \ q)\) and FN (false negative rate) \(P({M}_{ij}^{\alpha }=1| {Q}_{ij}^{\alpha }\ < \ q)\) in *E*^{T}, where *q* is the threshold that determines the classifier boundary for varying classes.

We first vary the threshold *q* from 0 to 1, and calculate AUC (area under the receiver operating characteristic (ROC) curve) for two real-world datasets (C. elegans neural network and London transportation network) against different *c*. In the meantime, we compare our results with three-link prediction methods that performed well on inference tasks by partial observations so far. The first relevant work is referred to De Bacco et al., who have proposed a generative model, and an efficient expectation-maximization algorithm, which allows to perform inference tasks such as community detection and link prediction^{40}. It works for multiplex networks with groups, but it may fail in networks without group-based structures. The second is referred to Tarres-Deulofeu et al., who introduced a stochastic block model, which can take full advantage of the information contained in the whole network^{41}. The third baseline is calculated by a single-layer link prediction method, layer by layer, in the multiplex network (Layered model). Notice that all of them did not take the aggregate topology into consideration, while our method can provide valuable insights into how to aggregate topology information helps reconstruction. The AUCs by four methods for C. elegans neural network and London transportation are shown in Fig. 3a, displaying that the aggregate topology truly provides important information (see Supplementary Fig. 13 for more empirical results).

Further, we display the ROC curves in the ROC space for *c* = 0.01, 0.05, 0.25, 0.5, respectively (Fig. 3b, c). Here, the ROC space is defined by plotting the false positive rate on the *x*-axis and the true positive rate on the *y*-axis, displaying the relative trade-offs between false positive (costs) and true positive (benefits). Notice that the ROC curve describes the performance as a continuous function of the threshold *q*, and can well quantify the true positive rate and the false-positive rate for any given threshold *q*. Thus, AUC is a scalar that quantifies the performance and it does not depend on the choice of threshold. The results of the ROC analysis show that our proposed framework is very effective for any threshold in the interval of (0, 1), and thus it is stable for different thresholds in various real-world networks. Further, the true positive rate is positively correlated with the false-positive rate, and there exists a threshold, above which a false positive rate increases faster than the true positive rate. It is, thereby, not justified anymore to improve a true positive rate by increasing the false positive rate beyond such a threshold.

Next, we will set specific thresholds for further analysis. For example, since we do have any prior knowledge about the threshold *q* in the task, we set *q* = 0.5 to avoid arbitrary choice, i.e., when the link reliability is larger than 0.5, an edge is considered to exist, and vice versa. Then, we calculate the four metrics to evaluate the performance of multiplex reconstruction for the nine real-world datasets. As a result, the fraction of partial observations *c*, indicating the portion of the observed edges, exhibits a positive correlation with the accuracy of the reconstruction, showing good performance even for a quite small *c* (Fig. 3f) (see the “Methods” section and Supplementary Fig. 6 for more details for the evaluations). In practice, we sample the partial observations by vertex sampling, and repeat the reconstruction 100 times for each value of *c* (see the “Methods” section for more details about the sampling of partial observations).

Besides link reliability, network characteristics also include average degree, degree distribution, length distribution of the shortest paths, which are significant to network reconstruction. One prominent advantage of the reconstruction framework is that we can simultaneously obtain other micro- or mesoscale network properties. For example, the expectation of the degree distribution in layer *α* can be obtained by *E*(*p*_{α}) = ∑_{M}*Q*(**M**) ⋅ *p*_{α}(**M**), where *p*_{α}(**M**) is the degree distribution in layer *α* for any specific multiplex structure **M**. The reconstructed degree distributions in two layers with different values of *c* are compared to the ground truth (*c* = 1), demonstrating that the degree distributions in all layers can also be well reconstructed as *c* increases (Fig. 3d, e). Generally, the expectation of property *X* is the first raw moment obtained by *E*(*X*) = ∑_{M}*Q*(**M**) ⋅ *X*(**M**), while the corresponding variance is the second central moment, \(D(X)={\sum }_{{{{{{{{\bf{M}}}}}}}}}Q({{{{{{{\bf{M}}}}}}}})\cdot {\left[X({{{{{{{\bf{M}}}}}}}})-E(X)\right]}^{2}\). Moreover, we can obtain skewness, kurtosis, and higher moments of a property *X*. We will discuss how different network characteristics (e.g., average degrees and overlap of edges in different layers) impact the performance of reconstruction in the next section.

### Reconstructability determined by the discrimination indicator

It is of great significance to investigate how various characteristics of multiplex networks affect the reconstructability, i.e., the reconstruction accuracy we measured. Without loss of generality, we conduct an analysis of two-layer multiplex networks to illustrate the method in detail. Once a link is observed in the aggregate network, the probability space of the potential multiplex structure contains three events: the potential link exists (i) only in layer 1, (ii) only in layer 2, or (iii) in both layers. Then, the uncertainty of all links in the multiplex network **M** can be quantified by the information entropy

$${{{{{{{\mathcal{H}}}}}}}}({{{{{{{\bf{M}}}}}}}}| {A}^{{{{{{{{\mathcal{O}}}}}}}}})={{{{{{{\mathcal{H}}}}}}}}({M}^{1}| {A}^{{{{{{{{\mathcal{O}}}}}}}}})+{{{{{{{\mathcal{H}}}}}}}}({M}^{2}| {A}^{{{{{{{{\mathcal{O}}}}}}}}}).$$

(6)

Generally speaking, the smaller the information entropy \({{{{{{{\mathcal{H}}}}}}}}\) is, the more certain the potential multiplex structure is, and vice versa.

To study how different characteristics of multiplex networks impact the information entropy, we first introduce the ratio of average degrees of two layers denoted by *r*, i.e., \(r=\frac{\left\langle {k}_{1}\right\rangle }{\left\langle {k}_{2}\right\rangle }\), where \(\left\langle {k}_{1}\right\rangle\) and \(\left\langle {k}_{2}\right\rangle\) are the average degrees in layer 1 and layer 2, respectively. We assume, without loss of generality, that \(\left\langle {k}_{1}\right\rangle \le \left\langle {k}_{2}\right\rangle\), such that 0 < *r* ≤ 1. Then, we consider the overlap of edges denoted by *v* between the two layers. A high overlap indicates that a link is more likely to exist in one layer if the corresponding link exists in the other layer, i.e., a low uncertainty. To measure the overlap *v* of a multiplex network, we refer to the Jaccard index of \({{{{{{{{\mathcal{E}}}}}}}}}_{1}\) and \({{{{{{{{\mathcal{E}}}}}}}}}_{2}\) that indicate the two edge sets in the two layers, i.e., \(v=| {{{{{{{{\mathcal{E}}}}}}}}}_{1}\cap {{{{{{{{\mathcal{E}}}}}}}}}_{2}”https://www.nature.com/” {{{{{{{{\mathcal{E}}}}}}}}}_{1}\cup {{{{{{{{\mathcal{E}}}}}}}}}_{2}|\) (see Supplementary Fig. 9 for more details about multiplex network characteristics).

We next explore how these factors impact the information entropy of a multiplex network and further determine the performance of reconstruction. By Eq. (6), we can calculate the information entropy \({{{{{{{\mathcal{H}}}}}}}}\) with a mean-field approximation, and obtain

$${{{{{{{\mathcal{H}}}}}}}}({{{{{{{\bf{M}}}}}}}}| {A}^{{{{{{{{\mathcal{O}}}}}}}}})=-| {A}^{{{{{{{{\mathcal{O}}}}}}}}}| \cdot \mathop{\sum }\limits_{\alpha =1}^{2}[{\bar{p}}_{\alpha }\cdot \ln {\bar{p}}_{\alpha }+(1-{\bar{p}}_{\alpha })\cdot \ln (1-{\bar{p}}_{\alpha })],$$

(7)

where

$${\bar{p}}_{1}=E[P({M}_{ij}^{1}=1| {A}_{ij}^{{{{{{{{\mathcal{O}}}}}}}}}=1)]=\frac{\hat{v}+\hat{r}}{1+\hat{r}},$$

(8)

and

$${\bar{p}}_{2}=E[P({M}_{ij}^{2}=1| {A}_{ij}^{{{{{{{{\mathcal{O}}}}}}}}}=1)]=\frac{1+\hat{v}\cdot \hat{r}}{1+\hat{r}},$$

(9)

indicating the average probability for the existence of any link in layer 1 and layer 2, respectively. Notice that in Eqs. (8) and (9), \(\hat{v}\) and \(\hat{r}\) are the estimations of *v* and *r* when we only have partial observations Γ, and we approximate them by *c* ⋅ *v* and *r*^{c} empirically (see Methods section for more details). Thus, we find that the information entropy of a given multiplex network is highly related to the fraction of partial observations *c*, the ratio of average degrees *r* and overlap *v*. It is clear that the uncertainty of the probability space decreases with the increasing of *c* and *v*. Hence, the information entropy \({{{{{{{\mathcal{H}}}}}}}}\) is a monotonously decreasing function of *c* and *v* over the domain. For *r*, however, the information entropy is a monotonously increasing function when *r* increases from 0 to 1 (Fig. 4a). Clearly, \({{{{{{{\mathcal{H}}}}}}}}\) describes the microscale discrimination between layers of a multiplex network, since a high discrimination (e.g., *r* tends to 0) leads to a low information entropy.

Generally, the reconstruction accuracy is expected to be determined by the information entropy \({{{{{{{\mathcal{H}}}}}}}}\), since \({{{{{{{\mathcal{H}}}}}}}}\) is the primary variable that affects the uncertainty of the distribution of potential multiplex structures. Empirically, we find that the accuracy is linearly determined by the indicator \(\rho \cdot {{{{{{{\mathcal{H}}}}}}}}\) (Fig. 4b), i.e.,

$${{{{{{{\rm{Accuracy}}}}}}}}\approx 1-\rho \cdot {{{{{{{\mathcal{H}}}}}}}},$$

(10)

where *ρ* is a scaling factor satisfying

$$\rho =\frac{1}{2\ln 2\cdot | {A}^{{{{{{{{\mathcal{O}}}}}}}}}| }\cdot \left(1-\frac{1-v}{1+v}\cdot {c}^{s}\right).$$

(11)

In Eq. (11) above, *s* = *s*_{(M)} is a constant related to the topology of the multiplex network **M** (see Supplementary Table 2 for approximate values of *s*_{(M)}. The term (1 − *v*)/(1 + *v*) in Eq. (11) indicates the uncertainty of links in the testing set can be reduced by partial observations. The parameter *s* describes the scale how partial observations can reduce the uncertainty of links in the testing set (details in Methods section). We further find that *s*_{(M)} is closely proportional to the cosine similarity of the two degree sequences in each layer, i.e., \(s\propto \cos \langle \overrightarrow{{d}^{1}},\overrightarrow{{d}^{2}}\rangle\) (Fig. 4c). Clearly, \(\cos \langle \overrightarrow{{d}^{1}},\overrightarrow{{d}^{2}}\rangle\) describes the similarity between degree sequences of the two layers in a multiplex network, indicating the mesoscale discrimination, which is not relevant to microscale discrimination including *r*, *v*, and \({{{{{{{\mathcal{H}}}}}}}}\) generally (Fig. 4d).

Thus, the reconstructability is determined by the discrimination indicator (\(1-\rho \cdot {{{{{{{\mathcal{H}}}}}}}}\)) from both microscale and mesoscale views. This discovery indicates that the reconstruction can be predicted accurately by the discrimination indicator, obtaining a high accuracy of reconstruction where either *ρ* or \({{{{{{{\mathcal{H}}}}}}}}\) is small. For example, the reconstructability can be enhanced when the difference in average degrees between layers is vast (*r* tends to 0). Notice that we can approximate *s* by the cosine similarity if we do not meet the exact value of *s* empirically, since *s* is highly related to the cosine similarity. We will next discuss how to allocate the partial observations in different layers when a specific budget \(\bar{c}\) is given.

### Allocating limited budget for partial observations

Usually, we have a limited budget for conducting observations in practice. It is, thereby, necessary to investigate budget allocation (partial observations Γ) in different layers to optimize the performance of reconstruction (e.g., the accuracy) as far as possible. We denote the average fraction of partial observations in each layer by \(\bar{c}\), i.e., \(\bar{c}={\sum }_{\alpha }{c}_{\alpha }/L\), where *c*_{α} indicates the fraction of partial observation in layer *α*, and denote \({{{{{{{\mathcal{H}}}}}}}}({M}^{\alpha }| {A}^{{{{{{{{\mathcal{O}}}}}}}}})\) by \({{{{{{{{\mathcal{H}}}}}}}}}_{\alpha }\) for simplicity. Similarly, employing the mean-field approximation (details in the “Methods” section), we can predict the accuracy by the function *F* defined as

$$F({c}_{1},{c}_{2})= \ 1-\frac{1-{c}_{1}}{1-{c}_{1}+(1-{c}_{2})/\hat{r}}\\ \cdot {\rho }_{1}\cdot {{{{{{{{\mathcal{H}}}}}}}}}_{1}-\frac{(1-{c}_{2})/\hat{r}}{1-{c}_{1}+(1-{c}_{2})/\hat{r}}\cdot {\rho }_{2}\cdot {{{{{{{{\mathcal{H}}}}}}}}}_{2},$$

(12)

where

$${\rho }_{\alpha }=\frac{1}{2\ln 2\cdot | {A}^{{{{{{{{\mathcal{O}}}}}}}}}| }\cdot \left(1-\frac{1-v}{1+v}\cdot {c}_{3-\alpha }^{s}\right),\,\alpha =1,2.$$

(13)

We next explore how the performance of reconstruction is impacted by the ratio *c*_{1}/*c*_{2} when a certain budget is given for partial observations. Once \(\bar{c}\) is given, we regard the function *F* as a unary function of *c*_{1}, i.e., *F* = *F*(*c*_{1}), since \({c}_{2}=2\bar{c}-{c}_{1}\). Then, the domain of *F*(*c*_{1}) is \([0,2\bar{c}]\) if \(\bar{c}\ \le\ 0.5\), and is \([2\bar{c}-1,1]\) if \(\bar{c}\ > \ 0.5\).

We notice that the function *F*(*c*_{1}) is monotonically increasing over the domain if \(\bar{c}\) is small, but decreases at first and increases later if \(\bar{c}\) is large. Theoretical analysis shows that \(F(0)\ge F(2\bar{c})\) if \(\bar{c}\le 0.5\), and \(F(2\bar{c}-1)\ge F(1)\) if \(\bar{c}\ > \ 0.5\) (details in the “Methods” section). The result indicates that it is always better to allocate the budget as much as possible to the layer whose average degree is lower, and we can reach the optimal strategy to obtain the highest accuracy for the given network model then. Moreover, there exists a threshold \(0\ < \ {\bar{c}}_{0}({{{{{{{\bf{M}}}}}}}})\ < \ 1\) for each multiplex network **M**, where \({\bar{c}}_{0}\) is the solution to the equation

$$F(0)=F(2{\bar{c}}_{0}).$$

(14)

If the budget \(\bar{c}\) is less than \({\bar{c}}_{0}\), the accuracy increases when *c*_{1}/*c*_{2} increases, and reaches the maximum as *c*_{1}/*c*_{2} tends to *∞*. If the budget \(\bar{c}\) is large (\(\bar{c}\ > \ {\bar{c}}_{0}\)), however, the accuracy increases when *c*_{1}/*c*_{2} tends to 0 or *∞*, and reaches the maximum as *c*_{1}/*c*_{2} tends to *∞* (Fig. 5a), indicating that the multiplex network can be reconstructed when the aggregate topology and either of the two layers is observed (Fig. 5b). The reason is as follows. The partial observations in different layers can capture the maximal characteristics of each layer when *c*_{1}/*c*_{2} = 1. However, it will lead to more redundancy and lower accuracy if the partial observations in different layers have a high overlap of observations, making the performance even worse when *c*_{1}/*c*_{2} = 1 and \(\bar{c}\) is large. The theoretical results enable us to make the best strategy to allocate budget and thus obtain the optimal reconstruction of multiplex networks in this case. Furthermore, results from real-world data sets verified our theoretical analysis (Fig. 5a, c). We will discuss how different multiplex network characteristics impact the performance of reconstruction from a dynamical behavior point of view in the next section.

### Predicting dynamic processes in multiplex networks

We proceed to investigate the performance of the reconstructed multiplex networks on the prediction of dynamic processes, which is critical to the network functionality. First, we study a percolation process occurring on a two-layer interdependent multiplex network. In such a multiplex network, once a set of nodes is removed (e.g., being attacked or random failure) in one layer, nodes disconnected to the GCC (giant connected component) in the same layer and the counterparts of the removed nodes will also fail and thus be removed. The new removed nodes result in more node removal, and the repetitive processes lead to the catastrophic cascade of failures. For the reconstructed multiplex network encoded by the expectation *E*[*Q*(*M*)], we binarize the matrix *E*[*Q*(*M*)] and randomly remove nodes in one layer with probability 1 − *p* (see Supplementary Note 4 for more details of the process). We calculate the size of GMCC (giant mutually connected component) as a function of *p* and the critical threshold *p*_{c}, above which the GMCC exists. We compare the average size of GMCC in the reconstructed network (repeated 100 times) to the real one with the C. elegans neural network against ranging *c* (Fig. 6a). The performance of reconstruction is well as seen from the size of GMCC, even if *c* is small. The estimates of the size of GMCC and the critical probability *p*_{c} approach those of the real networks (*c* = 1) as *c* increases 1. However, the proposed method slightly underestimates both the size of GMCC and the critical threshold *p*_{c} for the C. elegans neural network. Further, simulations on synthetic networks reveal that the method underestimates much more the robustness and *p*_{c} of the interdependent networks when *r* is small (Fig. 6b). When *r* is small and close to 0, the edges in the multiplex network are concentrated on one layer, leading to the extreme vulnerability. However, the reconstructed method could not capture the unbalance, especially for a small fraction of partial observation.

Second, we consider a random walk process taking place on interconnected multiplex networks, where interlayer links only exist between counterparts. We suppose that a number of walkers start from randomly chosen nodes and walk along with intralayer links with a probability *p*_{intra}, and along with interlayer links with probability *p*_{inter} (see Supplementary Note 5 for more details). We employ the coverage *ϕ*(*t*) as the performance metric, indicating the fraction of nodes that have been visited by the walkers before time *t*. The coverage at each time on reconstructed multiplex networks are compared to the real one (London multiplex transportation network) against different *c*, showing an outstanding good prediction as *c* increases (Fig. 6c). Simulations on synthetic networks show that the coverage will be overestimated no matter *r* is small or large (Fig. 6d). Moreover, we conduct more simulations with real-world multiplex networks with more than two layers. For example, we simulated the random walk processes on the European air transportation network which has three layers (Ryanair, Lufthansa, EasyJet), and the air transportation network in the U.S. which has four layers (SkyWest, Southwest, American Eagle, American Airlines) (see Supplementary Fig. 10c, d for more results on real-world networks).

Last, we investigate a spreading process based on the SI (susceptible-infected) model on temporal networks, where interlayer links only exist between two counterparts at two adjacent time slots^{23}. In the SI model, each node has only two states: “susceptible” (S) or “infected” (I), and 5% nodes are randomly chosen to be sources (infected) at initial time *t* = 0 in our simulations. At each time (which corresponds to one layer in the temporal network), the infected nodes will infect the susceptible neighboring nodes with a specific infection rate *λ* (see Supplementary Note 6 for more details). In the spreading process, the fraction of infected nodes *I*(*T*) at time *t* = *T* on the reconstructed networks is calculated and compared to the real one with the network of social interactions at SFHH (La Société française d’Hygiène Hospitalière) conference (Fig. 6e). Simulations on synthetic networks reveal that the method overestimates *I*(*T*) when *r* is close to 1 (Fig. 6f). We also compare these results with the existing methods (see Supplementary Fig. 11 for more details). Further, we simulated the spreading processes on the Wiki-talk network which has 14 layers (14 weeks), and the CollegeMsg network that has 22 layers (22 days) (see Supplementary Fig. 10e, f for more results on real-world networks). Moreover, we have studied how the performance of reconstruction for dynamics is influenced by more network characteristics, including the overlap of edges and ratio of heterogeneity (see Supplementary Fig. 12 for more results on synthetic network with different characteristics).