Mixed state entanglement classification using artificial neural networks

Cillian Harney; Mauro Paternostro; Stefano Pirandola

doi:10.1088/1367-2630/ac0388

The core tasks of entanglement classification [1–3] and quantification [4–6] are essential for future quantum technologies, and ask two seemingly straightforward questions: given a quantum state ρ, is it entangled? If so, by how much? As the size or dimension of a quantum system grows, these questions become highly non-trivial, and in general there are no universal criteria or methods to provide answers. The most popular mathematical recipe for classification, the positive partial transposition (PPT) criterion (or Peres–Horodecki criterion) [7, 8], is necessary and sufficient only for (2 ⊗ 2) or (2 ⊗ 3) bipartite systems. As one extends to multipartite, higher-dimensional quantum systems, more sophisticated tools are required.

The application of classical machine learning tools, such as artificial neural networks, to the study of quantum systems has seen a surge of interest due to their remarkable expressive power and efficiency [9–11]. In particular, Carleo and Troyer [12] showed that restricted Boltzmann machines offer a highly appropriate classical representation of quantum states, due to their ability to perform dimensionality reduction, their non-local information distribution, and their optimization capacity [13]. Ansatzes based on this architecture are known as neural network quantum states (NNS), and they have been a successful classical simulation tool in a variety of contexts such as tomography [14–17], open quantum system dynamics [18–22], and the study of quantum technologies [23–26].

The versatility of NNS also provides an excellent framework for the study of entanglement [27]. As introduced for pure, qubit states in reference [28], it is possible to manipulate and constrain these neural networks in a way that guarantees a strict form of separability. These constrained variational states are known as separable neural network states (SNNS). Combined with a quantum state reconstruction algorithm, this introduces a unique entanglement witness protocol based on the reconstructive performance of an SNNS with a target state.

In this paper, we generalize these results to mixed, d-dimensional quantum states. We show how SNNS can be used to perform highly specific entanglement classification, and to approximate entanglement measures to a very high degree of accuracy. The ability to implicitly characterize the space of separable states is extremely valuable, and allows one to compute entanglement measures that are otherwise very difficult to evaluate, such as the relative entropy of entanglement (REE) [29].

This paper is structured as follows: in section 1 we review the NNS architecture and its variants for pure and mixed states. Section 2 overviews separable architectures, and shows how specific forms of entanglement can be guaranteed. In section 3 the methods of classification and quantification using SNNS are discussed. Section 4 provides numerical evidence for their utility through a number of relevant examples, with interesting applications in the study of noisy tripartite entanglement, bound entanglement, and quantum channel capacities. Finally, conclusions and future directions are addressed in section 5.

1. Neural network quantum states

1.1. Pure states

The simplest neural network model we can introduce is the positive, real NNS. This model uses a real valued restricted Boltzmann machine (RBM) architecture, with n_v visible units $\boldsymbol{s}=\left\{{s}_{1},\dots ,{s}_{{n}_{v}}\right\}$ representing the number of qudits being modelled within the target quantum system, fully interconnected with n_h hidden units $\boldsymbol{h}=\left\{{h}_{1},\dots ,{h}_{{n}_{\text{h}}}\right\}$ . The visible units are typically binary valued to study d = 2-dimensional systems, s_i ∈ {−1, 1} as are the hidden units h_j ∈ {−1, 1}; however this depends on the system being modelled. This network architecture allows us to capture the correlations of the objective quantum system through network parameters:

$\begin{equation}{\Pi}=\left\{{a}_{k},{b}_{j},{W}_{kj}\right\}\quad \text{for}\enspace k\in \left[1,{n}_{v}\right],\quad j\in \left[1,{n}_{\text{h}}\right],\end{equation} \tag{ 1 }$

$\begin{equation}\boldsymbol{a}\in {\mathbb{R}}^{{n}_{v}},\quad \boldsymbol{b}\in {\mathbb{R}}^{{n}_{\text{h}}},\quad W\in {\mathbb{R}}^{{n}_{v}{\times}{n}_{\text{h}}},\end{equation} \tag{ 2 }$

where a are visible biases, b are hidden biases, and W is the network weight matrix. The total number of parameters is |Π| = n_h ⋅ n_v + n_h + n_v (see figure 1).

**Figure 1.** Neural network quantum state architectures for the simulation of pure states. Panel (a) illustrates the standard NNS construction for n qudits. The visible-layer consists of ${n}_{v}{\times}\tilde {d}$ units which encode the accessible basis states of the target system; here $\tilde {d}$ is the number of visible units required to encode a single qudit state where $\mathcal{C}\left(\cdot \right)$ is some encoding function such that $\mathcal{C}\left(\vert d\rangle \right)={\left\{{g}_{i}\right\}}_{i=1}^{\tilde {d}}$ and its inverse $\bar{\mathcal{C}}\left({\left\{{g}_{i}\right\}}_{i=1}^{\tilde {d}}\right)=\vert d\rangle$ . Correlations between qudits are captured by an n_h unit hidden-layer with interconnected weights and biases. Panel (b) illustrates the amplitude/phase machine that uses two hidden-layers and only real valued parameters.

The inherent advantage offered by the RBM architecture for generative modelling is that there are no intra-layer connections (i.e. there are no connections between adjacent visible units or hidden units). This allows for an ansatz that is independent from the activations of the hidden state space. Thus, one can define a positive NNS wavefunction as [12]

$\begin{equation}{{\Psi}}_{{\Pi}}\left(\boldsymbol{s}\right)={\text{e}}^{\sum\limits _{k=1}^{{n}_{v}}{a}_{k}{s}_{k}}\prod\limits _{j=1}^{{n}_{\text{h}}}2\enspace \mathrm{cosh}\left(\sum\limits _{k}{W}_{kj}{s}_{k}+{b}_{j}\right),\end{equation} \tag{ 3 }$

and therefore the NNS is |Ψ_Π⟩ = ∑_sΨ_Π( s )| s ⟩.
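As a concrete illustration of equation (3), the following minimal sketch evaluates the positive NNS wavefunction for every configuration of a small qubit register. It assumes NumPy, binary visible units s_k ∈ {−1, 1}, and small random parameters; the function name `rbm_amplitudes` and the chosen sizes are illustrative and not taken from the original work.

```python
import itertools
import numpy as np

def rbm_amplitudes(a, b, W):
    """Unnormalised positive NNS amplitudes of equation (3) for all s in {-1, 1}^n_v."""
    n_v = len(a)
    # All 2^n_v visible configurations, one per row.
    configs = np.array(list(itertools.product([-1, 1], repeat=n_v)))
    visible = np.exp(configs @ a)                              # exp(sum_k a_k s_k)
    hidden = np.prod(2.0 * np.cosh(configs @ W + b), axis=1)   # prod_j 2 cosh(sum_k W_kj s_k + b_j)
    return configs, visible * hidden

# Illustrative sizes and random parameters (assumptions for this sketch).
rng = np.random.default_rng(0)
n_v, n_h = 3, 4
a = rng.normal(scale=0.1, size=n_v)
b = rng.normal(scale=0.1, size=n_h)
W = rng.normal(scale=0.1, size=(n_v, n_h))

configs, psi = rbm_amplitudes(a, b, W)
psi_normalised = psi / np.linalg.norm(psi)
```

The product of 2 cosh terms is what remains after summing the Boltzmann weight over all hidden configurations h_j ∈ {−1, 1}, which is possible in closed form precisely because there are no intra-layer connections.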

Whilst NNS have typically been applied to qubit systems using binary visible units, one can extend the modelling to d-dimensional qudits by using a set of visible binary neurons that collectively represent a single qudit [17]. One may choose to encode d-dimensional states using a collection of $\tilde {d}$ visible, binary neurons via an encoding function $\mathcal{C}$ , i.e.

$\begin{equation}\vert s\rangle {\mapsto}\mathcal{C}\left(s\right)=\left\{{g}_{1},{g}_{2},\dots ,{g}_{\tilde {d}}\right\}=\boldsymbol{g}.\end{equation} \tag{ 4 }$

The n_v qudit visible-layer can then be encoded into ${\tilde {n}}_{v}=\tilde {d}{n}_{v}{ >}{n}_{v}$ visible neurons,

$\begin{equation}\boldsymbol{s}=\left\{{s}_{1},{s}_{2},\dots ,{s}_{{n}_{v}}\right\}{\mapsto}\left\{{\boldsymbol{g}}_{1},{\boldsymbol{g}}_{2},\dots ,{\boldsymbol{g}}_{{\tilde {n}}_{v}}\right\}.\end{equation} \tag{ 5 }$

We may similarly define the qudit decoding function $\bar{\mathcal{C}}$ such that $\bar{\mathcal{C}}\left(\boldsymbol{g}\right)=\vert s\rangle$ . One option is to encode qudits into binary codes on the visible-layer, |s⟩ ↦ bin(s), which requires ${\tilde {n}}_{v}=\lceil {\mathrm{log}}_{2}\enspace d\rceil {n}_{v}$ visible binary neurons but admits a complete basis set only when d = 2^r for some integer r. For arbitrary d it may be more useful to utilize one-hot encoding, $\vert s\rangle {\mapsto}\text{one-hot}\left(s\right)={\boldsymbol{e}}_{s}^{d}$ , where ${\boldsymbol{e}}_{s}^{d}$ is a d-length vector that is zero at all indices except index s.
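The two encodings can be sketched as follows. This is a minimal illustration that assumes qudit levels are labelled 0, …, d − 1 and that neurons take values in {0, 1} (a relabelling to {−1, 1} is straightforward); the helper names `one_hot`, `binary_code`, `encode_register` and `decode_one_hot` are hypothetical.

```python
import math
import numpy as np

def one_hot(s, d):
    """d-length vector that is zero everywhere except index s."""
    e = np.zeros(d, dtype=int)
    e[s] = 1
    return e

def binary_code(s, d):
    """Binary code of level s using ceil(log2 d) bits, most significant bit first."""
    n_bits = max(1, math.ceil(math.log2(d)))
    return np.array([(s >> i) & 1 for i in reversed(range(n_bits))], dtype=int)

def encode_register(levels, d, encoder):
    """Concatenate the per-qudit codes of a basis state |s_1 ... s_n>."""
    return np.concatenate([encoder(s, d) for s in levels])

def decode_one_hot(g, d):
    """Inverse of the one-hot register encoding: recover the qudit levels."""
    return [int(np.argmax(block)) for block in g.reshape(-1, d)]

# Three qutrits in the basis state |2, 0, 1>.
g = encode_register([2, 0, 1], 3, one_hot)
levels = decode_one_hot(g, 3)
```

One-hot encoding spends d neurons per qudit, but every d is supported and every valid code corresponds to exactly one basis state; the binary code is more compact but leaves unused code words unless d is a power of two.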

In order to study non-positive quantum states one can introduce complex network parameters. Letting a_k = α_k + iβ_k, b_j = γ_j + iλ_j, and W_kj = Γ_kj + iΛ_kj, then the NNS wavefunction is

$\begin{equation}{{\Psi}}_{{\Pi}}\left(\boldsymbol{s}\right)={\text{e}}^{\sum\limits _{k=1}^{{n}_{v}}\left({\alpha }_{k}+\text{i}{\beta }_{k}\right){s}_{k}}\prod\limits _{j=1}^{{n}_{\text{h}}}2\enspace \mathrm{cosh}\left({\theta }_{j}^{\gamma }+\text{i}{\theta }_{j}^{\lambda }\right),\end{equation} \tag{ 6 }$

where ${\theta }_{j}^{\gamma }={\sum }_{k}{{\Gamma}}_{kj}{s}_{k}+{\gamma }_{j}$ , and ${\theta }_{j}^{\lambda }={\sum }_{k}{{\Lambda}}_{kj}{s}_{k}+{\lambda }_{j}$ . Thus the NNS can exhibit phase properties of quantum states. The network parameter set extends to ${\Pi}=\left\{{a}_{k},{b}_{j},{W}_{kj}\right\}\in \mathbb{C}$ .

Alternatively one can preserve reality of network parameters by restructuring the nature of the NNS ansatz itself. In particular we can construct an ansatz that uses two RBMs that unify to represent a complete state. Defining a variational phase state Φ_Ξ( s ), and amplitude state Ψ_Π( s ), this network ansatz is given as [14]

$\begin{equation}\vert {{\Psi}}_{{\Pi},{\Xi}}\rangle =\sum\limits _{\boldsymbol{s}}{\text{e}}^{\text{i}\enspace \mathrm{log}\enspace {{\Phi}}_{{\Xi}}\left(\boldsymbol{s}\right)}{{\Psi}}_{{\Pi}}\left(\boldsymbol{s}\right)\vert \boldsymbol{s}\rangle .\end{equation} \tag{ 7 }$

Therefore both the variational phase and amplitude networks need only be real valued, since the complex/phase properties of the state are managed through the complex exponential. The state is now defined by two parameter sets, ${\Pi}=\left\{{a}_{k},{b}_{j},{W}_{kj}\right\}\in \mathbb{R}$ and ${\Xi}=\left\{{c}_{k},{d}_{j},{U}_{kj}\right\}\in \mathbb{R}$ .
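A minimal sketch of the amplitude/phase construction of equation (7) is given below, assuming NumPy, qubit visible units in {−1, 1}, and random real parameters. The helper `random_real_params` and the variable names are illustrative assumptions.

```python
import itertools
import numpy as np

def rbm(configs, a, b, W):
    """Positive, real RBM output of equation (3) for each row of configs."""
    return np.exp(configs @ a) * np.prod(2.0 * np.cosh(configs @ W + b), axis=1)

def random_real_params(rng, n_v, n_h, scale=0.5):
    """Hypothetical helper: small random real biases and weights."""
    return (rng.normal(scale=scale, size=n_v),
            rng.normal(scale=scale, size=n_h),
            rng.normal(scale=scale, size=(n_v, n_h)))

rng = np.random.default_rng(1)
n_v, n_h = 2, 3
configs = np.array(list(itertools.product([-1, 1], repeat=n_v)))

amplitude = rbm(configs, *random_real_params(rng, n_v, n_h))   # Psi_Pi(s) > 0
phase_net = rbm(configs, *random_real_params(rng, n_v, n_h))   # Phi_Xi(s) > 0

# Equation (7): the phase network enters only through exp(i log Phi_Xi(s)).
state = np.exp(1j * np.log(phase_net)) * amplitude
state = state / np.linalg.norm(state)
```

Because the phase factor has unit modulus, the moduli of the resulting amplitudes are fixed entirely by the amplitude network, while the phase network controls only the complex argument.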

1.2. Mixed states

To extend the variational ansatz to mixed states requires the addition of a hidden mixing-layer with n_m hidden units, capable of encoding the classical probability distribution of the mixed quantum state [19–21]. The network state can be constructed from two sets of variational network parameters: Π = {c_p, U_kp}, ${c}_{p}\in {\mathbb{R}}^{{n}_{\text{m}}}$ and ${U}_{kp}\in {\mathbb{C}}^{{n}_{v}{\times}{n}_{\text{m}}}$ encoding the mixing probabilities [30] and the previously defined ${\Xi}=\left\{{a}_{k},{b}_{j},{W}_{kj}\right\}\in \mathbb{C}$ which encodes the pure state probability distribution. Let the density-matrix row and column degrees of freedom be described by basis vectors { α , β } respectively. As these parameter sets are independent, we may describe a density-matrix element as a contribution from a classical mixing state ${\mathcal{P}}_{{\Pi}}$ and a pure state σ_Ξ.

The contribution from a classical mixing network is given by

$\begin{equation}{\mathcal{P}}_{{\Pi}}^{\boldsymbol{\alpha },\boldsymbol{\beta }}=\prod\limits _{p=1}^{{n}_{\text{m}}}\enspace \mathrm{cosh}\left({\phi }_{p}\left(\boldsymbol{\alpha },\boldsymbol{\beta }\right)\right),\end{equation} \tag{ 8 }$

$\begin{equation}{\phi }_{p}\left(\boldsymbol{\alpha },\boldsymbol{\beta }\right)={c}_{p}+\sum\limits _{k}{U}_{kp}{\alpha }_{k}+{U}_{kp}^{{\ast}}{\beta }_{k},\end{equation} \tag{ 9 }$

where x* denotes complex conjugation. Meanwhile the pure state contribution is

$\begin{equation}{\sigma }_{{\Xi}}^{\boldsymbol{\alpha },\boldsymbol{\beta }}={\text{e}}^{\omega \left(\boldsymbol{\alpha },\boldsymbol{\beta }\right)}\prod\limits _{j=1}^{{n}_{\text{h}}}\mathrm{cosh}\left({\theta }_{j}\left(\boldsymbol{\alpha }\right)\right)\mathrm{cosh}\left({\theta }_{j}^{{\ast}}\left(\boldsymbol{\beta }\right)\right),\end{equation} \tag{ 10 }$

$\begin{equation}\omega \left(\boldsymbol{\alpha },\boldsymbol{\beta }\right)=\sum\limits _{k}{a}_{k}{\alpha }_{k}+{a}_{k}^{{\ast}}{\beta }_{k},\end{equation} \tag{ 11 }$

$\begin{equation}{\theta }_{j}\left(\boldsymbol{x}\right)={b}_{j}+\sum\limits _{k}{W}_{kj}{x}_{k}.\end{equation} \tag{ 12 }$

The complete variational state can therefore be constructed as a sum over all density-matrix elements,

$\begin{equation}{\rho }_{{\Pi},{\Xi}}=\sum\limits _{\boldsymbol{\alpha },\boldsymbol{\beta }}{\mathcal{P}}_{{\Pi}}^{\boldsymbol{\alpha },\boldsymbol{\beta }}\cdot {\sigma }_{{\Xi}}^{\boldsymbol{\alpha },\boldsymbol{\beta }}\vert \boldsymbol{\alpha }\rangle \langle \boldsymbol{\beta }\vert ={\mathcal{P}}_{{\Pi}}\odot {\sigma }_{{\Xi}},\end{equation} \tag{ 13 }$

where ⊙ is the Hadamard product. This architecture is presented in figure 2. It is important to emphasize that by construction, the classical mixing state ${\mathcal{P}}_{{\Pi}}$ cannot simulate quantum correlations, only classical correlations (see appendix A). The pure state density-matrix σ_Ξ alone is able to simulate the quantum correlations within the global network state. Just as a mixed state can be decomposed via a statistical ensemble of pure states {p_i; |ϕ_i⟩}, where ρ = ∑_i p_i|ϕ_i⟩ ⟨ϕ_i|, equation (13) can be considered as a matrix element-wise decomposition of the density-matrix which is readily accessible via NNS.
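Equations (8)–(13) can be assembled numerically as in the sketch below. It assumes NumPy, qubit units in {−1, 1}, random complex parameters, and an explicit trace normalisation that is not part of equation (13); the function name `nns_density_matrix` is hypothetical.

```python
import itertools
import numpy as np

def nns_density_matrix(a, b, W, c, U):
    """Element-wise construction rho = P (Hadamard product) sigma of equation (13)."""
    n_v = len(a)
    basis = np.array(list(itertools.product([-1, 1], repeat=n_v)))
    # Pure part, equations (10)-(12): sigma[alpha, beta] = psi(alpha) * conj(psi(beta)).
    psi = np.exp(basis @ a) * np.prod(np.cosh(basis @ W + b), axis=1)
    sigma = np.outer(psi, psi.conj())
    # Mixing part, equations (8)-(9): phi_p = c_p + sum_k U_kp alpha_k + conj(U_kp) beta_k.
    u = basis @ U
    phi = c + u[:, None, :] + u.conj()[None, :, :]
    P = np.prod(np.cosh(phi), axis=2)
    rho = P * sigma                      # Hadamard (element-wise) product
    return rho / np.trace(rho).real      # normalisation added for this sketch

def random_complex(rng, *shape, scale=0.3):
    """Hypothetical helper: random complex array with Gaussian real and imaginary parts."""
    return rng.normal(scale=scale, size=shape) + 1j * rng.normal(scale=scale, size=shape)

rng = np.random.default_rng(4)
n_v, n_h, n_m = 2, 3, 2
a, b = random_complex(rng, n_v), random_complex(rng, n_h)
W, U = random_complex(rng, n_v, n_h), random_complex(rng, n_v, n_m)
c = rng.normal(scale=0.3, size=n_m)      # real hidden biases of the mixing-layer

rho = nns_density_matrix(a, b, W, c, U)
eigenvalues = np.linalg.eigvalsh(rho)
```

With real c, each cosh factor of the mixing part is a sum of two positive semi-definite rank-one matrices, and Hadamard products of positive semi-definite matrices remain positive semi-definite (Schur product theorem), so the resulting matrix is Hermitian and positive semi-definite up to round-off.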

The network parameters in this ansatz are necessarily complex, thus combining the control of phase and amplitude contributions much like equation (6). However, it may be desirable to formulate an ansatz that is similar to equation (7), in which phase and amplitude contributions are controlled by different networks. One could use the NNS in equation (7) to learn a vectorised density-matrix $\boldsymbol{\rho }_{{\Pi},{\Xi}}=\text{vec}\left({\rho }_{{\Pi},{\Xi}}\right)=\vert {\rho }_{{\Pi},{\Xi}}\rangle$ , where the function vec(⋅) simply reshapes an n-qudit, dⁿ × dⁿ density-matrix into a d²ⁿ column vector. It follows that two real parameter RBMs could then be used to learn phase and amplitude properties respectively, as with pure states. Whilst optimal convergence towards a target vectorised mixed state is possible in this way, the ansatz itself is guaranteed to be neither Hermitian nor positive semi-definite under reshaping to a matrix. That is, given an inverse vectorisation function vec⁻¹(⋅) which reshapes a d²ⁿ column vector into a dⁿ × dⁿ matrix, ${\rho }_{{\Pi},{\Xi}}={\text{vec}}^{-1}\left(\boldsymbol{\rho }_{{\Pi},{\Xi}}\right)$ need not be a valid density-matrix. Therefore, this form of ansatz may represent states that are non-physical, which is clearly not desirable.

Instead, we can restructure the mixed state ansatz so that it more closely follows the complex exponential format utilized in the previous section. Let the real parameter sets Ξ, Π describe the pure state phase and amplitude networks respectively, and let the complex parameter set Ω describe the mixing network. Recall a pure state wavefunction in complex exponential form ${{\Psi}}_{{\Pi},{\Xi}}\left(\boldsymbol{\alpha }\right)={\text{e}}^{\text{i}\enspace \mathrm{log}\enspace {\varphi }_{{\Xi}}\left(\boldsymbol{\alpha }\right)}{\sigma }_{{\Pi}}\left(\boldsymbol{\alpha }\right)$ . It is useful to define the following functions of our pure density-matrix phase/amplitude wavefunctions

$\begin{equation}{{\Phi}}_{{\Xi}}^{\boldsymbol{\alpha },\boldsymbol{\beta }}=\frac{{\varphi }_{{\Xi}}\left(\boldsymbol{\alpha }\right)}{{\varphi }_{{\Xi}}\left(\boldsymbol{\beta }\right)},\qquad {{\Gamma}}_{{\Pi}}^{\boldsymbol{\alpha },\boldsymbol{\beta }}={\sigma }_{{\Pi}}\left(\boldsymbol{\alpha }\right){\sigma }_{{\Pi}}\left(\boldsymbol{\beta }\right).\end{equation} \tag{ 14 }$

In order to incorporate classical mixing we need a mixing-layer that takes a similar vectorized form. Omitting the visible biases which are already possessed by the pure states, the mixing-layer takes the form

$\begin{equation}{\mathcal{P}}_{{\Omega}}^{\boldsymbol{\alpha },\boldsymbol{\beta }}=\hspace{2pt}\prod\limits _{p=1}^{{n}_{m}}\mathrm{cosh}\left({\mu }_{p}+\text{i}{\psi }_{p}\right)\hspace{2pt}=\hspace{2pt}\prod\limits _{p=1}^{{n}_{m}}{r}_{p}^{\boldsymbol{\alpha },\boldsymbol{\beta }}\enspace {\text{e}}^{\text{i}\enspace \mathrm{log}\enspace {\vartheta }_{p}^{\boldsymbol{\alpha },\boldsymbol{\beta }}},\end{equation} \tag{ 15 }$

$\begin{equation}{\mu }_{p}\left(\boldsymbol{\alpha },\boldsymbol{\beta }\right)=\hspace{2pt}{c}_{p}+\sum\limits _{k}{R}_{kp}\left({\alpha }_{k}+{\beta }_{k}\right),\end{equation} \tag{ 16 }$

$\begin{equation}{\psi }_{p}\left(\boldsymbol{\alpha },\boldsymbol{\beta }\right)=\hspace{2pt}\sum\limits _{k}{I}_{kp}\left({\alpha }_{k}-{\beta }_{k}\right),\end{equation} \tag{ 17 }$

where R_kp = Re(U_kp) and I_kp = Im(U_kp) denote the real and imaginary components of the mixing network respectively. One can then construct the following phase and amplitude functions for the classical mixing

$\begin{equation}{r}_{{\Omega}}^{\boldsymbol{\alpha },\boldsymbol{\beta }}=\hspace{2pt}\prod\limits _{p=1}^{{n}_{m}}\sqrt{\mathrm{cosh}\left({\mu }_{p}+\text{i}{\psi }_{p}\right)\mathrm{cosh}\left({\mu }_{p}-\text{i}{\psi }_{p}\right)},\end{equation} \tag{ 18 }$

$\begin{equation}{\vartheta }_{{\Omega}}^{\boldsymbol{\alpha },\boldsymbol{\beta }}=\hspace{2pt}\prod\limits _{p=1}^{{n}_{m}}\mathrm{exp}\left[\frac{1}{2\text{i}}\enspace \mathrm{log}\left(\frac{-\mathrm{cosh}\left({\mu }_{p}+\text{i}{\psi }_{p}\right)}{\mathrm{cosh}\left({\mu }_{p}-\text{i}{\psi }_{p}\right)}\right)\right],\end{equation} \tag{ 19 }$

such that the vectorized mixing state takes the form ${\text{e}}^{\text{i}\enspace \mathrm{log}\enspace \vert {\vartheta }_{{\Omega}}\rangle }\vert {r}_{{\Omega}}\rangle$ . This allows for any element of the complete mixed state to be expressed according to

$\begin{equation}{\rho }_{{\Omega},{\Pi},{\Xi}}^{\boldsymbol{\alpha },\boldsymbol{\beta }}={\text{e}}^{\text{i}\enspace \mathrm{log}\left({{\Phi}}_{{\Xi}}^{\boldsymbol{\alpha },\boldsymbol{\beta }}{\vartheta }_{{\Omega}}^{\boldsymbol{\alpha },\boldsymbol{\beta }}\right)}{{\Gamma}}_{{\Pi}}^{\boldsymbol{\alpha },\boldsymbol{\beta }}{r}_{{\Omega}}^{\boldsymbol{\alpha },\boldsymbol{\beta }}.\end{equation} \tag{ 20 }$

2. Separable neural network architectures

2.1. Separable pure network states

Through restrictions on the connectivity of the weight matrix W_kj, one can guarantee separability of the generative network state. Let us define $\mathcal{K}$ as a collection of K-disjoint subsets $\mathcal{K}={\left\{{\boldsymbol{k}}_{l}\right\}}_{l=1}^{K}$ , that collect qudit indices from an n-qudit system. More precisely,

$\begin{equation}\mathcal{K}=\bigcup\limits _{l=1}^{K}{\boldsymbol{k}}_{l},\quad \text{s.t.}\enspace \left\{1,\dots ,n\right\}\subseteq \mathcal{K},\end{equation} \tag{ 21 }$

$\begin{equation}{\boldsymbol{k}}_{m}\cap {\boldsymbol{k}}_{l}=\varnothing,\quad \forall \enspace m\ne l\in \left\{1,\dots ,K\right\}.\end{equation} \tag{ 22 }$

Equation (21) demands that the global partition set contains all n qudits in the system, and equation (22) that the subsets of qudits are disjoint. Hence, an n-qudit, pure state |Ψ⟩ is defined to be $\mathcal{K}$ -separable if it can be expressed as a tensor-product of sub-states $\vert {\Psi}\rangle ={\bigotimes}_{\boldsymbol{k}\in \mathcal{K}}\vert {\psi }_{\boldsymbol{k}}\rangle$ , i.e. it is separable with respect to the partition set $\mathcal{K}$ . This is a very precise form of separability, as it specifies exactly which parties are entangled with one another. If we were to disregard specific party orderings we would refer to $\left(\vert \mathcal{K}\vert =K\right)$ -separability.

Disjointedness in this definition of $\mathcal{K}$ -separability ensures that each qudit is only entangled with respect to a single subset of the quantum system. This provides a specific level of detail to the entanglement structure, while also degenerating many forms of entanglement that we may not be interested in. For example, genuine tripartite entanglement under disjoint $\mathcal{K}$ -separability allows for only a single set $\mathcal{K}=\left\{{\boldsymbol{k}}_{1}\right\}=\left\{1,2,3\right\}$ with no partitions. We may then define non-disjoint $\mathcal{K}$ -separability as an extension of the previous definition simply by removing the conditions in equation (22). Using this non-disjoint definition, genuine tripartite entanglement allows for many more definitions, $\mathcal{K}=\left\{1,2,3\right\},\left\{1,2\vert 2,3\right\},\left\{1,2\vert 2,3\vert 1,3\right\},\dots$ , which is studied in later sections (see figure 3 for an example).

To strictly impose either type of separability on an NNS, the goal is to express the wavefunction of the network state in the following form

$\begin{equation}{{\Psi}}_{{\Pi}}\left(\boldsymbol{s}\right)=\prod\limits _{l=1}^{K}{\psi }_{{\Pi}}^{{\boldsymbol{k}}_{l}}\left(\boldsymbol{s}\right),\end{equation} \tag{ 23 }$

where ${\psi }_{{\Pi}}^{{\boldsymbol{k}}_{l}}$ are separable sub-wavefunctions that describe the behaviour of qudits in the partition k _l. We may then construct an analogous hidden-layer partition set $\mathcal{H}={\left\{{\boldsymbol{h}}_{l}\right\}}_{l=1}^{K}$ , which assigns a subset of hidden units to each visible subset of entangled qudits $\mathcal{K}={\left\{{\boldsymbol{k}}_{l}\right\}}_{l=1}^{K}$ . By segmenting the layer of hidden units into these K-subsets and applying the following restriction to the weight matrix

$\begin{equation}{W}_{ij}=0\quad \text{for}\enspace i\in {\boldsymbol{k}}_{l},\quad j\notin {\boldsymbol{h}}_{l},\quad \forall \enspace l\in \left\{1,\dots ,K\right\},\end{equation} \tag{ 24 }$

this condition then provides the complete, $\mathcal{K}$ -separable network state

$\begin{align}{{\Psi}}_{{\Pi}\vert \mathcal{K}}\left(\boldsymbol{s}\right)=\prod\limits _{l=1}^{K}{\text{e}}^{{\tilde {\omega }}_{l}\left(\boldsymbol{s}\right)}\prod\limits _{j\in {\boldsymbol{h}}_{l}}2\enspace \mathrm{cosh}\left({\theta }_{l}^{j}\left(\boldsymbol{s}\right)\right),\\ {\theta }_{l}^{j}\left(\boldsymbol{s}\right)=\sum\limits _{i\in {\boldsymbol{k}}_{l}}{W}_{ij}{s}_{i}+{b}_{j},\qquad {\tilde {\omega }}_{l}\left(\boldsymbol{s}\right)=\sum\limits _{i\in {\boldsymbol{k}}_{l}}{a}_{i}{s}_{i}.\end{align} \tag{ 25 }$
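The weight restriction of equation (24) amounts to masking the weight matrix. The sketch below, which assumes NumPy, qubit units in {−1, 1}, zero-based indices, and a disjoint partition, builds the mask for 𝒦 = {1, 2|3} on three qubits and checks that the resulting state is a product across the cut between qubits {1, 2} and qubit 3. The function names and the assignment of hidden units to partitions are illustrative assumptions.

```python
import itertools
import numpy as np

def separability_mask(n_v, n_h, visible_parts, hidden_parts):
    """Mask that is 1 where equation (24) allows a weight and 0 elsewhere."""
    mask = np.zeros((n_v, n_h))
    for k_l, h_l in zip(visible_parts, hidden_parts):
        mask[np.ix_(k_l, h_l)] = 1.0     # visible subset k_l talks only to hidden subset h_l
    return mask

def rbm_amplitudes(a, b, W):
    """Positive NNS amplitudes of equation (3) over all configurations."""
    configs = np.array(list(itertools.product([-1, 1], repeat=len(a))))
    return np.exp(configs @ a) * np.prod(2.0 * np.cosh(configs @ W + b), axis=1)

rng = np.random.default_rng(2)
n_v, n_h = 3, 4
a = rng.normal(scale=0.5, size=n_v)
b = rng.normal(scale=0.5, size=n_h)
W = rng.normal(scale=0.5, size=(n_v, n_h))

# Partition {1, 2 | 3} in zero-based indices; hidden units 0-2 serve qubits {1, 2}, unit 3 serves qubit 3.
mask = separability_mask(n_v, n_h, visible_parts=[[0, 1], [2]], hidden_parts=[[0, 1, 2], [3]])
psi_sep = rbm_amplitudes(a, b, W * mask)

# Rows index qubits (1, 2), columns index qubit 3: a product state has Schmidt rank one.
singular_values = np.linalg.svd(psi_sep.reshape(4, 2), compute_uv=False)
```

Only the leading singular value is non-zero (up to round-off), confirming that the masked network cannot generate correlations across the chosen cut.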

2.2. Separable neural network density matrices

Whilst pure states are $\mathcal{K}$ -separable when they can be expressed as the tensor product of $\vert \mathcal{K}\vert =K$ local sub-states, a mixed state possesses a form of separability iff it can be expressed as a convex combination of local sub-states ${\rho }^{{\left\{{\boldsymbol{k}}_{l}\right\}}_{l=1}^{K}}$ . It is now useful to define two distinct forms of separability: consistent and inconsistent mixed-multipartite separability.

**Figure 3.** Different pure state network architectures used to simulate genuine tripartite entanglement. Panel (a) depicts a form of GHZ-type entanglement according to the partition set ${\mathcal{K}}_{\text{GHZ}}=\left\{1,2\vert 2,3\right\}$ . Notice that qudits 1 and 3 do not possess a direct connection, but may relay correlations through qudit 2. Panel (b) illustrates a non-disjoint, W-type entanglement structure according to $\mathcal{K}=\left\{1,2\vert 2,3\vert 1,3\right\}$ .

A state is consistently $\mathcal{K}$ -separable if it can be expressed as a convex combination of states which all admit an identical form of separability,

$\begin{equation}{\rho }^{\mathcal{K}}=\sum\limits _{j}{p}_{j}\underset{\boldsymbol{k}\in \mathcal{K}}{\bigotimes}{\rho }_{j}^{\boldsymbol{k}}.\end{equation} \tag{ 26 }$

On the contrary, a state is inconsistently $\left\{{\mathcal{K}}_{j}\right\}$ -separable if it is a mixture of states with different entanglement properties,

$\begin{equation}{\rho }^{\left\{{\mathcal{K}}_{j}\right\}}=\sum\limits _{j}{p}_{j}\underset{\boldsymbol{k}\in {\mathcal{K}}_{j}}{\bigotimes}{\rho }_{j}^{\boldsymbol{k}},\end{equation} \tag{ 27 }$

so its entanglement properties are defined by a combination of constituent ${\mathcal{K}}_{j}$ -separabilities. Precise classification methods are much more difficult for mixed states, however there are still some very useful approaches that can be introduced using NNS.

Consistently $\mathcal{K}$ -separable states require a direct application of the separability conditions given by equation (24) onto the pure state of the NNS. Since the mixing state cannot capture quantum correlations, it is already separable and requires no restrictions; restricting the pure states of the mixed NNS alone is enough to limit the capacity of the neural network to simulate quantum correlations. Enforcing separability on the pure density-matrix in this way

$\begin{align}{\sigma }_{{\Xi}\vert \mathcal{K}}^{\boldsymbol{\alpha },\boldsymbol{\beta }}& =\prod\limits _{l=1}^{K}{\text{e}}^{{\omega }_{l}\left(\boldsymbol{\alpha },\boldsymbol{\beta }\right)}\prod\limits _{j\in {\boldsymbol{h}}_{l}}\mathrm{cosh}\left({\theta }_{l}^{j}\left(\boldsymbol{\alpha }\right)\right)\mathrm{cosh}\left({\theta }_{l}^{j{\ast}}\left(\boldsymbol{\beta }\right)\right),\\ {\omega }_{l}\left(\boldsymbol{\alpha },\boldsymbol{\beta }\right)& =\sum\limits _{i\in {\boldsymbol{k}}_{l}}{a}_{i}{\alpha }_{i}+{a}_{i}^{{\ast}}{\beta }_{i},\end{align} \tag{ 28 }$

thus provides an NNS guaranteed to be consistently $\mathcal{K}$ -separable

$\begin{equation}{\rho }_{{\Pi},{\Xi}}^{\mathcal{K}}={\mathcal{P}}_{{\Pi}}\odot {\sigma }_{{\Xi}\vert \mathcal{K}}.\end{equation} \tag{ 29 }$
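For two qubits the PPT criterion is necessary and sufficient, which gives a simple numerical sanity check of equation (29). The sketch below assumes NumPy, qubit units in {−1, 1}, random complex parameters, the partition 𝒦 = {1|2}, and an explicit trace normalisation; all function names are hypothetical.

```python
import itertools
import numpy as np

def separable_mixed_nns(a, b, W, c, U, mask):
    """Sketch of equation (29): unrestricted mixing part times a masked pure part."""
    n_v = len(a)
    basis = np.array(list(itertools.product([-1, 1], repeat=n_v)))
    # Pure part with the weight restriction of equation (24) applied through the mask.
    psi = np.exp(basis @ a) * np.prod(np.cosh(basis @ (W * mask) + b), axis=1)
    sigma = np.outer(psi, psi.conj())
    # Mixing part of equations (8)-(9), left unrestricted.
    u = basis @ U
    P = np.prod(np.cosh(c + u[:, None, :] + u.conj()[None, :, :]), axis=2)
    rho = P * sigma
    return rho / np.trace(rho).real

def partial_transpose_qubit_b(rho):
    """Partial transpose on the second qubit of a two-qubit density matrix."""
    return rho.reshape(2, 2, 2, 2).transpose(0, 3, 2, 1).reshape(4, 4)

def random_complex(rng, *shape, scale=0.5):
    """Hypothetical helper: random complex array with Gaussian real and imaginary parts."""
    return rng.normal(scale=scale, size=shape) + 1j * rng.normal(scale=scale, size=shape)

rng = np.random.default_rng(5)
n_v, n_h, n_m = 2, 2, 2
a, b = random_complex(rng, n_v), random_complex(rng, n_h)
W, U = random_complex(rng, n_v, n_h), random_complex(rng, n_v, n_m)
c = rng.normal(scale=0.5, size=n_m)

# Partition {1 | 2}: hidden unit 0 serves qubit 1, hidden unit 1 serves qubit 2.
mask = np.array([[1.0, 0.0], [0.0, 1.0]])
rho = separable_mixed_nns(a, b, W, c, U, mask)
min_pt_eig = np.linalg.eigvalsh(partial_transpose_qubit_b(rho)).min()
```

The smallest eigenvalue of the partial transpose is non-negative up to round-off, as expected for a separable two-qubit state, even though the mixing-layer weights U are fully connected.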

If one wishes to enforce complete separability such that for an n-qudit state $\rho ={\sum }_{j}{p}_{j}{\bigotimes}_{m=1}^{n}{\rho }_{j}^{m}$ , one can of course just apply consistent separability onto the network state via the separability set $\mathcal{K}=\left\{1\vert 2\vert \dots \vert n\right\}$ in an identical manner as before. However, as the state is completely separable, there are no quantum correlations and the pure states in the network ansatz are not necessary for simulation of the state. It can then be simplified to ${\rho }_{{\Pi}}={\mathcal{P}}_{{\Pi}}$ , and we can simulate completely separable mixed quantum systems using an RBM with a classical mixing-layer only [31]

$\begin{equation}{\rho }_{{\Pi}}^{\text{Sep}}=\sum\limits _{\boldsymbol{\alpha },\boldsymbol{\beta }}{\text{e}}^{\omega \left(\boldsymbol{\alpha },\boldsymbol{\beta }\right)}\prod\limits _{p=1}^{{n}_{\text{m}}}\mathrm{cosh}\left({\phi }_{p}\left(\boldsymbol{\alpha },\boldsymbol{\beta }\right)\right)\vert \boldsymbol{\alpha }\rangle \langle \boldsymbol{\beta }\vert .\end{equation} \tag{ 30 }$

Unfortunately, it is not possible to strictly classify an inconsistently separable mixed state according to ansatzes discussed in this section. Take the tripartite example

$\begin{equation}\rho =\sum\limits _{j}{p}_{j}{\rho }_{j}^{\left\{1,2\vert 3\right\}}+\sum\limits _{k}{p}_{k}{\rho }_{k}^{\left\{1\vert 2,3\right\}}+\sum\limits _{m}{p}_{m}{\rho }_{m}^{\left\{1,3\vert 2\right\}},\end{equation} \tag{ 31 }$

which can be thought of as a 'cheap' genuine tripartite entangled state. We can certainly define an NNS that can reconstruct a state of this form (trivially, one can utilize a fully connected NNS that can reconstruct ρ); however, we cannot specify all three forms of separability in ρ without also allowing the NNS to potentially manifest genuine, pure tripartite entanglement. One can instead utilize independent consistently separable NNS according to the partitions {1, 2|3}, {1, 3|2} and {2, 3|1} in order to quantify the amount of entanglement in the target state with respect to each partition.

3. Classifying and quantifying entanglement

3.1. Learning of quantum states

We present a learning protocol for a pure NNS |Ψ_Π,Ξ⟩ to reconstruct a target state |φ⟩ using the ansatz from equation (7), which is then extendible to mixed states. We employ a unified learning approach, where the variational state optimizes the global, vectorized fidelity with a target state, rather than separate phase and amplitude fidelities. We may define the loss function as the negative logarithmic fidelity between two pure states as a function of our set of variational parameters

$\begin{equation}\mathcal{L}=-\mathrm{log}\sqrt{\frac{\vert \langle {{\Psi}}_{{\Pi},{\Xi}}\vert \varphi \rangle {\vert }^{2}}{\langle {{\Psi}}_{{\Pi},{\Xi}}\vert {{\Psi}}_{{\Pi},{\Xi}}\rangle \langle \varphi \vert \varphi \rangle }}.\end{equation} \tag{ 32 }$

Splitting these wavefunctions into respective phase and amplitude functions,

$\begin{equation}{{\Psi}}_{{\Pi},{\Xi}}\left(\boldsymbol{s}\right)={\psi }_{{\Pi}}\left(\boldsymbol{s}\right){\text{e}}^{\text{i}\enspace \mathrm{log}\left({\phi }_{{\Xi}}\left(\boldsymbol{s}\right)\right)},\qquad \varphi \left(\boldsymbol{s}\right)=\lambda \left(\boldsymbol{s}\right){\text{e}}^{\text{i}\enspace \mathrm{log}\left(\xi \left(\boldsymbol{s}\right)\right)},\end{equation} \tag{ 33 }$

we wish to compute the derivatives of the unified cost function with respect to the parameter sets {Π, Ξ}. Since these wavefunctions utilize only real parameters, it is expedient to compute the derivatives using the following chain rule formulation,

$\begin{equation}{\nabla }_{k}^{{\psi }_{{\Pi}}}\mathcal{L}=\frac{\partial \mathcal{L}}{\partial \vert {\psi }_{{\Pi}}\rangle }\cdot \frac{\partial \vert {\psi }_{{\Pi}}\rangle }{\partial {{\Pi}}_{k}},\qquad {\nabla }_{k}^{{\phi }_{{\Xi}}}\mathcal{L}=\frac{\partial \mathcal{L}}{\partial \vert {\phi }_{{\Xi}}\rangle }\cdot \frac{\partial \vert {\phi }_{{\Xi}}\rangle }{\partial {{\Xi}}_{k}}.\end{equation} \tag{ 34 }$

Computing these gradients will provide the necessary parameter update rules at the mth iteration to the kth network parameter by gradient descent, taking the form

$\begin{equation}{{\Pi}}_{k}^{m+1}={{\Pi}}_{k}^{m}-\eta {\nabla }_{k}^{{\psi }_{{\Pi}}}\mathcal{L},\qquad {{\Xi}}_{k}^{m+1}={{\Xi}}_{k}^{m}-\eta {\nabla }_{k}^{{\phi }_{{\Xi}}}\mathcal{L},\end{equation} \tag{ 35 }$

where η is some learning rate small enough such that the network state converges to the target state over sufficient iterations of the learning scheme.
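The update rule of equation (35) with the loss of equation (32) can be sketched as follows. For brevity the example is restricted to a positive, real amplitude network with a trivial phase, and it swaps in central finite differences in place of analytic gradients; the target amplitudes, learning rate, iteration count, and function names are all assumptions made for illustration.

```python
import itertools
import numpy as np

def rbm(params, configs, n_v, n_h):
    """Positive NNS amplitudes of equation (3) from a flat parameter vector (a, b, W)."""
    a = params[:n_v]
    b = params[n_v:n_v + n_h]
    W = params[n_v + n_h:].reshape(n_v, n_h)
    return np.exp(configs @ a) * np.prod(2.0 * np.cosh(configs @ W + b), axis=1)

def loss(params, target, configs, n_v, n_h):
    """Negative logarithmic fidelity of equation (32)."""
    psi = rbm(params, configs, n_v, n_h)
    overlap = np.abs(np.vdot(psi, target))
    return -np.log(overlap / (np.linalg.norm(psi) * np.linalg.norm(target)))

def numerical_gradient(f, x, h=1e-6):
    """Central finite-difference gradient (a stand-in for analytic gradients)."""
    grad = np.zeros_like(x)
    for k in range(x.size):
        step = np.zeros_like(x)
        step[k] = h
        grad[k] = (f(x + step) - f(x - step)) / (2 * h)
    return grad

n_v, n_h = 2, 2
configs = np.array(list(itertools.product([-1, 1], repeat=n_v)))
target = np.array([0.7, 0.1, 0.1, 0.7])      # hypothetical positive target amplitudes

rng = np.random.default_rng(3)
params = rng.normal(scale=0.1, size=n_v + n_h + n_v * n_h)
eta = 0.05                                   # learning rate (assumed value)

def objective(p):
    return loss(p, target, configs, n_v, n_h)

history = [objective(params)]
for _ in range(300):
    params = params - eta * numerical_gradient(objective, params)   # equation (35)
    history.append(objective(params))
```

Finite differences scale poorly with the number of parameters and are used here only to keep the sketch short; the `history` list records the loss at each iteration so that convergence can be inspected.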

Defining the quantity

$\begin{equation}{\Delta}\left(\boldsymbol{s}\right)={\langle {{\Psi}}_{{\Pi},{\Xi}}\vert \varphi \rangle }^{-1}\enspace {\text{e}}^{\text{i}\enspace \mathrm{log}\enspace \frac{{\phi }_{{\Xi}}\left(\boldsymbol{s}\right)}{\xi \left(\boldsymbol{s}\right)}},\end{equation} \tag{ 36 }$

complete gradients with respect to variational parameters can therefore be computed as

$\begin{equation}{\nabla }_{k}^{{\psi }_{{\Pi}}}\mathcal{L}=\sum\limits _{\boldsymbol{s}}\left[\frac{{\psi }_{{\Pi}}\left(\boldsymbol{s}\right)}{\vert {{\Psi}}_{{\Pi},{\Xi}}{\vert }^{2}}-\lambda \left(\boldsymbol{s}\right)\text{Re}\left[{\Delta}\left(\boldsymbol{s}\right)\right]\right]{\mathcal{O}}_{k}^{{\Pi}}\vert {\psi }_{{\Pi}}\rangle ,\end{equation} \tag{ 37 }$

$\begin{equation}{\nabla }_{k}^{{\phi }_{{\Xi}}}\mathcal{L}=-\sum\limits _{\boldsymbol{s}}\left[\frac{\lambda \left(\boldsymbol{s}\right){\psi }_{{\Pi}}\left(\boldsymbol{s}\right)}{{\phi }_{{\Xi}}\left(\boldsymbol{s}\right)}\enspace \text{Im}\left[{\Delta}\left(\boldsymbol{s}\right)\right]\right]{\mathcal{O}}_{k}^{{\Xi}}\vert {\phi }_{{\Xi}}\rangle ,\end{equation} \tag{ 38 }$

where ${\mathcal{O}}_{k}^{{\Pi}}=\mathrm{diag}\left({\partial }_{{{\Pi}}_{k}}\enspace \mathrm{log}\enspace \vert {\psi }_{{\Pi}}\rangle \right)$ , ${\mathcal{O}}_{k}^{{\Xi}}=\mathrm{diag}\left({\partial }_{{{\Xi}}_{k}}\enspace \mathrm{log}\enspace \vert {\phi }_{{\Xi}}\rangle \right)$ denote diagonal matrices containing the logarithmic derivatives of the network state with respect to the kth amplitude and phase network parameters respectively. Utilizing equations (37) and (38) in the update rule given by equation (35), the phase and amplitude properties will optimize in a unified manner, maximizing the fidelity between the network and a target state endowed with non-trivial phase structure.

Fortunately this learning procedure is readily extended to mixed states via the ansatz in equation (20). Since the variational state is in a complex exponential format, one then formulates a cost function based on the fidelity between the vectorized density-matrix and the vectorized target state. The extension is straightforward and explained in appendix B.

As shown in reference [28] separable neural network states can be used to perform entanglement classification and provide entanglement measures of pure, two-dimensional quantum states. Using qudit sub-encoding and the mixed state architectures discussed in the previous sections, these ideas can be extended to classification of more complex quantum systems.

Let us devise a precise decision rule for classification. Consider a target n-qudit state σ, a $\mathcal{K}$ -separable learner ${\rho }_{{\Omega}}^{\mathcal{K}}$ , and a free, entangled learner ${\rho }_{{\Omega}}^{\text{Ent}}$ which have both been optimized with respect to reconstructing σ. Using the Bures fidelity, $F\left(\sigma ,\rho \right)=\mathrm{Tr}\sqrt{\sqrt{\sigma }\rho \sqrt{\sigma }}$ , we denote the reconstruction fidelity of a learning process as the final/optimal fidelity achieved after a given number of learning iterations. A target σ is learnable via ${\rho }_{{\Omega}}^{\text{Ent}}$ iff its reconstruction fidelity satisfies

$\begin{equation}F\left(\sigma ,{\rho }_{{\Omega}}^{\text{Ent}}\right){\geqslant}{F}_{\text{opt}}=1-{\epsilon},\end{equation} \tag{ 39 }$

for a sufficiently small threshold ε. The choice of F_opt determines the reliability of classification, and in our numerical experiments we fix ε ⩽ 10⁻⁴. The accuracy of this reconstruction via free learning also benchmarks the satisfactory computational resources required in the network, informing the separable reconstruction.

One can reliably infer that a target state is $\mathcal{K}$ -separable if it is learnable by both a free NNS ( ${\rho }_{{\Omega}}^{\text{Ent}}$ ), and a $\mathcal{K}$ -separable NNS ( ${\rho }_{{\Omega}}^{\mathcal{K}}$ ). Then the NNS reconstruction fidelities must satisfy

$\begin{equation}F\left(\sigma ,{\rho }_{{\Omega}}^{\mathcal{K}}\right){\geqslant}F\left(\sigma ,{\rho }_{{\Omega}}^{\text{Ent}}\right){\geqslant}{F}_{\text{opt}}.\end{equation} \tag{ 40 }$

Otherwise, the state is entangled to a higher degree. One may then quantify the entanglement content of the target by investigating the distance between σ and an approximation to the closest $\mathcal{K}$ -separable state.
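The decision rule of equations (39) and (40) can be sketched numerically as follows. This is a minimal illustration rather than the implementation used for the results: the helper names `psd_sqrt`, `bures_fidelity` and `classify` are hypothetical, and the rule is read here as "learnable by both networks", i.e. both reconstruction fidelities reach F_opt = 1 − ε.

```python
import numpy as np

def psd_sqrt(a):
    """Square root of a Hermitian positive semidefinite matrix."""
    vals, vecs = np.linalg.eigh(a)
    vals = np.clip(vals, 0.0, None)  # remove tiny negative round-off
    return (vecs * np.sqrt(vals)) @ vecs.conj().T

def bures_fidelity(sigma, rho):
    """F(sigma, rho) = Tr sqrt( sqrt(sigma) rho sqrt(sigma) )."""
    s = psd_sqrt(sigma)
    inner = s @ rho @ s
    inner = (inner + inner.conj().T) / 2  # enforce Hermiticity
    vals = np.clip(np.linalg.eigvalsh(inner), 0.0, None)
    return float(np.sum(np.sqrt(vals)))

def classify(f_sep, f_ent, eps=1e-4):
    """Hypothetical helper: compare reconstruction fidelities with F_opt."""
    f_opt = 1.0 - eps
    if f_ent < f_opt:
        return "inconclusive"  # target not learnable by the free network
    if f_sep >= f_opt:
        return "K-separable"
    return "entangled"

# Toy check: a two-qubit Bell state against the product state |00>
bell = np.array([1.0, 0.0, 0.0, 1.0]) / np.sqrt(2)
sigma = np.outer(bell, bell)
product = np.zeros((4, 4))
product[0, 0] = 1.0

print(bures_fidelity(sigma, sigma))    # close to 1
print(bures_fidelity(sigma, product))  # close to 1/sqrt(2)
```

In practice `f_sep` and `f_ent` would be the final fidelities returned by the separable and free learners; the fixed numbers passed to `classify` in any test are placeholders only.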

3.2. Quantifying entanglement

The most difficult aspect of quantifying entanglement stems from the complicated nature of characterising the space of separable quantum states. Thanks to the implicit guarantee of specific separability, SNNS offer an extremely useful tool to help with this, and provide the opportunity to study a variety of entanglement measures that are otherwise much too difficult to explore.

Let us consider measures E that satisfy the general properties of a valid entanglement measure [4]. Many important types of E are constructed as a geometric optimization problem with respect to the space of all fully separable states ${\mathcal{D}}_{\text{Sep}}$ . That is, given a target state σ and a distance measure (possibly quasi-distance measure) f,

$\begin{equation}E\left(\sigma \right)=\underset{\rho \in {\mathcal{D}}_{\text{Sep}}}{\mathrm{min}}f\left(\sigma ,\rho \right),\end{equation} \tag{ 41 }$

$\begin{equation}\text{if}\enspace \sigma \in {\mathcal{D}}_{\text{Sep}}\enspace \Longrightarrow\enspace E\left(\sigma \right)=0,\end{equation} \tag{ 42 }$

$\begin{equation}\text{if}\enspace \sigma \notin {\mathcal{D}}_{\text{Sep}}\enspace \Longrightarrow\enspace E\left(\sigma \right){ >}0.\end{equation} \tag{ 43 }$

These are entanglement measures which are computed by locating the closest separable state (CSS) σ^⋆ to σ, with respect to the distance measure f. For such measures, the employment of SNNS to parameterize the separable states ${\rho }_{{\Omega}}\in {\mathcal{D}}_{\text{Sep}}$ is extremely useful, as it offers an efficient way to perform this optimization. Furthermore, since SNNS are inherently separable, they will always approximate an upper bound on E, since they are certifiably limited in the quantum correlations that they are able to simulate. That is,

$\begin{equation}E\left(\sigma \right){\leqslant}{E}_{{\Omega}}\left(\sigma \right)=\underset{{\rho }_{{\Omega}}\in {\mathcal{D}}_{\text{Sep}}}{\mathrm{min}}f\left(\sigma ,{\rho }_{{\Omega}}\right).\end{equation} \tag{ 44 }$

To generalize, we may construct a measure ${E}^{\mathcal{K}}$ which is analogous to E, but is defined with respect to the space of all states which are at most $\mathcal{K}$ -separable. Defining the set of all states that are $\mathcal{K}$ -separable as ${\mathcal{D}}_{\mathcal{K}}$ , then the set of all states that are at most $\mathcal{K}$ -separable is given by [32]

$\begin{equation}{\tilde {\mathcal{D}}}_{\mathcal{K}}={\mathcal{D}}_{\mathcal{K}}\bigcup\limits _{\vert {\mathcal{K}}^{\prime }\vert { >}\vert \mathcal{K}\vert }{\mathcal{D}}_{{\mathcal{K}}^{\prime }}.\end{equation} \tag{ 45 }$

Assuming a measure of the form equation (41), then we can define

$\begin{equation}{E}^{\mathcal{K}}\left(\sigma \right)=\underset{\rho \in {\tilde {\mathcal{D}}}_{\mathcal{K}}}{\mathrm{min}}f\left(\sigma ,\rho \right){\leqslant}{E}_{{\Omega}}^{\mathcal{K}}\left(\sigma \right),\end{equation} \tag{ 46 }$

$\begin{equation}\text{if}\enspace \sigma \in {\tilde {\mathcal{D}}}_{\mathcal{K}}\enspace \Longrightarrow\enspace {E}^{\mathcal{K}}\left(\sigma \right)=0,\end{equation} \tag{ 47 }$

$\begin{equation}\text{if}\enspace \sigma \notin {\tilde {\mathcal{D}}}_{\mathcal{K}}\enspace \Longrightarrow\enspace {E}^{\mathcal{K}}\left(\sigma \right){ >}0.\end{equation} \tag{ 48 }$

${E}^{\mathcal{K}}$ satisfies all the general properties of an entanglement measure, but now with respect to ${\tilde {\mathcal{D}}}_{\mathcal{K}}$ , and is therefore able to classify/quantify more complex forms of entanglement.

Let us specify some important entanglement measures which SNNS can utilize, starting from the geometric measure of entanglement (GME) [33]. For pure states, the GME is the maximum fidelity that can be obtained between a target state |σ⟩ and the set of pure, at most $\mathcal{K}$ -separable states ${\tilde {\mathcal{B}}}_{\mathcal{K}}$

$\begin{equation}{E}_{\text{G}}\left(\vert \sigma \rangle \right)=\underset{\vert \varphi \rangle \in {\tilde {\mathcal{B}}}_{\mathcal{K}}}{\mathrm{max}}F\left(\vert \sigma \rangle ,\vert \varphi \rangle \right).\end{equation} \tag{ 49 }$

For more sophisticated mixed state approaches, it is expedient to employ any number of density-matrix distance measures. Several important examples include the trace distance

$\begin{equation}{E}_{{\text{C}}_{1}}\left(\sigma \right)=\frac{1}{2}\underset{\rho \in {\mathcal{D}}_{\text{Sep}}}{\mathrm{min}}{\Vert}\sigma -\rho {{\Vert}}_{1},\end{equation} \tag{ 50 }$

where ${\Vert}X{{\Vert}}_{1}=\mathrm{Tr}\sqrt{{X}^{{\dagger}}X}$ or the Bures metric

$\begin{equation}{E}_{\text{B}}\left(\sigma \right)=\underset{\rho \in {\mathcal{D}}_{\text{Sep}}}{\mathrm{min}}\left[1-{F}^{2}\left(\sigma ,\rho \right)\right],\end{equation} \tag{ 51 }$

where F is the Bures fidelity as before. These quantities are readily approximated via SNNS, and easily specified to different forms of $\mathcal{K}$ -separability.
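For a fixed candidate state ρ, the two distance functions appearing in equations (50) and (51) are straightforward to evaluate. The sketch below does only that; the minimization over separable states is not attempted, and the function names are illustrative assumptions.

```python
import numpy as np

def psd_sqrt(a):
    """Square root of a Hermitian positive semidefinite matrix."""
    vals, vecs = np.linalg.eigh(a)
    return (vecs * np.sqrt(np.clip(vals, 0.0, None))) @ vecs.conj().T

def trace_distance(sigma, rho):
    """(1/2) ||sigma - rho||_1, using the eigenvalues of the Hermitian difference."""
    diff = sigma - rho
    diff = (diff + diff.conj().T) / 2
    return 0.5 * float(np.sum(np.abs(np.linalg.eigvalsh(diff))))

def bures_distance(sigma, rho):
    """1 - F^2, with F = Tr sqrt( sqrt(sigma) rho sqrt(sigma) )."""
    s = psd_sqrt(sigma)
    inner = s @ rho @ s
    inner = (inner + inner.conj().T) / 2
    fid = np.sum(np.sqrt(np.clip(np.linalg.eigvalsh(inner), 0.0, None)))
    return float(1.0 - fid ** 2)

# Single-qubit example: pure state |0><0| against the maximally mixed state
sigma = np.diag([1.0, 0.0])
rho = np.eye(2) / 2
print(trace_distance(sigma, rho), bures_distance(sigma, rho))
```

Within an SNNS optimization these functions would be evaluated on the current network state ${\rho }_{{\Omega}}$ at each iteration, giving a running upper bound on the corresponding measure.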

Of particular interest is the REE [29], an entanglement measure that has many applications in quantum communications and channel capacities [34]. The REE is based on the quantum relative entropy (QRE), a kind of distance measure between two quantum states where

$\begin{equation}S\left(\rho {\Vert}\sigma \right)=\mathrm{Tr}\left[\rho \left(\mathrm{log}\enspace \rho -\mathrm{log}\enspace \sigma \right)\right],\end{equation} \tag{ 52 }$

such that S(ρ||σ) ∈ [0, +∞). Due to its asymmetry and the fact that it is infinite on pure states, it is not a true metric. However, the QRE is an important distinguishability measure between quantum states which provides access to important entropic quantities such as the Shannon entropy. Minimizing the relative entropy with respect to the set of all separable quantum states results in the REE

$\begin{equation}{E}_{\text{R}}\left(\rho \right)=\underset{\sigma \in {\mathcal{D}}_{\text{Sep}}}{\mathrm{min}}S\left(\rho {\Vert}\sigma \right),\end{equation} \tag{ 53 }$

which can be readily employed with respect to parameterized NNS. This can of course generalize to ${E}_{\text{R}}^{\mathcal{K}}\left(\sigma \right)$ given a form of separability. Interestingly, the REE is sub-additive and in general

$\begin{equation}{E}_{\text{R}}\left(\rho \otimes \sigma \right){\leqslant}{E}_{\text{R}}\left(\rho \right)+{E}_{\text{R}}\left(\sigma \right).\end{equation} \tag{ 54 }$

This lets us define a regularized n-shot REE

$\begin{equation}{E}_{\text{R}}^{n}\left(\rho \right)=\frac{1}{n}\underset{\sigma \in {\mathcal{D}}_{\text{Sep}}}{\mathrm{min}}S\left({\rho }^{\otimes n}{\Vert}\sigma \right){\leqslant}{E}_{\text{R}}\left(\rho \right).\end{equation} \tag{ 55 }$
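As an illustration of equation (52), the QRE can be evaluated from the eigendecompositions of the two states. The following is a minimal sketch under two assumptions that are not fixed by the text: logarithms are taken base 2 (entropies in bits), and eigenvalues below a small tolerance `tol` are treated as zero. The function name is hypothetical.

```python
import numpy as np

def relative_entropy(rho, sigma, tol=1e-12):
    """S(rho || sigma) = Tr[rho (log2 rho - log2 sigma)], in bits."""
    p, u = np.linalg.eigh(rho)
    q, v = np.linalg.eigh(sigma)
    overlaps = np.abs(u.conj().T @ v) ** 2  # |<u_i|v_j>|^2
    total = 0.0
    for i, p_i in enumerate(p):
        if p_i <= tol:
            continue  # convention 0 log 0 = 0
        total += p_i * np.log2(p_i)
        for j, q_j in enumerate(q):
            weight = p_i * overlaps[i, j]
            if weight <= tol:
                continue
            if q_j <= tol:
                return np.inf  # support of rho is not inside support of sigma
            total -= weight * np.log2(q_j)
    return float(total)

pure = np.diag([1.0, 0.0])
mixed = np.eye(2) / 2
print(relative_entropy(pure, mixed))  # finite
print(relative_entropy(mixed, pure))  # infinite: the QRE is asymmetric
```

The second call shows why a pure second argument is problematic: the QRE diverges whenever the first state has weight outside the support of the second.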

The single-shot, standard REE alone is an extremely difficult quantity to compute, largely due to the characterization of ${\mathcal{D}}_{\text{Sep}}$ and the unruliness of the QRE. Its computation has recently been explored using an active learning strategy [35], in which the authors use active learning to compress ${\mathcal{D}}_{\text{Sep}}$ into a more relevant subset of the separable state space that contributes strongly to the REE. Thanks to the implicit separability of NNS, we may choose an alternative approach where it is possible to optimise some other cost function such as fidelity/trace distance that will simultaneously minimise the QRE towards the optimal REE. In doing so, SNNS should allow for the accurate and efficient approximation of E_R, and previously unexplored REEs with respect to other forms of separability ${E}_{\text{R}}^{\mathcal{K}}$ .

4. Applications and results

4.1. Mixed states in d-dimensions

The most substantial generalisation of the methods introduced in reference [28] is the ability to classify and quantify entanglement in mixed, d-dimensional states. To illustrate this improvement, consider the d-dimensional Werner state, parameterized by

$\begin{equation}{\varrho }_{\eta ,d}=\frac{\left(d-\eta \right){\mathbb{I}}_{d}^{\otimes 2}+\left(d\eta -1\right){\mathbb{F}}_{d}}{d\left({d}^{2}-1\right)},\end{equation} \tag{ 56 }$

where ${\mathbb{F}}_{d}={\sum }_{i,j=0}^{d-1}\vert ij\rangle \langle ji\vert$ is the two-qudit flip operator, ${\mathbb{I}}_{d}$ is the d-dimensional identity operator, and η characterizes the entanglement properties of the state. For η ∈ [−1, 0) the state is entangled, and we can easily quantify this entanglement using the analytically known REE [36],

$\begin{equation}{E}_{\text{R}}\left({\varrho }_{\eta ,d}\right)=\frac{1+\eta }{2}\enspace {\mathrm{log}}_{2}\left(1+\eta \right)+\frac{1-\eta }{2}\enspace {\mathrm{log}}_{2}\left(1-\eta \right).\end{equation} \tag{ 57 }$
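Equations (56) and (57) can be checked numerically with a short sketch. It builds the Werner state, confirms that η equals the expectation value of the flip operator, and compares equation (57) with the relative entropy to the η = 0 Werner state. Using the η = 0 state as the separable reference is an assumption of this example, as are all function names; the relative entropy routine below is valid only for full-rank states.

```python
import numpy as np

def flip_operator(d):
    """F_d = sum_{i,j} |ij><ji| on two qudits."""
    f = np.zeros((d * d, d * d))
    for i in range(d):
        for j in range(d):
            f[i * d + j, j * d + i] = 1.0
    return f

def werner_state(eta, d):
    """Equation (56)."""
    ident = np.eye(d * d)
    return ((d - eta) * ident + (d * eta - 1) * flip_operator(d)) / (d * (d**2 - 1))

def werner_ree(eta):
    """Equation (57), for -1 < eta < 0."""
    return 0.5 * (1 + eta) * np.log2(1 + eta) + 0.5 * (1 - eta) * np.log2(1 - eta)

def log2m(a):
    """Matrix logarithm (base 2) of a full-rank Hermitian matrix."""
    vals, vecs = np.linalg.eigh(a)
    return (vecs * np.log2(vals)) @ vecs.conj().T

def relative_entropy(rho, sigma):
    """S(rho || sigma) in bits, full-rank inputs only."""
    return float(np.trace(rho @ (log2m(rho) - log2m(sigma))).real)

eta, d = -0.75, 5
rho = werner_state(eta, d)
reference = werner_state(0.0, d)  # assumed separable reference state

print(np.trace(rho), np.trace(rho @ flip_operator(d)))
print(werner_ree(eta), relative_entropy(rho, reference))
```

For these parameters the two printed entropies agree, and both match the value E_R ≈ 0.4564 quoted for d = 5, η = −0.75.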

In figure 4 we display an optimization procedure for d = 5, η = −0.75 using an entangled learner ${\rho }_{{\Omega}}^{\text{Ent}}$ and a fully separable learner ${\rho }_{{\Omega}}^{\text{Sep}}$ . The free, entangled learner is able to reconstruct the target Werner state with ease, and an extremely high fidelity, while the fully separable learner correctly classifies the target as entangled.

**Figure 4.** The classification and entanglement quantification of a d = 5 Werner state ϱ_η,d, defined in equation (56) for η = −0.75. Using NNS, the REE was approximated to within < 10⁻⁵ precision of the known analytical value E_R(ϱ_η,d) ≈ 0.4564 [36]. The entangled network used 10 hidden mixing neurons and 10 hidden pure state neurons, whilst the separable network used 10 hidden mixing neurons. The density matrices of the (approximate) CSS ${\rho }_{{\Omega}}^{\text{Sep}}\approx {\varrho }_{\eta ,5}^{\star }$ and target state approximations are also shown.

Beyond the obvious entanglement classification, the SNNS is able to quantify the REE of the state, by monitoring the relative entropy ${E}_{\text{R}}^{{\Omega}}\left({\varrho }_{\eta ,d}\right)=S\left({\varrho }_{\eta ,d}{\Vert}{\rho }_{{\Omega}}^{\text{Sep}}\right)$ throughout the learning process. As the optimization converges, ${E}_{\text{R}}^{{\Omega}}\to {E}_{\text{R}}$ , we gather an approximation to the REE of the state. Indeed, under typical optimization settings, the REE is approximated to within < 10⁻⁵ precision of the known analytical value E_R(ϱ_−0.75,5) ≈ 0.4564, reinforcing the strength of this approach.

4.2. Classification of bound entangled states

The positivity of a partially transposed quantum system can be a signature of separability. However it is not universal, and there exist classes of states which are PPT but are entangled, known as bound entangled (BE) states. Here we consider the following two-qutrit state,

$\begin{align}{\sigma }_{+}& =\frac{1}{3}\left(\vert 01\rangle \langle 01\vert +\vert 12\rangle \langle 12\vert +\vert 20\rangle \langle 20\vert \right),\\ {\sigma }_{-}& =\frac{1}{3}\left(\vert 10\rangle \langle 10\vert +\vert 21\rangle \langle 21\vert +\vert 02\rangle \langle 02\vert \right),\\ {\sigma }_{\alpha }& =\frac{2}{7}\vert {{\Phi}}^{+}\rangle \langle {{\Phi}}^{+}\vert +\frac{\alpha }{7}{\sigma }_{+}+\frac{5-\alpha }{7}{\sigma }_{-},\end{align} \tag{ 58 }$

where $\vert {{\Phi}}^{+}\rangle =\frac{1}{\sqrt{3}}\left(\vert 00\rangle +\vert 11\rangle +\vert 22\rangle \right)$ is a d = 3-dimensional Bell state. This state is known to satisfy the following entanglement properties [37]:

$\begin{equation}{\sigma }_{\alpha }\;\text{is}\;\begin{cases}\;\text{Separable}\quad \text{if}\enspace 2{\leqslant}\alpha {\leqslant}3,\quad \\ \;\text{Bound}\;\text{entangled}\quad \text{if}\enspace 3{< }\alpha {\leqslant}4,\quad \\ \;\text{Free}\;\text{entangled}\quad \text{if}\enspace 4{< }\alpha {\leqslant}5.\quad \end{cases}\end{equation} \tag{ 59 }$
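The PPT behaviour behind equation (59) can be verified directly. The sketch below constructs σ_α with σ_± taken as normalized mixtures of the listed basis projectors, and evaluates the smallest eigenvalue of its partial transpose. Function names are illustrative assumptions.

```python
import numpy as np

def ket(i, j, d=3):
    """Two-qutrit basis vector |ij>."""
    v = np.zeros(d * d)
    v[i * d + j] = 1.0
    return v

def proj(v):
    return np.outer(v, v.conj())

def sigma_alpha(alpha):
    """Two-qutrit state of equation (58)."""
    phi = (ket(0, 0) + ket(1, 1) + ket(2, 2)) / np.sqrt(3)
    s_plus = (proj(ket(0, 1)) + proj(ket(1, 2)) + proj(ket(2, 0))) / 3
    s_minus = (proj(ket(1, 0)) + proj(ket(2, 1)) + proj(ket(0, 2))) / 3
    return (2 / 7) * proj(phi) + (alpha / 7) * s_plus + ((5 - alpha) / 7) * s_minus

def partial_transpose(rho, d=3):
    """Transpose the second qudit of a two-qudit density matrix."""
    r = rho.reshape(d, d, d, d)  # indices (i, j, i', j')
    return r.transpose(0, 3, 2, 1).reshape(d * d, d * d)

def min_pt_eigenvalue(alpha):
    return float(np.linalg.eigvalsh(partial_transpose(sigma_alpha(alpha))).min())

for alpha in (2.5, 3.5, 4.5):
    print(alpha, min_pt_eigenvalue(alpha))
```

The partial transpose stays positive for α = 2.5 and α = 3.5 and becomes negative for α = 4.5. The PPT test therefore cannot separate the bound entangled region from the separable one, which is the gap the constrained reconstruction is meant to address.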

Here we investigate the target state in the BE region, and show that this bipartite state cannot be optimally reconstructed via SNNS. Figure 5 depicts the employment of entangled learners ${\rho }_{{\Omega}}^{\text{Ent}}$ (blue), and fully separable learners ${\rho }_{{\Omega}}^{\text{Sep}}$ (red) to reconstruct σ_α across the domain 3 < α ⩽ 4.

For all values of α, ${\rho }_{{\Omega}}^{\text{Ent}}$ is able to reconstruct the state to a high degree of precision such that the trace distance is ${\Vert}{\sigma }_{\alpha }-{\rho }_{{\Omega}}^{\text{Ent}}{{\Vert}}_{1}{\leqslant} 1{0}^{-4}$ . However, the separable learners are unable to reach this level of reconstruction accuracy. Hence, since σ_α are learnable via free NNS, the inability of ${\rho }_{{\Omega}}^{\text{Sep}}$ to reconstruct σ_α implies that these states are entangled in this region. Since they are also PPT in this region, we have successfully shown the ability of SNNS to classify bound entanglement.

During each constrained optimization we gather an upper bound on the distance between the target BE state, and its CSS. As said before, this is an upper bound since ${\rho }_{{\Omega}}^{\text{Sep}}$ offers an approximation to the CSS, and is potentially loose. Nonetheless the inferred classification is informative. Figure 5 plots the trace distance ${\Vert}{\sigma }_{\alpha }-{\rho }_{{\Omega}}^{\text{Sep}}{{\Vert}}_{1}$ , shown to steadily rise as α increases, which is expected as σ_α becomes freely entangled for 4 < α ⩽ 5.

4.3. Detection and measurement of multipartite entanglement

The versatility of the $\mathcal{K}$ -separable state design means that we can explore entanglement classification and quantification methods that are otherwise very difficult. In particular, we may construct an NNS protocol that is able to witness W/GHZ-state entanglement, and measure W/GHZ-type correlations in both pure and mixed quantum states. Consider the three-qubit W and GHZ states respectively [38, 39]

$\begin{align*}& \vert \text{W}\rangle =\frac{1}{\sqrt{3}}\left(\vert 001\rangle +\vert 010\rangle +\vert 100\rangle \right),\\ & \vert \text{GHZ}\rangle =\frac{1}{\sqrt{2}}\left(\vert 000\rangle +\vert 111\rangle \right).\end{align*}$

These are both maximally entangled three party states. However they possess two inequivalent forms of tripartite entanglement, such that |W⟩ cannot be transformed into |GHZ⟩ by means of LOCC (local operations and classical communications) strategies. The key difference in these forms of entanglement is their robustness i.e. when a party is removed from a GHZ state the remaining states are separable, whilst a W-state remains entangled. Therefore a W-state possesses strict bipartite entanglement between all three parties, whereas GHZ entanglement can be achieved via 'relayed entanglement' [40].
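The difference in robustness can be made concrete with a small numerical check. The sketch below discards the third qubit of each state and applies the PPT criterion to the remaining two-qubit state, where it is necessary and sufficient for separability. Function names are hypothetical.

```python
import numpy as np

def basis_state(bits):
    """Computational basis vector for a bit string such as '010'."""
    v = np.zeros(2 ** len(bits))
    v[int(bits, 2)] = 1.0
    return v

w = (basis_state("001") + basis_state("010") + basis_state("100")) / np.sqrt(3)
ghz = (basis_state("000") + basis_state("111")) / np.sqrt(2)

def trace_out_last(psi):
    """Reduced two-qubit state after discarding the third qubit."""
    m = psi.reshape(4, 2)  # rows: qubits 1 and 2, columns: qubit 3
    return m @ m.conj().T

def min_pt_eigenvalue(rho):
    """Smallest eigenvalue of the partial transpose on the second qubit."""
    r = rho.reshape(2, 2, 2, 2).transpose(0, 3, 2, 1).reshape(4, 4)
    return float(np.linalg.eigvalsh(r).min())

print("W  :", min_pt_eigenvalue(trace_out_last(w)))    # negative: still entangled
print("GHZ:", min_pt_eigenvalue(trace_out_last(ghz)))  # non-negative: separable
```

The reduced W-state has a negative partial transpose, so pairwise entanglement survives the loss of a party, whereas the reduced GHZ state is a classical mixture of |00⟩ and |11⟩.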

To classify between these states, we must define a partition set that is capable of capturing GHZ correlations, but only incompletely captures W-type correlations. The non-disjoint separability set

$\begin{equation}{\mathcal{K}}_{\text{W}}=\left\{1,2\vert 2,3\vert 1,3\right\},\end{equation} \tag{ 60 }$

is capable of learning both W and GHZ entangled states, as it strictly specifies bipartite entanglement between all parties. However, one can construct the partition set

$\begin{equation}{\mathcal{K}}_{\text{GHZ}}=\left\{i,j\vert i,k\right\},\quad i\ne j\ne k\in \left\{1,2,3\right\},\end{equation} \tag{ 61 }$

which is any possible permutation of two subsets of ${\mathcal{K}}_{\text{W}}$ . Programming an NNS according to ${\mathcal{K}}_{\text{GHZ}}$ does not allow the network to capture direct correlations between qubits j and k, and will therefore provide an insufficient ansatz to reconstruct W-states. This forms a witness for W-type entanglement; if a target state is learnable via an NNS endowed with ${\mathcal{K}}_{\text{W}}$ -separability, but is not learnable via ${\mathcal{K}}_{\text{GHZ}}$ -separability, then the state is verified as possessing W-type entanglement. Furthermore, by constructing entanglement measures ${E}_{{\Omega}}^{{\mathcal{K}}_{\text{GHZ}}}$ we are able to measure the amount of W-type correlations within a target state.

Figure 6(a) shows the pure state classification of a three-qubit W-state, where the non-disjoint network architectures perform classification easily. Note that these three-qubit partitions can be analogously embedded into larger, n-qudit systems in order to study more complex forms of entanglement.

Realistically, multipartite entangled resources for future quantum communication/computing protocols will be noisy and imperfect. Generating and distributing multipartite entanglement over noisy quantum channels is fundamental for many future quantum technologies, particularly for secure communications and quantum networks [41–48]. Therefore it is a more interesting challenge to consider the classification and quantification of tripartite entanglement subject to decoherence. For instance, one can consider versions of |W⟩/|GHZ⟩ in which each qudit has been passed through a depolarizing channel

$\begin{equation}{\mathcal{E}}_{\text{D}}\left(\rho \right)=\left(1-p\right)\rho +\frac{p}{{d}^{n}}{\mathbb{I}}_{d}^{\otimes n},\end{equation} \tag{ 62 }$

where n denotes the number of qudits being acted on (in this case n = 3). We denote these noisy, three-qubit states as

$\begin{equation}{\sigma }_{\text{W}}^{p}=\left(1-p\right)\vert \text{W}\rangle \langle \text{W}\vert +\frac{p}{8}{\mathbb{I}}_{2}^{\otimes 3},\end{equation} \tag{ 63 }$

$\begin{equation}{\sigma }_{\text{GHZ}}^{p}=\left(1-p\right)\vert \text{GHZ}\rangle \langle \text{GHZ}\vert +\frac{p}{8}{\mathbb{I}}_{2}^{\otimes 3}.\end{equation} \tag{ 64 }$
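The target states of equations (63) and (64) are simple to construct explicitly. The sketch below applies the depolarizing map of equation (62) to the whole three-qubit register and reports the overlap with the ideal state and the purity; helper names are illustrative assumptions.

```python
import numpy as np

def basis_state(bits):
    """Computational basis vector for a bit string such as '010'."""
    v = np.zeros(2 ** len(bits))
    v[int(bits, 2)] = 1.0
    return v

def depolarize(rho, p):
    """Equation (62) on the full register (identity of dimension d^n)."""
    dim = rho.shape[0]
    return (1 - p) * rho + p * np.eye(dim) / dim

w = (basis_state("001") + basis_state("010") + basis_state("100")) / np.sqrt(3)
ghz = (basis_state("000") + basis_state("111")) / np.sqrt(2)

p = 1 / 3
sigma_w = depolarize(np.outer(w, w), p)        # equation (63)
sigma_ghz = depolarize(np.outer(ghz, ghz), p)  # equation (64)

for name, psi, sigma in [("W", w, sigma_w), ("GHZ", ghz, sigma_ghz)]:
    overlap = psi @ sigma @ psi  # <psi|sigma|psi> = 1 - 7p/8
    purity = np.trace(sigma @ sigma)
    print(name, overlap, purity)
```

Both noisy states have the same spectrum, and hence the same purity, so spectral quantities alone cannot tell them apart; the distinction has to come from their separability structure.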

Using mixed NNS programmed with different separabilities, we may then easily distinguish between the entanglement properties of noisy W/GHZ-states subject to depolarizing channels. Indeed, figure 6(b) shows that for $p=\frac{1}{3}$ we can perform this classification. Given two learners ${\rho }_{{\Omega}}^{{\mathcal{K}}_{\text{W}}}$ and ${\rho }_{{\Omega}}^{{\mathcal{K}}_{\text{GHZ}}}$ , it is clear that both are able to optimally reconstruct the noisy GHZ-state, whilst only ${\rho }_{{\Omega}}^{{\mathcal{K}}_{\text{W}}}$ is able to optimally reconstruct the noisy W-state, completing the classification.

This is taken a step further in figure 6(c) where different versions of the REE of ${\sigma }_{\text{W}}^{p}$ are monitored for various depolarizing probabilities. This plot describes three forms of REE:

The standard E_R (red) defined on the space of all fully separable states (using the partition set ${\mathcal{K}}_{\text{FS}}=\left\{1\vert 2\vert 3\right\}$ ) which measures the amount of any entanglement present.
The genuine tripartite entangled REE, ${E}_{\text{R}}^{\text{Gen}}$ (green), using the bi-separable partition sets ${\mathcal{K}}_{\text{BS}}=\left\{i,j\vert k\right\},i\ne j\ne k\in \left\{1,2,3\right\}$ , which measures the amount of genuine tripartite entanglement in the state (W or GHZ correlations).
The W-REE, ${E}_{\text{R}}^{\text{W}}$ (blue) using the partition set ${\mathcal{K}}_{\text{GHZ}}$ in equation (61), which measures the amount of genuine, tripartite, strictly W-type entanglement within the state.

By employing more complex separable architectures, we may study how different forms of entanglement behave with respect to environmental properties, such as depolarization. By measuring ${E}_{\text{R}}^{\text{Gen}}$ and ${E}_{\text{R}}^{\text{W}}$ for instance, we may monitor the decoherence of genuine tripartite entanglement, rather than any entanglement, as is done by E_R. Such characterizations could prove very useful in communication/networking scenarios, where genuine multipartite entanglement is critical to performance.

It is important to remind the reader that these are upper bounds. The standard REE upper bound is expected to be tight, as fully separable NNS architectures precisely capture full separability. However, ${\mathcal{K}}_{\text{BS}}$ and ${\mathcal{K}}_{\text{GHZ}}$ are degenerate, e.g. ${\mathcal{K}}_{\text{BS}}=\left\{i,j\vert k\right\}$ has 3 unique forms. Since mixed SNNS are restricted to consistent separabilities, there may be convex combinations of states of these separabilities that produce tighter bounds. It is unknown if this is the case; nonetheless, ${E}_{\text{R}}^{\text{Gen}}$ and ${E}_{\text{R}}^{\text{W}}$ provide informative upper bounds on these unique entanglement measures.

4.4. Ultimate limits for channel capacities

We may provide a more practical example for the use of SNNS in the realm of quantum communications, using them to approximate upper bounds of quantum channel capacities. Introduced in reference [34], the Pirandola–Laurenza–Ottaviani–Banchi (PLOB) bound is an ultimate upper bound on the two-way assisted quantum (and secret-key) capacity $\mathcal{C}\left(\mathcal{E}\right)$ for a given quantum channel $\mathcal{E}$ . Its derivation is based on the techniques of channel simulation and teleportation stretching, which have proven to be extremely versatile in a number of settings [42, 49–53]. An essential class of quantum channels are those which are teleportation covariant, meaning that they satisfy the condition

$\begin{equation}\mathcal{E}\left(U\rho {U}^{{\dagger}}\right)=V\mathcal{E}\left(\rho \right){V}^{{\dagger}},\end{equation} \tag{ 65 }$

for some pair of teleportation unitaries {U, V}. Let us define the Choi matrix of a d-dimensional channel $\mathcal{E}$ as the result of passing one mode of a maximally entangled state Φ⁺ through $\mathcal{E}$ , and the other through an identity channel $\mathcal{I}$

$\begin{equation}{\rho }_{\mathcal{E}}=\mathcal{I}\otimes \mathcal{E}\left[{{\Phi}}^{+}\right],\end{equation} \tag{ 66 }$

where the maximally entangled state may take the form ${{\Phi}}^{+}=\frac{1}{d}{\sum }_{i,j=0}^{d-1}\vert ii\rangle \langle jj\vert$ . For teleportation covariant channels, the ultimate channel capacity can then be upper bounded in a remarkably simple way [34]

$\begin{equation}\mathcal{C}\left(\mathcal{E}\right){\leqslant}{E}_{\text{R}}^{n}\left({\rho }_{\mathcal{E}}\right){\leqslant}{E}_{\text{R}}\left({\rho }_{\mathcal{E}}\right),\end{equation} \tag{ 67 }$

where E_R is the standard REE (and ${E}_{\text{R}}^{n}$ its n-shot version). SNNS can be used to approximate upper bounds on these channel capacities, via constrained reconstruction of the Choi state of the desired quantum channel.

We consider two important, teleportation covariant, d-dimensional quantum channels in an effort to illustrate the effectiveness of our approach: the depolarizing channel considered in equation (62), and the HW channel [54–56]. The Choi states of these channels are the classes of isotropic states and Werner states respectively, whose REE bounds are known analytically. Therefore, we can compare the numerical performance of computing the REE via SNNS with the known, exact bounds.

Figure 7(a) reports REE bounds on the capacity of depolarising channels for dimensions d = 2, 3, 4. Approximating these bounds via separable network states requires the targeted reconstruction of the isotropic state,

$\begin{equation}{\rho }_{{\mathcal{E}}_{\text{D}}}=\left(1-p\right){{\Phi}}^{+}+\frac{p}{{d}^{2}}{\mathbb{I}}_{d}^{\otimes 2}.\end{equation} \tag{ 68 }$

Using a bipartite SNNS ${\rho }_{{\Omega}}^{\text{Sep}}$ to learn the target Choi state leads to an approximation of the REE of said state. Performing this optimization for many depolarizing probabilities p produces the results in figure 7(a). This is achieved to a very high degree of accuracy, reproducing the analytical bounds with an average error ≲ 10⁻⁵. Furthermore, these bounds can be computed very efficiently by performing each optimization sequentially, initializing the network parameters using the results of previous optimizations (see appendix C).

In figure 7(b) we give REE upper bounds for the HW channel, which takes the form

$\begin{equation}{\mathcal{E}}_{\;\text{HW}}^{\eta ,d}\left(\rho \right)=\frac{\left(d-\eta \right){\mathbb{I}}_{d}+\left(d\eta -1\right){\rho }^{\text{T}}}{{d}^{2}-1},\end{equation} \tag{ 69 }$

where the superscript T denotes transposition. The Choi state of the HW channel is the d-dimensional Werner state, introduced in equation (56). The single-shot REE bounds for the HW channel are analytically known and given in equation (57), and are independent of the dimension d. Again, this single-shot bound is approximated to a good precision, as shown in the results.
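The correspondence between these two channels and their Choi states can be confirmed with a direct construction based on equation (66). This is a minimal sketch: the channels are extended linearly to arbitrary operators by weighting the identity term with the trace of the input, and the identity in the HW channel output is taken to be d-dimensional so that it matches the dimension of ρ^T. Both choices, and all function names, are assumptions of this example.

```python
import numpy as np

def choi(channel, d):
    """rho_E = (I x E)[Phi+], with Phi+ = (1/d) sum_ij |ii><jj|."""
    out = np.zeros((d * d, d * d), dtype=complex)
    for i in range(d):
        for j in range(d):
            e_ij = np.zeros((d, d), dtype=complex)
            e_ij[i, j] = 1.0
            out += np.kron(e_ij, channel(e_ij)) / d
    return out

def depolarizing(p, d):
    """Single-qudit version of equation (62)."""
    return lambda x: (1 - p) * x + p * np.trace(x) * np.eye(d) / d

def hw_channel(eta, d):
    """Equation (69) with a d-dimensional identity (assumption)."""
    return lambda x: ((d - eta) * np.trace(x) * np.eye(d) + (d * eta - 1) * x.T) / (d**2 - 1)

d, p, eta = 3, 0.2, -0.5

# Isotropic state of equation (68)
phi = np.eye(d).reshape(d * d) / np.sqrt(d)
isotropic = (1 - p) * np.outer(phi, phi) + p * np.eye(d * d) / d**2

# Werner state of equation (56)
flip = np.zeros((d * d, d * d))
for i in range(d):
    for j in range(d):
        flip[i * d + j, j * d + i] = 1.0
werner = ((d - eta) * np.eye(d * d) + (d * eta - 1) * flip) / (d * (d**2 - 1))

print(np.allclose(choi(depolarizing(p, d), d), isotropic))
print(np.allclose(choi(hw_channel(eta, d), d), werner))
```

Either Choi matrix can then be passed to a separable learner as the reconstruction target.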

For Werner states of dimension d > 2, their REE is known to be strictly sub-additive when $\eta {< }-\frac{2}{d}$ , and previous studies have explored the two-shot REE for these Choi states [55], which can therefore be used to tighten these upper bounds. For instance, in figure 7(b) the two-shot capacity can be seen to significantly tighten the bounds for d = 3. In order to compute these tighter bounds, one must modify the definition of the n-shot quantities slightly. Now the minimization is performed with respect to the space of all locally bi-separable states. Consider the n-copy Werner state, and let us label each copy with indices of its modes {i, j},

$\begin{equation}{\varrho }_{\eta ,d}^{\otimes n}={\varrho }_{\eta ,d}^{\left\{1,2\right\}}\otimes {\varrho }_{\eta ,d}^{\left\{3,4\right\}}\otimes \dots \otimes {\varrho }_{\eta ,d}^{\left\{2n-1,2n\right\}}.\end{equation} \tag{ 70 }$

The goal is now to find the CSS that possesses the following bi-separability

$\begin{equation}{\sigma }^{n}={\sigma }_{a}^{\left\{1,3,5,\dots ,2n-1\right\}}\otimes {\sigma }_{b}^{\left\{2,4,6,\dots ,2n\right\}},\end{equation} \tag{ 71 }$

where we have permuted the labels into a bi-separable decomposition such that each state belongs to exclusively even or odd mode labels. This corresponds to a situation where two users each possess n local modes, and their goal is to produce the closest state to ${\varrho }_{\eta ,d}^{\otimes n}$ that is bi-separable between them. In general this is a very difficult task and, while beyond the scope of this paper, it is an interesting future application for SNNS.
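The relabelling between equations (70) and (71) amounts to a permutation of tensor indices. As a sketch for n = 2 copies, the code below regroups the modes {1, 2, 3, 4} into the two local registers {1, 3} and {2, 4}, and checks the result on a pair of maximally entangled states instead of Werner states. The function names and the choice of test state are assumptions of this example.

```python
import numpy as np

def regroup_two_copies(rho2, d):
    """Reorder modes (1,2,3,4) -> (1,3,2,4) of a two-copy density matrix."""
    t = rho2.reshape([d] * 8)                # (1,2,3,4 | 1',2',3',4')
    t = t.transpose(0, 2, 1, 3, 4, 6, 5, 7)  # (1,3,2,4 | 1',3',2',4')
    return t.reshape(d**4, d**4)

def reduced_state_a(rho_ab, dim_a, dim_b):
    """Partial trace over the second register."""
    t = rho_ab.reshape(dim_a, dim_b, dim_a, dim_b)
    return np.einsum("ibjb->ij", t)

d = 2
phi = np.eye(d).reshape(d * d) / np.sqrt(d)  # |Phi+> on one copy
pair = np.outer(phi, phi)
rho2 = np.kron(pair, pair)                   # copies on {1,2} and {3,4}

rho_ab = regroup_two_copies(rho2, d)         # user A: {1,3}, user B: {2,4}
rho_a = reduced_state_a(rho_ab, d * d, d * d)
print(np.allclose(rho_a, np.eye(d * d) / d**2))
```

After regrouping, user A's register is maximally mixed, as expected when each of A's modes is half of a maximally entangled pair shared with B. Without the permutation, tracing out the last two modes would simply return the first copy.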

5. Conclusions and outlook

We have generalized the concept of NNS with programmable separability to mixed, d-dimensional quantum states. We discussed a number of neural network architectures for the description of quantum states, and detailed how their entanglement properties may be controlled via constraints placed on network connectivity. It was shown that network connectivity controls entanglement structure on a very specific level, requiring distinctions between certain forms of entanglement. Outlining one of many possible optimisation protocols, methods of classification and quantification via SNNS have been logically developed, and applied in a number of important settings. We then studied a practical application of these tools in the bounding of ultimate quantum channel capacities, showing that they can reproduce the PLOB bounds for DV channels with high precision.

There are a number of valuable future directions in which SNNS may be explored and expanded. While an optimization scheme based on the vectorized fidelity is effective for a variety of applications (as shown in this work), more sophisticated optimization protocols could enhance performance for more specific entanglement measures. In particular, a gradient descent method that directly minimizes the relative entropy (or some variant thereof) would provide a more effective computation of the REE for complex states. This would also lend itself well to the study of n-shot REE quantities with applications in quantum channel capacities, and the characterization of more complex BE states (such as those constructed from un-extendible product bases). Combining these tools with those from practical quantum tomography could also be extremely useful, e.g. where SNNS may be used to certify the effectiveness of an entanglement distribution protocol.

Acknowledgments

CH acknowledges funding from the EPSRC via a Doctoral Training Partnership (EP/R513386/1). MP acknowledges the H2020-FETOPEN-2018-2020 Project TEQ (Grant No. 766900), the DfE-SFI Investigator Programme (Grant 15/IA/2864), COST Action CA15220, the Royal Society Wolfson Research Fellowship (RSWF\R3\183013), the Leverhulme Trust Research Project Grant (Grant No. RGP-2018-266), and the UK EPSRC (Grant No. EP/T028106/1). SP acknowledges funding from the European Union's Horizon 2020 Research and Innovation Action under Grant Agreement No. 862644 (Quantum readout techniques and technologies, QUARTET).

Data availability statement

The data that support the findings of this study are available upon reasonable request from the authors.

Appendix A.: Neural network mixed state ansatz

We briefly review the construction of the mixed NNS ansatz (see references [19–21] for more detailed derivations) to illustrate the emergence of the classical mixing state and pure state ansatzes. A generic density-matrix element with respect to row and column basis vectors {α, β} can be expressed as

$\begin{equation}{\rho }^{\boldsymbol{\alpha },\boldsymbol{\beta }}=\sum\limits _{n}{p}_{n}{\phi }_{n}\left(\boldsymbol{\alpha }\right){\phi }_{n}^{{\ast}}\left(\boldsymbol{\beta }\right),\end{equation} \tag{ A1 }$

where p_n is the classical probability of a pure state ϕ_n existing within the ensemble, and the sum ∑_n may run over many contributing states.

We can use NNS in order to translate this expression into a variational ansatz. As stated in the main text, the inherent advantage to a pure NNS is that its output is independent from the activations of the hidden layer $\boldsymbol{h}\in {\left\{-1,1\right\}}^{{n}_{\text{h}}}$ , which consists of n_h neurons. Prior to tracing out this hidden layer, a pure NNS wavefunction is given by,

$\begin{equation}{{\Psi}}_{{\Pi}}\left(\boldsymbol{s}\right)=\sum\limits _{\boldsymbol{h}}\mathrm{exp}\left(\sum\limits _{k=1}^{{n}_{v}}{a}_{k}{s}_{k}+\sum\limits _{j=1}^{{n}_{h}}{b}_{j}{h}_{j}+\sum\limits _{k,j=1}^{{n}_{v},{n}_{h}}{W}_{kj}{h}_{j}{s}_{k}\right).\end{equation} \tag{ A2 }$

Using this NNS wavefunction, it is then easy to construct a pure density-matrix, such that ${\sigma }^{\boldsymbol{\alpha },\boldsymbol{\beta }}={{\Psi}}_{{\Pi}}\left(\boldsymbol{\alpha }\right){{\Psi}}_{{\Pi}}^{{\ast}}\left(\boldsymbol{\beta }\right)$ , using two visible layers in order to encode density-matrix entries, as shown in figure (2).
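The sum over hidden configurations in equation (A2) can be carried out analytically, since each hidden neuron contributes an independent factor of 2 cosh(b_j + ∑_k W_kj s_k). The sketch below checks this against a brute-force sum for a small network and then forms the pure density matrix. It assumes visible units also take values in {−1, 1}; the function names and the random parameter values are illustrative only.

```python
import itertools
import numpy as np

def psi_brute_force(s, a, b, W):
    """Sum exp(a.s + b.h + s.W.h) over every hidden configuration h in {-1, 1}^n_h."""
    total = 0.0 + 0.0j
    for h in itertools.product((-1, 1), repeat=len(b)):
        h = np.array(h)
        total += np.exp(a @ s + b @ h + s @ W @ h)
    return total

def psi_traced(s, a, b, W):
    """Closed form after tracing out h: exp(a.s) * prod_j 2 cosh(b_j + sum_k W_kj s_k)."""
    return np.exp(a @ s) * np.prod(2.0 * np.cosh(b + s @ W))

# Small illustrative network with random complex parameters (assumed values).
rng = np.random.default_rng(0)
n_v, n_h = 2, 3
a = 0.1 * (rng.normal(size=n_v) + 1j * rng.normal(size=n_v))
b = 0.1 * (rng.normal(size=n_h) + 1j * rng.normal(size=n_h))
W = 0.1 * (rng.normal(size=(n_v, n_h)) + 1j * rng.normal(size=(n_v, n_h)))

# Enumerate the visible basis and build sigma[a, b] = Psi(a) conj(Psi(b)).
basis = [np.array(s) for s in itertools.product((-1, 1), repeat=n_v)]
psi = np.array([psi_traced(s, a, b, W) for s in basis])
sigma = np.outer(psi, psi.conj())
```

The closed form costs O(n_v n_h) per amplitude, whereas the brute-force sum grows as 2^{n_h}.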

In order to construct the mixed state ansatz, we introduce an additional mixing layer $\boldsymbol{m}\in {\left\{-1,1\right\}}^{{n}_{\text{m}}}$ which is used to represent the classical probabilities ${p}_{n}=\mathrm{exp}\left({\sum }_{p}{c}_{p}{m}_{p}\right)$ , where ${c}_{p}\in \mathbb{R}$ are the real-valued hidden mixing neural biases. This mixing layer is interconnected with the visible layers in order to capture classical correlations, mediated via the weight matrix ${U}_{kp}\in {\mathbb{C}}^{{n}_{v}{\times}{n}_{\text{m}}}$ . Combining all the RBM contributions, the ansatz reads

$\begin{equation}\begin{aligned}{\rho }^{\boldsymbol{\alpha },\boldsymbol{\beta }}& =\sum\limits _{\boldsymbol{m}}\sum\limits _{{\boldsymbol{h}}_{\boldsymbol{\alpha }},{\boldsymbol{h}}_{\boldsymbol{\beta }}}\mathrm{exp}\left(\sum\limits _{p=1}^{{n}_{\text{m}}}{c}_{p}{m}_{p}\right){\times}\mathrm{exp}\left(\sum\limits _{k}{a}_{k}{\alpha }_{k}+\sum\limits _{j}{b}_{j}{{h}_{\alpha }}_{j}+\sum\limits _{k,j}{W}_{kj}{{h}_{\alpha }}_{j}{\alpha }_{k}+\sum\limits _{k,p}{U}_{kp}{m}_{p}{\alpha }_{k}\right)\\ & \quad {\times}\mathrm{exp}\left(\sum\limits _{k}{a}_{k}^{{\ast}}{\beta }_{k}+\sum\limits _{l}{b}_{l}^{{\ast}}{{h}_{\beta }}_{l}+\sum\limits _{k,l}{W}_{kl}^{{\ast}}{{h}_{\beta }}_{l}{\beta }_{k}+\sum\limits _{k,p}{U}_{kp}^{{\ast}}{m}_{p}{\beta }_{k}\right).\end{aligned}\end{equation} \tag{ A3 }$

Since there are no intra-layer connections, the hidden layers can be effectively traced out, leaving the mixed state ansatz used in equation (13).

In this work, we make use of a matrix element-wise version of the mixed state decomposition in equation (A1), such that

$\begin{equation}\rho =\sum\limits _{\boldsymbol{\alpha },\boldsymbol{\beta }}\sum\limits _{n}{p}_{n}{\phi }_{n}\left(\boldsymbol{\alpha }\right){\phi }_{n}^{{\ast}}\left(\boldsymbol{\beta }\right)\vert \boldsymbol{\alpha }\rangle \langle \boldsymbol{\beta }\vert =\mathcal{P}\odot \sigma ,\end{equation} \tag{ A4 }$

where $\mathcal{P}$ describes classical contributions to the density-matrix, while σ describes quantum contributions from pure states. This representation is readily accessible via the NNS ansatz, and extremely useful for programming forms of entanglement. Importantly, it is by construction that the contributions from the mixing layer are purely classical. On its own, the mixing layer is capable of simulating classical correlations only, and is therefore implicitly separable.
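To make the element-wise factorization concrete, the following sketch evaluates equation (A3) by brute force for a very small network and compares it with a Hadamard product of two factors. It assumes that $\mathcal{P}$ can be identified with the factor left after summing over the mixing layer, namely ∏_p 2 cosh(c_p + ∑_k U_kp α_k + ∑_k U*_kp β_k), and that σ is the pure density matrix built from equation (A2). The function names, network sizes and random parameters are illustrative and not the authors' implementation.

```python
import itertools
import numpy as np

def configs(n):
    """All configurations of n units taking values in {-1, 1}."""
    return [np.array(c) for c in itertools.product((-1, 1), repeat=n)]

def rho_brute_force(alpha, beta, a, b, W, c, U):
    """Direct evaluation of equation (A3): sum over m, h_alpha and h_beta."""
    total = 0.0 + 0.0j
    for m in configs(len(c)):
        for h_a in configs(len(b)):
            for h_b in configs(len(b)):
                ket = a @ alpha + b @ h_a + alpha @ W @ h_a + alpha @ U @ m
                bra = (np.conj(a) @ beta + np.conj(b) @ h_b
                       + beta @ np.conj(W) @ h_b + beta @ np.conj(U) @ m)
                total += np.exp(c @ m + ket + bra)
    return total

def psi(s, a, b, W):
    """Pure-state amplitude with the hidden layer traced out."""
    return np.exp(a @ s) * np.prod(2.0 * np.cosh(b + s @ W))

def mixing_factor(alpha, beta, c, U):
    """Assumed form of P[alpha, beta]: the mixing layer traced out."""
    return np.prod(2.0 * np.cosh(c + alpha @ U + beta @ np.conj(U)))

rng = np.random.default_rng(1)
n_v, n_h, n_m = 2, 2, 2
cplx = lambda *shape: 0.2 * (rng.normal(size=shape) + 1j * rng.normal(size=shape))
a, b, W, U = cplx(n_v), cplx(n_h), cplx(n_v, n_h), cplx(n_v, n_m)
c = 0.2 * rng.normal(size=n_m)  # real-valued mixing biases

basis = configs(n_v)
rho_bf = np.array([[rho_brute_force(al, be, a, b, W, c, U) for be in basis] for al in basis])
amps = np.array([psi(s, a, b, W) for s in basis])
sigma = np.outer(amps, amps.conj())
P = np.array([[mixing_factor(al, be, c, U) for be in basis] for al in basis])
rho_factored = P * sigma  # element-wise (Hadamard) product
```

For these small sizes the two constructions agree to numerical precision, and the resulting (unnormalized) matrix is Hermitian and positive semidefinite.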

Appendix B.: Learning with complex-exponential ansatz for mixed states

As discussed in section 1.1, one can make use of a restructuring of the mixed state ansatz into complex exponential form in order to take better control of the learning procedure. Indeed, the total mixed state can be expressed as

$\begin{equation}{\rho }_{{\Omega},{\Pi},{\Xi}}^{\boldsymbol{\alpha },\boldsymbol{\beta }}={\text{e}}^{\text{i}\enspace \mathrm{log}\left({{\Phi}}_{{\Xi}}\left(\boldsymbol{\alpha },\boldsymbol{\beta }\right){\vartheta }_{{\Omega}}\left(\boldsymbol{\alpha },\boldsymbol{\beta }\right)\right)}{{\Gamma}}_{{\Pi}}\left(\boldsymbol{\alpha },\boldsymbol{\beta }\right){r}_{{\Omega}}\left(\boldsymbol{\alpha },\boldsymbol{\beta }\right),\end{equation} \tag{ B1 }$

such that the state is constructed from three variational parameter sets, where r_Ω and Γ_Π assume responsibility for the magnitude of any element of the density-matrix, while functions Φ_Ξ and ϑ_Ω are responsible for the complex phase of such elements. Consider a target state χ which also admits the following decomposition

$\begin{equation}{\chi }^{\boldsymbol{\alpha },\boldsymbol{\beta }}=\lambda \left(\boldsymbol{\alpha },\boldsymbol{\beta }\right){\text{e}}^{\text{i}\enspace \mathrm{log}\enspace \xi \left(\boldsymbol{\alpha },\boldsymbol{\beta }\right)}.\end{equation} \tag{ B2 }$

The pure density-matrix phase/amplitude functions Φ_Ξ and Γ_Π respectively, are parameterized by real valued parameter sets. Furthermore, they are decomposed with respect to their pure state wavefunctions, as shown in equation (14). The logarithmic derivatives of the pair of pure state phase functions take the form

$\begin{equation}\frac{\partial \enspace \mathrm{log}\enspace \vert {{\Phi}}_{{\Xi}}\rangle }{\partial {{\Xi}}_{k}}=\sum\limits _{\boldsymbol{\alpha },\boldsymbol{\beta }}\left(\frac{\partial \enspace \mathrm{log}\enspace \varphi \left(\boldsymbol{\alpha }\right)}{\partial {{\Xi}}_{k}}-\frac{\partial \enspace \mathrm{log}\enspace \varphi \left(\boldsymbol{\beta }\right)}{\partial {{\Xi}}_{k}}\right),\end{equation} \tag{ B3 }$

while the amplitude function derivatives are

$\begin{equation}\frac{\partial \enspace \mathrm{log}\enspace \vert {{\Gamma}}_{{\Pi}}\rangle }{\partial {{\Pi}}_{k}}=\sum\limits _{\boldsymbol{\alpha },\boldsymbol{\beta }}\left(\frac{\partial \enspace \mathrm{log}\enspace \sigma \left(\boldsymbol{\alpha }\right)}{\partial {{\Pi}}_{k}}+\frac{\partial \enspace \mathrm{log}\enspace \sigma \left(\boldsymbol{\beta }\right)}{\partial {{\Pi}}_{k}}\right).\end{equation} \tag{ B4 }$

Meanwhile, the mixing state phase/amplitude wavefunctions ϑ_Ω and r_Ω respectively are based on complex parameters. In this case, it is expedient to take derivatives with respect to real and imaginary components, i.e. $\frac{\partial \enspace \mathrm{log}\enspace \vert {r}_{{\Omega}}\rangle }{\partial \enspace \text{Re}\left({{\Omega}}_{k}\right)}$ , $\frac{\partial \enspace \mathrm{log}\enspace \vert {r}_{{\Omega}}\rangle }{\partial \enspace \text{Im}\left({{\Omega}}_{k}\right)}$ , $\frac{\partial \enspace \mathrm{log}\enspace \vert {\vartheta }_{{\Omega}}\rangle }{\partial \enspace \text{Re}\left({{\Omega}}_{k}\right)}$ and $\frac{\partial \enspace \mathrm{log}\enspace \vert {\vartheta }_{{\Omega}}\rangle }{\partial \enspace \text{Im}\left({{\Omega}}_{k}\right)}$ which can be treated separately. All these derivatives take real, compact and easily derived forms with respect to the neural network parameters, making gradient computations straightforward.

The learning procedure of minimising the negative logarithmic fidelity between a target vectorized density-matrix |χ⟩ and the mixed NNS is given by the usual update rule in section 3. Defining the quantity

$\begin{equation}{\Delta}\left(\boldsymbol{\alpha },\boldsymbol{\beta }\right)={\langle {\rho }_{{\Omega},{\Pi},{\Xi}}\vert \chi \rangle }^{-1}\enspace {\text{e}}^{\text{i}\enspace \mathrm{log}\enspace \frac{{{\Phi}}_{{\Xi}}\left(\boldsymbol{\alpha },\boldsymbol{\beta }\right){\vartheta }_{{\Omega}}\left(\boldsymbol{\alpha },\boldsymbol{\beta }\right)}{\xi \left(\boldsymbol{\alpha },\boldsymbol{\beta }\right)}},\end{equation} \tag{ B5 }$

where ⟨ρ_Ω,Π,Ξ|χ⟩ is the vectorized overlap between the variational and target state, we can then make use of the following gradients,

$\begin{equation}{\nabla }_{k}^{{{\Gamma}}_{{\Pi}}}\mathcal{L}=\sum\limits _{\boldsymbol{\alpha },\boldsymbol{\beta }}\left[\frac{{r}_{{\Omega}}^{2}\left(\boldsymbol{\alpha },\boldsymbol{\beta }\right){{\Gamma}}_{{\Pi}}\left(\boldsymbol{\alpha },\boldsymbol{\beta }\right)}{\vert {\rho }_{{\Omega},{\Pi},{\Xi}}{\vert }^{2}}-\lambda \left(\boldsymbol{\alpha },\boldsymbol{\beta }\right){r}_{{\Omega}}\left(\boldsymbol{\alpha },\boldsymbol{\beta }\right)\text{Re}\left[{\Delta}\left(\boldsymbol{\alpha },\boldsymbol{\beta }\right)\right]\right]\cdot {\mathcal{O}}_{k}^{{\Pi}}\vert {{\Gamma}}_{{\Pi}}\rangle ,\end{equation} \tag{ B6 }$

$\begin{equation}{\nabla }_{k}^{{r}_{{\Omega}}}\mathcal{L}=\sum\limits _{\boldsymbol{\alpha },\boldsymbol{\beta }}\left[\frac{{{\Gamma}}_{{\Pi}}^{2}\left(\boldsymbol{\alpha },\boldsymbol{\beta }\right){r}_{{\Omega}}\left(\boldsymbol{\alpha },\boldsymbol{\beta }\right)}{\vert {\rho }_{{\Omega},{\Pi},{\Xi}}{\vert }^{2}}-\lambda \left(\boldsymbol{\alpha },\boldsymbol{\beta }\right){{\Gamma}}_{{\Pi}}\left(\boldsymbol{\alpha },\boldsymbol{\beta }\right)\text{Re}\left[{\Delta}\left(\boldsymbol{\alpha },\boldsymbol{\beta }\right)\right]\right]\cdot {\mathcal{O}}_{k}^{{{\Omega}}_{r}}\vert {r}_{{\Omega}}\rangle ,\end{equation} \tag{ B7 }$

$\begin{equation}{\nabla }_{k}^{{{\Phi}}_{{\Xi}}}\mathcal{L}=-\sum\limits _{\boldsymbol{\alpha },\boldsymbol{\beta }}\left[\frac{{r}_{{\Omega}}\left(\boldsymbol{\alpha },\boldsymbol{\beta }\right)\lambda \left(\boldsymbol{\alpha },\boldsymbol{\beta }\right){{\Gamma}}_{{\Pi}}\left(\boldsymbol{\alpha },\boldsymbol{\beta }\right)}{{{\Phi}}_{{\Xi}}\left(\boldsymbol{\alpha },\boldsymbol{\beta }\right)}\enspace \text{Im}\left[{\Delta}\left(\boldsymbol{\alpha },\boldsymbol{\beta }\right)\right]\right]\cdot {\mathcal{O}}_{k}^{{\Xi}}\vert {{\Phi}}_{{\Xi}}\rangle ,\end{equation} \tag{ B8 }$

$\begin{equation}{\nabla }_{k}^{{\vartheta }_{{\Omega}}}\mathcal{L}=-\sum\limits _{\boldsymbol{\alpha },\boldsymbol{\beta }}\left[\frac{{r}_{{\Omega}}\left(\boldsymbol{\alpha },\boldsymbol{\beta }\right)\lambda \left(\boldsymbol{\alpha },\boldsymbol{\beta }\right){{\Gamma}}_{{\Pi}}\left(\boldsymbol{\alpha },\boldsymbol{\beta }\right)}{{\vartheta }_{{\Omega}}\left(\boldsymbol{\alpha },\boldsymbol{\beta }\right)}\enspace \text{Im}\left[{\Delta}\left(\boldsymbol{\alpha },\boldsymbol{\beta }\right)\right]\right]\cdot {\mathcal{O}}_{k}^{{{\Omega}}_{\vartheta }}\vert {\vartheta }_{{\Omega}}\rangle .\end{equation} \tag{ B9 }$

Here, |ρ_Ω,Π,Ξ|² is the magnitude of the vectorized density-matrix. Furthermore ${\mathcal{O}}_{k}^{{{\Omega}}_{r}}=\mathrm{diag}\left({\partial }_{{{\Omega}}_{k}}\enspace \mathrm{log}\vert {r}_{{\Omega}}\rangle \right)$ and ${\mathcal{O}}_{k}^{{{\Omega}}_{\vartheta }}=\mathrm{diag}\left({\partial }_{{{\Omega}}_{k}}\enspace \mathrm{log}\vert {\vartheta }_{{\Omega}}\rangle \right)$ are the diagonal matrices with mixing layer gradients. Again, these are treated separately with respect to real and imaginary valued parameters in Ω.
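The amplitude/phase split of equation (B2) and the loss being minimized can be illustrated numerically. The sketch below assumes that the vectorized fidelity takes the normalized-overlap form |⟨ρ|χ⟩|² / (⟨ρ|ρ⟩⟨χ|χ⟩); the precise definition is given in the main text, so this form, the function names and the example states are assumptions made for illustration.

```python
import numpy as np

def amplitude_phase(chi):
    """Split chi into lam = |chi| and xi = exp(arg chi), so chi = lam * exp(1j * log(xi))."""
    lam = np.abs(chi)
    xi = np.exp(np.angle(chi))
    return lam, xi

def neg_log_fidelity(rho, chi):
    """Negative log of the (assumed) normalized vectorized overlap between rho and chi."""
    r, x = rho.ravel(), chi.ravel()
    overlap = np.vdot(r, x)  # <rho|chi>; np.vdot conjugates its first argument
    fidelity = np.abs(overlap) ** 2 / (np.vdot(r, r).real * np.vdot(x, x).real)
    return -np.log(fidelity)

# Illustrative target: a maximally entangled two-qubit state with a relative phase.
vec = np.array([1, 0, 0, 1j]) / np.sqrt(2)
chi = np.outer(vec, vec.conj())
lam, xi = amplitude_phase(chi)

# Illustrative variational guess: the maximally mixed state.
guess = np.eye(4) / 4
loss = neg_log_fidelity(guess, chi)
```

Under this assumed form the loss is insensitive to the overall normalization of the variational state and vanishes only when the two vectorized matrices are proportional.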

Appendix C.: Details on numerical simulation

The gradient descent optimization procedures utilized throughout this work were facilitated by an adaptive learning rate scheme using the AdaMax optimizer [57] with a typical initial learning rate of the order η_init ∈ [10⁻⁴, 10⁻³]. The number of learning iterations varied depending on the complexity of the target state, i.e. the complexity of the entanglement to be simulated/classified and the dimension of the qudit system being considered (and therefore the size of the target density-matrix). Since the time-to-convergence is shorter for states with smaller degrees of entanglement, it is intuitively more efficient to perform classification with a separable NNS than to explicitly reconstruct an entangled state.
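For reference, a minimal sketch of the standard AdaMax update rule is given below, applied to a toy quadratic loss rather than to a network state. The function name `adamax_minimize`, the small `eps` guard against division by zero, and the toy learning rate (larger than the range quoted above so that the example converges in few steps) are assumptions for illustration.

```python
import numpy as np

def adamax_minimize(grad, theta0, lr=1e-3, beta1=0.9, beta2=0.999, steps=1000, eps=1e-8):
    """Minimize a function given its gradient, using the AdaMax update rule."""
    theta = np.array(theta0, dtype=float)
    m = np.zeros_like(theta)  # exponential moving average of the gradient
    u = np.zeros_like(theta)  # exponentially weighted infinity norm of the gradient
    for t in range(1, steps + 1):
        g = grad(theta)
        m = beta1 * m + (1.0 - beta1) * g
        u = np.maximum(beta2 * u, np.abs(g))
        # Bias-corrected step; eps only guards against u == 0.
        theta -= (lr / (1.0 - beta1 ** t)) * m / (u + eps)
    return theta

# Toy problem: minimize ||theta - target||^2.
target = np.array([1.0, -2.0])
theta = adamax_minimize(lambda th: 2.0 * (th - target), np.zeros(2), lr=0.05, steps=1000)
```

Because each step is scaled by the running infinity norm of the gradient, the size of a parameter update is bounded by roughly the learning rate.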

A scenario in which the efficiency of learning can be greatly enhanced is the study of evolving, or 'nearby' states. Consider the results from figures 5–7. In a number of instances, we are classifying/quantifying the entanglement of a target state which is changing incrementally (and by a small amount) throughout an interval. Consider an NNS ρ_Ω that learns a state σ. It is logical to assume that if the target state is perturbed/evolved by some small amount, σ' = σ + δσ, the network Ω will only need to be optimized by a small amount Ω' = Ω + δΩ. Therefore, when studying evolving target states, it is extremely useful to initialize each state using the parameter distribution of the previous learner. This not only simplifies learning and performance, but increases efficiency dramatically; the initial target can be reconstructed over a number of optimization steps S, but subsequent alterations to the network only require a fraction of S steps.
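The benefit of initializing each learner from the previous one can be seen in a toy setting. The sketch below replaces the network loss with a simple quadratic and sweeps over a sequence of slowly changing targets, counting gradient steps with and without warm starting. The functions `fit` and `sweep`, the quadratic loss and the target sequence are hypothetical stand-ins, not the simulations reported in this work.

```python
import numpy as np

def fit(target, theta0, lr=0.1, tol=1e-6, max_steps=10_000):
    """Plain gradient descent on ||theta - target||^2; returns (theta, steps used)."""
    theta = np.array(theta0, dtype=float)  # copy so the caller's array is untouched
    for step in range(max_steps):
        grad = 2.0 * (theta - target)
        if np.linalg.norm(grad) < tol:
            return theta, step
        theta -= lr * grad
    return theta, max_steps

def sweep(targets, warm_start):
    """Fit each target in turn, optionally starting from the previous solution."""
    theta = np.zeros_like(targets[0])
    counts = []
    for target in targets:
        init = theta if warm_start else np.zeros_like(target)
        theta, steps = fit(target, init)
        counts.append(steps)
    return counts

# A slowly evolving family of targets (illustrative).
xs = np.linspace(0.0, 1.0, 11)
targets = [np.array([np.cos(x), np.sin(x)]) for x in xs]
cold = sweep(targets, warm_start=False)
warm = sweep(targets, warm_start=True)
```

In this toy case the first target costs the same number of steps either way, while every subsequent warm-started fit needs fewer steps than its cold-started counterpart.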

Importantly, when performing this method with SNNS one should ensure that the chosen initialization complies with the entanglement properties of the separable variational state, i.e. a separable network should be initialized with a nearby separable network state. If an SNNS is initialized with the network parameters of a nearby entangled NNS, then when separability conditions are imposed the network state will change rapidly and potentially end up in a state that is very different to the target, contrary to the desired effect.
