First published on Saturday, Dec 28, 2024 and last modified on Wednesday, Apr 9, 2025 by François Chaplais.
Department of Mathematics, University of California, Berkeley, Challenge Institute of Quantum Computation, University of California, Berkeley, Computational Research Division, Lawrence Berkeley National Laboratory
With the availability of near-term quantum devices and the breakthrough of quantum supremacy experiments, quantum computation has received an increasing amount of attention from a diverse range of scientific disciplines in the past few years. Despite the availability of excellent textbooks and lecture notes such as [1, 2, 3, 4, 5, 6, 7, 8], these materials often cover all aspects of quantum computation, including complexity theory, physical implementations of quantum devices, quantum information theory, quantum error correction, quantum algorithms, etc. This leaves little room for introducing how a quantum computer is supposed to be used to solve challenging computational problems in science and engineering. For instance, after an initial reading of (admittedly, selected chapters of) the classic textbook by Nielsen and Chuang [1], I was both amazed by the potential power of a quantum computer and baffled by its practical range of applicability: are we really trying to build a quantum computer just to perform a quantum Fourier transform, or a quantum search? Is quantum phase estimation the only bridge connecting a quantum computer on one side, and virtually all scientific computing problems on the other, such as solving linear systems, eigenvalue problems, least squares problems, differential equations, numerical optimization, etc.?
Thanks to the significant progress in the development of quantum algorithms, it should by now be self-evident that the answer to both questions above is no. This is a fast-evolving field, and many important advances have appeared only in the past few years. However, many such developments are theoretically and technically involved, and can be difficult to penetrate for someone with only basic knowledge of quantum computing. I think it is worth delivering some of these exciting results, in a somewhat more accessible way, to a broader community interested in using future fault-tolerant quantum computers to solve scientific problems.
This is a set of lecture notes used in a graduate topics class in applied mathematics called “Quantum Algorithms for Scientific Computation” at the Department of Mathematics, UC Berkeley during the fall semester of 2021. These lecture notes focus only on quantum algorithms closely related to scientific computation, and in particular, matrix computation. In fact, this is only a small class of quantum algorithms viewed from the perspective of the “quantum algorithm zoo”. This means that many important materials are consciously left out, such as quantum complexity theory, applications in number theory and cryptography (notably, Shor’s algorithm), applications in algebraic problems (such as the hidden subgroup problem), etc. Readers interested in these topics can consult some of the excellent aforementioned textbooks. Since the materials were designed to fit into the curriculum of one semester, several other topics relevant to scientific computation are not included, notably adiabatic quantum computation (AQC) and variational quantum algorithms (VQA). These materials may be added in future editions of the lecture notes. To my knowledge, some of the materials in these lecture notes may be new and have not been presented in the literature. The sections marked by * can be skipped upon first reading without much detriment.
I would like to thank Dong An, Yulong Dong, Di Fang, Fabian M. Faulstich, Cory Hargus, Zhen Huang, Subhayan Roy Moulik, Yu Tong, Jiasu Wang, Mathias Weiden, Jiahao Yao, Lexing Ying for useful discussions and for pointing out typos in the notes. I would also like to thank Nilin Abrahamsen, Di Fang, Subhayan Roy Moulik, Yu Tong for contributing some of the exercises, and Jiahao Yao for providing the cover image of the notes. For errors / comments / suggestions / general thoughts on the lecture notes, please send me an email: Email.
We introduce the four main postulates of quantum mechanics related to this course. For more details, we refer readers to [1, Section 2.2]. All postulates concern closed quantum systems (i.e., systems isolated from environments) only.
The set of all quantum states of a quantum system forms a complex vector space with inner product structure (i.e., it is a Hilbert space, denoted by \( \mathcal{H}\) ), called the state space. If the state space \( \mathcal{H}\) is finite dimensional, it is isomorphic to some \( \mathbb{C}^N\) , written as \( \mathcal{H}\cong \mathbb{C}^N\) . Without loss of generality we may simply take \( \mathcal{H}=\mathbb{C}^N\) . We always assume \( N=2^n\) for some non-negative integer \( n\) , often called the number of quantum bits (or qubits). A quantum state \( \psi\in\mathbb{C}^N\) can be expressed in terms of its components as
\( \psi=(c_0,c_1,\ldots,c_{N-1})^{\top},\quad c_i\in\mathbb{C} \)   (1)
Its Hermitian conjugate is
\( \psi^{†}=(\overline{c}_0,\overline{c}_1,\ldots,\overline{c}_{N-1}) \)   (2)
where \( \overline{c}\) is the complex conjugate of \( c\in \mathbb{C}\) . We also use the Dirac notation, which uses \( \ket{\psi}\) to denote a quantum state, \( \bra{\psi}\) to denote its Hermitian conjugate \( \psi^{†}\) , and the inner product
\( \braket{\varphi|\psi}:=\varphi^{†}\psi=\sum_{i\in[N]}\overline{\varphi}_i\psi_i \)   (3)
Here \( [N]=\set{0,…,N-1}\) . Let \( \{\ket{i}\}\) be the standard basis of \( \mathbb{C}^N\) . The \( i\) -th entry of \( \psi\) can be written as an inner product \( \psi_i=\braket{i|\psi}\) . Then \( \ket{\psi}\bra{\varphi}\) should be interpreted as an outer product, with \( (i,j)\) -th matrix element given by
\( \left(\ket{\psi}\bra{\varphi}\right)_{ij}=\psi_i\overline{\varphi}_j \)   (4)
Two state vectors \( \ket{\psi}\) and \( c\ket{\psi}\) for some \( 0\ne c\in \mathbb{C}\) always refer to the same physical state, i.e., \( c\) has no observable effects. Hence without loss of generality we always assume \( \ket{\psi}\) is normalized to be a unit vector, i.e., \( \braket{\psi|\psi}=1\) . Sometimes it is more convenient to write down an unnormalized state, which will be denoted by \( \psi\) without the ket notation \( \ket{\cdot}\) . Restricting to normalized state vectors, the remaining ambiguity is a complex number \( c=e^{\mathrm{i}\theta}\) for some \( \theta\in [0,2\pi)\) , called the global phase factor.
Example 1 (Single qubit system)
A (single) qubit corresponds to a state space \( \mathcal{H}\cong\mathbb{C}^2\) . We also define
\( \ket{0}=\begin{pmatrix}1\\0\end{pmatrix},\qquad \ket{1}=\begin{pmatrix}0\\1\end{pmatrix} \)   (5)
Since the state space of the spin-\( \frac12\) system is also isomorphic to \( \mathbb{C}^2\) , this is also called the single spin system, where \( \ket{0},\ket{1}\) are referred to as the spin-up and spin-down state, respectively. A general state vector in \( \mathcal{H}\) takes the form
\( \ket{\psi}=a\ket{0}+b\ket{1},\quad a,b\in\mathbb{C} \)   (6)
and the normalization condition implies \( \left\lvert a\right\rvert ^2+\left\lvert b\right\rvert ^2=1\) . So we may rewrite \( \ket{\psi}\) as
\( \ket{\psi}=e^{\mathrm{i}\gamma}\left(\cos\frac{\theta}{2}\ket{0}+e^{\mathrm{i}\varphi}\sin\frac{\theta}{2}\ket{1}\right) \)   (7)
If we ignore the irrelevant global phase \( \gamma\) , the state is effectively
\( \ket{\psi}=\cos\frac{\theta}{2}\ket{0}+e^{\mathrm{i}\varphi}\sin\frac{\theta}{2}\ket{1} \)   (8)
So we may identify each single qubit quantum state with a unique point on the unit sphere in three dimensions (called the Bloch sphere) as
\( \vec{r}=(\sin\theta\cos\varphi,\ \sin\theta\sin\varphi,\ \cos\theta) \)   (9)
The evolution of a quantum state from \( \ket{\psi}\to\ket{\psi'}\in \mathbb{C}^N\) is always achieved via a unitary operator \( U\in\mathbb{C}^{N\times N}\) , i.e.,
\( \ket{\psi'}=U\ket{\psi},\qquad U^{†}U=UU^{†}=I_N \)   (10)
Here \( U^{†}\) is the Hermitian conjugate of a matrix \( U\) , and \( I_N\) is the \( N\) -dimensional identity matrix. When the dimension is apparent, we may also simply write \( I\equiv I_N\) . In quantum computation, a unitary matrix is often referred to as a gate.
Example 2
For a single qubit, the Pauli matrices are
\( X=\begin{pmatrix}0&1\\1&0\end{pmatrix},\quad Y=\begin{pmatrix}0&-\mathrm{i}\\\mathrm{i}&0\end{pmatrix},\quad Z=\begin{pmatrix}1&0\\0&-1\end{pmatrix} \)   (11)
Together with the two-dimensional identity matrix, they form a basis of all linear operators on \( \mathbb{C}^2\) .
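These properties are easy to verify numerically. The following is a small numpy sketch (numpy and the variable names are my choice, not part of the notes) checking that each Pauli matrix is Hermitian, unitary, and involutory, and that \( \{I,X,Y,Z\}\) are orthogonal under the Hilbert-Schmidt inner product, hence linearly independent, hence a basis of the four-dimensional space of \( 2\times 2\) complex matrices:

```python
import numpy as np

# Pauli matrices plus the 2x2 identity
I2 = np.eye(2, dtype=complex)
X = np.array([[0, 1], [1, 0]], dtype=complex)
Y = np.array([[0, -1j], [1j, 0]], dtype=complex)
Z = np.array([[1, 0], [0, -1]], dtype=complex)
paulis = [I2, X, Y, Z]

# Each Pauli is Hermitian, unitary, and squares to the identity
for P in paulis:
    assert np.allclose(P, P.conj().T)        # Hermitian
    assert np.allclose(P @ P.conj().T, I2)   # unitary
    assert np.allclose(P @ P, I2)            # involutory

# Hilbert-Schmidt Gram matrix <A,B> = Tr(A^dag B); orthogonality
# implies linear independence, hence a basis of 2x2 matrices.
gram = np.array([[np.trace(A.conj().T @ B) for B in paulis] for A in paulis])
assert np.allclose(gram, 2 * np.eye(4))
```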
Some other commonly used single qubit operators include, to name a few:
Hadamard gate
\( \mathrm{H}=\frac{1}{\sqrt{2}}\begin{pmatrix}1&1\\1&-1\end{pmatrix} \)   (12)
Phase gate
\( \mathrm{S}=\begin{pmatrix}1&0\\0&\mathrm{i}\end{pmatrix} \)   (13)
\( \mathrm{T}\) gate:
\( \mathrm{T}=\begin{pmatrix}1&0\\0&e^{\mathrm{i}\pi/4}\end{pmatrix} \)   (14)
When there are notational conflicts, we will use the roman font such as \( \mathrm{H},\mathrm{X}\) for these single-qubit gates (one common scenario is to distinguish the Hadamard gate \( \mathrm{H}\) from a Hamiltonian \( H\) ). An operator acting on an \( n\) -qubit quantum state space is called an \( n\) -qubit operator.
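A few standard identities relating these single-qubit gates can be checked directly; the following numpy sketch (my own illustration, not from the notes) verifies that \( \mathrm{H}\) is its own inverse, that \( \mathrm{T}^2=\mathrm{S}\) and \( \mathrm{S}^2=Z\) , and that conjugation by \( \mathrm{H}\) exchanges \( X\) and \( Z\) :

```python
import numpy as np

H = np.array([[1, 1], [1, -1]], dtype=complex) / np.sqrt(2)   # Hadamard gate
S = np.array([[1, 0], [0, 1j]], dtype=complex)                # phase gate
T = np.array([[1, 0], [0, np.exp(1j * np.pi / 4)]])           # T gate
X = np.array([[0, 1], [1, 0]], dtype=complex)
Z = np.array([[1, 0], [0, -1]], dtype=complex)

assert np.allclose(H @ H, np.eye(2))   # H is its own inverse
assert np.allclose(T @ T, S)           # T^2 = S
assert np.allclose(S @ S, Z)           # S^2 = Z
assert np.allclose(H @ X @ H, Z)       # H exchanges X and Z
```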
Starting from an initial quantum state \( \ket{\psi(0)}\) , the quantum state can evolve in time, which gives a single parameter family of quantum states denoted by \( \{\ket{\psi(t)}\}\) . These quantum states are related to each other via a quantum evolution operator \( U\) :
\( \ket{\psi(t_2)}=U(t_2,t_1)\ket{\psi(t_1)} \)   (15)
where \( U(t_2,t_1)\) is unitary for any given \( t_1,t_2\) . Here \( t_2>t_1\) refers to quantum evolution forward in time, \( t_2<t_1\) refers to quantum evolution backward in time, and \( U(t_1,t_1)=I\) for any \( t_1\) .
The quantum evolution under a time-independent Hamiltonian \( H\) satisfies the Schrödinger equation
\( \mathrm{i}\frac{\mathrm{d}}{\mathrm{d}t}\ket{\psi(t)}=H\ket{\psi(t)} \)   (16)
Here \( H=H^{†}\) is a Hermitian matrix. The corresponding time evolution operator is
\( U(t_2,t_1)=e^{-\mathrm{i}H(t_2-t_1)} \)   (17)
In particular, \( U(t_2,t_1)=U(t_2-t_1,0)\) .
On the other hand, for any unitary matrix \( U\) , we can always find a Hermitian matrix \( H\) such that \( U=e^{\mathrm{i} H}\) ((1)).
Example 3
Let the Hamiltonian \( H\) be the Pauli-X gate. Then
\( U(t,0)=e^{-\mathrm{i}Xt}=\cos(t)\,I-\mathrm{i}\sin(t)\,X \)   (18)
Starting from an initial state \( \ket{\psi(0)}=\ket{0}\) , after time \( t=\pi/2\) , the state evolves into \( \ket{\psi(\pi/2)}=-\mathrm{i} \ket{1}\) , i.e., the \( \ket{1}\) state (up to a global phase factor).
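This evolution can be checked numerically. The sketch below (numpy and the names are mine) implements \( e^{-\mathrm{i}Xt}=\cos(t)I-\mathrm{i}\sin(t)X\) , which holds because \( X^2=I\) , and verifies that \( \ket{0}\) evolves to \( -\mathrm{i}\ket{1}\) at \( t=\pi/2\) :

```python
import numpy as np

X = np.array([[0, 1], [1, 0]], dtype=complex)

def U(t):
    # e^{-iXt} = cos(t) I - i sin(t) X, valid since X^2 = I
    return np.cos(t) * np.eye(2) - 1j * np.sin(t) * X

ket0 = np.array([1, 0], dtype=complex)
psi = U(np.pi / 2) @ ket0
# psi = -i|1>, i.e. the |1> state up to a global phase
assert np.allclose(psi, [0, -1j])
```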
Without loss of generality, we only discuss a special type of quantum measurements called the projective measurement. For more general types of quantum measurements, see [1, Section 2.2.3]. All quantum measurements expressed as a positive operator-valued measure (POVM) can be expressed in terms of projective measurements in an enlarged Hilbert space via the Naimark dilation theorem.
In a finite dimensional setting, a quantum observable always corresponds to a Hermitian matrix \( M\) , which has the spectral decomposition
\( M=\sum_m \lambda_m P_m \)   (19)
Here \( \lambda_m\in\mathbb{R}\) are the eigenvalues of \( M\) , and \( P_m\) is the projection operator onto the eigenspace associated with \( \lambda_m\) , i.e., \( P_m^2=P_m\) .
When a quantum state \( \ket{\psi}\) is measured by a quantum observable \( M\) , the outcome of the measurement is always an eigenvalue \( \lambda_m\) , with probability
\( p_m=\braket{\psi|P_m|\psi} \)   (20)
After the measurement, the quantum state becomes
\( \frac{P_m\ket{\psi}}{\sqrt{p_m}} \)   (21)
Note that this is not a unitary process!
In order to evaluate the expectation value of a quantum observable \( M\) , we first use the resolution of identity:
\( I=\sum_m P_m \)   (22)
This implies the normalization condition,
\( \sum_m p_m=\sum_m \braket{\psi|P_m|\psi}=\braket{\psi|\psi}=1 \)   (23)
Together with \( p_m\ge 0\) , we find that \( \{p_m\}\) is indeed a probability distribution.
The expectation value of the measurement outcome is
\( \braket{M}:=\sum_m \lambda_m p_m=\sum_m \lambda_m\braket{\psi|P_m|\psi}=\braket{\psi|M|\psi} \)   (24)
Example 4
Again let \( M=X\) . From the spectral decomposition of \( X\) :
(25)
where \( \ket{\pm}:=\frac{1}{\sqrt{2}}(\ket{0}\pm\ket{1}), \lambda_{\pm}=\pm 1\) , we obtain the eigendecomposition
\( X=\lambda_+\ket{+}\bra{+}+\lambda_-\ket{-}\bra{-} \)   (26)
Consider a quantum state \( \ket{\psi}=\ket{0}=\frac{1}{\sqrt{2}}(\ket{+}+\ket{-})\) , then
\( p_+=\left\lvert \braket{+|\psi}\right\rvert ^2=\frac12,\qquad p_-=\left\lvert \braket{-|\psi}\right\rvert ^2=\frac12 \)   (27)
Therefore the expectation value of the measurement is \( \braket{\psi|M|\psi}=0.\)
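The same computation can be reproduced numerically. The sketch below (my own numpy illustration) diagonalizes \( X\) , forms the outcome probabilities \( p_m=\lvert\braket{v_m|\psi}\rvert^2\) for the state \( \ket{0}\) , and checks that the expectation value vanishes:

```python
import numpy as np

X = np.array([[0, 1], [1, 0]], dtype=complex)
lam, V = np.linalg.eigh(X)             # eigenvalues [-1, +1], eigenvectors |->, |+>

psi = np.array([1, 0], dtype=complex)  # the state |0>
# Outcome probabilities: p_m = |<v_m|psi>|^2 (the projectors are rank-1 here)
p = np.abs(V.conj().T @ psi) ** 2
assert np.allclose(p, [0.5, 0.5])      # each outcome occurs w.p. 1/2

# Expectation value: sum_m lambda_m p_m = <psi|X|psi> = 0
assert np.isclose(lam @ p, 0.0)
assert np.isclose((psi.conj() @ X @ psi).real, 0.0)
```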
For a quantum system consisting of \( m\) components with state spaces \( \{\mathcal{H}_i\}_{i=0}^{m-1}\) , the state space of the composite system is their tensor product, denoted by \( \mathcal{H}=\otimes_{i=0}^{m-1} \mathcal{H}_i\) . Let \( \ket{\psi_i}\) be a state vector in \( \mathcal{H}_i\) , then
\( \ket{\psi}=\ket{\psi_0}\otimes\ket{\psi_1}\otimes\cdots\otimes\ket{\psi_{m-1}} \)   (28)
in \( \mathcal{H}\) . However, not all quantum states in \( \mathcal{H}\) can be written in the tensor product form above. Let \( \{\ket{e^{(i)}_j}\}_{j\in [N_i]}\) be the basis of \( \mathcal{H}_i\) , then a general state vector in \( \mathcal{H}\) takes the form
\( \ket{\psi}=\sum_{j_0\in[N_0],\ldots,j_{m-1}\in[N_{m-1}]}\psi_{j_0\ldots j_{m-1}}\ket{e^{(0)}_{j_0}}\otimes\cdots\otimes\ket{e^{(m-1)}_{j_{m-1}}} \)   (29)
Here \( \psi_{j_0… j_{m-1}}\in\mathbb{C}\) is an entry of an \( m\) -way tensor, and the dimension of \( \mathcal{H}\) is therefore \( \prod_{i\in[m]} N_i\) .
The state space of \( n\) -qubits is \( \mathcal{H}=(\mathbb{C}^2)^{\otimes n}\cong \mathbb{C}^{2^n}\) , rather than \( \mathbb{C}^{2n}\) . We also use the notation
\( \ket{x_0}\otimes\ket{x_1}\otimes\cdots\otimes\ket{x_{n-1}}\equiv\ket{x_0x_1\cdots x_{n-1}}\equiv\ket{x},\quad x_i\in\{0,1\} \)   (30)
Furthermore, \( x\in\{0,1\}^n\) is called a classical bit-string, and \( \{\ket{x}|x\in\{0,1\}^n\}\) is called the computational basis of \( \mathbb{C}^{2^n}\) .
Example 5 (Two qubit system)
The state space is \( \mathcal{H}=(\mathbb{C}^2)^{\otimes 2}\cong \mathbb{C}^{4}\) . The standard basis is (row-major order, i.e., last index is the fastest changing one)
\( \ket{00}=\begin{pmatrix}1\\0\\0\\0\end{pmatrix},\quad \ket{01}=\begin{pmatrix}0\\1\\0\\0\end{pmatrix},\quad \ket{10}=\begin{pmatrix}0\\0\\1\\0\end{pmatrix},\quad \ket{11}=\begin{pmatrix}0\\0\\0\\1\end{pmatrix} \)   (31)
The Bell state (also called the EPR pair) is defined to be
\( \frac{1}{\sqrt{2}}\left(\ket{00}+\ket{11}\right) \)   (32)
which cannot be written as any product state \( \ket{a}\otimes\ket{b}\) ((4)).
There are many important quantum operators on the two-qubit quantum system. One of them is the CNOT gate, with matrix representation
\( \operatorname{CNOT}=\begin{pmatrix}1&0&0&0\\0&1&0&0\\0&0&0&1\\0&0&1&0\end{pmatrix} \)   (33)
In other words, when acting on the standard basis, we have
\( \ket{00}\mapsto\ket{00},\quad \ket{01}\mapsto\ket{01},\quad \ket{10}\mapsto\ket{11},\quad \ket{11}\mapsto\ket{10} \)   (34)
This can be compactly written as
\( \ket{a,b}\mapsto\ket{a,a\oplus b},\quad a,b\in\{0,1\} \)   (35)
Here \( a\oplus b=(a+b)\mod 2\) is the “exclusive or” (XOR) operation.
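The truth-table action of CNOT can be verified directly from its matrix. The sketch below (numpy and the helper `ket` are my own) checks \( \operatorname{CNOT}\ket{a,b}=\ket{a,a\oplus b}\) for all four computational basis states:

```python
import numpy as np

# CNOT in the computational basis |ab>, row-major ordering
CNOT = np.array([[1, 0, 0, 0],
                 [0, 1, 0, 0],
                 [0, 0, 0, 1],
                 [0, 0, 1, 0]], dtype=complex)

def ket(a, b):
    # Basis vector |ab> with index 2a + b (last index fastest)
    v = np.zeros(4, dtype=complex)
    v[2 * a + b] = 1.0
    return v

# CNOT |a, b> = |a, a XOR b>
for a in (0, 1):
    for b in (0, 1):
        assert np.allclose(CNOT @ ket(a, b), ket(a, a ^ b))
```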
Example 6 (Multi-qubit Pauli operators)
For an \( n\) -qubit quantum system, the Pauli operator acting on the \( i\) -th qubit is denoted by \( P_i\) (\( P=X,Y,Z\) ). For instance
(36)
So far all quantum states encountered can be described by a single state vector \( \ket{\psi}\in\mathcal{H}\) , called a pure state. More generally, if a quantum system is in one of a number of states \( \ket{\psi_i}\) with respective probabilities \( p_i\) , then \( \{p_i, \ket{\psi_i}\}\) is called an ensemble of pure states. The density operator of the quantum system is
\( \rho=\sum_i p_i\ket{\psi_i}\bra{\psi_i} \)   (37)
For a pure state \( \ket{\psi}\) , we have
\( \rho=\ket{\psi}\bra{\psi} \)   (38)
which is a rank-\( 1\) matrix.
Consider a quantum observable in (19) associated with the projectors \( \{P_m\}\) . For a pure state, it can be verified that the probability of obtaining \( \lambda_m\) and the expectation value of the measurement are, respectively,
\( p_m=\operatorname{Tr}[\rho P_m],\qquad \braket{M}=\operatorname{Tr}[\rho M] \)   (39)
The expression (39) also holds for general density operators \( \rho\) .
An operator \( \rho\) is the density operator associated to some ensemble \( \{p_i, \ket{\psi_i}\}\) if and only if (1) \( \operatorname{Tr} \rho=1\) and (2) \( \rho \succeq 0\) , i.e., \( \rho\) is a positive semidefinite matrix (also called a positive operator). All postulates in (2.1) can be stated in terms of density operators (see [1, Section 2.4.2]). Note that a pure state satisfies \( \rho^2=\rho\) . In general we have \( \rho^2\preceq \rho\) . If \( \rho^2\prec \rho\) , then \( \rho\) is called (the density operator of) a mixed state. Furthermore, any convex combination of density operators is again a density operator.
A quantum operator \( U\) that transforms \( \ket{\psi}\) to \( U\ket{\psi}\) also transforms the density operator according to
\( \rho\mapsto U\rho U^{†} \)   (40)
However, not all quantum operations on density operators need to be unitary! See [1, Section 8.2] for more general discussions on quantum operations.
Most of the discussions in this course will be restricted to pure states, and unitary quantum operations. Even in this restricted setting, the density operator formalism can still be convenient, particularly for describing a subsystem of a composite quantum system. Consider a quantum system of \( (n+m)\) -qubits, partitioned into a subsystem \( A\) with \( n\) qubits (the state space is \( \mathcal{H}_A=\mathbb{C}^{2^n}\) ) and a subsystem \( B\) with \( m\) qubits (the state space is \( \mathcal{H}_B=\mathbb{C}^{2^m}\) ) respectively. The quantum state is a pure state \( \ket{\psi}\in \mathbb{C}^{2^{n+m}}\) with density operator \( \rho_{AB}\) . Let \( \ket{a_1},\ket{a_2}\) be two state vectors in \( \mathcal{H}_A\) , and \( \ket{b_1},\ket{b_2}\) be two state vectors in \( \mathcal{H}_B\) . Then the partial trace over system \( B\) is defined as
\( \operatorname{Tr}_B\left[\left(\ket{a_1}\bra{a_2}\right)\otimes\left(\ket{b_1}\bra{b_2}\right)\right]:=\ket{a_1}\bra{a_2}\operatorname{Tr}\left[\ket{b_1}\bra{b_2}\right]=\braket{b_2|b_1}\,\ket{a_1}\bra{a_2} \)   (41)
Since we can expand the density operator \( \rho_{AB}\) in terms of the basis of \( \mathcal{H}_A,\mathcal{H}_B\) , the definition of (41) can be extended to define the reduced density operator for the subsystem \( A\)
\( \rho_A:=\operatorname{Tr}_B[\rho_{AB}] \)   (42)
The reduced density operator for the subsystem \( B\) can be similarly defined. The reduced density operators \( \rho_A,\rho_B\) are generally mixed states.
Example 7 (Reduced density operator of tensor product states)
If \( \rho_{AB}=\rho_1\otimes \rho_2\) , then
\( \rho_A=\operatorname{Tr}_B[\rho_1\otimes\rho_2]=\rho_1\operatorname{Tr}[\rho_2]=\rho_1 \)   (43)
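The partial trace is straightforward to implement numerically. The sketch below (numpy and the function name are my own) traces out subsystem \( B\) by reshaping the density matrix into a 4-way tensor and contracting the \( B\) indices; it checks the product-state case above, and also that the reduced state of a Bell state is maximally mixed:

```python
import numpy as np

def partial_trace_B(rho_AB, dA, dB):
    """Trace out subsystem B of a density operator on C^{dA} (x) C^{dB}."""
    r = rho_AB.reshape(dA, dB, dA, dB)   # indices (i, j; k, l), j and l on B
    return np.einsum('ijkj->ik', r)      # sum over j = l

# Product state: rho_AB = rho1 (x) rho2 with Tr(rho2) = 1 gives rho_A = rho1
rho1 = np.array([[0.7, 0.2], [0.2, 0.3]], dtype=complex)
rho2 = np.array([[0.5, 0.0], [0.0, 0.5]], dtype=complex)
assert np.allclose(partial_trace_B(np.kron(rho1, rho2), 2, 2), rho1)

# Bell state: the reduced density operator is the maximally mixed state I/2
bell = np.array([1, 0, 0, 1], dtype=complex) / np.sqrt(2)
rho_bell = np.outer(bell, bell.conj())
assert np.allclose(partial_trace_B(rho_bell, 2, 2), np.eye(2) / 2)
```

The Bell-state check also illustrates the remark above: even though \( \rho_{AB}\) is pure, the reduced density operator \( \rho_A\) is mixed.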
If a quantum observable is defined only on the subsystem \( A\) , i.e., \( M=M_A\otimes I\) where \( M_A\) has the decomposition (19), then the probability of obtaining \( \lambda_m\) and the expectation value are, respectively,
\( p_m=\operatorname{Tr}[\rho_A P_m],\qquad \braket{M}=\operatorname{Tr}[\rho_A M_A] \)   (44)
Nearly all quantum algorithms operate on multi-qubit quantum systems. When quantum operators operate on two or more qubits, writing down quantum states in terms of their components as in (29) quickly becomes cumbersome. The quantum circuit language offers a graphical and compact manner of writing down the procedure of applying a sequence of quantum operators to a quantum state. For more details see [1, Section 4.2, 4.3].
In the quantum circuit language, time flows from left to right: the input quantum state appears on the left, quantum operators are applied in sequence towards the right, and each “wire” represents a qubit, i.e.,
Here are a few examples:
which is a graphical way of writing
(45)
The relation between these states can be expressed in terms of the following diagram
Also verify that
which is a graphical way of writing
(46)
Note that the input state can be general, and in particular does not need to be a product state. For example, if the input is a Bell state (32), we just apply the quantum operator to \( \ket{00}\) and \( \ket{11}\) respectively, multiply the results by \( 1/\sqrt{2}\) , and add them together. To distinguish them from other symbols, these single qubit gates may be written either as \( X,Y,Z,H\) or (using the roman font) as \( \mathrm{X,Y,Z,H}\) .
The quantum circuit for the CNOT gate is
Here the “dot” means that the quantum gate connected to the dot only becomes active if the state of the qubit \( 0\) (called the control qubit) is \( a=1\) . This justifies the name of the CNOT gate (controlled NOT).
Similarly,
is the controlled \( U\) gate for some unitary \( U\) . Here \( U^a=I\) if \( a=0\) . The CNOT gate can be obtained by setting \( U=X\) .
Another commonly used two-qubit gate is the SWAP gate, which swaps the state in the \( 0\) -th and the \( 1\) -st qubits.
Quantum operators applied to multiple qubits can be written in a similar manner:
For a multi-qubit quantum circuit, unless stated otherwise, the first qubit will be referred to as qubit 0, the second qubit as qubit 1, etc.
When the context is clear, we may also use a more compact notation for the multi-qubit quantum operators:
One useful multiple qubit gate is the Toffoli gate (or controlled-controlled-NOT, CCNOT gate).
We may also want to apply a \( n\) -qubit unitary \( U\) only when certain conditions are met
where the empty circle means that the gate being controlled only becomes active when the value of the control qubit is \( 0\) . This can be used to write down the quantum “if” statements, i.e., when the qubits \( 0,1\) are at the \( \ket{1}\) state and the qubit \( 2\) is at the \( \ket{0}\) state, then apply \( U\) to \( \ket{x}\) .
A set of qubits is often called a register (or quantum variable). For example, in the picture above, the main quantum state of interest (an \( n\) qubit quantum state \( \ket{x}\) ) is called the system register. The first \( 3\) qubits can be called the control register.
One of the most striking early results of quantum computation is the no-cloning theorem (by Wootters and Zurek, as well as Dieks in 1982), which forbids generic quantum copy operations (see also [1, Section 12.1]). Both the no-cloning theorem and the closely related no-deleting theorem are consequences of the linearity of quantum mechanics.
Assume there is a unitary operator \( U\) that acts as the copy operations, i.e.,
\( U\ket{x}\ket{s}=\ket{x}\ket{x} \)   (47)
for an arbitrary (black-box) state \( \ket{x}\) , and a chosen target state \( \ket{s}\) (e.g. \( \ket{0^n}\) ). Then taking two states \( \ket{x_1},\ket{x_2}\) , we have
\( U\ket{x_1}\ket{s}=\ket{x_1}\ket{x_1},\qquad U\ket{x_2}\ket{s}=\ket{x_2}\ket{x_2} \)   (48)
Taking the inner product of the two equations, we have
\( \braket{x_1|x_2}=\left(\braket{x_1|x_2}\right)^2 \)   (49)
which implies \( \braket{x_1|x_2}=0\) or \( 1\) . When \( \braket{x_1|x_2}=1\) , \( \ket{x_1},\ket{x_2}\) refer to the same physical state. Therefore a cloning operator \( U\) can at most copy states which are orthogonal to each other, and a general quantum copy operation is impossible.
Given the ubiquity of the copy operation in scientific computing, such as the assignment \( y=x\) , the no-cloning theorem has profound implications. For instance, all classical iterative algorithms for solving linear systems require storing some intermediate variables. On a quantum computer, this operation is generally not possible, or at least cannot be performed efficiently.
There are two notable exceptions to the range of applications of the no-cloning theorem. The first is when we know how a quantum state is prepared, i.e., \( \ket{x}=U_x\ket{s}\) for a known unitary \( U_x\) and some known state \( \ket{s}\) . Then we can of course copy this specific vector \( \ket{x}\) via
\( (I\otimes U_x)\ket{x}\ket{s}=\ket{x}\ket{x} \)   (50)
The second is the copying of classical information. This is an application of the CNOT gate.
i.e.,
\( \operatorname{CNOT}\ket{a}\ket{0}=\ket{a}\ket{a},\quad a\in\{0,1\} \)   (51)
The same principle applies to copying classical information from multiple qubits. (1) gives an example of copying the classical information stored in 3 bits.
In general, a multi-qubit CNOT operation can be used to perform the classical copying operation in the computational basis. Note that in the circuit model, this can be implemented with a depth-\( 1\) circuit, since the individual CNOTs all act on disjoint pairs of qubits.
Example 8
Let us verify that the CNOT gate does not violate the no-cloning theorem, i.e., it cannot be used to copy a superposition of classical bits \( \ket{x}=a\ket{0}+b\ket{1}\) . Direct calculation shows
\( \operatorname{CNOT}\left[(a\ket{0}+b\ket{1})\otimes\ket{0}\right]=a\ket{00}+b\ket{11} \)   (52)
In particular, if \( \ket{x}=\ket{+}\) , then CNOT creates a Bell state.
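This calculation can be checked numerically. The sketch below (numpy and the names are my own) applies CNOT to \( \ket{+}\otimes\ket{0}\) and confirms that the result is the Bell state, not the product state \( \ket{+}\otimes\ket{+}\) that a true cloner would produce:

```python
import numpy as np

CNOT = np.array([[1, 0, 0, 0],
                 [0, 1, 0, 0],
                 [0, 0, 0, 1],
                 [0, 0, 1, 0]], dtype=complex)

a, b = 1 / np.sqrt(2), 1 / np.sqrt(2)     # |x> = |+>
x = np.array([a, b], dtype=complex)
ket0 = np.array([1, 0], dtype=complex)

out = CNOT @ np.kron(x, ket0)             # a|00> + b|11>: a Bell state
clone = np.kron(x, x)                     # what a true copier would produce

assert np.allclose(out, [a, 0, 0, b])     # entangled output
assert not np.allclose(out, clone)        # CNOT did NOT clone |+>
```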
The quantum no-cloning theorem implies that there does not exist a unitary \( U\) that performs the deleting operation, which resets a black-box state \( \ket{x}\) to \( \ket{0^n}\) . This is because such a deleting unitary can be viewed as a copying operation
(53)
Then taking \( \ket{x_1},\ket{x_2}\) that are orthogonal to each other, applying the deleting gate, and computing the inner products, we obtain
(54)
which is a contradiction.
A more common way to express the no-deleting theorem is in terms of the time reversed dual of the no-cloning theorem: in general, given two copies of some arbitrary quantum state, it is impossible to delete one of the copies. More specifically, there is no unitary \( U\) performing the following operation using known states \( \ket{s},\ket{s'}\) ,
(55)
for an arbitrary unknown state \( \ket{x}\) ((7)).
The quantum measurement applied to any qubit, by default, measures the outcome in the computational basis. For example,
outputs \( 0\) or \( 1\) each w.p. \( 1/2\) . We may also measure some of the qubits in a multi-qubit system.
There are two important principles related to quantum measurements: the principle of deferred measurement, and the principle of implicit measurement. At a first glance, both principles may seem to be counterintuitive.
The principle of deferred measurement states that measurement operations can always be moved from an intermediate stage of a quantum circuit to the end of the circuit. Even if a measurement is performed as an intermediate step, and its result is used to classically control subsequent quantum gates, such classical controls can always be replaced by quantum controls, with the measurement postponed until the end of the circuit.
Example 9 (Deferring quantum measurements)
Consider the circuit
Here the double line denotes the classical control operation. The outcome is that qubit \( 0\) has probability \( 1/2\) of outputting \( 0\) , and the qubit \( 1\) is at state \( \ket{0}\) . Qubit \( 0\) also has probability \( 1/2\) of outputting \( 1\) , and the qubit \( 1\) is at state \( \ket{1}\) .
However, we may replace the classical control operation after the measurement by a quantum controlled \( X\) (which is CNOT), and measure the qubit \( 0\) afterwards:
It can be verified that the result is the same. Note that CNOT acts as the classical copying operation. So qubit \( 1\) really stores the classical information (i.e., in the computational basis) of qubit \( 0\) .
Example 10 (Deferred measurement requires extra qubits)
The procedure of deferring quantum measurements using CNOTs is general, and important. Consider the following circuit:
The probability of obtaining \( 0,1\) is \( 1/2\) , respectively. However, if we simply “defer” the measurement to the end by removing the intermediate measurement, we obtain
The result of the measurement is deterministically \( 0\) ! The correct way of deferring the intermediate quantum measurement is to introduce another qubit
Measuring the qubit \( 0\) , we obtain \( 0\) or \( 1\) w.p. \( 1/2\) , respectively. Hence when deferring quantum measurements, it is necessary to store the intermediate information in extra (ancilla) qubits, even if such information is not used afterwards.
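The statistics in this example can be reproduced numerically. Assuming the circuit here applies \( \mathrm{H}\) , measures, and then applies \( \mathrm{H}\) again (consistent with the probabilities stated above; the diagrams themselves are my reading of the text), the sketch below contrasts the naive deferral with the correct CNOT-based deferral:

```python
import numpy as np

H = np.array([[1, 1], [1, -1]], dtype=complex) / np.sqrt(2)
I2 = np.eye(2, dtype=complex)
CNOT = np.array([[1, 0, 0, 0],
                 [0, 1, 0, 0],
                 [0, 0, 0, 1],
                 [0, 0, 1, 0]], dtype=complex)
ket0 = np.array([1, 0], dtype=complex)

# Naive "deferral": drop the intermediate measurement. H H |0> = |0>,
# so the outcome is deterministically 0.
naive = H @ H @ ket0
assert np.allclose(np.abs(naive) ** 2, [1, 0])

# Correct deferral: record the intermediate result in an ancilla via CNOT.
psi = np.kron(H @ ket0, ket0)      # (H|0>) (x) |0> on qubits 0, 1
psi = CNOT @ psi                   # copy qubit 0 (in the comp. basis) to qubit 1
psi = np.kron(H, I2) @ psi         # second H on qubit 0

# Marginal distribution of qubit 0: sum |amplitude|^2 over the ancilla
p_qubit0 = (np.abs(psi.reshape(2, 2)) ** 2).sum(axis=1)
assert np.allclose(p_qubit0, [0.5, 0.5])
```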
The principle of implicit measurements states that at the end of a quantum circuit, any unmeasured qubit may be assumed to be measured. More specifically, assume the quantum system consists of two subsystems \( A\) and \( B\) . If the qubits in \( A\) are to be measured at the end of the circuit, the results of the measurements do not depend on whether the qubits in \( B\) are measured or not. Recall from (44) that a measurement on the subsystem \( A\) only depends on the reduced density matrix \( \rho_A\) . So we only need to show that \( \rho_A\) does not depend on the measurement in \( B\) . To see why this is the case, let \( \{P_i\}\) be the projectors onto the computational basis of \( B\) . Before the measurement, the density operator is \( \rho\) . If we measure the subsystem \( B\) , the resulting density operator is transformed into
\( \rho'=\sum_i (I_A\otimes P_i)\,\rho\,(I_A\otimes P_i) \)   (56)
Then it can be verified that
\( \operatorname{Tr}_B[\rho']=\operatorname{Tr}_B[\rho]=\rho_A \)   (57)
This proves the principle of implicit measurements.
By definition, the output of all quantum algorithms must be obtained through measurements, and hence the measurement outcome is probabilistic in general. If the goal is to compute the expectation value of a quantum observable \( M_A\) acting on a subsystem \( A\) , then its variance is
\( \operatorname{Var}(M_A)=\operatorname{Tr}[\rho_A M_A^2]-\left(\operatorname{Tr}[\rho_A M_A]\right)^{2} \)   (58)
The number of samples \( \mathcal{N}\) needed to estimate \( \operatorname{Tr}[\rho_A M_A]\) to additive precision \( \epsilon\) satisfies
\( \mathcal{N}\sim \frac{\operatorname{Var}(M_A)}{\epsilon^{2}} \)   (59)
which only depends on \( \rho_A\) .
Example 11 (Estimating success probability on one qubit)
Let \( A\) be the single qubit to be measured in the computational basis, and we are interested in the accuracy in estimating the success probability of obtaining \( 1\) , i.e., \( p\) . This can be realized as an expectation value with \( M_A=\ket{1}\bra{1}\) , and \( p=\operatorname{Tr}[\rho_A M_A]\) . Note that \( M_A^2=M_A\) , then
\( \operatorname{Var}(M_A)=\operatorname{Tr}[\rho_A M_A]-\left(\operatorname{Tr}[\rho_A M_A]\right)^{2}=p(1-p) \)   (60)
Hence to estimate \( p\) to additive error \( \epsilon\) , the number of samples needed satisfies
\( \mathcal{N}\sim \frac{p(1-p)}{\epsilon^{2}} \)   (61)
Note that if \( p\) is close to \( 0\) or \( 1\) , the number of samples needed is also very small: indeed, the outcome of the measurement becomes increasingly deterministic in this case!
If we are interested in estimating \( p\) to multiplicative accuracy \( \epsilon\) , then the number of samples is
\( \mathcal{N}\sim \frac{p(1-p)}{(p\epsilon)^{2}}=\frac{1-p}{p\,\epsilon^{2}} \)   (62)
and the task becomes increasingly more difficult when \( p\) approaches \( 0\) .
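A small Monte Carlo sanity check of the additive-error scaling (my own illustration; the constants and the seed are arbitrary): drawing \( \mathcal{N}\approx p(1-p)/\epsilon^2\) Bernoulli samples makes the standard deviation of the empirical estimate comparable to \( \epsilon\) .

```python
import numpy as np

rng = np.random.default_rng(0)

p = 0.1                           # true success probability
eps = 0.01                        # target additive error
N = int(p * (1 - p) / eps**2)     # sample count suggested by the scaling above

samples = rng.random(N) < p       # N Bernoulli(p) trials
p_hat = samples.mean()

# Standard deviation of p_hat is sqrt(p(1-p)/N) = eps by construction;
# a 5-sigma window makes this assertion near-certain.
assert abs(p_hat - p) < 5 * eps
```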
If a quantum algorithm denoted by a unitary \( U\) can be decomposed into a series of simpler unitaries as \( U=U_{K}… U_1\) , and if we can implement each \( U_i\) to precision \( \epsilon\) , then what is the global error? We now introduce a simple technique connecting the local error with the global error. In the context of quantum computation, this is often referred to as the “hybrid argument”.
Proposition 1 (Hybrid argument)
Given unitaries \( U_1,\widetilde{U}_1,…, U_K,\widetilde{U}_K\in\mathbb{C}^{N\times N}\) satisfying
\( \left\lVert U_i-\widetilde{U}_i\right\rVert \le \epsilon,\quad i=1,\ldots,K, \)   (63)
we have
\( \left\lVert U_K\cdots U_1-\widetilde{U}_K\cdots\widetilde{U}_1\right\rVert \le K\epsilon \)   (64)
Proof
Use a telescoping series
\( U_K\cdots U_1-\widetilde{U}_K\cdots\widetilde{U}_1=\sum_{i=1}^{K}\widetilde{U}_K\cdots\widetilde{U}_{i+1}\left(U_i-\widetilde{U}_i\right)U_{i-1}\cdots U_1 \)   (65)
Since all \( U_i,\widetilde{U}_i\) are unitary matrices, we readily have
\( \left\lVert U_K\cdots U_1-\widetilde{U}_K\cdots\widetilde{U}_1\right\rVert \le \sum_{i=1}^{K}\left\lVert U_i-\widetilde{U}_i\right\rVert \le K\epsilon \)   (66)
In other words, if we can implement each local unitary to precision \( \epsilon\) , the global error grows at most linearly with respect to the number of gates and is bounded by \( K\epsilon\) . The telescoping series (65), as well as the hybrid argument can also be seen as a discrete analogue of the variation of constants method (also called Duhamel’s principle).
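The hybrid argument can be tested numerically. The sketch below (numpy; the construction of the perturbed unitaries is my own) builds \( K\) random unitaries, perturbs each to a nearby unitary within operator-norm distance \( \epsilon\) , and checks that the products differ by at most \( K\epsilon\) :

```python
import numpy as np

rng = np.random.default_rng(1)

def random_unitary(n):
    # Haar-like random unitary via QR with phase fix
    q, r = np.linalg.qr(rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n)))
    return q * (np.diag(r) / np.abs(np.diag(r)))

def opnorm(A):
    return np.linalg.norm(A, 2)   # operator (spectral) norm

Ndim, K, eps = 4, 5, 1e-3
Us, Uts = [], []
for _ in range(K):
    U = random_unitary(Ndim)
    # Perturb U to a nearby unitary Ut = U e^{i Ht} with ||Ht|| = eps/2,
    # so that ||U - Ut|| = ||I - e^{i Ht}|| <= ||Ht|| <= eps.
    Ht = rng.standard_normal((Ndim, Ndim)) + 1j * rng.standard_normal((Ndim, Ndim))
    Ht = (Ht + Ht.conj().T) / 2
    Ht *= eps / (2 * opnorm(Ht))
    w, v = np.linalg.eigh(Ht)
    Ut = U @ (v @ np.diag(np.exp(1j * w)) @ v.conj().T)
    assert opnorm(U - Ut) <= eps
    Us.append(U); Uts.append(Ut)

prod, prod_t = np.eye(Ndim), np.eye(Ndim)
for U, Ut in zip(Us, Uts):        # builds U_K ... U_1 and its perturbed version
    prod = U @ prod
    prod_t = Ut @ prod_t

# Hybrid argument: the global error is at most K * eps
assert opnorm(prod - prod_t) <= K * eps + 1e-12
```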
Proposition 2 (Duhamel’s principle for Hamiltonian simulation)
Let \( U(t),\widetilde{U}(t)\in \mathbb{C}^{N\times N}\) satisfy
\( \mathrm{i}\frac{\mathrm{d}}{\mathrm{d}t}U(t)=HU(t),\qquad \mathrm{i}\frac{\mathrm{d}}{\mathrm{d}t}\widetilde{U}(t)=H\widetilde{U}(t)+B(t),\qquad U(0)=\widetilde{U}(0)=I, \)   (67)
where \( H\in \mathbb{C}^{N\times N}\) is a Hermitian matrix, and \( B(t)\in \mathbb{C}^{N\times N}\) is an arbitrary matrix. Then
\( \widetilde{U}(t)=U(t)-\mathrm{i}\int_0^t U(t-s)B(s)\,\mathrm{d}s \)   (68)
and
\( \left\lVert \widetilde{U}(t)-U(t)\right\rVert \le \int_0^t \left\lVert B(s)\right\rVert \,\mathrm{d}s \)   (69)
Proof
Directly verify that (68) is the solution to the differential equation.
As a special case, consider \( B(t)=E(t)\widetilde{U}(t)\) , then (68) becomes
\( \widetilde{U}(t)=U(t)-\mathrm{i}\int_0^t U(t-s)E(s)\widetilde{U}(s)\,\mathrm{d}s \)   (70)
and
\( \left\lVert \widetilde{U}(t)-U(t)\right\rVert \le \int_0^t \left\lVert E(s)\right\rVert \,\mathrm{d}s \)   (71)
This is a direct analogue of the hybrid argument in the continuous setting.
In classical computation, there are many universal gate sets, in the sense that any classical gate can be represented as a combination of gates from the set. For example, the NAND gate (“Not AND”) alone forms a universal gate set [1, Section 3.1.2]. The NOR gate (“Not OR”) is also a universal gate set.
In the quantum setting, any unitary operator on \( n\) qubits can be implemented using \( 1\) - and \( 2\) -qubit gates [1, Section 4.5]. It is desirable to come up with a set of discrete universal gates, but this means that we need to give up the notion that the unitary \( U\) can be represented exactly. Instead, a set of quantum gates \( \mathcal{S}\) is universal if, given any unitary operator \( U\) and desired precision \( \epsilon\) , we can find \( U_1,…,U_m\in \mathcal{S}\) such that
\( \left\lVert U-U_m\cdots U_1\right\rVert \le \epsilon \)   (72)
Here \( \left\lVert A\right\rVert =\sup_{\braket{\psi|\psi}=1} \left\lVert A\ket{\psi}\right\rVert \) is the operator norm (also called the spectral norm) of \( A\) , and \( \left\lVert \ket{\psi}\right\rVert =\sqrt{\braket{\psi|\psi}}\) is the vector \( 2\) -norm. There are many possible choices of universal gate sets, e.g. \( \{H,T, \operatorname{CNOT} \}\) . Another universal gate set is \( \{H, \operatorname{Toffoli} \}\) , which only involves real numbers.
Are some universal gate sets better than others? The Solovay-Kitaev theorem states that all choices of universal gate sets are asymptotically equivalent (see e.g. [8, Chapter 2]):
Theorem 1 (Solovay-Kitaev)
Let \( \mathcal{S},\mathcal{T}\) be two universal gate sets that are closed under inverses. Then any \( m\) -gate circuit using the gate set \( \mathcal{S}\) can be implemented to precision \( \epsilon\) using a circuit of \( \mathcal{O}(m\cdot \operatorname{poly}\log(m/\epsilon))\) gates from the gate set \( \mathcal{T}\) , and there is a classical algorithm for finding this circuit in time \( \mathcal{O}(m\cdot \operatorname{poly}\log(m/\epsilon))\) .
Another natural question concerns the computational power of quantum computers. Perhaps surprisingly, it is very difficult to prove that a quantum computer is more powerful than a classical computer. But is a quantum computer at least as powerful as a classical computer? The answer is yes! More specifically, any classical circuit can be asymptotically efficiently implemented using a quantum circuit.
The proof rests on the fact that classical universal gates can be efficiently simulated using quantum circuits. Note that this is not a straightforward process: NAND, as well as other classical gates (such as AND, OR, etc.), is not reversible! Hence the first step is to perform classical computation with reversible gates. More specifically, any irreversible classical gate \( x\mapsto f(x)\) can be made into a reversible classical gate
(73)
In particular, we have \( (x,0)\mapsto (x,f(x))\) computed in a reversible way. The key idea is to store all intermediate steps of the computation (see [1, Section 3.2.5] for more details).
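The construction (73) can be checked directly on bits. The sketch below (a minimal illustration; the helper names are ours) turns the irreversible AND gate into the reversible map \( (x,y)\mapsto (x, y\oplus f(x))\) and verifies that it is its own inverse.

```python
def reversible_f(f):
    """Turn an irreversible map x -> f(x) into the reversible map
    (x, y) -> (x, y XOR f(x)), as in (73)."""
    def g(x, y):
        return x, y ^ f(x)
    return g

AND = lambda xy: xy[0] & xy[1]          # irreversible: 2 bits in, 1 bit out
rev_and = reversible_f(AND)

# (x, 0) -> (x, f(x)): the function value is computed reversibly.
assert rev_and((1, 1), 0) == ((1, 1), 1)
assert rev_and((1, 0), 0) == ((1, 0), 0)

# The map is its own inverse: applying it twice restores (x, y).
x, y = (1, 1), 0
assert rev_and(*rev_and(x, y)) == (x, y)
```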
On a quantum computer, storing all intermediate computational steps indefinitely creates two problems: (1) a tremendous waste of quantum resources; (2) the intermediate results stored in extra qubits remain entangled with the quantum state of interest, so if the environment interferes with the intermediate results, the quantum state of interest is also affected.
Fortunately, both problems can be solved by a step called “uncomputation”. In order to implement a Boolean function \(f :\{0,1\}^{n} \rightarrow\{0,1\}^{m}\), we assume there is an oracle
(74)
where \(\ket{0^m}\) comes from an \(m\)-qubit output register. The oracle is often further implemented with the help of a working register (a.k.a. “garbage” register) such that
(75)
By the no-deleting theorem, there is no generic unitary operator that can reset a black-box state to \(\ket{0^w}\). In order to set the working register back to \(\ket{0^w}\) while keeping the input and output states, we introduce yet another \(m\)-qubit ancilla register initialized at \(\ket{0^m}\). Then we can apply an \( m\) -qubit CNOT controlled on the output register and obtain
(76)
It is important to remember that in the operation above, the multi-qubit CNOT gate only performs the classical copying operation in the computational basis, and does not violate the no-cloning theorem.
Recalling that \( U_f^{-1}=U_f^{†}\) , we have
(77)
Finally we apply an \( m\) -qubit SWAP operator on the ancilla and output registers to obtain
(78)
After this procedure, both the ancilla and the working register are returned to their initial states. They are no longer entangled with the input or output register, and can be reused for other purposes. This procedure is called uncomputation. The circuit is shown in (2).
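The sequence of steps (75)–(78) can be traced on computational basis states with a purely classical simulation. The sketch below uses a toy \( 1\) -bit function with garbage \( g(x)=x\) ; the oracle is built from XORs and hence happens to be self-inverse, which is a convenience of this toy example (in general one applies \( U_f^{\dagger}\) explicitly).

```python
# Classical (basis-state) walk-through of the uncomputation steps (75)-(78),
# for a toy 1-bit f with garbage g(x) = x. Register layout (names are ours):
# (input x, output b, working w, ancilla a).

def U_f(x, b, w, f):
    """Oracle of (75): computes f into the output register and leaves
    garbage in the working register. Built from XORs, so self-inverse."""
    return x, b ^ f(x), w ^ x

def uncompute(x, f):
    b, w, a = 0, 0, 0
    x, b, w = U_f(x, b, w, f)        # (75): (x, f(x), g(x), 0)
    a ^= b                           # (76): CNOT output -> ancilla
    x, b, w = U_f(x, b, w, f)        # (77): apply U_f^dag (self-inverse here)
    b, a = a, b                      # (78): SWAP output and ancilla registers
    return x, b, w, a

f = lambda x: 1 - x                  # toy function f(x) = NOT x
for x in (0, 1):
    # garbage and ancilla are cleared; input and output survive
    assert uncompute(x, f) == (x, f(x), 0, 0)
```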
Remark 1 (Discarding working registers)
After the uncomputation as shown in (2), the first two registers are unchanged before and after the application of the circuit (though they are changed during the intermediate steps). Therefore (2) effectively implements a unitary
(79)
or equivalently
(80)
In the definition of \( V_f\) , all working registers have been discarded (on paper). This allows us to simplify the notation and focus on the essence of the quantum algorithms under study.
Using the technique of uncomputation, if the map \( x\mapsto f(x)\) can be efficiently implemented on a classical computer, then we can implement this map efficiently on a quantum computer as well. To do this, we first turn it into a reversible map (73). All reversible single-bit and two-bit classical gates can be implemented using single-qubit and two-qubit quantum gates. So the reversible map can be made into a unitary operator
(81)
on a quantum computer. This proves that a quantum computer is at least as powerful as classical computers.
The unitary transformation \( U_f\) in (81) can be applied to any superposition of states in the computational basis, e.g.
(82)
This does not necessarily mean that we can efficiently implement the map \( \ket{x}\mapsto \ket{f(x)}\) . However, if \( f\) is a bijection, and we have access to a reversible circuit for computing \( f^{-1}\) , then we may use the technique of uncomputation to implement such a map (see (9)).
Let \([N]=\set{0,1,…,N-1}\). Any integer \(k\in[N]\) where \(N=2^n\) can be expressed as an \(n\)-bit string as \(k=k_{n-1}… k_0\) with \(k_i\in\{0,1\}\). This is called the binary representation of the integer \( k\) . It should be interpreted as
(83)
The number \( k\) divided by \( 2^m\) (\( 0\le m\le n\) ) can be written as (note that the decimal is shifted to be after \( k_m\) ):
(84)
The most common case is \( m=n\) , where
(85)
Sometimes we may also write \( a=0.k_{1}… k_{n}\) , so that \( k_i\) is the \( i\) -th binary digit of \( a\) . For a given real number \( 0\le a<1\) written as
(86)
the number \( (0.k_1… k_n)\) is called the \( n\) -bit fixed point representation of \( a\) . Therefore, to represent \( a\) to additive precision \( \epsilon\) , we need \( n=\lceil\log_2(1/\epsilon)\rceil\) qubits. If the sign of \( a\) is also important, we may reserve one extra bit to indicate its sign.
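For concreteness, the \( n\) -bit fixed point representation and its truncation error can be computed as follows (the function names are ours).

```python
def to_fixed_point(a, n):
    """n-bit fixed point representation of 0 <= a < 1: the bits k_1...k_n
    of a = sum_i k_i 2^{-i}, truncated after n bits."""
    bits = []
    for _ in range(n):
        a *= 2
        bit = int(a)       # next binary digit
        bits.append(bit)
        a -= bit
    return bits

def from_fixed_point(bits):
    return sum(b * 2.0 ** -(i + 1) for i, b in enumerate(bits))

a, n = 0.1, 10
bits = to_fixed_point(a, n)
# Truncation error is below 2^{-n}, so n = ceil(log2(1/eps)) bits suffice
# for additive precision eps.
assert abs(a - from_fixed_point(bits)) < 2 ** -n
```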
Together with the reversible computational model, we can perform classical arithmetic operations, such as \( (x,y)\mapsto x+y\) , \( (x,y)\mapsto xy\) , \( x\mapsto x^{\alpha}\) , \( x\mapsto \cos(x)\) etc. using reversible quantum circuits. The number of ancilla qubits and the number of elementary gates needed to implement such quantum circuits are both \( \mathcal{O}(\operatorname{poly}(n))\) (see [4, Chapter 6] for more details).
It is worth commenting that while quantum computers are theoretically at least as powerful as classical computers, there is a very significant overhead in implementing reversible classical circuits on quantum devices, both in terms of the number of ancilla qubits and the circuit depth.
All previous discussions assume that quantum operations can be performed perfectly. Due to the immense technical difficulty of realizing quantum computers, both quantum gates and quantum measurements may involve (significant) errors, particularly on near-term quantum devices. However, the threshold theorem states that if the noise in individual quantum gates is below a certain constant threshold (estimated around \( 10^{-4}\) or higher, depending on the error model), it is possible to efficiently perform an arbitrarily large quantum computation with any desired precision (see [1, Section 10.6]). This procedure requires quantum error correction protocols.
This course will not discuss any details on quantum error corrections. We always assume fault-tolerant protocols have been implemented, and all errors come from either approximation errors at the mathematical level, or Monte Carlo errors in the readout process due to the probabilistic nature of the measurement process.
Let \( n\) be the number of qubits needed to represent the input. A quantum algorithm is efficient if the number of gates in the quantum circuit is \( \mathcal{O}(\operatorname{poly}(n))\) . Due to the probabilistic nature of the measurement outcome, we are typically satisfied if a quantum algorithm can produce the correct answer with sufficiently high probability \( p\) . For a decision problem that asks for a binary answer \( 0\) or \( 1\) , we require \( p>2/3\) (or at least \( p > 1/2+1/\operatorname{poly}(n)\) ). For other problems for which we have an efficient procedure to check the correctness of the answer, we require \( p=\Omega(1)\) . Repeating this process many times and applying the Chernoff bound [1, Box 3.4], we can make the probability of outputting an incorrect answer vanishingly small.
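The effect of repetition can be made concrete for a decision problem: with per-run success probability \( p=2/3\) , taking the majority vote of \( 2t+1\) independent runs boosts the overall success probability, in line with the Chernoff bound. A short sketch (an exact binomial computation, no quantum content):

```python
from math import comb

def majority_success(p, t):
    """Probability that the majority vote of 2t+1 independent runs, each
    correct with probability p > 1/2, returns the correct answer."""
    n = 2 * t + 1
    return sum(comb(n, k) * p**k * (1 - p)**(n - k)
               for k in range(t + 1, n + 1))

p = 2 / 3
assert abs(majority_success(p, 0) - p) < 1e-12   # a single run succeeds w.p. p
# A modest number of repetitions already suppresses the failure probability,
# consistent with the exponential decay given by the Chernoff bound.
assert majority_success(p, 10) > 0.9
assert majority_success(p, 100) > 0.999
```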
In quantum algorithms, the computational cost is often measured in terms of the query complexity. Assume that we have access to a black-box unitary operator \( U_f\) (e.g. the one used in reversible computation), which is often called a quantum oracle. Our goal is to perform a given task using as few queries to \( U_f\) as possible.
Example 12 (Query access to a boolean function)
Let \( f:\{0,1\}^n\to\{0,1\}\) be a boolean function, which can be queried via the following unitary
(87)
This is called a phase kickback, i.e., the value of \( f(x)\) is returned as a phase factor. The phase kickback is an important tool in many quantum algorithms, e.g. Grover’s algorithm. Note that 1) \( U_f\) can be applied to a superposition of states in the computational basis, and 2) Having query access to \( f(x)\) does not mean that we know everything about \( f(x)\) , e.g. finding the set \( \set{x|f(x)=0}\) can still be a difficult task.
Example 13 (Partially specified quantum oracles)
When designing quantum algorithms, it is common that we are not interested in the behavior of the entire unitary matrix \( U_f\) , but only in \( U_f\) applied to certain vectors. For instance, for an \( (n+1)\) -qubit state space, we are only interested in
(88)
This means that we have only defined the first block-column of \( U_f\) as (remember that the row-major order is used)
(89)
Here \( A,B\) are \( N\times N\) matrices, and \( *\) stands for an arbitrary \( N\times N\) matrix so that \( U_f\) is unitary. Of course in order to implement \( U_f\) into quantum gates, we still need to specify the content of \( *\) . However, at the conceptual level, the partially specified unitary (88) simplifies the design process of quantum oracles.
The concept of query complexity hides the implementation details of \( U_f\) , and in some cases we can prove lower bounds on the number of queries to solve a certain problem, e.g. in the case of Grover’s search (proving a lower bound of the number of gates among all quantum algorithms can be much harder). Furthermore, once we have access to the number of elementary gates needed to implement \( U_f\) , we obtain immediately the gate complexity of the total algorithm. However, some queries can be (provably) difficult to implement, and then there can be a large gap between the query complexity and gate complexity. In order to obtain a meaningful query complexity analysis, one should also make sure that other components of the quantum algorithm will not end up dominating the total gate complexity, when all factors are taken into account.
Another important measure of the complexity is the circuit depth, i.e., the maximum number of gates along any path from an input to an output. Since quantum gates can be performed in parallel, the circuit depth is approximately equivalent to the concept of “wall-clock time” in classical computation, i.e., the real time needed for a quantum computer to carry out a certain task. Since quantum states can only be preserved for a short period of time (called the coherence time), the circuit depth also provides an approximate measure of whether the quantum algorithm exceeds the coherence limit of a given quantum computer. In many scenarios, the maximum coherence time is the most severe limiting factor. When possible, it is often desirable to reduce the circuit depth, even if it means that the quantum circuit needs to be carried out many more times.
Let us summarize the basic components of a typical quantum algorithm: the set of qubits can be separated into system registers (storing quantum states of interest) and ancilla registers (auxiliary registers needed to implement the unitary operation acting on system registers). Starting from an initial state, apply a series of one-/two-qubit gates, and perform measurements. Uncomputation should be performed whenever possible. Within the ancilla registers, if a register can be “freed” after the uncomputation, it is called a working register. Since working registers can be reused for other purposes, the cost of working registers is often not (explicitly) factored into the asymptotic cost analysis in the literature.
We use \( \|\cdot\|\) to denote the vector or matrix 2-norm: when \( v\) is a vector we denote by \( \|v\|\) its 2-norm, and when \( A\) is a matrix we denote by \( \|A\|\) its operator norm. Other matrix and vector norms will be introduced when needed. Unless otherwise specified, a vector \( v\in\mathbb{C}^N\) is an unnormalized vector, and a normalized vector (stored as a quantum state) is denoted by \( \ket{v}=v/\left\lVert v\right\rVert \) . A vector \( v\) can be expressed in terms of its \( j\) -th component as \( v=(v_j)\) or \( (v)_j=v_j\) . We use \( 0\) -based indexing, i.e., \( j=0,…,N-1\) or \( j\in [N]\) . When \( 1\) -based indexing is used, we will explicitly write \( j=1,…,N\) . We use the following asymptotic notations besides the usual \( \mathcal{O}\) (or “big-O”) notation: we write \( f=\Omega(g)\) if \( g=\mathcal{O}(f)\) ; \( f=\Theta(g)\) if \( f=\mathcal{O}(g)\) and \( g=\mathcal{O}(f)\) ; \( f=\widetilde{\mathcal{O}}(g)\) if \( f=\mathcal{O}(g\operatorname{polylog}(g))\) .
Exercise 1
Prove that any unitary matrix \( U\in \mathbb{C}^{N\times N}\) can be written as \( U=e^{\mathrm{i} H}\) , where \( H\) is a Hermitian matrix.
Exercise 2
Prove (26).
Exercise 3
Write down the matrix representation of the SWAP gate, as well as the \( \sqrt{ \operatorname{SWAP} }\) and \( \sqrt{ \operatorname{iSWAP} }\) gates.
Exercise 4
Prove that the Bell state (32) cannot be written as any product state \( \ket{a}\otimes\ket{b}\) .
Exercise 5
Prove (39) holds for a general mixed state \( \rho\) .
Exercise 6
Prove that an ensemble of admissible density operators is also a density operator.
Exercise 7
Prove the no-deleting theorem in (55).
Exercise 8
Work out the circuit for implementing (76) and (78).
Exercise 9
Prove that if \( f:\{0,1\}^n\to\{0,1\}^n\) is a bijection, and we have access to the inverse mapping \( f^{-1}\) , then the mapping \( U_f:\ket{x}\mapsto \ket{f(x)}\) can be implemented on a quantum computer.
Exercise 10
Prove (56) and (57).
Now we will introduce a few basic quantum algorithms, whose ideas and variants are present in numerous other quantum algorithms. Hence they are called “quantum primitives”. There is no official ruling on which quantum algorithms qualify as quantum primitives, but the membership of Grover’s algorithm, the quantum Fourier transform, quantum phase estimation, and Trotter-based Hamiltonian simulation should not be controversial. We first introduce Deutsch’s algorithm, which is arguably one of the simplest quantum algorithms carrying out a well-defined task.
Assume we have two boxes, each of them may contain either an apple or an orange. We would like to answer: whether the two boxes contain the same type of fruit (but do not need to answer whether it is apple or orange).
This seems to be a weird question. If the content of a box can only be checked by opening it, then to answer the question we would need to open two boxes and check what is inside. It is impossible to answer whether the fruit types are the same without knowing the types! Nevertheless, this is precisely the question to be addressed by Deutsch’s algorithm.
Mathematically, consider a boolean function \( f:\{0,1\}\to\{0,1\}\) . The question is whether \( f(0)=f(1)\) or \( f(0)\ne f(1)\) ? A quantum fruit-checker assumes the access to \( f\) via the following quantum oracle:
(90)
A classical fruit-checker can only query \( U_f\) in the computational basis as
(91)
After these two queries, we can measure qubit \( 1\) with a deterministic outcome, and answer whether \( f(0)=f(1)\) . However, a quantum fruit-checker can apply \( U_f\) to a linear combination of states in the computational basis.
Let us first check that \( U_f\) is unitary:
(92)
which gives \( U_f^{†}U_f=I\) .
The idea behind Deutsch’s algorithm is to convert the oracle (90) into a phase kickback in (87). Take \( \ket{y}=\ket{-}=\frac{1}{\sqrt{2}}(\ket{0}-\ket{1})\) . Then
(93)
Noting that \( \ket{y}=H X\ket{0}\) , (93) can also be interpreted as
(94)
The application of \( XH\) can be viewed as the step of uncomputation. Neglecting qubit 1, which does not change before and after the application, we can focus on qubit 0 only, which effectively defines a unitary
(95)
Hence the information of \( f(x)\) is stored as a phase factor (\( 0\) or \( \pi\) ). Recalling that the Hadamard gate gives \( H\ket{0}=\ket{+}\) and \( H\ket{1}=\ket{-}\) , the quantum circuit of Deutsch’s algorithm is
or in the commonly seen form in (3).
The answer is embedded in the measurement outcome of qubit \( 0\) . To verify this:
(96)
So if \( f(0)=f(1)\) , the final state is \( \pm\ket{0,-}\) , and measuring qubit \( 0\) returns \( 0\) deterministically (the global phase factor is irrelevant). Similarly, if \( f(0)\ne f(1)\) , the final state is \( \pm\ket{1,-}\) , and measuring qubit \( 0\) returns \( 1\) deterministically. In summary, only one query to \( U_f\) is sufficient to answer whether the two boxes contain the same type of fruit. The procedure is also rather counterintuitive: a classical fruit-checker as implemented in (91) naturally receives the information by measuring qubit 1. On the other hand, Deutsch’s algorithm only uses qubit 1 as a signal qubit, and all the information is retrieved by measuring qubit 0, which, at least from the classical perspective of (91), seems to contain no information at all!
Now we have seen that in terms of the query complexity, a quantum fruit-checker is clearly more efficient. However, it is a fair question to ask how to implement the oracle \( U_f\) , especially in a way that does not somehow already reveal the values of \( f(0),f(1)\) . We give the implementation of some cases of \( U_f\) in (14). In general, proving the query complexity alone may not be convincing enough that quantum computers are better than classical computers, and the gate complexity matters. This will not be the last time we hide away such “implementation details” of quantum oracles.
Remark 2 (Deutsch–Jozsa algorithm)
The single-qubit version of the Deutsch algorithm can be naturally generalized to the \( n\) -qubit version, called the Deutsch–Jozsa algorithm. Given \( N=2^n\) boxes with an apple or an orange in each box, and the promise that either 1) all boxes contain the same type of fruit or 2) exactly half of the boxes contain apples and the other half contain oranges, we would like to distinguish the two cases. Mathematically, given the promise that a boolean function \( f:\{0,1\}^n\to\{0,1\}\) is either a constant function (i.e., \( |\set{x|f(x)=0}|=0\) or \( 2^n\) ) or a balanced function (i.e., \( |\set{x|f(x)=0}|=2^{n-1}\) ), we would like to decide to which type \( f\) belongs. We refer to [1, Section 1.4.4] for more details.
Example 14 (Qiskit example of Deutsch’s algorithm)
For \( f(0)=f(1)=1\) (constant case), we can use \( U_f=I\otimes X\) . For \( f(0)=0,f(1)=1\) (balanced case), we can use \( U_f= \operatorname{CNOT} \) .
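The two choices of \( U_f\) in this example can be checked with a direct statevector simulation (plain numpy rather than Qiskit; the qubit ordering, with qubit 0 as the leading tensor factor, is a convention of this sketch):

```python
import numpy as np

H = np.array([[1, 1], [1, -1]]) / np.sqrt(2)
X = np.array([[0, 1], [1, 0]])
I2 = np.eye(2)
CNOT = np.array([[1, 0, 0, 0], [0, 1, 0, 0],
                 [0, 0, 0, 1], [0, 0, 1, 0]])  # control = qubit 0 (high bit)

def deutsch(U_f):
    """Run Deutsch's circuit; return P(measure qubit 0 as |0>)."""
    ket0 = np.array([1, 0])
    # qubit 0 in |+> = H|0>, qubit 1 in |-> = H X |0>
    psi = np.kron(H @ ket0, H @ X @ ket0)
    psi = U_f @ psi
    psi = np.kron(H, I2) @ psi          # final Hadamard on qubit 0
    return abs(psi[0])**2 + abs(psi[1])**2

# Constant case f(0)=f(1)=1: U_f = I (x) X, outcome 0 with certainty.
assert np.isclose(deutsch(np.kron(I2, X)), 1.0)
# Balanced case f(0)=0, f(1)=1: U_f = CNOT, outcome 1 with certainty.
assert np.isclose(deutsch(CNOT), 0.0)
```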
Assume we have \( N=2^{n}\) boxes, and we are given the promise that only one of the boxes contains an orange, and each of the remaining boxes contains an apple. The goal is to find the box that contains the orange.
Mathematically, given a boolean function \( f:\{0,1\}^n\to\{0,1\}\) and the promise that there exists a unique marked state \( x_0\) with \( f(x_0)=1\) , we would like to find \( x_0\) . This is called an unstructured search problem. Classically, there is no better method than opening up to \( (N-1)\) boxes in the worst case to determine \( x_0\) .
The quantum algorithm below, called Grover’s algorithm, relies on access to an oracle
(97)
and can find \( x_0\) using \( \mathcal{O}(\sqrt{N})\) queries. A classical computer again can only query \( U_f\) in the computational basis, and Grover’s algorithm achieves a quadratic speedup in terms of the query complexity.
The origin of the quadratic speedup can be summarized as follows: while classical probabilistic algorithms work with probability densities, quantum algorithms work with wavefunction amplitudes, of which the square gives the probability densities. More specifically, we start from a uniform superposition of all states as the initial state
(98)
This state can be prepared using Hadamard gates as
(99)
We would like to amplify the desired amplitude corresponding to \( \ket{x_0}\) from \( 1/\sqrt{N}\) to \( \sqrt{p}=\Omega(1)\) . We demonstrate below that this requires \( \mathcal{O}(\sqrt{N})\) queries to \( U_f\) . After this procedure, by measuring the final state in the computational basis, we obtain some output state \( \ket{x}\) . We can check whether \( x=x_0\) with one more query to \( U_f\) , using \( U_f \ket{x,0}=\ket{x,f(x)}\) . The probability of obtaining \( f(x)=1\) is \( p\) . If \( f(x)\ne 1\) , we repeat the process. Then after \( \mathcal{O}(1/p)\) repetitions, we can obtain \( x_0\) with high probability.
The first step of Grover’s algorithm is to turn the oracle (97) into a phase kickback. For this we take \( \ket{y}=\ket{-}\) , and for any \( x\in\{0,1\}^n\) ,
(100)
Any quantum state \( \ket{\psi}\) can be decomposed as
(101)
where \( \ket{\psi^{\perp}}\) is the component of \( \ket{\psi}\) orthogonal to \( \ket{x_0}\) , i.e., \( \braket{\psi^{\perp}|x_0}=0\) . We have
(102)
Here the minus sign is gained through the phase kickback. Discarding the \( \ket{-}\) which is unchanged by applying \( U_f\) , we obtain an \( n\) -qubit unitary
(103)
Therefore \( R_{x_0}\) is a reflection operator across the hyperplane orthogonal to \( \ket{x_0}\) , i.e., the Householder reflector
(104)
Let us write
(105)
with \( \theta=2\arcsin \frac{1}{\sqrt{N}}\approx \frac{2}{\sqrt{N}}\) , and \( \ket{\psi_0^{\perp}}=\frac{1}{\sqrt{N-1}}\sum_{x\ne x_0} \ket{x}\) is a normalized state orthogonal to \( \ket{x_0}\) . Then
(106)
So \( \operatorname{span} \{\ket{x_0},\ket{\psi_0^{\perp}}\}\) is an invariant subspace of \( R_{x_0}\) .
The next key step is to consider another Householder reflector with respect to \( \ket{\psi_0}\) . For later convenience we add a global phase factor \( -1\) (which is irrelevant to the physical outcome):
(107)
Direct computation shows
(108)
So defining \( G=R_{\psi_0}R_{x_0}\) as the product of the two reflection operators (called the Grover operator), we see that it amplifies the angle from \( \theta/2\) to \( 3\theta/2\) . The geometric picture is in fact even clearer in (4), and the conclusion can be observed without explicit computation.
Applying the Grover operator \( k\) times, we obtain
(109)
So for \( \sin((2k+1)\theta/2)\approx 1\) , we need \( k\approx \frac{\pi}{2\theta}-\frac{1}{2}\approx \frac{\sqrt{N}\pi}{4}\) . This proves that Grover’s algorithm can solve the unstructured search problem with \( \mathcal{O}(\sqrt{N})\) queries to \( U_f\) .
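The whole iteration is easy to verify numerically by building the two reflectors (104) and (107) as dense matrices (feasible for small \( N\) ; the choice \( N=16\) and the marked state below are arbitrary):

```python
import numpy as np

n = 4
N = 2**n
x0 = 3                                     # the marked state (arbitrary)

e = np.zeros(N)
e[x0] = 1
psi0 = np.ones(N) / np.sqrt(N)             # uniform superposition (98)

R_x0 = np.eye(N) - 2 * np.outer(e, e)      # reflection (104)
R_psi = 2 * np.outer(psi0, psi0) - np.eye(N)
G = R_psi @ R_x0                           # Grover operator

theta = 2 * np.arcsin(1 / np.sqrt(N))
k = round(np.pi / (2 * theta) - 0.5)       # ~ pi sqrt(N) / 4 iterations

psi = np.linalg.matrix_power(G, k) @ psi0
p_success = abs(psi[x0])**2
# With k = 3 iterations for N = 16, the marked amplitude is close to 1.
assert p_success > 0.9
```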
Another derivation of the Grover method is to focus on the operator, instead of the initial vector \( \ket{\psi_0}\) at each step of the calculation. In the orthonormal basis \( \mathcal{B}=\{\ket{x_0},\ket{\psi_0^{\perp}}\}\) , the matrix representation of the reflector \( R_{x_0}=I-2\ket{x_0}\bra{x_0}\) is
(110)
The matrix representation for the Grover diffusion operator \( R_{\psi_0}=2\ket{\psi_0}\bra{\psi_0}-I\) is
(111)
Here \( \sin(\theta/2)=a=1/\sqrt{N}\) . Therefore the matrix representation of the Grover iterate \( G=R_{\psi_0}R_{x_0}\) is
(112)
i.e., \( G\) is a rotation matrix restricted to the two-dimensional space \( \mathcal{H}= \operatorname{span} \mathcal{B}\) .
The initial vector satisfies
(113)
so Grover’s search can be applied as before, via \( G^k\) for \( k\approx \frac{\pi\sqrt{N}}{4}\) times.
To draw the quantum circuit of Grover’s algorithm, we need an implementation of \( R_{\psi_0}\) . Note that
(114)
This can be implemented via the following circuit using one ancilla qubit:
Here the controlled-NOT gate is an \( n\) -qubit controlled-\( X\) gate, and is only active if the system qubits are in the \( 0^n\) state. Discarding the signal qubit, we obtain an implementation of \( R_{\psi_0}\) . Since the signal qubit \( \ket{-}\) only changes up to a sign, it can be reused for both \( R_{\psi_0}\) and \( R_{x_0}\) .
The reflector \( R_{\psi_0}\) can also be implemented without using the ancilla qubit as follows (using a 3-qubit system as an example):
Remark 3 (Multiple marked states)
The Grover search algorithm can be naturally generalized to the case when there are \( M>1\) marked states. The query complexity is \( \mathcal{O}(\sqrt{N/M})\) .
Grover’s algorithm is not restricted to the problem of unstructured search. One immediate application is called amplitude amplification (AA), which is used ubiquitously as a subroutine to achieve quadratic speedups.
Let \(\ket{\psi_0}\) be prepared by an oracle \( U_{\psi_0}\) , i.e., \( U_{\psi_0}\ket{0^n}=\ket{\psi_0}\) . We have the knowledge that
(115)
and \( p_0\ll 1\) . Here \( \ket{\psi_{\mathrm{bad}}}\) is an orthogonal state to the desired state \( \ket{\psi_{\mathrm{good}}}\) . We cannot get access to \( \ket{\psi_{\mathrm{good}}}\) directly, but would like to obtain a state that has a large overlap with \( \ket{\psi_{\mathrm{good}}}\) , i.e., amplify the amplitude of \( \ket{\psi_{\mathrm{good}}}\) .
In the problem of unstructured search, \( \ket{\psi_{\mathrm{good}}}=\ket{x_0}\) , and \( p_0=1/N\) . Although we do not have access to the answer \( \ket{x_0}\) , we assume access to the reflection oracle \( R_{x_0}\) . Here we also assume access to the reflection oracle
(116)
From \( U_{\psi_0}\) , we can construct the reflection with respect to the initial state
(117)
via the \( n\) -qubit controlled-\( X\) gate. So following exactly the same procedure as the unstructured search problem, we can construct the Grover iterate
(118)
Applying \( G^k\) to \( \ket{\psi_0}\) for some \( k=\mathcal{O}(1/\sqrt{p_0})\) , we obtain a state that has \( \Omega(1)\) overlap with \( \ket{\psi_{\mathrm{good}}}\) .
Example 16 (Reflection with respect to signal qubits)
One common scenario is that the implementation of \( U_{\psi_0}\) requires \( m\) ancilla qubits (also called signal qubits), i.e.,
(119)
where \( \ket{\perp}\) is some orthogonal state satisfying
(120)
Therefore
(121)
This setting is special since the “good” state can be verified by measuring the ancilla qubits after applying \( U_{\psi_0}\) in (119), and post-selecting on the outcome \( 0^m\) . In particular, the expected number of measurements needed to obtain \( \ket{\psi_{\mathrm{good}}}\) is \( 1/p_0\) .
In order to employ the AA procedure, we first note that the reflection operator can be simplified as
(122)
This is because \( \ket{\psi_{\mathrm{good}}}\) can be entirely identified by measuring the ancilla qubits. Meanwhile
(123)
Let \( G=R_{\psi_0}R_{\mathrm{good}}\) . Applying \( G^k\) to \( \ket{\psi_0}\) for some \( k=\mathcal{O}(1/\sqrt{p_0})\) , we obtain a state that has \( \Omega(1)\) overlap with \( \ket{\psi_{\mathrm{good}}}\) . This achieves the desired quadratic speedup.
Example 17 (Amplitude damping)
Assuming access to an oracle in (119), where \( p_0\) is large, we can easily dampen the amplitude to any number \( \alpha\le \sqrt{p_0}\) .
We introduce an additional signal qubit. Then (119) becomes
(124)
Define a single qubit rotation operation as
(125)
and we have
(126)
Here \( (\ket{0^{m+1}}\bra{0^{m+1}}\otimes I_n)\ket{\perp'}=0\) . We only need to choose \( \sqrt{p_0}\cos \theta=\alpha\) .
Recall that the unstructured search problem tries to find a marked state \( x_0\in[N]\) , using a reflection oracle
(127)
Grover’s algorithm can find \( x_0\) with constant probability (e.g. at least \( 1/2\) ) by making \( \mathcal{O}(\sqrt{N})\) queries to \( R_{x_0}\) . It turns out that this is asymptotically optimal, i.e., no quantum algorithm can perform this task using \( o(\sqrt{N})\) queries to \( R_{x_0}\) .
Any quantum search algorithm that starts from a universal initial state \( \ket{\psi_0}\) and queries \( R_{x_0}\) for \( k\) steps can be written in the following form:
(128)
for some unitaries \( U_1,…,U_k\) . For simplicity we assume no ancilla qubits are used; the proof can be generalized to the case where ancilla qubits are present. The superscript \( x_0\) indicates that the state depends on the marked state \( x_0\) . Specifically, by “solving” the search problem we mean that there exists a \( k\) such that for each marked state \( x_0\) ,
(129)
In other words, measuring \( \ket{\psi^{x_0}_k}\) in the computational basis, the probability of obtaining \( \ket{x_0}\) is at least \( 1/2\) .
To prove the lower bound, we compare the action of \( \mathcal{U}^{x_0}_k\) with another “fake algorithm” \( \mathcal{U}_k\) , defined as
(130)
In particular, \( \ket{\psi_k}\) does not involve any information of the solution \( x_0\) and therefore cannot possibly solve the search problem.
For a set of vectors \( \{f^{x_0}\}_{x_0\in [N]}\) , where each \( f^{x_0}\in \mathbb{C}^N\) , we will extensively use the following discrete \( \ell_2\) -norm:
(131)
In particular, we have the following triangle inequality
(132)
The proof contains two steps. First, we show that the true solution and the fake solution differ significantly, in the sense that
(133)
Second, we prove that (define \( D_0=0\) ):
(134)
Therefore to satisfy (133), we must have \( k=\Omega(\sqrt{N})\) .
In the first step, since multiplying a phase factor \( e^{\mathrm{i} \theta}\) to \( \ket{\psi^{x_0}_k}\) does not have any physical consequences, we may choose a particular phase \( \theta\) so that (129) becomes
(135)
Therefore
(136)
This means that
(137)
On the other hand, for the “fake algorithm”, using the Cauchy-Schwarz inequality,
(138)
This violates the bound in (137), so the fake algorithm cannot solve the search problem, even for arbitrarily large \( k\) .
So from (137,138), and the triangle inequality, we have
(139)
This proves (133). In other words, the true solution and the fake solution must be well separated in \( \ell_2\) -norm.
In the second step, we prove (134) inductively. Clearly (134) is true for \( k=0\) . Assuming it is true for \( k\) , we have
(140)
The last inequality uses the triangle inequality of the discrete \( \ell_2\) -norm. Note that
(141)
and
(142)
we have
(143)
which finishes the induction.
Finally, combining the lower bound (133,134), we find that the necessary condition to solve the unstructured search problem is \( 4k^2=\Omega(N)\) , or \( k=\Omega(\sqrt{N})\) .
Remark 4 (Implication for amplitude amplification)
Due to the close relation between unstructured search and amplitude amplification, this means that given a state \( \ket{\psi}\) in which the success probability of the “good” component is \( \alpha\ll 1\) , no quantum algorithm can amplify the amplitude to \( \Omega(1)\) using \( o(\alpha^{-\frac12})\) queries to the reflection operators.
Exercise 11
In Deutsch’s algorithm, explain why we do not assume access to an oracle \( V_f:\ket{x}\mapsto\ket{f(x)}\) .
Exercise 12
For all possible mappings \( f:\{0,1\}\to\{0,1\}\) , draw the corresponding quantum circuit to implement \( U_f:\ket{x,0}\mapsto\ket{x,f(x)}\) .
Exercise 13
Prove (109).
Exercise 14
Draw the quantum circuit for (117).
Exercise 15
Prove that when ancilla qubits are used, the complexity of the unstructured search problem is still \( \Omega(\sqrt{N})\) .
The setup of the phase estimation problem is as follows. Let \( U\) be a unitary, and let \( \ket{\psi}\) be an eigenvector, i.e.,
(144)
The goal is to find \( \varphi\) up to a certain precision. This is a quantum primitive with numerous applications: prime factorization (Shor’s algorithm), linear systems (HHL), eigenvalue problems, amplitude estimation, quantum counting, quantum walks, etc.
Using a classical computer, we can estimate \( \varphi\) using \( U\ket{\psi}\oslash\ket{\psi}\) , where \( \oslash\) stands for the element-wise division operation. Specifically, if \( \ket{\psi}\) is indeed an eigenvector and \( \braket{j|\psi}\ne 0\) for any \( j\) in the computational basis, then we can extract the phase from
(145)
Unfortunately, such an element-wise division operation cannot be efficiently implemented on quantum computers, and alternative methods are needed.
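The classical extraction (145) is easily demonstrated numerically: construct a unitary with known eigenphases, take an exact eigenvector, and read off the phase from the element-wise ratio (the random construction below is an assumption of this sketch).

```python
import numpy as np

rng = np.random.default_rng(0)

# Build a random unitary U = V diag(e^{i phi_j}) V^dag with known phases.
A = rng.normal(size=(8, 8)) + 1j * rng.normal(size=(8, 8))
V, _ = np.linalg.qr(A)
phis = rng.uniform(0, 2 * np.pi, size=8)
U = V @ np.diag(np.exp(1j * phis)) @ V.conj().T

psi = V[:, 0]                       # eigenvector with eigenvalue e^{i phi_0}
ratio = (U @ psi) / psi             # element-wise division, as in (145)
# Every component returns the same phase factor e^{i phi_0}.
assert np.allclose(ratio, np.exp(1j * phis[0]))
```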
Quantum phase estimation has numerous variants, and still receives intensive research attention to this day. This chapter only introduces some of the simplest variants.
We first introduce the Hadamard test, which is a useful tool for computing the expectation value of a unitary operator with respect to a state, i.e., \( \braket{\psi|U|\psi}\) . Since \( U\) is generally not Hermitian, this does not correspond to the measurement of a physical observable. Instead, the real and imaginary parts of the expectation value need to be measured separately.
The (real) Hadamard test is the quantum circuit in (6) for estimating the real part of \( \braket{\psi|U|\psi}\) .
To verify this, we find that the circuit transforms \( \ket{0}\ket{\psi}\) as
The probability of measuring the qubit \( 0\) to be in state \( \ket{0}\) is
(146)
This is well defined since \( -1\le \operatorname{Re} \braket{\psi|U|\psi}\le 1\) .
To obtain the imaginary part, we can use the circuit in (7) called the (imaginary) Hadamard test, where
(147)
is called the phase gate.
A similar calculation shows that the circuit transforms \( \ket{0}\ket{\psi}\) to the state
(148)
Therefore the probability of measuring the qubit \( 0\) to be in state \( \ket{0}\) is
(149)
Combining the results from the two circuits, we obtain the estimate to \( \braket{\psi|U|\psi}\) .
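Both circuits can be checked numerically. The following is a minimal numpy sketch (separate from the text's qiskit examples) that simulates the real and imaginary Hadamard-test circuits for an illustrative single-qubit \( U\) and \( \ket{\psi}\) ; the sign convention for the imaginary part (with \( \mathrm{S}^{†}\) ) is the one derived here and may differ from other references by a sign.

```python
import numpy as np

H = np.array([[1, 1], [1, -1]], dtype=complex) / np.sqrt(2)
I2 = np.eye(2, dtype=complex)
Sdg = np.diag([1.0, -1.0j])          # S^dagger, cf. (147)

def hadamard_test_p0(U, psi, phase_gate):
    """P(ancilla = 0); phase_gate = I2 for the real part, Sdg for the imaginary part."""
    state = np.kron(np.array([1.0, 0.0], dtype=complex), psi)   # |0>|psi>
    state = np.kron(H, I2) @ state                              # H on the ancilla
    state = np.kron(phase_gate, I2) @ state                     # optional S^dagger on the ancilla
    cU = np.block([[I2, np.zeros((2, 2))],
                   [np.zeros((2, 2)), U]])                      # controlled-U
    state = np.kron(H, I2) @ (cU @ state)                       # final H on the ancilla
    return float(np.linalg.norm(state[:2])**2)                  # ancilla measured in |0>

# Illustrative unitary and state (any choice works).
theta = 0.7
U = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]], dtype=complex)
psi = np.array([1.0, 1.0j], dtype=complex) / np.sqrt(2)
z = np.vdot(psi, U @ psi)                 # <psi|U|psi>

p0_real = hadamard_test_p0(U, psi, I2)    # equals (1 + Re z)/2, cf. (146)
p0_imag = hadamard_test_p0(U, psi, Sdg)   # equals (1 + Im z)/2 in this convention
```

Running the two circuits and combining \( 2p_0-1\) from each recovers \( \operatorname{Re}\braket{\psi|U|\psi}\) and \( \operatorname{Im}\braket{\psi|U|\psi}\) .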
Example 18 (Overlap estimate using swap test)
An application of the Hadamard test is called the swap test, which is used to estimate the overlap of two quantum states \( \left\lvert \braket{\varphi|\psi}\right\rvert \) . The quantum circuit for the swap test is
Note that this is exactly the Hadamard test with \( U\) being the \( n\) -qubit swap gate. Direct calculation shows that the probability of measuring the qubit \( 0\) to be in state \( \ket{0}\) is
(150)
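For single-qubit registers, the identity behind (150) can be verified directly: \( \braket{\varphi\otimes\psi|\mathrm{SWAP}|\varphi\otimes\psi}=\left\lvert\braket{\varphi|\psi}\right\rvert^2\) , which is real, so the Hadamard-test probability is \( (1+\left\lvert\braket{\varphi|\psi}\right\rvert^2)/2\) . A small numpy check with random (illustrative) states:

```python
import numpy as np

rng = np.random.default_rng(1)

def random_state(dim):
    v = rng.normal(size=dim) + 1j * rng.normal(size=dim)
    return v / np.linalg.norm(v)

phi, psi = random_state(2), random_state(2)

SWAP = np.array([[1, 0, 0, 0],
                 [0, 0, 1, 0],
                 [0, 1, 0, 0],
                 [0, 0, 0, 1]], dtype=complex)

chi = np.kron(phi, psi)                 # |phi>|psi>
ev = np.vdot(chi, SWAP @ chi)           # = |<phi|psi>|^2, a real number
p0 = 0.5 * (1 + ev.real)                # swap-test success probability, cf. (150)
```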
Example 19 (Overlap estimate with relative phase information)
In the swap test, the quantum states \( \ket{\varphi},\ket{\psi}\) can be black-box states, and in such a scenario obtaining an estimate to \( \left\lvert \braket{\varphi|\psi}\right\rvert \) is the best one can do. In order to retrieve the relative phase information and to obtain \( \braket{\varphi|\psi}\) , we need to have access to the unitary circuit preparing \( \ket{\varphi},\ket{\psi}\) , i.e.,
(151)
Then we have \( \braket{\varphi|\psi}=\braket{0^n|U_{\varphi}^{†}U_{\psi}|0^n}\) .
Example 20 (Single qubit phase estimation)
The Hadamard test can also be used to derive the simplest version of the phase estimate based on success probabilities. Apply the Hadamard test in (6) with \( U,\psi\) satisfying (144). Then the probability of measuring the qubit \( 0\) to be in state \( \ket{1}\) is
(152)
Therefore
(153)
In order to quantify the efficiency of the procedure, recall from (11) that if \( p(1)\) is far away from \( 0\) or \( 1\) , i.e., \( (2\varphi \mod 1)\) is far away from \( 0\) , in order to approximate \( p(1)\) (and hence \( \varphi\) ) to additive precision \( \epsilon\) , the number of samples needed is \( \mathcal{O}(1/\epsilon^2)\) .
Now assume \( \varphi\) is very close to \( 0\) and we would like to estimate \( \varphi\) to additive precision \( \epsilon\) . Note that
(154)
Then \( p(1)\) needs to be estimated to precision \( \mathcal{O}(\epsilon^2)\) , and again the number of samples needed is \( \mathcal{O}(1/\epsilon^2)\) . The case when \( \varphi\) is close to \( 1/2\) or \( 1\) is similar.
Note that the circuit in (6) cannot distinguish the sign of \( \varphi\) (or whether \( \varphi\ge 1/2\) when restricted to the interval \( [0,1)\) ). To this end we need (7), but replace \( \mathrm{S}^{†}\) by \( \mathrm{S}\) , so that the success probability of measuring \( 1\) in the computational basis is
(155)
This gives
(156)
Unlike the previous estimate, in order to correctly estimate the sign, we only require \( \mathcal{O}(1)\) accuracy, and we only need to run (7) a constant number of times (unless \( \varphi\) is very close to \( 0\) or \( 1/2\) ).
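The procedure of this example can be sketched numerically: sample \( p(1)=\sin^2(\pi\varphi)\) , invert, and resolve the reflection ambiguity with the sign circuit. The sign-circuit convention \( p(1)=(1+\sin 2\pi\varphi)/2\) is an assumption consistent with the derivation here.

```python
import numpy as np

rng = np.random.default_rng(2)
phi = 0.15                     # the (unknown) phase, illustrative
shots = 200000

# Real Hadamard test on an eigenstate: p(1) = (1 - cos(2*pi*phi))/2 = sin^2(pi*phi).
p1 = 0.5 * (1 - np.cos(2 * np.pi * phi))
p1_hat = (rng.random(shots) < p1).mean()      # Monte Carlo estimate from `shots` samples

# Invert; this determines phi only up to the reflection phi -> 1 - phi.
phi_hat = np.arcsin(np.sqrt(p1_hat)) / np.pi

# Sign circuit (S instead of S^dagger): p(1) = (1 + sin(2*pi*phi))/2 [convention assumed],
# whose sign resolves whether phi >= 1/2.
if np.sin(2 * np.pi * phi) < 0:
    phi_hat = 1 - phi_hat
```

With \( 2\times 10^5\) shots the statistical error in \( \widehat{p}(1)\) is about \( 10^{-3}\) , consistent with the \( \mathcal{O}(1/\epsilon^2)\) sample complexity.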
Example 21 (Qiskit example for phase estimation using the Hadamard test)
Here is a qiskit example of the simple phase estimation for
with \( \varphi=0.5+\frac{1}{2^d}\equiv 0.{10… 01}\) (d bits in total). In this example, \( d=4\) and the exact value is \( \varphi=0.5625\) .
In (20), the number of measurements needed to estimate \( \varphi\) to precision \( \epsilon\) is \( \mathcal{O}(1/\epsilon^2)\) . A quadratic improvement in precision (i.e., \( \mathcal{O}(1/\epsilon)\) ) can be achieved by means of the quantum phase estimation. One such procedure is called Kitaev’s method [2, Section 13.5].
In the fixed point representation in (2.8), for simplicity we assume that the eigenvalue can be exactly represented using \( d\) bits, i.e.,
(157)
In the simplest scenario, we assume \( d=1\) , and \( \varphi=.\varphi_0,\varphi_0\in\{0,1\}\) . Then \( e^{\text{i}2 \pi \varphi}=e^{\text{i} \pi \varphi_0}\) .
Performing the real Hadamard test in (20), we have \( p(1)=0\) if \( \varphi_0=0\) , and \( p(1)=1\) if \( \varphi_0=1\) . In either case, the result is deterministic, and one measurement is sufficient to determine the value of \( \varphi_0\) .
Next, consider \( \varphi=.\underbrace{0… 0\varphi_0}_{d\text{ bits}}\) . To determine the value of \( \varphi_0\) , we need to reach precision \( \epsilon<2^{-d}\) . The method in (20) requires \( \mathcal{O}(1/\epsilon^2)=\mathcal{O}(2^{2d})\) repeated measurements, i.e., queries to \( U\) . The observation behind Kitaev’s method is that if we have access to \( U^j\) for a suitable power \( j\) , then the number of queries to \( U\) can be reduced. More specifically, if we can query \( U^{2^{d-1}}\) , then the circuit in (9) with \( j=d-1\) gives
(158)
The result is again deterministic. Therefore the total number of queries to \( U\) becomes \( \mathcal{O}(2^{d})=\mathcal{O}(\epsilon^{-1})\) .
This is the basic idea behind Kitaev’s method: use a more complex quantum circuit (and in particular, with a larger circuit depth) to reduce the total number of queries. As a general strategy, instead of estimating \( \varphi\) from a single number, we assume access to \( U^{2^j}\) , and estimate \( \varphi\) bit-by-bit. In particular, changing \( U\to U^{2^j}\) in the Hadamard test allows us to estimate
(159)
One immediate difficulty of the bit-by-bit estimation is that we need to tell \( 0.0111…\) apart from \( 0.1000…\) , and the two numbers can be arbitrarily close to each other even though every digit disagrees, so some careful work is needed. We will first describe the algorithm, and then analyze its performance. The algorithm works for any \( \varphi\) , and the goal is then to estimate its first \( d\) bits. For simplicity of the analysis, we assume \( \varphi\) is exactly represented by \( d\) bits. We will use extensively the distance
(160)
which is the distance on the unit circle.
First, by applying the circuit in (9) (and the corresponding circuit to determine the sign) with \( j=0,1,…,d-3\) , for each \( j\) we can estimate \( p(0)\) , so that the error in \( 2^j\varphi\) is less than \( 1/16\) for all \( j\) (this happens with a sufficiently high success probability; for simplicity, let us assume that it happens with certainty). The measured result is denoted by \( \alpha_j\) . This means that any perturbation can only affect the fifth and later digits of the binary representation. For example, if \( 2^j\varphi=0.11100\) , then \( \alpha_j=0.11011\) is an acceptable result with error \( 0.00001=1/32\) , but \( \alpha_j=0.11110\) is not acceptable since the error is \( 0.0001=1/16\) . We then round \( \alpha_j\) \( \pmod 1\) to its closest \( 3\) -bit value, denoted by \( \beta_j\) , i.e., \( \beta_j\) is taken from the set \( \set{0.000,0.001,0.010,0.011,0.100,0.101,0.110,0.111}\) . Consider the example \( 2^j\varphi=0.11110\) : if \( \alpha_j=0.11101\) , then \( \beta_j=0.111\) ; but if \( \alpha_j=0.11111\) , then \( \beta_j=0.000\) . Another example is \( 2^j\varphi=0.11101\) : if \( \alpha_j=0.11110\) , then both \( \beta_j=0.111\) (rounded down) and \( \beta_j=0.000\) (rounded up) are acceptable, and we may pick one of them at random. We will show later that the uncertainty in \( \alpha_j,\beta_j\) is not detrimental to the success of the algorithm.
Second, we perform some post-processing. Starting from \( j=d-3\) , we can estimate \( .\varphi_2\varphi_1\varphi_0\) to accuracy \( 1/16\) , which recovers these three bits exactly. The values of these three bits are taken from \( \beta_{d-3}\) directly. Then we proceed with the iteration: for \( j=d-4,…,0\) , we assign
(161)
Here \( \left\lvert \cdot\right\rvert _{\bmod 1}\) is the periodic distance on \( [0,1)\) and its value is always \( \le 1/2\) . Since the two possibilities are separated by \( 1/2\) , for each \( j\) , there will be at most one case that is satisfied. We will also show that in all circumstances, there is always one case that is satisfied, regardless of the ambiguity of the choice of \( \beta_j\) above.
After running the algorithm above, we recover \( \varphi=.\varphi_{d-1}… \varphi_0\) exactly. The total cost of Kitaev’s method measured by the number of queries to \( U\) is \( \mathcal{O}\left(\sum_{j=0}^{d-3} 2^{j}\right)=\mathcal{O}(\epsilon^{-1})\) .
If \( \varphi\) is not exactly represented by \( d\) bits, we will obtain an estimate
(162)
Example 22
Consider \( \varphi=0.\varphi_4\varphi_3\varphi_2\varphi_1\varphi_0=0.11111\) and \( d=5\) . Running Kitaev’s algorithm with \( j=0,1,2\) gives the following possible choices of \( \beta_j\) :
Start with \( j=2\) . We have only one choice of \( \beta_j\) , and can recover \( 0.\varphi_2\varphi_1\varphi_0=0.111\) . Then for \( j=1\) , we need to use (161) to decide \( \varphi_3\) . If we choose \( \beta_j=0.111\) , we have \( \varphi_3=1\) . But if we choose \( \beta_j=0.000\) , we still need to choose \( \varphi_3=1\) , since \( \left\lvert .011-0.000\right\rvert _{\bmod 1}=0.011=3/8>1/4\) , and \( \left\lvert .111-0.000\right\rvert _{\bmod 1}=0.001=1/8<1/4\) . Similar analysis shows that for \( j=0\) we have \( \varphi_4=1\) . This recovers \( \varphi\) exactly.
Example 23 (A variant of Kitaev’s algorithm that does not work)
Let us modify Kitaev’s algorithm as follows: each \( 2^j \varphi\) is determined to precision \( 1/8\) , and the result is rounded to \( \beta_j\in\set{0.00,0.01,0.10,0.11}\) . Starting from \( j=d-2\) , we estimate \( .\varphi_1\varphi_0\) exactly. Then for \( j=d-3,…,0\) , we assign
(163)
Note that the inequality \( <1/2\) above can be equivalently written as \( \le 1/4\) .
Let us run the algorithm above for \( \varphi=0.\varphi_3\varphi_2\varphi_1\varphi_0=0.1111\) and \( d=4\) . This gives:
Start with \( j=2\) . We have only one choice of \( \beta_j\) , and can recover \( 0.\varphi_1\varphi_0=0.11\) . Then for \( j=1\) , if we choose \( \beta_j=0.11\) , we have \( \varphi_2=1\) . But if we choose \( \beta_j=0.00\) , then \( \left\lvert .01-0.00\right\rvert _{\bmod 1}=0.01=1/4\) , and \( \left\lvert .11-0.00\right\rvert _{\bmod 1}=0.01=1/4\) . So the algorithm cannot distinguish the two possibilities and fails.
Let us now show inductively why Kitaev’s algorithm works. Again assume \( \varphi\) is exactly represented by \( d\) bits. For \( j=d-3\) , we know that \( \varphi_2\varphi_1\varphi_0\) can be recovered exactly. Now assume \( \varphi_{d-j-2}…\varphi_0\) have all been computed exactly; at step \( j\) we would like to determine the value of \( \varphi_{d-j-1}\) . From
(164)
we know
(165)
Then
(166)
The wrong choice of \( \varphi_{d-j-1}\) denoted by \( \widetilde{\varphi}_{d-j-1}\) then satisfies
(167)
This proves the validity of (161), and hence that of Kitaev’s algorithm.
The Fourier transform is used ubiquitously in scientific computing, and the fast Fourier transform (FFT) is the backbone of many fast algorithms in classical computation. Similarly, the quantum Fourier transform (QFT) is an important component in many quantum algorithms, such as phase estimation and Shor’s algorithm, and it inspires other fast algorithms such as the fast fermionic Fourier transform (FFFT).
For any \( j\) in the computational basis, the (discrete) forward Fourier transform is defined as
(168)
In particular
(169)
The (discrete) inverse Fourier transform is
(170)
Using the binary representation of integers
(171)
we have
Therefore the exponential can be written as
(172)
The most important step of QFT is the following direct calculation, which requires some patience with the manipulation of indices:
(173)
(173) involves a series of controlled rotations of the form
(174)
Hence before discussing the quantum circuit for QFT, let us first work out the circuit for implementing this controlled rotation. We use the relation
(175)
Example 24 (Implementation of controlled rotation)
Consider the implementation of
(176)
Let
(177)
and \( R_j=R_z(\pi/2^{j-1})\) . In particular, \( R_1=Z\) . The quantum circuit is
The implementation of QFT follows the same principle, but does not require the signal qubit to store the phase information. Let us see a few examples.
When \( n=1\) , we need to implement
(178)
This is the Hadamard gate:
(179)
When \( n=2\) , we need to implement
(180)
This can be implemented using the following circuit:
Comparing the result with that in (180), we find that the ordering of the qubits is reversed. To recover the desired result in QFT, we can apply a SWAP gate to the outcome, i.e.,
In order to implement the inverse Fourier transform, we only need to apply the Hermitian conjugate as
Similarly one can construct the circuit for \( U_{\mathrm{FT}}\) and its inverse for \( n=3\) .
In general, the QFT circuit is given by (10). Comparing the circuit in (10) with (173), we find again that the ordering of the qubits is reversed in the output. To restore the order as defined in the QFT, we can use \( \lfloor n/2 \rfloor\) SWAP operations. The total gate complexity of the QFT is \( \mathcal{O}(n^2)\) .
Example 25 (Qiskit example for QFT)
In Kitaev’s method, we use \( 1\) ancilla qubit but \( d\) different circuits of various circuit depths to perform phase estimation. In this section, we introduce the (standard) quantum phase estimation (QPE), which uses a single quantum circuit based on QFT, but requires \( d\) ancilla qubits to store the phase information in the quantum computer. From now on, we assume \( \varphi=.\varphi_{d-1}… \varphi_0\) is exact. From the availability of \( U^j\) we can define a controlled unitary operation
(181)
When \( d=1\) , \( \mathcal{U}\) is simply the controlled-\( U\) operation. For a general \( d\) , it may seem that we need to implement all \( 2^d\) different powers \( U^j\) . However, this is not necessary. Using the binary representation of integers \( j=(j_{d-1}… j_0)=\sum_{i=0}^{d-1} j_i 2^{i}\) , we have \( U^{j}=U^{\sum_{i=0}^{d-1} j_i 2^i}=\prod_{i=0}^{d-1}U^{j_i2^i}\) . Therefore, similar to the operations in QFT,
(182)
Here the primed product \( \sideset{}{’}\prod \) is a slightly awkward notation, which means the tensor product for the first register, and the regular matrix product for the second register. It is in fact much clearer to observe the structure in the quantum circuit in (11).
Remark 5
At first glance, the saving due to the usage of the circuit in (11) may not seem large, since we still need to implement matrix powers as high as \( U^{2^{d-1}}\) . However, the alternative would be to implement \( \sum_{j\in[2^d]} \ket{j}\bra{j}\otimes U^j\) directly, which requires very complicated multi-qubit controlled operations. Another scenario where a significant advantage can be gained is when \( U\) can be fast-forwarded, i.e., \( U^j\) can be implemented at a cost independent of \( j\) . This is the case, for instance, if \( U=R_z(\theta)\) is a single-qubit rotation. Then the circuit (11) is exponentially better than the direct implementation of \( \mathcal{U}\) .
Now let the initial state in the ancilla qubits be \( \ket{0^d}\) . Using QFT and \( \mathcal{U}\) , we transform the initial state according to
(183)
Since we have \( \varphi=\frac{k}{2^d}\) for some \( k\in[2^d]\) , measuring the ancilla qubits yields the state \( \ket{k}\ket{\psi_0}\) with certainty, and we obtain the phase information. Therefore the quantum circuit for QFT-based QPE is given by (12). Here we have used (169). We should note that \( U_{\mathrm{FT}}^{†}\) includes the swapping operations. (Exercise: 1. what if the swap operation is not implemented? 2. Is it possible to modify the circuit and remove the need of implementing the swap operations?)
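When \( \varphi=k/2^d\) is exact, this statement can be checked in a few lines: after the controlled powers, the ancilla register holds \( \frac{1}{\sqrt{T}}\sum_j e^{\mathrm{i}2\pi j\varphi}\ket{j}\) , and the inverse Fourier transform maps it to \( \ket{k}\) exactly. A numpy sketch:

```python
import numpy as np

d = 4
T = 2**d
k_true = 9
phi = k_true / T                                    # exactly representable with d bits

# Ancilla state after the controlled-U^{2^i} layer, cf. (183).
anc = np.exp(2j * np.pi * phi * np.arange(T)) / np.sqrt(T)

# Inverse Fourier transform, cf. (170).
jj, kk = np.meshgrid(np.arange(T), np.arange(T), indexing="ij")
Finv = np.exp(-2j * np.pi * jj * kk / T) / np.sqrt(T)

probs = np.abs(Finv @ anc)**2                       # measurement distribution on the ancillas
```

The distribution `probs` has all of its mass on \( k\) , so a single measurement reveals the phase.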
Example 26 (Hadamard test viewed as QPE)
When \( d=1\) , note that \( U_{\mathrm{FT}}=U^{†}_{\mathrm{FT}}=H\) , so the QFT-based QPE in (12) is exactly the Hadamard test in (6). Note that \( \varphi\) does not need to be exactly representable by a one-bit number!
Example 27 (Qiskit example for QPE)
In order to apply QPE (standard or Kitaev), we have assumed that
In general practical calculations, neither condition is exactly satisfied, and we need to analyze the effect on the error of the QPE. Recalling the discussion in (2.9), we assume the only sources of errors are at the mathematical level (rather than a noisy implementation of quantum devices). In this context, the error can be due to an inexact eigenstate \( \ket{\psi}\) , or Monte Carlo errors in the readout process due to the probabilistic nature of measurement. In this section, we relax these constraints and study what happens when the conditions are not exactly met. We assume \( U\) has the eigendecomposition
(184)
Without loss of generality we assume \( 0\le \varphi_0\le \varphi_1\le\cdots\le\varphi_{N-1}<1\) . We are interested in using QPE to find the value of \( \varphi_0\) .
We first relax the condition (1), i.e., assume all \( \varphi_i\) ’s have an exact \( d\) -bit binary representation, but the quantum state is given by a linear combination
(185)
Here the overlap \( p_0=\left\lvert \braket{\phi|\psi_0}\right\rvert ^2=\left\lvert c_0\right\rvert ^2<1\) .
Applying the QPE circuit in (12) to \( \ket{0^t}\ket{\phi}\) with \( t=d\) and measuring the ancilla qubits, we obtain the binary representation of \( \varphi_0\) with probability \( p_0\) . Furthermore, the system register returns the eigenstate \( \ket{\psi_0}\) with probability \( p_0\) . Of course, in order to recognize that \( \varphi_0\) is the desired phase, we need some a priori knowledge of its location, e.g. \( \varphi_0\in(a,b)\) and \( \varphi_i>b\) for all \( i\ne 0\) . It would be desirable to relax both conditions (1) and (2). However, the analysis can be rather involved and additional assumptions are needed. We will discuss some of the implications in the context of estimating the ground state energy in (5.1).
For now, to simplify the analysis, we focus on the case where only condition (2) is violated, i.e., \( \varphi_0\) cannot be exactly represented by a \( d\) -bit number, and we apply the QPE circuit to an initial state \( \ket{0^t}\ket{\phi}\) with \( t>d\) . The exact relation between \( t\) and the desired accuracy parameter \( d\) will be determined later. Similar to (183), we obtain the state
(186)
Here
(187)
Therefore if \( \varphi_0\) has an exact \( d\) -bit representation, i.e., \( \varphi_0=\widetilde{\varphi}_{k'_0}\) for some \( k'_0\) , then \( \gamma_{0,k'}=\delta_{k',k'_0}\) . We recover the previous result that one run of the QPE circuit gives the value \( \varphi_0\) deterministically.
Now assume that \( \varphi_0 \ne \widetilde{\varphi}_{k'}\) for any \( k'\) . Since \( e^{\mathrm{i} 2\pi x}\) is a periodic function with period \( 1\) , we can only determine the value of \( x \bmod 1\) . Therefore we use the periodic distance (160). In terms of the phase, we would like to find \( k_0'\) such that
(188)
Here \( \epsilon=2^{-d}=2^{t-d}/T\) is the precision parameter. In particular, for any \( k'\) we have
(189)
Using the relation that for any \( \theta\in[-\pi,\pi]\) ,
(190)
we obtain
(191)
Let \( k'_0\) be the measurement outcome, which can be viewed as a random variable. The probability of obtaining some \( \widetilde{\varphi}_{k'_0}\) that is at least \( \epsilon\) distance away from \( \varphi_0\) is
(192)
Set \( t-d=\lceil\log_2 \delta^{-1}\rceil\) , then \( T\epsilon=2^{t-d}\ge \delta^{-1}\) . Hence for any \( 0<\delta<1\) , the failure probability
(193)
In other words, in order to obtain the phase \( \varphi_0\) to accuracy \( \epsilon=2^{-d}\) with a success probability at least \( 1-\delta\) , we need \( d+\lceil\log_2 \delta^{-1}\rceil\) ancilla qubits to store the value of the phase. On top of that, the simulation time needs to be \( T=(\epsilon\delta)^{-1}\) .
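The trade-off between \( t\) , \( \epsilon\) , and the failure probability can be explored numerically. The sketch below computes the output distribution \( \left\lvert\gamma_{0,k'}\right\rvert^2\) of (187) for an illustrative phase with no exact binary representation, together with the probability mass landing more than \( \epsilon\) away from \( \varphi_0\) :

```python
import numpy as np

d, t = 4, 7                       # target accuracy eps = 2^-d, using t > d ancilla qubits
T, eps = 2**t, 2.0**-d
phi0 = 0.1234567                  # no exact t-bit representation (illustrative)

# gamma_{0,k'} from (187): a normalized geometric sum over j = 0, ..., T-1.
ks = np.arange(T)
js = np.arange(T)[:, None]
gamma = np.exp(2j * np.pi * js * (phi0 - ks[None, :] / T)).sum(axis=0) / T
probs = np.abs(gamma)**2          # sums to 1

def dist_mod1(x):
    x = np.mod(x, 1.0)
    return np.minimum(x, 1.0 - x)

fail = probs[dist_mod1(ks / T - phi0) >= eps].sum()   # probability of an eps-bad outcome
```

With \( T\epsilon=2^{t-d}=8\) here, the failure probability is already small, consistent with (193).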
Remark 6 (Quantum median method)
One problem with QPE is that in order to obtain a success probability \( 1-\delta\) , we must use \( \log_2 \delta^{-1}\) ancilla qubits, and the maximal simulation time also needs to be increased by a factor \( \delta^{-1}\) . The increase of the maximal simulation time is particularly undesirable since it increases the circuit depth and hence the required coherence time of the quantum device. When \( \ket{\psi}\) is an exact eigenstate, this can be improved by the median method, which uses \( \log \delta^{-1}\) copies of the result from QPE without using ancilla qubits or increasing the circuit depth. When \( \ket{\psi}\) is a linear combination of eigenstates, the problem of the aliasing effect becomes more difficult to handle. One possibility is to generalize the median method into the quantum median method [9], which uses classical arithmetic to evaluate the median using a quantum circuit. To reach success probability \( 1-\delta\) , we still need \( \log_2 \delta^{-1}\) ancilla qubits, but the maximal simulation time does not need to be increased.
Exercise 16
Write down the quantum circuit for the overlap estimate in (19).
Exercise 17
For \( \varphi=0.111111\) , we run Kitaev’s algorithm to estimate its first \( 4\) bits. Check that the outcome satisfies (162). Note that \( 0.0000\) and \( 0.1111\) are both acceptable answers. Prove the validity of Kitaev’s algorithm in general in the sense of (162).
Exercise 18
For a \( 3\) qubit system, explicitly construct the circuit for \( U_{\mathrm{FT}}\) and its inverse.
Exercise 19
For a \( n\) -qubit system, write down the quantum circuit for the swap operation used in QFT.
Exercise 20
Similar to the Hadamard test in (26), develop an algorithm to perform QPE using the circuit in (12) with only \( d=2\) , while the phase \( \varphi\) can be any number in \( [0,1/2)\) .
As an application, we use QPE to solve the problem of estimating the ground state energy of a Hamiltonian. Let \( H\) be a Hermitian matrix with eigendecomposition
(194)
Below are two examples of Hamiltonians commonly encountered in quantum many-body physics.
Example 28 (Transverse field Ising model)
The Hamiltonian for the one dimensional transverse field Ising model (TFIM) with nearest neighbor interaction of length \( n\) is
(195)
The dimension of the Hamiltonian matrix \( H\) is \( 2^n\) .
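The TFIM Hamiltonian of (195) is easy to build explicitly for small \( n\) . The sketch below assumes one common convention, \( H=-\sum_j Z_j Z_{j+1}-g\sum_j X_j\) with open boundary conditions; the coupling convention in (195) may differ by signs or a coefficient.

```python
import numpy as np
from functools import reduce

X = np.array([[0.0, 1.0], [1.0, 0.0]])
Z = np.diag([1.0, -1.0])
I2 = np.eye(2)

def op_on(site_ops, n):
    """Tensor product over n sites, with identity on unspecified sites."""
    return reduce(np.kron, [site_ops.get(i, I2) for i in range(n)])

def tfim(n, g):
    # H = -sum_j Z_j Z_{j+1} - g sum_j X_j  (open boundary; convention assumed)
    H = np.zeros((2**n, 2**n))
    for j in range(n - 1):
        H -= op_on({j: Z, j + 1: Z}, n)
    for j in range(n):
        H -= g * op_on({j: X}, n)
    return H

H = tfim(3, 1.0)
E0 = np.linalg.eigvalsh(H)[0]      # ground state energy (exact diagonalization)
```

Exact diagonalization like this is only feasible for small \( n\) , since the matrix dimension grows as \( 2^n\) ; this is precisely why quantum algorithms for the ground state energy are of interest.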
Example 29 (Fermionic system in second quantization)
For a fermionic system (such as electrons), the Hamiltonian can be expressed in terms of the creation and annihilation operators as
(196)
The creation and annihilation operators \( \hat{a}^{†}_i,\hat{a}_i\) can be converted into Pauli operators via e.g. the Jordan-Wigner transform as
(197)
Here \( X,Y,Z,I\) are single-qubit Pauli-matrices. The dimension of the Hamiltonian matrix \( \hat{H}\) is thus \( 2^n\) .
The number operator takes the form
(198)
For a given state \( \ket{\Psi}\) , the total number of particles is
(199)
The Hamiltonian \( H\) preserves the total number of particles \( N_e\) .
Without loss of generality we assume \( 0<\lambda_0\le\lambda_1\le\cdots\le\lambda_{N-1}<\frac12\) . Note that for the purpose of estimating the ground state energy, we do not necessarily require a positive energy gap. For simplicity of the presentation, we still assume that the ground state is non-degenerate, i.e., \( \lambda_0<\lambda_1\) . We are also provided an approximate eigenstate
(200)
of which the overlap with the ground state is \( p_0=\left\lvert \braket{\phi|\psi_0}\right\rvert ^2\) . Our goal is to estimate \( \lambda_0\) to precision \( \epsilon=2^{-d}\) . We assume \( \epsilon<\lambda_0\) . This problem appears in quantum many-body physics, quantum chemistry, optimization, etc.
In order to use QPE (based on QFT), we assume access to the unitary evolution operator \( U=e^{\mathrm{i} 2\pi H}\) . This is called a Hamiltonian simulation problem, which will be discussed in detail in later chapters. For now we assume \( U\) can be implemented exactly. Then
This becomes a phase estimation problem in which the input vector is not an exact eigenstate. Following the discussion in (4.5), if all eigenvalues \( \lambda_j\) can be exactly represented by \( d\) -bit numbers, we obtain both the ground state and the ground state energy with probability \( p_0\) . Therefore, repeating the process \( \mathcal{O}(p_0^{-1})\) times, we obtain the ground state energy.
Now we relax both conditions (1) and (2) in (4.5), and apply the QPE circuit in (12) to an initial state \( \ket{0^t}\ket{\phi}\) for some \( t>d\) . Similar to (183), we have
(201)
Here
(202)
Therefore the definition in (187) is a special case.
Our algorithm is simple: we would like to run QPE \( M\) times, and denote the output of the \( \ell\) -th run by \( \widetilde{\varphi}^{(\ell)}_{k'}\) . Then we take the minimum of the measured output \( \min_{\ell}\widetilde{\varphi}^{(\ell)}_{k'}\) as the estimate to the ground state energy. The hope is to obtain the ground state energy to accuracy \( \epsilon\) with success probability at least \( 1-\delta\) for any \( \delta>0\) . Let us now analyze this algorithm.
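This procedure can be simulated on a toy spectrum (the eigenphases and overlaps below are illustrative): the single-run output distribution is the mixture determined by (201)-(202), and the estimator is the minimum over \( M\) draws.

```python
import numpy as np

rng = np.random.default_rng(3)
t, d = 9, 4
T, eps = 2**t, 2.0**-d
lams = np.array([0.12, 0.21, 0.33])    # eigenphases in (0, 1/2), illustrative
c2 = np.array([0.3, 0.5, 0.2])         # overlaps |c_k|^2; p_0 = 0.3

# Single-run QPE output distribution: mixture over eigenstates of |gamma_{k,k'}|^2.
ks = np.arange(T)
js = np.arange(T)[:, None]
probs = np.zeros(T)
for lam, w in zip(lams, c2):
    gamma = np.exp(2j * np.pi * js * (lam - ks[None, :] / T)).sum(axis=0) / T
    probs += w * np.abs(gamma)**2
probs /= probs.sum()

p_under = probs[ks / T <= lams[0] - eps].sum()           # undershooting probability, cf. (204)
p_hit = probs[np.abs(ks / T - lams[0]) < eps].sum()      # mass within eps of lambda_0 (~ p_0)

M = 30
est = (rng.choice(ks, size=M, p=probs) / T).min()        # minimum over M runs
```

Note that `p_hit` is close to \( p_0=0.3\) while `p_under` is tiny, which is exactly the balance the analysis below quantifies.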
If \( \lambda_0\) has an exact \( d\) -bit representation, i.e., \( \lambda_0=\widetilde{\varphi}_{k'_0}\) for some integer \( k'_0\) , then \( \gamma_{0,k'}=\delta_{k'_0,k'}\) . It may seem that this implies that with probability \( p_0\) we obtain the exact value of \( \lambda_0\) , and correspondingly the eigenstate \( \ket{\psi_0}\) is stored in the system register. This would be much better than the previous assumption that all eigenvalues \( \lambda_j\) need to be represented by \( d\) -bit numbers.
Unfortunately, this analysis is not correct. In fact, for any \( \lambda_k\) that does not have an exact \( t\) -bit representation (note that \( t>d\) ), we may have \( \gamma_{k,k''}\ne 0\) with \( \widetilde{\varphi}_{k''}<\lambda_0\) , i.e., we may obtain an energy estimate that is lower than the ground state energy! The probability of ending up in the state \( \ket{k''}\ket{\psi_k}\) is \( \left\lvert c_k\gamma_{k,k''}\right\rvert ^2\) , so it is still possible to obtain a wrong ground state energy. This is called the aliasing effect.
We demonstrate below that if \( T\) is large enough, we can control the probability of underestimating the ground state energy. Since \( \lambda_0\) is the ground state energy, and all eigenvalues are in \( (0,1/2)\) , when \( \widetilde{\varphi}_{k'}\le \lambda_0-\epsilon\) , we have
(203)
Then the probability of underestimating the ground state energy by \( \epsilon\) is
(204)
Let \( T\epsilon=\delta'^{-1}\) with \( \delta'<1/2\) , we have
(205)
Therefore after \( M\) repetitions, we have
(206)
In order to obtain \( P\left(\min_{\ell}\widetilde{\varphi}^{(\ell)}_{k'}\le \lambda_0-\epsilon\right)<\frac{\delta}{2}\) , we need to set \( \delta'=M^{-1} \delta\) .
On the other hand, we would also like to bound \( P\left(\min_{\ell}\widetilde{\varphi}^{(\ell)}_{k'}\ge \lambda_0+\epsilon\right)\) . To this end, we first note that when \( \widetilde{\varphi}_{k'}\ge \lambda_0+\epsilon\) , we have \( \left\lvert \widetilde{\varphi}_{k'}- \lambda_0\right\rvert _1\ge\epsilon\) . Moreover,
(207)
Here we have used the normalization condition that
(208)
Therefore
(209)
This means that
(210)
We can then take \( M=\lceil \frac{2}{p_0} \log \frac{2}{\delta} \rceil\) so that
(211)
To summarize, according to the relation \( \delta'=M^{-1} \delta\) , in order to estimate the ground state energy to precision \( \epsilon=2^{-d}\) with success probability \( 1-\delta\) , we need
(212)
ancilla qubits in QPE. The circuit depth is
(213)
Taking the number of repetitions \( M\) into account, the total cost of the method is
(214)
Remark 7 (Dependence on the initial overlap \( p_0\))
The analysis above shows that QPE has a nontrivial dependence on the initial overlap \( p_0\) , which has a rather indirect source. First, in order to reduce the possibility of overshooting, the number of repetitions \( M\) needs to be large enough, namely \( \mathcal{O}(p_0^{-1})\) . However, this also increases the probability of undershooting the eigenvalue, and hence \( \delta'\) needs to be chosen as \( \mathcal{O}(M^{-1})=\mathcal{O}(p_0)\) . This means that the circuit depth should be \( \mathcal{O}(\delta'^{-1})=\mathcal{O}(p_0^{-1})\) for fixed \( \epsilon\) . The total complexity is thus \( TM=\mathcal{O}(p_0^{-2})\) . Therefore, when the initial overlap \( p_0\) is small, using QPE to find the ground state energy can be very costly. On the other hand, using different techniques, the dependence on \( p_0\) can be drastically improved to \( \mathcal{O}(p_0^{-\frac12})\) [10]. See also the discussions in (8.8).
Remark 8 (QPE for preparing the ground state)
The estimate of the ground state energy does not necessarily require the energy gap \( \lambda_g:=\lambda_1-\lambda_0\) to be positive. However, if our goal is to prepare the ground state \( \ket{\psi_0}\) from an initial state \( \ket{\phi}\) using QPE, then we need stronger assumptions. In particular, we cannot afford to obtain \( \ket{k'}\ket{\psi_k}\) , where \( \left\lvert \widetilde{\varphi}_{k'}-\lambda_0\right\rvert <\epsilon\) but \( k\ne 0\) . This at least requires \( \epsilon=2^{-d}<\lambda_g\) , and introduces a natural dependence on the inverse of the gap.
Through the analysis above, we see that although the analysis of QPE is very clean when 1) all eigenvalues (properly scaled to be represented as phase factors) are exactly given by \( d\) -bit numbers 2) the input vector is an eigenstate, the analysis can become rather complicated and tedious when such conditions are relaxed. Such difficulty does not merely show up at the theoretical level, but can seriously impact the robust performance of QPE in practical applications. To simplify the discussion of the applications below, we will be much more cavalier about the usage of QPE and assume all eigenvalues are exactly represented by \( d\) -bit numbers whenever necessary. But we should keep such caveats in mind. Furthermore, when we move beyond QPE, the issue of having exact \( d\) -bit numbers will become much less severe in techniques based on quantum signal processing, i.e., quantum eigenvalue transformation (QET) and quantum singular value transformation (QSVT).
Let \(\ket{\psi_0}\) be prepared by an oracle \( U_{\psi_0}\) , i.e., \( U_{\psi_0}\ket{0^n}=\ket{\psi_0}\) . We have the knowledge that
(215)
and \( p_0\ll 1\) . Here \( \ket{\psi_{\mathrm{bad}}}\) is an orthogonal state to the desired state \( \ket{\psi_{\mathrm{good}}}\) , and \( \sqrt{p_0}=\sin\frac{\theta}{2}\) . The problem of amplitude estimation is to estimate \( p_0\) to \( \epsilon\) -precision. If \( p_0\) is away from \( 0\) or \( 1\) and is to be estimated directly from the Monte Carlo method, the number of samples needed is \( \mathcal{N}=\mathcal{O}(\epsilon^{-2})\) .
Let \( G=R_{\psi_0}R_{\mathrm{good}}\) be the Grover operator as in (3.3). Then in the basis \( \mathcal{B}=\{\ket{\psi_{\mathrm{good}}},\ket{\psi_{\mathrm{bad}}}\}\) , the subspace \( \mathcal{H}= \operatorname{span} \mathcal{B} \) is an invariant subspace of \( G\) . Recalling the computation in (112), the matrix representation is
(216)
Its two eigenstates are
(217)
with eigenvalues \( e^{\pm \mathrm{i} \theta}\) , respectively.
Therefore the problem of estimating \( \theta\) can be solved with phase estimation with an imperfect initial state. Note that
(218)
Consider a QPE circuit with \( t\) ancilla qubits, querying \( G\) \( T=2^t\) times. In each execution, the system register will be found in the \( \ket{\psi_+}\) or \( \ket{\psi_{-}}\) state, each with probability \( 1/2\) . Let
(219)
be the number of ancilla qubits with \( \epsilon'=2^{-d}\) . Then QPE obtains an estimate denoted by \( \widetilde{\theta}\) , which approximates \( \theta\) to precision \( \epsilon'\) with success probability \( 1-\delta\) . Note that \( p_0=\sin^2\frac{\theta}{2}\) , and
(220)
Using the fact that \( \left\lvert \sin (\widetilde{\theta}-\theta)\right\rvert \le \left\lvert \widetilde{\theta}-\theta\right\rvert \le \epsilon'\) , we have
(221)
Let \( \epsilon'\) be sufficiently small. Now if \( p_0(1-p_0)=\Omega(1)\) , we can choose \( \epsilon'=\mathcal{O}(\epsilon)\) , and the total complexity of QPE is \( \mathcal{O}(\epsilon^{-1})\) .
If \( p_0\) is small, then we should estimate \( p_0\) to multiplicative accuracy \( \epsilon\) instead. Using
(222)
we have \( \epsilon'=\sqrt{p_0}\epsilon\) . Therefore the runtime of QPE is \( \mathcal{O}(p_0^{-\frac12}\epsilon^{-1})\) . If \( p_0\) is to be estimated to precision \( \epsilon'\) using the Monte Carlo method, the number of samples would be \( \mathcal{N}=\mathcal{O}(p_0^{-1}\epsilon^{-2})\) .
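The multiplicative-accuracy requirement can be sanity-checked numerically: perturbing the phase by \( \epsilon'=\sqrt{p_0}\,\epsilon\) changes \( p_0=\sin^2\frac{\theta}{2}\) by \( \mathcal{O}(p_0\epsilon)\) . A minimal sketch, with illustrative values of \( p_0\) and \( \epsilon\) :

```python
import numpy as np

p0 = 1e-4
theta = 2 * np.arcsin(np.sqrt(p0))     # so that p0 = sin^2(theta/2) exactly
eps = 0.05
eps_prime = np.sqrt(p0) * eps          # additive phase precision, as in (222)

# Worst-case phase estimate off by eps_prime:
theta_t = theta + eps_prime
p0_t = np.sin(theta_t / 2) ** 2

# The induced error in p0 is multiplicative: |p0_t - p0| = O(p0 * eps).
rel_err = abs(p0_t - p0) / p0
assert 0 < rel_err < 2 * eps
```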
So, the amplitude estimation method achieves quadratic speedup in the total complexity, but the circuit depth is increased to \( \mathcal{O}(\epsilon'^{-1}\delta^{-1})\) .
Example 30 (Amplitude estimation to accelerate Hadamard test)
Consider the circuit for the Hadamard test in (6) to estimate \( \operatorname{Re}\braket{\psi|U|\psi}\) . Let the initial state \( \ket{\psi}\) be prepared by a unitary \( U_{\psi}\) , then the following combined circuit
maps \( \ket{0}\ket{0^n}\) to
(223)
and the goal is to estimate \( p_0\) . This also gives the implementation of the reflector \( R_{\psi_0}\) .
Note that \( R_{\mathrm{good}}\) can be implemented by simply reflecting against the signal qubit, i.e.,
(224)
Then we can apply QPE to the Grover unitary \( G=R_{\psi_0}R_{\mathrm{good}}\) to estimate \( p(0)\) , and the circuit depth is \( \mathcal{O}(\epsilon^{-1})\) .
In this section, we consider the solution of linear systems of equations
(225)
where \( A\in \mathbb{C}^{N\times N}\) is a non-singular matrix. Without loss of generality we assume \( A\) is Hermitian. Otherwise, we can solve a dilated Hermitian matrix, which enlarges the matrix dimension by a factor of \( 2\) (i.e., use one ancilla qubit)
(226)
and solve the enlarged problem
(227)
We assume \( b\in\mathbb{C}^N\) is a normalized vector and hence can be stored in a quantum state. More specifically, we have a unitary \( U_b\) such that (may require some work registers)
(228)
On classical computers, the solution is simply \( x=A^{-1}b\) . However, the solution vector is in general not a normalized vector, and hence cannot be directly stored as a quantum state. Therefore the goal of solving the quantum linear system problem (QLSP) is to find a quantum state \( \ket{\widetilde{x}}\) so that
(229)
The normalization constant \( \left\lVert A^{-1}\ket{b}\right\rVert \) should be recovered separately.
One useful application of QLSP solvers is to evaluate the many-body Green’s function [11], based on the quantum many-body Hamiltonian in (29). We will omit the details here.
The HHL algorithm [12], based on QPE, is the first quantum algorithm for solving QLSP. The algorithm can be summarized as follows. Let \( A\) have the following eigendecomposition
(230)
To simplify the analysis, we assume \( 0< \lambda_0\le \lambda_1\le …\le \lambda_{N-1}< 1\) and all eigenvalues have an exact \( d\) -bit representation. The analysis can also be generalized to the case when \( A\) has negative eigenvalues, but the interpretation of the result from QPE needs to be modified accordingly.
The matrix \( A\) can be queried using Hamiltonian simulation as \( U=e^{\mathrm{i} 2\pi A}\) . Then if the input state is already one of the eigenvectors, i.e., \(\ket{b}=\ket{v_j}\), then QPE can be applied to implement the mapping
(231)
In general, let the input state \(|b\rangle=\sum_{j} \beta_{j}\left|v_{j}\right\rangle\) be expanded using the eigendecomposition of \( A\) . Then by linearity,
(232)
Note that the unnormalized solution satisfies
(233)
so all we need to do is to use the information of the eigenvalue \( \ket{\lambda_j}\) stored in the ancilla register, and perform a controlled rotation to multiply the factor \( \lambda_j^{-1}\) to each \( \beta_j\) . To this end, we see that it is crucial to store all eigenvalues in the quantum computer coherently, as achieved by QPE. We would like to implement the following controlled rotation unitary (see (5.3.2))
(234)
where each \( \widetilde{\lambda}_{j}\) approximates \( \lambda_j\) .
Finally, performing uncomputation by applying \( U^{†}_{\mathrm{QPE}}\) , we convert the information in the ancilla register from \( \ket{\lambda_j}\) back to \( \ket{0^d}\) . Therefore the quantum circuit for the HHL algorithm is in (14).
Note that through the uncomputation, the \( d\) ancilla qubits for storing the eigenvalues also become a working register. Discarding all working registers, the resulting unitary denoted by \( U_{\mathrm{HHL}}\) satisfies
(235)
Finally, measuring the signal qubit (the only ancilla qubit left), if the outcome is \( 1\) , we obtain the (unnormalized) vector
(236)
which is stored in the system register as the normalized state
(237)
which is the desired approximate solution to QLSP. In particular, the constant \( C\) does not appear in the solution.
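The action of the HHL circuit on the post-measurement branch can be emulated classically through the eigendecomposition. The sketch below (a classical emulation, not a quantum implementation; the random matrix and the choice \( C=\lambda_0\) are illustrative) multiplies each \( \beta_j\) by \( C/\lambda_j\) and checks both the normalized solution and the success probability \( \lVert\widetilde{x}\rVert^2\) :

```python
import numpy as np

rng = np.random.default_rng(1)
n = 3
N = 2 ** n
# Random Hermitian positive definite A with spectrum inside (0, 1).
Q, _ = np.linalg.qr(rng.standard_normal((N, N)))
lams = np.sort(rng.uniform(0.1, 0.9, N))
A = Q @ np.diag(lams) @ Q.T
b = rng.standard_normal(N)
b /= np.linalg.norm(b)

C = lams[0]                  # C = lambda_0 maximizes the success probability
beta = Q.T @ b               # expansion of |b> in the eigenbasis of A
amp = (C / lams) * beta      # post-measurement (outcome 1) amplitudes
p1 = np.sum(amp ** 2)        # success probability of the signal qubit
x = Q @ amp                  # unnormalized solution C * A^{-1} b

x_cl = np.linalg.solve(A, b)
assert np.allclose(x / np.linalg.norm(x), x_cl / np.linalg.norm(x_cl))
assert p1 <= 1.0
assert np.isclose(np.linalg.norm(x) ** 2, p1)
```

Measuring \( p_1\) by repetition is exactly the norm-recovery procedure of Remark 9.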
Remark 9 (Recovering the norm of the solution)
The HHL algorithm returns the solution to QLSP in the form of a normalized state \( \ket{x}\) stored in the quantum computer. In order to recover the magnitude of the unnormalized solution \( \left\lVert \widetilde{x}\right\rVert \) , we note that the success probability of measuring the signal qubit in (235) is
(238)
Therefore, by running the circuit in (235) sufficiently many times and estimating \( p(1)\) , we can obtain an estimate of \( \left\lVert \widetilde{x}\right\rVert \) .
More general discussion of the readout problem of the HHL algorithm will be given in (12).
Proposition 3 (Controlled rotation given rotation angles)
Let \(0\le\theta<1\) have the exact \( d\) -bit fixed point representation \(\theta=.\theta_{d-1}…\theta_0\) . Then there is a \( (d+1)\) -qubit unitary \(U_{\theta}\) such that
(239)
Proof
First, note that (by e.g. Taylor expansion)
(240)
Here \( R_y(\cdot)\) performs a single-qubit rotation around the \( y\) -axis. For any \( j\in [2^d]\) with binary representation \( j=j_{d-1}… j_0\) , we have
(241)
So choose \( \tau=\pi(.j_{d-1}… j_0)\) , and define
(242)
Applying \(U_{\theta}\) to \(\ket{0}\ket{\theta}\) gives the desired results.
The quantum circuit for the controlled rotation circuit is
This is a sequence of single-qubit rotations on the signal qubit, each controlled by a single qubit.
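The additivity of \( R_y\) angles underlying this construction can be checked directly: each set bit of \( \theta\) contributes a rotation carrying that bit's weight, and the product is a single rotation by \( \pi\theta\) . A sketch (the overall angle convention of (239)-(242) may differ by a constant factor; the bit string is illustrative):

```python
import numpy as np

def Ry(phi):
    # exp(-i*phi*Y/2) as a real 2x2 matrix
    c, s = np.cos(phi / 2), np.sin(phi / 2)
    return np.array([[c, -s], [s, c]])

d = 6
bits = [1, 0, 1, 1, 0, 1]                # theta = .101101 in binary (MSB first)
theta = sum(b * 2.0 ** (-(k + 1)) for k, b in enumerate(bits))

# Each set bit triggers a controlled Ry whose angle carries that bit's weight;
# since Ry angles add, the product is a rotation by pi*theta in total.
U = np.eye(2)
for k, b in enumerate(bits):
    if b:
        U = Ry(np.pi * 2.0 ** (-(k + 1))) @ U

assert np.allclose(U, Ry(np.pi * theta))
# Acting on |0>, this prepares cos(pi*theta/2)|0> + sin(pi*theta/2)|1>.
out = U @ np.array([1.0, 0.0])
assert np.allclose(out, [np.cos(np.pi * theta / 2), np.sin(np.pi * theta / 2)])
```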
In order to use the controlled rotation operation, we need to store the information of \( \lambda_j\) in terms of an angle \( \theta_j\) . Let \( C>0\) be a lower bound to \( \lambda_0\) , so that \( 0<C/\lambda_j<1\) for all \( j\) . Define
(243)
and \( \widetilde{\theta}_j\) be its \( d'\) -bit representation. Then
(244)
Again for simplicity we assume \( d'\) is large enough so that the error of the fixed point representation is negligible in this step. The mapping
(245)
can be implemented using classical arithmetic circuits in (2.8), which may require \( \operatorname{poly}(d')\) gates and an additional working register of \( \operatorname{poly}(d')\) qubits, which are not displayed here. Therefore the entire controlled rotation operation needed for the HHL algorithm is given by the circuit in (15).
Therefore through the uncomputation \( U^{†}_{\mathrm{angle}}\) , the \( d'-d\) ancilla qubits also become a working register. Discard the working register, and we obtain a unitary \( U_{\mathrm{CR}}\) satisfying
(246)
This is the unitary used in the HHL circuit in (14).
Although the choice of the constant \( C\) does not appear in the normalized quantum state, it does directly affect the success probability. From (235) we immediately obtain the success probability for measuring the signal qubit with outcome \( 1\) is the square of the norm of the unnormalized solution
(247)
Therefore the success probability is determined by the choice of the constant \( C\) .
To maximize the success probability, \( C\) should be chosen to be as large as possible (without exceeding \( \lambda_0\) ). So assuming the exact knowledge of \( \lambda_0\) , we can choose \( C=\lambda_0\) . For a Hermitian positive definite matrix \( A\) , \( \left\lVert A\right\rVert =\lambda_{N-1}\) , and \( \left\lVert A^{-1}\right\rVert =\lambda_0^{-1}\) . For simplicity, assume the largest eigenvalue of \( A\) is \( \lambda_{N-1}=1\) . Then the condition number of \( A\) is
(248)
Furthermore,
(249)
Therefore
(250)
In other words, in the worst case, we need to repeatedly run the HHL algorithm for \( \mathcal{O}(\kappa^2)\) times to obtain the outcome \( 1\) in the signal qubit.
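The two extreme cases can be made concrete: if \( \ket{b}\) is the eigenvector of the largest (respectively smallest) eigenvalue, the success probability is \( \kappa^{-2}\) (respectively \( \Theta(1)\) ). A small numeric illustration with an assumed spectrum:

```python
import numpy as np

kappa = 50.0
lam_min, lam_max = 1 / kappa, 1.0   # assumed spectrum edges (lam_max = 1)
C = lam_min                          # optimal choice C = lambda_0

p1_worst = (C / lam_max) ** 2        # |b> = eigenvector of the largest eigenvalue
p1_best = (C / lam_min) ** 2         # |b> = eigenvector of the smallest eigenvalue

assert np.isclose(p1_worst, kappa ** -2)   # worst case: O(kappa^2) repetitions
assert np.isclose(p1_best, 1.0)            # best case: O(1) repetitions
```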
Assuming the number of system qubits \( n\) is large, the circuit depth and the gate complexity of \( U_{\mathrm{HHL}}\) is mainly determined by those of \( U_{\mathrm{QPE}}\) . Therefore we can measure the complexity of the HHL algorithm in terms of the number of queries to \( U=e^{\mathrm{i} 2\pi A}\) . In order to solve QLSP to precision \( \epsilon\) , we need to estimate the eigenvalues to multiplicative accuracy \( \epsilon\) instead of the standard additive accuracy.
To see why this is the case, assume \( \widetilde{\lambda}_j=\lambda_j(1+e_j)\) and \( \left\lvert e_j\right\rvert \le \frac{\epsilon}{4}\le \frac{1}{2}\) . Then the unnormalized solution satisfies
(251)
Hence
(252)
Then the normalized solution satisfies
(253)
This discussion requires QPE to be run to additive precision \( \epsilon'=\lambda_0 \epsilon=\epsilon/\kappa\) . Therefore the query complexity of QPE is \( \mathcal{O}(\kappa/\epsilon)\) . Counting the number of times needed to repeat the HHL circuit, the worst case query complexity of the HHL algorithm is \( \mathcal{O}(\kappa^3/\epsilon)\) .
The above analysis is a worst-case analysis, because we assume \( p(1)\) attains the lower bound \( \Omega(\kappa^{-2})\) . In practical applications, the result may not be so pessimistic. For instance, if \( \beta_j\) concentrates around the smallest eigenvalues of \( A\) , then we may have \( \left\lVert x\right\rVert \sim \Theta(\lambda_0^{-1})=\Theta(\kappa)\) . Then \( p(1)=\Theta(1)\) . In such a case, we only need to repeat the HHL algorithm a constant number of times to yield outcome \( 1\) in the ancilla qubit. This does not reduce the query complexity of each run of the algorithm. Then in this best case, the query complexity is \( \mathcal{O}(\kappa/\epsilon)\) .
Below we discuss a few more aspects of the HHL algorithm. The first observation is that the asymptotic worst-case complexity of the HHL algorithm can be generally improved using amplitude amplification.
Remark 10 (HHL with amplitude amplification)
We may view (235) as
(254)
Since \( \ket{\psi_{\mathrm{good}}}\) is marked by a single signal qubit, we may use (16) to construct a reflection operator with respect to the signal qubit. This is simply given by
(255)
The reflection with respect to the initial vector is
(256)
where \( U_{\psi_0}=U_{\mathrm{HHL}}(I_1\otimes U_b)\) . Let \( G=R_{\psi_0}R_{\mathrm{good}}\) be the Grover iterate. Then amplitude amplification allows us to apply \( G\) for \( \Theta(p(1)^{-\frac12})\) times to boost the probability of obtaining \( \ket{\psi_{\mathrm{good}}}\) to a constant. Therefore in the worst case when \( p(1)=\Theta(\kappa^{-2})\) , the number of repetitions is reduced to \( \mathcal{O}(\kappa)\) , and the total runtime is \( \mathcal{O}(\kappa^2/\epsilon)\) . This is the commonly quoted query complexity for the HHL algorithm. Note that, as usual, the tradeoff of amplitude amplification is an increased circuit depth: the depth grows from \( \mathcal{O}(\kappa/\epsilon)\) to \( \mathcal{O}(\kappa^2/\epsilon)\) .
So far our analysis, especially that based on QPE, relies on the assumption that all eigenvalues \( \lambda_j\) have an exact \( d\) -bit representation. From the discussion in (4.5), we know that such an assumption is unrealistic and causes theoretical and practical difficulties. The full analysis of the HHL algorithm is thus more involved. We refer to e.g. [13] for more details.
Remark 11 (Comparison with classical iterative linear system solvers)
Let us now compare the cost of the HHL algorithm to that of classical iterative algorithms. If \( A\) is \( n\) -qubit Hermitian positive definite with condition number \( \kappa\) , and is \( d\) -sparse (i.e., each row/column of \( A\) has at most \( d\) nonzero entries), then each matrix vector multiplication \( Ax\) costs \( \mathcal{O}(dN)\) floating point operations. The number of iterations for the steepest descent (SD) algorithm is \( \mathcal{O}(\kappa\log \epsilon^{-1})\) , and this number can be significantly reduced to \( \mathcal{O}(\sqrt{\kappa}\log \epsilon^{-1})\) by the renowned conjugate gradient (CG) method. Therefore the total cost (or wall clock time) of SD and CG is \( \mathcal{O}(dN\kappa\log \epsilon^{-1})\) and \( \mathcal{O}(dN\sqrt{\kappa}\log \epsilon^{-1})\) , respectively.
On the other hand, the query complexity of the HHL algorithm, even after using the AA algorithm, is still \( \mathcal{O}(\kappa^2/\epsilon)\) . Such a performance is terrible in terms of both \( \kappa\) and \( \epsilon\) . Hence the power of the HHL algorithm, and of other QLSP solvers, rests on the premise that each application of \( A\) (in this case, via the unitary \( U\) ) is much faster. In particular, if \( U\) can be implemented with \( \operatorname{poly}(n)\) gate complexity (also can be measured by the wall clock time), then the total gate complexity of the HHL algorithm (with AA) is \( \mathcal{O}(\operatorname{poly}(n)\kappa^2/\epsilon)\) . When \( n\) is large enough, we expect that \( \operatorname{poly}(n)\ll N=2^n\) and the HHL algorithm would eventually yield an advantage. Nonetheless, for a realistic problem, the assumption that \( U\) can be implemented with \( \operatorname{poly}(n)\) cost, while no classical algorithm can implement \( Ax\) with \( \operatorname{poly}(n)\) cost, should be taken with a grain of salt and carefully examined.
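The iteration counts quoted above are easy to reproduce. Below is a minimal numpy sketch of steepest descent and conjugate gradient on the 1D discrete Laplacian (an illustrative SPD test matrix; sizes and tolerances are arbitrary):

```python
import numpy as np

N = 64
# 1D discrete Laplacian (tridiagonal, SPD), a standard ill-conditioned test matrix.
A = 2 * np.eye(N) - np.eye(N, k=1) - np.eye(N, k=-1)
b = np.ones(N) / np.sqrt(N)

def steepest_descent(A, b, tol=1e-8, maxiter=200000):
    x = np.zeros_like(b)
    r = b - A @ x
    it = 0
    while np.linalg.norm(r) > tol and it < maxiter:
        alpha = (r @ r) / (r @ (A @ r))   # exact line search along the residual
        x += alpha * r
        r = b - A @ x
        it += 1
    return x, it

def conjugate_gradient(A, b, tol=1e-8, maxiter=200000):
    x = np.zeros_like(b)
    r = b - A @ x
    p = r.copy()
    rs = r @ r
    it = 0
    while np.sqrt(rs) > tol and it < maxiter:
        Ap = A @ p
        alpha = rs / (p @ Ap)
        x += alpha * p
        r -= alpha * Ap
        rs_new = r @ r
        p = r + (rs_new / rs) * p         # A-conjugate search directions
        rs = rs_new
        it += 1
    return x, it

x_sd, it_sd = steepest_descent(A, b)
x_cg, it_cg = conjugate_gradient(A, b)
assert np.linalg.norm(A @ x_cg - b) < 1e-7
assert it_cg < it_sd   # CG: ~sqrt(kappa) iterations vs ~kappa for SD
```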
Remark 12 (Readout problem of QLSP)
By solving the QLSP, the solution is stored as a quantum state in the quantum computer. Sometimes the QLSP is only a subproblem of a larger application, so it is sufficient to treat the HHL algorithm (or other QLSP solvers) as a “quantum subroutine”, and leave \( \ket{x}\) in the quantum computer. However, in many applications (such as the solution of Poisson’s equation in (5.4)), the goal is to solve the linear system. Then the information in \( \ket{x}\) must be converted to a measurable classical output.
The most common case is to compute the expectation of some observable \( \braket{O}=\braket{x|O|x}\approx \braket{\widetilde{x}|O|\widetilde{x}}\) . Assuming \( \braket{O}=\Theta(1)\) , to reach additive precision \( \epsilon\) of the observable, the number of samples needed is \( \mathcal{O}(\epsilon^{-2})\) . On the other hand, in order to reach precision \( \epsilon\) , the solution vector \( \ket{\widetilde{x}}\) must be solved to precision \( \epsilon\) . Assuming the worst case analysis for the HHL algorithm, the total query complexity needed is
(257)
Remark 13 (Query complexity lower bound)
The cost of a quantum algorithm for solving a generic QLSP scales at least as \( \Omega(\kappa(A))\) , where \( \kappa(A):=\left\lVert A\right\rVert \left\lVert A^{-1}\right\rVert \) is the condition number of \( A\) . The proof is based on converting the QLSP into a Hamiltonian simulation problem, and the lower bound with respect to \( \kappa\) is proved via the “no-fast-forwarding” theorem for simulating generic Hamiltonians [12]. Nonetheless, for specific classes of Hamiltonians, it may still be possible to develop fast algorithms to overcome this lower bound.
As an application of the HHL algorithm, let us consider a toy problem of solving Poisson’s equation in one dimension with Dirichlet boundary conditions
(258)
We use the central finite difference formula to discretize the Laplacian operator on a uniform grid \( r_i=(i+1)h,i\in[N]\) and \( h=1/(N+1)\) . Let \( u_i=u(r_i), b_i=b(r_i)\) , then Poisson’s equation is discretized into a linear system of equations \( Au=b\) , with a tridiagonal matrix
(259)
Our goal here is not to address the quality of the spatial discretization, but to study the cost of solving the linear system. To this end we need to compute the condition number of \( A\) . We also assume the right-hand side vector \( b\) is already normalized so that \( \ket{b}=b\) .
Proposition 4 (Diagonalization of tridiagonal matrices)
A Hermitian Toeplitz tridiagonal matrix
(260)
can be analytically diagonalized as
(261)
where \( (v_k)_j=v_{j,k}, j=1,…,N\) , \( b=\left\lvert b\right\rvert e^{\mathrm{i} \theta}\) , and
(262)
Proof
Note that formally \( v_{0,k}=v_{N+1,k}=0\) . Then direct matrix vector multiplication shows that for any \( j=1,…,N\) ,
(263)
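The eigenpair formula of the proposition can be verified numerically for the real symmetric case \( b<0\) relevant to (259) (when \( b\) is real, the phase \( \theta\) plays no role):

```python
import numpy as np

N = 16
h = 1.0 / (N + 1)
a, b = 2 / h ** 2, -1 / h ** 2
A = a * np.eye(N) + b * (np.eye(N, k=1) + np.eye(N, k=-1))

j = np.arange(1, N + 1)
# Analytic eigenpairs of the Toeplitz tridiagonal matrix:
# lambda_k = a + 2 b cos(k pi / (N+1)),  v_{j,k} = sin(j k pi / (N+1)).
lam = a + 2 * b * np.cos(j * np.pi / (N + 1))
V = np.sin(np.outer(j, j) * np.pi / (N + 1))

for k in range(N):
    assert np.allclose(A @ V[:, k], lam[k] * V[:, k])

# Extreme eigenvalues match the estimates lam_max ~ 4/h^2, lam_min ~ pi^2.
assert abs(lam.max() - 4 / h ** 2) / (4 / h ** 2) < 0.1
assert abs(lam.min() - np.pi ** 2) / np.pi ** 2 < 0.1
```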
Using (4) with \( a=2/h^2,b=-1/h^2\) , the largest eigenvalue of \( A\) is \( \lambda_{\max}=\left\lVert A\right\rVert \approx 4/h^2\) , and the smallest eigenvalue \( \lambda_{\min}=\left\lVert A^{-1}\right\rVert ^{-1}\approx \pi^2\) . So
(264)
The circuit depth of the HHL algorithm is \( \mathcal{O}(N^2/\epsilon)\) , and the worst case query complexity (using AA) is \( \mathcal{O}(N^4/\epsilon)\) . So when \( N\) is large, there is little benefit in employing the quantum computer to solve this problem.
Let us now consider solving a \( d\) -dimensional Poisson’s equation with Dirichlet boundary conditions
(265)
The grid is the Cartesian product of the uniform grid in 1D with \( N\) grid points per dimension and \( h=1/(N+1)\) . The total number of grid points is \( \mathcal{N}=N^d\) . After discretization, we obtain a linear system \( \mathcal{A}u=b\) , where
(266)
where \( I\) is an identity matrix of size \( N\) . Since \( \mathcal{A}\) is Hermitian and positive definite, we have \( \left\lVert \mathcal{A}\right\rVert \approx 4d/h^2\) , and \( \left\lVert \mathcal{A}^{-1}\right\rVert ^{-1}\approx d\pi^2\) . So \( \kappa(\mathcal{A})\approx 4/(h^2\pi^2)\approx \kappa(A)\) . Therefore the condition number is independent of the spatial dimension \( d\) .
The worst case query complexity of the HHL algorithm is \( \mathcal{O}(N^2/\epsilon)\) . So when the number of grid points per dimension \( N\) is fixed and the spatial dimension \( d\) increases, and if \( U=e^{\mathrm{i} \mathcal{A}\tau}\) can be implemented efficiently for some \( \tau\left\lVert \mathcal{A}\right\rVert <1\) with \( \operatorname{poly}(d)\) cost, then the HHL algorithm will have an advantage over classical solvers, for which each matrix-vector multiplication scales linearly with respect to \( \mathcal{N}\) and is therefore exponential in \( d\) .
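The \( d\) -independence of the condition number follows from the Kronecker-sum structure of (266): every eigenvalue of \( \mathcal{A}\) is a sum of \( d\) one-dimensional eigenvalues, so both spectral extremes scale linearly in \( d\) . A numeric check (the overall \( 1/h^2\) prefactor cancels in \( \kappa\) and is omitted):

```python
import numpy as np

def laplacian_1d(N):
    # 1D discrete Laplacian, up to the 1/h^2 prefactor
    return 2 * np.eye(N) - np.eye(N, k=1) - np.eye(N, k=-1)

def laplacian_dd(N, d):
    # Kronecker-sum construction of (266): sum over dims of I x ... x A x ... x I
    A = laplacian_1d(N)
    total = np.zeros((N ** d, N ** d))
    for i in range(d):
        term = np.array([[1.0]])
        for j in range(d):
            term = np.kron(term, A if j == i else np.eye(N))
        total += term
    return total

N = 6
kappas = []
for d in [1, 2, 3]:
    w = np.linalg.eigvalsh(laplacian_dd(N, d))
    kappas.append(w[-1] / w[0])

# Eigenvalues of the Kronecker sum are sums of 1D eigenvalues, so both the
# largest and smallest eigenvalues scale linearly in d: kappa is d-independent.
assert np.allclose(kappas, kappas[0])
assert kappas[0] > 1
```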
Consider the solution of a time-dependent linear differential equation
(267)
Here \( b(t),x(t)\in \mathbb{C}^d\) and \( A(t)\in \mathbb{C}^{d\times d}\) . (267) can be used to represent a very large class of ordinary differential equations (ODEs) and spatially discretized partial differential equations (PDEs). For instance, if \( A(t)=-\mathrm{i} H(t)\) for some Hermitian matrix \( H(t)\) and \( b(t)=0\) , this is the time-dependent Schrödinger equation
(268)
A special case when \( H(t)\equiv H\) is often called the Hamiltonian simulation problem, and the solution can be written as
(269)
which can be viewed as the problem of evaluating the matrix function \( e^{-\mathrm{i} HT}\) . This will be discussed separately in later chapters.
In this section we consider the general case of (267), and for simplicity discretize the equation in time using the forward Euler method with a uniform grid \( t_k=k\Delta t\) where \( \Delta t=T/N,k=0,…,N\) . Let \( A_k=A(t_k), b_k=b(t_k)\) . The resulting discretized system becomes
(270)
which can be rewritten as
(271)
or more compactly as a linear systems of equations
(272)
Here \( I\) is the identity matrix of size \( d\) , and \( \mathbf{x}\in\mathbb{C}^{Nd}\) encodes the entire history of the states.
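For the scalar case, the forward-Euler system can be assembled and checked against direct time stepping. The sketch below uses one concrete layout of the unknowns \( (x_0,\dots,x_N)\) ; the block layout of (271) may order things slightly differently:

```python
import numpy as np

# Scalar (d = 1) illustration: assemble the forward-Euler linear system and
# check that solving it reproduces x_{k+1} = (1 + dt*a) x_k + dt*b_k.
a = -0.8
T, Nt = 1.0, 50
dt = T / Nt
x0 = 1.0
bt = lambda t: np.cos(t)          # illustrative inhomogeneity b(t)

xi = 1 + dt * a
# Unknowns (x_0, x_1, ..., x_Nt): identity on the diagonal, -xi below it.
M = np.eye(Nt + 1) - xi * np.eye(Nt + 1, k=-1)
rhs = np.array([x0] + [dt * bt(k * dt) for k in range(Nt)])

x_lin = np.linalg.solve(M, rhs)

# Reference: explicit forward Euler loop.
x_ref = [x0]
for k in range(Nt):
    x_ref.append(xi * x_ref[-1] + dt * bt(k * dt))

assert np.allclose(x_lin, x_ref)
```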
To solve (271) as a QLSP, the right hand side \( \mathbf{b}\) needs to be a normalized vector. This means we need to properly normalize \( x_0,b_k\) so that
(273)
In the limit \( \Delta t \to 0\) , this can be written as
(274)
which is not difficult to satisfy as long as \( x_0,b(t)\) can be prepared efficiently using unitary circuits and \( \left\lVert x_0\right\rVert ,\left\lVert b(t)\right\rVert =\Theta(1)\) .
To solve (271) using the HHL algorithm, we need to estimate the condition number of \( \mathcal{A}\) . Note that \( \mathcal{A}\) is a block-bidiagonal matrix and in particular is not Hermitian. So we need to use the dilation method in (226) and solve the corresponding Hermitian problem.
In order to estimate the condition number, for simplicity we first assume \( d=1\) (i.e., this is a scalar ODE problem), and \( A(t)\equiv a\in\mathbb{C}\) is a constant. Then
(275)
Let \( \xi=1+\Delta t a\) . When \( \operatorname{Re} a\le 0\) , the absolute stability condition of the forward Euler method requires \( \left\lvert 1+\Delta t a\right\rvert =\left\lvert \xi\right\rvert <1\) . In general we are interested in the regime \( \Delta t \left\lvert a\right\rvert \ll 1\) , and in particular \( \left\lvert 1+\Delta t a\right\rvert =\left\lvert \xi\right\rvert <2\) .
Proposition 5
For any \( A\in \mathbb{C}^{N\times N}\) , \( \left\lVert A\right\rVert ^2,(1/\left\lVert A^{-1}\right\rVert )^2\) are given by the largest and smallest eigenvalue of \( A^{†}A\) , respectively.
From (5), we need to first compute
(276)
Theorem 2 (Gershgorin circle theorem, see e.g. [14, Theorem 7.2.1])
Let \( A\in\mathbb{C}^{N\times N}\) with entries \( a_{ij}\) . For each \( i=1,…,N\) , define
(277)
Let \( D(a_{ii},R_i)\subseteq \mathbb{C}\) be a closed disc centered at \( a_{ii}\) with radius \( R_i\) , which is called a Gershgorin disc. Then every eigenvalue of \( A\) lies within at least one of the Gershgorin discs \( D(a_{ii},R_i)\) .
Since \( \mathcal{A}^{†}\mathcal{A}\) is Hermitian, we can restrict the Gershgorin discs to the real line so that \( D(a_{ii},R_i)\subseteq \mathbb{R}\) . Then Gershgorin discs of the matrix \( \mathcal{A}^{†}\mathcal{A}\) satisfy the bound
(278)
Applying (2) we have
(279)
for all values of \( a\) such that \( \left\lvert \xi\right\rvert <2\) .
To obtain a meaningful lower bound of \( \lambda_{\min}(\mathcal{A}^{†}\mathcal{A})\) , we need \( \operatorname{Re} a<0\) and hence \( \left\lvert \xi\right\rvert <1\) . Using the inequality
(280)
we have
(281)
when
(282)
is satisfied. Then we have
(283)
Then according to (2), under the condition (282),
(284)
Therefore
(285)
and
(286)
As a result, when \( (-\operatorname{Re} a)=\Theta(1)\) , the condition number satisfies
(287)
In summary, for the scalar problem \( d=1\) and \( \operatorname{Re} a<0\) , the query complexity of the HHL algorithm is \( \mathcal{O}((\Delta t)^{-2}\epsilon^{-2})\) .
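The scaling \( \kappa(\mathcal{A})=\mathcal{O}((\Delta t)^{-1})\) , and the upper bound obtained from the analysis above, can be observed numerically for the scalar bidiagonal matrix (275):

```python
import numpy as np

def euler_matrix(a, T, Nt):
    # Scalar forward-Euler system matrix: 1 on the diagonal, -xi below it.
    dt = T / Nt
    xi = 1 + dt * a
    return np.eye(Nt + 1) - xi * np.eye(Nt + 1, k=-1), dt

a, T = -1.0, 1.0
conds = []
for Nt in [100, 200, 400]:
    M, dt = euler_matrix(a, T, Nt)
    conds.append(np.linalg.cond(M))
    # Upper bound from the text's analysis: kappa <= 3 * 2 / (dt * (-Re a)).
    assert conds[-1] < 6 / (dt * (-a))

# Halving dt roughly doubles the condition number: kappa = O(1/dt).
r1 = conds[1] / conds[0]
r2 = conds[2] / conds[1]
assert 1.4 < r1 < 2.6 and 1.4 < r2 < 2.6
```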
Example 31 (Growth of the condition number when \( \operatorname{Re} a\ge0\))
The Gershgorin circle theorem does not provide a meaningful bound of the condition number when \( \operatorname{Re} a>0\) and \( 1<\left\lvert \xi\right\rvert <2\) . This behavior is correct. To see this, just consider \( a=1,b=0\) , and the solution should grow exponentially as \( x(T)=e^{T}\) . If \( \kappa(\mathcal{A})=\mathcal{O}\left((\Delta t) ^{-1}\right)\) holds and in particular is independent of the final time \( T\) , then the norm of the solution can only grow polynomially in \( T\) , which is a contradiction. See (16) for an illustration. Note that when \( a=-1.0\) , \( \Delta t=0.1\) , the condition number is less than \( 20\) , which is smaller than the upper bound above, i.e., \( 3\times\frac{2}{\Delta t (-a)}=60\) .
Remark 14
It may be tempting to modify the \( (N,N)\) -th entry of \( \mathcal{A}\) to be \( 1+\left\lvert \xi\right\rvert ^2\) to obtain
(288)
Here \( \mathcal{G}\) is a Toeplitz tridiagonal matrix satisfying the requirement of (4). The eigenvalues of \( \mathcal{G}\) take the form
(289)
If we take the approximation \( \lambda_{\min}(\mathcal{A}^{†}\mathcal{A})\approx \lambda_{\min}(\mathcal{G})\) , we would find that (287) holds even when \( \operatorname{Re} a>0\) . This behavior is however incorrect, even though the matrices \( \mathcal{A}^{†}\mathcal{A}\) and \( \mathcal{G}\) differ only by a single entry!
Here we consider a general \( d>1\) , but for simplicity assume \( A(t)\equiv A\in\mathbb{C}^{d\times d}\) is a constant matrix. We also assume \( A\) is diagonalizable with eigenvalue decomposition
(290)
and \( \Lambda=\operatorname{diag}(\lambda_1,…,\lambda_d)\) . We only consider the case \( \operatorname{Re} \lambda_k< 0\) for all \( k\) .
Proposition 6
For any diagonalizable \( A\in\mathbb{C}^{N\times N}\) with eigenvalue decomposition \( Av_k=\lambda_k v_k\) , we have
(291)
Proof
Use the Schur form \( A=QTQ^{†}\) , where \( Q\) is a unitary matrix and \( T\) is an upper triangular matrix (see e.g. [14, Theorem 7.13]). The diagonal entries of \( T\) encode all eigenvalues of \( A\) , and the eigenvalues can appear in any order along the diagonal of \( T\) . The proposition follows by arranging the eigenvalue of \( A\) with the smallest or largest absolute value, respectively, to the \( (N,N)\) -th entry.
The absolute stability condition of the forward Euler method requires \( \Delta t \left\lVert A\right\rVert <1\) , and we are interested in the regime \( \Delta t \left\lVert A\right\rVert \ll 1\) . Therefore \( \Delta t \left\lvert \lambda_k\right\rvert \ll 1\) for all \( k\) .
Let \( I\) be an identity matrix of size \( d\) , and denote by \( B=-(I+\Delta t A)\) , then
(292)
Note that
(293)
For any \( \mathbf{x}\in\mathbb{C}^{Nd}\) ,
(294)
So \( \lambda_{\max}(\mathcal{A}^{†}\mathcal{A})\le 15\) , and
(295)
To bound \( \left\lVert \mathcal{A}^{-1}\right\rVert \) , we first note that from the eigenvalue decomposition of \( A\) , we have
(296)
Hence
(297)
Here \( \kappa(V)=\left\lVert V\right\rVert \left\lVert V^{-1}\right\rVert \) is the condition number of the eigenvector matrix
(298)
This reduces the problem to the scalar case. Let \( \xi_k=1+\Delta t \lambda_k\) , and assume
(299)
Then from (286) we have
(300)
So if \( \min_{k}(-\operatorname{Re} \lambda_k)=\Theta(1)\) , we have
(301)
and the cost of the HHL algorithm is \( \mathcal{O}((\Delta t)^{-2}\epsilon^{-2}\kappa(V))\) . Hence compared to the scalar case, the condition number of the eigenvector matrix \( V\) can play an important role.
The solution of (271) means that the normalized state \( \ket{\mathbf{x}}\) is computed to precision \( \epsilon\) and stored in the quantum computer. In order to evaluate observables at the final time \( T\) , i.e., \( \braket{x(T)|O|x(T)}\) , we find that by the normalization condition, \( \left\lVert x(T)\right\rVert \) is on average \( \mathcal{O}(N^{-\frac12})=\mathcal{O}((\Delta t)^{\frac12})\) , and \( \braket{x(T)|O|x(T)}=\mathcal{O}(\Delta t)\) . Therefore instead of reaching accuracy \( \epsilon\) , the Monte Carlo procedure must reach precision \( \mathcal{O}(\epsilon\Delta t)\) . This increases the number of samples by another factor of \( \mathcal{O}((\Delta t)^{-2})\) .
There is however a simple way to overcome this problem. Instead of solving (271), we can redefine \( \mathbf{x}\) by artificially padding the vector with \( N\) copies of the final state \( x_N\) . This can be reinterpreted as
(302)
with the unnormalized vector
(303)
and the corresponding linear systems of equation becomes
(304)
Note that after solving the equation, reading out the solution vector only requires one ancilla qubit. The condition number of this modified equation is still \( \kappa=\mathcal{O}((\Delta t)^{-1})\) , so the total query complexity is \( \mathcal{O}((\Delta t)^{-2}\epsilon^{-2})\) . By solving the modified equation (304) to precision \( \epsilon\) , we can estimate \( \braket{x(T)|O|x(T)}\) by
(305)
of which the magnitude does not scale with \( \Delta t\) . Here \( I\) is the identity matrix of size \( N\) , and \( \mathbf{y}\) can be obtained by measuring the ancilla qubit and obtaining the outcome \( 1\) . If \( \left\lVert x(t)\right\rVert \) is comparable for all \( t\in[0,T]\) , then the success probability of obtaining \( \ket{\mathbf{X}}\) will be \( \Theta(1)\) .
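A concrete realization of the padding trick (one possible layout; (304) may arrange the blocks differently) appends identity-propagation rows after the Euler rows, so that the final state occupies a \( \Theta(1)\) fraction of the squared norm of the history vector:

```python
import numpy as np

# Scalar illustration with b = 0: Nt Euler rows, then Nt padding rows x_{k+1} = x_k.
a, T, Nt = -1.0, 1.0, 40
dt = T / Nt
xi = 1 + dt * a
x0 = 1.0

m = 2 * Nt + 1
M = np.eye(m)
for k in range(1, m):
    M[k, k - 1] = -xi if k <= Nt else -1.0   # propagation, then identity padding
rhs = np.zeros(m)
rhs[0] = x0

x = np.linalg.solve(M, rhs)
xN = xi ** Nt * x0                            # forward Euler final state

assert np.allclose(x[Nt:], xN)                # all padded entries equal x_N

# The final state's share of the squared norm is Theta(1) (not O(dt)),
# much larger than without padding, where it would be ~xN^2 / sum(x_k^2).
frac_padded = np.sum(x[Nt:] ** 2) / np.sum(x ** 2)
frac_unpadded = xN ** 2 / np.sum(x[: Nt + 1] ** 2)
assert frac_padded > 0.1
assert frac_padded > 10 * frac_unpadded
```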
As an application of the differential equation solver in (5.5), let us consider a toy problem of solving the heat equation in one dimension with Dirichlet boundary conditions
(306)
After spatial discretization using the central finite difference method with \( N\) grid points, this becomes a linear ODE system
(307)
where \( A\in \mathbb{R}^{N\times N}\) is a tridiagonal matrix given by (259). After applying the forward Euler method and discretizing the simulation time \( T\) into \( L\) intervals with \( \Delta t=T/L\) , we obtain a linear system of size \( NL\) . The eigenvalues of \( -A\) , denoted by \( \lambda_k\) , are all negative and satisfy
(308)
The absolute stability condition requires \( \left\lvert 1+\Delta t \lambda_k\right\rvert <1\) , or \( \Delta t<h^2/4=\mathcal{O}(N^{-2})\) , which implies \( L=\mathcal{O}(N^2)\) . Since \( A\) is Hermitian, we have \( \kappa(V)=1\) . So for \( T=\mathcal{O}(1)\) , the query complexity for solving the heat equation is \( \mathcal{O}(N^2\epsilon^{-2})\) , which is the same as solving Poisson’s equation.
Again, the potential advantage of the quantum solver only appears when solving the \( d\) -dimensional heat equation
(309)
This can be written as a linear system of equations
(310)
where \( \mathcal{A}\) is given in (266). The eigenvalues of \( \mathcal{A}\) are all negative. Note that \( \left\lVert \mathcal{A}\right\rVert =\Theta(dN^2)\) , then \( \Delta t=\mathcal{O}(d^{-1}N^{-2})\) , and the query complexity of the HHL solver is \( \mathcal{O}(dN^2 \epsilon^{-2})\) . This could potentially have an exponential advantage over classical solvers.
Exercise 21 (Quantum counting)
Given query access to a function \( f:\{0,1\}^N \rightarrow \{0,1\}\) , design a quantum algorithm that computes the size of its kernel, i.e., the total number of \( x\) ’s that satisfy \( f(x)=1\) .
Exercise 22
Consider the initial value problem of the linear differential equation (267).
Construct the linear system of equations analogous to (271) using the backward Euler method.
The Hamiltonian simulation problem with a time-independent Hamiltonian, or the Hamiltonian simulation problem for short is the following problem: given an initial state \( \ket{\psi_0}\) and a Hamiltonian \( H\) , evaluate the quantum state at time \( t\) according to \( \ket{\psi(t)}=e^{-\mathrm{i} t H}\ket{\psi_0}\) . Hamiltonian simulation is of immense importance in characterizing quantum dynamics for a diverse range of systems and situations in quantum physics, chemistry and materials science. Simulation of one quantum Hamiltonian by another quantum system was also one of the motivations of Feynman’s 1982 proposal for design of quantum computers [15]. We have also seen that Hamiltonian simulation appears as a quantum subroutine in numerous other quantum algorithms, such as QPE and its various applications.
The Hamiltonian simulation problem can also be viewed as a linear ODE:
(311)
However, thanks to the unitarity of the operator \( e^{-\mathrm{i} t H}\) for any \( t\) , we do not need to store the full history of the quantum states as in (5.5), and can instead focus on the quantum state at time \( t\) of interest.
Following the conceptualization of a universal quantum simulator using a Trotter decomposition of the time evolution operator \( e^{-\mathrm{i} tH}\) [16], many new quantum algorithms for Hamiltonian simulation have been proposed. We will discuss some more advanced methods in later chapters. This chapter focuses on the Trotter based Hamiltonian simulation method (also called the product formula).
Consider the Hamiltonian simulation problem for \( H=H_1+H_2\) , where \( e^{-\mathrm{i} H_1 \Delta t}\) and \( e^{-\mathrm{i} H_2 \Delta t}\) can be efficiently computed at least for some \( \Delta t\) . In general \( [H_1,H_2]\ne 0\) , and the splitting of the evolution of \( H_1,H_2\) needs to be implemented via the Lie product formula
(312)
Taking \( L\) to be a finite number and letting \( \Delta t=t/L\) , this gives the simplest first order Trotter method with
(313)
Therefore to perform Hamiltonian simulation to time \( t\) , the error is
(314)
So to reach precision \( \epsilon\) in the operator norm, we need
(315)
This can be improved to the second order Trotter method (also called the symmetric Trotter splitting, or Strang splitting)
(316)
Following a similar analysis to the first order method, we find that to reach precision \( \epsilon\) we need
(317)
Higher order Trotter methods are also available, such as the \( p\) -th order Suzuki formula, whose local truncation error is \( \mathcal{O}((\Delta t)^{p+1})\) . Therefore to reach precision \( \epsilon\) , we need
(318)
This is often written as \( L=\mathcal{O}(t^{1+o(1)}\epsilon^{-o(1)})\) as \( p\to \infty\) .
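The order of accuracy can be checked numerically. The following sketch (in Python with numpy; \( H_1,H_2\) are random Hermitian matrices chosen purely for illustration) verifies that doubling \( L\) roughly halves the first order Trotter error and quarters the second order error.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 8

def rand_herm():
    B = rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))
    return (B + B.conj().T) / 2

def U_exp(H, s):
    # e^{-i s H} for Hermitian H via eigendecomposition
    w, V = np.linalg.eigh(H)
    return (V * np.exp(-1j * s * w)) @ V.conj().T

H1, H2 = rand_herm(), rand_herm()
t = 1.0
U = U_exp(H1 + H2, t)

def trotter_error(L, order):
    dt = t / L
    if order == 1:
        step = U_exp(H1, dt) @ U_exp(H2, dt)
    else:  # second order (Strang) splitting
        step = U_exp(H1, dt / 2) @ U_exp(H2, dt) @ U_exp(H1, dt / 2)
    return np.linalg.norm(np.linalg.matrix_power(step, L) - U, 2)

# Doubling L should roughly halve the first-order error and quarter
# the second-order error.
r1 = trotter_error(64, 1) / trotter_error(128, 1)
r2 = trotter_error(64, 2) / trotter_error(128, 2)
print(r1, r2)  # close to 2 and 4, respectively
```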
Example 32 (Simulating transverse field Ising model)
For the one dimensional transverse field Ising model (TFIM) with nearest neighbor interaction in (195), since all Pauli-\( Z_i\) operators commute, we have
(319)
Each \( e^{\mathrm{i} t Z_i Z_{i+1}}\) is a rotation involving only the qubits \( i,i+1\) , and the splitting has no error. Similarly
(320)
and each \( e^{\mathrm{i} t g X_i}\) can be implemented independently without error.
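Since all the \( Z_iZ_{i+1}\) terms are diagonal in the computational basis, the splitting within the \( ZZ\) part is exact. A small numpy sketch (for \( n=3\) qubits with open boundary, a minimal instance chosen for illustration) confirms this to machine precision.

```python
import numpy as np

I2 = np.eye(2)
Z = np.diag([1.0, -1.0])

def kron_all(ops):
    # Tensor product of a list of single-qubit operators
    out = np.array([[1.0]])
    for op in ops:
        out = np.kron(out, op)
    return out

n = 3
# Nearest-neighbor ZZ terms on an open chain: Z0 Z1 and Z1 Z2
zz_terms = [kron_all([Z if k in (i, i + 1) else I2 for k in range(n)])
            for i in range(n - 1)]
t = 0.7
H1 = sum(zz_terms)

# All terms are diagonal, so exponentiate the diagonals directly.
exact = np.diag(np.exp(-1j * t * np.diag(H1)))
split = np.eye(2 ** n, dtype=complex)
for term in zz_terms:
    split = split @ np.diag(np.exp(-1j * t * np.diag(term)))

err = np.linalg.norm(exact - split)
print(err)  # zero up to roundoff: the splitting of commuting terms is exact
```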
Example 33 (Particle in a potential)
Let \( H=-\Delta_{\mathbf{r}}+V(\mathbf{r})=H_1+H_2\) be the Hamiltonian of a particle in a potential field \( V(\mathbf{r})\) , where \( \mathbf{r}\in \Omega=[0,1]^d\) with periodic boundary conditions. After discretization using Fourier modes, \( e^{\mathrm{i} H_1t}\) can be efficiently performed by diagonalizing \( H_1\) in the Fourier space, and \( e^{\mathrm{i} H_2t}\) can be efficiently performed since \( V(\mathbf{r})\) is diagonal in the real space.
In this section we try to refine the error bounds in (313) by evaluating the preconstant explicitly. For simplicity we only focus on the first order Trotter formula. The Trotter propagator \( \widetilde{U}(t)=e^{-\mathrm{i} t H_1}e^{-\mathrm{i} t H_2}\) satisfies the equation
(321)
with initial condition \( \widetilde{U}(0)=I\) . By Duhamel’s principle, letting \( U(t)=e^{-\mathrm{i} t H}\) , we have
(322)
So we have
(323)
Now consider \( G(t)=[e^{-\mathrm{i} t H_1},H_2]e^{\mathrm{i} t H_1}=e^{-\mathrm{i} t H_1}H_2e^{\mathrm{i} t H_1}-H_2\) , which satisfies \( G(0)=0\) and
(324)
Hence
(325)
Plugging this back into (323), we have
(326)
In the last equality, we have used the relation \( \left\lVert [H_1,H_2]\right\rVert \le 2\nu^2\) with \( \nu=\max\{\left\lVert H_1\right\rVert ,\left\lVert H_2\right\rVert \}\) . Therefore (313) can be replaced by a sharper inequality
(327)
Here the first inequality is called the commutator norm error estimate, and the second inequality the operator norm error estimate.
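The commutator norm error estimate can be checked numerically: for a single step, the splitting error never exceeds \( \frac{(\Delta t)^2}{2}\left\lVert [H_1,H_2]\right\rVert \) . The sketch below (with random Hermitian \( H_1,H_2\) , chosen only for illustration) verifies this for several step sizes.

```python
import numpy as np

rng = np.random.default_rng(1)
d = 8

def rand_herm():
    B = rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))
    return (B + B.conj().T) / 2

def U_exp(H, s):
    # e^{-i s H} for Hermitian H via eigendecomposition
    w, V = np.linalg.eigh(H)
    return (V * np.exp(-1j * s * w)) @ V.conj().T

H1, H2 = rand_herm(), rand_herm()
comm = np.linalg.norm(H1 @ H2 - H2 @ H1, 2)

# One-step error vs the rigorous bound (dt^2 / 2) ||[H1, H2]||
for dt in [0.1, 0.05, 0.025]:
    err = np.linalg.norm(U_exp(H1 + H2, dt) - U_exp(H1, dt) @ U_exp(H2, dt), 2)
    bound = dt ** 2 / 2 * comm
    assert err <= bound
    print(dt, err, bound)
```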
For the transverse field Ising model with nearest neighbor interaction, we have \( \left\lVert H_1\right\rVert ,\left\lVert H_2\right\rVert =\mathcal{O}(n)\) , and hence \( \nu^2=\mathcal{O}(n^2)\) . On the other hand, since \( [Z_iZ_j,X_k]\ne 0\) only if \( k=i\) or \( k=j\) , the commutator bound satisfies \( \left\lVert [H_1,H_2]\right\rVert =\mathcal{O}(n)\) . Therefore to reach precision \( \epsilon\) , the scaling of the total number of time steps \( L\) with respect to the system size is \( \mathcal{O}(n^2/\epsilon)\) according to the estimate based on the operator norm, but is only \( \mathcal{O}(n/\epsilon)\) according to that based on the commutator norm.
For the particle in a potential, for simplicity consider \( d=1\) and the domain \( \Omega=[0,1]\) discretized using a uniform grid of size \( N\) . For a smooth and bounded potential, we have \( \left\lVert H_1\right\rVert =\mathcal{O}(N^2)\) , and \( \left\lVert V\right\rVert =\mathcal{O}(1)\) . Therefore the operator norm bound gives \( \nu^2=\mathcal{O}(N^4)\) . This is too pessimistic. Reexamining the second inequality of (327) shows that in this case, the error bound should be \( \mathcal{O}((\Delta t)^2\nu)\) instead of \( \mathcal{O}((\Delta t)^2\nu^2)\) . So according to the operator norm error estimate, we have \( L=\mathcal{O}(N^2/\epsilon)\) . On the other hand, in the continuous space, for any smooth function \( \psi(r)\) , we have
(328)
So
(329)
Here we have used that \( \left\lVert V'\right\rVert =\left\lVert V''\right\rVert =\mathcal{O}(1)\) , and \( \left\lVert \psi'\right\rVert =\mathcal{O}(N)\) in the worst case scenario. Therefore \( \left\lVert [H_1,H_2]\right\rVert =\mathcal{O}(N)\) , and we obtain a significantly improved estimate \( L=\mathcal{O}(N/\epsilon)\) according to the commutator norm.
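This scaling can be observed numerically. The sketch below uses a centered finite difference discretization of \( -\partial_r^2\) with periodic boundary conditions (a simplification; the text uses Fourier modes, but the scaling is the same) and the potential \( V(r)=\cos(2\pi r)\) , and checks that doubling \( N\) quadruples \( \left\lVert H_1\right\rVert \) but only doubles \( \left\lVert [H_1,V]\right\rVert \) .

```python
import numpy as np

def laplacian(N):
    # H1 = -Laplacian, second-order centered finite difference, periodic BC
    h = 1.0 / N
    L = -2 * np.eye(N) + np.eye(N, k=1) + np.eye(N, k=-1)
    L[0, -1] = L[-1, 0] = 1.0
    return -L / h ** 2

def comm_norm(N):
    H1 = laplacian(N)
    V = np.diag(np.cos(2 * np.pi * np.arange(N) / N))  # smooth bounded potential
    return np.linalg.norm(H1 @ V - V @ H1, 2)

r_H1 = np.linalg.norm(laplacian(128), 2) / np.linalg.norm(laplacian(64), 2)
r_comm = comm_norm(128) / comm_norm(64)
print(r_H1, r_comm)  # the norm grows ~4x, the commutator only ~2x
```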
The commutator scaling of the Trotter error is an important feature of the method. We refer readers to [17] for analysis of the second order Trotter method, and [18, 19] for the analysis of the commutator scaling of high order Trotter methods.
Remark 15 (Vector norm bound)
The Hamiltonian simulation problem of interest in practice often concerns the solution with particular types of initial conditions, instead of arbitrary initial conditions. Therefore the operator norm bound in (327) can still be too loose. Taking the initial condition into account, we readily obtain
(330)
For the example of the particle in a potential, we have
(331)
Therefore if we are given the a priori knowledge that \( \max_{0\le s\le t}\left\lVert \psi'(s)\right\rVert =\mathcal{O}(1)\) , we may even have \( L=\mathcal{O}(\epsilon^{-1})\) , i.e., the number of time steps is independent of \( N\) .
Exercise 23
Consider the Hamiltonian simulation problem for \( H = H_1 + H_2 + H_3\) . Show that the first order Trotter formula
has a commutator type error bound.
Exercise 24
Consider the time-dependent Hamiltonian simulation problem for the following controlled Hamiltonian
where \( a(t)\) and \( b(t)\) are smooth functions bounded together with all derivatives. We focus on the following Trotter type splitting, defined as
where the intervals \( [t_j, t_{j+1}]\) are equidistant and of length \( \Delta t\) on the interval \( [0, t]\) with \( t_n = t\) . Show that this method has first-order accuracy, but does not exhibit a commutator type error bound in general.
In order to perform matrix computations, we must first address the problem of the input model: how do we give a quantum computer access to the information in a matrix \( A\in\mathbb{C}^{N\times N}\) (\( N=2^n\) ), which is generally non-unitary? One possible input model is given via the unitary \( e^{\mathrm{i} \tau A}\) (if \( A\) is not Hermitian, in some scenarios we can consider its Hermitian version via the dilation method). This is particularly useful when \( e^{\mathrm{i} \tau A}\) can be constructed using simple circuits, e.g. Trotter splitting.
A more general input model, as will be discussed in this chapter, is called “block encoding”. Of course, if \( A\) is a dense matrix without obvious structures, any input model will be very expensive (e.g. exponential in \( n\) ) to implement. Therefore a commonly assumed input model is the \( s\) -sparse matrix, i.e., a matrix with at most \( s\) nonzero entries in each row / column. Furthermore, we assume an efficient procedure to access the locations as well as the values of the nonzero entries. This in general can again be a difficult task, given that the number of nonzero entries can still be exponential in \( n\) for a sparse matrix. Some dense matrices may also be efficiently block encoded on quantum computers. This chapter will illustrate the block encoding procedure via a number of detailed examples.
The query model for sparse matrices is based on certain quantum oracles. In some scenarios, these quantum oracles can be implemented all the way to the elementary gate level.
Throughout the discussion we assume \( A\) is an \( n\) -qubit, square matrix, and
(332)
If \( \left\lVert A\right\rVert _{\max}\ge1\) , we can simply consider the rescaled matrix \( \widetilde{A}=A/\alpha\) for some \( \alpha>\left\lVert A\right\rVert _{\max}\) .
To query the entries of a matrix, the desired oracle takes the following general form
(333)
In other words, given \( i,j\in[N]\) and a signal qubit initialized to \( \ket{0}\) , \( O_A\) performs a controlled rotation (controlled on \( i,j\) ) of the signal qubit, which encodes the information in the amplitude of \( \ket{0}\) .
However, the classical information in \( A\) is usually not stored natively in terms of such an oracle \( O_A\) . Sometimes it is more natural to assume that there is an oracle
(334)
where \( \widetilde{A}_{ij}\) is a \( d'\) -bit fixed point representation of \( A_{ij}\) , and the value of \( \widetilde{A}_{ij}\) is either computed on-the-fly with a quantum computer, or obtained through an external database. In either case, the implementation of \( \widetilde{O}_A\) may be challenging, and we will only consider the query complexity with respect to this oracle.
Using classical arithmetic operations, we can convert this oracle into an oracle
(335)
where \( 0\le \widetilde{\theta}_{ij}< 1\) , and \( \widetilde{\theta}_{ij}\) is a \( d\) -bit representation of \( \theta_{ij}=\arccos(A_{ij})/\pi\) . This step may require some additional work registers not shown here.
Now using the controlled rotation in (3), the information in \( \widetilde{A}_{ij}\) , \( \widetilde{\theta}_{ij}\) has been transferred to the amplitude of the signal qubit. We should then perform uncomputation to free the work registers storing the intermediate information \( \widetilde{A}_{ij}\) , \( \widetilde{\theta}_{ij}\) . The procedure is as follows
(336)
From now on, we will always assume that the matrix entries of \( A\) can be queried using the oracle \( O_A\) or its variants.
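The conversion from entry value to rotation angle can be illustrated with a short numerical sketch (the entry value and the number of bits \( d\) below are arbitrary choices): truncating \( \theta_{ij}\) to \( d\) bits perturbs the encoded amplitude by at most \( \pi 2^{-d}\) , while the signal qubit remains normalized.

```python
import numpy as np

def encode(a, d):
    # theta = arccos(a)/pi, truncated to d bits of fixed-point precision
    theta = np.arccos(a) / np.pi
    theta_d = np.floor(theta * 2 ** d) / 2 ** d
    # Rotation of the signal qubit by angle pi * theta_d:
    # amplitude cos(pi*theta_d) on |0>, sin(pi*theta_d) on |1>
    return np.cos(np.pi * theta_d), np.sin(np.pi * theta_d)

a = 0.3217  # a hypothetical matrix entry with |a| < 1
amp0, amp1 = encode(a, d=16)

assert abs(amp0 ** 2 + amp1 ** 2 - 1) < 1e-12   # still a normalized qubit state
assert abs(amp0 - a) < np.pi / 2 ** 16          # truncation error ~ pi * 2^{-d}
print(amp0)
```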
The simplest example of block encoding is the following: assume we can find a \( (n+1)\) -qubit unitary matrix \( U\) (i.e., \( U\in\mathbb{C}^{2N\times 2N}\) ) such that
where \( *\) means that the corresponding matrix entries are irrelevant, then for any \( n\) -qubit quantum state \( \ket{b}\) , we can consider the state
(337)
and
(338)
Here the (unnormalized) state \( \ket{\perp}\) can be written as \( \ket{1}\ket{\psi}\) for some (unnormalized) state \( \ket{\psi}\) , which is irrelevant to the computation of \( A\ket{b}\) . In particular, it satisfies the orthogonality relation
(339)
In order to obtain \( A\ket{b}\) , we measure the ancilla qubit and only keep the state if the measurement returns \( 0\) . This can be summarized into the following quantum circuit:
Note that the output state is normalized after the measurement takes place. The success probability of obtaining \( 0\) from the measurement can be computed as
(340)
So the missing information of norm \( \left\lVert A\ket{b}\right\rVert \) can be recovered via the success probability \( p(0)\) if needed. We find that the success probability is only determined by \( A,\ket{b}\) , and is independent of other irrelevant components of \( U_A\) .
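The whole procedure can be mimicked with dense linear algebra. The sketch below (a random unitary \( U_A\) generated by QR factorization, purely for illustration) embeds \( \ket{b}\) as \( \ket{0}\ket{b}\) , applies \( U_A\) , and confirms that the \( \ket{0}\) -sector equals \( A\ket{b}\) , with success probability \( \left\lVert A\ket{b}\right\rVert ^2\) .

```python
import numpy as np

rng = np.random.default_rng(2)
N = 8
# A random (n+1)-qubit unitary; its top-left N x N block is some matrix A.
M = rng.normal(size=(2 * N, 2 * N)) + 1j * rng.normal(size=(2 * N, 2 * N))
U_A, _ = np.linalg.qr(M)
A = U_A[:N, :N]

b = rng.normal(size=N) + 1j * rng.normal(size=N)
b /= np.linalg.norm(b)

psi = U_A @ np.concatenate([b, np.zeros(N)])  # U_A |0>|b>
zero_sector = psi[:N]                         # (<0| ⊗ I) U_A |0>|b>
p0 = np.linalg.norm(zero_sector) ** 2         # success probability

assert np.allclose(zero_sector, A @ b)        # the |0>-sector is exactly A|b>
assert abs(p0 - np.linalg.norm(A @ b) ** 2) < 1e-12
print(p0)
```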
Note that we need not restrict \( U_A\) to be an \( (n+1)\) -qubit matrix. We may instead find any \( (n+m)\) -qubit unitary \( U_A\) so that
(341)
Here each \( *\) stands for an \( n\) -qubit matrix, and there are \( 2^m\) block rows / columns in \( U_A\) . The relation above can be written compactly using the braket notation as
(342)
A necessary condition for the existence of \( U_A\) is that \( \left\lVert A\right\rVert \le 1\) . (Note: \( \left\lVert A\right\rVert _{\max}\le 1\) does not guarantee that \( \left\lVert A\right\rVert \le 1\) , see (26)). However, if we can find sufficiently large \( \alpha\) and \( U_A\) so that
(343)
If, upon measuring the \( m\) ancilla qubits, all of them return \( 0\) , we still obtain the normalized state \( \frac{A\ket{b}}{\left\lVert A\ket{b}\right\rVert }\) . The number \( \alpha\) is hidden in the success probability:
(344)
So if \( \alpha\) is chosen to be too large, the probability of obtaining all \( 0\) ’s from the measurement can be vanishingly small.
Finally, it can be difficult to find \( U_A\) to block encode \( A\) exactly. This is not a problem, since it is sufficient if we can find \( U_A\) to block encode \( A\) up to some error \( \epsilon\) . We are now ready to give the definition of block encoding in (1).
Definition 1 (Block encoding)
Given an \( n\) -qubit matrix \( A\) , if we can find \( \alpha, \epsilon \in \mathbb{R}_+\) , and an \( (m+n)\) -qubit unitary matrix \( U_A\) so that
(345)
then \( U_A\) is called an \( (\alpha, m, \epsilon)\) -block-encoding of \( A\) . When the block encoding is exact with \( \epsilon=0\) , \( U_A\) is called an \( (\alpha, m)\) -block-encoding of \( A\) . The set of all \( (\alpha, m, \epsilon)\) -block-encodings of \( A\) is denoted by \( \operatorname{BE}_{\alpha,m}(A,\epsilon)\) , and we define \( \operatorname{BE}_{\alpha,m}(A)=\operatorname{BE}_{\alpha,m}(A,0)\) .
Assume we know each matrix element \( A_{ij}\) of the \( n\) -qubit matrix \( A\) , and we are given an \( (n+m)\) -qubit unitary \( U_A\) . In order to verify that \( U_A\in\operatorname{BE}_{1,m}(A)\) , we only need to verify that
(346)
and \( U_A\) applied to any vector \( \ket{0^m,b}\) can be obtained via the superposition principle.
Therefore we may first evaluate the state \( U_A\ket{0^m,j}\) , then perform the inner product with \( \ket{0^m,i}\) and verify that the resulting inner product is \( A_{ij}\) . We will also use the following technique frequently. Assume \( U_A=U_B U_C\) ; then
(347)
So we can evaluate the states \( U_B^{†}\ket{0^m,i},U_C\ket{0^m,j}\) independently, and then verify the inner product is \( A_{ij}\) . Such a calculation amounts to running the circuit (18), and if the ancilla qubits are measured to be \( 0^m\) , the system qubits return the normalized state \( \sum_{i}A_{ij}\ket{i}/\left\lVert \sum_{i}A_{ij}\ket{i}\right\rVert \) .
Example 34 (\( (1,1)\) -block-encoding is general)
For any \( n\) -qubit matrix \( A\) with \( \left\lVert A\right\rVert _2\le 1\) , the singular value decomposition (SVD) of \( A\) is denoted by \( W\Sigma V^{†}\) , where all singular values in the diagonal matrix \( \Sigma\) belong to \( [0,1]\) . Then we may construct an \( (n+1)\) -qubit unitary matrix
(348)
which is a \( (1,1)\) -block-encoding of \( A\) .
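One explicit completion of this kind (a sketch of one possible choice; the off-diagonal blocks admit other valid completions) sets the off-diagonal blocks to \( W\sqrt{I-\Sigma^2}V^{†}\) ; the resulting matrix is unitary and its upper-left block is exactly \( A\) .

```python
import numpy as np

rng = np.random.default_rng(3)
N = 8
M = rng.normal(size=(N, N)) + 1j * rng.normal(size=(N, N))
A = M / (1.01 * np.linalg.norm(M, 2))   # rescale so that ||A||_2 < 1

W, s, Vh = np.linalg.svd(A)             # A = W Sigma V^dag
B = W @ np.diag(np.sqrt(1 - s ** 2)) @ Vh
U_A = np.block([[A, B], [B, -A]])       # one unitary completion

assert np.allclose(U_A @ U_A.conj().T, np.eye(2 * N))  # U_A is unitary
assert np.allclose(U_A[:N, :N], A)                     # top-left block is A
```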
Example 35 (Random circuit block encoded matrix)
In some scenarios, we may want to construct a pseudo-random non-unitary matrix on quantum computers. Note that it would be highly inefficient if we first generate a dense pseudo-random matrix \( A\) classically and then feed it into the quantum computer using e.g. quantum random-access memory (QRAM). Instead we would like to work with matrices that are inherently easy to generate on quantum computers. This inspires the random circuit based block encoding matrix (RACBEM) model [20]. Instead of first identifying \( A\) and then finding its block encoding \( U_A\) , we reverse this thought process: we first identify a unitary \( U_A\) that is easy to implement on a quantum computer, and then ask which matrix can be block encoded by \( U_A\) .
(34) shows that in principle, any matrix \( A\) with \( \left\lVert A\right\rVert _2\le 1\) can be accessed via a \( (1,1,0)\) -block-encoding. In other words, \( A\) can be block encoded by an \( (n+1)\) -qubit random unitary \( U_A\) , and \( U_A\) can be constructed using only basic one-qubit unitaries and CNOT gates. The layout of the two-qubit operations can be designed to be compatible with the coupling map of the hardware. A cartoon is shown in (19), and an example is given in (20).
Example 36 (Block encoding of a diagonal matrix)
As a special case, let us consider the block encoding of a diagonal matrix. Since the row and column indices are the same, we may simplify the oracle (333) into
(349)
In the case when the oracle \( \widetilde{O}_A\) is used, we may assume accordingly
(350)
Let \( U_A=O_A\) . Direct calculation shows that for any \( i,j\in [N]\) ,
(351)
This proves that \( U_A\in\operatorname{BE}_{1,1}(A)\) , i.e., \( U_A\) is a \( (1,1)\) -block-encoding of the diagonal matrix \( A\) .
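The calculation can be reproduced directly in numpy: realizing \( O_A\) as one rotation of the signal qubit per diagonal entry (the minus sign in the upper-right block is one particular unitary completion, chosen for illustration) yields a unitary whose upper-left block is exactly \( A\) .

```python
import numpy as np

rng = np.random.default_rng(4)
N = 8
a = rng.uniform(-1, 1, size=N)   # diagonal entries with |A_jj| <= 1
A = np.diag(a)

# O_A |0>|j> = (A_jj |0> + sqrt(1 - A_jj^2) |1>) |j>, ordered signal ⊗ system
s = np.sqrt(1 - a ** 2)
O_A = np.block([[np.diag(a), -np.diag(s)],
                [np.diag(s),  np.diag(a)]])

assert np.allclose(O_A @ O_A.conj().T, np.eye(2 * N))  # O_A is unitary
assert np.allclose(O_A[:N, :N], A)   # (<0| ⊗ I) O_A (|0> ⊗ I) = A
```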
We now give a few examples of block encodings of more general sparse matrices. We start from a \( 1\) -sparse matrix, i.e., there is only one nonzero entry in each row or column of the matrix. This means that for each \( j\in [N]\) , there is a unique \( c(j)\in[N]\) such that \( A_{c(j),j}\ne 0\) , and the mapping \( c\) is a permutation. Then there exists a unitary \( O_c\) such that
(352)
The implementation of \( O_c\) may require the usage of some work registers that are omitted here. We also have
(353)
We assume the matrix entry \( A_{c(j),j}\) can be queried via
(354)
Now we construct \( U_A=(I\otimes O_c)O_A\) , and compute
(355)
This proves that \( U_A\in\operatorname{BE}_{1,1}(A)\) .
For a more general \( s\) -sparse matrix, WLOG we assume each row and column has exactly \( s\) nonzero entries (otherwise we can always treat some zero entries as nonzeros). For each column \( j\) , the row index for the \( \ell\) -th nonzero entry is denoted by \( c(j,\ell)\equiv c_{j,\ell}\) . For simplicity, we assume that there exists a unitary \( O_c\) such that
(356)
Here we assume \( s=2^\mathfrak{s}\) and the first register is an \( \mathfrak{s}\) -qubit register. A necessary condition for this query model is that \( O_c\) is reversible, i.e., we can have \( O_c^{†}\ket{\ell}\ket{c(j,\ell)}=\ket{\ell}\ket{j}\) . This means that for each row index \( i=c(j,\ell)\) , we can recover the column index \( j\) given the value of \( \ell\) . This can be satisfied e.g.
(357)
where \( \ell_0\) is a fixed number. This corresponds to a banded matrix. This assumption is of course somewhat restrictive. We shall discuss more general query models in (7.5).
Corresponding to (356), the matrix entries can be queried via
(358)
In order to construct a unitary that encodes all row indices at the same time, we define \( D=H^{\otimes \mathfrak{s}}\) (sometimes called a diffusion operator, which is a term originated from Grover’s search) satisfying
(359)
Consider \( U_A\) given by the circuit in (21). The measurement means that to obtain \( A\ket{b}\) , the ancilla register should all return the value \( 0\) .
Proposition 7
The circuit in (21) defines \( U_A\in \operatorname{BE}_{s,\mathfrak{s}+1}(A)\) .
Proof
We call \( \ket{0}\ket{0^{\mathfrak{s}}}\ket{j}\) the source state, and \( \ket{0}\ket{0^{\mathfrak{s}}}\ket{i}\) the target state. In order to compute the inner product \( \bra{0}\bra{0^{\mathfrak{s}}}\bra{i} U_A \ket{0}\ket{0^{\mathfrak{s}}}\ket{j}\) , we apply \( D,O_A,O_c\) to the source state accordingly as
(360)
Since we are only interested in the final state when all ancilla qubits are the \( 0\) state, we may apply \( D\) to target state \( \ket{0}\ket{0^{\mathfrak{s}}}\ket{i}\) as (note that \( D\) is Hermitian)
(361)
Hence the inner product
(362)
So far we have considered general \( s\) -sparse matrices. Note that if \( A\) is a Hermitian matrix, its \( (\alpha, m, \epsilon)\) -block-encoding \( U_A\) does not need to be Hermitian. Even if \( \epsilon=0\) , we only have that the upper-left \( n\) -qubit block of \( U_A\) is Hermitian. For instance, even the block encoding of a Hermitian, diagonal matrix in (36) may not be Hermitian (exercise). On the other hand, there are indeed cases when \( U_A=U_A^{†}\) is a Hermitian matrix, and hence the definition:
Definition 2 (Hermitian block encoding)
Let \( U_A\) be an \( (\alpha, m, \epsilon)\) -block-encoding of \( A\) . If \( U_A\) is also Hermitian, then it is called an \( (\alpha, m, \epsilon)\) -Hermitian-block-encoding of \( A\) . When \( \epsilon=0\) , it is called an \( (\alpha, m)\) -Hermitian-block-encoding. The set of all \( (\alpha, m, \epsilon)\) -Hermitian-block-encodings of \( A\) is denoted by \( \operatorname{HBE}_{\alpha,m}(A,\epsilon)\) , and we define \( \operatorname{HBE}_{\alpha,m}(A)=\operatorname{HBE}_{\alpha,m}(A,0)\) .
The Hermitian block encoding provides the simplest scenario of the qubitization process in (8.1).
If we query the oracle (356), the assumption that for each \( \ell\) the value of \( c(j,\ell)\) is unique for all \( j\) seems unnatural for constructing general sparse matrices. So we consider an alternative method for constructing the block encoding of a general sparse matrix as below.
Again WLOG we assume that each row / column has at most \( s=2^{\mathfrak{s}}\) nonzero entries, and that we have access to the following two \( (2n)\) -qubit oracles
(363)
Here \( r(i,\ell),c(j,\ell)\) give the \( \ell\) -th nonzero entry in the \( i\) -th row and the \( j\) -th column, respectively. It should be noted that although the index \( \ell\in[s]\) , we should expand it into an \( n\) -qubit state (e.g. let \( \ell\) take the last \( \mathfrak{s}\) qubits of the \( n\) -qubit register following the binary representation of integers). The reason for such an expansion, and for the need of two oracles \( O_r,O_c\) , will become clear shortly.
Similar to the discussion before, we need a diffusion operator satisfying
(364)
This can be implemented using Hadamard gates as
(365)
We assume that the matrix entries are queried using the following oracle using controlled rotations
(366)
where the rotation is controlled by both row and column indices. However, if \( A_{ij}=0\) for some \( i,j\) , the rotation can be arbitrary, as there will be no contribution due to the usage of \( O_r,O_c\) .
Proposition 8
(22) defines \( U_A\in\operatorname{BE}_{s,n+1}(A)\) .
Proof
We apply the first four gate sets to the source state
(367)
We then apply \( D\) and \( O_r\) to the target state
(368)
Then the inner product gives
(369)
Here we have used that there exists a unique \( \ell\) such that \( i=c(j,\ell)\) , and a unique \( \ell'\) such that \( j=r(i,\ell')\) .
We remark that the quantum circuit in (22) is essentially the construction in [21, Lemma 48], which gives an \( (s,n+3)\) -block-encoding. The construction above slightly simplifies the procedure and saves two extra qubits (used to mark whether \( \ell\ge s\) ).
Next we consider the Hermitian block encoding of an \( s\) -sparse Hermitian matrix. Since \( A\) is Hermitian, we only need one oracle to query the location of the nonzero entries
(370)
Here \( c(j,\ell)\) gives the \( \ell\) -th nonzero entry in the \( j\) -th column. By the Hermiticity of \( A\) , it can also be interpreted as the column index of the \( \ell\) -th nonzero entry in the \( j\) -th row. Again the first register needs to be interpreted as an \( n\) -qubit register. The diffusion operator is the same as in (365).
Unlike all discussions before, we introduce two signal qubits, and a quantum state in the computational basis takes the form \( \ket{a}\ket{i}\ket{b}\ket{j}\) , where \( a,b\in\{0,1\},i,j\in[N]\) . In other words, we may view \( \ket{a}\ket{i}\) as the first register, and \( \ket{b}\ket{j}\) as the second register. The \( (n+1)\) -qubit SWAP gate is defined as
(371)
To query matrix entries, we need access to the square root of \( A_{ij}\) as (note that it acts on the second signal qubit)
(372)
The square root operation is well defined if \( A_{ij}\ge 0\) for all entries. If \( A\) has negative (or complex) entries, we first write \( A_{ij}=\left\lvert A_{ij}\right\rvert e^{\mathrm{i} \theta_{ij}},\theta_{ij}\in[0,2\pi)\) , and the square root is uniquely defined as \( \sqrt{A_{ij}}=\sqrt{\left\lvert A_{ij}\right\rvert }e^{\mathrm{i} \theta_{ij}/2}\) .
Proposition 9
(23) defines \( U_A\in\operatorname{HBE}_{s,n+2}(A)\) .
Proof
Applying the first four gate sets to the source state gives
(373)
Apply the last three gate sets to the target state
(374)
Finally, take the inner product as
(375)
In this equality, we have used that \( A\) is Hermitian: \( A_{ij}=A_{ji}^*\) , and there exists a unique \( \ell\) such that \( i=c(j,\ell)\) , as well as a unique \( \ell'\) such that \( j=c(i,\ell')\) .
The quantum circuit in (23) is essentially the construction in [22]. The relation with quantum walks will be further discussed in (8.2).
Exercise 25
Construct a query oracle \( O_A\) similar to that in (336), when \( A_{ij}\in\mathbb{C}\) with \( \left\lvert A_{ij}\right\rvert < 1\) .
Exercise 26
Let \( A\in\mathbb{C}^{N\times N}\) be an \( s\) -sparse matrix. Prove that \( \left\lVert A\right\rVert \le s \left\lVert A\right\rVert _{\max}\) . For every \( 1\le s \le N\) , provide an example for which the equality can be reached.
Exercise 27
Construct an \( s\) -sparse matrix so that the oracle in (356) does not exist.
Exercise 28
Let \( A\in\mathbb{C}^{N\times N}\) (\( N=2^n\) ) be a Hermitian matrix with entries on the complex unit circle \( A_{ij}=z_{ij}\) , \( |z_{ij}|=1\) .
Construct a \( 2n\) -qubit block-diagonal unitary \( V\in\mathbb C^{N^2\times N^2}\) such that
Here, block-diagonal means \( (\bra{x}\otimes I)V(\ket{y}\otimes I)=0^{N\times N}\) for \( x\ne y\) .
Let \( A\) be an \( n\) -qubit Hermitian matrix. Then \( A\) has the eigenvalue decomposition
(376)
Here \( \Lambda=\operatorname{diag}(\{\lambda_i\})\) is a diagonal matrix, and \( \lambda_0\le …\le \lambda_{N-1}\) . Let the scalar function \( f\) be well defined on all \( \lambda_i\) ’s. Then the matrix function \( f(A)\) can be defined in terms of the eigendecomposition:
Definition 3 (Matrix function of Hermitian matrices)
Let \( A\in \mathbb{C}^{N\times N}\) be a Hermitian matrix with eigenvalue decomposition (376). Let \( f: \mathbb{R} \rightarrow \mathbb{C}\) be a scalar function such that \( f\left(\lambda_{i}\right)\) is defined for all \( i\in[N]\) . The matrix function is defined as
(377)
where
(378)
This chapter introduces techniques to construct an efficient quantum circuit to compute \( f(A)\ket{b}\) for any state \( \ket{b}\) . Throughout the discussion we assume \( A\) is queried in the block encoding model denoted by \( U_A\) . For simplicity we assume that there is no error in the block encoding, i.e., \( U_A\in\operatorname{BE}_{\alpha,m}(A)\) , and WLOG we can take \( \alpha=1\) .
Many tasks in scientific computation can be expressed in terms of matrix functions. Here are a few examples:
A key technique for representing matrix functions is called the qubitization.
We first introduce the heuristic idea behind qubitization. For any \( -1< \lambda\le 1\) , we can consider a \( 2\times 2\) rotation matrix,
(379)
where we have performed the change of variable \( \lambda=\cos\theta\) with \( 0\le \theta<\pi\) .
Now direct computation shows
(380)
Using the definition of Chebyshev polynomials (of first and second kinds, respectively)
(381)
we have
(382)
Note that if we can somehow replace \( \lambda\) by \( A\) , we immediately obtain a \( (1,1)\) -block-encoding for the Chebyshev polynomial \( T_k(A)\) ! This is precisely what qubitization aims to achieve, though there are some small twists.
In the simplest scenario, we assume that \( U_A\in\operatorname{HBE}_{1,m}(A)\) . Starting from the spectral decomposition
(383)
we have that for each eigenstate \( \ket{v_i}\) ,
(384)
Here \( \ket{\widetilde{\perp}_i}\) is an unnormalized state that is orthogonal to all states of the form \( \ket{0^m}\ket{\psi}\) , i.e.,
(385)
where
(386)
is a projection operator.
Since the right hand side of (384) is a normalized state, we may also write
(387)
where \( \ket{\perp_i}\) is a normalized state.
Now if \( \lambda_i=\pm 1\) , then \( \mathcal{H}_i=\operatorname{span}\{\ket{0^m}\ket{v_i}\}\) is already an invariant subspace of \( U_A\) , and \( \ket{\perp_i}\) can be any state. Otherwise, using the fact that \( U_A=U_A^{†}\) , we can apply \( U_A\) again to both sides of (384) and obtain
(388)
Therefore \( \mathcal{H}_i=\operatorname{span}\{\ket{0^m}\ket{v_i},\ket{\perp_i}\}\) is an invariant subspace of \( U_A\) . Furthermore, the matrix representation of \( U_A\) with respect to the basis \( \mathcal{B}_i=\{\ket{0^m}\ket{v_i},\ket{\perp_i}\}\) is
(389)
i.e., \( U_A\) restricted to \( \mathcal{H}_i\) is a reflection operator. This also leads to the name “qubitization”, which means that each eigenvector \( \ket{v_i}\) is “qubitized” into a two-dimensional space \( \mathcal{H}_i\) .
In order to construct a block encoding for \( T_k(A)\) , we need to turn \( U_A\) into a rotation. To this end, note that \( \mathcal{H}_i\) is also an invariant subspace for the projection operator \( \Pi\) :
(390)
Similarly define \( Z_{\Pi}=2\Pi-1\) , since
(391)
\( Z_{\Pi}\) acts as a reflection operator restricted to each subspace \( \mathcal{H}_i\) . Then \( \mathcal{H}_i\) is the invariant subspace for the iterate
(392)
and
(393)
is the desired rotation matrix. Therefore
(394)
Since \( \{\ket{0^m}\ket{v_i}\}\) spans the range of \( \Pi\) , we have
(395)
i.e., \( O^k=(U_A Z_{\Pi})^k\) is a \( (1,m)\) -block-encoding of the Chebyshev polynomial \( T_k(A)\) .
In order to implement \( Z_{\Pi}\) , note that if \( m=1\) , then \( Z_{\Pi}\) is just the Pauli \( Z\) gate. When \( m>1\) , the circuit
returns \( \ket{1}\ket{0^m}\) if \( b=0^m\) , and \( -\ket{1}\ket{b}\) if \( b\ne 0^m\) . So this precisely implements \( Z_{\Pi}\) , where the signal qubit \( \ket{1}\) is used as a work register. We may also discard the signal qubit, and the resulting unitary is still denoted by \( Z_{\Pi}\) .
In other words, the circuit in (24) implements the operator \( O\) . Repeating the circuit \( k\) times gives the \( (1,m+1)\) -block-encoding of \( T_k(A)\) .
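The claim can be verified numerically for \( m=1\) . The sketch below builds the Hermitian block encoding \( U_A=\begin{pmatrix}A & \sqrt{I-A^2}\\ \sqrt{I-A^2} & -A\end{pmatrix}\) (one convenient choice of Hermitian block encoding, not the only one), takes \( Z_{\Pi}=Z\otimes I\) , and checks that the upper-left block of \( (U_A Z_{\Pi})^k\) equals \( T_k(A)\) .

```python
import numpy as np

rng = np.random.default_rng(5)
N = 8
B = rng.normal(size=(N, N))
A = (B + B.T) / 2
A /= 1.1 * np.linalg.norm(A, 2)          # Hermitian with ||A|| < 1

w, V = np.linalg.eigh(A)
S = V @ np.diag(np.sqrt(1 - w ** 2)) @ V.T
U_A = np.block([[A, S], [S, -A]])        # Hermitian unitary block encoding of A
Z_Pi = np.block([[np.eye(N), np.zeros((N, N))],
                 [np.zeros((N, N)), -np.eye(N)]])  # 2|0><0| ⊗ I - I for m = 1

k = 5
O_k = np.linalg.matrix_power(U_A @ Z_Pi, k)
# T_k(A) via the eigendecomposition: T_k(cos theta) = cos(k theta)
T_k = V @ np.diag(np.cos(k * np.arccos(w))) @ V.T

assert np.allclose(O_k[:N, :N], T_k)     # top-left block is T_k(A)
```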
Remark 16 (Alternative perspectives of qubitization)
The fact that an arbitrarily large block encoding matrix \( U_A\) can be partially block diagonalized into \( N\) subblocks of size \( 2\times 2\) may seem a rather peculiar algebraic structure. In fact there are other alternative perspectives and derivations of the qubitization result. Some noticeable ones include the use of Jordan’s Lemma, and the use of the cosine-sine (CS) decomposition. Throughout this chapter and the next chapter, we will adopt the more “elementary” derivations used above.
Quantum walk is one of the major topics in quantum algorithms. Roughly speaking, there are two versions of quantum walks. The continuous time quantum walk is closely analogous to its classical counterpart, i.e., the continuous time random walk. The other version, the discrete time quantum walk, or Szegedy’s quantum walk [23], is not so obviously connected to classical random walks. We will not introduce the motivations behind the continuous and discrete time quantum walks, and refer readers to [8, Chapter 16,17] for detailed discussions.
Let \( G=(V,E)\) be a graph of size \( N\) . A Markov chain (or a random walk) is given by a transition matrix \( P\) , with its entry \( P_{ij}\) denoting the probability of the transition from vertex \( i\) to vertex \( j\) . The matrix \( P\) is a stochastic matrix satisfying
(396)
Let \( \pi\) be the stationary state, which is a left eigenvector of \( P\) with eigenvalue \( 1\) :
(397)
A Markov chain is irreducible if any state can be reached from any other state in a finite number of steps. An irreducible Markov chain is aperiodic if there exists no integer greater than one that divides the length of every directed cycle of the graph. A Markov chain is ergodic if it is both irreducible and aperiodic. By the Perron–Frobenius Theorem, any ergodic Markov chain \( P\) has a unique stationary state \( \pi\) , and \( \pi_i>0\) for all \( i\) . A Markov chain is reversible if the following detailed balance condition is satisfied
(398)
Now we define the discriminant matrix associated with a Markov chain as
(399)
which is real symmetric and hence Hermitian. For a reversible Markov chain, the stationary state can be encoded as an eigenvector of \( D\) (the proof is left as an exercise).
Proposition 10 (Reversible Markov chain)
If a Markov chain is reversible, then the coherent version of the stationary state
(400)
is a normalized eigenvector of the discriminant matrix \( D\) satisfying
(401)
Furthermore, when \( \pi_i>0\) for all \( i\) , we have
(402)
Therefore the set of (left) eigenvalues of \( P\) and the set of the eigenvalues of \( D\) are the same.
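The content of (10) can be checked numerically. The following NumPy sketch uses a small birth-death chain (our own choice; any reversible chain works) and verifies detailed balance, the eigenvector relation for \( \sqrt{\pi}\) , and the coincidence of the spectra of \( P\) and \( D\) .

```python
import numpy as np

# A small reversible Markov chain: a birth-death (tridiagonal) chain
# is reversible by construction. Rows of P sum to 1.
P = np.array([
    [0.5, 0.5, 0.0, 0.0],
    [0.3, 0.2, 0.5, 0.0],
    [0.0, 0.3, 0.2, 0.5],
    [0.0, 0.0, 0.3, 0.7],
])

# Stationary distribution: left eigenvector of P with eigenvalue 1.
w, vl = np.linalg.eig(P.T)
pi = np.real(vl[:, np.argmax(np.real(w))])
pi = pi / pi.sum()

# Detailed balance: pi_i P_ij = pi_j P_ji.
assert np.allclose(pi[:, None] * P, (pi[:, None] * P).T)

# Discriminant matrix D_ij = sqrt(P_ij P_ji) is real symmetric.
D = np.sqrt(P * P.T)

# The coherent stationary state sqrt(pi) is an eigenvector of D with
# eigenvalue 1, and the spectra of D and P coincide.
sqrt_pi = np.sqrt(pi)
print(np.allclose(D @ sqrt_pi, sqrt_pi))                    # True
print(np.allclose(np.sort(np.linalg.eigvalsh(D)),
                  np.sort(np.real(np.linalg.eigvals(P)))))  # True
```

The spectra coincide because \( D=\operatorname{diag}(\sqrt{\pi})\, P\, \operatorname{diag}(\sqrt{\pi})^{-1}\) is a similarity transformation of \( P\) for a reversible chain.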
Our first goal is to construct a Hermitian block encoding of \( D\) . Assume that we have access to an oracle \( O_P\) satisfying
(403)
Thanks to the stochasticity of \( P\) , the right hand side is already a normalized vector, and no additional signal qubit is needed.
We also introduce the \( n\) -qubit swap operator \( \operatorname{Swap} \) :
(404)
which swaps the value of the two registers in the computational basis, and can be directly implemented using \( n\) two-qubit SWAP gates.
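As a sanity check, the following NumPy sketch builds the register-level \( \operatorname{Swap}\) as a permutation matrix and verifies that it equals a product of \( n\) two-qubit SWAP gates (the qubit ordering below, with the first register in the high bits, is our own convention):

```python
import numpy as np

n = 3            # qubits per register
N = 2 ** n

# Register-level swap: |i>|j> -> |j>|i>, as a permutation on 2n qubits.
S = np.zeros((N * N, N * N))
for i in range(N):
    for j in range(N):
        S[(j << n) | i, (i << n) | j] = 1.0

# The same operator as a product of n two-qubit SWAP gates: the q-th gate
# exchanges bit q of the first register with bit q of the second register.
def pair_swap(q):
    T = np.zeros((N * N, N * N))
    for x in range(N * N):
        b1 = (x >> (n + q)) & 1          # bit q of register 1 (high bits)
        b2 = (x >> q) & 1                # bit q of register 2 (low bits)
        y = x & ~(1 << (n + q)) & ~(1 << q)
        y |= (b2 << (n + q)) | (b1 << q)
        T[y, x] = 1.0
    return T

prod = np.eye(N * N)
for q in range(n):
    prod = pair_swap(q) @ prod

print(np.allclose(S, prod))                # True: n two-qubit SWAPs suffice
print(np.allclose(S @ S, np.eye(N * N)))   # True: Swap is an involution
```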
We claim that the following circuit gives \( U_D\in \operatorname{HBE}_{1,n}(D)\) .
Proposition 11
(25) defines \( U_D\in\operatorname{HBE}_{1,n}(D)\) .
Proof
Clearly \( U_D\) is unitary and Hermitian. Now we compute as before
(405)
Meanwhile
(406)
So the inner product gives
(407)
This proves the claim.
For a Markov chain defined on a graph \( G=(V,E)\) , Szegedy’s quantum walk implements a qubitization of the discriminant matrix \( D\) in (399). Let \( U_D\) be the Hermitian block encoding defined by the circuit in (25). We may readily plug it into (24) and obtain the circuit in (26), denoted by \( O_Z\) .
Let the eigendecomposition of \( D\) be denoted by
(408)
For each \( \ket{v_i}\) , the associated basis in the 2-dimensional subspace is \( \mathcal{B}_i=\{\ket{0^m}\ket{v_i},\ket{\perp_i}\}\) . Then the qubitization procedure gives
(409)
The eigenvalues of \( O_Z\) in the \( 2\times 2\) matrix block are
(410)
This relation is important for the following reasons. By (10), if a Markov chain is reversible and ergodic, the eigenvalues of \( D\) and \( P\) are the same. In particular, the largest eigenvalue of \( D\) is unique and equal to \( 1\) , and the second largest eigenvalue of \( D\) is \( 1-\delta\) , where \( \delta>0\) is called the spectral gap. Since \( \arccos(1)=0\) and \( \arccos(1-\delta)\approx \sqrt{2 \delta}\) , we find that the spectral gap of \( O_Z\) on the unit circle is in fact of order \( \sqrt{\delta}\) instead of \( \delta\) . This is called spectral gap amplification, and it leads to, e.g., the quadratic quantum speedup of the hitting time.
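A quick numerical check of the approximation \( \arccos(1-\delta)\approx\sqrt{2\delta}\) underlying the gap amplification:

```python
import numpy as np

# Spectral gap amplification: a gap delta in the eigenvalues of D becomes
# a gap arccos(1-delta) ~ sqrt(2*delta) in the phases of O_Z.
for delta in [1e-2, 1e-4, 1e-6]:
    circle_gap = np.arccos(1.0 - delta)
    print(f"delta={delta:g}  ratio={circle_gap / np.sqrt(2 * delta):.6f}")
    # ratio -> 1 as delta -> 0
```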
Example 37 (Determining whether there is a marked vertex in a complete graph)
Let \( G=(V,E)\) be a complete graph of \( N=2^n\) vertices. We would like to distinguish the following two scenarios:
All vertices are the same, and the random walk is given by the transition matrix
(411)
There is one marked vertex. Without loss of generality we may assume this is the \( 0\) -th vertex (of course we do not have access to this information). In this case, the transition matrix is
(412)
In other words, in case (2), the random walk will stop at the marked vertex. The transition matrix can also be written in block partitioned form as
(413)
Here \( \widetilde{e}\) is the all-ones vector of length \( N-1\) .
For the random walk defined by \( P\) , the stationary state is \( \pi=\frac{1}{N}e\) , and the spectral gap is \( 1\) . For the random walk defined by \( \widetilde{P}\) , the stationary state is \( \widetilde{\pi}=(1,0,…,0)^{\top}\) , and the spectral gap is \( \delta=N^{-1}\) . Starting from the uniform state \( \pi\) , the probability distribution after \( k\) steps of the random walk is \( \pi^{\top}\widetilde{P}^{k}\) . This converges to the stationary state of \( \widetilde{P}\) , and hence reaches the marked vertex after \( \mathcal{O}(N)\) steps of the walk (exercise).
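These claims can be verified numerically. In the sketch below we assume the explicit forms \( P_{ij}=1/N\) and \( \widetilde{P}\) with an absorbing \( 0\) -th vertex, which are consistent with the stated stationary states and spectral gaps:

```python
import numpy as np

N = 8
P = np.full((N, N), 1.0 / N)          # unmarked: uniform transitions
Pt = np.full((N, N), 1.0 / N)         # marked: vertex 0 is absorbing
Pt[0, :] = 0.0
Pt[0, 0] = 1.0

pi = np.full(N, 1.0 / N)              # stationary state of P
pit = np.zeros(N); pit[0] = 1.0       # stationary state of Pt
assert np.allclose(pi @ P, pi)
assert np.allclose(pit @ Pt, pit)

# Spectral gaps: 1 for P, and 1/N for Pt (second eigenvalue (N-1)/N).
ev = np.sort(np.real(np.linalg.eigvals(P)))[::-1]
evt = np.sort(np.real(np.linalg.eigvals(Pt)))[::-1]
print(ev[0] - ev[1])    # 1.0
print(evt[0] - evt[1])  # 0.125 = 1/N
```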
These properties are also inherited by the discriminant matrices, with \( D=P\) and
(414)
To distinguish the two cases, we are given a Szegedy quantum walk operator \( O\) , which can be either \( O_Z\) or \( \widetilde{O}_Z\) , associated with \( D\) and \( \widetilde{D}\) , respectively. The initial state is
(415)
Our strategy is to measure the expectation
(416)
which can be obtained via Hadamard’s test.
Before determining the value of \( k\) , first notice that if \( O=O_Z\) , then \( O_{Z}\ket{\psi_0}=\ket{\psi_0}\) . Hence \( m_k=1\) for all values of \( k\) .
On the other hand, if \( O=\widetilde{O}_Z\) , we use the fact that \( \widetilde{D}\) only has two nonzero eigenvalues \( 1\) and \( (N-1)/N=1-\delta\) , with associated eigenvectors denoted by \( \ket{\widetilde{\pi}}\) and \( \ket{\widetilde{v}}=\frac{1}{\sqrt{N-1}}(0,1,…,1)^{\top}\) , respectively. Furthermore,
(417)
Due to qubitization, we have
(418)
where \( \ket{\perp}\) is an unnormalized state satisfying \( (\ket{0^n}\bra{0^n})\otimes I_n \ket{\perp}=0\) . Then using \( T_k(1)=1\) for all \( k\) , we have
(419)
Using the fact that \( T_k(1-\delta)=\cos(k\arccos(1-\delta))\) , in order to have \( T_k(1-\delta)\approx 0\) , the smallest such \( k\) satisfies
(420)
Therefore taking \( k=\lceil\frac{\pi\sqrt{N}}{2\sqrt{2}} \rceil\) , we have \( m_k\approx 1/N\) . Running Hadamard’s test to constant accuracy allows us to distinguish the two scenarios.
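The estimate can be reproduced numerically. The sketch below evaluates \( m_k=\frac{1}{N}T_k(1)+(1-\frac{1}{N})T_k(1-\delta)\) , which follows from expanding the initial state in the eigenvectors \( \ket{\widetilde{\pi}},\ket{\widetilde{v}}\) and applying the qubitization relation (418):

```python
import numpy as np

N = 64
delta = 1.0 / N
T = lambda k, x: np.cos(k * np.arccos(np.clip(x, -1.0, 1.0)))  # Chebyshev T_k

k = int(np.ceil(np.pi * np.sqrt(N) / (2 * np.sqrt(2))))
# Weight 1/N on the eigenvalue-1 eigenvector, (N-1)/N on eigenvalue 1-delta.
m_k = (1.0 / N) * T(k, 1.0) + (1.0 - 1.0 / N) * T(k, 1.0 - delta)
print(k, m_k)   # |m_k| = O(1/N), versus m_k = 1 in the unmarked case
```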
Remark 17 (Without using the Hadamard test)
Alternatively, we may evaluate the success probability of obtaining \( 0^n\) in the ancilla qubits, i.e.,
(421)
When \( O=O_Z\) , we have \( p(0^n)=1\) with certainty. When \( O=\widetilde{O}_Z\) , according to (418),
(422)
So running the procedure with \( k=\lceil\frac{\pi\sqrt{N}}{2\sqrt{2}} \rceil\) , we can distinguish between the two cases.
Remark 18 (Comparison with Grover’s search)
It is natural to draw comparisons between Szegedy’s quantum walk and Grover’s search. The two algorithms make queries to different oracles, and both yield a quadratic speedup compared to classical algorithms. The quantum walk is slightly weaker, since it only tells whether there is a marked vertex or not, while Grover’s search also finds the location of the marked vertex. Both algorithms consist of repeated applications of the product of two reflectors. The number of iterations needs to be carefully controlled. Indeed, choosing the degree \( k\) four times as large as (420) would result in \( m_k\approx 1\) for the case with a marked vertex.
Remark 19 (Comparison with QPE)
Another possible solution of the problem of finding the marked vertex is to perform QPE on the Szegedy walk operator \( O\) (which can be \( O_Z\) or \( \widetilde{O}_Z\) ). The effectiveness of the method rests on the spectral gap amplification discussed above. We refer to [8, Chapter 17] for more details.
The quantum walk procedure can also be presented as follows. Using the \( O_P\) oracle and the multi-qubit \( \operatorname{SWAP} \) gate, we can define two sets of quantum states
(423)
This defines two projection operators
(424)
from which we can define two \( 2n\) -qubit reflection operators \( R_{\Pi_l}=2 \Pi_l-I_{2n}\) . Let us write down the reflection operators more explicitly. Using the resolution of identity,
(425)
Similarly
(426)
Then Szegedy’s quantum walk operator takes the form
(427)
which is a rotation operator that resembles Grover’s algorithm. Note that
(428)
so
(429)
so the walk operator coincides with a block encoding of \( T_2(D)\) obtained from qubitization, up to a matrix similarity transformation, and the eigenvalues are the same. In particular, consider the matrix power \( O_Z^k\) , which provides a block encoding of the Chebyshev matrix polynomial \( T_k(D)\) . Then the difference between \( O_Z^{2k}\) and \( \mathcal{U}_Z^k\) appears only at the beginning and the end of the circuit.
In practical applications, we are often not interested in constructing the Chebyshev polynomial of \( A\) , but a linear combination of Chebyshev polynomials. For instance, the matrix inversion problem can be solved by expanding \( f(x)=x^{-1}\) using a linear combination of Chebyshev polynomials, on the interval \( [-1,-\kappa^{-1}]\cup [\kappa^{-1},1]\) . Here we assume \( \left\lVert A\right\rVert =1\) and \( \kappa=\left\lVert A\right\rVert \left\lVert A^{-1}\right\rVert \) is the condition number. One way to implement this is via a quantum primitive called the linear combination of unitaries (LCU).
Let \(T=\sum_{i=0}^{K-1} \alpha_i U_i\) be a linear combination of unitary matrices \( U_i\) . For simplicity let \(K=2^a\) and \( \alpha_i>0\) (WLOG we can absorb the phase of \( \alpha_i\) into the unitary \( U_i\) ). Then
(430)
implements the selection of \(U_i\) conditioned on the value of the \(a\)-qubit ancilla states (also called the control register). \(U\) is called a select oracle.
Let \(V\) be a unitary operation satisfying
(431)
and \(V\) is called the prepare oracle. The 1-norm of the coefficients is given by
(432)
In the matrix form
(433)
where the first basis vector is \(\ket{0^a}\), and all other basis vectors are orthogonal to it. Then
(434)
Then \( T\) can be block encoded using the unitary given in (1) (the LCU lemma).
Lemma 1 (LCU)
Define \(W=(V^{\dagger}\otimes I_n) U(V\otimes I_n)\), then for any \(\ket{\psi}\),
(435)
where \( \ket{\widetilde{\perp}}\) is an unnormalized state satisfying
(436)
In other words, \( W\in\operatorname{BE}_{\left\lVert \alpha\right\rVert _1,a}(T)\) .
Proof
First
(437)
Then using the matrix representation (434), and letting the state \(\ket{\widetilde{\perp}}\) collect all the components marked by \(*\) that are orthogonal to \(\ket{0^a}\),
(438)
The LCU Lemma is a useful quantum primitive, as it states that the number of ancilla qubits needed depends only logarithmically on \( K\) , the number of terms in the linear combination. Hence it is possible to implement a linear combination of a very large number of terms efficiently. From a practical perspective, the select and prepare oracles use multi-qubit controls, and can be difficult to implement. If implemented directly, the number of multi-qubit controls again depends linearly on \( K\) , which is undesirable. Therefore an efficient implementation of LCU (in terms of gate complexity) also requires additional structure in the prepare and select oracles.
If we apply \(W\) to \(\ket{0^a}\ket{\psi}\) and measure the ancilla qubits, then the probability of obtaining the outcome \(0^a\) in the ancilla qubits (and therefore obtaining the state \(T\ket{\psi}/\left\lVert T\ket{\psi}\right\rVert \) in the system register) is \(\left(\left\lVert T\ket{\psi}\right\rVert /\left\lVert \alpha\right\rVert _1\right)^2\). The expected number of repetitions needed to succeed is \(\left(\left\lVert \alpha\right\rVert _1/\left\lVert T\ket{\psi}\right\rVert \right)^2\). Now we demonstrate that using amplitude amplification (AA) in (3.3), this number can be reduced to \(\mathcal{O}\left(\left\lVert \alpha\right\rVert _1/\left\lVert T\ket{\psi}\right\rVert \right)\).
Remark 20 (Alternative construction of the prepare oracle)
In some applications it may not be convenient to absorb the phase of \( \alpha_i\) into the select oracle. In such a case, we may modify the prepare oracle instead. If \( \alpha_i=r_i e^{\mathrm{i} \theta_i}\) with \( r_i>0,\theta_i\in[0,2\pi)\) , we can define \( \sqrt{\alpha_i}=\sqrt{r_i} e^{\mathrm{i} \theta_i/2}\) , and \( V\) is defined as in (431). However, instead of \( V^{†}\) , we need to introduce
(439)
Then following the same proof as (1), we find that \( W=(\widetilde{V}\otimes I_n) U(V\otimes I_n)\in\operatorname{BE}_{\left\lVert \alpha\right\rVert _1,a}(T)\) .
Remark 21 (Linear combination of non-unitaries)
Using the block encoding technique, we may immediately obtain linear combinations of general matrices that are not unitary. With some abuse of notation, the term “LCU” will be used whether or not the terms to be combined are unitaries. In other words, the term “linear combination of unitaries” should be loosely interpreted as “linear combination of things” (LCT) in many contexts.
Example 38 (Linear combination of two matrices)
Let \( U_A,U_B\) be two \( n\) -qubit unitaries, and we would like to construct a block encoding of \( T=U_A+U_B\) .
There are two terms in total, so one ancilla qubit is needed. The prepare oracle needs to implement
(440)
so this is the Hadamard gate. The circuit is given by (27), which constructs \( W\in \operatorname{BE}_{2,1}(T)\) .
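A minimal NumPy sketch of this example (with \( U_A=X\) and \( U_B=Z\) as arbitrary test unitaries): the top-left system block of \( W=(H\otimes I)\,\mathrm{SELECT}\,(H\otimes I)\) is \( T/\left\lVert \alpha\right\rVert _1=(U_A+U_B)/2\) , as guaranteed by the LCU lemma.

```python
import numpy as np

# LCU for T = U_A + U_B with one ancilla and Hadamard prepare.
H = np.array([[1.0, 1.0], [1.0, -1.0]]) / np.sqrt(2)
I2 = np.eye(2)
U_A = np.array([[0.0, 1.0], [1.0, 0.0]])     # X
U_B = np.array([[1.0, 0.0], [0.0, -1.0]])    # Z

# SELECT = |0><0| (x) U_A + |1><1| (x) U_B  (ancilla is the leading register)
SELECT = np.block([[U_A, np.zeros((2, 2))],
                   [np.zeros((2, 2)), U_B]])

W = np.kron(H, I2) @ SELECT @ np.kron(H, I2)

# <0|W|0> on the ancilla: the top-left 2x2 system block.
block = W[:2, :2]
print(np.allclose(block, (U_A + U_B) / 2))   # True
```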
A special case is the linear combination of two block encoded matrices. Given two \( n\) -qubit matrices \( A,B\) , for simplicity let \( U_A\in\operatorname{BE}_{1,m}(A),U_B\in\operatorname{BE}_{1,m}(B)\) . We would like to construct a block encoding of \( T=A+B\) . The circuit is given by (28), which constructs \( W\in \operatorname{BE}_{2,1+m}(T)\) . This is also an example of a linear combination of non-unitary matrices.
Example 39 (Transverse field Ising model)
Consider the following transverse field Ising model (TFIM) with periodic boundary conditions (\( Z_{n}=Z_0\) ), and \( n=2^\mathfrak{n}\) ,
(441)
In order to use LCU, we need \( (\mathfrak{n}+1)\) ancilla qubits. The prepare oracle can be simply constructed from the Hadamard gate
(442)
and the select oracle implements
(443)
The corresponding \( W\in\operatorname{BE}_{2n,\mathfrak{n}+1}(\hat{H})\) .
Example 40 (Block encoding of a matrix polynomial)
Let us use the LCU lemma to construct the block encoding of an arbitrary matrix polynomial of a Hermitian matrix \( A\) as in (8.1).
(444)
with \( \left\lVert \alpha\right\rVert _1=\sum_{k\in[K]} |\alpha_k|\) and we set \( K=2^{a}\) . For simplicity assume \( \alpha_k\ge 0\) .
We have constructed \( U_k:=(U_A Z_{\Pi})^k\) as the \( (1,m)\) -block-encoding of \( T_k(A)\) . From each \( U_k\) we can implement the select oracle
(445)
via multi-qubit controls. Also given the availability of the prepare oracle
(446)
we obtain a \( (\left\lVert \alpha\right\rVert _1,m+a)\) -block-encoding of \( f(A)\) .
The need to use \( a\) ancilla qubits, and even more importantly the need to implement the prepare and select oracles, is undesirable. We will see later that quantum signal processing (QSP) and the quantum singular value transformation (QSVT) can drastically reduce both sources of difficulty.
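The building block \( U_k=(U_A Z_{\Pi})^k\) can be verified numerically. The sketch below uses the standard Hermitian block encoding \( U_A=\begin{pmatrix} A & \sqrt{I-A^2}\\ \sqrt{I-A^2} & -A\end{pmatrix}\) (our choice for concreteness, with \( m=1\) ) and checks that the top-left block of \( (U_A Z_{\Pi})^k\) equals \( T_k(A)\) :

```python
import numpy as np

rng = np.random.default_rng(0)
n = 2; N = 2 ** n

# Random Hermitian A with ||A|| < 1, and a Hermitian block encoding
# U_A = [[A, sqrt(I-A^2)], [sqrt(I-A^2), -A]]  (m = 1 ancilla qubit).
M = rng.standard_normal((N, N)); A = (M + M.T) / 2
A = A / (np.linalg.norm(A, 2) * 1.5)
lam, V = np.linalg.eigh(A)
B = V @ np.diag(np.sqrt(1 - lam ** 2)) @ V.T
U = np.block([[A, B], [B, -A]])
assert np.allclose(U @ U.T, np.eye(2 * N))   # unitary (and Hermitian)

# Z_Pi = 2|0><0| (x) I - I reflects about the ancilla-0 subspace.
Z_Pi = np.kron(np.diag([1.0, -1.0]), np.eye(N))

# (U_A Z_Pi)^k block encodes the Chebyshev polynomial T_k(A).
k = 5
Ok = np.linalg.matrix_power(U @ Z_Pi, k)
T_k_A = V @ np.diag(np.cos(k * np.arccos(lam))) @ V.T
print(np.allclose(Ok[:N, :N], T_k_A))   # True
```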
Example 41 (Matrix functions given by a matrix Fourier series)
Instead of block encoding, LCU can also utilize a different query model based on Hamiltonian simulation. Let \( A\) be an \( n\) -qubit Hermitian matrix. Consider \( f(x)\in\mathbb{R}\) given by its Fourier expansion (up to a normalization factor)
(447)
and we are interested in computing the matrix function via numerical quadrature
(448)
Here \( \mathcal{K}\) is a uniform grid discretizing the interval \( [-L,L]\) using \( \left\lvert \mathcal{K}\right\rvert =2^{\mathfrak{k}}\) grid points, and the grid spacing is \( \Delta k = 2L/\left\lvert \mathcal{K}\right\rvert \) . The prepare oracle is given by the coefficients \( c_k=\Delta k \hat{f}(k)\) , and the corresponding subnormalization factor is
(449)
The select oracle is
(450)
This can be efficiently implemented using the controlled matrix powers as in (11), where the basic unit is the short time Hamiltonian simulation \( e^{\mathrm{i} \Delta k A}\) . This can be used to block encode a large class of matrix functions.
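As a classical emulation of this quadrature, the sketch below (with the Gaussian \( f(x)=e^{-x^2/2}\) as an assumed concrete example, so that \( \hat{f}(k)=e^{-k^2/2}/\sqrt{2\pi}\) in the convention \( f(x)=\int \hat{f}(k)e^{\mathrm{i} kx}\,\mathrm{d}k\) ) sums \( \Delta k\,\hat{f}(k)e^{\mathrm{i} k A}\) over a uniform grid and compares against the exact matrix function:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 2; N = 2 ** n
M = rng.standard_normal((N, N)); A = (M + M.T) / 2
lam, V = np.linalg.eigh(A)

L = 8.0
K = 2 ** 7                        # |K| grid points on [-L, L]
dk = 2 * L / K
ks = -L + dk * (np.arange(K) + 0.5)   # midpoint rule

# f(A) ~ sum_k dk * f_hat(k) * exp(i k A), with f(x) = exp(-x^2/2).
fA = np.zeros((N, N), dtype=complex)
for k in ks:
    c_k = dk * np.exp(-k ** 2 / 2) / np.sqrt(2 * np.pi)
    fA += c_k * (V @ np.diag(np.exp(1j * k * lam)) @ V.T)

exact = V @ np.diag(np.exp(-lam ** 2 / 2)) @ V.T
print(np.linalg.norm(fA - exact))   # small quadrature error
```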
In (8.1) we assumed that \( U_A=U_A^{†}\) to block encode a Hermitian matrix \( A\) . For instance, for \( s\) -sparse Hermitian matrices, such Hermitian block encodings can be constructed following the recipe in (7.5). However, this can come at the expense of requiring additional structures and oracles. In general, the block encoding of a Hermitian matrix may not be Hermitian itself. In this section we demonstrate that the strategy of qubitization can be modified to accommodate general block encodings.
Again starting from the eigendecomposition (383), we apply \( U_A\) to \( \ket{0^m}\ket{v_i}\) and obtain
(451)
where \( \ket{\perp_i'}\) is a normalized state satisfying \( \Pi\ket{\perp_i'}=0\) .
Since \( U_A\) block-encodes a Hermitian matrix \( A\) , we have
(452)
which implies that there exists another normalized state \( \ket{\perp_i}\) satisfying \( \Pi\ket{\perp_i}=0\) and
(453)
Now applying \( U_A\) to both sides of (453), we obtain
(454)
which gives
(455)
Define
(456)
and the associated two-dimensional subspaces \( \mathcal{H}_i= \operatorname{span} \mathcal{B}_i , \mathcal{H}'_i= \operatorname{span} \mathcal{B}_i' \) , we find that \( U_A\) maps \( \mathcal{H}_i\) to \( \mathcal{H}_i'\) . Correspondingly, \( U_A^{†}\) maps \( \mathcal{H}_i'\) to \( \mathcal{H}_i\) .
Then (451,455) give the matrix representation
(457)
A similar calculation shows that
(458)
Meanwhile, both \( \mathcal{H}_i\) and \( \mathcal{H}_i'\) are invariant subspaces of the projector \( \Pi\) , with matrix representation
(459)
Therefore
(460)
Hence \( \mathcal{H}_i\) is an invariant subspace of \( \widetilde{O}=U_A^{†}Z_{\Pi}U_A Z_{\Pi}\) , with matrix representation
(461)
Repeating \( k\) times, we have
(462)
Since any vector \( \ket{0^m}\ket{\psi}\) can be expanded in terms of the eigenvectors \( \ket{0^m}\ket{v_i}\) , we have
(463)
Therefore if we would like to construct an even order Chebyshev polynomial \( T_{2k}(A)\) , the circuit \( (U_A^{†}Z_{\Pi}U_A Z_{\Pi})^{k}\) straightforwardly gives a \( (1,m)\) -block-encoding.
In order to construct the block-encoding of an odd polynomial \( T_{2k+1}(A)\) , we note that
(464)
Using the fact that \( \mathcal{B}_i,\mathcal{B}_i'\) share the common basis \( \ket{0^m}\ket{v_i}\) , we still have the block-encoding
(465)
Therefore \( U_A Z_{\Pi}(U_A^{†}Z_{\Pi}U_A Z_{\Pi})^{k}\) is a \( (1,m)\) -block-encoding of \( T_{2k+1}(A)\) .
In summary, the block-encoding of \( T_{l}(A)\) is given by applying \( U_A Z_{\Pi}\) and \( U^{†}_A Z_{\Pi}\) alternately. If \( l=2k\) , then there are exactly \( k\) such pairs. The quantum circuit for each pair \( \widetilde{O}\) is
Otherwise, if \( l=2k+1\) , then there is an extra \( U_A Z_{\Pi}\) . The effect is to map each eigenvector \( \ket{0^m}\ket{v_i}\) back and forth between the two-dimensional subspaces \( \mathcal{H}_i\) and \( \mathcal{H}_i'\) . In (8.6), we shall see that the separate treatment of even/odd polynomials will play a more prominent role.
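The alternating construction can be verified numerically. The sketch below uses a deliberately non-Hermitian block encoding \( U_A=\begin{pmatrix} A & -\sqrt{I-A^2}\\ \sqrt{I-A^2} & A\end{pmatrix}\) of a Hermitian \( A\) (our own choice; note \( U_A\ne U_A^{†}\) ) and checks both the even and odd cases:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 2; N = 2 ** n

# Hermitian A with ||A|| < 1, and a NON-Hermitian block encoding
# U_A = [[A, -B], [B, A]] with B = sqrt(I - A^2) (B commutes with A).
M = rng.standard_normal((N, N)); A = (M + M.T) / 2
A = A / (np.linalg.norm(A, 2) * 1.5)
lam, V = np.linalg.eigh(A)
B = V @ np.diag(np.sqrt(1 - lam ** 2)) @ V.T
U = np.block([[A, -B], [B, A]])
assert np.allclose(U @ U.T.conj(), np.eye(2 * N))   # unitary
assert not np.allclose(U, U.T.conj())               # not Hermitian

Z_Pi = np.kron(np.diag([1.0, -1.0]), np.eye(N))
O_tilde = U.T.conj() @ Z_Pi @ U @ Z_Pi

def cheb(d):            # T_d(A) via the eigendecomposition
    return V @ np.diag(np.cos(d * np.arccos(lam))) @ V.T

k = 3
even = np.linalg.matrix_power(O_tilde, k)             # encodes T_{2k}(A)
odd = U @ Z_Pi @ np.linalg.matrix_power(O_tilde, k)   # encodes T_{2k+1}(A)
print(np.allclose(even[:N, :N], cheb(2 * k)))     # True
print(np.allclose(odd[:N, :N], cheb(2 * k + 1)))  # True
```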
Now that we have obtained \( O_k\in\operatorname{BE}_{1,m}(T_k(A))\) for all \( k\) , we can use the LCU lemma again to construct block encodings for linear combination of Chebyshev polynomials. We omit the details here.
Let us briefly recap what we have done so far: (1) construct a block encoding of a Hermitian matrix \( A\) (the block encoding matrix itself can be non-Hermitian); (2) use qubitization to block encode \( T_k(A)\) ; (3) use LCU to block encode an arbitrary polynomial function of \( A\) (up to a subnormalization factor). This framework is conceptually appealing, but the practical implementation of the select and prepare oracles is by no means straightforward, and can come at significant cost.
Noting that \( \mathrm{i} Z=e^{\mathrm{i} \frac{\pi}{2}Z}\) , if \( A\) is given by a Hermitian block encoding \( U_A\) , the block encoding of the Chebyshev polynomial in (8.1) can be written as
(466)
This is a special case of the following representation. Note that \( (-\mathrm{i})^d\) is an irrelevant phase factor and can be discarded.
(467)
The representation (467) is called a quantum eigenvalue transformation (QET).
Due to qubitization, \( U_{\widetilde{\Phi}}\) should block encode some matrix polynomial of \( A\) . We first state the following theorem without proof.
Theorem 3 (Quantum signal processing, a simplified version)
Consider
(468)
For any \( \widetilde{\Phi} := (\widetilde{\phi}_0, …, \widetilde{\phi}_d) \in \mathbb{R}^{d+1}\) ,
(469)
where \( P\in\mathbb{C}\) satisfies
Also define \( -\widetilde{\Phi} := (-\widetilde{\phi}_0, …, -\widetilde{\phi}_d) \in \mathbb{R}^{d+1}\) , then
(470)
Here \( P^*(x)\) is the complex conjugation of \( P(x)\) , and \( U^*_{\widetilde{\Phi}}\) is the complex conjugation (without transpose) of \( U_{\widetilde{\Phi}}\) .
Remark 22
This theorem can be proved inductively. However, it is a special case of the quantum signal processing result in (6), so we will omit the proof here. In fact, (6) will also state the converse of the result, which describes precisely the class of matrix polynomials that can be described by such phase factor modulations. In (3), the condition (1) states that the polynomial degree is upper bounded by the number of \( U_A\) ’s, and the condition (3) is simply a consequence of the fact that \( U_{\widetilde{\Phi}}\) is a unitary matrix. The condition (2) is less obvious, but should not come as a surprise, since we have seen the need to treat even and odd polynomials separately in the case of qubitization with a general block encoding. (470) can be proved directly by taking the complex conjugation of \( U_{\widetilde{\Phi}}\) .
Following the qubitization procedure, we immediately have (4).
Theorem 4 (Quantum eigenvalue transformation with Hermitian block encoding)
Let \( U_A\in\operatorname{HBE}_{1,m}(A)\) . Then for any \( \widetilde{\Phi} := (\widetilde{\phi}_0, …, \widetilde{\phi}_d) \in \mathbb{R}^{d+1}\) ,
(471)
where \( P\in\mathbb{C}\) satisfies the requirements in (3).
Using (4), we may construct the block encoding of a matrix polynomial without invoking LCU. The cost is essentially the same as block encoding a Chebyshev polynomial.
In order to implement \( e^{\mathrm{i} \phi Z_{\Pi}}\) , we note that the quantum circuit denoted by \( \operatorname{CR} _{\phi}\) in (30) returns \( e^{\mathrm{i}\phi}\ket{0}\ket{0^m}\) if \( b=0^m\) , and \( e^{-\mathrm{i}\phi}\ket{0}\ket{b}\) if \( b\ne 0^m\) . So, omitting the signal qubit, this is precisely \( e^{\mathrm{i} \phi Z_{\Pi}}\) .
Therefore, if \( A\) is given by a Hermitian block encoding \( U_A\) , we can follow the argument in (8.1) and construct the following unitary. The corresponding quantum circuit is in (31), which uses one extra ancilla qubit. When measuring the \( (m+1)\) ancilla qubits and obtaining \( \ket{0}\ket{0^m}\) , the corresponding (unnormalized) state in the system register is \( P(A)\ket{\psi}\) .
The QET described by the circuit in (31) generally constructs a block encoding of \( P(A)\) for some complex polynomial \( P\) . In practical applications (such as those later in this chapter), we would like to construct a block encoding of \( P_{\operatorname{Re}}(A)\equiv (\operatorname{Re} P)(A)=\frac12(P(A)+P^*(A))\) instead. Below we demonstrate that a simple modification of (31) allows us to achieve this goal.
To this end, we use (470). Qubitization allows us to construct
(472)
So all we need is to negate all phase factors in \( \widetilde{\Phi}\) . In order to implement \( \operatorname{CR} _{-\phi}\) , we do not actually need to implement a new circuit. Instead we may simply change the signal qubit from \( \ket{0}\) to \( \ket{1}\) :
which returns \( e^{-\mathrm{i}\phi}\ket{1}\ket{0^m}\) if \( b=0^m\) , and \( e^{\mathrm{i}\phi}\ket{1}\ket{b}\) if \( b\ne 0^m\) . In other words, the circuit for \( U_{P^*(A)}\) and \( U_{P(A)}\) are exactly the same except that the input signal qubit is changed from \( \ket{0}\) to \( \ket{1}\) .
Now we claim the circuit in (32) implements a block encoding \( U_{P_{\operatorname{Re}}(A)}\in \operatorname{BE}_{1,m+1}(P_{\operatorname{Re}}(A))\) . This circuit can be viewed as an implementation of the linear combination of unitaries \( \frac12 (U_{P^*(A)}+U_{P(A)})\) .
To verify this, we may evaluate
(473)
Here \( \ket{\perp},\ket{\perp'}\) are two \( (m+n)\) -qubit states orthogonal to any state \( \ket{0^m}\ket{x}\) , while \( \ket{\widetilde{\perp}}\) is an \( (m+n+1)\) -qubit state orthogonal to any state \( \ket{0}\ket{0^m}\ket{x}\) . In other words, by measuring all \( (m+1)\) ancilla qubits and obtaining \( 0^{m+1}\) , the corresponding (unnormalized) state in the system register is \( P_{\operatorname{Re}}(A)\ket{\psi}\) .
If \( A\) is given by a general block encoding \( U_A\) , the quantum eigenvalue transformation should consist of an alternating sequence of \( U_A,U_A^{†}\) gates. The circuit is given by (33), and the corresponding block encoding is described in (5). Note that the Hermitian block encoding becomes a special case with \( U_A=U_A^{†}\) .
Theorem 5 (Quantum eigenvalue transformation with general block encoding)
Let \( U_A\in\operatorname{BE}_{1,m}(A)\) . Then for any \( \widetilde{\Phi} := (\widetilde{\phi}_0, …, \widetilde{\phi}_d) \in \mathbb{R}^{d+1}\) , let
(474)
when \( d\) is even, and
(475)
when \( d\) is odd. Then
(476)
where \( P\in\mathbb{C}\) satisfies the conditions in (3).
Following exactly the same procedure, we find that the circuit in (32), with \( U_{P(A)}\) given by (33), implements \( U_{P_{\operatorname{Re}}(A)}\in\operatorname{BE}_{1,m+1}(P_{\operatorname{Re}}(A))\) . This is left as an exercise.
In practical applications, we may be interested in matrix polynomials \( f(A)\) , where \( f(x)\in\mathbb{R}\) does not have a definite parity. This violates the parity requirement of (3). The issue can be resolved using the LCU technique.
Note that
(477)
where \( f_{\mathrm{even}}(x)=\frac12(f(x)+f(-x)), f_{\mathrm{odd}}(x)=\frac12(f(x)-f(-x))\) . If \( |f(x)|\le 1\) on \( [-1,1]\) , then \( |f_{\mathrm{even}}(x)|,|f_{\mathrm{odd}}(x)|\le 1\) on \( [-1,1]\) , and \( f_{\mathrm{even}}(x)\) , \( f_{\mathrm{odd}}(x)\) can be each constructed using the circuit in (32). Introducing another ancilla qubit and using the LCU technique, we obtain a \( (1,m+2)\) -block-encoding of \( (f_{\mathrm{even}}(A)+f_{\mathrm{odd}}(A))/2\) . In other words, we obtain a circuit \( U_f\in \operatorname{BE}_{2,m+2}(f(A))\) . Note that unlike the case of the block encoding of \( P_{\operatorname{Re}}(A)\) , we lose a subnormalization factor of \( 2\) here.
Following the same principle, if \( f(x)=g(x)+\mathrm{i} h(x)\in\mathbb{C}\) is a given complex polynomial, and \( g,h\in\mathbb{R}\) do not have a definite parity, we can construct \( U_{g(A)}\in\operatorname{BE}_{2,m+2}(g(A)),U_{h(A)}\in\operatorname{BE}_{2,m+2}(h(A))\) . Then applying another layer of LCU, we obtain \( U_{f(A)}\in \operatorname{BE}_{4,m+3}(f(A))\) .
On the other hand, if the real and imaginary parts \( g,h\) have definite parity, then \( U_{g(A)}\in\operatorname{BE}_{1,m+1}(g(A)),U_{h(A)}\in\operatorname{BE}_{1,m+1}(h(A))\) . Applying LCU, we obtain \( U_{f(A)}\in \operatorname{BE}_{2,m+2}(f(A))\) .
The construction of the circuits in the cases above is left as an exercise.
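The scalar identity behind the parity splitting is elementary and can be checked directly (the test polynomial below is arbitrary):

```python
import numpy as np

# Even/odd splitting f = f_even + f_odd, each inheriting the bound
# |f| <= max|f| on [-1, 1].
coeffs = [0.2, -0.3, 0.1, 0.25, -0.15]        # f(x) = sum_j c_j x^j
f = np.polynomial.Polynomial(coeffs)

xs = np.linspace(-1, 1, 1001)                 # symmetric grid
f_even = 0.5 * (f(xs) + f(-xs))
f_odd = 0.5 * (f(xs) - f(-xs))

assert np.allclose(f_even + f_odd, f(xs))
assert np.max(np.abs(f_even)) <= np.max(np.abs(f(xs))) + 1e-12
assert np.max(np.abs(f_odd)) <= np.max(np.abs(f(xs))) + 1e-12
print("parity splitting verified")
```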
In terms of implementing matrix polynomials of Hermitian matrices, the quantum eigenvalue transformation provides a much simpler circuit than the method based on LCU and qubitization (i.e., linear combination of Chebyshev polynomials). The simplification is clear both in terms of the number of ancilla qubits and the circuit architecture. However, it is not yet clear for which polynomials (either a complex polynomial \( P\in\mathbb{C}\) or a real polynomial \( P_{\operatorname{Re}}\in\mathbb{R}\) ) we can apply the QET technique, and how to obtain the phase factors. Quantum signal processing (QSP) provides a complete answer to these questions.
Due to qubitization, all these questions can be answered in the context of SU(2) matrices. QSP is the theory of QET for SU(2) matrices, or the unitary representation of a scalar (real or complex) polynomial \( P(x)\) . Let \( A=x\in[-1,1]\) be a scalar with a one-qubit Hermitian block encoding
(478)
Then
(479)
is a rotation matrix.
Similar to (467), the QSP representation takes the following form
(480)
By setting \( \phi_0=…=\phi_d=0\) , we immediately obtain the block encoding of the Chebyshev polynomial \( T_d(x)\) . The representation power of this formulation is characterized by (6), which is based on a slight modification of [24, Theorem 4]. In the following discussion, even functions have parity \( 0\) and odd functions have parity \( 1\) .
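The following NumPy sketch verifies two facts stated above at the scalar level, assuming the rotation convention \( O(x)=U_A(x)Z\) : zero phase factors reproduce \( T_d(x)\) , and negating all phase factors conjugates \( P(x)\) .

```python
import numpy as np

def O(x):
    s = np.sqrt(1.0 - x * x)
    return np.array([[x, -s], [s, x]])     # rotation by arccos(x)

def eiZ(phi):
    return np.diag([np.exp(1j * phi), np.exp(-1j * phi)])   # e^{i phi Z}

def U_phi(phis, x):
    # U_Phi(x) = e^{i phi_0 Z} prod_{j=1}^d [ O(x) e^{i phi_j Z} ]
    U = eiZ(phis[0])
    for phi in phis[1:]:
        U = U @ O(x) @ eiZ(phi)
    return U

d = 7
for x in np.linspace(-0.99, 0.99, 9):
    # Zero phases: the (0,0) entry is the Chebyshev polynomial T_d(x).
    assert np.allclose(U_phi(np.zeros(d + 1), x)[0, 0],
                       np.cos(d * np.arccos(x)))
    # Negating all phases conjugates P(x), since O(x) is real.
    phis = np.linspace(0.1, 0.8, d + 1)
    assert np.allclose(U_phi(-phis, x)[0, 0],
                       np.conj(U_phi(phis, x)[0, 0]))
print("QSP identities verified")
```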
Theorem 6 (Quantum signal processing)
There exists a set of phase factors \( \Phi := (\phi_0, …, \phi_d) \in \mathbb{R}^{d+1}\) such that
(481)
if and only if \( P,Q\in\mathbb{C}\) satisfy
Here \( \deg Q=-1\) means \( Q=0\) .
Proof
\( \Rightarrow:\)
Since both \( e^{\mathrm{i} \phi Z}\) and \( O(x)\) are unitary, the matrix \( U_{\Phi}(x)\) is always a unitary matrix, which immediately implies the condition (3). Below we only need to verify the conditions (1), (2).
When \( d=0\) , \( U_{\Phi}(x)=e^{\mathrm{i} \phi_0 Z}\) , which gives \( P(x)=e^{\mathrm{i} \phi_0}\) and \( Q=0\) , satisfying all three conditions. To proceed by induction, suppose \( U_{(\phi_0,…,\phi_{d-1})}(x)\) takes the form in (481) with degree \( d-1\) . Then for any \( \phi\in\mathbb{R}\) , we have
(482)
Therefore \( U_{(\phi_0,…,\phi_{d-1},\phi)}(x)\) satisfies conditions (1),(2).
\( \Leftarrow:\)
When \( d=0\) , the only possibility is \( P(x)=e^{\mathrm{i} \phi_0}\) and \( Q=0\) , which satisfies (481).
For \( d>0\) , when \( d\) is even we may still have \( \deg P=0\) , i.e., \( P(x)=e^{\mathrm{i} \phi_0}\) and \( Q=0\) . In this case, note that
(483)
we may set \( \phi_j=(-1)^{j}\frac{\pi}{2}, j=1,…,d\) , and
(484)
Thus the statement holds.
Now suppose \( P,Q\) satisfy conditions (1)–(3), with \( \deg P=\ell>0\) and \( \ell\equiv d \pmod 2\) . Then \( \deg(\left\lvert P(x)\right\rvert ^2)=2\ell>0\) , and according to the condition (3) we must have \( \deg(Q)=\ell-1\) . Let \( P,Q\) be expanded as
(485)
then the leading term of \( |P(x)|^2 + (1-x^2) |Q(x)|^2 \) is
(486)
which implies \( \left\lvert \alpha_{\ell}\right\rvert =\left\lvert \beta_{\ell-1}\right\rvert \) .
For any \( \phi\in \mathbb{R}\) , we have
(487)
It may appear that \( \deg{\widetilde{P}}=\ell+1\) . However, by properly choosing \( \phi\) we may obtain \( \deg{\widetilde{P}}=\ell-1\) . Let \( e^{2\mathrm{i} \phi}=\alpha_{\ell}/\beta_{\ell-1}\) . Then the coefficient of the \( x^{\ell+1}\) term in \( \widetilde{P}\) is
(488)
Similarly, the coefficient of the \( x^{\ell}\) term in \( \widetilde{Q}\) is
(489)
The coefficient of the \( x^{\ell}\) term in \( \widetilde{P}\) , and the coefficient of the \( x^{\ell-1}\) term in \( \widetilde{Q}\) are both \( 0\) by the parity condition. So we have
Here the condition (3) is automatically satisfied due to unitarity. The induction proceeds until \( \ell=0\) , and we then apply the argument in (484) to represent the remaining constant phase factor if needed.
Remark 23 (\( W\) -convention of QSP)
[24, Theorem 4] is stated slightly differently as
(490)
where
(491)
This will be referred to as the \( W\) -convention. Correspondingly (480) will be referred to as the \( O\) -convention. The two conventions can be easily converted into one another, due to the relation
(492)
Correspondingly, the phase angles in the \( O\) and \( W\) representations are related according to
(493)
On the other hand, note that for any \( \theta\in\mathbb{R}\) , \( U_{\Phi}(x)\) and \( e^{\mathrm{i}\theta Z} U_{\Phi}(x) e^{-\mathrm{i}\theta Z}\) both block encode \( P(x)\) . Therefore WLOG we may as well take
(494)
In many applications, we are only interested in \( P\in\mathbb{C}\) , and \( Q\in\mathbb{C}\) is not provided a priori. [21, Theorem 4] states that under certain conditions on \( P\) , the polynomial \( Q\) can always be constructed. We omit the details here.
Note that the normalization condition (3) in (6) imposes very strong constraints on the coefficients of \( P,Q\in\mathbb{C}\) . If we are only interested in QSP for real polynomials, the conditions can be significantly relaxed.
Theorem 7 (Quantum signal processing for real polynomials)
Given a real polynomial \( P_{\operatorname{Re}}(x)\in\mathbb{R}\) , and \( \deg P_{\operatorname{Re}}=d>0\) , satisfying
then there exists polynomials \( P(x),Q(x)\in\mathbb{C}\) with \( \operatorname{Re} P=P_{\operatorname{Re}}\) and a set of phase factors \( \Phi := (\phi_0, …, \phi_d) \in \mathbb{R}^{d+1}\) such that the QSP representation (481) holds.
Compared to (6), the conditions in (7) are much easier to satisfy: given any polynomial \( f(x)\in\mathbb{R}\) satisfying condition (1) on parity, we can always scale \( f\) to satisfy condition (2) on its magnitude. Again, the presentation is slightly modified compared to [24, Corollary 5]. We can now summarize the result of QET with real polynomials as follows.
Corollary 1 (Quantum eigenvalue transformation with real polynomials)
Let \( A\in\mathbb{C}^{N\times N}\) be encoded by its \( (1,m)\) -block-encoding \( U_A\) . Given a polynomial \( P_{\operatorname{Re}}(x)\in\mathbb{R}\) of degree \( d\) satisfying the conditions in (7), we can find a sequence of phase factors \( \Phi\in\mathbb{R}^{d+1}\) , so that the circuit in (32) denoted by \( U_{\Phi}\) implements a \( (1,m+1)\) -block-encoding of \( P_{\operatorname{Re}}(A)\) . \( U_{\Phi}\) uses \( U_A,U_A^{†}\) , \( m\) -qubit controlled NOT, and single qubit rotation gates \( \mathcal{O}(d)\) times.
Remark 24 (Relation between QSP representation and QET circuit)
Although \( O(x)=U_A(x) Z\) , we do not actually need to implement \( Z\) separately in QET. Noting that \( \mathrm{i} Z=e^{\mathrm{i} \frac{\pi}{2}Z}\) , i.e., \( Z e^{\mathrm{i} \phi Z}=(-\mathrm{i} ) e^{\mathrm{i} (\frac{\pi}{2}+\phi)Z}\) , we obtain
(495)
where \( \widetilde{\phi}_0=\phi_0,\widetilde{\phi}_j=\phi_j+\pi/2,j=1,…, d\) . For the purpose of block encoding \( P(x)\) , another equivalent, and more symmetric choice is
(496)
When the phase factors are given in the \( W\) -convention, since we can perform a similarity transformation and take \( \Phi=\Phi^W\) , we can directly convert \( \Phi^W\) to \( \widetilde{\Phi}\) according to (496), which is used in the QET circuit in (33).
Example 42 (QSP for Chebyshev polynomial revisited)
In order to block encode the Chebyshev polynomial, we have \( \phi_j=0,j=0,…,d\) . This gives \( \widetilde{\phi}_0=0\) , \( \widetilde{\phi}_j=\pi/2,j=1,…,d\) , and
(497)
According to (496), an equivalent symmetric choice for block encoding \( T_d(x)\) is
(498)
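This zero-phase case is easy to check numerically. Below is a minimal sketch in the \( W\) -convention, assuming the standard signal matrix with \( x\) on the diagonal and \( \mathrm{i}\sqrt{1-x^2}\) off the diagonal (an assumed concrete form of (491)):

```python
import numpy as np

def expiZ(phi):
    """e^{i phi Z} as a 2x2 matrix."""
    return np.diag([np.exp(1j * phi), np.exp(-1j * phi)])

def W(x):
    """Signal operator in the W-convention (assumed standard form)."""
    s = np.sqrt(1.0 - x * x)
    return np.array([[x, 1j * s], [1j * s, x]])

def qsp_poly(phis, x):
    """P(x) = <0| e^{i phi_0 Z} prod_{j=1}^d W(x) e^{i phi_j Z} |0>."""
    U = expiZ(phis[0])
    for phi in phis[1:]:
        U = U @ W(x) @ expiZ(phi)
    return U[0, 0]

d = 5
for x in np.linspace(-1, 1, 11):
    # trivial phase factors reproduce the Chebyshev polynomial T_d(x)
    assert abs(qsp_poly(np.zeros(d + 1), x) - np.cos(d * np.arccos(x))) < 1e-12
print("zero phases give T_d(x)")
```

The check uses \( T_d(x)=\cos(d\arccos x)\) ; since \( W(x)=e^{\mathrm{i}\arccos(x) X}\) , the product of \( d\) copies has \( T_d(x)\) as its upper-left entry.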
QSP for real polynomials is the most useful version for many problems in scientific computation. Let us now summarize the problem of finding phase factors following the \( W\) -convention and identify \( \Phi=\Phi^W\) .
Given a target polynomial \( f=P_{\operatorname{Re}}\in\mathbb{R}\) satisfying (1) \( \deg(f)=d\) , (2) the parity of \( f\) is \( d \bmod 2\) , (3) \( \left\lVert f\right\rVert _{\infty}:=\max_{x\in[-1,1]} \left\lvert f(x)\right\rvert < 1\) , we would like to find phase factors \( \Phi:=(\phi_0,…,\phi_d)\in [-\pi,\pi)^{d+1}\) so that
(499)
with
(500)
(7) shows the existence of the phase factors. Due to the parity constraint, the number of degrees of freedom in the target polynomial \( f(x)\) is \( \widetilde{d} := \lceil \frac{d+1}{2} \rceil\) . Hence \( f(x)\) is entirely determined by its values on \( \widetilde{d}\) distinct points. Throughout the paper, we choose these points to be \( x_k=\cos\left(\frac{2k-1}{4\widetilde{d}}\pi\right)\) , \( k=1,...,\widetilde{d}\) , i.e., positive nodes of the Chebyshev polynomial \( T_{2 \widetilde{d}}(x)\) . The QSP problem can be equivalently solved via the following optimization problem
(501)
i.e., any solution \( \Phi^*\) to (499) achieves the global minimum of the cost function with \( F(\Phi^*)=0\) , and vice versa.
However, note that the number of variables is larger than the number of equations and there should be an infinite number of global minima. [25, Theorem 2] shows the existence of symmetric phase factors
(502)
Then the optimization problem is changed to
(503)
This corresponds to choosing complementary polynomial \( Q(x)\in\mathbb{R}\) . With the symmetric constraint taken into account, the number of variables matches the number of constraints.
Unfortunately, the energy landscape of the cost function \( F(\Phi)\) is very complex, and has numerous global as well as local minima. Starting from a random initial guess, an optimization algorithm can easily be trapped at a local minimum even when \( d\) is small. It is therefore surprising that starting from a special symmetric initial guess
(504)
at least one global minimum can be robustly identified using standard unconstrained optimization algorithms, even when \( d\) is as large as \( 10,000\) , using standard double precision arithmetic [25]; the optimization method is observed to be free from trapping by local minima. Direct calculation shows that \( g(x,\Phi^0)=0\) , and therefore \( \Phi^0\) does not contain any a priori information about the target polynomial \( f(x)\) .
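The workflow above can be sketched for a toy problem. This is an illustrative sketch, not QSPPACK: we identify \( g(x,\Phi)\) with \( \operatorname{Re}[U_{\Phi}(x)]_{00}\) in the \( W\) -convention (the reading of (499)–(501) assumed here), pick the hypothetical target \( f(x)=0.5\,T_3(x)\) , impose the symmetric constraint (502), and start from the symmetric initial guess (504) with \( \phi_0=\phi_d=\pi/4\) and zeros in between:

```python
import numpy as np
from scipy.optimize import minimize

def expiZ(phi):
    return np.diag([np.exp(1j * phi), np.exp(-1j * phi)])

def W(x):
    s = np.sqrt(1.0 - x * x)
    return np.array([[x, 1j * s], [1j * s, x]])

def g(x, phis):
    """g(x, Phi) = Re <0| e^{i phi_0 Z} prod_j W(x) e^{i phi_j Z} |0>."""
    U = expiZ(phis[0])
    for phi in phis[1:]:
        U = U @ W(x) @ expiZ(phi)
    return U[0, 0].real

d = 3
dt = (d + 1) // 2                               # d~ = ceil((d+1)/2) Chebyshev nodes
xk = np.cos((2 * np.arange(1, dt + 1) - 1) / (4 * dt) * np.pi)
f = lambda x: 0.5 * (4 * x**3 - 3 * x)          # target: 0.5 T_3(x)

def cost(half):                                 # symmetric phases (phi0, phi1, phi1, phi0)
    phis = np.concatenate([half, half[::-1]])
    return sum((g(x, phis) - f(x))**2 for x in xk) / dt

phi0 = np.array([np.pi / 4, 0.0])               # symmetric initial guess (504)
# sanity check: g(x, Phi^0) = 0 identically
assert all(abs(g(x, np.concatenate([phi0, phi0[::-1]]))) < 1e-12 for x in xk)

res = minimize(cost, phi0, method="Nelder-Mead",
               options={"xatol": 1e-12, "fatol": 1e-16, "maxiter": 5000})
print("final cost:", res.fun)
```

Even this generic simplex optimizer reaches a global minimum with \( F\approx 0\) from \( \Phi^0\) , consistent with the robustness observed in [25].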
This optimization-based method is implemented in QSPPACK.
Remark 25 (Other methods for treating QSP with real polynomials)
The proof of [24, Corollary 5] also gives a constructive algorithm for solving the QSP problem for real polynomials. Since \( P_{\operatorname{Re}}=f\in\mathbb{R}\) is given, the idea is to first find complementary polynomials \( P_{\operatorname{Im}},Q\in\mathbb{R}\) , so that the resulting \( P(x)=P_{\operatorname{Re}}(x)+\mathrm{i} P_{\operatorname{Im}}(x)\) and \( Q(x)\) satisfy the requirement in (6). Then the phase factors can be constructed following the recursion relation shown in the proof of (6). We will not describe the details of the procedure here. It is worth noting that the method is not numerically stable. This is made precise in [26], which shows that these algorithms require \( \mathcal{O}(d\log(d/\epsilon))\) bits of precision, where \( d\) is the degree of \( f(x)\) and \( \epsilon\) is the target accuracy. The extended precision needed in these algorithms is not an artifact of the proof technique. For instance, for \( d\approx 500\) , the number of bits needed to represent each floating point number can be as large as \( 1000\sim 2000\) . In particular, such a task cannot be reliably performed using standard double precision arithmetic, which has only \( 64\) bits per number.
Let us now use \( f(x)=\cos(xt)\) , as in Hamiltonian simulation, to demonstrate a typical workflow of QSP. The target function must be an even or odd function to satisfy the parity constraint (1) in (7).
Remark 26
When the function \( f(x)\) of interest has a singularity on \( [-1,1]\) , the function should first be mollified on a proper subinterval of interest, and then approximated by polynomials. A more streamlined method is to use the Remez exchange algorithm with parity constraint to directly approximate \( f(x)\) on the subinterval. We refer readers to [25, Appendix E] for more details.
Using QET, let us now revisit the time-independent Hamiltonian simulation problem \( \mathcal{U}=e^{\mathrm{i} H t}\) . Instead of Trotter splitting, we assume that we are given access to a block encoding \( U_H\in\operatorname{BE}_{\alpha,m}(H)\) . Since \( e^{\mathrm{i} H t}=e^{\mathrm{i} (H/\alpha) (\alpha t)}\) , the subnormalization factor \( \alpha\) can be absorbed into the simulation time \( t\) . So WLOG we assume \( U_H\in\operatorname{BE}_{1,m}(H)\) . Since
(505)
and that \( \cos(xt),\sin(xt)\) are even and odd functions, respectively, we can construct a block encoding for the real and imaginary part directly using the circuit for real matrix polynomials in (32).
More specifically, we first use the Fourier–Chebyshev series of the trigonometric functions given by the Jacobi–Anger expansion on \( [-1,1]\) :
(506)
Here \( J_{\nu}(t)\) denotes the Bessel function of the first kind.
This series converges very rapidly. With
(507)
terms, the truncated Jacobi–Anger expansion with degree up to \( d=2r+1\) can approximate \( \cos (t x),\sin(tx)\) to precision \( \epsilon/\sqrt{2}\) , respectively [24, Corollary 32]. Such a scaling also matches the complexity lower bound for Hamiltonian simulation [27, 28]. Define
(508)
where \( \beta>1\) is chosen so that \( |C_d|,|S_d|\le 1\) on \( [-1,1]\) , and \( \beta\) can be chosen to be as small as \( 1+\epsilon\) . Also let \( f_d(x)=C_d(x)+\mathrm{i} S_d(x)\) , then
(509)
(7) guarantees the existence of phase factors \( \widetilde{\Phi}_C,\widetilde{\Phi}_S\) , using which we can construct \( \mathcal{U}_C\in\operatorname{BE}_{1,m+1}(C_d(H))\) and \( \mathcal{U}_S\in\operatorname{BE}_{1,m+1}(S_d(H))\) . Finally, we can use one more ancilla qubit and LCU in (8.5.3) to construct a block encoding \( \mathcal{U}_d\in\operatorname{BE}_{2,m+2}(f_d(H))\) , or \( \mathcal{U}_d\in\operatorname{BE}_{2\beta,m+2}(\mathcal{U},\epsilon)\) . The circuit depth is \( \mathcal{O}\left(t + \log(1/\epsilon)\right)\) .
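The rapid convergence of the Jacobi–Anger expansion is easy to confirm numerically. The sketch below assumes the standard Chebyshev form of the expansion, \( \cos(tx)=J_0(t)+2\sum_{k\ge 1}(-1)^k J_{2k}(t)T_{2k}(x)\) and \( \sin(tx)=2\sum_{k\ge 0}(-1)^k J_{2k+1}(t)T_{2k+1}(x)\) , which is the reading of (506) assumed here:

```python
import numpy as np
from scipy.special import jv  # Bessel function of the first kind J_nu

def cheb(n, x):
    """Chebyshev polynomial T_n(x) = cos(n arccos x)."""
    return np.cos(n * np.arccos(x))

def cos_trunc(t, x, r):
    """Truncated Jacobi-Anger series for cos(t x), degree 2r."""
    s = jv(0, t) * np.ones_like(x)
    for k in range(1, r + 1):
        s += 2 * (-1)**k * jv(2 * k, t) * cheb(2 * k, x)
    return s

def sin_trunc(t, x, r):
    """Truncated Jacobi-Anger series for sin(t x), degree 2r+1."""
    s = np.zeros_like(x)
    for k in range(0, r + 1):
        s += 2 * (-1)**k * jv(2 * k + 1, t) * cheb(2 * k + 1, x)
    return s

t, r = 2.0, 10                      # illustrative values
x = np.linspace(-1, 1, 201)
err_c = np.max(np.abs(cos_trunc(t, x, r) - np.cos(t * x)))
err_s = np.max(np.abs(sin_trunc(t, x, r) - np.sin(t * x)))
print("truncation errors:", err_c, err_s)
```

For \( t=2\) , already \( r=10\) terms bring the truncation error down to the level of double precision rounding, reflecting the super-exponential decay of \( J_{\nu}(t)\) in \( \nu\) .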
As an example, (34) shows the quality of the QSP approximation of \( \cos(tx)\) using \( P_{\operatorname{Re}}(x)=C_d(x)\) with \( t=4\pi,\beta=1.001,d=24\) . The quality of the approximation can be significantly improved with a larger degree \( d=50\) (see (35)). The phase factors are obtained via QSPPACK.
We are given a block encoding \( U_H\in\operatorname{BE}_{1,m}(H)\) , and WLOG we assume \( 0\preceq H\preceq 1\) . We assume that we are provided an initial state \( \ket{\varphi}\) so that the initial overlap \( p_0=\left\lvert \braket{\varphi|\psi_0}\right\rvert ^2\) is not small. To simplify the problem we also assume that the ground state and first excited state energies \( E_0,E_1\) are known, with a positive gap \( \Delta:=E_1-E_0>0\) . Our goal is to use QET to prepare an approximate quantum state \( \ket{\psi}\approx \ket{\psi_0}\) .
To this end, we can construct a threshold polynomial approximating the sign function on \( [0,1]\) . Due to the assumption that \( 0\preceq H\preceq 1\) , this function can be chosen to be an even function. Since the sign function is discontinuous, the polynomial should only aim at approximating the sign function outside \( (\mu-\Delta/2,\mu+\Delta/2)\) , where \( \mu=(E_0+E_1)/2\) . The polynomial also needs to satisfy the conditions in (7). We need to find an even polynomial \( P_{\operatorname{Re}}(x)\) satisfying
(510)
We can achieve this by approximating the sign function, with \( \deg(P_{\operatorname{Re}})=\mathcal{O}(\log(1/\epsilon) \Delta^{-1})\) (see, e.g., [29, Corollary 7] and [24, Corollary 16]). This construction is based on an approximation to the \( \mathrm{erf}\) function. (36) gives a concrete construction of the even polynomial obtained via numerical optimization, and the phase factors are obtained via QSPPACK.
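To illustrate the idea, an even threshold polynomial can be produced by a plain Chebyshev fit to a smoothed step built from \( \mathrm{erf}\) . This is a toy sketch, not the constructions of [29, 24] or the optimization used for (36); the values of \( \mu,\Delta\) , the smoothing parameter \( k\) , the scale \( 0.9\) , and the degree are illustrative choices:

```python
import numpy as np
from numpy.polynomial import chebyshev as C
from scipy.special import erf

mu, Delta = 0.5, 0.2                 # gap midpoint and gap (illustrative)
k, deg, c0 = 15.0, 140, 0.9          # erf smoothing, fit degree, overall scale

# smoothed scaled step: ~c0 for |x| < mu - Delta/2, ~0 for |x| > mu + Delta/2;
# even in x because it depends on |x| only
target = lambda x: 0.5 * c0 * (1 - erf(k * (np.abs(x) - mu)))

# least-squares Chebyshev fit sampled at Chebyshev points of [-1, 1]
xs = np.cos((np.arange(500) + 0.5) * np.pi / 500)
p = C.Chebyshev.fit(xs, target(xs), deg)

grid = np.linspace(-1, 1, 1001)
print("fit error:", np.max(np.abs(p(grid) - target(grid))))
print("p(0.2), p(0.8):", p(0.2), p(0.8))
```

The resulting polynomial is (numerically) even, stays below \( 1\) in magnitude, and is close to \( 0.9\) below the gap and close to \( 0\) above it, as required by (510); the rigorous constructions additionally certify these bounds on all of \( [-1,1]\) .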
Applying the circuit \( U_{\Phi}\) to the initial state \( \ket{\varphi}\) , we have
(511)
Here \( \ket{\psi_{\perp}}\) is a state in the system register orthogonal to \( \ket{\psi_0}\) , while \( \ket{\perp}\) is orthogonal to all states \( \ket{0^m}\ket{\psi}\) . Note that
(512)
Therefore if we measure the ancilla qubits, the success probability of obtaining \( 0^m\) in the ancilla qubits, and the ground state \( \ket{\psi_0}\) in the system register, is at least \( p_0 (1-\epsilon)\) . So the total number of queries to \( U_A\) and \( U_A^{†}\) is \( \mathcal{O}(\Delta^{-1}p_0^{-1}\log(1/\epsilon) )\) .
Using amplitude amplification, the number of repetitions can be reduced to \( \mathcal{O}(p_0^{-\frac12})\) , and the total number of queries to \( U_A\) and \( U_A^{†}\) becomes \( \mathcal{O}(\Delta^{-1} p_0^{-\frac12}\log(1/\epsilon) )\) . This also matches the lower bound [30].
Once the ground state is prepared, we can estimate the ground state energy by measuring the expectation value \( \braket{\psi_0|H|\psi_0}\) . The number of samples needed is \( \mathcal{O}(1/\epsilon^2)\) , which can be reduced to \( \mathcal{O}(1/\epsilon)\) using amplitude amplification. In summary, the best complexity for estimating the ground state energy \( E_0\) to accuracy \( \epsilon\) is \( \mathcal{O}(\Delta^{-1}p_0^{-\frac12}\epsilon^{-1}\log(1/\epsilon) )\) . Note that the cost of estimating the ground state energy depends on the gap \( \Delta\) . This is because the algorithm first prepares the ground state and then estimates the ground state energy. If we are only interested in estimating \( E_0\) to precision \( \epsilon\) , the gap dependence is not necessary (see (5.1) as well as [30]).
Exercise 29
Let \( A,B\) be two \( n\) -qubit matrices. Construct a circuit to block encode \( C=A+B\) with \( U_A\in\operatorname{BE}_{\alpha_A,m}(A),U_B\in\operatorname{BE}_{\alpha_B,m}(B)\) .
Exercise 30
Use LCU to construct a block encoding of the TFIM model with periodic boundary conditions in (195), with \( g\ne 1\) .
Exercise 31
Prove (10).
Exercise 32
Let \( A\) be an \( n\) -qubit Hermitian matrix. Write down the circuit for \( U_{P_{\operatorname{Re}}(A)}\in\operatorname{BE}_{1,m+1}(P_{\operatorname{Re}}(A))\) with a block encoding \( U_A\in\operatorname{BE}_{1,m}(A)\) , where \( P\) is characterized by the phase sequence \( \widetilde{\Phi}\) specified in (3).
Exercise 33
Write down the circuit for LCU of Hamiltonian simulation.
Exercise 34
Using QET to prepare the Gibbs state.
In (8), we have found that using qubitization, we can effectively block encode the Chebyshev matrix polynomial \( T_k(A)\) for a Hermitian matrix \( A\) . Combined with LCU, we can construct a block encoding of any matrix polynomial of \( A\) . The process is greatly simplified using QSP and QET, which allow the implementation of a general class of matrix functions for Hermitian matrices.
In this section, we generalize the results of qubitization and QET to general non-Hermitian matrices. This is called the quantum singular value transformation (QSVT). Throughout the chapter we assume \( A\in\mathbb{C}^{N\times N}\) is a square matrix. QSVT is applicable to non-square matrices as well, but we omit the discussion here.
For a square matrix \( A \in \mathbb{C}^{N \times N}\) , where for simplicity we assume \( N=2^n\) for some positive integer \( n\) , the singular value decomposition (SVD) of the normalized matrix \( A\) can be written as
(513)
or equivalently
(514)
We may apply a function \( f(\cdot)\) on its singular values and define the generalized matrix function [31, 32] as below.
Definition 4 (Generalized matrix function {[32, Definition 4]})
Let \( A\in\mathbb{C}^{N\times N}\) be given with singular value decomposition (513), and let \( f: \mathbb{R} \rightarrow \mathbb{C}\) be a scalar function such that \( f\left(\sigma_{i}\right)\) is defined for all \( i\in[N]\) . The generalized matrix function is defined as
(515)
where \( f(\Sigma):=\operatorname{diag}\left(f(\sigma_{1}),…,f(\sigma_{N})\right)\) .
Given the form in (514), we also define two other types of generalized matrix functions.
Definition 5
Under the conditions in (4), the left and right generalized matrix functions are defined respectively as
(516)
Here the left and right pointing triangles reflect that the transformation only keeps the left and right singular vectors, respectively. For a given \( A\) , somewhat confusingly, in the discussion below, the transformations \( f^{\triangleright}(A),f^{\triangleleft}(A),f^{\diamond}(A)\) will all be referred to as singular value transformations. In particular, QSVT mainly concerns \( f^{\triangleright}(A),f^{\diamond}(A)\) .
Proposition 12
The following relations hold:
(517)
and
(518)
Proof
Just note that \( A^{†}A=V\Sigma^2 V^{†}\) implies \( \sqrt{A^{†}A}=V\Sigma V^{†}\) , and similarly \( \sqrt{A A^{†}}=W\Sigma W^{†}\) . So the eigenvalue and singular value decompositions coincide for both \( \sqrt{A^{†}A}\) and \( \sqrt{A A^{†}}\) .
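The identities used in this proof can be confirmed directly for a random matrix (a small numpy check; scipy's `sqrtm` computes the principal matrix square root):

```python
import numpy as np
from scipy.linalg import sqrtm

rng = np.random.default_rng(0)
A = rng.standard_normal((4, 4)) + 1j * rng.standard_normal((4, 4))
W, s, Vh = np.linalg.svd(A)          # A = W diag(s) Vh
V = Vh.conj().T

# sqrt(A^dag A) = V Sigma V^dag  and  sqrt(A A^dag) = W Sigma W^dag
assert np.allclose(sqrtm(A.conj().T @ A), V @ np.diag(s) @ Vh)
assert np.allclose(sqrtm(A @ A.conj().T), W @ np.diag(s) @ W.conj().T)
print("ok")
```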
WLOG we assume access to \( U_A\in\operatorname{BE}_{1,m}(A)\) , so that the singular values of \( A\) are in \( [0,1]\) , i.e.,
(519)
In (8.4) we have observed that when \( A\) is a Hermitian matrix, the qubitization procedure introduces two different subspaces \( \mathcal{H}_i\) and \( \mathcal{H}_i'\) associated with each eigenvector \( \ket{v_i}\) . In particular, \( U_A\) maps \( \mathcal{H}_i\) to \( \mathcal{H}_i'\) , and \( U_A^{†}\) maps \( \mathcal{H}_i'\) to \( \mathcal{H}_i\) . Furthermore, both \( \mathcal{H}_i\) and \( \mathcal{H}_i'\) are the invariant subspaces of the projection operator \( \Pi\) . Therefore \( \mathcal{H}_i\) is an invariant subspace of \( U_A^{†} f(\Pi) U_A\) for any function \( f\) . Much of the same structure carries over to the quantum singular value transformation. The only difference is that the qubitization is now defined with respect to the singular vectors. The procedure below almost entirely parallels that of (8.4), except that we need to work with the singular value decomposition instead of the eigenvalue decomposition.
Starting from the SVD in (514), we apply \( U_A\) to \( \ket{0^m}\ket{v_i}\) and obtain
(520)
where \( \ket{\perp_i'}\) is a normalized state satisfying \( \Pi\ket{\perp_i'}=0\) .
Since \( U_A\) block encodes a matrix \( A\) , we have
(521)
which implies that there exists another normalized state \( \ket{\perp_i}\) satisfying \( \Pi\ket{\perp_i}=0\) and
(522)
Now applying \( U_A\) to both sides of (522), we obtain
(523)
which gives
(524)
Define
(525)
and the associated two-dimensional subspaces \( \mathcal{H}_i= \operatorname{span} B_i , \mathcal{H}'_i= \operatorname{span} B_i' \) . We find that \( U_A\) maps \( \mathcal{H}_i\) to \( \mathcal{H}_i'\) , and correspondingly \( U_A^{†}\) maps \( \mathcal{H}_i'\) to \( \mathcal{H}_i\) .
Then (520,524) give the matrix representation
(526)
A similar calculation shows that
(527)
Meanwhile both \( \mathcal{H}_i\) and \( \mathcal{H}_i'\) are the invariant subspaces of the projector \( \Pi\) , with matrix representation
(528)
Therefore
(529)
Hence \( \mathcal{H}_i\) is an invariant subspace of \( \widetilde{O}=U_A^{†}Z_{\Pi}U_A Z_{\Pi}\) , with matrix representation
(530)
Repeating \( k\) times, we have
(531)
In other words,
(532)
Therefore the circuit \( (U_A^{†}Z_{\Pi}U_A Z_{\Pi})^{k}\) gives a \( (1,m)\) -block-encoding of \( T_{2k}^{\triangleright}(A)\) .
Similarly,
(533)
In other words,
(534)
Therefore the circuit \( U_A Z_{\Pi}(U_A^{†}Z_{\Pi}U_A Z_{\Pi})^{k}\) gives a \( (1,m)\) -block-encoding of \( T_{2k+1}^{\diamond}(A)\) .
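Both block encodings can be verified numerically for a small matrix. The sketch below uses the standard unitary dilation \( U_A=\begin{pmatrix} A & \sqrt{I-AA^{†}} \\ \sqrt{I-A^{†}A} & -A^{†}\end{pmatrix}\) as an assumed concrete block encoding (one two-dimensional ancilla register, a three-dimensional "system" for illustration), and takes \( T_{2k}^{\triangleright}(A)=V T_{2k}(\Sigma)V^{†}\) , \( T_{2k+1}^{\diamond}(A)=W T_{2k+1}(\Sigma)V^{†}\) as the reading of (516):

```python
import numpy as np
from scipy.linalg import sqrtm

rng = np.random.default_rng(1)
n = 3
A = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
A /= 2 * np.linalg.norm(A, 2)                 # ensure ||A|| <= 1/2 < 1
I = np.eye(n)
Zero = np.zeros((n, n))

UA = np.block([[A, sqrtm(I - A @ A.conj().T)],
               [sqrtm(I - A.conj().T @ A), -A.conj().T]])
assert np.allclose(UA @ UA.conj().T, np.eye(2 * n))   # unitary dilation

Zpi = np.block([[I, Zero], [Zero, -I]])       # Z_Pi = 2 Pi - I, Pi = |0><0| x I
W, s, Vh = np.linalg.svd(A)

def cheb(j, x):
    return np.cos(j * np.arccos(x))

O = UA.conj().T @ Zpi @ UA @ Zpi
k = 3
top_even = np.linalg.matrix_power(O, k)[:n, :n]
assert np.allclose(top_even, Vh.conj().T @ np.diag(cheb(2 * k, s)) @ Vh)

top_odd = (UA @ Zpi @ np.linalg.matrix_power(O, k))[:n, :n]
assert np.allclose(top_odd, W @ np.diag(cheb(2 * k + 1, s)) @ Vh)
print("ok")
```

The upper-left blocks of the two circuits match \( V T_{2k}(\Sigma)V^{†}\) and \( W T_{2k+1}(\Sigma)V^{†}\) to machine precision, as claimed.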
Remark 27
By approximating any continuous function \( f\) using polynomials, and using the LCU lemma, we can approximately evaluate \( f^{\diamond}(A)\) for any odd function \( f\) , and \( f^{\triangleright}(A)\) for any even function \( f\) . This may seem somewhat restrictive. However, note that all singular values are non-negative. Hence when performing the polynomial approximation, if we are interested in \( f^{\diamond}(A)\) , we can always first perform a polynomial approximation of an odd extension of \( f\) , i.e.,
(535)
and then evaluate \( g^{\diamond}(A)\) . Similarly, if we are interested in \( f^{\triangleright}(A)\) for a general \( f\) , we can perform polynomial approximation to its even extension
(536)
and evaluate \( g^{\triangleright}(A)\) .
Due to the close relation between eigenvalue and singular value transformation in terms of Chebyshev polynomials and qubitization in (9.2), we can obtain the QSVT circuit easily following the discussion in (8.6).
First, there are no changes to the scalar case of QSP (in terms of SU(2) matrices), and in particular (6) and (7).
For the matrix case, when \( d\) is even,
(537)
gives a \( (1,m+1)\) -block-encoding of \( P^{\triangleright}(A)\) for some even polynomial \( P\in\mathbb{C}\) .
When \( d\) is odd,
(538)
gives a \( (1,m+1)\) -block-encoding of \( P^{\diamond}(A)\) for some odd polynomial \( P\in\mathbb{C}\) .
The quantum circuit is exactly the same as that in (33). The phase factors can be adjusted so that all polynomials \( P\) satisfying the conditions in (6) can be exactly represented. If we are only interested in a real polynomial \( P_{\operatorname{Re}}\in\mathbb{R}\) and in \( P_{\operatorname{Re}}^{\diamond}(A)\) (odd) or \( P_{\operatorname{Re}}^{\triangleright}(A)\) (even), we can use (7) and the circuit in (37) (which is simply a combination of (32,33)) to implement its \( (1,m+1)\) -block-encoding. We have the following theorem. Since the conditions of the QSP representation for real polynomials are simple to satisfy, and this case is also the most useful in practice, we only state the case with real polynomials.
Theorem 8 (Quantum singular value transformation with real polynomials)
Let \( A\in\mathbb{C}^{N\times N}\) be encoded by its \( (1,m)\) -block-encoding \( U_A\) . Given a polynomial \( P_{\operatorname{Re}}(x)\in\mathbb{R}\) of degree \( d\) satisfying the conditions in (7), we can find a sequence of phase factors \( \Phi\in\mathbb{R}^{d+1}\) , so that the circuit in (37) denoted by \( U_{\Phi}\) implements a \( (1,m+1)\) -block-encoding of \( P^{\diamond}_{\operatorname{Re}}(A)\) if \( d\) is odd, and of \( P^{\triangleright}_{\operatorname{Re}}(A)\) if \( d\) is even. \( U_{\Phi}\) uses \( U_A,U_A^{†}\) , \( m\) -qubit controlled NOT, and single qubit rotation gates \( \mathcal{O}(d)\) times.
When \( A\) is a Hermitian matrix, the quantum circuits for QET and QSVT are the same. This means that the eigenvalue transformation and the singular value transformation are merely two different perspectives of the same object.
For a Hermitian matrix \( A\) , the eigenvalue decomposition and the singular value decomposition are connected as
(539)
Here
(540)
So if \( P\in\mathbb{C}\) is an even polynomial,
(541)
Similarly, if \( P\in\mathbb{C}\) is an odd polynomial, then
(542)
These relations verify that the eigenvalue transformation and the singular value transformation are indeed the same when \( P\) has a definite parity. When the parity of \( P\) is indefinite, the two objects are in general not the same, and in particular \( P(A)\) cannot be directly implemented using the QET circuit.
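For a random Hermitian matrix these identities are immediate to confirm (a sketch taking \( P=T_2\) and \( P=T_3\) for the even and odd cases, and reading \( P^{\triangleright}(A)=V P(\Sigma)V^{†}\) , \( P^{\diamond}(A)=W P(\Sigma)V^{†}\) ):

```python
import numpy as np

rng = np.random.default_rng(2)
B = rng.standard_normal((4, 4)) + 1j * rng.standard_normal((4, 4))
A = (B + B.conj().T) / 8                    # Hermitian, generically indefinite
W, s, Vh = np.linalg.svd(A)
V = Vh.conj().T

P_even = lambda M: 2 * M @ M - np.eye(4)    # T_2(M)
P_odd  = lambda M: 4 * M @ M @ M - 3 * M    # T_3(M)

# even polynomial: V P(Sigma) V^dag = P(A)
assert np.allclose(V @ np.diag(2 * s**2 - 1) @ Vh, P_even(A))
# odd polynomial: W P(Sigma) V^dag = P(A)
assert np.allclose(W @ np.diag(4 * s**3 - 3 * s) @ Vh, P_odd(A))
print("ok")
```

Both checks are basis-independent: for Hermitian \( A\) one has \( V P(\Sigma)V^{†}=P(\sqrt{A^{†}A})=P(A)\) for even \( P\) , and \( W P(\Sigma)V^{†}=A\,(P(x)/x)(\sqrt{A^{†}A})=P(A)\) for odd \( P\) .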
For general matrices, we have seen in the context of solving linear equations in (5.3) that the matrix dilation method in (226) can be used to convert the non-Hermitian problem to a Hermitian problem. Here we study the relation between QSP applied to the dilated Hermitian matrix, and QSVT for the general matrix.
Recall the definition of the dilated Hermitian matrix
(543)
When \( A\) is given by its block encoding \( U_A\in\operatorname{BE}_{1,m}(A)\) , the dilated Hermitian matrix \( \widetilde{A}\) can be obtained with one ancilla qubit through \( U_{\widetilde{A}}=\ket{0}\bra{1}\otimes U_A + \ket{1}\bra{0}\otimes U_A^\dagger\) , i.e., \( U_{\widetilde{A}}\in\operatorname{BE}_{1,m+1}(\widetilde{A})\) . Note that this requires the controlled version of \( U_A,U_A^{†}\) .
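The dilation can be checked directly for a small block encoding. The sketch below takes \( m=1\) ancilla and a single system qubit, and assumes the signal qubit is the most significant register (this qubit ordering is an assumption of the sketch):

```python
import numpy as np

rng = np.random.default_rng(3)
# any 4x4 unitary is a (1,1)-block-encoding of its upper-left 2x2 block A
UA = np.linalg.qr(rng.standard_normal((4, 4)) + 1j * rng.standard_normal((4, 4)))[0]
A = UA[:2, :2]

k01 = np.array([[0.0, 1.0], [0.0, 0.0]])      # |0><1| on the signal qubit
k10 = k01.T
U_dil = np.kron(k01, UA) + np.kron(k10, UA.conj().T)

# U_dil is unitary
assert np.allclose(U_dil @ U_dil.conj().T, np.eye(8))

# and block encodes the dilated matrix [[0, A], [A^dag, 0]]:
# project the m = 1 ancilla (middle qubit) onto |0>
P = np.kron(np.eye(2), np.kron(np.array([[1.0, 0.0]]), np.eye(2)))
Pd = P.conj().T
At = np.block([[np.zeros((2, 2)), A], [A.conj().T, np.zeros((2, 2))]])
assert np.allclose(P @ U_dil @ Pd, At)
print("ok")
```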
From the SVD in (514), we can construct
(544)
Direct calculation shows
(545)
i.e., \( \{\ket{z^{\pm}_i}\}\) are all the eigenvectors of \( \widetilde{A}\) .
For an arbitrary polynomial \( f\in\mathbb{C}\) , the matrix function applied to \( \widetilde{A}\) can be computed as
(546)
Here
(547)
Therefore applying the QSP to the dilated matrix \( \widetilde{A}\) automatically implements QSVT of \( A\) using polynomials of even and odd parities.
In particular, if \( f\) is an even function, then
(548)
i.e., measuring the signal qubit we obtain \( 0\) with certainty, and the system register is \( f^{\triangleright}_{\mathrm{even}}(A)\ket{\psi}\) . Similarly, if \( f\) is odd, then
(549)
i.e., measuring the signal qubit we obtain \( 1\) with certainty.
In this section, we revisit the problem of solving linear systems of equations \( Ax=b\) . With QSVT we can solve QLSP for general matrices without the need to dilate the matrix into a Hermitian matrix. Assume \( A=W\Sigma V^{†}\) is invertible, i.e., \( \forall i, \Sigma_{ii}>0\) ; then
(550)
where \( f(x)=x^{-1}\) is an odd function.
WLOG assume \( A\) can be accessed by \( U_A\in\operatorname{BE}_{1,m}(A)\) . For simplicity we also assume \( \left\lVert A\right\rVert =1\) (though in general \( \left\lVert A\right\rVert \le \alpha=1\) , and we may not always be able to set \( \left\lVert A\right\rVert =1\) ). Let \( \kappa\) be the condition number of \( A\) , then the singular values of \( A\) are contained in the interval \( [\delta,1]\) , with \( \delta =\kappa^{-1}\) .
Note that \( f(\cdot)\) is not bounded by 1, and is in particular singular at \( x=0\) . Therefore, instead of approximating \( f\) on the whole interval \( [-1,1]\) , we consider an odd polynomial \( p(x)\) such that
(551)
The factor \( \beta>1\) is somewhat arbitrary and is chosen so that \( |p(x)|\leq 1\) for all \( x\in[-1,1]\) , to satisfy condition (2) in (7). For instance, we may choose \( \beta=4/3\) . The precision parameter \( \epsilon'\) will be chosen later. The degree of the odd polynomial can be chosen to be \( \mathcal{O}(\frac{1}{\delta}\log(\frac{1}{\epsilon'}))\) , as guaranteed by, e.g., [21, Corollary 69]. This construction is not explicit (see [22] for an explicit construction). (38) gives a concrete construction of an odd polynomial obtained via numerical optimization, and the phase factors are obtained via QSPPACK.
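A toy version of such an odd polynomial can be produced by a least-squares fit in the odd Chebyshev basis; the target \( x\mapsto \frac{1}{\beta\kappa x}\) on \( [\delta,1]\) is the reading of (551) assumed here, and this naive fit, unlike the constructions of [21, 22] or the optimization behind (38), does not certify \( |p(x)|\le 1\) inside the spectral gap \( (-\delta,\delta)\) :

```python
import numpy as np

kappa, beta = 5.0, 4.0 / 3.0
delta = 1.0 / kappa
target = lambda x: 1.0 / (beta * kappa * x)

# least-squares fit in the odd Chebyshev basis T_1, T_3, ..., T_61 on [delta, 1];
# odd symmetry then extends the approximation to [-1, -delta] automatically
deg = 61                                     # degree O(kappa log(1/eps')) suffices
odd = np.arange(1, deg + 1, 2)
xs = 0.5 * (delta + 1) + 0.5 * (1 - delta) * np.cos(np.pi * (np.arange(200) + 0.5) / 200)
M = np.cos(np.outer(np.arccos(xs), odd))     # T_j(x) = cos(j arccos x)
c, *_ = np.linalg.lstsq(M, target(xs), rcond=None)

p = lambda x: np.cos(np.outer(np.arccos(x), odd)) @ c
grid = np.linspace(delta, 1, 400)
err = np.max(np.abs(p(grid) - target(grid)))
print("max error on [delta, 1]:", err)
```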
Then (37) implements a \( (1,m+1)\) -block-encoding of \( p^{\diamond}(A^\dagger)=V p(\Sigma)W^\dagger\) denoted by \( U_{\Phi}\) . We have
(552)
The total number of queries to \( U_A\) and \( U_A^{†}\) is
(553)
To solve QLSP, we assume the right hand side vector \( \ket{b}\) can be accessed through the oracle \( U_b\) such that
(554)
We introduce the parameter
(555)
which plays an important part in the success probability of the procedure. Let
(556)
be the unnormalized true solution, and the normalized solution state is \( \ket{x}=x/\left\lVert x\right\rVert \) . Now denote \( \widetilde{x}=p^{\diamond}(A^{\dagger})\ket{b}\) , and \( \ket{\widetilde{x}}=\widetilde{x}/\left\lVert \widetilde{x}\right\rVert \) . Then the unnormalized solution satisfies \( \|x-\widetilde{x}\|\leq\epsilon'\) . For the normalized state \( \ket{\widetilde{x}}\) , this error is scaled accordingly. When \( \epsilon'\ll \|\widetilde{x}\|\) , we have
(557)
Also we have
(558)
Therefore in order for the normalized output quantum state to be \( \epsilon\) -close to the normalized solution state \( \ket{x}\) , we need to set \( \epsilon'=\mathcal{O}(\epsilon \xi /\kappa)\) . This is similar to the case of the HHL algorithm in (5.3), where QPE needs to achieve \( \epsilon\) multiplicative accuracy, which means that the additive accuracy parameter \( \epsilon'\) should be set to \( \mathcal{O}(\epsilon/\kappa)\) .
The success probability of the above procedure is \( \Omega(\|\widetilde{x}\|^2)=\Omega(\xi^2/\kappa^2)\) . With amplitude amplification we can boost the success probability to be greater than \( 1/2\) , with one qubit serving as a witness: if measuring this qubit yields the outcome 0, the procedure has succeeded, and if 1, it has failed. It takes \( \mathcal{O}(\kappa/\xi)\) rounds of amplitude amplification, i.e., using \( U_{\Phi}^\dagger\) , \( U_{\Phi}\) , \( U_b\) , and \( U_b^\dagger\) \( \mathcal{O}(\kappa/\xi)\) times. A single \( U_{\Phi}\) uses \( U_A\) and its inverse
(559)
times. Therefore the total number of queries to \( U_A\) and its inverse is
(560)
The number of queries to \( U_b\) and its inverse is \( \mathcal{O}(\kappa/\xi)\) . We consider the following two cases for the magnitude of \( \xi\) .
So far we have assumed that we have a matrix \( A\) in mind, and the access to \( A\) is provided via the block encoding matrix \( U_A\) . The QSVT circuit takes the form
(561)
If we are further given two unitary matrices \( P,Q\) , we can equivalently rewrite the QSVT transformation as
(562)
Whether the expression begins and ends with \( P\) or \( Q\) depends on the parity of \( d\) . The insertion of \( P,Q\) amounts to a basis transformation. Assume that we have access to \( \widetilde{U}_A=QU_A P^{†}\) ; then \( U_A\) is the matrix representation of \( \widetilde{U}_A\) with respect to the bases given by \( P,Q\) , respectively. What is different is that the controlled rotations before \( U_A\) and \( U_A^{†}\) are now expressed with respect to two different basis sets, i.e., \( P \operatorname{CR} _{\widetilde{\phi}_d}P^{†}\) , and \( Q \operatorname{CR} _{\widetilde{\phi}_{d-1}}Q^{†}\) , respectively. This can be useful for certain applications. Let us now express these ideas more formally.
Assume that we are given an \( (n+m)\) -qubit unitary \( \widetilde{U}_A\) , and two \( (n+m)\) -qubit projectors \( \Pi,\Pi'\) . For simplicity we assume \( \operatorname{rank} (\Pi)= \operatorname{rank} (\Pi')=N\) . Define an orthonormal basis set
(563)
where the vectors \( \ket{\varphi_0},…,\ket{\varphi_{N-1}}\) span the range of \( \Pi\) , and all states \( \ket{v_i}\) are orthogonal to \( \ket{\varphi_j}\) . Similarly define an orthonormal basis set
(564)
where the vectors \( \ket{\psi_0},…,\ket{\psi_{N-1}}\) span the range of \( \Pi'\) , and all states \( \ket{w_i}\) are orthogonal to \( \ket{\psi_j}\) . We can think that the columns of \( \mathcal{B},\mathcal{B}'\) form the basis transformation matrix \( P,Q\) , respectively.
Then the matrix \( A\) is defined implicitly in terms of its matrix representation
(565)
Note that
(566)
we find that
(567)
Therefore (8) can be viewed as the singular value transformation of \( A\) , which is a submatrix of the matrix representation of \( U_A\) with respect to bases \( \mathcal{B},\mathcal{B}'\) .
The implementation of the controlled rotation \( P \operatorname{CR} _{\widetilde{\phi}}P^{†}\) relies on the implementation of \( \operatorname{C} _{\Pi} \operatorname{NOT} \) . The projectors \( \Pi,\Pi'\) cannot be accessed directly, and WLOG we focus on one projector \( \Pi\) . Motivated by Grover's search, we may assume access to a reflection operator
(568)
for each of \( \Pi,\Pi'\) , denoted \( R_{\Pi}\) and \( R_{\Pi'}\) respectively. We can then define an \( m\) -qubit controlled NOT gate as
(569)
which can be constructed using \( R_{\Pi}\) as
(570)
Therefore assuming access to \( R_{\Pi}\) , the \( \operatorname{C} _{\Pi} \operatorname{NOT} \) gate can be implemented using the circuit in (39).
Then according to (8), we can implement the QSVT using \( \widetilde{U}_A,\widetilde{U}_A^{†}\) , \( \operatorname{C} _{\ket{0^m}\bra{0^m}} \operatorname{NOT} \) and single qubit rotation gates, where \( \ket{0^m}\bra{0^m}\otimes I_n\) is a rank-\( 2^n\) projector. In this generalized setting, \( \operatorname{C} _{\ket{0^m}\bra{0^m}\otimes I_n} \operatorname{NOT} = \operatorname{C} _{\ket{0^m}\bra{0^m}} \operatorname{NOT} \otimes I_n\) in the \( \mathcal{B},\mathcal{B}'\) basis should be implemented using \( \operatorname{C} _{\Pi} \operatorname{NOT} , \operatorname{C} _{\Pi'} \operatorname{NOT} \) , respectively. We arrive at the following theorem:
Theorem 9 (Quantum singular value transformation with real polynomials and projectors)
Let \( \widetilde{U}_A\) be an \( (n+m)\) -qubit unitary, and \( \Pi,\Pi'\) be two \( (n+m)\) -qubit projectors of rank \( 2^n\) . Define the basis \( \mathcal{B},\mathcal{B'}\) according to (563,564), and let \( A\in\mathbb{C}^{N\times N}\) be defined in terms of the matrix representation in (565). Given a polynomial \( P_{\operatorname{Re}}(x)\in\mathbb{R}\) of degree \( d\) satisfying the conditions in (7), we can find a sequence of phase factors \( \Phi\in\mathbb{R}^{d+1}\) to define a unitary \( U_{\Phi}\) satisfying
(571)
if \( d\) is odd, and
(572)
if \( d\) is even. Here \( \Pi'\ket{\perp_j'}=0\) , \( \Pi\ket{\perp_j}=0\) , and \( U_{\Phi}\) uses \( \widetilde{U}_A,\widetilde{U}_A^{†}\) , \( \operatorname{C} _{\Pi} \operatorname{NOT} , \operatorname{C} _{\Pi'} \operatorname{NOT} \) , and single qubit rotation gates \( \mathcal{O}(d)\) times.
As an application, let us revisit the Grover search problem from the perspective of QSVT. Again let \( \ket{x_0}\) be the desired marked state, and \( \ket{\psi_0}\) be the uniform superposition of states. Now let us perform a basis transformation. We define an orthonormal basis set \( \mathcal{B}=\{\ket{\psi_0},\ket{v_1},…,\ket{v_{N-1}}\}\) , where all states \( \ket{v_i}\) are orthogonal to \( \ket{\psi_0}\) . Similarly define an orthonormal basis set \( \mathcal{B}'=\{\ket{x_0},\ket{w_1},…,\ket{w_{N-1}}\}\) , where all states \( \ket{w_i}\) are orthogonal to \( \ket{x_0}\) . Then the matrix of reflection operator \( R_{\psi_0}\) with respect to \( \mathcal{B},\mathcal{B'}\) is (let \( a=1/\sqrt{N}\) )
(573)
Let \( A=a\) be a \( 1\times 1\) matrix; then \( R_{\psi_0}\in \operatorname{BE}_{1,n}(A)\) . Furthermore, the projectors \( \Pi=\ket{\psi_0}\bra{\psi_0}\) and \( \Pi'=\ket{x_0}\bra{x_0}\) can be accessed via the reflection operators \( R_{\psi_0},R_{x_0}\) , respectively. According to (570), this defines \( \operatorname{C} _{\Pi} \operatorname{NOT} , \operatorname{C} _{\Pi'} \operatorname{NOT} \) . Letting \( \widetilde{U}_A=R_{\psi_0}\) , we have
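The \( 1\times 1\) block encoding can be confirmed directly: the matrix element of \( R_{\psi_0}\) between the ranges of \( \Pi'\) and \( \Pi\) equals \( a\) . A small numpy check, taking \( R_{\psi_0}=2\ket{\psi_0}\bra{\psi_0}-I\) (this sign convention for the reflection in (568) is an assumption) and, for illustration, \( n=3\) qubits with the marked state \( \ket{x_0}=\ket{0\cdots 0}\) :

```python
import numpy as np

n = 3
N = 2 ** n
a = 1 / np.sqrt(N)

psi0 = np.full(N, a)              # uniform superposition |psi_0>
x0 = np.zeros(N); x0[0] = 1.0     # marked state (hypothetically |0...0>)

R_psi0 = 2 * np.outer(psi0, psi0) - np.eye(N)

# <x_0| R_{psi_0} |psi_0> = 2 a - a = a
A_entry = x0 @ R_psi0 @ psi0
assert np.isclose(A_entry, a)
print("A =", A_entry)
```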
(574)
and we would like to use (9) to find \( U_{\Phi}\) that block encodes
(575)
To this end, we need to find an odd, real polynomial \( P_{\operatorname{Re}}(x)\) satisfying \( P_{\operatorname{Re}}(a)\approx 1\) . More specifically, we need to find \( P_{\operatorname{Re}}(x)\) satisfying
(576)
We can achieve this by approximating the sign function, with \( \deg(P_{\operatorname{Re}})=\mathcal{O}(\log(1/\epsilon^2) a^{-1})=\mathcal{O}(\log(1/\epsilon)\sqrt{N})\) (see e.g. [29, Corollary 6]). This construction is also based on an approximation to the \( \mathrm{erf}\) function. (40) gives a concrete construction of an odd polynomial obtained via numerical optimization, and the phase factors are obtained via QSPPACK.
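This step can be sketched numerically. The following is a minimal, hedged illustration (not the QSPPACK optimization itself): we least-squares fit an odd Chebyshev expansion to \( \mathrm{erf}(kx)\) with \( k=\mathcal{O}(a^{-1})\) , which is approximately \( 1\) on \( [a,1]\) . The values \( a=0.1\) , \( k=20\) , and the degree are illustrative.

```python
import numpy as np
from math import erf
from numpy.polynomial import chebyshev as C

a, k, deg = 0.1, 20.0, 121            # illustrative: k = O(1/a), odd degree

# least-squares Chebyshev fit of erf(k x) on Chebyshev nodes in [-1, 1]
x = np.cos(np.pi * (np.arange(2000) + 0.5) / 2000)
y = np.array([erf(k * t) for t in x])
c = C.chebfit(x, y, deg)
c[::2] = 0.0                          # erf(k x) is odd: drop even terms

P = lambda t: C.chebval(t, c)
print(P(a))                           # ~ erf(k a) = erf(2) ~ 0.995
```

In practice the polynomial is rescaled slightly so that \( \left\lvert P_{\operatorname{Re}}\right\rvert\le 1\) holds exactly on \( [-1,1]\) .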
Then
(577)
More specifically, note that
(578)
for some unnormalized state \( \ket{\widetilde{\perp}}\) . Moreover
(579)
which gives
(580)
So
(581)
Therefore we can measure the system register to find \( x_0\) , and we achieve the same Grover type speedup.
Note that the approximation can be made arbitrarily accurate, and there is no overshooting problem as in the standard Grover search (though overshooting is only a minor issue there). While Grover’s search does not require the output quantum state to be exactly \( \ket{x_0}\) , exactness can be desirable when the procedure is used as a quantum subroutine, for instance in amplitude amplification.
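The overshooting issue in the standard Grover search can be seen with a short computation: the success probability after \( k\) iterations is \( \sin^2((2k+1)\theta)\) with \( \theta=\arcsin(1/\sqrt{N})\) , and since \( k\) must be an integer, the maximum is generically strictly below \( 1\) . A quick numeric check, with the illustrative choice \( N=100\) :

```python
import numpy as np

N = 100
theta = np.arcsin(1 / np.sqrt(N))
k = np.arange(16)                       # optimal k is about pi*sqrt(N)/4 ~ 7.85
p = np.sin((2 * k + 1) * theta) ** 2    # success probability after k iterations
p_best = p.max()
print(p_best)                           # strictly below 1
```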
The immediate generalization of the procedure above is called fixed-point amplitude amplification.
Proposition 13 (Fixed-point amplitude amplification)
Let \( \widetilde{U}_A\) be an \( n\) -qubit unitary and \( \Pi'\) be an \( n\) -qubit orthogonal projector such that
(582)
Then there is an \( (n+1)\) -qubit unitary circuit \( U_{\Phi}\) such that
(583)
Here \( U_{\Phi}\) uses the gates \( \widetilde{U}_A,\widetilde{U}_A^{†}, \operatorname{C} _{\Pi'} \operatorname{NOT} , \operatorname{C} _{\ket{\varphi_0}\bra{\varphi_0}} \operatorname{NOT} \) and single-qubit rotation gates \( \mathcal{O}(\log(1/\epsilon) \delta^{-1})\) times.
Proof
The procedure is very similar to Grover’s search. We only prove the case when \( \Pi'\) is of rank \( 1\) , though the statement remains correct when the rank of \( \Pi'\) is larger than \( 1\) . We construct an orthonormal basis set \( \mathcal{B}=\{\ket{\varphi_0},\ket{v_1},…,\ket{v_{N-1}}\}\) , where all states \( \ket{v_i}\) are orthogonal to \( \ket{\varphi_0}\) . Similarly, define an orthonormal basis set \( \mathcal{B}'=\{\ket{\psi},\ket{w_1},…,\ket{w_{N-1}}\}\) , where all states \( \ket{w_i}\) are orthogonal to the target state \( \ket{\psi}\) . Since the target state \( \ket{\psi}\) belongs to the range of \( \Pi'\) ,
(584)
i.e.,
(585)
Now let \( \Pi=\ket{\varphi_0}\bra{\varphi_0}\) ; then we can use the same choice of \( P_{\operatorname{Re}}(x)\) as in Grover’s search so that \( \left\lvert P_{\operatorname{Re}}(x)-1\right\rvert =\mathcal{O}(\epsilon^2)\) for any \( x\ge \delta\) , and \( \deg(P_{\operatorname{Re}})=\mathcal{O}(\log(1/\epsilon) \delta^{-1})\) . The corresponding \( U_{\Phi}\) uses one ancilla qubit to block encode
(586)
Note that the ranks of \( \Pi\) and \( \Pi'\) may be different; this does not affect the proof.
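The fact that the ranks of \( \Pi\) and \( \Pi'\) can differ mirrors the fact that the singular value transformation \( f^{\diamond}(A)=Uf(\Sigma)V^{\dagger}\) is well defined for rectangular \( A\) . A minimal sketch; the odd monomial \( f(x)=x^3\) and the matrix dimensions are illustrative:

```python
import numpy as np

rng = np.random.default_rng(1)

def svt(A, f):
    # generalized matrix function: apply f to the singular values
    U, s, Vh = np.linalg.svd(A, full_matrices=False)
    return (U * f(s)) @ Vh

A = 0.3 * rng.standard_normal((5, 3))   # rectangular: the two ranks differ
# for the odd monomial f(x) = x^3, f_diamond(A) = A A^T A
assert np.allclose(svt(A, lambda x: x**3), A @ A.T @ A)
```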
Exercise 35 (Robust oblivious amplitude amplification)
Consider a quantum circuit consisting of two registers denoted by \( a\) and \( s\) . Suppose we have a block encoding \( V\) of \( A\) : \( A=(\bra{0}_a\otimes I_s)V(\ket{0}_a\otimes I_s)\) . Let \( W=-V(\mathrm{REF}\otimes I_s)V^{\dagger}(\mathrm{REF}\otimes I_s)V\) , where \( \mathrm{REF}=I_a-2\ket{0}_a\bra{0}_a\) . (1) Within the framework of QSVT, what is the polynomial associated with the singular value transformation implemented by \( W\) ? (2) Suppose \( A=U/2\) for some unitary \( U\) . What is \( (\bra{0}_a\otimes I_s)W(\ket{0}_a\otimes I_s)\) ? (3) Explain the construction of \( W\) in terms of a singular value transformation \( f^{\diamond}(A)\) with \( f(x)=3x-4x^3\) . Draw the picture of \( f(x)\) and mark its values at \( x=0,\frac12,1\) .
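Part of this exercise can be checked numerically. Assuming the relation the exercise asks you to derive, namely that \( W\) block encodes \( f^{\diamond}(A)\) with \( f(x)=3x-4x^3\) , i.e. \( (\bra{0}_a\otimes I_s)W(\ket{0}_a\otimes I_s)=3A-4AA^{\dagger}A\) , the sketch below builds a standard unitary dilation of a random contraction \( A\) with \( \left\lVert A\right\rVert=1/2\) and verifies the identity; note \( f(1/2)=1\) , consistent with part (2).

```python
import numpy as np

rng = np.random.default_rng(0)

def psd_sqrt(M):
    # Hermitian PSD square root via eigendecomposition
    w, Q = np.linalg.eigh(M)
    return (Q * np.sqrt(np.clip(w, 0, None))) @ Q.conj().T

n = 4
A = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
A /= 2 * np.linalg.norm(A, 2)                     # contraction, ||A|| = 1/2

I = np.eye(n)
V = np.block([[A, psd_sqrt(I - A @ A.conj().T)],  # standard unitary dilation
              [psd_sqrt(I - A.conj().T @ A), -A.conj().T]])
assert np.allclose(V @ V.conj().T, np.eye(2 * n))

REF = np.kron(np.diag([-1.0, 1.0]), I)            # REF = I_a - 2|0><0|_a
W = -V @ REF @ V.conj().T @ REF @ V

top = W[:n, :n]                                   # (<0|_a ox I_s) W (|0>_a ox I_s)
assert np.allclose(top, 3 * A - 4 * A @ A.conj().T @ A)
```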
Exercise 36 (Logarithm of unitaries)
Suppose we are given access to a unitary \( U=e^{\mathrm{i} H}\) , where \( \left\lVert H\right\rVert \le \pi/2\) . Use QSVT to design a quantum algorithm that approximately implements a block encoding of \( H\) , using controlled \( U\) and its inverse, as well as elementary quantum gates.
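As a classical sanity check for this exercise (not the quantum algorithm itself), the condition \( \left\lVert H\right\rVert \le \pi/2\) guarantees that the eigenphases of \( U\) lie in \( [-\pi/2,\pi/2]\) , so the principal matrix logarithm recovers \( H\) from \( U\) unambiguously. A minimal numeric sketch; the matrix size is illustrative:

```python
import numpy as np

rng = np.random.default_rng(2)

# random Hermitian H scaled so that ||H|| <= pi/2
B = rng.standard_normal((4, 4)) + 1j * rng.standard_normal((4, 4))
H = (B + B.conj().T) / 2
H *= (np.pi / 2) / np.linalg.norm(H, 2)

w, Q = np.linalg.eigh(H)
U = (Q * np.exp(1j * w)) @ Q.conj().T       # U = e^{iH}

# eigenphases of U lie in [-pi/2, pi/2], so the principal log is exact
ev, P = np.linalg.eig(U)
H_rec = P @ np.diag(np.angle(ev)) @ np.linalg.inv(P)
assert np.allclose(H_rec, H)
```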
[1] Quantum computation and quantum information 2000
[2] Classical and quantum computation American Mathematical Soc. 2002 47
[3] Quantum computing: from linear algebra to physical realizations CRC press 2008
[4] Quantum computing: A gentle introduction MIT Press 2011
[5] Quantum computing since Democritus Cambridge Univ. Pr. 2013
[6] Lecture notes for Physics 219: Quantum computation Caltech Lecture Notes 1999
[7] Quantum computing: Lecture notes arXiv:1907.09415 2019
[8] Lecture Notes on Quantum Algorithms 2021
[9] Fast amplification of QMA Quantum Inf. Comput. 2009 9 11 1053–1068
[10] Optimal quantum eigenstate filtering with application to solving quantum linear systems Quantum 2020 4 361
[11] Quantum many-particle systems Westview 1988
[12] Quantum algorithm for linear systems of equations Phys. Rev. Lett. 2009 103 150502
[13] Quantum linear systems algorithms: a primer arXiv:1802.08227 2018
[14] Matrix computations Johns Hopkins Univ. Press 2013 Baltimore Fourth
[15] Simulating physics with computers Int. J. Theor. Phys 1982 21 6/7
[16] Universal quantum simulators Science 1996 1073–1078
[17] Error bounds for exponential operator splittings BIT 2000 40 4 735–744 10.1023/A:1022396519656
[18] High-order exponential operator splitting methods for time-dependent Schrödinger equations SIAM J. Numer. Anal. 2008 46 4 2022–2038
[19] Theory of trotter error with commutator scaling Phys. Rev. X 2021 11 1 011020
[20] Random circuit block-encoded matrix and a proposal of quantum LINPACK benchmark Phys. Rev. A 2021 103 6 062412
[21] Quantum singular value transformation and beyond: exponential improvements for quantum matrix arithmetics arXiv:1806.01838 2018
[22] Quantum Algorithm for Systems of Linear Equations with Exponentially Improved Dependence on Precision SIAM J. Comput. 2017 46 1920–1950
[23] Quantum speed-up of Markov chain based algorithms 45th Annual IEEE symposium on foundations of computer science 2004 32–41
[24] Quantum singular value transformation and beyond: exponential improvements for quantum matrix arithmetics Proceedings of the 51st Annual ACM SIGACT Symposium on Theory of Computing 2019 193–204
[25] Efficient phase factor evaluation in quantum signal processing Phys. Rev. A 2021 103 042419
[26] Product decomposition of periodic functions in quantum signal processing Quantum 2019 3 190
[27] Efficient quantum algorithms for simulating sparse Hamiltonians Commun. Math. Phys. 2007 270 2 359–371
[28] Optimal Hamiltonian Simulation by Quantum Signal Processing Phys. Rev. Lett. 2017 118 010501
[29] Hamiltonian simulation by uniform spectral amplification arXiv:1707.05391 2017
[30] Near-optimal ground state preparation Quantum 2020 4 372
[31] On generalized matrix functions Linear and Multilinear Algebra 1973 1 163–171
[32] Computation of Generalized Matrix Functions SIAM J. Matrix Anal. Appl. 2016 37 836–860