Foundations — the third algorithm

A zero-knowledge proof system over a relation R is a triple (Prove, Verify, Simulate). The first two are familiar: Prove(stmt, witness) returns a transcript; Verify(stmt, transcript) returns accept or reject. The third, Simulate(stmt), takes only the public statement and returns a transcript whose distribution is computationally indistinguishable from a real prover's. Its existence is the formal content of the ZK property: any information a verifier could extract from an honest transcript is information the simulator already produced from the statement alone, so the transcript leaks nothing about the witness.

Production ZK code ships only the first two. The simulator lives in the security proof of the paper, not in the codebase. This page implements it, alongside the prover and verifier it shadows.

Witness-indistinguishability versus zero-knowledge

Witness-indistinguishability (Feige and Shamir, 1990) is the weaker cousin: a WI protocol guarantees that for any two valid witnesses w₀, w₁ for the same statement, the distributions Prove(stmt, w₀) and Prove(stmt, w₁) are computationally indistinguishable. WI does not require a simulator. ZK does: the indistinguishability is between a real prover and an algorithm with no witness at all.

For relations whose statement admits a sufficiently large family of valid witnesses, the two claims collapse onto each other under a uniform sampling argument: a simulator that samples a witness uniformly from R(stmt), runs the honest prover with perfectly-hiding commitments and a programmed Fiat-Shamir transcript, and outputs the resulting bytes satisfies the textbook ZK property in the random-oracle model. The verifier learns nothing beyond the public statement because (i) the commitments are perfect-hiding under uniform blinders, (ii) the challenges are programmed rather than hashed from witness-dependent data, and (iii) the witness marginal is uniform conditional on the statement. The Orchard Action relation is such a relation; the toy c = a·b relation on Page II is too. For relations with a unique witness, the gap between WI and ZK does not close this way; closing it requires the byte-level no-witness construction discussed on the Reference page.

The statement these pages prove

The bottom layer of every Halo 2 proof is a polynomial commitment opening. The prover commits to a polynomial p under a Pedersen commitment with a fresh blinder, then opens that commitment at a verifier-chosen point x to a claimed value y:

∃ p ∈ 𝔽ⁿ, r ∈ 𝔽 such that C = ⟨p, G⟩ + r·H and p(x) = y

where G = (G₀, …, G_n-1) and H are fixed Pasta-curve generators. The claim is over 𝔽 = the Pasta scalar field. The inner-product argument compresses the opening into O(log n) round messages, the last of which is forced by a single linear equation in the random challenges. That last forced value is what the simulator solves for.

The simulator's algebraic move

Given only (C, x, y), the simulator never inspects a witness. It walks the verifier's final check equation backwards:

Sample challenges. Pick u₁, …, u_k uniformly at random from the Pasta scalar field. These are the values the simulator will program the Fiat-Shamir transcript to return.
Sample final scalars. Pick a uniform a (the surviving entry of the folded vector) and a uniform blinder s.
Compute the target. The folded point P_k = a·G_final + (a·b_final)·U + s·H depends only on sampled values and the public statement, so it is fully determined. So is the starting point P₀ = C + y·U. The difference T = P_k − P₀ is what the (L_i, R_i) sums must hit.
Sample, then solve. Pick L₁, …, L_k and R₁, …, R_k-1 uniformly. The last R_k is forced by a single linear equation with non-zero coefficient (the challenge square u_k² is invertible in 𝔽): there is a unique R_k that closes the sum.

Every component except the last R_k is sampled uniformly; the last is forced but its marginal distribution conditional on the others is also uniform over the curve. The transcript verifies under the verifier's programmed challenges. Under a real hash like Blake2b it does not, and could not without breaking Blake2b's pseudorandomness.

The gate-equation generalization on page II handles a single polynomial constraint by a parallel solve: sample the opening evaluations y_A, y_B, y_C uniformly, compute Z_H(x₃), and solve for y_h so that y_A·y_B − y_C = y_h·Z_H(x₃) holds at the challenge point. Three samples and one inversion; that's the entire content of the toy gated simulator.

What the simulator's existence proves

Once you have a simulator whose output is indistinguishable from a real prover's, the rest of the ZK argument is one line: any predicate D that a verifier could compute on a real transcript is by definition computable on a simulator transcript at the same advantage. Since the simulator never saw the witness, neither did D. There is no information about the witness left in the transcript for D to extract.

This is why the simulator is the cryptographically load-bearing algorithm. Without it, "zero-knowledge" reduces to "we have a prover and a verifier and we haven't found a leak yet."

The next page, Halo 2, runs three variants of this construction on halo2_proofs directly: an honest prover, a witness-indistinguishable simulator, and a ROM-programmable zero-knowledge simulator, all on a single multiplication-gate circuit. The page after that, Orchard, lifts the same idea to the production Action circuit.

For the formal protocol context behind the relations the next pages simulate, see the Zcash protocol specification (§4.18.4 for the Action statement; §5.4.10.3 for the concrete Halo 2 instantiation).