How we verify Wikifunctions

The deployed code, proved equal to its Mathlib specification — and cross-checked against real CPython.

We give Wikifunctions' executable functions the formal specification they lack, by joining each function to a Mathlib (Lean 4) declaration via its shared Wikidata QID, then proving the deployed code meets that spec. Two functions are fully verified end-to-end so far; the rest of the addressable set has specs (see the spec tracker).

Three layers of assurance

Each layer compares something different, covers a different slice of inputs, and rests on a different trust assumption. They are complementary: the proof is the strongest claim but rests on a model of Python; the lean.py check executes the genuine interpreter and closes exactly that gap.

Layer	What it compares	Coverage	Trust assumption
Deductive proofLean	the deployed Python, embedded in Lean, ⇔ the Mathlib spec	all inputs (∀), kernel-checked	our Lean model of Python's semantics matches CPython
Differential testcross-process	real CPython vs our compiled embedding	829 cases (coprime)	none — runs real Python
lean.py in-processembedded CPython	real CPython (loaded into Lean) vs the Mathlib spec	1607 (coprime) / 21 (factorial)	none — runs the genuine interpreter

The deductive proof is the strongest result: it is a theorem about every input, checked by Lean's kernel. But a proof about embedded Python is only as good as the embedding — it rests on the one assumption that our Lean model of Python's semantics faithfully matches what CPython actually does. The lean.py layer closes precisely that gap: it loads the real CPython interpreter into the Lean process and runs the genuine deployed source, so its agreement with the spec carries no semantics assumption at all. The cross-process differential test is a third, independent witness from outside the Lean process. Together: the proof gives universal coverage, and the two CPython checks empirically discharge the proof's lone assumption.

Worked example 1 — “are coprime” Z13701

The deployed Python

def Z13701(a, b):
    while b != 0:
        a, b = b, a % b
    return a == 1

The Lean embedding

The program is encoded as data over a tiny imperative-Python AST (Imp.lean) with an operational semantics; then we prove it equal to Mathlib's Nat.Coprime:

theorem runProgram_eq_coprime (a b : Nat) : runProgram a b = some (decide (Nat.Coprime a b))

Proved for all a, b with clean axioms [propext, Classical.choice, Quot.sound] — no sorry, no native_decide.
Differential test: 829 / 829, 0 mismatches.
lean.py in-process: 1607 / 1607.

Worked example 2 — “factorial” Z13667

The deployed Python

def Z13667(n):
    k = 1
    for i in range(1, n + 1):
        k *= i
    return k

The theorem

theorem runFac_eq_factorial (n : Nat) : runFac n = some (Nat.factorial n)

Proved (clean axioms [propext, Classical.choice, Quot.sound]).
lean.py in-process: 21 / 21 vs Nat.factorial.

Honest boundaries

These are features of an honest verification claim, not disclaimers to bury.

Only 2 of 25 addressable functions are proved so far; the others have specs / oracles (tracked on the spec page).
We also have Dafny + Verus proofs of coprime — but those verify a re-implementation against a spec written in the prover's own logic. That is a weaker, different claim than the Lean one (“the actual Python ⇔ Mathlib”).
The deductive proof's one assumption (Lean semantics ≈ CPython) is the irreducible gap of verifying any real code. lean.py mitigates it empirically by running the genuine interpreter.

Source & reproduce

Every claim above is in the repository — the embedding, the two proofs, and both CPython cross-checks.

Wikifunctions/Python/Imp.lean — the imperative-Python embedding (AST + operational semantics).

Wikifunctions/Python/Z13701.lean — the coprime proof (runProgram_eq_coprime).

Wikifunctions/Python/Z13667.lean — the factorial proof (runFac_eq_factorial).

native/leanpy/Main.lean — lean.py: real CPython loaded in-process, checked against the spec.

native/difftest.py — the cross-process differential test vs real CPython.

Reproduce the proofs:

git clone https://github.com/Deicyde/wikifunctions && cd wikifunctions
lake exe cache get
lake build Wikifunctions.Python.Z13701 Wikifunctions.Python.Z13667

✓ Both proofs build green with no sorry and clean axioms; the two CPython cross-checks report zero mismatches.