How we verify Wikifunctions
The deployed code, proved equal to its Mathlib specification — and cross-checked against real CPython.
We give Wikifunctions' executable functions the formal specification they lack, by joining each function to a Mathlib (Lean 4) declaration via its shared Wikidata QID, then proving the deployed code meets that spec. Two functions are fully verified end-to-end so far; the rest of the addressable set has specs (see the spec tracker).
Three layers of assurance
Each layer compares something different, covers a different slice of inputs, and rests on a different trust assumption. They are complementary: the proof is the strongest claim but rests on a model of Python; the lean.py check executes the genuine interpreter and closes exactly that gap.
| Layer | What it compares | Coverage | Trust assumption |
|---|---|---|---|
| Deductive proofLean | the deployed Python, embedded in Lean, ⇔ the Mathlib spec | all inputs (∀), kernel-checked | our Lean model of Python's semantics matches CPython |
| Differential testcross-process | real CPython vs our compiled embedding | 829 cases (coprime) | none — runs real Python |
| lean.py in-processembedded CPython | real CPython (loaded into Lean) vs the Mathlib spec | 1607 (coprime) / 21 (factorial) | none — runs the genuine interpreter |
The deductive proof is the strongest result: it is a theorem about every input, checked by Lean's kernel. But a proof about embedded Python is only as good as the embedding — it rests on the one assumption that our Lean model of Python's semantics faithfully matches what CPython actually does. The lean.py layer closes precisely that gap: it loads the real CPython interpreter into the Lean process and runs the genuine deployed source, so its agreement with the spec carries no semantics assumption at all. The cross-process differential test is a third, independent witness from outside the Lean process. Together: the proof gives universal coverage, and the two CPython checks empirically discharge the proof's lone assumption.
Worked example 1 — “are coprime” Z13701
The deployed Python
def Z13701(a, b):
while b != 0:
a, b = b, a % b
return a == 1
The Lean embedding
The program is encoded as data over a tiny imperative-Python AST
(Imp.lean) with an operational
semantics; then we prove it equal to Mathlib's Nat.Coprime:
theorem runProgram_eq_coprime (a b : Nat) : runProgram a b = some (decide (Nat.Coprime a b))
- Proved for all
a, bwith clean axioms[propext, Classical.choice, Quot.sound]— nosorry, nonative_decide. - Differential test: 829 / 829, 0 mismatches.
- lean.py in-process: 1607 / 1607.
Worked example 2 — “factorial” Z13667
The deployed Python
def Z13667(n):
k = 1
for i in range(1, n + 1):
k *= i
return k
The theorem
theorem runFac_eq_factorial (n : Nat) : runFac n = some (Nat.factorial n)
- Proved (clean axioms
[propext, Classical.choice, Quot.sound]). - lean.py in-process: 21 / 21 vs
Nat.factorial.
Honest boundaries
These are features of an honest verification claim, not disclaimers to bury.
- Only 2 of 25 addressable functions are proved so far; the others have specs / oracles (tracked on the spec page).
- We also have Dafny + Verus proofs of coprime — but those verify a re-implementation against a spec written in the prover's own logic. That is a weaker, different claim than the Lean one (“the actual Python ⇔ Mathlib”).
- The deductive proof's one assumption (Lean semantics ≈ CPython) is the irreducible gap of verifying any real code. lean.py mitigates it empirically by running the genuine interpreter.
Source & reproduce
Every claim above is in the repository — the embedding, the two proofs, and both CPython cross-checks.
Wikifunctions/Python/Imp.lean
— the imperative-Python embedding (AST + operational semantics).Wikifunctions/Python/Z13701.lean
— the coprime proof (runProgram_eq_coprime).Wikifunctions/Python/Z13667.lean
— the factorial proof (runFac_eq_factorial).native/leanpy/Main.lean
— lean.py: real CPython loaded in-process, checked against the spec.native/difftest.py
— the cross-process differential test vs real CPython.Reproduce the proofs:
git clone https://github.com/Deicyde/wikifunctions && cd wikifunctions lake exe cache get lake build Wikifunctions.Python.Z13701 Wikifunctions.Python.Z13667
sorry and clean axioms; the two CPython
cross-checks report zero mismatches.