WikiLean

How we verify Wikifunctions

The deployed code, proved equal to its Mathlib specification — and cross-checked against real CPython.

We give Wikifunctions' executable functions the formal specification they lack, by joining each function to a Mathlib (Lean 4) declaration via its shared Wikidata QID, then proving the deployed code meets that spec. Two functions are fully verified end-to-end so far; the rest of the addressable set has specs (see the spec tracker).

Three layers of assurance

Each layer compares something different, covers a different slice of inputs, and rests on a different trust assumption. They are complementary: the proof is the strongest claim but rests on a model of Python; the lean.py check executes the genuine interpreter and closes exactly that gap.

Layer What it compares Coverage Trust assumption
Deductive proofLean the deployed Python, embedded in Lean, ⇔ the Mathlib spec all inputs (∀), kernel-checked our Lean model of Python's semantics matches CPython
Differential testcross-process real CPython vs our compiled embedding 829 cases (coprime) none — runs real Python
lean.py in-processembedded CPython real CPython (loaded into Lean) vs the Mathlib spec 1607 (coprime) / 21 (factorial) none — runs the genuine interpreter

The deductive proof is the strongest result: it is a theorem about every input, checked by Lean's kernel. But a proof about embedded Python is only as good as the embedding — it rests on the one assumption that our Lean model of Python's semantics faithfully matches what CPython actually does. The lean.py layer closes precisely that gap: it loads the real CPython interpreter into the Lean process and runs the genuine deployed source, so its agreement with the spec carries no semantics assumption at all. The cross-process differential test is a third, independent witness from outside the Lean process. Together: the proof gives universal coverage, and the two CPython checks empirically discharge the proof's lone assumption.

Worked example 1 — “are coprime” Z13701

The deployed Python

def Z13701(a, b):
    while b != 0:
        a, b = b, a % b
    return a == 1

The Lean embedding

The program is encoded as data over a tiny imperative-Python AST (Imp.lean) with an operational semantics; then we prove it equal to Mathlib's Nat.Coprime:

theorem runProgram_eq_coprime (a b : Nat) : runProgram a b = some (decide (Nat.Coprime a b))

Worked example 2 — “factorial” Z13667

The deployed Python

def Z13667(n):
    k = 1
    for i in range(1, n + 1):
        k *= i
    return k

The theorem

theorem runFac_eq_factorial (n : Nat) : runFac n = some (Nat.factorial n)

Honest boundaries

These are features of an honest verification claim, not disclaimers to bury.

Source & reproduce

Every claim above is in the repository — the embedding, the two proofs, and both CPython cross-checks.

Wikifunctions/Python/Imp.lean — the imperative-Python embedding (AST + operational semantics).
Wikifunctions/Python/Z13701.lean — the coprime proof (runProgram_eq_coprime).
Wikifunctions/Python/Z13667.lean — the factorial proof (runFac_eq_factorial).
native/leanpy/Main.lean — lean.py: real CPython loaded in-process, checked against the spec.
native/difftest.py — the cross-process differential test vs real CPython.

Reproduce the proofs:

git clone https://github.com/Deicyde/wikifunctions && cd wikifunctions
lake exe cache get
lake build Wikifunctions.Python.Z13701 Wikifunctions.Python.Z13667
Both proofs build green with no sorry and clean axioms; the two CPython cross-checks report zero mismatches.