On Theorem proving and formal verification

Note: There is a more complete explanation at https://github.com/hwayne/lets-prove-leftpad/tree/master/idris

import Data.Vect

We define this eq_max theorem, since it's the core part of how the nice implementations of leftpad work: You subtract your existing length from your padding, and then add it back on, and since minus here is saturating subtraction, this does what we want and gives the maximum of the two.

For people not familiar, implementing a recursive function with a type like this is providing an inductive proof of a theorem. A function that takes some arguments is saying "forall," and then we're proving the equality in the result type.

eq_max : (n, k : Nat) -> maximum k n = plus (n `minus` k) k

The cases for n Z and Z (S _) are able to be directly proved, they're the base cases for our recursion.

eq_max  Z    (S _) = Refl

Every time we say rewrite _ in _, we're modifying the type of the value in the second position using an equality in the first position. For example, if you wrote a function that doubled the length of a vector, you might use it to prove that Vect (n + n) a (from appending) is the same as a Vect (n * 2) a. In this case we use it to explicitly cancel the 0 in the type of the first base case, since Idris doesn't do that for you.

eq_max  n     Z    = rewrite minusZeroRight n in rewrite plusZeroRightNeutral n in Refl

The final case is the inductive step. We use our rewrites to pull the S out of the equation, and then we're able to prove it by the inductive hypothesis (the recursive case).

eq_max (S n) (S k) = rewrite sym $ plusSuccRightSucc (n `minus` k) k in rewrite eq_max n k in Refl

Now that we've got our eq_max theorem, we can actually define our program. We want to implement leftpad. What does leftpad do? Well, we take a list (xs), a padding value (x), and a length (n). If the list is smaller than the padding value, then we repeat the padding value in front of the list until it's long enough. Symbolically, we might say that there's some number m such that we repeat x m times, and prepend that to list, and then the result is max (length xs) n.

If we want to make sure that we're implementing this correctly, one way to do this in Idris is return the result as well as a proof that it matches the spec. The tool to use is a "dependent pair," written (x : T ** P x), which says that we're returning a value of type T named x, and also that the fact P x is true. (Recall, theorems are just types, and proofs are just values, so we give a proof in the second position of the pair.) We want to say, we we start with a list xs of length k, which we write as xs : Vect k a, a value x, and a length n, and return a result ys : Vect (maximum k n) a, just like we said above in the spec. Further extending the spec, we want to say that there's some number m such that ys = replicate m x ++ xs, also a direct translation of our spec. Given this type, we can just write the same code we'd write in Haskell, and put the proof arguments after it. We also need to rewrite with the eq_max we made earlier to make sure that the maximum k n in the type matches the n `minus` k + k that we get from appending the lists in the implementation.

leftPad : (x : a) -> (n : Nat) -> (xs : Vect k a)
       -> (ys : Vect (maximum k n) a ** m : Nat ** ys = replicate m x ++ xs)
leftPad {k} x n xs = rewrite eq_max n k in (replicate (n `minus` k) x ++ xs ** n `minus` k ** Refl)

We can skip the "existential" for the padding size and inline the n `minus` k... I'm not sure if this is better, but it's slightly smaller!

leftPad' : (x : a) -> (n : Nat) -> (xs : Vect k a)
        -> (ys : Vect (maximum k n) a ** ys = replicate (n `minus` k) x ++ xs)
leftPad' {k} x n xs = rewrite eq_max n k in (replicate (n `minus` k) x ++ xs ** Refl)

	module HillelVerificationProblems where

	open import Data.Nat
	open import Data.Nat.Properties.Simple using (+-right-identity; +-comm)
	open import Data.List
	open import Data.List.All
	open import Data.List.Properties
	open import Relation.Binary.PropositionalEquality

	-- NB. (a ∸ b) is saturating subtraction and (a ⊔ b) is max(a,b).

	-- The one-line definition of leftPad is:
	leftPad' : ∀{Char : Set} -> Char -> ℕ -> List Char -> List Char
	leftPad' c n str = replicate (n ∸ length str) c ++ str
	-- Now let's prove this correct.

	-- First, we'll need some general lemmas about arithmetic and lists.
	+∸=⊔ : ∀ m {n} → m + (n ∸ m) ≡ n ⊔ m
	+∸=⊔ zero {zero} = refl
	+∸=⊔ zero {suc _} = refl
	+∸=⊔ (suc m) {zero} = cong suc (+-right-identity m)
	+∸=⊔ (suc m) {suc n} = cong suc (+∸=⊔ m)

	module _ {a : Set} {rest : List a} where
	take-++ : ∀{l n} -> n ≡ length l -> take n (l ++ rest) ≡ l
	take-++ {[]} refl = refl
	take-++ {x ∷ l} refl = cong (_∷_ x) (take-++ refl)

	drop-++ : ∀ l {n} -> n ≡ length l -> drop n (l ++ rest) ≡ rest
	drop-++ [] refl = refl
	drop-++ (x ∷ l) refl = drop-++ l refl

	all-replicate : ∀{n} {a : Set} {c : a} -> All (_≡_ c) (replicate n c)
	all-replicate {zero} = []
	all-replicate {suc n} = refl ∷ all-replicate

	-- Now let's move on to things specific to leftPad.
	module _ {Char : Set} (c : Char) (n : ℕ) (str : List Char) where
	padLen = n ∸ length str
	pad = replicate padLen c
	leftPad = pad ++ str -- the padded result.

	-- Lemmas.
	length-pad : length pad ≡ padLen
	length-pad = length-replicate padLen

	-- Correctness theorem #1: `leftPad` has the right length.
	leftPadLength : length leftPad ≡ n ⊔ length str
	leftPadLength = begin
	length leftPad
	≡⟨ length-++ pad ⟩
	length pad + length str
	≡⟨ cong₂ _+_ length-pad refl ⟩
	padLen + length str
	≡⟨ +-comm padLen _ ⟩
	length str + padLen
	≡⟨ +∸=⊔ (length str) ⟩
	n ⊔ length str ∎
	where open ≡-Reasoning

	-- Correctness theorem #2: The initial fragment of `leftPad` is all `c`s.
	initialOk : All (_≡_ c) (take padLen leftPad)
	initialOk = subst (λ r → All _ r) has-padding all-replicate
	where has-padding : pad ≡ take padLen leftPad
	has-padding = sym (take-++ (sym length-pad))

	-- Correctness theorem #3: The final fragment of `leftPad` is `str`.
	finalOk : drop padLen leftPad ≡ str
	finalOk = drop-++ pad (sym length-pad)

	-- Note: There is a more complete explanation at https://github.com/hwayne/lets-prove-leftpad/tree/master/idris
	import Data.Vect

	-- `minus` is saturating subtraction, so this works like we want it to
	eq_max : (n, k : Nat) -> maximum k n = plus (n `minus` k) k
	eq_max n Z = rewrite minusZeroRight n in rewrite plusZeroRightNeutral n in Refl
	eq_max Z (S _) = Refl
	eq_max (S n) (S k) = rewrite sym $ plusSuccRightSucc (n `minus` k) k in rewrite eq_max n k in Refl

	-- The type here says "the result is" padded to (maximum k n), and is padding plus the original
	leftPad : (x : a) -> (n : Nat) -> (xs : Vect k a)
	-> (ys : Vect (maximum k n) a m : Nat ys = replicate m x ++ xs)
	leftPad {k} x n xs = rewrite eq_max n k in (replicate (n `minus` k) x ++ xs n `minus` k Refl)

	-- We can skip the "existential" for the padding size and inline the n `minus` k... is this better?
	leftPad' : (x : a) -> (n : Nat) -> (xs : Vect k a)
	-> (ys : Vect (maximum k n) a ** ys = replicate (n `minus` k) x ++ xs)
	leftPad' {k} x n xs = rewrite eq_max n k in (replicate (n `minus` k) x ++ xs ** Refl)