Scott Encodings Are Awesome
If you've read a little about Lambda Calculus, you've most likely heard of the Church encoding; a mechanism in which one can encode data types (like numbers or lists) as functions. However, you've probably never heard about an alternative encoding, the Scott encoding, but before diving into it, let's take a quick look at Church's.
Both Scott and Church encodings transform raw data (as in a number, a list, etc.) into a function that warps control flow depending on that data. To understand how Church encodings work, let's look at an example encoding a natural number.
First we'll describe what type a Church Natural will have,
CNat :: forall r. r -> (r -> r) -> r
Let's understand this type: CNat takes an initial value
zero :: CNat
zero = \i f -> i
Zero ignores the function, returning the base value right away (i.e. applying the function zero times). Defining Succ is slightly more involved, but not too hard,
succ :: CNat -> CNat
succ pred = \i f -> f (pred i f)
What it does is first pass the base value and function to the predecessor; this applies the function the first N times. Then, it applies the function once more to the result; this makes the application count N+1. Let's see a basic example implementing 'add':
add :: CNat -> CNat -> CNat
add a b = a b succ
How does this work? To add 'a' and 'b', we fold over 'a': the base value will be 'b', and the transformation (run 'a' times) is to take its successor. This effectively takes 'a' successors of 'b', resulting in the sum.
And that's it. We've implemented the Naturals using Church encoding, known as Church Numerals. Scott encodings are similar, but they don't treat recursion as a fold; with Church encodings, passing a function to a number results in that many calls to the function. But with Scott encodings, the function is only called once.
Scott encodings are closer to what we'd consider a union type, or ADT. Let's define what a Scott Natural looks like:
SNat :: forall r. r -> (SNat -> r) -> r
Notice it's very similar to Church's, with only one distinction: the argument the function receives is the raw SNat, instead of another 'r'. This is because the Scott numeral does not run the function on the predecessor and pass the result; it passes the predecessor as-is. This is why Scott encodings don't treat recursive types differently.
As before, we can easily define both zero and succ:
zero :: SNat
zero = \caseZero caseSucc -> caseZero
succ :: SNat -> SNat
succ pred = \caseZero caseSucc -> caseSucc pred
Notice I've prefixed the arguments' name with 'case' to reflect the fact that these are like branches; the caller provides what to do in each case, and the numeral simply calls one of the branches, passing its arguments (none for zero, pred for succ).
Let's implement 'add' for Scott numerals as well to see clearly how they work:
add :: SNat -> SNat -> SNat
add a b = a (b) (\p -> succ (add p b))
How does this one work? We destruct the outermost 'a': if it's zero, we just return b (0+b => b). If it's p+1, we return the successor of p+b; (p+1)+b => 1+(p+b). Notice how, unlike with Church numerals, we need the add function to be recursive. This is because Scott numerals don't perform recursion by themselves like Church's did.
Also, notice how calling a Scott encoded value look a whole lot like pattern matching: we provide what should happen for each case, and the value chooses one of those branches.