Tuesday, November 28, 2017

Or, we could not, and say we don’t have to

I previously wrote “Values never change types”, whose central thesis statement I hope is obvious. And this still holds, but there is something I left unsaid: values do not have identity, so the notion of “mutating” them is as nonsensical as “mutating 4”. And the formal system of Scala types treats objects with identity similarly, by not permitting them or their variable aliases to change type, even though they are not quite values. But this is a design decision, and other choices could have been made.

There are very good reasons not to make other choices, though. Other type systems come with features that come very close to making the opposite design choice; by imagining that they went just a little farther down this garden path, we can see what might have been.

Refinement, flow, or occurrence typing, by any name

In Flow and TypeScript, when you test properties of a value in a variable, you can “change the type” of that variable. For example, you could have a let s: any; if you write an if block that tests whether s is a string, the type of s—at compile-time, mind you—“changes” to string within the if body. Within the body of that if, you could perform further tests to refine s’s type further; you might also have various other if blocks alongside checking for other types, so that s might variously “change” into number, function, object with a whatsit property whose type is another object with a whosit property, and so on.

So, instead of having a single type attached to a lexical variable over its entire scope, a variable has several types, each tied to a block of code that uses the variable. It is an order more sophisticated, but still tied to the lexical structure of the program, as if the variable has multiplied to honor all the faces it might have. This is a great way to model how people are writing programs at the type level without overly complicating the formal system, which still must always obey a complete set of sound rules.

Contradictory refinement

In the systems I’ve described, no further refinement can contradict a prior one. So once you determine that a variable is a string, it’s not going to turn out to be a number later; at most, it can get more specific, like being proven to be a member of a known static set of strings. So this way you know that inner blocks cannot know less than outer blocks about the nature of a variable; that is what I mean by “tied to the lexical structure.”

What “real JavaScript code” could be written that would violate this assumption?

function foo(arg) {
    let s = arg
    if (typeof s === 'string') { // refines s to type string
        s = {teaTime: s}
        // "point of no return"
        if (s.teaTime === "anytime") {
            drinkTea(s)
            ...

The first if test establishes a block in which s’s type is string. Then we pull the rug out from under the type-checker by assigning to s; with that assignment, it is no longer true that s is a string. Why does this make type-checking more complex?

Let’s twist and tangle the program to support our beloved mutation

The type of the variable s no longer follows the block structure of the program, in the way we usually perceive blocks in a structured program. That’s because the fact established by the outer if test is suddenly invalidated partway through the block. So our first problem is one of raising the complexity burden on human interpretation of the program—the reader can no longer assume that the specificity of a variable’s type only increases as you move inward, reading the block structure of the program—but it is not fatal in itself, at least for this example. We can salvage the model via the Swiss Army knife of semantic analysis, the continuation-passing style (CPS) transform.

function foo(arg, k0) {
    return (s => if (typeof s === 'string') {
                     (_ => if (s.teaTime === "anytime") {
                               drinkTea(s, k0)
                           }
                     )(s = (teaTime: s))
                 }
           )(arg)

Now it is still possible for inner blocks to contradict outer blocks, but at least this is only possible at the block level. So, by “merely” revisualizing our programs in terms of the flow of continuations rather than the visibly apparent block structure, we can sort of still think of the type of variables as a “block”-level concern, as it was before.

Unluckily, performing a CPS transform in your head with all the code you see is a kind of “reverse Turing Test”, something that an AI would have to not be able to do very well in order to fool us into thinking it was human. So no matter what, we are stuck with a significant new complication in our formal model.

But not a fatal one. Yet.

Loop unrolling according to the phase of the moon

What will prove fatal to the formal model of type-changing mutation is delay. Let us see how the seeds of our destruction have already been sown.

while (forSomeTimeNow()) {
    log(s.substring(1))
    if (itsAFullMoon()) {
        s = {teaTime: s}
    }
    drinkTea(s)
}

The first question, and the last, is “is it safe to drinkTea?”

One necessary prerequisite is that it has been a full moon. I’m using this boolean expression to inject the Halting Problem—we cannot determine in a Turing-complete language whether a boolean expression will evaluate to true or false, generally—but it is probably sufficient to say that it is nondeterministic, even if not Turing-complete. (Pragmatists love Turing completeness and nondeterminism, because “more power is always better”.) So it’s hard enough—by which I mean generally impossible—to say whether the s assignment has happened.

The next prerequisite, which should drive us wholly into despair now if hope yet remains, is that the moon has been full once. Eh? Here’s where the tie of variable types to any semblance of code structure breaks down completely, because s takes on a surprisingly large number of types in this code sample.

To assign a precise type to this program, we have to accurately model what is happening in it. So, suppose that prior to entering this loop, the type of s is string. Each assignment to s—made each time the while’s test is true and the moon is full—takes us one step down this list of types.

  1. string
  2. {teaTime: string}
  3. {teaTime: {teaTime: string}}
  4. {teaTime: {teaTime: {teaTime: string}}}
  5. {teaTime: {teaTime: {teaTime: {teaTime: string}}}}
  6. •••

Now if you want to assign (potentially) all of these infinite possibilities to the program, you have to go even further from the block structure model. Imagine a third dimension of the program text: at the surface, you see s having only the first two types above, but as you look deeper, you see the branching possibilities—oh so many iterations in, oh so many times the moon has been full—each assigning different types to what is on the surface the same code. Looking at this as two-dimensional text, you would only see the infinite superimposition of all possible types of s, weighted according to their probability.

Three dimensions might be too few for this code.

Of course, there’s a well-known, sensible way to type this code, sans the log call: abandon the folly of modeling mutation and assign this recursive union type to s for at least the whole scope of the while loop, if not an even wider scope:

type TeaTimeTower = string | {teaTime: TeaTimeTower}

And supposing the drinkTea function is so polymorphic, all is well, and as a neat side bonus, easy to understand. But we aren’t here to pursue sanity; we gave that up to try to model mutation.

The devil in the delay

If fully desugared, while is a two-argument (not counting still thinking in CPS) higher-order function, taking test functions as arguments. Just like you’re writing Smalltalk.

while(() => forSomeTimeNow(),
      () => {
          log(s.substring(1))
          if(itsAFullMoon(),
             () => s = {teaTime: s})
          drinkTea(s)
      })

The thing that makes so much trouble for flow analysis is this delay. Type-changing requires us to contradict earlier refinements of a variable’s type, not simply refine them further. But the ability to capture a reference to a variable in a lambda means that we need a deep understanding of how that lambda will be used. It might never be invoked. It might be invoked later in the function, just when we thought it was safe to contradict whatever refinement it was type-checked with. It might be saved off in another variable or data structure elsewhere in the program, making reasoning about when the variable might be referenced in the future a futile endeavor.

Doing flow analysis with sources of delayed execution whose behavior is 100% known, like if and for, is tricky enough. Doing it in the presence of unknown, novel, potentially nondeterministic sources of delay is intractable, if not impossible.

And that’s for the computer. How many dimensions does the model in your head have, now? Zero, no, negative points for abandoning this ivory-tower static analysis and declaring “common sense” the arbiter of your programs’ correctness.

Did anyone else see you come here?

An object with known properties can be thought of as a group of named variables. This is a longtime, straightforward way to represent modules of functions, or clean up a global namespace by putting a lot of related items laying on the floor into the same drawer.

Since we love mutation, and we love mutating variables, we should love mutating object properties (and their types) even more, right?

function schedule(s) {
    s.teaTime = "anytime"
}

The type of s in the caller after schedule finishes is straightforward: it’s whatever it was before, with the teaTime field (whatever it might have been before, if anything) type set to string, or perhaps the literal singleton type "anytime".

But what schedule is so eager to forget will not be so easily forgotten by the rest of the program.

Namely, the contradicted, earlier type of s is very hard to reliably eradicate. This is an aliasing problem, and it brings the excitement of data races in shared-mutable-state multithreaded programs to the seemingly prosaic JavaScript execution model.

To type s in the aftermath of schedule, you must perfectly answer the question, “who has a reference to s”? Suppose that teaTime was a now-contradicted function. Any code that calls that function via property lookup on its reference now takes on another dimension: before schedule executes, it is safe, but afterwards it no longer is, so it takes on the prerequisite “can only be called before calling schedule(s).” The dimensional multiplication directly results from the multiplication of possible types for s.

The problem broadens virally when you try to model other variables that are not s, but whose types will still change due to schedule being called! Here is an example of such a variable.

const drinks = {coffee: coffeeMod; tea: s}
  // where s is the value we’re talking about

So all the analysis of references to s induced by the type mutation means references to drinks must undergo the same ordeal. And references to something that refers to drinks, and references to something that refers to that, and so on, ad infinitum.

And that is assuming we can statically determine what object identities will be flying around the program. As with so much else in this article, this is generally impossible.

By the way, the problem with lambdas is just a special case of this one; it’s exactly that lambdas alias variables that causes so much grief for our wannabe-mutating ventures.

A different kind of power, at a much more reasonable price

Since we are only imagining this insanity, not attempting to truly partake of it, we have something good to feel about, after all: the grass is really quite brown on the other side, alas, but that on this side is a little greener than first glance might indicate.

Type systems like those of Haskell, Java, OCaml, Scala, and many other languages simply don’t permit the types of variables to change. When you consider the introduction of type equalities in Haskell or Scala GADTs, or more direct refinements made in Typed Racket, Flow, or TypeScript; you can include all of these languages in the broader category of type systems whose variable types can only complement, never contradict.

This is a powerful simplifying assumption, because under this restriction, none of the above problems matter. “Functional types” are not only powerful enough to model programs, and far easier to understand for the human programmer, they are the only way out of the quagmire of complexity wrought by trying to “model mutation”. Even problems that almost look amenable to mutation analysis, like the while example above, admit a simpler solution in an immutable type like the recursive TeaTimeTower.

More power is sometimes worse.

When you forbid unneeded capabilities, you get back capabilities in other areas. Sometimes this comes in the form of improved understanding, such as we get for large programs by introducing type restrictions. It makes sense to give up “power” that is not practical to get benefits that are.

Take the forbidding of mutation. We take type-level immutability for granted in the same way that many practitioners take value-level mutability for granted. Perhaps one reason for resistance to functional programming might be that we are so accustomed to the drawbacks of unconstrained mutability that it does not seem quite as insane at the value level as, seen above, it is at the type level.

But, familiarity cannot make the insane any less so.

This article was tested with Flow 0.57.2 and TypeScript 2.5.