June 23, 2010

Type Systems, From 1000 feet high

I posted a link to one of my favorite articles ever on Facebook that is now gone. Here's the reddit link; the article was about what you should know before you debate type systems, since most people have fuzzy notions of what type systems are, what they do, and their properties. He said it best, but now that it's gone, I'll go over a few points that I remember the author addressed.

(edit: Reddit provides a Wayback Machine link, so you can read the original!).

First off, what is a type system, really? This is a bit hard to answer, but the a simple way to describe it is as a mechanism to prevent your code from executing nonsense by investigating what operations you are performing to what data. An example of this would be if you had "2 + potatoes" in your code: a type system would see that you cannot add a number and the symbol "potatoes" (nonsense!) and prevent you from doing so.

Note that we've already encountered a subtle distinction that is the source of confusion: when does this happen? There are two major forms of type systems: static types and dynamic types. A static type system will investigate your code before you run it, reporting any errors it sees, whereas a dynamic type system will tell you of errors during runtime.

When most people talk about a type system, they almost always mean static types. This does not mean that dynamically typed languages don't have type systems. Far from it. Compare this 'untyped' Ruby code:


my_array = [1,2,3]
0.upto(100){ |i| puts (my_array[i] + 1).to_s }

with the 'typed' C code:

unsigned array[3];
array[0] = 1;
array[1] = 2;
array[2] = 3;
unsigned i;
for(i = 0; i < 100; ++i) {
    printf("%d\n", array[i] + "Stack Pointer!");
}

The output of the Ruby:

2
3
4
types.rb:2:in `block in
': undefined method `+' for nil:NilClass (NoMethodError)
    from types.rb:2:in `upto'
    from types.rb:2:in `
'

And the C:

3915
3916
3917
3917
1606417842
... (continues for 100 lines) ...

We see that Ruby stops and C will plow right through! (To be fair, the C compiler will warn you of the type mismatch). Ruby's type error shows that, while dynamic, Ruby does have a type system, and shouldn't be called untyped. And while C has something people call a type system, it doesn't really function as one might expect.

Which brings up the next point: What should a type system do?. There are two major properties that a type system should strive to provide (incidentally, C/C++/Java don't provide these):

  • Progress: A well-typed expression can be evaluated further, unless the computation is finished. In short, if the expression is well typed, the rules of evaluation and type system guarantee "there's something we can do with it" (this excludes exceptions: "1/0" is well-typed, but we can't detect this beforehand without solving the Halting Problem).

  • Preservation: Evaluation of a well-typed expression leads to another well-typed expression.


Type systems that provide these properties mean that if your program passes the type checker, it can't "go wrong.", by that we mean, "the computer will always know what do to (progress), and never lead you into a false corner (preservation)." This doesn't mean your program will be bug-free, just that any bugs are logical bugs, or unhandled exceptions (basically, your own fault).

Languages with great type systems (SML, OCaml, Haskell) have proven these properties about their type systems, and it makes programming in those languages a joy.

My favorite part of the article, however, was the Fallacies section. Things people believe which just aren't true. I covered the one of the biggest ones already with the example ("dynamic typing means untyped!"), but here are two others that really get my goat:

  • Typed code is longer, more verbose. This, again, is untrue. Most people saying this are referring more to type annotations, which is text you write in your program to tell the compiler what the type of everything is. You'll find these in C, C++, Java, and C#; I once had a really ugly line of Java that looked something like:


    HashMap<String, ArrayList<Integer>> scoreMap = new HashMap<String, ArrayList<Integer>>();


    (I've ranted about Java's verbosity before). You won't find lines like that (or at least they're not mandatory) in SML, Haskell, or Scala. Using technology from 70's, we can infer types from the context of the code. So those ranting about statically typed code being verbose should really rant against type annotations, which are distinct.

  • Strong typing vs. Weak typing. THESE TERMS MEAN NOTHING. At least, nobody's agreed upon what they should mean. Even Wikipedia agrees with me. So stop saying it, and say what you mean; it's like saying a food is 'flavorful.'


It's a pity the original link is gone, he said a lot more than I did, and a lot more clearly. Still, type systems are fun, and go a lot deeper than this. If you're looking for a good introduction to programming with types, The Little MLer is hard to beat. For the theory, people seem to love Pierce. I'm not too far into it, but it looks promising.

No comments:

Post a Comment