February 24, 2010

GC, Continuations, Ruby Internals

When I was about 6, I told my mom I wanted to be a garbageman when I grew up. I mostly just liked the truck, and hanging from the side of it while it drove from house to house.

Garbage collection still fascinates me. I found this talk on Ruby's garbage collector of great interest. While we program with abstractions to make application development easier, the sad fact is that it still helps you avoid shooting yourself in the foot to know how your languages features are being implemented. Even if it didn't, wouldn't you be curious?

This is also a great example of a poor (or simply inflexible) design choice having major consequences down the road. In the talk they explain how Ruby's conservative, stop-the-world mark and sweep collector can't really be replaced by other, more efficient collectors due to the representation of objects. Personally, I'm partial to generational stop-and-copy's; but it only showed up in the talk as an impossibility. The best they could do was curb their lame mark-and-sweep (also Python uses reference counting lol).

I dropped a lot of jargon there; for any curious folk, I can explain what they mean in a later post ^_^ I'll spend the remainder of this one talking about another Ruby feature whose underlying implementation destroyed my algorithm.

---

Ruby is one of the few languages to support continuations out-of-the-box (in Ruby 1.8 anyways, in 1.9 you have to 'require continuation'). The presence of continuations are a sign of flair: when a designer has worked to put it in the language (Matz called it the hardest language feature of Ruby to implement) they are pretty much telling you they are committed to writing a flexible, powerful language that lets the programmer do whatever they want. Naturally, when I see a use for them, I use them.

Kent Dybvig shows us that continuations can be implemented to be very efficient. Unfortunately in Ruby, they aren't: like the GC, continuations are implemented in about the most bare-bones way possible. In Ruby, they implement a continuation by copying the entire program stack in its current state and storing it elsewhere. When you call the continuation, they copy back the old stack over the current one.

Here's the problem: I wrote a program to push the current continuation onto a stack before every call to the recursion. The idea was to use the continuations to keep track of backtracking over several parameters, and the stack meant you would only call as many as you needed.

The problem was, every recursive call increased the size of the stack, since Ruby doesn't support tail-call optimization. So at every recursive call, you would copy over the entire stack somewhere in memory, augment it, and recur. DEATH!

Needless to say, I found another solution. But this was another example of how the implementation of a feature can make the feature usable or not. Had I implemented the same algorithm in Dybvig's Chez Scheme, with both tail-calls and efficient continuations, this algo would have sailed.
---
Coming back full circle, there was a time when I considered peeking into the Ruby source and forking it to support Dybvig's stack frame model (gutting the whole language primarily to support... continuations?). Looking at the object representation from the GC talk though, it's probably much harder than I imagined ^_^.

February 22, 2010

m4d H4X

For two years I was a UTA for our department's Introduction to Security course, and my current roommate is the current Head TA. So when a friend was looking for someone to perform a security audit on his web application, he called my roommate, who called me in as his surgeon's assistant. Here's what we found:

Dictionary Attack


Anytime you have a problem in computing, there's always a 'dumb way to do it,' which normally involves checking every possibility. Remember being a kid and asking someone to guess your birthday? The first thing they ask is 'What month is it in?' Suppose you say 'August.'

A dictionary attack is the kid who closes his eyes and says "August 1st August 2nd August 3rd August 4th August 5th..." (and ruins the game).

The idea is this: if you want to guess someone's password, try every value it could be. You do this by trying to log in as them with every password, and you stop when one of them works.

Sounds dumb? It is, but never underestimate a fast, dumb computer. After all, it worked against Twitter.

The attack is called a Dictionary attack because the idea is that you try someone's e-mail address with every word in the dictionary. A simple dictionary (one I used for this) consists of the 500 most common passwords, a couple hundred first names, and an actual dictionary (the puzzle links to a text file). Since most people use real words as their passwords, there's a good chance you'll stumble upon the correct one.

To stop this, you have a few options:

  • Create a delay after some failed attempts, and report the behavior to an admin. So if someone messes up their password 3 times, make them wait 15 minutes. Another 3, make them wait an hour, etc. This slows down your opponent, and makes you aware of suspicious activity.

  • Demand strong passwords. We all get annoyed having to mix numbers and letters (one of the most common passwords is 'password1', the most common is '123456'), but it helps your security, since you won't find 'h4ll0MRP3ANut' in a dictionary.

  • Keep track of your requests, and stop trolls. This is a similar tactic to a DoS, but keep track of where people are logging in from. If you have 100 failed logins in 1 minute from IP Address 113.154.2.110, stop letting them try to connect (again, at least for a day or two).


File Upload


Most web applications let you upload files to share, or view online. There was once an artist who bound his book with sandpaper so that shelving and re-shelving it would destroy the books next to it. That bookshelf is your application, and that book is the other exploit we found.

The site in question had a file upload feature, so we uploaded an executable file that would run whatever command we fed it on the computer where it resided (in this case, the company server). As soon as we 'viewed' the document, it would execute. So a command like

find ../../ -name "config.rb" -exec grep password '{}' \;

Will find a configuration file, and find all the passwords in it (most web frameworks have contain a file).

The fix to this one is simple: don't let users upload any type of file your server might execute (unless, of course, you're a code hosting site, in which case you don't need to be told about this).

----

Security isn't the way movies make it out to be: most hacks aren't mathematical or cryptographic breaks, and they're never as dramatic. The brilliance in the best ones is that they're so simple. Most security holes are little leaks in the way software gets written, or more usually (like weak passwords) flaws in predictable human behavior.

February 19, 2010

Fruits

I made a video game! Czech it out ^_^



Follow my progress, as well as the rest of my games group, at our blog!

February 18, 2010

A short history of my belief

One of the (few) things I used to blog about was my atheism. Since I'm starting fresh in this new blog, an introduction is in order, since my relationship to belief has varied widely over the years, and religious belief is something I care passionately about.

I 'came out' as atheist almost as soon as I embraced the label, about 2 years ago, in my junior year at Brown. I'd been fighting doubts in my mind for a few years prior, but only really challenged and criticized my faith with any vigor or curiosity around when I came out.

I was raised culturally Catholic: technically my mother was a Catholic and my father a Protestant. I say was because my mother isn't technically allowed to practice anymore: she is my father's second wife. Since the Catholics don't believe in divorce, my mother could not longer 'officially' practice. When they got married, there wasn't a Catholic priest in town who would officiate the ceremony. They now attend a Methodist church.

We didn't grow up going to church, but I was still raised with a significant religious bent, in spirit more than practice. My parents always emphasized Doing the Right Thing that Christ Would Want as more important than saying any number of Hail Mary's. In Middle School especially I would pray every night before going to bed. I would wear a cross, and if I took it off before showering, would apologize to God (no joke).

As I got older, questions started forming. First, the simple facts-on-the-ground ones ("why does the church hate gay people?") to the higher level ones ("If God knows what I'm going to do, and even controls it, why does s/he care?"). They mounted over time, and after a while I just couldn't think about it anymore.

When Junior year rolls around, I read a few books, and realize the answer to all those questions is a very simple one: S/He doesn't exist. Suddenly, almost all the issues go away.

No more fighting my brain; no more defending stupidity manifesting itself in religious institutions. No more tainting my observations with false expectations of a Watcher or Creator. What happens on Earth is our own doing. It remains the most liberating moment of my life.

It can be depressing, however. Dawkins had an cute essay called Gerin Oil, where he compares religion to drug use (paraphrasing: "in light doses, it's rather harmless and often used as a social lubricant. In heavier doses...").

I think this is apt. Just like consuming substances can 'turn off' a part of your brain and allow you certain associated comforts, going atheist is like being the sober guy at a party. While other people are comfortable embracing blatant contradictions and living by hollow aphorisms, you get a far less sugar-coated view. Preventable injustices happen (no, God didn't want it). You aren't that special. When you die, its over.

All things considered, however, I'll take it, because a) if you look at it from another angle, it's not really bad at all, and b) I can believe it ^_^

So I might post about religion from time to time. If you agree, great! If you don't, better! Let's have a conversation.

February 17, 2010

This made me laugh today.

Yes he was a traitor, but I don't mind taking orders from Lando Calrissian when he plays it so cool.



Gotta love the cooool lovemakin' music they score it with.

February 15, 2010

Hiatus - And a voyage of vim

Sorry for the lack of updates. I hate when blogs say that, but publishing less than once a month defeats the purpose of a blog. I blame what I always have: my insistence that what I publish be Significant (the last two posts were more than epic enough), even when it goes against the interests of a) myself, and b) my readers (are there any there)?

But let's continue. What am I up to? Oh, yes! These are things I'm working on in school, and am likely to write about (if I ever do, in fact, write):

Solving Hard Problems with Combinatorial Optimization is one of the 5-6 legendary courses everyone at Brown CS is told to take, something like a rite of passage. I missed the chance to take one of those classes (Operating Systems) in lieu of another (Programming Languages Seminar), and while I loved loved loved PL, I won't allow myself to graduate without taking this course.

I'm also taking an independent study on Android Game Development, with some pretty awesome partners. It's something very different than I'm used to: whereas I'm used to writing programs that run on your terminal, performing computation or purely manipulating data, Android is framework programming with GUI's and graphics. The data in this case is usually mundane plumbing, and the easy part of your application.

Altogether, good fun. Hopefully I'll still have time (and this is dubious) to continue pursuing my lovely programming puzzles.

In the meantime, I'll bloviate some. Today I'll talk a bit about my tools with a focus on text editing.

I feel like refining and expanding upon your coding environment is something like building your own lightsaber: you can use someone else's, but you really only work productively (and more importantly, happily) when you find the tools to fit your needs yourself, and through much experimentation. For me, that environment is usually all-out vim + Makefiles (or in Java, vim + ant), and as much from the Terminal as possible.

A lot of junior hackers feel this way, but most 'enterprise developers' (those who know everything there is to know about the JVM and its classloading process but have no notion of first-class functions, or final vs. const in C#) love to stick it to me on their IDE's. Similarly, when I witnessed an emacs vs. vim brawl (one of our professors left their terminal exposed, revealing which side they were on), the emacs user felt under attack before I said my first words.

Here is why I use vim instead of emacs: because I learned it first. Alternatively, you can use any of the equally compelling reasons in the list below:

  • Because I prefer to map Caps Lock to Esc instead of Ctrl.

  • Because my brilliant hacker friends next to me use vim exceptionally well and inspired me (they could have just as easily been emacs users).

  • Because a friend put up a photo of Richard Stallman in the lab and joked about the fact that he used Ctrl^H for Help.


None of these are real reasons, folks. They're all true, and honest-to-Baal the only things that pushed me to vim. Since I have a soft spot for Lisp, I hope one day to pick up emacs, and remain open to the possibility that it may suit me better.

What I learned in the aforementioned editor brawl (and I hope we as an industry have outgrown the lame fighting*): if you see someone warring over their editor, don't feed the animals. It's clear that very smart and productive people use one or the other; that you don't like it is not reason for indignation. Further, who cares what someone else uses?

I would extend this to Tabs vs. Spaces as well, or where to put your open braces in if/else blocks. There are tools to convert to your preferred style. Change it, work on it, change it back.

---

For now, I use vim because I know it, and only started using it last year. Before that? Kate on the department filesystems, SubEthaEdit on my home computer, and nano when I ssh-ed from the terminal. I edited Java in Eclipse.

Do I recommend switching as I did? Yes. Keeping it general, if you are a coder who uses a primarily graphical environment, I highly recommend you become proficient with a serious, terminal-based text editor. You don't have to love it, you don't have to keep it, but try it, and give it a fair shot. Why?

  • Learning a Terminal-based editor lets you know you have a comfortable editor with you anytime, anywhere. When I used SubEthaEdit, it was not only Mac-specific, but commercial. If I wasn't editing on my laptop, I wasn't editing in my preferred environment.

  • Both major players encourage keyboard use. You will never know the shackles of inconvenience that frequent mouse usage is until you've been set free. Doing everything on the keyboard makes using the computer for 6-7+ hour stretches much more tolerable.

  • If you are in a resource-scarce environment (say, ssh on a shady connection), you can bet that software written in the early 90's or 70's will load quickly, and leave a small footprint on your resources.

  • They are scriptable. Many editors are programmable, but there's a difference to writing a tool in ELisp or Vimscript, and writing an Eclipse plugin.

  • Most of all, you look like a badass.


Of course, you lose some conveniences as well (especially in the case of Eclipse), but so far they haven't outweighed the benefits.

Of course, if you are an IDE user, there's a very (very) good chance you're a more premium coder than I am. I'm not making a value judgement on you; I simply suggest you try loving your Mother Terminal, who paved the way for the sexier IDE's.

*= Open question: What is our generation's equivalent? Ruby vs. Python? More generally, the static typers vs. dynamic typers? Chip in, as a former actor in training, the only things I love more than Computer Science is Drama ^_^