The fighting's been fun and all, but it's time to shut up and get along

About once a week, I get an email in my mailbox that reads like this:

Hey, Kiln looks neat, but Git is totally the bee’s knees, so why the fuck are you using Mercurial?

Note that these emails are rarely (if ever) actually interested in why Kiln chose Mercurial; what they’re instead interested in is trying to piss me off enough that I get into a flamewar about why Mercurial is going to bring about Nirvana while Git causes people to eat babies using nothing but A1 sauce and a spork.

This is stupid.

Mercurial and Git are both DAG-based DVCSes. They use the same patch format. They both handle directories implicitly. They both can autodetect file deletions and renames. They both run on just about every platform I can think of. They both do a bunch of “cool” stuff, like rebasing, editing history, signing changesets, and serving as the inspiration for Ferris Bueller. They both have nearly identical performance characteristics, they both have social code sharing sites, they both are used by really big projects, they both have really good documentation, and they both got compared to James Bond or MacGyver or something like that in an analogy I didn’t really follow.

So why is there so much hating? I think what’s going on is that people are coming to these tools from Subversion or CVS, have their massive epiphany on how totally awesome DVCSes are, and then assume that only their tool can have this level of awesome, so they begin evangelizing. The problem, of course, is that the other guy feels the same way, and is also evangelizing, so the Git and the Mercurial guy end up in a locker-room-style temper-tantrum over whose tool has the best performance or whatnot, instead of how much more awesome their tools are than the competition.

This has to stop.

Mercurial’s enemy is not Git. Git’s enemy is not Mercurial.

Their enemy is Subversion.

For example, take a look at this tripe. Anyone who has seriously used Mercurial or Git for any length of time is going to spend most of the video alternating between laughing, peeing their pants, and trying to explain that they merely spilled a bunch of water on their crotch, honest. While there’s some truth to the video—a lot of people do ditch Subversion for off-line commits or for shelving or the like—the video also totally misses why people love and stay with DVCSes, which is that they make branching and merging actually work like they’re supposed to, and make source control so fast and seamless that you find you’re suddenly using it for everything.

But if you show that video to a Subversion user, they’re going to nod. And there’s really no reason for them not to: if you don’t grok what the DAG gives you, if you’ve never kicked an entire branch back-and-forth across a LAN, if you’ve never used a site like Bitbucket or GitHub, then the only tangible benefits you see from DVCSes are…well, partial commits, and the fact that their “checkouts” aren’t littered with .svn directories all over the place. And these are problems that Subversion’s upcoming versions actually might solve.

So it’s time to focus on the real enemy: the holdouts still using centralized systems. The way I see it, there are three parts to this:

  1. Git and Mercurial need to do a better job handling one thing that Subversion is still better at: binary files. I know there are Git projects that are working on improving the transfer and storage of binary files within Git’s existing repository format, such as git-bigfiles. Mercurial is taking a slightly different tack through projects such as bfiles, which aim to deliberately move large binaries out of the store. We need to improve these workflows so that “DVCSes are totally better, unless” becomes simply “DVCSes are totally better.” (For our part, Fog Creek is helping fund development of bfiles via UCOSP, a semester-long student project.)
  2. Git and Mercurial advocates need to remember that they need to be converting Subversion, CVS, and Perforce users, not each other. The fundamental evil here is centralized, branchless version control systems that effectively encourage all development to occur in trunk, and which make propagating bug fixes and features properly borderline impossible for all but the most disciplined shops.
  3. Git and Mercurial advocates need to remember that anyone going to either system is a win for both communities. It’s easy, in the yin/yang of Hacker News and proggit, to forget that most developers are not even aware of what DVCSes are or what they do. Yeah. Sounds crazy, I know, but trust me on this. The goal right now, if you honestly believe that the DVCS workflow is better (and I do), should be to get the mindset out there, to make more people aware of what DVCSes have to offer and why they should be using them. I for one definitely do not care whether you end up deciding that Mercurial is better than Git for you or not (well, I kind of do, because I want you to use Kiln, but otherwise…), but I get a warm fuzzy feeling knowing that, if I ever have to work with your code, I’ll be able to use a sane version control tool.

So let’s do it. No more fighting. Git, I hereby acknowledge you rock. And Mercurial, you rock so much I helped build an entire product around you. You’re both awesome. So you two shake hands…very nice. Now, see that other dude over there? The one with no real tagging or branching support?

Let’s get him.

Kiln's Evolution, Part 2: From Prototype to Beta

This article is a continuation of Kiln’s Evolution, Part 1: DVCS as Code Review.

In the fall of 2008, Joel was getting increasingly adamant that FogBugz needed source control integration, and most people in the company seemed to think Subversion would probably be the best SCM to make that happen. Tyler and I disagreed, believing strongly that we should use a DVCS instead, and that our code review tool gave a really compelling example of why DVCS was better that any software shop would instantly “get.” But to convince the rest of the company, we’d have to show them a version of our tool that was more polished and usable than what we’d submitted to Django Dash.

And so we began a skunkworks project.

Unable to use time at work on much besides Copilot, I instead used my week of Thanksgiving vacation cleaning up the prototype’s user interface and functionality, named the result Kiln, and gave it its first logo. Tyler spent evenings in December making Kiln a proper, pluggable Django application, made the UI actually usable, and fixed a pile of bugs that would have blown up in our face if we’d tried showing Kiln to anyone else. By January, 2009, Kiln was ready to demo.

Kiln's review interface circa January, 2009
Kiln’s interface, directly after the winter skunkworks changes.

After lunch on a cold winter day, Tyler and I dragged everyone out into the kitchen and demoed the current state of Kiln. We showed repository management and the FogBugz-inspired code review workflow, and then made the case that this, or something very similar, should be FogBugz’ source control system.

And an amazing thing happened: somehow, everybody basically agreed. Sure, some people thought Kiln should be in C# or Wasabi instead of Django, no one could agree on whether Kiln should be a direct part of FogBugz or be an independent product, and Tyler and I argued strongly for Kiln as a hosted-only solution to a bunch of people who knew their bread-and-butter came from licensed applications, but everyone agreed that the basic idea of a Mercurial-powered SCM with DVCS-backed code review made for a compelling product. And so Kiln was born.

For Kiln to develop into an adult, though, we had to assemble a Kiln team. Tyler and I were still working on Copilot, and the newly appointed team lead, Ben Kamens, was busy with the FogBugz 7 release. Even if all three of us started work immediately, we couldn’t possibly turn the project from prototype to beta by our target date of August 2009, and starting immediately seemed…well, optimistic, at best.

But we work at Fog Creek, and if there’s one thing Fog Creek knows how to do, it’s how to help interns churn out awesome products over the course of a single summer. After all, Tyler and I started at Fog Creek by developing all of Copilot in the summer of 2005; why not go for broke and try for a repeat? What we therefore decided to do was to bet the farm and put all of our summer interns on Kiln. The three of us would try to wrap up the work we had to do on our current projects as quickly as possible, and, as we transitioned off, we’d focus purely on building up enough Kiln infrastructure that the interns could immediately be productive when they arrived. Meanwhile, until the three of us could start work on Kiln, we’d have our project managers figure out the details of the user experience so that, once we finally could work on Kiln, we’d be able to focus as much as possible on coding instead of decision-making meetings.

As we moved ever closer to June, we made several key decisions about the design of the product:

  1. Kiln could launch hosted-only, but we’d need to ensure that its design was amenable to on-site installation.
  2. Kiln would depend on FogBugz for user management and bug-tracking integration, but would otherwise be its own code base.
  3. Kiln’s website would be written in C# and ASP.NET MVC, completely freeing it from the FogBugz legacy code base.
  4. The part of Kiln that needed to talk directly to the DVCS would be a separate component so that we could target different (or even multiple) SCMs without changing the website.
  5. Code reviews on branches would be eliminated in favor of arbitrary discussions on files and changesets.

By the time the first interns arrived, we had a beautiful set of specs with lovely Balsamiq Mockups put together by Jason and Dan, and we’d managed to cobble together a basic framework that supported repository hosting and FogBugz integration, and that learned as much as possible from our best example of a known-good ASP.NET MVC code base, StackOverflow.

Our plan paid off ridiculously quickly. A week into the internship, the interns had already managed to get key pieces of Kiln limping along. By the end of the second week, they had enough ownership they were starting to challenge us when they felt the user specs or the engineering didn’t make sense. Their strong focus on core Kiln freed Tyler, Ben and me to focus on performance, billing, On Demand integration, and all the other things that absolutely must get done for a real product, but that no one would otherwise ever do.

Just over a month into the summer, Kiln might not win any speed or beauty awards, but nearly all of its features were working in one way or another, and it was usable for its intended purpose. In other words, we’d hit pre-alpha. With great fanfare, we decided that Kiln was ready for dogfooding, and Kiln development moved to Kiln itself.

There’s a slightly unfortunate thing about dogfooding, though: features that looked great on paper, and even worked perfectly in the prototype, end up not being what you want in the real product. Some interfaces end up not scaling the way you want. Some end up too complicated, or end up solving one particular problem at the expense of all others. It’s a testament to our PMs that we had comparatively few of these occur, but sometimes the difference between the prototype and what we ended up shipping was massive. For example, compare the Balsamiq mockup of the code review system, which was actually used by the pre-alpha:

The Balsamiq Mockup of the Coventi-inspired code review system

with the version that we ended up actually shipping:

The current review system used by Kiln

Or take a look at the original specification for the Kiln Dashboard:

The mockup of Kiln's original developer dashboard

compared to the shipping equivalent, the Activity Feed:

The Activity Feed in the shipping version of Kiln

(I apologize about using the mockups, rather than screenshots, for the earlier versions; trying to get Kiln circa June 2009 running at this point proved a royal pain in the butt, and I don’t honestly think that it makes a big difference.)

What’s not obvious in these two screenshots is that the change from the pre-alpha interface to the shipping interface frequently happened over the course of just a couple of weeks. Everyone was very vocal about what they liked and didn’t like, and the interns were happy to go through several iterations rapid-fire to find one that everyone liked. In that way, the weakest parts of Kiln ended up getting the most attention, and rapidly matured into some of its strongest features.

In a massive code sprint at the end of the summer, Kiln matured into something resembling a fully grown product. One of our interns made a beautiful JavaScript renderer for Kiln’s DAG, the novel repository management our PMs designed fully matured into beautiful JavaScripty goodness, our FogBugz/Kiln workflow became increasingly seamless, and we further loosened review requirements so that you could review arbitrary discontiguous changesets. We transitioned Copilot and FogBugz to be hosted on Kiln as well, got mostly positive feedback from the rest of the team, and worked on swiftly addressing their complaints. Despite these feature additions, Kiln’s performance went from tolerable, to better, to fast. We knew we had a winner on our hands. We prepared the FogBugz On Demand environment to become Kiln On Demand, made our first deployment, and turned the switch for our first batch of beta users.

And while you might expect this part of the story to be about everything going haywire and all hell breaking loose, what actually happened is that, against all odds, everything basically worked. The beta was quite boring: while there were a lot of bugs to fix at first, and some of them were extremely tough (for example, supporting very large repositories, or making history views faster, or legitimately supporting Internet Explorer) or really ticked off our customers (Kiln at one point let you rename and move repositories without breaking URLs, which sounded like an absolutely great idea when I helped hammer it through the design committee, but which completely blew up in one of our clients’ faces a few weeks later), everything basically worked. Our beta testers seemed increasingly excited about the product. All of our gambles seemed to have paid off as we readied Kiln 1.0 for its November launch date.

Except, of course, that Kiln did not ship in November, 2009. That’s because, just a week or two before it was supposed to ship, we went to the Business of Software conference in San Francisco.

You see, that was where we realized we were doing it all wrong.

To be continued…

Firing Up Kiln

As Kiln draws ever closer to release, I realized that we have long since passed the point where I should move all of my personal projects to it.

So as of today, I have.

If you’re interested in grabbing the most recent version of FogBugz Middleware, my fork of Kiln Backup, or any of my other public projects, they’re now all at https://bqb.kilnhg.com. Log in with the username guest and the password anonymous, and you’ll have full read-only access to the entire site.

Even if you’re not that interested in the stuff I’ve written, you still may be interested in checking out the site: by enabling read-only guest logins on my FogBugz account, I’ve made it very easy for you to take a look around a real, active Kiln and FogBugz install, without setting things up or filling out any annoying forms. If you just wanted to get a quick feel for what Kiln looked and felt like, now’s your chance.

So go ahead and check it out. Feel free to post questions or comments here, or (if appropriate) over at the Kiln StackExchange, and I’ll be happy to answer them.

Kiln's Evolution, Part 1: DVCS as Code Review

One of the things that really sucks about doing online code reviews is that, in all the systems I know, your code reviews do not integrate with your source control. If the code reviews are versioned at all—and they’re frequently not—then they’re in an entirely different system than your real VCS. For larger reviews, where you’re talking about a major piece of functionality, that means that your source control system will end up lacking the history of how a feature came to be. In other words, the more you use code reviews, the less actual history you have in your VCS.

That’s totally broken. You’re being punished for doing the right thing.

A little over a year ago, Tyler came to me and asked me to join him in the Django Dash, a weekend code sprint. Tyler and I had been talking about the code review problem, and had been thinking of writing our own tool that lacked these issues. Django Dash seemed like the perfect time to try to actually do that.

That opened up a question: how do you actually achieve better code review? If you accept that a huge part of the problem is that the code review history is out-of-stream with your VCS, then it follows that you have to somehow store the in-process code in the VCS.

In most systems, that means using branches. But branches in almost any system suck. Everyone has a horror story about trying to do a merge in Subversion, or CVS, or Perforce—and these are usually not meaningfully large merges; just small feature branches. Trying to use branches for long-running code reviews in these systems simply isn’t viable.

But DVCSes are great at handling this kind of problem. To be distributed, they have to have extremely robust branching and merging systems. Because their systems are so good, it’s very common in DVCSes to do quick experiments and features in their own branch, then merge when complete.

Tyler and I were both big fans of Mercurial (in fact, we convinced all of Fog Creek to switch to Mercurial from Subversion), so using Mercurial as our DVCS base seemed like the best bet. After some discussion of the technical details for making the system work, we got a good night’s sleep, woke up early, threw a nice breakfast on the table, and started coding.

Forty-eight hours later, we had our first prototype. When users wanted to contribute code to a repository, they would fork the repository, push all of their changes to the fork, and then request a review on the fork. Users would see the exact diff of what they would be approving to the repository; no more, no less. Code could not be approved unless it had already merged in the trunk, ensuring that the user who wrote the code had taken care of the merge. When the review was approved, it’d be seamlessly merged into trunk (guaranteed seamlessly, due to the previous rule), with full history.
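
The approval rule is easiest to see as a check on the changeset DAG. What follows is only a minimal sketch, not the prototype's actual code: the parents mapping, the changeset names, and the can_approve helper are all invented for illustration. The assumption is that a fork is approvable only when trunk's head is already an ancestor of the fork's head, which is what makes the final merge into trunk trivially clean.

```python
def ancestors(parents, node):
    """Return every changeset reachable from node via parent links, including node."""
    seen = set()
    stack = [node]
    while stack:
        current = stack.pop()
        if current in seen:
            continue
        seen.add(current)
        stack.extend(parents.get(current, ()))
    return seen


def can_approve(parents, trunk_head, fork_head):
    """A fork is approvable only if it already contains everything in trunk."""
    return trunk_head in ancestors(parents, fork_head)


# Hypothetical history: trunk is a -> b -> c; the fork branched at b and added x.
parents = {"a": [], "b": ["a"], "c": ["b"], "x": ["b"]}
before_merge = can_approve(parents, "c", "x")

# The author merges trunk into the fork, creating merge changeset m.
parents["m"] = ["x", "c"]
after_merge = can_approve(parents, "c", "m")

print(before_merge, after_merge)
```

Before the author merges trunk into the fork, the check fails; once the merge changeset exists, the fork contains all of trunk and approval amounts to moving trunk forward to the fork's head.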

The design was inflexible and unintuitive, and would have had serious issues in a shipping project, but we achieved what we set out to do:

  1. Approving a code review was the same as pushing it
  2. Which meant that we could fully separate the concepts of code author and code approver
  3. And which meant that the full history of reviewed code was completely preserved

The original Kiln code review

A screenshot of the prototype that would later become Kiln

Even in its nascent form, the tool was already impressive enough, and unique enough, that we won the Django Dash.

Tyler and I talked of making our code review tool into a real product, but we were knee-deep in Copilot work, so it had to wait. But we had proved, if only to ourselves, that using a DVCS, even in a centralized model, provided some very unique capabilities that simply were not possible in other systems.

When Joel announced a few months later that FogBugz needed a source control system, we’d be ready.

The Launch of a Secret Product

For the past year, an odd thing has happened, if you’ve followed my doings. My work on Fog Creek Copilot seemed to dwindle, I became tight-lipped about what I was working on, and I started getting really excited about an upcoming product release. Also around this time, my knowledge of Mercurial, Python, C#, and ASP.NET MVC all seemed to dramatically increase, even though my free-time code output shrank to nothing. What was going on?

Oh, the usual. I was working on a top-secret brand-new project. And now, it’s released to closed beta.

Kiln, the modern take on DVCS

I’d like to introduce to you Kiln, a brand-new source code hosting and code review tool from Fog Creek. Kiln introduces what I believe is a truly novel take on code reviews that integrates the strengths of Mercurial and FogBugz to provide rejectable code review after commit. We also have some unique takes on Mercurial features, such as the ability to preview what will go into a merge beforehand, really awesome branch management, the most beautiful DAG view I’ve seen in any DVCS product, and lots more.

Over the next few days, I’ll be providing a couple of blog posts detailing how Kiln was developed. In the meantime, go check out Kiln and sign up for the beta. We’re approving new people for the beta on a regular basis, so if you don’t get an invite immediately, don’t worry; we’ll get to you sooner rather than later.

hg log -R tips_and_tricks

I was delighted to find that Steve Losh has begun making a website called hg tip—a site updated on a regular basis with Mercurial tips for both beginner and expert users. The site’s beautifully designed and a pleasure to read. If you use Mercurial, do yourself a favor and go take a look.

(My favorite tip, incidentally, is definitely the tutorial on making a command called nudge, which allows you to push only the current head by default, rather than all of them. I’ve been using a variant of that, combined with bookmarks, to have Git-style lightweight branching in my Mercurial work when appropriate.)

Finally, a Phone I Can Code For

Finally, a phone whose dev program doesn’t make me want to vomit. Now if only Palm would get the Pre out on Verizon faster, I might actually do so…

local_settings.py Considered Harmful

One piece of increasingly conventional wisdom when developing Django applications is that your settings.py file ought to conclude with some variant of

try:
    from local_settings import *
except ImportError:
    pass

the idea being that you can put a local_settings.py file on each server with the appropriate information for that site.

I would argue that this approach is wrong. By definition, you cannot put local_settings.py in version control, which means that a critical part of your infrastructure (how to deploy the thing) is not getting versioned. At the same time, using the same settings for your development box and your deployment box is unreasonable, and usually stupid.

After being unhappy with several solutions, including the one Simon Willison advocated in his excellent Django Heresies talk (which has a pile of other great tips as well), I stole a bit from everything I’d seen to come up with what I believe is my own design.

Rather than importing from local_settings, I decided to make settings be a full-blown module and use some Python magic. First, kill settings.py and make a settings directory. Then, add the following to settings/__init__.py:

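import importlib
import socket

# "web1.example.com" becomes the module name "web1_example_com"
hostname = socket.gethostname().replace('.', '_').lower()
try:
    # Look for settings/<hostname>.py inside this package
    custom_settings = importlib.import_module('.' + hostname, __name__)
except ImportError:
    from .development import *
else:
    # Re-export the host module's uppercase names as our own settings
    globals().update(
        (name, value) for name, value in vars(custom_settings).items()
        if name.isupper())

While this is definitely somewhat magical, it’s not too hard to follow: grab the hostname, convert it into a more Python-like form, and attempt to import it. If that succeeds, copy its uppercase settings into the package namespace for export. Otherwise, default to the development import.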

With this change, we can now have a file for each machine we intend to deploy on, named after the machine, put into the settings directory, and it’ll be used automatically. Now, you’ve got the best of both worlds: site-specific configuration, managed by source control.
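
To make the lookup concrete, here is a small standalone sketch of just the selection step. The helper names, the example hostnames, and the set standing in for the contents of the settings directory are all hypothetical; only the hostname conversion itself comes from the snippet above.

```python
import socket


def settings_module_name(hostname):
    """Convert a hostname into the module name the settings package looks for."""
    return hostname.replace('.', '_').lower()


def choose_settings(hostname, available_modules):
    """Return the host-specific module if one exists, else fall back to development."""
    candidate = settings_module_name(hostname)
    return candidate if candidate in available_modules else 'development'


# Hypothetical contents of the settings/ directory, minus the .py suffix.
available = {'development', 'web1_example_com'}

print(choose_settings('Web1.Example.com', available))
print(choose_settings('alices-laptop.local', available))
print(settings_module_name(socket.gethostname()))
```

The first call picks the host-specific file and the second falls back to development. Note that the conversion leaves hyphens alone, so a host named db-1 maps to db-1, which is not a valid Python identifier; replacing hyphens as well may be worth considering.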

There’s one little caveat here that may or may not bother you: one piece of data that generally goes into local_settings.py is your database password, which you probably do not want in version control. Thankfully, that’s trivial to work around: in a non-version-controlled file on the server, create a settings/passwords.py file which contains only relevant passwords for that system. Then, at the tail end of that site’s settings file, add:

from .passwords import *

It’s the same trick we used to use for local_settings.py, except that now, the only thing that’s not version-controlled is your password. Everything else—memcached settings, custom middleware and template directories, ADMINS lists, and so on—will be safely in your VCS of choice.

The One in Which I Call Out Hacker News

“Implementing caching would take thirty hours. Do you have thirty extra hours? No, you don’t. I actually have no idea how long it would take. Maybe it would take five minutes. Do you have five minutes? No. Why? Because I’m lying. It would take much longer than five minutes. That’s the eternal optimism of programmers.”

Professor Owen Astrachan during 23 Feb 2004 lecture for CPS 108

Accusing open-source software of being a royal pain to use is not a new argument; it’s been said before, by those much more eloquent than I, and even by some who are highly sympathetic to the open-source movement. Why go over it again?

On Hacker News on Monday, I was amused to read some people saying that writing StackOverflow was hilariously easy—and proceeding to back up their claim by promising to clone it over July 4th weekend. Others chimed in, pointing to existing clones as a good starting point.

Let’s assume, for sake of argument, that you decide it’s okay to write your StackOverflow clone in ASP.NET MVC, and that I, after being hypnotized with a pocket watch and a small club to the head, have decided to hand you the StackOverflow source code, page by page, so you can retype it verbatim. We’ll also assume you type like me, at a cool 100 WPM (a smidge over eight characters per second), and unlike me, you make zero mistakes. StackOverflow’s *.cs, *.sql, *.css, *.js, and *.aspx files come to 2.3 MB. So merely typing the source code back into the computer will take you about eighty hours if you make zero mistakes.

Except, of course, you’re not doing that; you’re going to implement StackOverflow from scratch. So even assuming that it took you a mere ten times longer to design, type out, and debug your own implementation than it would take you to copy the real one, that already has you coding for several weeks straight—and I don’t know about you, but I am okay admitting I write new code considerably less than one tenth as fast as I copy existing code.
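
The back-of-the-envelope numbers are easy to check. The sketch below assumes the usual typing-test convention of five characters per word and treats 2.3 MB as 2.3 million one-byte characters; both are assumptions, not figures from the StackOverflow code base.

```python
WORDS_PER_MINUTE = 100
CHARS_PER_WORD = 5              # assumption: standard typing-test convention
SOURCE_SIZE_CHARS = 2_300_000   # assumption: 2.3 MB at one byte per character
FROM_SCRATCH_FACTOR = 10        # the "mere ten times longer" from the text

chars_per_second = WORDS_PER_MINUTE * CHARS_PER_WORD / 60
retype_hours = SOURCE_SIZE_CHARS / chars_per_second / 3600
from_scratch_hours = retype_hours * FROM_SCRATCH_FACTOR
from_scratch_weeks = from_scratch_hours / (24 * 7)

print(f"{chars_per_second:.1f} chars/sec")
print(f"{retype_hours:.0f} hours to retype")
print(f"{from_scratch_weeks:.1f} weeks of nonstop work from scratch")
```

Under those assumptions the retyping alone comes to roughly 77 hours, in line with the "about eighty" above, and the tenfold estimate works out to about four and a half weeks of typing around the clock.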

Well, okay, I hear you relent. So not the whole thing. But I can do most of it.

Okay, so what’s “most”? There’s simply asking and responding to questions—that part’s easy. Well, except you have to implement voting questions and answers up and down, and the questioner should be able to accept a single answer for each question. And you can’t let people upvote or accept their own answers, so you need to block that. And you need to make sure that users don’t upvote or downvote another user too many times in a certain amount of time, to prevent spambots. Probably going to have to implement a spam filter, too, come to think of it, even in the basic design, and you also need to support user icons, and you’re going to have to find a sanitizing HTML library you really trust and that interfaces well with Markdown (provided you do want to reuse that awesome editor StackOverflow has, of course). You’ll also need to purchase, design, or find widgets for all the controls, plus you need at least a basic administration interface so that moderators can moderate, and you’ll need to implement that scaling karma thing so that you give users steadily increasing power to do things as they go.

But if you do all that, you will be done.

Except…except, of course, for the full-text search, especially its appearance in the search-as-you-ask feature, which is kind of indispensable. And user bios, and having comments on answers, and having a main page that shows you important questions but that bubbles down steadily à la reddit. Plus you’ll totally need to implement bounties, and support multiple OpenID logins per user, and send out email notifications for pertinent events, and add a tagging system, and allow administrators to configure badges by a nice GUI. And you’ll need to show users’ karma history, upvotes, and downvotes. And the whole thing has to scale really well, since it could be slashdotted/reddited/StackOverflown at any moment.

But then! Then you’re done!

…right after you implement upgrades, internationalization, karma caps, a CSS design that makes your site not look like ass, AJAX versions of most of the above, and G-d knows what else that’s lurking just beneath the surface that you currently take for granted, but that will come to bite you when you start to do a real clone.

Tell me: which of those features do you feel you can cut and still have a compelling offering? Which ones go under “most” of the site, and which can you punt?

Developers think cloning a site like StackOverflow is easy for the same reason that open-source software remains such a horrible pain in the ass to use. When you put a developer in front of StackOverflow, they don’t really see StackOverflow. What they actually see is this:

create table QUESTION (ID identity primary key,
                       TITLE varchar(255), -- why do I know you thought 255?
                       BODY text,
                       UPVOTES integer not null default 0,
                       DOWNVOTES integer not null default 0,
                       USER integer references USER(ID));
create table RESPONSE (ID identity primary key,
                       BODY text,
                       UPVOTES integer not null default 0,
                       DOWNVOTES integer not null default 0,
                       QUESTION integer references QUESTION(ID));

If you then tell a developer to replicate StackOverflow, what goes into his head are the above two SQL tables and enough HTML to display them without formatting, and that really is completely doable in a weekend. The smarter ones will realize that they need to implement login and logout, and comments, and that the votes need to be tied to a user, but that’s still totally doable in a weekend; it’s just a couple more tables in a SQL back-end, and the HTML to show their contents. Use a framework like Django, and you even get basic users and comments for free.

But that’s not what StackOverflow is about. Regardless of what your feelings may be on StackOverflow in general, most visitors seem to agree that the user experience is smooth, from start to finish. They feel that they’re interacting with a polished product. Even if I didn’t know better, I would guess that very little of what actually makes StackOverflow a continuing success has to do with the database schema—and having had a chance to read through StackOverflow’s source code, I know how little really does. There is a tremendous amount of spit and polish that goes into making a major website highly usable. A developer, asked how hard something will be to clone, simply does not think about the polish, because the polish is incidental to the implementation.

That is why an open-source clone of StackOverflow will fail. Even if someone were to manage to implement most of StackOverflow “to spec,” there are some key areas that would trip them up. Badges, for example, if you’re targeting end-users, either need a GUI to configure rules, or smart developers to determine which badges are generic enough to go on all installs. What will actually happen is that the developers will bitch and moan about how you can’t implement a really comprehensive GUI for something like badges, and then bikeshed any proposals for standard badges so far into the ground that they’ll hit escape velocity coming out the other side. They’ll ultimately come up with the same solution that bug trackers like Roundup use for their workflow: the developers implement a generic mechanism by which anyone, truly anyone at all, who feels totally comfortable working with the system API in Python or PHP or whatever, can easily add their own customizations. And when PHP and Python are so easy to learn and so much more flexible than a GUI could ever be, why bother with anything else?

Likewise, the moderation and administration interfaces can be punted. If you’re an admin, you have access to the SQL server, so you can do anything really genuinely administrative-like that way. Moderators can get by with whatever django-admin and similar systems afford you, since, after all, few users are mods, and mods should understand how the sites work, dammit. And, certainly, none of StackOverflow’s interface failings will be rectified. Even if StackOverflow’s stupid requirement that you have to have and know how to use an OpenID (its worst failing) eventually gets fixed, I’m sure any open-source clones will rabidly follow it—just as GNOME and KDE for years slavishly copied off Windows, instead of trying to fix its most obvious flaws.

Developers may not care about these parts of the application, but end-users do, and take them into consideration when trying to decide what application to use. Much as a good software company wants to minimize its support costs by ensuring that its products are top-notch before shipping, so, too, savvy consumers want to ensure products are good before they purchase them so that they won’t have to call support. Open-source products fail hard here. Proprietary solutions, as a rule, do better.

That’s not to say that open-source doesn’t have its place. This blog runs on Apache, Django, PostgreSQL, and Linux. But let me tell you, configuring that stack is not for the faint of heart. PostgreSQL needs vacuuming configured on older versions, and, as of recent versions of Ubuntu and FreeBSD, still requires the user to set up the first database cluster. MS SQL requires neither of those things. Apache…dear heavens, don’t even get me started on trying to explain to a novice user how to get virtual hosting, MovableType, a couple Django apps, and WordPress all running comfortably under a single install. Hell, just trying to explain the forking vs. threading variants of Apache to a technically astute non-developer can be a nightmare. IIS 7 and Apache with OS X Server’s very much closed-source GUI manager make setting up those same stacks vastly simpler. Django’s a great product, but it’s nothing but infrastructure, which is exactly the thing that I happen to think open-source does do well, precisely because of the motivations that drive developers to contribute.

The next time you see an application you like, think very long and hard about all the user-oriented details that went into making it a pleasure to use, before decrying how you could trivially reimplement the entire damn thing in a weekend. Nine times out of ten, when you think an application was ridiculously easy to implement, you’re completely missing the user side of the story.

The One in Which I Say That Open-Source Software Sucks

These days, arguing that open-source software is crap seems dumb. How many websites are powered by a combination of MySQL, PHP, and Apache? How many IT applications, written in Eclipse, run on Java, using SWT widgets? How many design studios rely heavily on The GIMP and Inkscape for their everyday photo-retouching and page layout needs?

Er, wait. That last one. Doesn’t quite ring true. In fact, as good as most people seem to insist that Inkscape and The GIMP are, I’ve yet to see a major shop that ran on anything other than Adobe or Quark.

Okay, but, at least I can point to the many offices that run OpenOffice or KOffice! Or that have ditched FileMaker and Access for Kexi! Or that proudly rely on OpenGroupware for their scheduling needs!

…well. Except that I can’t.

I mean, yeah, there are some real businesses, here and there, that actually use those products, and some of them are successful. But, by and large, despite the success of open-source on the backend, open-source end-user applications have failed. In fact, when it comes to end-user applications that people other than open-source developers actually use, you’re pretty much limited to a single application: the web browser. (And even there, if you’re on Safari, then only your engine is open-source—and if you’re on IE or Opera, not even that.) And although I’m sure someone’s gonna say that open-source end-user apps are going to take over any day now, they’ve been claiming that since the height of the dot-com bubble, ten years ago. I wait with bated breath.

Why does open-source fail to reach critical mass anywhere but the server closet?

Easy: because open-source software is, incontrovertibly, a total usability clusterfuck.

Programmers are superb optimizers. We’ve been accused of being lazy in the best way possible: we try to ensure we do not solve problems that do not need to be solved, and that we solve those that we must deal with completely, so that they never bother us again.

On open-source projects, where anyone can quickly nail that one little bug that was ticking them off, that means that your software is gonna be lean (why implement what you won’t use?) and operate to spec (you don’t wanna keep dealing with that one annoying bug every day). So far, so good.

But what about using software? You only gotta learn software once—and, for those actually contributing on an open-source project, you probably learned from the bottom up, so that’s how you view the thing. Interface elements that expose quirks of the underlying implementation seem totally natural; what a user perceives as a bug strikes a developer as little more than the reflection of the underlying system.

Developers could fix this problem. They just completely lack motivation. Being “lazy” at this point means leaving the software as-is. If users find the software frustrating and unintuitive, or can’t get the thing installed, they should spend the time to learn the underlying, beautiful implementation, at which point they will discover a world of awesome and inspiring flexibility far greater than what closed-source offerings could possibly provide. And, until then, go bug the mailing list so that we can all call you an idiot.

What the developer-as-lazy argument misses is that companies, too, are lazy—and, if competently run, lazy in a good way. Employee time is money, and therefore a smart company will attempt to reduce how much time its employees must spend on a problem. If companies only employed developers, the end result would be the same as the open-source model. But they don’t. Companies also have support staff, and the amount of time support staff must spend fixing problems is directly proportional to how well-written and intuitive the software is.

Note that second part: any time that a user gets confused, and has to phone support to get through a problem, it costs the company money. The company is highly motivated to produce easy-to-understand and easy-to-use software to keep its support costs down. After all, at the end of the day, a user does not care how robust, or how elegant, or even how beautiful, the implementation of their software is; all they care about is whether they can use your video software to make glitter fly out their plastered boss’s butt in the Christmas party video.

So, if faced with a choice between spending time on the elegance of the implementation and the intuitiveness of the interface, companies will optimize for intuition; open-source projects, for elegance of implementation.

In the general case, the open-source emphasis is wrong, at least if you want your software to actually get used. All of the oddball exceptions in open-source usability (Firefox, Firefox 2, Firefox 3, and I guess GNOME, if you twist my arm) have massive corporations backing them up, thinking in terms of both development and support costs, and spending the time to make sure that users have a positive first-time experience with the software. Without exception, these products are only open-source because they, as a product, don’t actually confer much value to the parent company. Mozilla ultimately cares far less about whether you actually use Firefox than whether your Google queries list Mozilla as the referrer; Sun, at least in its drunk camel of a business plan, cared more whether you were running on Sun hardware than whether your desktop happened to run GNOME over KDE. Open-source, without corporate guidance, cares more about making sure you can tweak your cluster size to match the optimal expression of K-trees on BlenderFS on inverted Xeon cache vertebrates without affecting the MIPS port, than about ensuring that a new user has the faintest idea how to do anything with that package he just downloaded.

Maybe, someday, human altruism will make possible the Grand Dream of Open-Source, where projects are open-source, and well thought-out, and easy-to-use, and easy-to-install, and highly efficient, and bug free. Until then, open-source software is going to run great, but be painful to use, and closed-source software will be easy, but less efficient. Pick which you want for your own purposes. Just don’t forget to take a look around at how much Apple hardware you see these days to figure out which users actually value.

Zombie Operating Systems and ASP.NET MVC

In 1973, an operating system called CP/M was born. CP/M had no directories, and filenames were limited to 8.3 format. To support input and output from user programs, the pseudofiles COM1, COM2, COM3, COM4, LPT1, LPT2, CON, AUX, PRN, and NUL were provided.

In 1980, Seattle Computer Products decided to make a cheap, approximate clone of CP/M, called 86-DOS. 86-DOS therefore had no directories, supported 8.3 file names, and included the pseudofiles COM1, COM2, COM3, COM4, LPT1, LPT2, CON, AUX, PRN, and NUL. Further, because many programs always saved their files with a specific extension, any file with these names and an extension was treated as identical to the filename without the extension.

In 1981, Microsoft Corporation purchased the rights to use 86-DOS, renamed it MS-DOS, and shipped it to customers.

In 1983, Microsoft Corporation released MS-DOS 2.0. MS-DOS 2.0 supported hierarchical directories. To maintain backwards compatibility with applications designed for MS-DOS 1.0, which had no concept of directories, Microsoft placed the pseudofiles COM1, COM2, COM3, COM4, LPT1, LPT2, CON, AUX, PRN, and NUL, with all possible extensions, in all directories.

In 1988, Microsoft began the development of a modern, preemptively multitasked, memory-protected, multiuser system loosely based on VMS, called Windows NT. Windows NT supported a completely new file system design, called NTFS, modeled on OS/2’s HPFS, which allowed for arbitrary file names in Unicode UCS-2. To maintain backwards compatibility with DOS applications running under the new operating system, some of which were written before DOS had hierarchical directories, Windows NT placed the pseudofiles COM1-9, LPT1-9, CON, AUX, PRN, and NUL, with all possible extensions, in all directories. Windows NT shipped to customers in 1993.

In 1998, Microsoft released the Option Pack to the two-year-old Windows NT 4.0, containing a new technology called Active Server Pages, or ASP. ASP allowed the creation of dynamic websites via COM scripting. Because ASP was file-based, URLs by definition could not include /com1-9.asp, /lpt1-9.asp, /con.asp, /aux.asp, /prn.asp, or /nul.asp.

In 2002, Microsoft announced the imminent release of the .NET framework. One component of .NET was ASP.NET, a vast improvement over ASP. Although ASP.NET provided vastly superior ways to write web pages than those provided by ASP, the mechanism was still completely based on per-file web pages—and although this mechanism could be overridden by various means, for backwards compatibility reasons, ASP.NET always checked for the existence of a file first before attempting to execute a custom handler. Thus, web pages could not contain /com[1-9](\..*)?, /lpt[1-9](\..*)?, /con(\..*)?, /aux(\..*)?, /prn(\..*)?, or /nul(\..*)?.

In 2009, Microsoft released ASP.NET MVC, a thoroughly modern, orthogonal web framework supporting the most up-to-date understanding of how to architect well-factored, scalable web applications. ASP.NET MVC broke free of the per-file emphasis of previous frameworks, instead emphasizing regex-based URL dispatching, similar to popular scripting frameworks such as Django, Rails, and Catalyst. These URLs did not map to files; they instead mapped to objects that were designed to handle the request. Thus, URLs could be built entirely on what made logical sense, rather than on the confines of what the file system dictated.

But ASP.NET MVC was based on ASP.NET. Which checks for the existence of a file before running any scripts. Which means it will check directories that have COM1-9, LPT1-9, CON, AUX, PRN, and NUL in them, with any extension.

And that is why, in 2009, when developing in Microsoft .NET 3.5 for ASP.NET MVC 1.0 on a Windows 7 system, you cannot include /com\d(\..*)?, /lpt\d(\..*)?, /con(\..*)?, /aux(\..*)?, /prn(\..*)?, or /nul(\..*)? in any of your routes.
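If you want to sanity-check a route against that list, here is a minimal Python sketch. The helper name has_reserved_segment is made up for illustration, and the pattern just mirrors the regexes quoted above; it is not anything ASP.NET itself provides.

```python
import re

# Mirrors the patterns quoted above: a reserved device name, optionally
# followed by a dot and any extension. Matching is case-insensitive because
# Windows file names are.
_RESERVED = re.compile(r"^(com\d|lpt\d|con|aux|prn|nul)(\..*)?$", re.IGNORECASE)


def has_reserved_segment(path):
    """Return True if any segment of a URL path is a reserved device name."""
    return any(_RESERVED.match(segment) for segment in path.split("/") if segment)


for route in ["/blog/con", "/files/aux.txt", "/contact", "/com1.asp"]:
    print(route, has_reserved_segment(route))
```

Only whole path segments are checked, so /contact is fine even though it starts with "con".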

Introducing FogBugzMiddleware

While I alternate between loving it and hating it, the web framework I use the most is Django. By default, when your deployed Django application encounters a 500 error (and, optionally, an internal 404 error), it sends an email to everyone in the ADMINS section of your settings.py. Since I don’t like using my email client as my bug tracker, and I take advantage of FogBugz Student and Startup Edition to get a free and awesome bug tracker, I had long ago told Django to take advantage of FogBugz’ submit-via-email feature to simply send the email to FogBugz. This would be a fine system, save for two flaws:

  1. Bugs that come into the system this way are marked as Inquiries, not Bugs;
  2. FogBugz’ email submission is designed for humans, not machines, so it attempts to send back a reply letting you know what case it opened, which doesn’t go so well if the reply-to address is, oh, say, do-not-reply@bitquabit.com. This results in a second case being opened telling me that the response email could not be sent.

The solution: Django already allows for extensible “middleware,” which allows you to inject custom filters into Django’s response stack, so I wrote FogBugzMiddleware, which allows you to make your application directly file bugs in FogBugz, rather than spewing emails all over the place. It’s already running successfully both on this blog, and on internal Django applications at Fog Creek. If you use both Django and FogBugz, give it a whirl—and, of course, feel free to file a bug if you find any, or request a feature I didn’t think of. And, as always, patches welcome.
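To give a feel for the shape of such a middleware, here is a rough, framework-free Python sketch. It is not the actual FogBugzMiddleware code: the class name and the submit_case callable are hypothetical stand-ins, and the only Django-specific assumptions are the process_exception(request, exception) hook and the request.path attribute.

```python
import traceback


class BugFilingMiddleware:
    """Files a bug report for every unhandled exception (hypothetical sketch)."""

    def __init__(self, submit_case):
        # submit_case is a stand-in for whatever actually talks to the bug
        # tracker; it receives a title and a description.
        self.submit_case = submit_case

    def process_exception(self, request, exception):
        # Django calls this hook when a view raises an exception.
        title = "%s at %s" % (type(exception).__name__, request.path)
        description = "".join(
            traceback.format_exception(
                type(exception), exception, exception.__traceback__
            )
        )
        self.submit_case(title, description)
        # Returning None lets the normal 500 handling carry on.
        return None
```

Keeping the tracker call behind a plain callable means the exception-to-report logic can be tried out without a running Django project or a FogBugz account.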

Your Language Features Are My Libraries

At lunch today, we had a lengthy discussion about the merits of various .NET technologies, including LINQ. I complained that, although I like LINQ, and I like C#, I cannot help but be annoyed that many of these “cool” language features in C# are nothing more than libraries in Smalltalk. Joel responded that the comparison wasn’t fair; C# allows you to invent more syntax than those languages permit, and affords greater flexibility.

Without disagreeing, let’s visit some specific examples. The code is written for GNU Smalltalk and written slightly Algol-y on purpose, to make it easier for non-Smalltalkers to follow. In other words: if you program Smalltalk, don’t bother telling me I’m eschewing a more functional idiom for a more explicit one. I know.

The ?? operator

One C# feature that Joel specifically mentioned was the ?? operator. In C#, this operator returns the expression to the left of the operator if it is non-null, or the one to the right if the left is null.

It turns out that you can add this exact operator to Smalltalk trivially. In Smalltalk, everything is an object—including nil, which is the singleton instance of UndefinedObject, which is a subclass of Object. Thus, to add ?? to the language, we just have to add a ?? method to the Object class:

Object extend [
    ?? default [
        ^ self isNil ifTrue: [ default ] ifFalse: [ self ].
    ]
]

Let’s test it:

st> 5 ?? 0.
5
st> nil ?? 0.
0
st>

Perfect.

Well, almost. Just like the ?: operator, the right side of the ?? operator in C# only gets executed if the left side is null. Ours executes the right side greedily.

Thankfully, it turns out that we can fix that easily, too. In Smalltalk, whenever you wish to indicate that something may be executed later, you stick it into a lambda, which is simply code enclosed by brackets. Lambdas can be evaluated by calling value. Thus, although we need to make our method slightly more complicated:

Object extend [
    ?? default [
        ^ self isNil
            ifTrue: [
                (default respondsTo: #value) ifTrue: [default value] ifFalse: [default]
            ]
            ifFalse: [ self ].
    ]
]

…we can have our cake and eat it too, using the simple syntax when it’s okay to be greedy, and the more complex one elsewhere:

st> 'foobar' ?? 'cow'.
'foobar'
st> 'foobar' ?? [ Object anUndefinedMethod ]
'foobar'
st>

Most Smalltalks actually already have this operator, calling it ifNil:. But the point is to show that adding language features like this is easy, and can be done by the programmer—not the compiler designer.
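For readers who don’t speak Smalltalk, the same trick can be sketched in Python as an ordinary function. The name coalesce is my own choice for illustration, and treating any callable default as "evaluate me lazily" is an assumption that mirrors the respondsTo: #value check above.

```python
def coalesce(value, default):
    """Return value unless it is None; otherwise fall back to default.

    A callable default is invoked only when it is actually needed, which
    gives the lazy behavior; any other default is returned as-is.
    """
    if value is not None:
        return value
    return default() if callable(default) else default


def explode():
    raise RuntimeError("the default should never have been evaluated")


print(coalesce(5, 0))               # 5
print(coalesce(None, 0))            # 0
print(coalesce("foobar", explode))  # foobar; explode is never called
```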

Message-eating nil

Of course, we can one-up that behavior. In some languages (notably, Objective-C), trying to call methods on nil does not generate an error. Instead, nil silently eats the message and returns nil. Although you may gasp and say that’s the dumbest thing you’ve ever heard of, the overwhelming majority of Cocoa programmers would agree that message-eating nil is a great feature that ends up being far more useful than harmful. Sadly, Smalltalk’s nil raises an exception, just like C#’s and Java’s, a behavior that, admittedly, does have its uses. Wouldn’t it be nice if you could choose which behavior you wanted in which context, without having to rewrite tons of code?

Turns out that you can do that via a library in Smalltalk. Nevin Pratt has already written a library called “Message-eating null”, currently maintained by Keith Hodges, that does exactly that. How can it work?

The library defines a new class called Null, and a new singleton, null, to parallel UndefinedObject and nil. Like UndefinedObject, Null instances respond true when asked isNil, making them appear identical to most code. Unlike UndefinedObject, Null simply returns itself when sent pretty much anything else.

Now, I have the best of both worlds: if I want message-eating, I return null. If I want exception-raising, I return nil. No other code has to be any the wiser. That’s something you cannot do in C#.
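The same idea translates to other dynamic languages. Here is a minimal Python sketch of a message-eating null; it only illustrates the technique and is not a port of the Smalltalk library, and the attribute names used in the example are invented.

```python
class Null:
    """A message-eating null: any attribute access or call yields the same object."""

    _instance = None

    def __new__(cls):
        # Keep a single shared instance, the way nil is a singleton.
        if cls._instance is None:
            cls._instance = super().__new__(cls)
        return cls._instance

    def __getattr__(self, name):
        # Only called for attributes that don't exist, which is nearly all of them.
        return self

    def __call__(self, *args, **kwargs):
        return self

    def __bool__(self):
        # Falsy, so "if result:" treats it much like None.
        return False

    def __repr__(self):
        return "null"


null = Null()

# A whole chain of "messages" is swallowed without raising.
print(null.customer.address.street())
```

Code that wants the exception-raising behavior keeps returning None; code that wants message-eating returns null instead.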

LINQ

Smalltalk has actually had LINQ-like expressions since its inception. Code like the following is common:

((people
    select:  [ :person | person type = #adult ])
    reject:  [ :person | person location = 'New York' ])

which is roughly equivalent to the following LINQ:

from person in people
where person.Type == PersonType.Adult && person.Location != "New York"
select person

Unlike with LINQ, if these operators did not exist in Smalltalk, it would be trivial to add them. In fact, their implementation in Smalltalk is at the library level, not the language level. Here’s the code to select: in GNU Smalltalk, if you’re curious:

select: aBlock [
    "Answer a new instance of a Collection containing all the elements
     in the receiver which, when passed to aBlock, answer true"

    <category: 'enumeration'>
    | newCollection |
    newCollection := self copyEmpty.
    self 
        do: [:element | (aBlock value: element) ifTrue: [newCollection add: element]].
    ^newCollection
]

I imagine your response at this point is, roughly, Big whoop. So you had LINQ before C# did. It has it now, and, honestly, I’m never going to want anything other than select:, reject:, collect:, inject:into:, and a couple of others. Why do I care?

Easy: because although Smalltalk was never designed to support relational databases easily by this syntax—a key feature of LINQ—it does. Via libraries.

LINQ to SQL

Because Smalltalk needs no special syntax to support comprehensions, it’s trivial to add them to other libraries, such as ORMs. One such library, ROE, which ships with GNU Smalltalk, allows you to write the above query against a database as follows:

((people
    select:  [ :person | person type = #adult ])
    reject:  [ :person | person location = 'New York' ])

Like LINQ to SQL, this code returns a promise—not an actual result. It’s not reified until you attempt to access some elements, at which point the actual SQL is generated and sent to the database for execution. We achieved these lazily returned, database-backed collections without modifying the language at all; we merely needed to write a small library.
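To make the "promise" idea concrete, here is a small Python sketch of a lazily evaluated query against an in-memory SQLite database. It is only an illustration of deferred evaluation, not how ROE works internally: the LazyQuery class and the table layout are invented, and the conditions are plain SQL strings rather than translated blocks.

```python
import sqlite3


class LazyQuery:
    """Collects conditions and touches the database only when iterated."""

    def __init__(self, conn, table, conditions=()):
        self.conn = conn
        self.table = table
        self.conditions = tuple(conditions)

    def select(self, condition):
        # Conditions are trusted SQL fragments here, purely for illustration.
        return LazyQuery(self.conn, self.table, self.conditions + ("(%s)" % condition,))

    def reject(self, condition):
        return LazyQuery(self.conn, self.table, self.conditions + ("NOT (%s)" % condition,))

    def sql(self):
        query = "SELECT * FROM %s" % self.table
        if self.conditions:
            query += " WHERE " + " AND ".join(self.conditions)
        return query

    def __iter__(self):
        # Only now does anything reach the database.
        return iter(self.conn.execute(self.sql()).fetchall())


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE people (first_name TEXT, type TEXT, location TEXT)")
conn.executemany(
    "INSERT INTO people VALUES (?, ?, ?)",
    [("Ann", "adult", "Boston"), ("Bob", "adult", "New York"), ("Cy", "child", "Boston")],
)

adults = LazyQuery(conn, "people").select("type = 'adult'").reject("location = 'New York'")
print(adults.sql())   # nothing has been executed yet
print(list(adults))   # the query runs here
```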

Of course, databases are different from in-memory collections. The query generated from the above code would return the full tuple for every row in the database—potentially very expensive if the table backed by people has many columns. ROE provides an easy solution via the project: and projectAll: methods, which take one and multiple column names, respectively, to return from the database. Thus, if we only cared about first names, we might write:

((people
    select:  [ :person | person type = #adult ])
    reject:  [ :person | person location = 'New York' ])
    project: #firstName

Of course, the default collections don’t support project:—it doesn’t make much sense there, since, as in Java, you’re really only getting a collection of pointers to the objects in the collection, which is a cheap operation—but what if we wanted to add it there for consistency? No problem:

Collection extend [
    project: aField [
        ^ self collect: [ :each | each perform: aField ].
    ]

    projectAll: fields [
        | d |
        ^ self collect: [ :each |
            d := Dictionary new.
            fields do: [ :field | d at: field put: (each perform: field) ].
            d.
        ]
    ]
]

And, suddenly, every collection that subclasses Collection—that is, all of them—can use my new project: method. I don’t need to worry about whether I’m talking to a database or a local collection; they all can do it.
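For comparison, a rough Python equivalent of those two methods might look like the following. Python has no single Collection base class to extend, so this sketch uses free functions over any iterable; the function names and the Person type are made up for the example.

```python
from collections import namedtuple


def project(items, field):
    """Return the named attribute of every item."""
    return [getattr(item, field) for item in items]


def project_all(items, fields):
    """Return one dict per item, holding only the named attributes."""
    return [{field: getattr(item, field) for field in fields} for item in items]


Person = namedtuple("Person", ["first_name", "type", "location"])
people = [Person("Ann", "adult", "Boston"), Person("Cy", "child", "Boston")]

print(project(people, "first_name"))
print(project_all(people, ["first_name", "location"]))
```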

Your language features are my libraries

I guess this is why, despite all of its faults, I have a lot of trouble giving up on Smalltalk. C#’s a good language, and .NET’s a good framework; but I cannot help but feel that this isn’t an issue of reinventing the wheel, as much as forgetting that we can provide programmers the tools to make their own types of locomotion.

When you hear Smalltalkers complaining that it’s all been done before, this is what they mean. Not that any particular language feature already exists, but rather, that they could have written it without switching or upgrading their language.

When to Use Seaside Components

Although, for the past couple of years, I’ve largely stopped doing development in Squeak, I’ve still kept an eye on the Smalltalk community, including looking for Smalltalk questions on StackOverflow. It’s a good way to learn, keep up-to-date, and occasionally earn some easy reputation.

Today, I saw that Julian Fitzell, one of the original authors of the Seaside web framework, had posted an outstanding answer explaining when and how to use components in Seaside. If you’re just learning Seaside, his post will serve as an informative guide for how Seaside really works in its guts. I’m especially intrigued by the new “lightweight” component functionality coming in Seaside 2.9. If I ever go back to hosting my blog on Seaside or Pier, that will make it easy to have a clean design that remains highly efficient.

FogBugz 7 Goes Beta

I’m thrilled to say that, after two years of heavy development, my colleagues have released FogBugz 7 Beta. If you’re interested in joining, head on over to the signup page to request to have your license or FogBugz On Demand account upgraded. The existing beta already adds a heavily revamped interface and tons of new features, but I have a hunch that there may be even bigger surprises in store. Keep an eye on the FogBugz team in the weeks to come.

Rebasing in Mercurial

So, you’ve adopted git. Or maybe Mercurial. And you’ve started to use it. The problem is, every other commit reads some variant of, “Merged with stable,” “merged with base,” or some such. These commits convey no useful information, yet completely clutter your commit history. How do you make your history more useful?

The answer lies in rebasing. Rebasing is a technique made popular by git where you rewrite your not-yet-pushed patches so that they apply against the current remote tip, rather than against the tip of the repository you happened to last pull. The benefit is that your merge history shows useful merges—merges between major branches—rather than simply every single merge you did with the upstream repository.

Talking to a colleague of mine today about the upcoming features in Mercurial 1.1, I was surprised to hear him say, “Oh, great, so Mercurial will finally have rebasing!” Well, while the next version of Mercurial does add a pull --rebase command, all it does is automate some mq bookkeeping behind the scenes. If you want, you can do that mq bookkeeping explicitly right now.

Note: the following will work in modern versions of Mercurial, but using hg rebase and its sister command hg pull --rebase is a much, much easier way to do what the following describes, and has been available in Mercurial for over a year at this point. Before learning MQ, see whether your version of Mercurial has a rebase command by running hg help rebase. If it does, you’re probably better off using that.

Let’s take a look at my Copilot development repository, for example:

gozer:Host benjamin$ hg out
comparing with ssh://fcs//hg/copilot-helpers-devel
searching for changes
changeset:   1937:fe42e5e8e0cf
user:        Benjamin Pollack <bpollack@fogcreek.com>
date:        Mon Nov 24 14:49:30 2008 -0500
summary:     Make CPUpgrader a proper framework so that OneClick can include it

changeset:   1938:20033e199dc7
user:        Benjamin Pollack <bpollack@fogcreek.com>
date:        Mon Nov 24 20:08:06 2008 -0500
summary:     Undo accidental rename of Host to OneClick

changeset:   1939:127619345a57
tag:         tip
user:        Benjamin Pollack <bpollack@fogcreek.com>
date:        Mon Nov 24 20:12:39 2008 -0500
summary:     Add OneClick

gozer:Host benjamin$ hg in stable
comparing with ssh://fcs//hg/copilot-helpers-stable
searching for changes
changeset:   1940:6db7e59a82c5
tag:         tip
parent:      1936:3140f971d7ab
user:        Benjamin Pollack <bpollack@fogcreek.com>
date:        Mon Nov 24 12:38:14 2008 -0500
summary:     Added tag 000253 for changeset 3140f971d7ab

gozer:Host benjamin$

So, I’ve got three outgoing changes, and one incoming change—and the incoming change is just a tag addition. Having an explicit merge is overkill and simply clutters my history; there’s no benefit to having it listed. It’s the perfect candidate for a rebase commit.

Our battle plan, then, is as follows:

  1. Import all outgoing patches to mq patches
  2. Pop them off the patch stack
  3. hg pull from the stable repository
  4. Reapply my outgoing patches
  5. Convert my patches back into Mercurial changesets

Let’s give it a shot. My base commit is changeset fe42e5e8e0cf, so I need to import everything between that and tip into mq and then pop it off:

gozer:Host benjamin$ hg qimport -r fe42:tip
gozer:Host benjamin$ hg qpop -a
Patch queue now empty
gozer:Host benjamin$

If everything was successful, I should have no outbound patches…

gozer:Host benjamin$ hg out
comparing with ssh://fcs//hg/copilot-helpers-devel
searching for changes
no changes found
gozer:Host benjamin$

…and we indeed see that’s the case. That means I can now simply hg pull -u from the remote repository…

gozer:Host benjamin$ hg pull -u stable
pulling from ssh://fcs//hg/copilot-helpers-stable
searching for changes
adding changesets
adding manifests
adding file changes
added 1 changesets with 1 changes to 1 files
1 files updated, 0 files merged, 0 files removed, 0 files unresolved
gozer:Host benjamin$

…and then reapply my patches, and convert them back to changesets:

gozer:Host benjamin$ hg qpush -a
applying 1937.diff
applying 1938.diff
applying 1939.diff
gozer:Host benjamin$ hg qdelete -r qbase:qtip
gozer:Host benjamin$

And finally, verify that it worked as expected:

gozer:Host benjamin$ hg glog -l 4
@  changeset:   1940:0b08e0a83965
|  tag:         tip
|  user:        Benjamin Pollack <bpollack@fogcreek.com>
|  date:        Mon Nov 24 20:12:39 2008 -0500
|  summary:     Add OneClick
|
o  changeset:   1939:a730a92acb01
|  user:        Benjamin Pollack <bpollack@fogcreek.com>
|  date:        Mon Nov 24 20:08:06 2008 -0500
|  summary:     Undo accidental rename of Host to OneClick
|
o  changeset:   1938:01954591e76e
|  user:        Benjamin Pollack <bpollack@fogcreek.com>
|  date:        Mon Nov 24 14:49:30 2008 -0500
|  summary:     Make CPUpgrader a proper framework so that OneClick can include it
|
o  changeset:   1937:6db7e59a82c5
|  user:        Benjamin Pollack <bpollack@fogcreek.com>
|  date:        Mon Nov 24 12:38:14 2008 -0500
|  summary:     Added tag 000253 for changeset 3140f971d7ab
|
gozer:Host benjamin$

Perfect. I have a linear history, and have rebased my changes on top of the remote tip.

It’s definitely more involved than hg pull --rebase, but if you don’t want to wait for Mercurial 1.1, you’ve got the same functionality available right now.
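If you find yourself doing this often, the five steps can be wrapped in a small script. The Python sketch below simply replays the commands shown above; the function names are my own, the revision and remote in the example are the ones from this walkthrough, and it defaults to a dry run that only prints what it would do.

```python
import subprocess


def mq_rebase_commands(base_rev, remote):
    """Build the hg command lines for the five steps, in order."""
    return [
        ["hg", "qimport", "-r", "%s:tip" % base_rev],  # 1. import outgoing changesets
        ["hg", "qpop", "-a"],                          # 2. pop them off the stack
        ["hg", "pull", "-u", remote],                  # 3. pull and update
        ["hg", "qpush", "-a"],                         # 4. reapply the patches
        ["hg", "qdelete", "-r", "qbase:qtip"],         # 5. back to plain changesets
    ]


def run_all(commands, dry_run=True):
    for command in commands:
        print(" ".join(command))
        if not dry_run:
            # check_call raises CalledProcessError on a non-zero exit, so a
            # failed step stops the sequence instead of plowing ahead.
            subprocess.check_call(command)


run_all(mq_rebase_commands("fe42", "stable"))
```

Stopping at the first failure matters here: if qpush hits a conflict, you want to fix it by hand before anything is converted back into changesets.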

Why QA is Important

QA’s one of the most important parts of any software development project, so I’m profoundly saddened by how often it’s overlooked. Thankfully, Copilot has a solid QA process, so when we release software, I can generally be confident that it will work as advertised.

Tricks with mq

Last night, I realized that I had accidentally made some commits to one of my Mercurial repositories without an email address set, meaning that they all showed up as benjamin@localhost. Since the repository’s private, and I wasn’t worried about destructively rewriting history, I wondered, can I fix this using mq?

The answer was a resounding, “yes.”

First, let’s bring the entire history into mq, and save the result so that we have something to go back to in case we screw up:

~/code/bqb$ hg qimport -r 0:tip
~/code/bqb$ hg qci -m "Before address change"

Next, because we’re about to rewrite the patches themselves, we’ll pop off all of the patches:

~/code/bqb$ hg qpop -a

at which point we can safely go and make our changes with some simple sed magic.

~/code/bqb$ cd .hg/patches
~/code/bqb/.hg/patches$ sed -i -e "s/@localhost/@bitquabit.com/" *.diff
~/code/bqb/.hg/patches$ hg diff

Everything looks okay, so it’s time to reapply the patches:

~/code/bqb/.hg/patches$ cd ../..
~/code/bqb$ hg qpush -a

and convert them back into changesets.

~/code/bqb$ hg qdelete -r qbase:qtip

Did it work?

~/code/bqb$ hg log | grep '@localhost'
~/code/bqb$

Yep. We’re all set.
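If sed isn’t handy, the rewrite step can be sketched in Python as well. The function names below are invented, and the sketch assumes the patch files decode as text. One difference to be aware of: the sed command above replaces only the first match on each line, while this replaces every match.

```python
from pathlib import Path


def rewrite_domain(patch_text, old_domain, new_domain):
    """Replace every @old_domain in the patch text with @new_domain."""
    return patch_text.replace("@" + old_domain, "@" + new_domain)


def rewrite_patches(patch_dir, old_domain, new_domain):
    """Apply rewrite_domain to each *.diff file in patch_dir, in place."""
    for patch in sorted(Path(patch_dir).glob("*.diff")):
        original = patch.read_text()
        updated = rewrite_domain(original, old_domain, new_domain)
        if updated != original:
            patch.write_text(updated)


header = "# User Benjamin Pollack <benjamin@localhost>\n"
print(rewrite_domain(header, "localhost", "bitquabit.com"), end="")
```

Like the sed version, this touches every occurrence in the file, including any that happen to appear in the body of a diff, so it is worth reviewing the result before pushing the patches back on.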

A couple of things to keep in mind if you do this:

  1. Just to state the obvious, you’re rewriting history. You will no longer be able to interact with repositories made before the rewrite. This means you should never try the above stunt on a repository you’ve shared with others. (Well, mostly. If you only run the initial qimport on changes you’ve not yet pushed out—rather than on the entire repository, as I did—you’re fine. Just be careful.)
  2. Make very sure you’ve set Mercurial to use git-style patches, or you will unwittingly annihilate blank and binary files when you convert your changesets to patches. If you’ve not yet done so, edit your ~/.hgrc so that -g is passed by default to qimport and qrefresh.
  3. Remember that whenever working with mq, you have the potential to completely lose your history. Be careful. Use hg qci to back up known-good patch states early and often.

Why I Love Meetings

Learn why I love meetings over at the Copilot blog.

Guide to Deploying Seaside on Linux

While I’ve gotten pretty good at deploying Squeak and Seaside behind Apache, I remember that the first time I tried it, I got horribly confused and frustrated by the lack of any simple, easy-to-follow guide. Well, if your virtual host runs a Debian-based distro, now you’ve got one: Peter Osburg has provided a step-by-step guide to deploying Seaside apps behind Apache using reverse proxies. The directions should be mostly applicable to other Linux distributions as well.

Free Books on Programming Linux

When I was a junior at Duke, we had to make some changes to the Linux kernel for my OS class. Unfortunately, I was the only one who’d had much experience in Linux before, and our professor seemed to think that providing any instruction about how Linux’s kernel worked—or even how Unix programs in general worked—would be overkill. Even great coders will fail if not given any guidance on how to solve a problem.

At the time, our options were to spend serious cash on books, or to struggle along through mailing lists until we could find the answers we needed. We went with the second option, but if you find yourself in a similar boat, there are now decent low-level books on Linux programming available online for free. I only had a chance to look at the chapter on interprocess communication, but assuming the rest of the book is of the same quality, it should be a solid reference for someone starting Linux programming for the first time.

Announcing Fog Creek Copilot OneClick

I’m proud to announce that we’ve just shipped Fog Creek Copilot OneClick. You can read about the details at the new Fog Creek Copilot blog, Air Traffic

Debugging objc_msgSend

Hamster Emporium has a great article on how to debug crashes within objc_msgSend. When I was first learning Cocoa to write Fog Creek Copilot for Macintosh, I remember my first crash within objc_msgSend leaving me frustrated, as I had absolutely no idea how to proceed. I wish this article had been available then.

Congratulations, Microsoft!

You have successfully discovered Morphic.

Why, this doesn't resemble the patent at all!

Can you discover Smalltalk too, as long as you’re at it?

Objective-J and Cappuccino Released As Open-Source

When 280slides was released several months ago, it was notable in several ways. It looked like a native OS X app despite running in the browser, yet remained relatively responsive, worked quite well across browsers, and gracefully fell back to Flash when running on browsers from the Pacific northwest. The designers said that their secret sauce was Objective-J—an Objective-C–like language that compiled to JavaScript—and Cappuccino, a Cocoa-like framework that let them treat a web browser as just another desktop platform. Unfortunately, since both technologies were developed in-house, the rest of us had to just look on and keep playing with our old toys.

Well, wait no longer. Cappuccino and Objective-J are now available under the LGPL, and you can download them today. If you’re planning on developing an application-like website soon, Cappuccino is probably worth a look. I’m not going to have a chance to play with it for a few more days because my hair is on fire, but those of you with a bit more free time should definitely try it out.

User Agent Abuse

I was just playing with Google Chrome on a devel version of Fog Creek Copilot’s website when I happened to glance down at the watch window and saw this atrocity as the user agent string:

Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US) AppleWebKit/525.13 (KHTML, like Gecko) Chrome/0.2.149.27 Safari/525.13

Guys, that’s not even funny. It’s just regex bingo.
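
To see why, here is a rough sketch of what a naive browser sniffer would make of that string. The token list is my own guess at what such a sniffer might look for, not anything taken from real detection code.

```shell
#!/bin/sh
# The user agent string quoted above, verbatim.
UA='Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US) AppleWebKit/525.13 (KHTML, like Gecko) Chrome/0.2.149.27 Safari/525.13'

# Hypothetical list of browser/engine names a substring sniffer might test for.
MATCHES=""
for token in Mozilla AppleWebKit KHTML Gecko Chrome Safari; do
    case "${UA}" in
        *"${token}"*) MATCHES="${MATCHES}${token} " ;;
    esac
done

# Print every name the string claims to be.
echo "${MATCHES}"
```

Every name in the list matches, which is rather the point.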

An Overview of Python's Underscore Methods

I’ve never been an especially big fan of Python—faster but less powerful than Ruby, slower and less powerful than Smalltalk and Common Lisp, and not as usefully grungy as Perl—but I’ve become a rather strong pragmatist vis-à-vis programming languages, and realize that Python probably isn’t going anywhere anytime soon, so I’ve been trying to improve my proficiency through activities such as Django Dash (which Tyler and I won) and small coding projects, such as the one-off tasks I have to do at work. It’s a technique that’s served me very well in the past to learn new languages and frameworks.

One of the reasons I’m not a particularly big fan of Python is the pervasive use of underscore methods—methods named __like_this__—to accomplish things that Python’s bondage-friendly design wouldn’t otherwise let you do (e.g., operator overloading, destructors, and functors). To that end, I was happy to discover a concise list of Python’s underscore methods and what they do. If you want a quick overview of what can be accomplished through underscore magic, that page should serve you very well.

The End of MySQL (Updated)

Sun has just announced that they will begin close-sourcing MySQL. For years, I’ve avoided MySQL due to a mixture of paranoia (I’ve had extremely bad experiences with MyISAM-backed data stores) and disdain for their shoddy standards compliance (which has bitten me before in nontrivial ways). Now I can also avoid them for not being open-source.

My standardization on PostgreSQL for this website feels more rational by the minute.

Update: The originally linked article wasn’t quite correct. MySQL AB’s CEO explains that they will not be making the MySQL core closed-source; merely new, enterprise-specific features. You can read his whole statement for more information. This position is more reasonable, and is similar to the relationship EnterpriseDB has with PostgreSQL.

The Return of Ada

I’m somehow having a really hard time feeling anything but dread about the prospect of a return of Ada. It seems to bear more than a passing resemblance to the last few seconds of Carrie.

Google App Engine now on Amazon EC2

One of my main complaints about Google App Engine is that it locks you into using Google’s servers and APIs, giving you little recourse if Google decides to terminate either the service or your contract. Well, good news: that’s changed—somewhat. Chris Anderson of Grabb.it fame has gotten a proof-of-concept App Engine clone running on Amazon’s EC2 service. Because Anderson has done little more than repackage the App Engine SDK for deployment on EC2, it cannot scale the same way that Google-based hosting (or properly written EC2-hosted apps) can, but it’s a good first step towards decreasing App Engine’s lock-in. If they’re able to layer BigTable on top of SimpleDB or an equivalent system, I could well see App Engine becoming the de facto high-traffic web application architecture.

A Poor Man's Time Machine

One of the cool new features of Mac OS X Leopard is Time Machine, a really simple backup solution for Mac OS X that not only transparently backs up your data, but also does so with an amazingly ugly GUI that lets you quickly jump back to the way that your documents were at any given point in the past. Unfortunately, Time Machine doesn’t run on my Linux boxes, so I’m forced to come up with an alternative.

The good news is that getting a 90% solution is ridiculously easy. On the back-end, all that Time Machine does is create a collection of hard links from one backup set to another. Here’s the bare minimum of a shell script that will back up the last three editions of your home folder to an external drive:

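#!/bin/sh
# Rotate hard-linked snapshots of SOURCE; DEST.0 is always the newest.
SOURCE=/home/benjamin
DEST=/mnt/tardis
# Drop the oldest snapshot, then shift the others up by one.
rm -fvr "${DEST}.3"
mv "${DEST}.2" "${DEST}.3"
mv "${DEST}.1" "${DEST}.2"
mv "${DEST}.0" "${DEST}.1"
# Files unchanged since the previous snapshot become hard links into it.
rsync -av --exclude-from="${SOURCE}/backup-excludes" --link-dest="${DEST}.1" "${SOURCE}/" "${DEST}.0/"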

You’ll obviously need to modify the SOURCE and DEST variables to be something appropriate for your computer. You’ll also need to create a file in your home directory called backup-excludes that’ll look something like this:

Documents/Code/3rd-party
Documents/Code/Builds
.emacs-backups

Add and modify the glob patterns so that the file lists everything that should be skipped. Now, just make sure your external drive is plugged in, run the script, and presto! Instant backup. Quick and dirty, but gets the job done.
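
If the hard-link trick sounds too good to be true, it can be seen in isolation. The sketch below fakes two snapshots in a throwaway temporary directory (all the names are made up) and uses plain ln to stand in for what rsync --link-dest does with an unchanged file.

```shell
#!/bin/sh
# Throwaway sandbox; nothing here touches a real backup drive.
WORK=$(mktemp -d)
mkdir "${WORK}/snap.0" "${WORK}/snap.1"
echo "unchanged document" > "${WORK}/snap.1/notes.txt"

# For a file that has not changed, the new snapshot gets a second
# name for the same data rather than a second copy.
ln "${WORK}/snap.1/notes.txt" "${WORK}/snap.0/notes.txt"

# Deleting the older snapshot removes one name, not the data.
rm -r "${WORK}/snap.1"
RESTORED=$(cat "${WORK}/snap.0/notes.txt")
echo "${RESTORED}"
```

That is why the script can rm the oldest snapshot without hurting the newer ones: the data sticks around until its last name is gone.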

Write Your Own Google Apps

Google has announced Google App Engine, which allows you to write applications that run on Google’s infrastructure. I personally would be a bit nervous using the service for anything important—unlike Amazon S3 and EC2, making your own service-compatible alternatives is not feasible, making you inextricably tied to Google—but I suppose it could be useful for certain applications that need high scalability, integration with other Google applications, or both.

Note that all 10,000 initial invitations have already been distributed, but you can sign up to be notified when more become available.

Mercurial v. git

RockStarProgrammer has a great article on the differences between Mercurial and git at a technical level. This is exactly the article I wanted back when I was trying to pick between the various distributed version control systems.

JavaScript for Emacs

Steve Yegge has been working for quite some time on a way to integrate JavaScript with Emacs, allowing you to code extensions in JS rather than elisp. To that end, he recently announced the availability of js2-mode, a new mode for Emacs that brings heavily revamped syntax highlighting, indentation, and on-the-fly parsing for JavaScript code.

I’ve recently been having an internal fight with myself on whether to use vim or Emacs. Although I’ve used Emacs for an incredibly long time, have a massive collection of customization scripts, and know the editor inside and out, I’m being forced to come to the conclusion that vi-based editors may have the edge for actually editing text. They’re simply kinder to my hands. Yet every time that I decide to give up Emacs for good, someone writes a tool like js2-mode that simply cannot be implemented in vim.

Maybe someday vim (or a clone) will become as trivial to modify as Emacs. Until then, I suspect I’m going to have a very hard time letting bad habits die.

Pilots, Programmers, and Perl

I don’t normally post chat conversations, but some are simply too good to ignore:

Me: So I’m getting back into Perl.

Devin: heh

Devin: does your boss know?

Me: ?

Me: I don’t get why I’d need to notify Joel

Devin: well, you know

Devin: it’s like when a pilot starts drinking again

Devin: the airline should be notified

I think Devin may have a point.

Hating C++

I’m no fan of C++, but I look like a C++ evangelist compared to this poor chap, who assails C++ with a vengeance and eloquence that I have rarely seen. A choice excerpt:

C++ is philosophically and cognitively unsound as it forces a violation of all known epistemological processes on the programmer. as a language, it requires you to specify in great detail what you do not know in order to obtain the experience necessary to learn it. C++ has taken premature optimization to the level of divine edict since it cannot be vague in the way the state of the system necessarily is. (I’m not talking about totally vague, but about the kind of vague details that programmers are supposed to figure out even after a good design has been drawn up.) in other words, a C++ programmer is required by language design to express certainty where there cannot be any. a C++ programmer who cares about correctness is a contradiction in terms: correctness is a function of acquired certainty, not random guesswork. I’m discounting the ability to experiment with something through educated guesses, because the number of issues that must be experimented with in this fashion is gargantuan in any non-trivial piece of code. C++ is a language strongly optimized for liars and people who go by guesswork and ignorance. I cannot live with this. I especially cannot live with it when I cannot even ask for help in maintaining the uncertainty I have. e.g., if I do not have enough foreknowledge to know exactly which type I need for a variable (and there is no type hierarchy, so I cannot be vague), I must guess, which is bad enough, but I have to repeat the guess all over! I cannot say “this thing here has the same type as that thing over there, and I’d rather you go look there because I just want to put down my guesses once” — there is no typeof operator and it certainly cannot be used in those anal-retentive declarations to afford a limited form of type propagation.

If you’re still a fan of C++, and have yet to adopt a decent language, maybe such a well-written, passionate rant will change your mind.

Grabbing Selected Songs from an iPod

Today, I was over at a friend’s house and got sidetracked talking about music we liked. I mentioned that I’d recently discovered Jonathan Coulton and really liked his music and played her a few songs of his. She liked them and asked whether she could have a copy. Since his songs are all licensed under the Creative Commons, that was no problem.

Unfortunately, the only copy of the songs that I had was on my iPod. As everyone knows by now, Apple makes it very difficult to copy songs off an iPod due to piracy concerns. On a Mac, at least, it’s easy to see the song files—they’re just under /Volumes/[iPod name]—but because their names are completely randomized, it can be hard to find any single song. Since ID3 tags are preserved, you can traverse the entire file hierarchy, scan the tags, and copy those files that match, but that solution doesn’t work if not all of your files are MP3s, or if you have a prohibitively large number of files. In this case, I needed to extract about 40 songs out of several thousand, so I wanted something that would work faster.

The good news is that this problem is blissfully easy to solve with a little shell script and the magical tool lsof. Simply get iTunes displaying all the songs you want to copy, either with a search or a playlist, hit “Select All,” and then execute the following script:

#!/bin/bash

COUNT=`osascript <<EOF
set num to 0
tell application "iTunes"
    repeat with song in selection
        set num to num + 1
    end repeat
end tell
return num
EOF`

mkdir -p ~/copies
for ((n=0;n<$COUNT;n+=1)); do
    lsof | grep iTunes | grep mp3 | awk '{print $9}' \
         | xargs -J % cp % ~/copies
    osascript -e 'tell application "iTunes" to next track'
done

Most of the script is self-explanatory, but some notes on the more interesting parts:

  1. AppleScript supports a count element on most collections, but for whatever reason, count of selection is always 0. I still have a hunch this is a PEBKAC issue on my end, but I don’t mind kluging around it.
  2. Bash actually supports C-style looping, which for some reason seems to catch a lot of people by surprise.
  3. lsof is an incredibly useful command. It stands for “list open files,” and lists all files (and sockets) that are open on your machine, along with which program opened them. We filter down to MP3 files opened by iTunes.
  4. awk has largely been replaced by more powerful scripting languages, yet it still has its uses—here, serving as a quicker and more powerful cut to extract the ninth column of whitespace-delimited text, which happens to be the full path to the open file.
  5. BSD’s xargs command allows you to specify an explicit marker for inserting text with the -J option, which we use here so that we can manipulate what’s being copied while holding the destination constant.
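
To make the lsof and awk steps concrete, here is the same filter run against a single made-up line in lsof’s default column layout. The PID, sizes, and path are invented for illustration; only the pipeline itself comes from the script.

```shell
#!/bin/sh
# A fabricated line in lsof's default layout:
# COMMAND PID USER FD TYPE DEVICE SIZE/OFF NODE NAME
SAMPLE='iTunes 412 benjamin 23r REG 14,5 4194304 8812 /Volumes/iPod/iPod_Control/Music/F07/QXZB.mp3'

# Same filters as the script: keep iTunes lines, keep mp3 lines,
# then print the ninth whitespace-delimited field (the file path).
SONG_PATH=$(echo "${SAMPLE}" | grep iTunes | grep mp3 | awk '{print $9}')
echo "${SONG_PATH}"
```

One caveat of splitting on whitespace: if the iPod’s name contains a space, the ninth field is only the first piece of the path, so the script as written assumes a space-free volume name.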

So, basically, we just make iTunes play each song long enough to find where on the iPod it’s located, copy it to a safe location, and then tell iTunes to advance to the next track. It’s not especially fast, but it’s far faster than walking the entire iPod looking for the relevant songs. On my system, the script took about twenty seconds to run for forty songs, which is much better than the alternative.

There are relatively few times I honestly find myself wanting to copy songs from my iPod, but the next time you find yourself in a similar situation, you can use the script above to get your copies done quickly.

Boldly Going Where We've Gone Before

I empathize with Smalltalkers and Lispers who are in a perpetual state of been-there-done-that. Tons of “new” technologies (on-the-fly code reloading, edit-and-continue debugging, refactoring, and anonymous functions, among others) have been available in Smalltalk-80 since its inception (and frequently in Common Lisp’s predecessors and peers since before that).

That said, when I read C# developers lamenting that .NET 3.5 is only a bad imitation of Smalltalk-80, I have a slightly different reaction than they do. Yes, I wish I could program in Smalltalk (or its successors, such as Self) more often, and yes, I wish Smalltalk had wider adoption, but, at this point in my life, I also recognize that neither Smalltalk nor Common Lisp will ever be a mainstream language. Although I may scream “But we already had this!” from time to time, in my gut, I know that our last best hope is to see existing mainstream languages steal as much as possible from Smalltalk and Common Lisp. To that end, I’m actually excited about C# 3, which I feel adopts more Lisp- and Smalltalk-like features than any other mainstream language I’ve yet seen. Is it perfect? Is it really on par with Smalltalk or Common Lisp? No. But in the end, I’d much rather have 90% of Smalltalk available everywhere than 100% of Smalltalk available only in a few nooks. This is a step in the right direction.

Sorcery with iptables

I normally don’t link articles I see on reddit on the theory that you’ll already have seen them, but this one was too good to pass up. New Artisans has a superb article on iptables tricks to defend against common attacks, and even provides some hands-on examples of what they’re defending against. If you administer your own server, I strongly recommend taking a look.

(Note: try the “attacks” they show only within your own LAN. A competent ISP may notice what you’re doing and shut you down if you try some of the simpler things they talk about. Besides, it’s just plain not nice.)

rm -rf /var/www/* ... wait, which server am I logged into?

Unix needs an undo command.

This morning, my roommate and I hauled out some of our “big iron” (a languishing Pentium 4 box) to use as a photo server. Because we had initially planned to use that box to host bitquabit.com and its sister sites—a plan since scrapped—it had a full clone of all the data on my Linode hub. Before my roommate got going, then, I thought I’d quickly clean the box and return it to a neutral state. First stop, hose the duplicates of the websites I host. Fire up SSH, sidle into /var/www, double-check with pwd I’m where I think I am, fire off an rm -rf *, and check that the directory’s clean. It was, so I decided to write a message to my roommate. Since I couldn’t remember what his Unix login was, and I knew he was logged on, I ran the w command.

benjamin pts/0    fcfwbeac.fogcree Fri18    2days  1.20s  0.08s citadel
benjamin pts/1    192-168-40-51.c3 21:26    1.00s  0.06s  0.01s w

I don’t remember even installing Citadel on this machine, I thought. And why am I the only one on the server? As very, very dark thoughts started to wander out of my amygdala and set up shop in my frontal lobes, I decided to check that w wasn’t malfunctioning.

benjamin@bitquabit ~> w
9:43:23 up 18 days,  4:29,  2 users,  load average: 0.00, 0.04, 0.01
USER     TTY      FROM              LOGIN@   IDLE   JCPU   PCPU WHAT
benjamin pts/0    fcfwbeac.fogcree Fri18    2days  1.20s  0.08s citadel
benjamin pts/1    192-168-40-51.c3 21:26    1.00s  0.21s  0.00s w
benjamin@bitquabit ~>

I wonder why my roommate’s not showing up, I wondered to myself. Maybe if I run it one more time…

And then it hit me. benjamin@bitquabit. benjamin@bitquabit. Oh frak, I hosed my production server!

But no, wait. I’d set up the spare box to be a bitquabit clone; it makes sense it’d think it was bitquabit. So I dropped out of ssh and read back .ssh/config:

benjamin@bitquabit ~> exit

Good bye
Connection to bitquabit.com closed.
Mungus:~ benjamin$ cat .ssh/config
Host vera
HostName 192.168.1.10

Host bqb
HostName bitquabit.com
Mungus:~ benjamin$

…frak. Sure enough, I hosed all the websites on my production server. Oh, what jolly day.

There were two good things: first, although backup on my server wasn’t automated, I had written a backup script, and it had everything except four images from a very recent article and my hit tracking system, Mint. Second, I got motivated to actually automate my backup system.

Now, every day, at three AM, the server makes a tarball of all relevant data and throws it into a special backup directory. A launchd-powered script on my Mac grabs the tarball daily and puts it in a place on my machine where Mozy can find it. The end result is a system I’m pretty happy with. I shouldn’t lose data that way again.
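
For the curious, the server half of that setup can be sketched in a few lines. This is not my actual script; the function name, directory layout, and archive naming below are all placeholders.

```shell
#!/bin/sh
# make_backup DATA_DIR BACKUP_DIR
# Tars up DATA_DIR into a dated archive under BACKUP_DIR and prints
# the archive's path. Both arguments are hypothetical example paths.
make_backup() {
    data_dir=$1
    backup_dir=$2
    archive="${backup_dir}/$(basename "${data_dir}")-$(date +%Y-%m-%d).tar.gz"
    mkdir -p "${backup_dir}" || return 1
    # -C keeps the paths inside the archive relative to the parent directory.
    tar -czf "${archive}" -C "$(dirname "${data_dir}")" "$(basename "${data_dir}")" || return 1
    echo "${archive}"
}

# Example call (illustrative paths):
#   make_backup /var/www /var/backups/sites
# Example crontab entry to run a wrapper script at three AM every day:
#   0 3 * * * /usr/local/bin/nightly-backup.sh
```

The client side then only has to fetch the newest archive from the backup directory.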

All that said, it seems to me like this situation shouldn’t even be possible nowadays. I understand that I should’ve been more careful; I won’t argue that. But…shouldn’t Unix have an undo command? I know for a fact that I’m hardly the only person to have hosed data by not paying enough attention while doing sysadmin tasks. Indeed, it’s regarded as a rite of passage for system administrators, and focusing on the fact that this is the first time I’ve ever done something nearly so stupid on a production system makes me happy. But, still: on a Mac, or even on Windows, I have undo all over the place. I can’t think of any command on Unix that has undo. Isn’t it about time that started changing?

A REPL for...C?

I’ve talked before about the value of a good REPL (scroll down to “The REPL in .NET”). Unfortunately, the programming language I write the most code in, C, lacks one.

Or at least, it used to.

The aptly named C REPL provides a REPL for C. Their trick: compile a DLL for each line of code, then load it into a new process. Presto: instant, portable interactive C.

Over-Securing WordPress

I’m generally quite paranoid when it comes to server security—doubly so because I’m no guru at it—so I tend to take a shotgun approach. The virtual server running bit qua bit has a restrictive firewall setup, has root disabled, only allows secure IMAP/POP/SMTP, disallows password login through SSH, and mails me daily security audits, among other things. I also monitor Debian’s security-announce list like a hawk. (If you’re the sysadmin for a Debian server and you’re not on that list, sign up. There’s no excuse for not keeping your box secure when you can get told about all the known vulnerabilities.)

One of the things that’s crossed my mind in the past month as I’ve been whipping bit qua bit into shape is that the login pages for WordPress and a handful of other utilities I keep on the server are served over HTTP, when they ought to be served over HTTPS. A couple weeks ago, I made available a secure version of the login page for those who wanted it. (Because I don’t want to bother paying a CA, I left the default insecure, since you’ll get a Scary Dialog Box telling you to panic and fear because the world is ending and you are going to die and incidentally the cert’s self-signed, but nobody reads that far and fewer people understand what it means, so they just leave the site instead of posting.) A week ago, I moved all the back-end pieces so that they were available HTTPS only, and a few days ago, I decided to move the WordPress admin interface to HTTPS as well.

I naïvely thought the process would be simple. I fired up Emacs, opened the configuration file for bitquabit.com’s insecure site, and added the following line:

Redirect /wp/wp-admin/ https://bitquabit.com/wp/wp-admin/

I restarted Apache, checked things out, and bingo! Everything seemed to be working spiffy.

Except…I gradually began to notice that something wasn’t quite right. WordPress no longer automatically saved my drafts, and I couldn’t modify the websites appearing in my blogroll. When I found I also couldn’t upload pictures for an upcoming article, I had to dive underneath the covers and figure out what was going on.

It turns out that a lot of URLs in WordPress—among them, all the AJAX ones—are hard-coded to be HTTP, not HTTPS. When they were trying to execute, they’d get back a 302 response code (redirect), and then fail. Not good.

There are two ways to secure WordPress: the first is to simply modify the blog’s URL to be https://—not really what I was aiming for. The second is to use the Admin-SSL plugin so that only the login and admin pages are secured. That seems to work perfectly, but it forces all users to log in through the secure interface, which means that everyone gets the Scary Dialog Box when they try to post. I’m not sure what solution I’m going to end up adopting; part of me still just wants to rewrite the whole blog myself, in which case I can easily customize it to do what I want, but it’d probably be easier to learn just enough PHP to fork the Admin-SSL plugin. For the moment, I’ll just keep using an SSH tunnel to the box when I want to post, which is probably more secure anyhow.

Debugging IE Layouts

As approximately 30% of my readers undoubtedly noticed, Internet Explorer had serious issues displaying the redesigned bit qua bit properly. Unfortunately, since IE lacks developer tools like the excellent Firebug and Web Developer toolbar, trying to figure out exactly what IE was choking on threatened to become a game of guess-and-test—the main reason that I’ve been so slow to get it fixed.

Today, by happenstance, I came across a bookmarklet called Xray. You simply throw Xray into your toolbar, click it on any web page, and an easy-to-use CSS inspector pops up. Click on any element of the page and its CSS properties will be displayed; navigating through the DOM involves simply using your cursor keys. I fired up Xray in IE, and in under a minute, found the causes of the layout glitches and added two lines of CSS to fix them. The result: bit qua bit now renders properly in all major browsers. If you have to debug CSS problems in IE, I’d definitely encourage you to give Xray a shot.

VisualStudio Improvements

I was happy to read today that the VisualStudio team is working on significant performance improvements for VS.NET 2005. One of the things that frustrates me most about working on Fog Creek Copilot is simply that VS.NET can sometimes be so slow that I actually lose my train of thought. Hitting a breakpoint in some of our programs can lock VisualStudio for five to ten seconds. Building the small copilot.com website can take a full minute if the master Aardvark.dll assembly has been modified. It’s like trying to maintain flow when your work environment is a series of dams. This patch can’t come out a minute too soon.

New Open-Source Squeak Book

I was pleasantly surprised today to discover Squeak by Example, an open-source book on writing programs with Squeak Smalltalk. If you want the bleeding-edge version of the book right now, you’ll need Subversion and an up-to-date LaTeX installation, but a four-month-old PDF version is also available if you don’t want to muck with all that. Combined with Stéphane Ducasse’s compilation of free Smalltalk books, I don’t think any Smalltalk neophyte should be wanting for learning material.

Copilot for College

My day job is working on Fog Creek Copilot, a powerful, cross-platform remote assistance solution. This week, Tyler and I were talking about how it’s too bad that Copilot didn’t really exist when we were in college, because we always ended up doing tech support for our families over the phone, which always went something like:

Me: What do you see now?

Family Member: A dialog box.

Me: What’s it say?

Family Member: It’s got a stop sign with an exclamation mark and says that the server can’t be found.

Me: Okay, click “Okay,” then read me back the line that says “SMTP Server.”

Family Member: Wait, I just clicked “Okay” twice. Now what do you want me to do again?

And so on. Not fun.

The good news is that Copilot exists now, and makes doing remote tech support really easy. Unfortunately, college students are basically perpetually broke. Tyler and I remember what that’s like. It sucks. You just got to school. There are eighty bajillion things going on and at least ten girls or guys that have caught your eye and (if you’re very lucky) your pants. Having to pick between wasting an hour helping someone, or using valuable booze money just so you only need five minutes, can be a painful choice.

Well, we’ve got a proposal: for the month of September, we’ll let anyone with a .edu address use Copilot for free, up to three times. These aren’t two-minute trials; they’re real, legit, 24-hour day passes. Plus, if that’s not enough, we’ve got a referral program: if you refer a friend with a .edu address to use Copilot, you get another three day passes. Ad nauseam. No limit. No catch.

So if you’re in college, the next time someone asks you for help, grab a free Fog Creek Copilot day pass. Then spend your leftover time grabbing a beer. Best of both worlds.

The Open XML Debate, Revisited

From Slashdot, which is slowly redeeming itself, comes a link to Microsoft admitting that it bribed members of the Swedish ISO committee to vote for OOXML. Unsurprisingly, the Swedish ISO committee just voided its own vote. Due to time crunch, they will not be casting a vote at all in the Open XML ratification process.

I find it depressing but predictable that I’m unsurprised.

The WSJ on Open XML

I think that the Wall Street Journal does a fairly good job covering technology from a consumer’s perspective, but I feel that they struggle whenever they try to cover more industry-focused issues, making outright mistakes and failing to understand what in the debate is actually important, which leads them to follow up (or fail to) on the wrong points. Today was no exception: in an article entitled “‘Office’ Wars,” they attempted to cover the politics revolving around Microsoft’s efforts to get their Open XML adopted as an ISO standard. The mistakes began cropping up depressingly early in the article:

To gain approval as an international standard, Microsoft had to bare the code that undergirds the Office file format, called Open XML.

Um, no. Microsoft created a brand-new file format called Open XML that is a totally different format from the ubiquitous DOC format. They then published a 5000-page specification of its supposed operation that is incomplete and inconsistent with their own Open XML implementation, which they have not had to lay bare. This not-actually-implemented-as-specified Open XML specification is the one Microsoft is trying to ram down everyone’s throats.

Jean Paoli, one of Microsoft’s top standards experts, says the company wants Open XML adopted as a standard to encourage rivals to use its format, not squelch interoperability. He points out that other vendors, including Apple Inc., are adopting it.

Apple supports reading an extremely limited subset of Open XML in TextEdit in its upcoming Mac OS X Leopard. The most recent version of Pages can also import a slightly larger subset of Open XML, but, so far as I know, can’t write it back out. Neither Pages nor TextEdit uses Open XML as its native format. I do not think that Apple’s behavior can honestly be construed as “adopting [Open XML].”

He says IBM is stirring up opposition to Open XML’s gaining approval from the International Organization for Standardization, or ISO, to protect its Lotus Notes office suite, which uses the rival format Open Document.

Probably partially true, in the sense that Lotus Notes bundles a version of the open-source OpenOffice, which uses ODF as its native format, but since there actually are several word processors that work natively with ODF, I’m hazy on how this could be construed to protect Lotus Notes. In fact, given that Lotus Notes used to ally itself with SmartSuite, which had a proprietary file format, I think this actually opens up Notes to more competition. Given that there are no word processors other than Word that support Open XML, I think Microsoft’s claim applies far more strongly against itself.

Open Document is already an ISO standard, but Microsoft says there’s room enough for more than one document standard.

Why? “Just because” isn’t a good enough reason. ODF has already been standardized for some time and has broad industry support. Only Microsoft Office uses something superficially resembling Open XML. This strikes me as a hilarious extrapolation of the old joke, “The only problem with standards is that there are so many to choose from.”

In addition, Mr. Robertson said, the technical committees should include lots of voices—and that means some on Microsoft’s side. “Where you find expansion in the committees, you will find expansion on both sides,” he said. “That’s OK because it represents the community a whole lot better.”

This is the “all ideas are equally valid” fallacy. If we are going to have a debate on whether we should require, by law, that people dress up in purple chicken suits and make monkey noises at 3 PM on the second Thursday of the month, no one would be particularly surprised if a committee were completely biased against the idea. Even if I tripled the size of the committee, having a purple chicken suit proponent likely would actually make the committee less representative, since the position would then be over-represented.

We have a similar situation with Microsoft’s Open XML. IBM, Sun, Red Hat, Novell, Canonical, and Google, among others, support ODF. So far as I know, the only major company backing OOXML is Microsoft. Why does it follow that we should expect roughly equal numbers of OOXML proponents and detractors on any given committee?

I appreciate that the WSJ has recognized that the OOXML-ODF debate is an important one, and I’m glad that they’ll be increasing public awareness of it, but I still wish that they’d done a better job covering what’s actually going on. This is a case where both sides are not created equal, and fair reporting probably should not treat Microsoft as if they have equal merit.

Dreams Dashed in C++; News at Eleven

In my previous article, I discussed some alternatives to C++ for systems programming. Today, I want to provide an example of why you might care.

Tyler and I recently debated rewriting Fog Creek Copilot in Qt, a powerful, high-level, cross-platform C++ framework. The idea came to us when we started discussing the implications of maintaining four helper applications (Windows Helper, Windows Helpee, Mac Helper, Mac Helpee), each of which shares depressingly little code with the others. Because Qt is fully cross-platform, and, unlike some other toolkits, doesn’t stick out like a sore thumb on Macs, we figured we ought to be able to cut down the number of real code bases to two (and hopefully greatly increase the amount of code shared between Helper and Helpee in the process). To evaluate the practicality of using Qt, I grabbed the open-source version and a copy of C++ GUI Programming with Qt 4 and sat down to write a small demo app (“Virtual SysAdmin”—the only app that simulates a rapid-fire deluge of mindless support emails).

The good news is that I love Qt. As far as C++ frameworks go, I can now say without hesitation that it’s my favorite. Both its low-level and high-level libraries are very well thought-out and easy to use. Its willingness to extend C++ with its signal-slots mechanism effectively makes C++ as flexible as Objective-C, and enables a wonderfully clean MVC implementation compared to any other C++ GUI framework I’ve used. Working in the GUI designer is effortless, and Qt makes the refreshingly correct (and depressingly rare) decision to have its GUI designer generate descriptor files rather than the convoluted C++ code with tons of “DO NOT MODIFY BELOW THIS LINE” comments spat out by other designers. I think I still prefer Cocoa over Qt for Mac-only development, but otherwise, I’d probably pick Qt at the moment for writing an end-user app.

To be suitable for Copilot, though, the framework’s ease-of-use doesn’t matter. What matters is how large the framework is when statically linked. Qt, sadly, falls flat. I worked with the Qt mailing list to get a small version of Qt, but, even following their advice, I was only able to shrink “Hello, world” to about 2 MB—well more than double what Copilot has to be with all of its code and data files. That is, sadly, a deal-breaker. (I admittedly have not yet had a chance to play with qconfig as suggested late in the thread. If that substantially changes things, I’ll be happy to post again, but I’ve dedicated enough time to compiling Qt for the time being and need to focus on other things.)

What makes me sad, though, is that the bloat in a “Hello, world” app isn’t entirely Qt’s fault. Some of it may be; it’s possible that the framework really does have that many interdependencies. But an equally likely culprit is the C++ language and linker.

Back in 1996, when I was 13, had just learned Squeak Smalltalk, and was trying to get into lower-level languages, I picked up a copy of Delphi 3. Delphi is (or at least was) based on Object-Pascal—basically Turbo Pascal with a lot of additional object-oriented extensions (including interfaces and COM support) and a powerful run-time library to support introspection. Delphi 3 had a rich framework that was extraordinarily similar to Qt. It had the same way of building forms, a nearly identical class hierarchy, and, though not having a concept quite as powerful as signals and slots, had an event system that bears more than a passing similarity to the Qt-comparable event system in .NET. (Hardly a coincidence, since Anders Hejlsberg, who designed Delphi, went to Microsoft to work on J++ and later C#.) Yet despite Delphi’s similarity to Qt in features and capabilities, its executables were far, far smaller. “Hello World” in Delphi comes in at about 100 kB, give or take. A fairly full-featured app I wrote at the time that tells you what kind of rock you’re looking at comes in at a heftier, but still petite, 300 kB. Other Delphi apps I wrote at the time have similarly small sizes.

Part of the reason that Delphi apps could be so small was that the Borland compiler did a spectacular job compared to the likes of GCC/MinGW, which is what I was using for tests. But in turn part of why it could do such a stellar job is that the Delphi language made it easy. Object files had rich metadata that made it easy for the linker to find and remove dead code. Certain facets of the language, such as having native strings, sporting a simpler, more consistent object system, and lacking templates, meant that the linker frequently had less total code it needed to sift through. The generally higher-level nature of Object Pascal also meant that the compiler could do a better job reasoning about what the code was actually trying to accomplish, allowing it to make superior optimizations earlier on. All of these features combined made for a system that could simultaneously deliver a rich object framework without giving up small executable sizes and without requiring a massive DLL. C++, at least in its current form, simply can’t achieve the same performance.

Would Qt be small enough to be viable if only it were written in a different language? I honestly have no idea. But from experience, I can only imagine that, at the very least, it would have helped.

Avoiding the Masochist's Programming Language

As you may or may not know, ANSI is trying to push a new C++ standard out the door called C++0x (which those of you who know C may find amusing, since you can read it “C++ Hex”). C++0x’s primary goal is to take C++’s already horribly convoluted syntax and make it even worse. Looking at a summary of C++0x’s additions, for example, we come across the concept of rvalue references, expressed as int &&x. With this move, C++ now has a bizarre hybrid of pointers and handles that solves a legitimate problem in C++, but one that really shouldn’t exist to begin with. C++0x also introduces the odd concept of nullptr, a strongly typed null pointer. There’s a reason similar constructs exist in higher-level languages like Java or Python that try to move away from the hardware, but given that one of C++’s huge initial advantages was retaining the close-to-the-metal feel of C, I really fail to see why we want nullptr when we already have 0. Most of the other changes seem frivolous (the new concept keyword), closing the barn doors after the horses have already come home (Boost’s shared_ptr and weak_ptr will enter the language), and trying to cram high-level constructs into a language never designed for them (such as garbage collection).

Thankfully, even if you need to stay close to the hardware (which most programmers rarely do), there are several superior alternatives available. On Windows and Linux systems, you can take a look at D, which is basically everything that C++ should have been: easy low-level access when need be, a vastly cleaner and less ambiguous syntax, per-allocation optional garbage collection, optional contracts, generic containers, typesafe varargs…the list goes on. Calling existing C code from D is downright trivial, involving little more than a few extern declarations. It’s a project that anyone stuck doing low-level programming should be keeping an eye on.

On Windows, Linux, and Macs, Objective-C can also make a superb choice—especially when coupled with GNUstep, whose FoundationKit framework provides a rich, cross-platform set of low-level libraries for things such as memory management, common data structures, and file operations. With Objective-C, you can program in an almost Smalltalk-like environment for non-speed-critical components, but then drop down seamlessly into C for the tight parts. Because Objective-C, unlike C++, is a true superset of C, this can actually be easier in Objective-C than C++ anyway.

If you’re in the mood for something a bit different, but still want to stay close to the hardware, take a look at Free Pascal, a superb cross-platform Pascal implementation that includes a massive, powerful class library, a great language runtime, and the scarily easy ability to drop down into assembler that made Turbo Pascal so great. If you’re scared by what you once read in Why Pascal is Not My Favorite Programming Language aeons ago, fear not; the Pascal language implemented by Free Pascal resolves basically every single one of those problems. And, of course, calling C libraries involves nothing more than running the C header file through a converter or linking against the shared object directly. Easy.

Keep in mind, I’m a high-level language fan. I’d greatly prefer to be working in Smalltalk or Common Lisp over any of these options if I have the choice, but I recognize that there are times you really do need to be close to the hardware, whether for size, speed, or both. But even in that domain, there is no reason why any sane person should feel compelled to choose C++ over the alternatives.

I have a hunch, given this blog’s audience, that I’m probably preaching to the choir, but on the off chance I’m not: if you haven’t already, take a look at something other than C++ the next time you have to do system programming. You might really like what you see.

Too Much Emacs

This afternoon, on a lark, I installed Conkeror, a Firefox plugin that makes Firefox look and act like Emacs. As far as these things go, I’m actually extremely impressed. A substantial number of Emacs commands are implemented—including the less-common ones, such as C-x h (select all), that most Emacs-style emulators seem to miss. Suddenly, navigating the web entirely by keyboard seems…pretty reasonable. If you’re either an Emacs or a keyboard junkie, check it out. You may really like what you see.

Writing an Emulator

I don’t know why, but recently, as my love of really low-level hardware and my desire for low-power, high-performance computing have increased, I’ve been researching all the old, famous CPUs and operating systems. I started over what I swore was going to be a computer-free vacation by delving into programming in assembly for 680x0 Macintoshes (during which time I fell in love with 68k assembly), then explored ARM chips, and finally somehow or another ended up at 4:30 AM on a Sunday working on an assembler and cycle-accurate emulator for the MOS 6502. Resources I’ve found useful:

  1. Assembly in One Step, which provides a superb overview of the 6502 execution model
  2. A list of 6502 opcodes, including their mnemonics, size, and how many cycles they should take to execute
  3. WLA DX, an open-source cross-assembler for a bunch of old CPUs (including the 6502, 65C02, 65816, and Z80), which lets me check that I actually understand the documentation and am writing my own implementation correctly

When I have something worth looking at, I’ll post it. On a lark, I’m actually writing my assembler and emulator in Free Pascal, a superb open-source clone of Borland Turbo Pascal. I’ll also post about how that goes. (The short version: my memory is better than the experience, but both are better than C++.)
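To give a flavor of what the core of such an emulator looks like, here is a minimal sketch in Python (chosen purely for illustration; the project described above is in Free Pascal). It handles only four opcodes, and it tallies cycles per instruction rather than modeling each individual clock cycle, so it is a simplification of a truly cycle-accurate design. The class name, the load address, and the tiny test program are all made up for this example.

```python
class CPU6502:
    """Toy fetch-decode-execute loop for a handful of 6502 opcodes."""

    def __init__(self, program, origin=0x0600):
        # origin is an arbitrary load address picked for this sketch
        self.memory = bytearray(0x10000)          # 64 KB address space
        self.memory[origin:origin + len(program)] = program
        self.pc = origin
        self.a = 0
        self.x = 0
        self.zero = False
        self.negative = False
        self.cycles = 0

    def _fetch(self):
        value = self.memory[self.pc]
        self.pc = (self.pc + 1) & 0xFFFF
        return value

    def _set_nz(self, value):
        # Loads, transfers, and increments update the Z and N flags
        self.zero = value == 0
        self.negative = bool(value & 0x80)

    def step(self):
        opcode = self._fetch()
        if opcode == 0xA9:        # LDA #imm: 2 bytes, 2 cycles
            self.a = self._fetch()
            self._set_nz(self.a)
            self.cycles += 2
        elif opcode == 0xAA:      # TAX: 1 byte, 2 cycles
            self.x = self.a
            self._set_nz(self.x)
            self.cycles += 2
        elif opcode == 0xE8:      # INX: 1 byte, 2 cycles
            self.x = (self.x + 1) & 0xFF
            self._set_nz(self.x)
            self.cycles += 2
        elif opcode == 0x85:      # STA zero page: 2 bytes, 3 cycles
            self.memory[self._fetch()] = self.a
            self.cycles += 3
        else:
            raise NotImplementedError(f"opcode {opcode:#04x}")


# LDA #$FF ; TAX ; INX ; STA $10
cpu = CPU6502(bytes([0xA9, 0xFF, 0xAA, 0xE8, 0x85, 0x10]))
for _ in range(4):
    cpu.step()
print(cpu.a, cpu.x, cpu.memory[0x10], cpu.cycles)
```

A real emulator would drive the decode step from a table like the opcode list above instead of an if/elif chain, but the overall shape (fetch, decode, execute, add cycles) stays the same.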

Politics and Tech Blogs

When I first started bitquabit, I wanted it to be strictly a technology blog. When people wanted to read something about Squeak or db4objects or Copilot, they could come here. When they wanted to read someone writing a meandering essay on farm subsidies and ethanol, they could go somewhere else.

That position is becoming increasingly difficult to maintain. On the one hand, technology is inextricably tied to certain political agendas that, I feel, must constantly be discussed—patents and copyright chief among them, but also such topics as freedom of speech or the ethics of invention. Yet these topics are, well, boring, because basically every single prominent tech blogger has exactly the same position on these issues: patents are too broad, last too long, are insufficiently investigated for prior art by the USPTO, and, at least in computing, do far more harm than good; copyright is great, but in its current form lasts too long, fails in its original purpose, and violates the Constitution; and the Digital Millennium Copyright Act and its siblings abroad are blatantly unconstitutional due to their de facto destruction of fair use, stifle free speech, and should be at best struck down and at worst kept from spreading to more countries. Because these positions are so widespread, they’re neither controversial nor insightful. Others expound these arguments far better than I and are more up-to-date on the latest goings-on in the beltway. In short, though these topics may be relevant, they are also well-covered and therefore not issues I feel compelled to discuss here.

That leaves The Ugly Topics—the topics nominally only distantly related to technology. Topics like the war, Sarkozy’s election, the heightening tensions in the last month in Israel, the rapidly intensifying atheist-believer debate, and global climate change. These topics, charged though they may be, have profound impacts on technology. Wars divert funding from general science and research, yet fuel technological innovation—provided that the technologies in question are usable for waging war. Sarkozy happens to be sympathetic to digital rights management software and does not seem to support the EU’s drive to force Microsoft to open its networking protocols and file formats. Israel is a miniature Silicon Valley and a major exporter of cryptography and encryption software. The atheist-believer debate has profound impact for artificial intelligence and, indeed, the purpose and future use of technology in general. Global climate change drives ever-more-powerful computing clusters in an attempt to simulate weather patterns, spurs green buildings and high-efficiency solar cells, and may usher in the return of nuclear power. Even if we ignore all of their greater significance, the topics are still relevant.

The problem, of course, is that I cannot discuss just the technology side. Some readers are already fuming that I can touch on the war and step right past all the people dying, others are annoyed that I’m ignoring the “interesting” parts of Sarkozy’s politics, and my good friends, who are well aware of my fascination with religion, are curious why I’ve never touched a single religious topic on this blog. These issues are all so big that I have to discuss parts of them that have nothing to do with technology if I want to cover their technological aspects at all. Yet by attempting to address any of these topics, I will divide my audience.

If I want to speak on these topics, yet doing so costs me readers who are interested in what I have to say on technology, does that make addressing these topics “bad”? Does my hesitancy, however brief, to speak my mind and lose those readers make me a coward? If I honestly care more for this particular blog to reach a technical audience than a political one, does that change the answer? If it did, would the answer change again if I were to split bitquabit into a personal blog and technical blog?

Where is the dividing line?

Editor Addiction

A couple of weeks ago, I purchased a Dell Inspiron 6400 to replace my old and quite literally beat-up PowerBook G4 Titanium. The PowerBook is slow, its screen is damaged, its paint’s chipping off, its wireless has never been especially good, and nowadays, I find myself politely wondering when the hard disk is going to simply keel over. It’s done an amazing job over the last six years, but I felt that it was time to let it take a much-needed rest.

Because I’ve been a diehard Mac user for nearly ten years now, I initially planned to get a MacBook. It was a reflex action. Need a computer, get a new Mac. Yet on further reflection, I began to think that maybe getting another Mac wasn’t the best idea. I do have two working Macs, after all (at least for now), I’ve been wanting a Linux machine, and you really can get more bang for the buck hardware-wise by buying a generic PC. I eventually caved in to this line of thinking, and, with great trepidation, I finally settled on the Inspiron.

Overall, I’ve been quite happy with the machine. The machine came with an absolutely beautiful 1680x1050 display, the machine is fast (it easily clobbers my dual G5 in anything I try), and, to my slight surprise, Ubuntu 7.04, minus the install process, has been a dream to use (although in fairness I should point out that I am a veteran of the Slackware 3.0 era of Linux, so my sense of a dreamy, easy-to-use Linux is probably warped, to say the least).

So, I’ve put the PBG4 in my closet, along with my Newton 2100 and my Handspring, just another old device that I don’t use but can’t bear to part with.

Right?

Wrong.

If anything, over the last couple of weeks, I find myself using the PowerBook more heavily. And it’s not that I find OS X easier to use, or prettier, or anything like that.

It’s that, after having used TextMate for the last couple of years, I honestly just can’t stand going back to the likes of Emacs.

“What?!” I hear my devoted reader(s?) cry. “But, like, you just blogged about how you use Emacs for like everything! What gives? Where’s the love, man? Where’s your lispy love?”

I have a confession to make. I do use Emacs all day at work. It does let me have the same editing environment on my Mac and my PC, and that is a boon to my productivity. I honestly think ECB, CEDET, JDEE, EDiff, and even the much-maligned Eshell are beautifully written, wonderful packages. And although I gave up using Emacs as my mail client after we switched to Exchange at work, I still have Gnus raring to go the second I change my mind.

But here’s the thing: I’d really just rather be working in TextMate.

I won’t lie. TextMate can’t do everything that Emacs and my carefully crafted dot-file can do, and there are definitely some things that TextMate does far worse or not at all. (I’m pretty sure you couldn’t write a TextMate plugin even dimly resembling GUD, for example, which is sad, because extensions like GUD played a big role in drawing me to Emacs initially.) But what it really comes down to is that while Emacs may operate the same way everywhere, TextMate to me feels as if it acts the right way, if only on my Mac.

Rather than having my choice of carting around a ridiculously large .emacs with key remappings or simply using the arcane and contorted control and meta combinations everywhere, TextMate honors and extends normal Mac OS conventions in a good enough way that I don’t feel compelled to readjust everything. The third-party packages for TextMate are generally far easier to use, more transparent in operation, and more compact than their Emacs equivalents. Bundles can easily be swapped in and out without learning anything more complicated than drag-and-drop. The GUI actually looks like the GUI that’s used in every other program I’m in all day, yet manages to provide me with exceptional power anyway—and even passes on some of that power to my scripts. TextMate really has become the environment that I go to when I just want to get things done, right now—and at least in that sense has completely supplanted Emacs as my editor-of-choice.

I don’t think I’m ever going to completely give up Emacs, but my workflow in Emacs feels convoluted enough that returning to TextMate always leaves me feeling…relieved. Given the choice between my stolid PowerBook with TextMate and my blazingly fast Inspiron with Ubuntu and Emacs, I’ve found myself taking the PowerBook more often than not. I’m keeping a very close eye on e, but after having tried to use it for a few days, I feel strongly that it’s just not there yet. (And even if it were, its developer won’t be releasing a Linux version for a while, which kind of defeats the point of me having a Linux laptop.)

I’m not sure that the Inspiron was a bad purchase, but I’ve definitely learned not to underestimate the value of your existing software packages when choosing a computer. At least for me, TextMate will be the quintessential example of why it’s so hard to give up the Mac.

Switching Control and Caps Lock on Windows

I’m a diehard Emacs user. When I first get into the office, I fire up Emacs, then check my mail in Emacs, then update all of my source files using either the built-in Subversion bindings or a Cygwin shell via Emacs, and finally get down to coding for the day in Emacs. Windows and Mac OS X at times feel like just the kernel that allows me to run Emacs.

Productivity-wise, that’s actually a great thing. My work environment is basically identical no matter what machine I’m on, enabling me to focus on coding instead of trying to remember exactly what keystroke does build versus clean build versus debug build on which platform. Unfortunately, Emacs relies heavily on the Control key. On older Unix systems, Control was conveniently placed directly next to the “A” key, which made it easy to use without turning your hand into a compact pretzel. On modern Mac and PC keyboards, you’re out of luck: the Control key has been relegated to the bottom-left-hand corner of the keyboard, making it hard to reach without contorting your hand.

Thankfully, fixing this deficiency is a piece of cake. On the Mac, the process is a bit confusing, but fairly straightforward: simply open System Preferences, go to Keyboard and Mouse, pick the Keyboard tab, click on Modifiers, and set the Control key to be Caps Lock and Caps Lock to be Control. GNOME users have basically the same process: they must click Settings, Preferences, Keyboard, go to Layout Options, and then toggle what key they want to register as their Control key from among several options. In both cases, the setting takes effect immediately. You do not need to log out or reboot.

On Windows, thankfully, the process is considerably more straightforward. First, open regedit (Start, Run, type regedit) and navigate to HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\Keyboard Layout. (N.B.: Keyboard Layout, not Keyboard Layouts. They’re different.) Next, right-click in the values field, choose New, and then choose “Binary Value”. Name the new field Scancode Map. The Scancode Map is a little-endian-encoded field that allows you to remap arbitrary keys. The header is simple: the first four-byte word is the version (all zeros), the next word is reserved (all zeros), and the third word is the little-endian-encoded count of how many keys you will be setting, plus one extra for the null terminator at the end. So, if you’re only swapping your Control and Caps Lock keys—two custom mappings—your binary field should begin 00000000 00000000 03000000. The next entries are the actual remappings, each encoded as the scancode the key should produce (two bytes) followed by the scancode of the physical key being remapped (two bytes). Looking up the scancodes for Caps Lock and Control, we see that the former is 0x3A and the latter is 0x1D. So, to swap Control and Caps Lock, and accounting for the little-endian encoding, the next two words should be 3A001D00 1D003A00. Finally, we have a one-word null terminator, all zeros. The final value for the Scancode Map binary value, then, is 00000000 00000000 03000000 3A001D00 1D003A00 00000000. Make sure you’ve entered the value correctly, then reboot your machine. Presto! Caps Lock and Control reversed.

Give it a try. Even if you don’t use Emacs or a similar Control-heavy editor much, you may still find the reduced contortion when using the Control key is more than worth it.

Credit Card Numbers for Testing

One of the things I have to do on a regular basis is run a suite of regression tests against Copilot’s payment system to ensure that we’re doing basic vetting of invalid credit cards, such as those that do not have valid Luhn checksums. To that end, I was very happy to discover a list of credit card numbers that I could use for testing purposes. The numbers aren’t for valid cards, but should pass any basic validation check you have in a commerce site you write.
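For anyone who hasn’t run into it, the Luhn check itself is only a few lines. Here is a minimal sketch in Python (not the code Copilot uses; the function name is invented for this example). The sample number 4111 1111 1111 1111 is a commonly cited test number and is an assumption of this sketch, not necessarily one from the list mentioned above.

```python
def luhn_valid(number: str) -> bool:
    """Return True if the digits in `number` pass the Luhn checksum."""
    digits = [int(ch) for ch in number if ch.isdigit()]
    if not digits:
        return False
    total = 0
    # Walk from the rightmost digit, doubling every second one
    for position, digit in enumerate(reversed(digits)):
        if position % 2 == 1:
            digit *= 2
            if digit > 9:
                digit -= 9   # same as adding the two digits of the product
        total += digit
    return total % 10 == 0


print(luhn_valid("4111 1111 1111 1111"))
print(luhn_valid("4111 1111 1111 1112"))
```

Passing this check says nothing about whether a card actually exists; it only catches typos and obviously bogus numbers, which is exactly the kind of basic vetting a regression suite can exercise.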

Renaming Products

Is it just me, or does anyone else enjoy pronouncing MySQL as “My Squeal”? (And of course, PostgreSQL as “Postgres Quell” makes a nice parallel, and is actually a nice hat-tip to the QUEL language Postgres originally used.)

I’ve also noticed that if you call Microsoft IIS “Microsoft Ice,” then the increasing rise of Mozilla Firefox seems to kind of balance things out.

Getting Windows Errors While Debugging

I’m probably saying nothing new to any seasoned Windows developer, but it was new to me: if you add a watch in Visual Studio for the value @ERR,hr (no quotes) you get the human-readable error string for the error code that would be returned by GetLastError() if you called it and then ran it through FormatMessage(). I was amazingly happy to discover this little tidbit of functionality, since previously I had thought I had to look up the value on the MSDN System Error pages.

Firefox 2's Kin, or: Well, That Solves That

I say nothing that should surprise diehard Mac users if I say that Mac OS X lacks decent power-user mail clients. Thunderbird suffers from a lot of the same problems as Firefox, Entourage is…well, Entourage, and Mail, pretty though it may be, is heavily underpowered in a lot of key areas. (E.g., you can’t even create nicely formatted lists, which is something I need to do quite frequently.) From this motley crew, I’ve traditionally opted to use Thunderbird. Yes, it’s ugly and doesn’t integrate with OS X, but it’s got the features I want, it runs quite fast, and I get to use the same app on both my Windows and my Mac machines.

Like a lot of people, I had been keeping a few thousand emails in my inbox to be sorted out at some point in the future (usually, “tomorrow”). Some of those were useless spam, and many were just trivial notes I should have deleted but didn’t, but a few were more important to me, like emails from my trip to Germany back in eleventh grade.

This morning, I brought my G5 out of sleep and noticed that Thunderbird had locked up during the night. No biggie; force-quit and reopen. As I heated some tea and prepared my breakfast of burnt eggs, er, cereal with spoiled milk, er, instant oatmeal and a grocery list, I sat down to read my email, and noticed that Thunderbird was reporting that I had zero messages. It took my brain a few moments to fully process that Thunderbird was telling me that I had no messages, period. Thankfully, Thunderbird stores its emails in mbox format, and I actually do back my data up fairly regularly using rsync, so I didn’t worry too much, but I was fairly annoyed. The only other app that has ever lost my mail is Microsoft Outlook Express 4 for Mac—not good company to be in.

When I got home today and actually had a chance to look at the problem, though, things were weirder. The file for my inbox was not only fully intact, but Thunderbird was dutifully appending new messages to the end of the file. It simply refused to show me what it had downloaded. A few Google queries later, I had identified the problem: Thunderbird’s index file had gotten corrupted. The solution was to delete the index file.

This is dumb in so many ways I don’t even know where to begin.

  1. My biggest complaint here by far is that Thunderbird has all the information it needs to fix this problem itself. It knows how many emails used to be in the account the last time the program ran. It knows how many emails the index file thinks there are now. If these numbers don’t match, rebuild the index.
  2. It turns out that this is a known problem if your inbox gets big and you don’t compact it. What does compacting mean? It means actually deleting on disk what you deleted in Thunderbird. By default, Thunderbird simply marks certain messages as deleted, meaning that they won’t show in the GUI, but doesn’t actually expunge them from the file on disk, since that would require rewriting a large amount of data. Lots of programs use this shortcut for normal use, occasionally running a compacting cycle when the machine’s idle or at least not terribly busy. What’s unique to Thunderbird is that Thunderbird, by default, won’t compact, ever. That would be ludicrously stupid, but at least vaguely pardonable, if Thunderbird worked okay in this configuration, but the truth is that Thunderbird’s indices get corrupt if you don’t compact them often. To even discover this setting, let alone change it, you have to go to Tools:Options:Advanced, where it’s notably under “Disk Space” and not “Settings That Can Crash This App And Destroy Your Emails If You Don’t Change Them.”
  3. Normal end-users are never, ever going to realize that they have any hope of recovering their data. To your normal end user, when the indices get corrupted because all that image-based spam they’re deleting doesn’t actually get deleted and corrupts the Thunderbird database, their email just vanished. Not what I’d expect from an application that bills itself as “a robust and easy to use client”.
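To make the first complaint concrete: the sanity check I have in mind is tiny. The sketch below is in Python and uses the standard-library mailbox module to count the messages actually present in an mbox file, then compares that against whatever count an index claims. This is emphatically not how Thunderbird works internally; the function names, the sample messages, and the idea of an index that stores a simple message count are all assumptions made for illustration.

```python
import mailbox
import os
import tempfile

# Two made-up messages in mbox format, for demonstration only
SAMPLE = (
    b"From alice@example.com Thu Jan  1 00:00:00 2004\n"
    b"Subject: first\n\nhello\n\n"
    b"From bob@example.com Thu Jan  1 00:00:01 2004\n"
    b"Subject: second\n\nworld\n"
)


def count_messages(mbox_path):
    """Count the messages actually stored in the mbox file on disk."""
    box = mailbox.mbox(mbox_path, create=False)
    try:
        return len(box)
    finally:
        box.close()


def index_is_stale(mbox_path, indexed_count):
    """True if the index's idea of the message count disagrees with the file."""
    return count_messages(mbox_path) != indexed_count


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "Inbox")
    with open(path, "wb") as handle:
        handle.write(SAMPLE)
    matches = index_is_stale(path, indexed_count=2)   # index agrees with the file
    corrupt = index_is_stale(path, indexed_count=0)   # index claims an empty inbox
print(matches, corrupt)
```

When the check reports a mismatch, the sensible response is to throw the index away and rebuild it from the mbox file, which is exactly what deleting the index file by hand ends up doing.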

At this point, I’m seriously considering reviving my earlier attempt to write a cross-platform email back-end in Squeak (or, these days, maybe Factor) with powerful threading and search, and then writing a nice GUI over it in Cocoa and .NET for normal use. In the meantime, I’m migrating back to Mail. It may be underpowered and its IMAP support may be an exercise in frustration, but at least it’s never deleted my email.

Halloween Fright

A couple of years ago at Duke I served as an undergraduate teaching assistant, or UTA, for CPS108, which is basically the large-program design course. While there, I wrote a program called Crystal, which was a plugin-driven web browser written in Java. Basically, the code students received provided a couple of interfaces and a single concrete class called Crystal that implemented just a window frame and a handful of events to notify plugins of changes in the current URL viewed and so on. The students had to write the display components (at least an HTML view and a text view) and application services (URL bar, search bar, back and forward buttons, and so on) that would turn this shell into a real web browser. At the time, due to how close we were to finals and how much material we were trying to cram into the semester, the professor opted not to use Crystal, and I basically forgot about it. In the last semester or two, though, Crystal, now known as Websta, has made a reappearance.

While talking to Ben, a friend of mine who’s currently a UTA for 108, we got to talking about how to add tabbed browsing to the Websta framework. Ben insisted it was convoluted; I insisted that it was straightforward. (For the record, Ben was basically right; the interfaces provided are sufficient to allow for tabbed browsing, but students basically have to throw out the Crystal class and write their own shell in order to achieve a sane tab implementation.) In order to figure out how to do it, I asked Ben to send me a tarball of the current version of the source code. A moment later, I extracted the tarball, dragged the folder to TextMate, and was immediately greeted with this:

Websta

I spent a good minute or two frantically trying to figure out how to customize TextMate backgrounds before I finally realized that the TextMate icon in my dock had mysteriously changed as well:

HallowMate

It turns out that Allan Odgaard, TextMate’s author, pushed a special Halloween update into the bleeding-edge stream. Once I figured out what was going on, I had a good laugh and got back to work. I know that doing stunts like this would make a marketing department’s skin crawl, but I love how much personality little actions like this add to applications and the companies behind them. Somehow, it just makes things feel more personal.

I think it’s for that reason that I was so profoundly hurt several months ago when I learned that the Malkovich easter egg in Copilot had been removed. When doing development on Copilot, we usually run the Host (the helpee’s program) in a VMware session and run the Helper directly from our desktops. Occasionally, though, we need to test running Host on our own desktops (possibly because we need to test dual monitors, or because we want to see a Windows 98-to-Windows XP session, or because we really need the more powerful debugging facilities that VisualStudio provides to locally running applications), which results in a Copilot-into-infinity session, wherein each Copilot window contains another Copilot window which contains another Copilot window, etc. One day, Joel was looking at that and commented how similar it was to the movie Being John Malkovich when John Malkovich accidentally enters his own head and sees only other copies of John Malkovich. A few beers and a movie viewing later, Copilot had its first (and to date, only) easter egg: if the Helper and the Host ran from the same computer, the title bars would change to read “Malkovich.” I was especially proud of the code for this patch, which was, in its own little way, worthy of at least the lesser entries in the IOCCC:

void ClientConnection::Malkovich() {
    MALKOVICH MALK0VICH MALk0VICH
        MALKOV1CH MALK0V1CH MA1KOVICH
            MA1KOV1CH MaLKOVICH MALkOVICH
}

Sadly, the Malkovich easter egg, as software easter eggs are wont to do, was responsible for a bug in Copilot, and so had to be removed, but for those of you who really just gotta see it, fire off an email and I’d be happy to send you a copy of the Copilot helpers that still have the Malkovich feature in them.

Build Servers

A few days ago, Coding Horror posted an article describing build servers and why they’re important. At the time, I basically just nodded and agreed and went on. Any sizable project needs a build server to ensure sanity in the code tree. I admit that Copilot currently lacks a dedicated build server, but that was more because, the last time it came up, we were in the middle of moving all of our code from .NET 1.1 to .NET 2.0, and I didn’t really see the point of setting something up that was going to have to be rebuilt as soon as I finished it. Creating one now that the transition’s complete is very high on my to-do list.

When it came up again on reddit today, though, I got to thinking. On a normal team, build servers check exactly one problem: does the code compile? The reason you need a server to check this is that, in a typical development environment, you end up with a bunch of stale object files and source code you forgot to check in to source control that may allow a program to compile on your personal machine, but not on anyone else’s. Since compiling is slow, and since even good IDEs don’t integrate well enough with source control to avoid ever missing a checked-in file, these problems seem intractable.

For most languages, they are.

For Smalltalk, they’re not.

Because of the whole way that Smalltalk code is developed—basically, modifying a running program to suit your needs instead of writing a dead program in a text editor—you can never end up with code that used to compile but now doesn’t. If it failed to compile, it can’t get into the image, and once it’s in the image, it can’t suddenly cease to compile. Sure, you can break code that you have, just like you can check in broken but compiling code in any other system, but that’s the domain of unit testing, not compile checking. We’ve suddenly eliminated a whole class of problems that build servers try to solve. Further, due to how tightly Monticello (Squeak’s distributed source-control manager) can integrate with the environment, you also have to struggle to avoid checking in code you’ve modified, eliminating another whole class of reasons for having a build server. We’re suddenly left just with centralized unit testing. Don’t get me wrong; I think unit testing is a good idea. But, realistically, this is a problem that’s going to be solved on a developer’s workstation or not at all, and it’s not terribly hard to modify Monticello slightly so that unit tests are run before the developer can cut new package versions.

So, at least for Squeak, the whole concept of a build server becomes unnecessary.

I’m not downplaying the importance of build servers in a normal environment, but it was somewhat sobering to realize that a whole class of systems don’t really need them.

Squeak on the OLPC

This probably is, more than anything else, simply an indication of how little I’ve been following the Squeak community lately, but I was extremely happy to discover that Squeak will be on the OLPC. The concentration appears to be on eToys, an extension of Morphic that allows kids (and, with considerably more effort, adults) to make interactive graphics without writing code. If you’re interested, Google Video hosts a great video of Alan Kay demonstrating various Squeak technologies, including eToys and Croquet, that will give you an idea of how powerful a tool these kids are going to have. Another video that I think is a great watch is Squeak in Extremadura, which shows how schoolchildren in Spain are using Squeak and eToys in their computing courses, and gives a good idea how Squeak can be integrated into a school curriculum.

Video Smalltalk Tutorials

One of the problems I’ve always had trying to explain Smalltalk is that the environment and language are so fused and so radically different from those in the mainstream (which I’m defining here as Xcode, VisualStudio, Eclipse, and IntelliJ) that people have trouble figuring out what the big deal is. That’s gradually changing as languages like Ruby, OCaml, and properly used JavaScript come into the mainstream, and as static language designers edge ever closer to having a full REPL with dynamic code reloading, but the gap’s still big enough that properly explaining why Smalltalk’s differences help productivity and flow becomes problematic. Thankfully, James Robertson at Cincom has been compiling a collection of video tutorials called Smalltalk Daily that are designed to walk new developers through the basics of programming Cincom Smalltalk. Each of the short tutorials focuses on a new aspect of the VisualWorks environment or the Smalltalk language. Although the really good stuff isn’t up yet, enough is there that you can at least get a flavor for commercial Smalltalk development tools, and see how the language differs from ones you might be more used to. No, it’s not my favorite Smalltalk environment, but the videos still give a nice overview of what makes Smalltalk special.

WWDC Wednesday: Fire and Motion

As you may or may not be aware, one of the features of Mac OS X Leopard is called iChat Theater. iChat Theater allows you to share arbitrary sound and video via AIM. The feature is great for end-users; showing a friend a funny movie clip, or your parents a photo album of your kid, has become wonderfully trivial. In what for Apple qualifies as a rare nod for developers, they have publicly exposed a clean API to allow your applications to hook into this framework. Basically, all an application must do is draw both to its window’s contentRect and to a contentRect provided by the framework. That’s it. All the other voodoo (NAT, quality scaling, firewall, and so on) is taken care of by the OS X IMService framework.

Unfortunately for us, one of the standard applications that Apple has hooked in is Remote Desktop, which means that users simply need to click a button and they can see another user’s screen. Again, because this uses IMService, Apple’s solution plows through NATs and firewalls and automatically rate-limits. When I first saw this on Monday, a friend of mine who is intimately involved with Leopard simply said, “Sorry. What can I say? It was a good idea.”

A couple of days’ sleep has helped me to distance myself a bit from the situation. For one thing, Leopard won’t come out until Spring next year, and the people who need help are the ones least likely to upgrade. So today, I went to the iChat Theater session to learn how this new system works.

The good and bad news is that Leopard’s remote assistance doesn’t actually use IMService. The reason that’s good is that, currently, remote control requires both parties have exposed public IP addresses with dropped firewalls, meaning that it has the same problems as MSN’s remote control. Although the developers have said that they intend to fix these problems by the final build, the fact that they’re not going through IMService, and that IMService took about three years to get it right, means that I’m not too worried they’ll actually get everything properly working. So that’s good.

The bad is that it means we won’t be able to hook into iChat the same way that they do. The public API is called IMAVService. IMAVService has no mechanism whatsoever to return input from the remote end. That makes it a complete nonstarter not only for us, but for many innovative ideas.

Despite Apple’s strong emphasis on resolution independence, the IMAVService framework caps presentation video at 320x240. The smallest display Apple sells right now is 1152x720—10 times as many pixels. The ratio they chose is also just bizarre; 320x240 is a 4:3 ratio, whereas all Apple displays are between 8:5 and 16:9, meaning that an awful lot of content has been optimized for letterbox-like displays while iChat Theater expects TV-like displays. Further, the entire thing ties into CoreVideo’s CVPixelBuffer in such a way that I seriously doubt they ever intend to support sending vector commands over the client. Combine all of these observations and it seems to me that iChat Theater is condemned to permanent blur-hell.
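For anyone who wants to check my arithmetic, here is a quick Python sketch. It uses only the two resolutions quoted above; the helper names are mine and purely illustrative.

```python
from fractions import Fraction


def pixel_count(width, height):
    """Total number of pixels in a width x height frame."""
    return width * height


def aspect_ratio(width, height):
    """Aspect ratio reduced to lowest terms, e.g. 320x240 -> 4/3."""
    return Fraction(width, height)


theater = (320, 240)    # the iChat Theater cap quoted above
display = (1152, 720)   # the smallest display resolution quoted above

ratio = pixel_count(*display) / pixel_count(*theater)
print(f"pixel ratio: {ratio:.1f}x")                # pixel ratio: 10.8x
print("theater aspect:", aspect_ratio(*theater))   # theater aspect: 4/3
print("display aspect:", aspect_ratio(*display))   # display aspect: 8/5
```

So "10 times" is, if anything, slightly generous to the framework, and the two shapes really are 4:3 against 8:5.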

For the moment, I think it makes good sense not only to finish, but to continue work on Aardvark Mac. We’re cross-platform, we’ll work in situations where iChat probably won’t, and we’ll run on all Mac versions—not just Leopard. I still think it would be very wise to keep a very close eye on this technology in the future.

Objective-C 2.0: the Bad, the Horrible, and the Ugly

Apple announced Objective-C 2.0, the first major change to the language ever. The features should all be great—support for properties, inbuilt iterator syntax, much higher abstraction in the runtime, and more—but ultimately, though many of these modifications attempt to solve real problems in Objective-C, they nearly universally solve them inelegantly and inconsistently.

One of my biggest complaints comes with the iterator extension. In Objective-C 1.0, to iterate over a collection, you had to use the Java 1.4 method: you got an NSEnumerator object, then repeatedly called -[NSEnumerator nextObject] until there were no elements left in the collection. Objective-C 2.0 introduces a new iteration protocol, NSFastEnumeration, and modifies the for statement in C so that you can iterate over Cocoa collections that adopt it, basically using the same general concept as Java 5 for its implementation.

I find this solution fundamentally flawed. Objective-C, until now, has always been extremely good about not modifying the C core at all. Objective-C syntax extensions were always deliberately alien to C so that there could be no confusion. Keywords were prefixed with @ (e.g., @selector(objectEnumerator), @interface, @implementation), method calls [lookLike this], and so on, so the logical way to add iteration to the language, if you’re going to do it that way, would be to add an @foreach keyword. Instead, Objective-C 2.0 overloads the for statement, so that you can now write

for (NSString *string in anArray) {}

This needlessly confuses ObjC and C features, and sets a very bad precedent: Objective-C features can be now intermingled with classical C.

This complaint, though, sidesteps an even bigger complaint: iterators are, at their core, trying to be a special, niche version of generic lambda blocks—in this case, a block that takes one argument and does something with it. The “proper” solution then, rather than adding a @foreach keyword, would be to add blocks/lambda methods to Objective-C.

Currently, Objective-C and .NET basically take the same tack for callbacks: you can pass around a method, but that method must exist and be named. So, for example, in ObjC we have

[[NSNotificationCenter defaultCenter] addObserver: self
                                         selector: @selector(fooDidHappen:)
                                             name: NSSomeNotification
                                           object: nil]

which, if .NET had NotificationCenter, would be equivalent to

NotificationCenter.defaultCenter.addObserver(this,
                                             delegate(fooDidHappen),
                                             SomeNotification,
                                             null)

We see the same idiom anywhere that we require a callback. What I want, instead of passing an object and a method, is to be able to pass an inline function. If I had that, then the for modification would be unnecessary; I could simply add a method -[NSCollection eachObjectDo:] that took a block and passed it each element of the collection in sequence. Sadly, not only is this not the path Apple took, but there’s no indication that it even almost made the cut. Surprisingly, Microsoft will be adding this feature in C# 3.0, and so C# 3.0 will be superior to Objective-C 2 in this regard.
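To make the idea concrete, here is a minimal sketch in Python, which already has inline functions. The Collection class and its each_object_do method are hypothetical stand-ins for the imagined -[NSCollection eachObjectDo:]; nothing here is a real Cocoa API.

```python
class Collection:
    """Hypothetical stand-in for a Cocoa collection; not a real API."""

    def __init__(self, *items):
        self._items = list(items)

    def each_object_do(self, block):
        """Pass each element to `block`, in order."""
        for item in self._items:
            block(item)


names = Collection("foo", "bar", "baz")
shouted = []

# The block is written inline at the call site; no named method is needed.
names.each_object_do(lambda s: shouted.append(s.upper()))
print(shouted)  # ['FOO', 'BAR', 'BAZ']
```

Iteration becomes an ordinary method call, so the language itself needs no new loop syntax. The same shape would work for any callback, including the notification observer above.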

Objective-C 2.0 also adds properties. This has me fundamentally irked. Joel has complained of the problems with leaky abstractions. To be honest, though I agree with most of his individual points, I disagree that abstractions are dragging us down; everything in the computer except raw wires is an abstraction, all are at least somewhat leaky, and all that are properly designed, in my opinion, help much more than they hurt.

Properties at first seem like a good abstraction, one that helps more than hurts. Objective-C rigidly adheres to Smalltalk’s concept of how visibility should work: using Java terminology, all instance variables are protected, all methods are public. This has clear implications in how you think about your program. Historically, the inconvenience of getting instance variables has strongly encouraged a tell-don’t-ask design philosophy. When you do ask or assign, it’s clear that what you’re doing may have side-effects: [foo setBar: baz] is clearly a method invocation, and I should expect that side-effects may result.

With Objective-C 2.0, we gain .NET-style properties, and have assigned foo.bar = baz as the syntax. This is another example of Objective-C now modifying C syntax, only this modification is not simply inconsistent; it’s deadly. Firstly, the above code implies to any decent C programmer that foo is a struct. Just from the above code snippet, though, we actually have no idea. It could either be a struct or a class. Or, hell, it could be a union. Now we have three radically different concepts sharing exactly the same syntax. Worse, though, to most programmers, the above syntax implies a lack of side-effects beyond the value of foo.bar changing. After all, that’s the behavior of both struct and union (and classes in C++, for that matter). But since, as in .NET, Objective-C properties are actually full-blown methods, we have no guarantee that further side-effects won’t occur, and in fact we see this happen in Apple code. One of the demos they have for Core Animation shows how easy it is to make an object fade out. All you do is

someObject.opacity = 0;

This doesn’t simply set the value of some opacity variable, but actually begins an animation of that object’s fade-out in a new thread. Not only is this not what I would consider the expected behavior, but it also means that, for example, you can throw one of NSThread’s exceptions within a variable assignment. This is a leaky abstraction that, in my opinion, does far more harm than good.
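The same trap is easy to reproduce in any language with properties. Here is a small Python sketch; the Layer class is hypothetical and only records the "animation" it would start, so it is an illustration of the problem rather than anything resembling Core Animation's real implementation.

```python
class Layer:
    """Hypothetical animated object; not Core Animation's real API."""

    def __init__(self):
        self._opacity = 1.0
        self.animations_started = []  # record of the hidden side effects

    @property
    def opacity(self):
        return self._opacity

    @opacity.setter
    def opacity(self, value):
        # At the call site this looks like a plain field write,
        # but arbitrary code runs here and it can raise.
        if not 0.0 <= value <= 1.0:
            raise ValueError("opacity must be between 0 and 1")
        self.animations_started.append((self._opacity, value))
        self._opacity = value


layer = Layer()
layer.opacity = 0                 # reads like assignment, starts a "fade"
print(layer.animations_started)   # [(1.0, 0)]
```

Writing layer.opacity = 2 raises a ValueError, which is exactly the sort of thing nobody expects from what looks like a variable assignment.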

These complaints aren’t specific to Objective-C 2.0; they apply equally well to Java 5, .NET, Delphi, and other similar systems. So these mistakes aren’t new, and I’m actually somewhat in the minority for considering them mistakes to begin with, but I’m sticking to my guns: these are horribly leaky abstractions solving a non-problem. I would rather have seen any of three alternatives:

  • Emphasize that you can access ivars directly by allowing foo.bar in addition to foo->bar. I don’t like this solution, but at least it makes foo.bar = baz do what I’d expect it to.
  • Assign a different syntax to properties, perhaps someObject!xCoordinate = foo, [someObject xCoordinate = foo], or any syntax other than dot notation so that programmers have at least an inkling that this is not their daddy’s dot operator.
  • Implement @property so that it only generates -[MyObject setFoo:] and -[MyObject foo] methods and nothing else. Shelve the whole dot syntax. This cuts down how much code we have to write, but doesn’t provide us with a false view of how things actually work. In my opinion, this is the best of both worlds.
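As a rough sketch of the third alternative, here is how method-only accessor generation might look in Python. The accessors decorator and its naming scheme are assumptions of mine for illustration, not anything Apple has proposed.

```python
def accessors(*names):
    """Class decorator that generates foo() and set_foo(value) per name.

    Hypothetical analogue of an @property that only synthesizes methods.
    """
    def decorate(cls):
        for name in names:
            # Default arguments bind the attribute name for each iteration.
            def getter(self, _attr="_" + name):
                return getattr(self, _attr)

            def setter(self, value, _attr="_" + name):
                setattr(self, _attr, value)

            setattr(cls, name, getter)
            setattr(cls, "set_" + name, setter)
        return cls
    return decorate


@accessors("bar", "baz")
class Foo:
    def __init__(self):
        self._bar = None
        self._baz = None


foo = Foo()
foo.set_bar(42)    # an explicit call, so side effects are no surprise
print(foo.bar())   # 42
```

The boilerplate is gone, but every access still reads as the method call it actually is.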

Sadly, Objective-C seems content to repeat the mistakes of its predecessors. This does not herald the wonderful new direction for Objective-C that Apple would like to have you believe; it heralds the beginning of the end of Objective-C’s uniqueness.

Building the Squeak VM

When I’m not busy coding in various dialects of C for Fog Creek, I enjoy coding in Squeak Smalltalk as a hobby, and I try to stay active in the community when I can. Unfortunately, lately, I’ve been slacking rather spectacularly, and perhaps because of that I missed that Göran had assembled a nice tutorial on how to build a Squeak VM for Linux/Unix. The tutorial should get you a fresh-built VM in a couple of minutes, and a custom-built one not too long after that. Do note that, if you’re on OS X, you’ll want to do

configure --with-quartz --without-x

before you make, or it won’t build properly.

Update: Göran updated the tutorial to contain the Macintosh instructions, so you can now just go straight there.

Gearing Up

After a semester of virtually no coding, I’m preparing to return to New York to do more work for Fog Creek on Fog Creek Copilot. This will be the first time in my career that I’m returning to a code base that I haven’t seen in a bit over a year.

In some ways, this is very exciting. Knowing that something I put so much time and effort into has proven to be truly useful to a large number of people is extraordinarily gratifying, and getting a fresh chance to fix all the little things that I ran out of time to fix will be wonderfully cathartic (and make Copilot an even better product than it is now).

In other ways, though, I’ve rarely had more trepidation about an assignment. The parts of Copilot adapted from TightVNC are disjoint and not well written, and a lot of the knowledge I picked up last year about how the code actually works has probably leaked out of my brain. A lot of that we documented; some of it, we did not. I’m going to be very interested to see exactly how quickly I can return to being productive when working with the Copilot code base. I have no doubt that I’ll have no problem adding the features we want to add; I just feel equally sure that I’ll have at least a small ramp-up period as I get re-acclimated.

Update: I forgot to mention, but the end-user components of Fog Creek Copilot are open-source. You can download the current version by following the links from the FAQ page.

When Cussing Should Be Okay

So, today I spent roughly two hours hacking the MySQL driver for Squeak to add the DESCRIBE command and adding MySQL support to ROE. ROE is an awesome package that lets you write queries directly in relational algebra in your Smalltalk code. It generates all of the SQL to actually execute the query and delays execution until as late as possible. The effect is very nice. I’m currently working on a tutorial for how to use ROE in Seaside, and as part of that, I thought it’d be nice to add MySQL support. Until MySQL 4.1, that would have been impossible, as ROE relies heavily on subselects to implement its logic, but since MySQL 4.1 now supports them, I thought I’d give it a shot.
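If you haven’t seen this style before, here is a toy sketch in Python of the general idea: each relational operation wraps the previous query in a subselect, and nothing runs until you ask for rows. The Relation class and its methods are hypothetical and are not ROE’s actual design or API; I’m using the sqlite3 module from the standard library only so that the example is runnable.

```python
import sqlite3


class Relation:
    """Hypothetical lazy query object; this is not ROE's actual design."""

    def __init__(self, conn, sql, params=()):
        self.conn, self.sql, self.params = conn, sql, params

    @classmethod
    def table(cls, conn, name):
        return cls(conn, f'SELECT * FROM "{name}"')

    def select(self, condition, *params):
        """Restriction: wrap the current query in a subselect and filter it."""
        sql = f"SELECT * FROM ({self.sql}) AS t WHERE {condition}"
        return Relation(self.conn, sql, self.params + params)

    def project(self, *columns):
        """Projection: wrap the current query and keep only some columns."""
        cols = ", ".join(f'"{c}"' for c in columns)
        sql = f"SELECT {cols} FROM ({self.sql}) AS t"
        return Relation(self.conn, sql, self.params)

    def __iter__(self):
        # Nothing touches the database until rows are actually needed.
        return iter(self.conn.execute(self.sql, self.params).fetchall())


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE posts (id INTEGER, title TEXT, body TEXT)")
conn.executemany("INSERT INTO posts VALUES (?, ?, ?)",
                 [(1, "first", "hello"), (2, "second", "world")])

query = Relation.table(conn, "posts").select("id > ?", 1).project("title")
print(query.sql)    # three nested SELECTs, not yet executed
print(list(query))  # [('second',)]
```

Note how quickly the subselects pile up: two operations on one table already produce three nested SELECTs, which is why the quality of a database’s subselect support matters so much here.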

Now, the first of those two tasks, once I had read through the code for the MySQL driver and gotten to understand it decently, only took about 60 seconds. DESCRIBE ends up just being a normal query, so all I had to do was modify the driver to treat DESCRIBE like SELECT. But the second one took a long time, and ultimately underscored to me yet again both why MySQL is simply a Bad Idea and why a little bit of up-front research can save you hours of frustration later on.

The first problems were “tiny” annoyances. MySQL 4.1 changed the authentication algorithm from the one used in 4.0, which wouldn’t be a big problem except that the new communications protocol isn’t publicly documented anywhere. (If you Google, some people have published descriptions of it from analyzing the source code, but the official documentation has not yet been updated to provide a good description.) There’s a toggle to downgrade MySQL’s authentication to 4.0, but it’s not retroactive, so you have to re-encode the passwords. It’d have been nice if they had, you know, documented that somewhere, but whatever. Once I got that sorted out, I added some tables, filled them with dummy data, and started writing an adaptor for ROE.

Now, in my book, when it comes to breaking standards, there are a lot of different ways to do it. There’s breaking the standard by adding features in odd ways that no one else supports, but everyone needs (NVARCHAR and friends in SQL Server, and arguably AUTO_INCREMENT in MySQL); there’s breaking the standard by not implementing features everyone else has (sequences, views, triggers, etc. in MySQL); and then there’s just totally going off and doing your own thing even though the standard already solved the problem you’re having. I refer here to the way that MySQL treats quotes. In every other database I’ve ever used, single ticks (') mark strings, double ticks (") mark identifiers with otherwise illegal characters, and back-ticks (`) are illegal. In MySQL, single ticks mark strings, double ticks also mark strings, and back-ticks mark identifiers. Why’s that matter? Because it means that I had to add in additional abstraction functionality for identifier quoting in ROE just to support MySQL, which refused to accept queries such as SELECT "id", "title", "body" FROM posts that execute fine in at least PostgreSQL 7 and 8, DB2, and SQLite 2 and 3. That’s impressive, fellas. What exactly was wrong with SQL92 there? Too plebeian?
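The abstraction in question boils down to something like the following Python sketch. The function names and the dialect labels are mine, purely for illustration; the SQLite check at the end just confirms that the double-quoted form runs on a database that follows the standard here.

```python
import sqlite3


def quote_identifier(name, dialect="ansi"):
    """Quote an identifier; the dialect names are hypothetical labels."""
    if dialect == "mysql":
        # MySQL's default mode: back-ticks, doubled to escape.
        return "`" + name.replace("`", "``") + "`"
    # Standard SQL: double quotes, doubled to escape.
    return '"' + name.replace('"', '""') + '"'


def select_sql(table, columns, dialect="ansi"):
    """Build a simple SELECT with every identifier quoted for the dialect."""
    cols = ", ".join(quote_identifier(c, dialect) for c in columns)
    return f"SELECT {cols} FROM {quote_identifier(table, dialect)}"


ansi = select_sql("posts", ["id", "title", "body"])
mysql = select_sql("posts", ["id", "title", "body"], dialect="mysql")
print(ansi)   # SELECT "id", "title", "body" FROM "posts"
print(mysql)  # SELECT `id`, `title`, `body` FROM `posts`

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE posts (id INTEGER, title TEXT, body TEXT)")
conn.execute("INSERT INTO posts VALUES (1, 'hi', 'there')")
print(conn.execute(ansi).fetchall())  # [(1, 'hi', 'there')]
```

It’s not much code, but it’s code that exists solely because one database decided the standard quoting rules didn’t apply to it.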

Now, ROE depends heavily on subselects to write its queries. PostgreSQL optimizes them fine. Even SQLite optimizes them fine (although it took Avi Bryant and me a while to chew through its rather cryptic EXPLAIN output and verify that).

MySQL generates temporary tables for every subselect.

Technically, I suppose that that does qualify as having subselect support, but just…wow.

So basically I wasted several hours of my day today making ROE run on a database that can’t support it decently. I’m keeping that particular Squeak image around in case MySQL 5 has better optimization, but the lesson here is that you should always check beforehand whether what you’re doing makes sense.

That, and stay away from MySQL.

The Art of Missing the Point

Sometimes, you’re confronted with someone who so stubbornly refuses to understand what you’re saying that you just want to cry. Let’s ignore for the moment that a lot of the hype around Rails is…well, hype. Even with that in mind, contrast the tutorial video for Rails with the tutorial video for Trails, a JSP-based framework that aims to do the same thing. Even if you completely ignore issues like scaffolding, the difference between these two videos makes me downright sad. For example:

  • Methods for properties all have to be specified manually
  • Hibernate tags aren’t generated automatically
  • Eclipse isn’t set up automatically
  • The WAR property file isn’t correct by default
  • The tiniest changes require a trip back to ant

It’s just ridiculous. What these people have made is a bad clone of Rails’ scaffold macro. That’s not even the point of the framework! The point is that it’s so dynamic you can easily add functionality piecemeal and check it dynamically as you go. The point is that it adheres tightly to the DRY principle. The point is that you spend virtually no time writing boilerplate code. The point is that it eschews gobs and gobs of XML crap so that you can actually focus on coding logic and presentation instead of ten miles of glue code. The video of Trails shows that Java developers either cannot solve these problems or do not believe they have a problem in the first place.

Do Java developers enjoy writing boilerplate code? Do they enjoy spending their day fixing properties files? The answer of course has got to be “no.” The alternative simply doesn’t make sense. Yet I’d say that three quarters of the video is dedicated to boilerplate code and tweaking settings. Smalltalk and Lisp developers, as well as those of other dynamic languages such as Python and Ruby, miss the fact that the appeal of Rails to many is that it provides a dynamic environment at all. Smalltalk and Lisp users in particular are addicted to live objects and the REPL. Rails, I think, may be the first time that a lot of developers are seeing this style of development. That probably has contributed a lot to its hype.

Comparing these two videos, I sincerely hope that more Java developers see the light, but watching the Trails introduction makes me wonder whether we need a drastically new approach before that happens.