September Writing Challenge, Post 6: My Favorite Trick With Swift Enums

I have to credit TJ Usiyan with enlightening me about this use of Swift enums with associated values. Thanks for that, TJ!

If you’ve used Cocoa APIs much, you’ve seen the pattern of using an NSError pointer as an “out” parameter – in Objective-C, it’s the familiar NSError ** argument hanging off the end of the method.

Or in Swift:
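Something like this, say – Record and parse(_:error:) are invented names, and a plain inout parameter stands in for the bridged error pointer:

import Foundation

struct Record {
    let fields: [String]
}

// Hypothetical parser in the Cocoa style: return an optional result,
// and report failure through an "out" error parameter.
func parse(_ input: String, error: inout NSError?) -> Record? {
    guard !input.isEmpty else {
        error = NSError(domain: "RecordParser", code: 1, userInfo: nil)
        return nil
    }
    return Record(fields: input.components(separatedBy: "\t"))
}

var parseError: NSError?
if let record = parse("Smith\t42\t100000", error: &parseError) {
    print(record.fields)
} else if let error = parseError {
    print("parse failed: \(error.localizedDescription)")
}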

I am not a fan of this pattern, for two reasons (or three, if you count aesthetics).

First, the use of an out parameter, like all side effects of processing, breaks the idea of a function. Arguments should go in, a result should come out, there should be no mysteries about what happened in between, and there should be exactly one place to look for the output. This business of putting an important chunk of output state (the existence and kind of error, plus its metadata) out-of-band seems, at best, a misguided attempt to preserve the purity of the intended, happy-path output type.

If that’s all we cared about, we could fix it like so (sticking with Swift from here on):
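Say, something like this – ParseResult and its fields are my guess at the shape, reusing the Record type from the sketch above:

struct ParseResult {
    let record: Record?
    let parsingError: NSError?
}

func parse(_ input: String) -> ParseResult {
    guard !input.isEmpty else {
        return ParseResult(record: nil,
                           parsingError: NSError(domain: "RecordParser", code: 1, userInfo: nil))
    }
    return ParseResult(record: Record(fields: input.components(separatedBy: "\t")),
                       parsingError: nil)
}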

But this also suffers from the second problem with the pattern: You have no idea of the legitimate output states (unless there is documentation, and it is accurate, and you have read it – a triply-nested conditional). It’s pretty likely, however, that the legitimate output states are a subset of the possible output states.

Intuitively, we think we should get a record OR a parsingError, but not both. And that’s kind of a pain, because we still have to check one or the other for being nil, and we should probably check the other one just to be safe, and that involves a lot of conditionals, and as I’m fond of saying, unnecessary conditionals are bad, mmmkay?

But hey – what if your intuition is wrong? What if you could have both fields in that struct filled? Maybe the record could be parsed, but only incompletely, so you get back a partially-filled record and a parsingError letting you know that there’s stuff missing.

And what if both the record and the error are nil? That sounds insane, but I’m sure someone could contrive a scenario where that might be legitimate. Somewhat easier to concoct is the scenario where both come back nil because of a bug.
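Either way, defensive calling code against that struct ends up looking something like this sketch:

let result = parse("Smith\t42\t100000")

if let record = result.record, result.parsingError == nil {
    print(record.fields)                 // the happy path... we hope
} else if let error = result.parsingError, result.record == nil {
    print("parse failed: \(error)")      // the error path
} else {
    print("both? neither? now what?")    // the states your intuition said couldn't happen
}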

This pattern involves a ton of trust – the caller trusts that the code puts out only intended, documented combinations of record and error, despite having no assurance at the language level that this must be so. The writer of the parse() function is trusting that the caller is going to correctly handle all specified output states, and never force-unwrap a nil record object or what-have-you.

Well-designed systems obviate trust. (Ever signed a contract?)

So what is The Better Way™?
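An enum with associated values. Sketched here with the same invented names as before (and NSError could just as well be your own error type):

enum ParseResult {
    case success(Record)
    case failure(NSError)
}

func parse(_ input: String) -> ParseResult {
    guard !input.isEmpty else {
        return .failure(NSError(domain: "RecordParser", code: 1, userInfo: nil))
    }
    return .success(Record(fields: input.components(separatedBy: "\t")))
}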

Ponder that a moment… This makes it so much harder to screw up. To wit:
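Calling code becomes a single exhaustive switch (same invented names):

switch parse("Smith\t42\t100000") {
case .success(let record):
    print("parsed fields: \(record.fields)")
case .failure(let error):
    print("parse failed: \(error.localizedDescription)")
}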

Are you worried about whether a bug will cause both fields to come back nil? Don’t sweat it – it can’t happen. You have exactly two possible states, and the associated objects for both are non-optional. What about getting an error and a result back? Nope, can’t happen. And if you wrap your handling in a switch statement, you are forced to handle both legitimate states. (That is, unless you include a default case, but I’m going to put a stake in the ground right now and say that the use of default when switching on a Swift enum is a code smell.)

But what about that case where the record was parseable, but only incompletely?
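One way to model it – my sketch, not gospel – is a third case, with a small error enum carrying the reason:

enum ParseError: Error {
    case truncatedRecord
    case missingDelimiter
}

enum ParseResult {
    case success(Record)
    case partial(Record, ParseError)
    case failure(ParseError)
}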

…and now you can get back your partial record with an error telling you whether the parsing was incomplete due to a short record or a missing delimiter or what.

And calling code that switches on the result must account for that case somehow.
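Assuming parse(_:) has been updated to return the extended ParseResult, the call site picks up exactly one new case – a sketch:

switch parse("Smith\t42") {               // say this record comes up short
case .success(let record):
    print("complete record: \(record.fields)")
case .partial(let record, let reason):
    print("partial record: \(record.fields), because: \(reason)")
case .failure(let error):
    print("no record at all: \(error)")
}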

Another thing I’ve seen – and this can happen at serious companies run by grown-ups – is a poorly designed record format where the field delimiters are in-band characters that can be mistaken for record data, and multiple interpretations of the record are possible. Can we handle that case?
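Sure – one more case, sketched the same way:

enum ParseResult {
    case success(Record)
    case partial(Record, ParseError)
    case ambiguous([Record])   // every plausible reading of the raw data
    case failure(ParseError)
}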

…and now you can pass back a set of possible interpretations of the ambiguously-formatted data and let the calling code ask the user or take a guess or whatever to select the right one.

I have my beefs with Swift, but enumerations with associated values are one place where Swift gets it so very, very right. You can define system states in a way that is both complete and exclusive – you can make illegal states unrepresentable, to borrow a phrase. No more nil checks, no more wondering what the legit outputs are, no more worrying (or anyway, less worrying) about whether a junior dev calling your code will handle all the cases. The compiler has you covered – so now you can spend your effort on the real, value-adding stuff.

Comment fodder: Do you have any other cool examples of making illegal states unrepresentable? No need to limit it to Swift – I’d love to learn more about how other languages do it!

September Writing Challenge, Post 3: Go Broad, Not Deep

I know developers – mostly younger than me – who have most of the APIs of the Cocoa Touch SDK memorized, or nearly so. Twenty years ago, I was one of those devs, except with MFC and ATL in Visual C++. Before that, I was pretty tight with the Windows 3.1 API. And before that, I could write assembler for my Commodore 64 without looking at a reference (much).

These days, I spend a lot of time looking at the docs for whatever SDK I’m working with. A lot. I read over the release notes for major version upgrades, so I know what’s there, and where to find it. But if a job interview hinged on me knowing the exact method signatures of everything in the UITableViewDelegate protocol, I would not get that job.

What changed for me? I don’t think my brain has gotten that much less nimble. I do, however, spend my attention on very different things.

On most modern web, mobile, and native desktop platforms, the API you’re learning today will change significantly in the next year. It will be legacy folderol in five. And if you change your specialty or career track, deep knowledge of the specifics of the API you were working on yesterday drops to near-zero value tomorrow. Going deep on details that will be useless in such a short timeframe does not seem like a good investment of my attention.

On the other hand, knowing the best and worst case time and memory bounds on common sort algorithms will be useful throughout my software career. Knowing how to run an Agile team is useful regardless of what technology I’m working with. Clean Coding practices will make (and have made) my code more robust regardless of what language I’m using. (Except Perl. I mean, come on.)

My high school geometry teacher told us, “life is too short to memorize the Law of Cosines”. There is information you rarely use that can be easily looked up, and you’re better off studying the stuff you’re going to use every day.

So I try to go deep on those things that I can use broadly – process, CS fundamentals, and craft. The APIs I use often enough will stick. Those that don’t can be looked up or autocompleted. And next time I want to change specialties (as I’ve done probably four or five times in my career), I’ll have a foundation that can’t be matched by someone who spent his attention on more ephemeral stuff.

Comment section fodder: What are the things you focus on that you expect to serve you well for the length of your career?

September Writing Challenge, Post 1: The Worst Thing I Have Seen in Object-Oriented Code, and How to Fix It

This is post #1 in my 30-day writing challenge for September. A couple of notes:

  1. This post is about a technical topic (because I am, after all, me), but not all my posts this month will be. So if today’s post bores you or makes your eyes glaze over, try again tomorrow.
  2. This went way over the five minutes allotted – so far over that I just gave in to the urge to write a (nearly) complete post about it. I think one thing I might try to learn from this challenge is how to break down and/or distill an idea to where it can be expressed in five minutes of top-speed typing – so 400-500 words with no links, editing, or formatting, less if I want to be fancy. And maybe I could back off on the editing perfectionism. It might also help if I avoid topics that activate my ranting gene.

Anyway, on with today’s episode…

This topic goes a little beyond “pet peeve” for me. It’s an anti-pattern I’ve seen far, far too often from novice – and, more rarely, intermediate – coders in object-oriented languages, and it’s guaranteed to make code bug-ridden and insanely expensive to maintain. I know this because I’ve had to alter, debug, and refactor code like this.

Here’s how it happens: You have a class – let’s say, a table cell. The cell shows an employee record with, let’s say, name, ID#, and salary. Very straightforward.

A requirement is added that two different employee types (regular employee and manager) should have different background colors. So you do something like this in your Employee class (pardon my Swift):
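Something like this, say (the property names are my invention; the shape is what matters):

import Foundation

enum EmployeeType {
    case employee
    case manager
}

class Employee {
    let name: String
    let employeeID: Int
    let salary: Decimal
    let type: EmployeeType

    init(name: String, employeeID: Int, salary: Decimal, type: EmployeeType) {
        self.name = name
        self.employeeID = employeeID
        self.salary = salary
        self.type = type
    }
}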

…and then in the code that loads the table cell:
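A sketch, assuming a plain UITableViewDataSource (the details are invented, but the shape should look familiar):

import UIKit

class EmployeeListDataSource: NSObject, UITableViewDataSource {
    var employees: [Employee] = []

    func tableView(_ tableView: UITableView, numberOfRowsInSection section: Int) -> Int {
        return employees.count
    }

    func tableView(_ tableView: UITableView, cellForRowAt indexPath: IndexPath) -> UITableViewCell {
        let cell = tableView.dequeueReusableCell(withIdentifier: "EmployeeCell", for: indexPath)
        let employee = employees[indexPath.row]

        // The seemingly harmless part: branch on the type enum to pick a color.
        switch employee.type {
        case .employee:
            cell.backgroundColor = UIColor.white
        case .manager:
            cell.backgroundColor = UIColor.lightGray
        }

        // ...configure the name, ID#, and salary labels here...
        return cell
    }
}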

So you’ve adjusted the background color according to employee type. The requirement is met. Pretty benign, yes?

Then the requirement is added that company officers (who are also managers) need an extra line added to show their equity-based compensation. So, you add another type to the enum (now it’s Employee/Manager/Officer), and in the function that computes the table cell’s height, you add an if statement that computes the height one way for an Officer, and another way for everyone else. And oh, yeah, you go back and make sure the Officer case is covered when you set the table cell’s background color.

Then you add a requirement that you must handle contractors differently: They need a third background color, they show hourly rate instead of salary, they need an extra line in the table cell – but this one shows their security clearance – and when the cell is selected, it takes you to a different kind of detail view than the other types. So you add a type to the enumeration, add a case to the background color switch statement, change how the cell draws so that Contractor gets an hourly rate, while everyone else gets a salary, change the if in the cell height computation to be a switch and compute the height for a Contractor like you do for an Officer (since they’re both adding an extra line), and you add another conditional in the cell selection response to differentially choose what kind of detail screen comes up based on employee type.
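By this point the cell code probably contains a fragment like this (numbers arbitrary; it assumes the enum has since grown officer and contractor cases – the conflation is the point):

func heightForCell(showing employee: Employee) -> CGFloat {
    switch employee.type {
    case .officer, .contractor:     // conflated: both add one extra line... for now
        return 66
    case .employee, .manager:
        return 44
    }
}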

But then the Officer type needs to add yet another line to the table cell to indicate how many shares of the company the officer holds, so you have to go break apart the conflated Officer and Contractor cases in the cell height computation and turn them into two separate behaviors.

And then a type is added for a part-time employee, which also has an hourly rate but no extra line…

And you start using this type enumeration to switch between business logic cases elsewhere in the code…

(By this point, many of you know where I’m going. If you don’t, please, please read on.)

Every single one of those switch/case statements becomes an opportunity to forget to add a case. (Less so in a few languages, such as Swift, that demand that you cover all cases – but even then it’ll bite you if you add a default case.) Very quickly, every switch/case becomes a tangled mess where cases are conflated and you’ll have to separate them when requirements change. What used to be a simple table cell becomes a 2000-line behemoth. Listen: I am not exaggerating. I have seen this table cell with 2000 LoC, and methods hundreds of lines long, because of exactly what I’m describing here.

And what do you think the Employee class looks like? Or any other class that touches the Employee class?

Every time you have to make a change related to employees, you dread it, because you know you’ll spend a day or more combing through all the cases and debugging all these complex decision paths for even the simplest change.

Where did we go so wrong?

The screw-up was here:
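Namely, in that innocent-looking switch from the cell-loading sketch:

switch employee.type {
case .employee:
    cell.backgroundColor = UIColor.white
case .manager:
    cell.backgroundColor = UIColor.lightGray
}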

Actually, even that is too much. The screw-up was here:
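Right back at the top, in the sketch’s very first lines:

enum EmployeeType {
    case employee
    case manager
}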

If you catch yourself representing a type with an enumeration (or a similar device in your language of choice), stop. Stop. Stop! STAAAAAAAAHP.


There’s a better way to represent types in an object-oriented language. You represent a type with a type:
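For example – a sketch, and your hierarchy and property names will differ:

import Foundation

class Worker {
    let name: String
    let employeeID: Int

    init(name: String, employeeID: Int) {
        self.name = name
        self.employeeID = employeeID
    }
}

class Employee: Worker {
    var salary = Decimal(0)
}

class Manager: Employee { }

class Officer: Manager {
    var equityCompensation = Decimal(0)
    var sharesHeld = 0
}

class Contractor: Worker {
    var hourlyRate = Decimal(0)
    var securityClearance = ""
}

class PartTimeEmployee: Worker {
    var hourlyRate = Decimal(0)
}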

And then the thing you absolutely do not do is switch on the class of an employee instance to determine behavior in other classes like your table cell. (I’ve seen that done too, and it’s even worse than the enumeration anti-pattern.) When a component like a table cell needs to change behavior based on a type, you make a table cell class for each class of worker you have to represent. Each child class contains only the things that make it special, and commonalities are moved up to a parent class.
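Sketched in the same spirit (UIKit flavor, details invented):

import UIKit

// Shared layout lives in the parent; each subclass adds only what makes it special.
class WorkerCell: UITableViewCell {
    let nameLabel = UILabel()
    let idLabel = UILabel()
}

class EmployeeCell: WorkerCell {
    let salaryLabel = UILabel()
}

class OfficerCell: EmployeeCell {
    let equityLabel = UILabel()      // the Officer-only extra lines
    let sharesLabel = UILabel()
}

class ContractorCell: WorkerCell {
    let hourlyRateLabel = UILabel()
    let clearanceLabel = UILabel()   // the Contractor-only extra line
}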

What you wind up with instead of the tangled mess of switch/case statements is a larger number of much smaller classes, each with a radically reduced number of conditional statements – which means a radically reduced number of chances to screw up.

This is what objects are for in an object-oriented language – tying together groups of related types that have some common behavior and some divergent behavior, while minimally expressing the things that make each type divergent. This is the primary way we manage complexity in the OO milieu.

Of course, now we have multiple parallel class hierarchies – a hierarchy of Worker types, a matching hierarchy of table cell types, maybe a matching hierarchy of components to calculate compensation… Does that sound unwieldy or difficult to manage?

You and I are not the first to notice the problem – it is addressed with the Abstract Factory Pattern. I’d describe that to you, but it is amply explicated elsewhere. And I urge you to learn it, because it’s tremendously valuable for managing complexity in your code.

And I promise, I’ll follow up this post with an example of how to use that pattern in a case like this. (Though possibly not in September.)

Does anyone else have any computing language abuses they’d like to share? Any language, any paradigm – drop some science in the comments below!

Tuesday Talk

A note for interested parties: I’ll be giving a short (10-15 minute) talk on using Ruby with Hadoop for distributed computing. The plan is to give an ultra-brief description of the MapReduce algorithm and Hadoop, show 2 examples of working code (including one with Wukong, Flip Kromer’s Hadoop Streaming wrapper), and close with notes on my attempts to use JRuby to create Hadoop jobs.

The talk will be on Tuesday at the NYC.rb meeting, 7:00 at Bway.net’s offices. Get there early for a seat – the last one was packed.

Status Report

There’s a lot going on.  Yowza.

For starters: I’ve left my day job and gone back to consulting. As is my policy, I’m not going to reveal clients here, but they’re an interesting cross-section of business verticals and projects, and one prospective client in particular already has me in touch with people on at least 3 continents, all from my desk in Kew Gardens. I’m feeling very much like one of those digital nomads that the post-Web-2.0 techno-hypesters like to talk about. And liking it.

I also have a part-time, on-site gig in a very trendy New York neighborhood. On some evenings, there is a truck vending “artisanal ice cream” parked outside this client’s offices. (I have not yet seen it move, nor heard it play an endless loop of 8 bars of “Pop Goes the Weasel”.) It’s little details like that that keep the New York experience refreshingly weird.

Also – and I should have posted about this ages ago, but you know how it is – the Rails Rumble 2008 was a blast, and I know more in my bones about building complex messaging systems than I did before the Rumble. Our entry was a multi-player word game, and you may play the Rumble incarnation of it here. We’ll be blogging about Rumble lessons and putting up an update of the game in the next few weeks.

And speaking of “we”, Gabe, Abel and I have added a whimsical name to our hacker cabal; blog posts about the Rumble experience will be posted over at Kickass Labs. (Note to Abel: Get a site up already so I can link to you properly. A one-pager will do.)

In other news: When I’m not hustling paying work or hacking w/ the KAL crew, I have plenty of my own projects to work on. To wit: I have a goal to fix a minor bug in GuitarCardio this week, I will probably take down Rewardist for the time being, and I’m currently investigating solutions to data representation issues in my super-top-secret Hadoop project.

And that’s enough blogging. Back to work.

Ruby One-Liner

I know it’s not good engineering practice, but I do love code golfing and writing one-liners in Ruby. This turns a tab-delimited flat file into an imperfectly compliant CSV file:

File.open($*[0]) { |f| puts f.readline.strip.split("\t").inject([]) { |newfields, field| newfields << "\"#{field}\"" }.join(',') until f.eof? }

It works well as long as there are no double quotes in your fields – embedded commas are actually fine, since every field gets wrapped in quotes.

Fix for sad sed

I love that Mac OS X is really BSD under the hood, but sometimes it causes unexpected pain. More than once I’ve tripped over the differences in date between Linux and Mac OS X, and over this weekend I’ve had trauma with sed.

Specifically, if I have a single character I want to replace in a file – like replacing an out-of-band delimiter in a flat file with a TAB character – I’d normally say sed 's/\d197/\d009/g' < foo > bar. Sadly, this doesn’t work with Mac OS X’s sed.

Ruby to the rescue, in the form of my own sadsed.rb:
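The whole thing amounts to a few lines – roughly this sketch (the real script may differ in detail):

#!/usr/bin/env ruby
# sadsed.rb: swap the out-of-band delimiter (byte 197) for a TAB and print to stdout.
pattern     = Regexp.new(Regexp.escape(197.chr))
replacement = 9.chr

File.open(ARGV[0], 'rb') do |f|
  f.each_line { |line| print line.gsub(pattern, replacement) }
end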

And then just ./sadsed.rb foo > bar and away you go.

And for the record: I’m dealing with some big files, and Ruby’s Regexp is pretty snappy.

CSS Floats and Quantum Mechanics

Yesterday, I spent more time than I care to admit getting floated page elements to look right. I am not kidding when I say that I found quantum mechanics more intuitive than I find CSS floats now.

Of course, I put more effort into studying quantum mechanics. And maybe there’s some similarity here – once you learn a few hairy rules and what they really mean, the weird results make sense.

That was more gripe-y than educational. I’ll post something meatier next time.