Tuesday Talk

A note for interested parties: I’ll be giving a short (10-15 minute) talk on using Ruby with Hadoop for distributed computing. The plan is to give an ultra-brief description of the MapReduce algorithm and Hadoop, show 2 examples of working code (including one with Wukong, Flip Kromer’s Hadoop Streaming wrapper), and close with notes on my attempts to use JRuby to create Hadoop jobs.

The talk will be on Tuesday at the NYC.rb meeting, 7:00 at Bway.net’s offices. Get there early for a seat; the last one was packed.

Ruby One-Liner

I know it’s not good engineering practice, but I do love code golfing and writing one-liners in Ruby. This turns a tab-delimited flat file into an imperfectly compliant CSV file:

File.open($*[0]) { |f| puts f.readline.strip.split("\t").inject([]) { |newfields, field| newfields << "\"#{field}\"" }.join(',') until f.eof? }

It works well as long as there are no double quotes in your fields. (Commas are fine, since every field gets wrapped in quotes.)
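For input that might contain embedded double quotes, here is a slightly longer sketch that doubles them, which is the usual CSV escaping convention. The helper name tsv_line_to_csv is made up for illustration, and the choice to keep empty trailing fields is an assumption about what you'd want; the one-liner above drops them.

```ruby
# Sketch: convert one tab-delimited line into a fully quoted CSV line.
# tsv_line_to_csv is a hypothetical helper name, not part of the one-liner above.
def tsv_line_to_csv(line)
  fields = line.chomp.split("\t", -1) # chomp keeps edge tabs; -1 keeps empty trailing fields
  quoted = fields.map do |field|
    '"' + field.gsub('"', '""') + '"' # double any embedded quotes, then wrap the field
  end
  quoted.join(',')
end
```

Something like File.foreach(path) { |line| puts tsv_line_to_csv(line) } would then do the whole file.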

Fix for sad sed

I love that Mac OS X is really BSD under the hood, but sometimes it causes unexpected pain. More than once I’ve tripped over the differences in date between Linux and Mac OS X, and over this weekend I’ve had trauma with sed.

Specifically, if I have a single character I want to replace in a file – like replacing an out-of-band delimiter in a flat file with a TAB character – I’d normally say sed 's/\d197/\d009/g' < foo > bar. Sadly, this doesn’t work with Mac OS X’s sed.

Ruby to the rescue, in the form of my own sadsed.rb:
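A minimal sketch of what such a script might look like (not necessarily the original): it assumes the out-of-band delimiter is the single byte with decimal value 197, as in the sed command above, and treats the file as raw bytes so the input's encoding doesn't matter. The names DELIMITER, TAB and replace_delimiter are illustrative.

```ruby
#!/usr/bin/env ruby
# Hypothetical sadsed.rb-style filter: swap one delimiter byte for a TAB.

DELIMITER = 197.chr # assumed delimiter: byte 197 (0xC5), as a binary string
TAB = "\t".b        # TAB as a binary string

# Returns a binary copy of the line with every delimiter byte replaced by a TAB.
def replace_delimiter(line)
  line.b.gsub(DELIMITER, TAB)
end

if __FILE__ == $PROGRAM_NAME
  # Process each file named on the command line, writing the result to stdout.
  ARGV.each do |path|
    File.open(path, 'rb') do |file|
      file.each_line { |line| $stdout.write(replace_delimiter(line)) }
    end
  end
end
```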

And then just ./sadsed.rb foo > bar and away you go.

And for the record: I’m dealing with some big files, and Ruby’s Regexp is pretty snappy.

Lessons from an Unintentional Launch

As I mentioned previously in this space, I’ve written a web app called GuitarCardio.com. Of course, posting about it here informs nobody but a few family and friends, and then only a few days later when they get around to checking the site. Likewise with my Twitter feed (which has an even more circumscribed set of people who care), Facebook, etc.

I also post to a few guitar-related blogs and fora, and I got a few hits from that. It’s not lots of traffic, but the people who came were at least interested in the topic.

The other night, on a lark, I put GuitarCardio on StumbleUpon, which is a neat little tool for finding new stuff on the web. Within hours, I had over 5,000 new visitors. Over the past 48 hours, that grew to over 12,000 new visitors, many of whom were staying at the site. GC peaked at #2 on the del.icio.us popular links, and made the front page of popurls. People actually gave pleasant, useful comments on the blog, and I got linked on Twitter and on multiple blogs.

I’m not so much bragging about any of this, but more expressing how agog I am at the whole notion. It had not occurred to me that any of that stuff might happen.

The even better news is that the site stayed hyper-responsive (within the limits of a good shared host) through the whole traffic wave. A few 500 errors or “sorry, my database is bogged” messages, and you’ve dug a pit of negative goodwill from which your site might never emerge.

So, lessons learned:

  • The site stayed responsive in part because currently there is no database component to the site (though this will change soon). Response codes for every one of the tens of thousands of requests over the past 48 hours have been either 200 or 302 – no 500s, which are the bane of Rails apps on shared hosts.
  • I host no graphics on the page, which I think also helped performance. It is possible to make an eye-catching design with just HTML and CSS.
  • If you’re going to use a service like StumbleUpon, be ready. If my site weren’t able to handle the traffic, I wouldn’t have been able to do anything about it from work yesterday morning. I kind of lucked out there, due to my minimalist design decisions.
  • Services like StumbleUpon can work for you, if you’ve made something people actually want.

All the activity was really a surprise to me, as this is the first time I’ve put something (of my own, anyway) on the web that anyone I don’t know was actually interested in. (This is no great surprise – my interests and priorities intersect with those of the general population only rarely.) So thanks to all those who did care enough to say something nice about GuitarCardio – who voted it up on StumbleUpon, and who tweeted, blogged, and bookmarked it.

Now, I need to digest some of the great feedback I’ve gotten on the GuitarCardio blog, pore over the Google Analytics data, see how my ads performed, &c.

And then, I’ll be making the site even better. The GC blog will cover the features I add and when I’ll be unveiling them, and I’ll be posting about technical lessons learned here.


Painful Ruby Inheritance Quirk

Let’s say you have a class Foo:

Further, let’s say you have a class Bar, derived from Foo, that redefines a couple of things:

It seems kind of obvious that:

But would you have guessed that:

This caused me no end of headaches during a weekend bout of refactoring. Pushing default constants up to the parent class completely hosed me up in those child classes that tried to override them. Overriding dumb getter methods that just return constants, though, works fine.
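The class listings aren't shown here, so the following is a hypothetical reconstruction of the behavior described rather than the original code. The constant GREETING and the methods name and greeting are invented for illustration.

```ruby
# Hypothetical stand-ins for Foo and Bar; all member names are illustrative.
class Foo
  GREETING = 'hello from Foo'

  # A "dumb getter" that just returns a fixed value.
  def name
    'foo'
  end

  # Refers to the constant by its bare name.
  def greeting
    GREETING
  end
end

class Bar < Foo
  GREETING = 'hello from Bar' # tries to override the parent's default

  def name
    'bar' # overriding the getter works as expected
  end
end

Bar.new.name      # => "bar"
Bar::GREETING     # => "hello from Bar"
Bar.new.greeting  # => "hello from Foo", the inherited method still sees Foo's constant
```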

I’m sure there’s some computer-sciencey reason for this that Matz could explain to me, but I had really expected the every-call-is-a-message nature of Ruby to handle those two cases more consistently.
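For what it's worth, the difference comes down to how the two are resolved: a method call is dispatched on the receiver at run time, while a bare constant name is looked up starting from the class body where the code is written. One common workaround, sketched here with made-up names (Parent, Child, LIMIT), is to ask the receiver's class for the constant explicitly.

```ruby
# Illustrative names only; this is a sketch of the workaround, not the original code.
class Parent
  LIMIT = 10

  def lexical_limit
    LIMIT # resolved from Parent, where this method is written
  end

  def dynamic_limit
    self.class::LIMIT # resolved starting from the receiver's own class
  end
end

class Child < Parent
  LIMIT = 99
end

Child.new.lexical_limit  # => 10
Child.new.dynamic_limit  # => 99
Parent.new.dynamic_limit # => 10
```

A subclass that doesn't define LIMIT still works with dynamic_limit, because the lookup falls back to the parent's constant.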