Cormac Twomey

Cormac Twomey is a nice guy. I met him at this year's Emerging Technology conference, and we geeked out on search engines several times. He and his company are big believers in RDF; they have a nice demo search engine for medical data, and they don't seem to be evil. Cormac is a compact, friendly guy; you wouldn't think twice about introducing him to the parents.

Yet today I find Cormac pilloried on Mark Pilgrim's website for the sin of ignoring a robots.txt file and repeatedly downloading image files. Having recently written my first crawler, I'm shaking in my boots a bit at the idea of this kind of vigilante justice. It's very easy to make programming mistakes, or lazy errors in judgement, when testing out a crawler. I was mortified to find that my own crawler had been re-indexing sites that returned an error, over and over again. Fortunately, none of those sites was as trigger-happy as Dive Into Mark, and I was able to fix it without scandal.

Mark's response seems disproportionate. It's like shooting at the kids who try to climb into his plum tree.

But Cormac, a warning: try swiping bandwidth from Idle Words, and I'm getting Orrin Hatch to destroy your computer.

