Markup Proposal

Part of the problem in indexing weblogs is finding them in the first place. Weblogs.com and sites like it are a start, but there are plenty of weblogs that don't announce their updates anywhere. The only way to find them is by crawling.

Once you've found a weblog, you still have a problem. It's not easy to find dates, link lists, or boundaries between weblog posts. There are a zillion different formats, and none of them are all that consistent. It would be nice to offer a per-post search engine, for example, but right now it's not feasible. Is that text a post, or a comment, or a TrackBack, or part of the template, or what, exactly?

A couple of days ago, I traded ideas with Dave Sifry and Steve Nieker, and we came up with a proposal for blog tool writers. Four small changes that would make weblog pages much easier to identify and parse:

  1. An identifying tag in the HTML head:
    <meta name="generator" content="BlogTool 1.4" />
  2. Delimiters around each post (with an optional GMT datestamp):
    <!-- begin post [published='hh:mm:ss mon-day-year TZ'] -->
    <!-- end post -->
  3. A delimiter around the blogroll:
    <!-- begin linklist -->
    <!-- end linklist -->
  4. Permalinks explicitly labeled:
    <a name="ID" ... >
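
To give a feel for how little work a crawler would need to do with these four markers, here is a rough Python sketch. It is only an illustration, not part of the proposal: the function name parse_weblog and the sample page are made up, and it makes two assumptions the list above doesn't spell out. First, it reads the square brackets as "optional", so a dated post opens with <!-- begin post published='...' --> and an undated one with plain <!-- begin post -->. Second, it assumes each post's permalink anchor sits inside that post's delimiters. Regular expressions are good enough here precisely because the delimiters are fixed strings; a real crawler would probably want something sturdier.

```python
import re

# Item 1: the generator tag in the page head.
GENERATOR_RE = re.compile(
    r"<meta\s+name=\"generator\"\s+content=\"(?P<tool>[^\"]*)\"",
    re.IGNORECASE,
)

# Item 2: post delimiters, with the published attribute treated as optional
# (an assumption about what the square brackets in the proposal mean).
POST_RE = re.compile(
    r"<!--\s*begin post(?:\s+published='(?P<published>[^']*)')?\s*-->"
    r"(?P<body>.*?)"
    r"<!--\s*end post\s*-->",
    re.DOTALL | re.IGNORECASE,
)

# Item 3: link list delimiters.
LINKLIST_RE = re.compile(
    r"<!--\s*begin linklist\s*-->(?P<body>.*?)<!--\s*end linklist\s*-->",
    re.DOTALL | re.IGNORECASE,
)

# Item 4: a named anchor used as the permalink target.
ANCHOR_RE = re.compile(r"<a\s+name=\"(?P<id>[^\"]+)\"", re.IGNORECASE)


def parse_weblog(html):
    """Pull the generator, posts, and link list out of one weblog page."""
    generator = GENERATOR_RE.search(html)
    linklist = LINKLIST_RE.search(html)
    posts = []
    for match in POST_RE.finditer(html):
        body = match.group("body")
        # Assumes the permalink anchor lives inside the post delimiters.
        anchor = ANCHOR_RE.search(body)
        posts.append({
            "published": match.group("published"),  # None if no datestamp
            "permalink_id": anchor.group("id") if anchor else None,
            "body": body.strip(),
        })
    return {
        "generator": generator.group("tool") if generator else None,
        "posts": posts,
        "linklist": linklist.group("body").strip() if linklist else None,
    }


# A made-up page using the proposed markup.
SAMPLE = """<html><head>
<meta name="generator" content="BlogTool 1.4" />
</head><body>
<!-- begin post published='14:05:00 jan-12-2004 GMT' -->
<a name="p101"></a><p>First post.</p>
<!-- end post -->
<!-- begin post -->
<p>Undated post.</p>
<!-- end post -->
<!-- begin linklist -->
<a href="http://example.com/">Example</a>
<!-- end linklist -->
</body></html>"""

if __name__ == "__main__":
    page = parse_weblog(SAMPLE)
    print(page["generator"], len(page["posts"]))
```

Everything outside the post and link list delimiters (template, comments, TrackBacks) simply never shows up in the result, which is the whole point.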

Crufty, inelegant, and a pale shadow of what RSS offers, sure. But it's something that would make a majority of sites more visible to search engines.

Textpattern gets not only fulsome praise for being the first CMS to sign on, but additional style points for requiring me to replace the word 'blogroll' with something less linguistically odious.

What do you think, gentle reader?

