« XML-RPC Interface | Poland: Global Power » |
Today I owe a shoutout to the Waypath Project. Steve Nieker was kind enough to share his list of about a hundred thousand websites, and suddenly my crawler went from adding 200 blogs per hour to adding 11,000. At this rate, we might hit 200,000 weblogs indexed later in the night.
The sites being added now have the proportions of a martini, in which the gin is represented by Pitas.com, and Movable Type serves as the vermouth. A dry martini. With a Manila olive in it.
The Waypath Project is worth a visit because they are trying to do two very challenging and cool things. The first is to provide per-post weblog search, rather than the kind of per-page search you can get on Blogdex or Google. The second is to search for things based on similarities in content, rather than just doing keyword matches. This is the kind of stuff I've taken to calling 'semantic indexing', just to completely muddy the semantic waters. Their core technique is proprietary, unfortunately, so there's no code you can go poke your big Slavic nose into. But it's still nice to see a content technique actually implemented on live data from the Web.
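Since there is no Waypath code to poke at, here is a rough sketch of what content-based similarity can look like in its most generic form. To be clear, this is not Waypath's method, which I have not seen. It is plain bag-of-words cosine similarity, the textbook starting point, and the function names and sample posts are made up for illustration. Each post is scored on its own, which is the per-post idea in miniature.

```python
import math
import re
from collections import Counter

def term_vector(text):
    """Lowercase the text and count word occurrences (a bag-of-words vector)."""
    return Counter(re.findall(r"[a-z']+", text.lower()))

def cosine_similarity(a, b):
    """Cosine of the angle between two sparse term-count vectors."""
    dot = sum(count * b[term] for term, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty document is similar to nothing
    return dot / (norm_a * norm_b)

def most_similar(query, posts):
    """Score every post against the query and return (score, post) pairs, best first."""
    query_vector = term_vector(query)
    scored = [(cosine_similarity(query_vector, term_vector(p)), p) for p in posts]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)

if __name__ == "__main__":
    # Hypothetical posts, not real crawl data.
    posts = [
        "a dry martini with gin and vermouth",
        "crawling weblogs with a perl script",
        "gin and tonic",
    ]
    for score, post in most_similar("gin martini", posts):
        print(f"{score:.3f}  {post}")
```

The difference from keyword matching is that every post gets a graded score based on its whole vocabulary, rather than a yes or no on whether it contains the query words. Anything deserving the name 'semantic' would have to go well beyond raw word counts, which is presumably where the proprietary part comes in.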
Thanks in large part to Google, search engine designers have learned the importance of analyzing hyperlinks to improve their search results. On the content side, however, the approaches have remained pretty rudimentary. Various cool methods for calculating content-based similarity are being explored in the academic world, but those algorithms don't often leave the ivory tower, where they are mainly valued for their ability to impress a tenure committee.
Waypath is still quite experimental, so you have to approach it with a good deal of patient understanding. But it's a fascinating site to play with, as you look at your results and try to suss out why the engine made certain connections. And as with all good things, there's a blog attached.
Your Host
Maciej Cegłowski
maciej @ ceglowski.com