« XML-RPC Interface | Poland: Global Power »
05.15.2003

Waypath

Today I owe a shoutout to the Waypath Project. Steve Nieker was kind enough to share his list of about a hundred thousand websites, and suddenly my crawler went from adding 200 blogs per hour to adding 11,000. At this rate, we might hit 200,000 weblogs indexed later in the night.

The sites being added now have the proportions of a martini, in which the gin is represented by Pitas.com, and Movable Type serves as the vermouth. A dry martini. With a Manila olive in it.

The Waypath Project is worth a visit because they are trying to do two very challenging and cool things. The first is to provide a per-post weblog search, rather than the kind of per-page search you can get on Blogdex or Google. The second is to search things based on similarities in content, rather than just doing keyword matches. This is the kind of stuff I've taken to calling 'semantic indexing', just to completely muddy the semantic waters. Their core technique is proprietary, unfortunately, so there's no code you can go poke your big Slavic nose into. But it's still nice to see a content technique actually implemented on live data from the Web.

Thanks in large part to Google, search engine designers have learned the importance of analyzing hyperlinks to improve their search results. On the content side, however, the approaches have remained pretty rudimentary. Various cool methods for calculating content-based similarity are being explored in the academic world, but those algorithms don't often leave the ivory tower, where they are mainly valued for their ability to impress a tenure committee.
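To give a flavor of what "content-based similarity" can mean, here is a minimal sketch of one textbook approach: turn each post into a TF-IDF weighted word vector and compare posts by cosine similarity. This is emphatically not Waypath's method (theirs is proprietary, and presumably much smarter); the function names `tokenize`, `tfidf_vectors`, `cosine`, and `most_similar` are made up for illustration, and the smoothed IDF formula is just one common choice.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it on anything that is not a letter."""
    cleaned = "".join(c.lower() if c.isalpha() else " " for c in text)
    return cleaned.split()


def tfidf_vectors(posts):
    """Return one sparse TF-IDF vector (a dict of word -> weight) per post."""
    token_lists = [tokenize(p) for p in posts]
    n = len(posts)
    # Document frequency: in how many posts does each word appear?
    df = Counter()
    for tokens in token_lists:
        df.update(set(tokens))
    vectors = []
    for tokens in token_lists:
        tf = Counter(tokens)
        # Smoothed IDF (an assumption of this sketch): rare words weigh more,
        # but a word found in every post still keeps a small positive weight.
        vectors.append(
            {w: count * (math.log((1 + n) / (1 + df[w])) + 1.0)
             for w, count in tf.items()}
        )
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(weight * b.get(word, 0.0) for word, weight in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def most_similar(query_index, posts):
    """Rank every other post by similarity to posts[query_index], best first."""
    vectors = tfidf_vectors(posts)
    scores = [
        (cosine(vectors[query_index], vec), i)
        for i, vec in enumerate(vectors)
        if i != query_index
    ]
    return sorted(scores, reverse=True)
```

Calling `most_similar(0, posts)` on a list of post texts returns `(score, index)` pairs for the other posts, closest first. The difference from a keyword search is that the query is an entire post rather than a word or two you typed in, though this sketch still only notices shared vocabulary and knows nothing about meaning.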

Waypath is still quite experimental, so you have to approach it with a good deal of patient understanding. But it's a fascinating site to play with, as you look at your results and try to suss out why the engine made certain connections. And as with all good things, there's a blog attached.

Your Host

Maciej Cegłowski


Threat

Please ask permission before reprinting full-text posts or I will crush you.