Distributed Search Engines

I promised a post a few days ago about distributed search engines, but I've been dilly-dallying about it. It's the holidays, and we're all full of turkey and cookies.

In my earlier post, I fretted about how Google and other centralized search services like it had become a bottleneck to finding information online, and could therefore become a tempting target in the drive to regulate (and even censor) Internet content. But there is a more powerful, positive argument to make in favor of distributed search engines: people are assembling their own collections of information, in the form of websites, discussion groups, blogs, and more traditional forms of writing, but there is still no way to selectively search this content. You can go to Google and search the entire Internet, or you can use a variety of rudimentary search tools on your own computer or on individual public websites. What you can't do is say "search the New York Times, the blogs in my blogroll, and the Wayback Machine for documents similar to the email message I just sent". A distributed system would fill that middle ground.

Right up front it's important to say that peer-to-peer search engines wouldn't be intended to replace centralized services like Google, any more than weblogs have supplanted large news or commentary sites like Salon or the New York Times. Instead, they would serve the same purpose as weblogs do, which is to create neighborhoods for specialized information, and make it easy to find, join, and participate in niche communities of knowledge.

Mena Trott mentions a phenomenon that you can often see by monitoring your referrer logs: a post on an arcane topic will become the hub of a little universe of interest. In her case, an attached discussion became the locus for a whole little special-interest group, with visitors coming in via Google, answering one another's questions and keeping the post 'alive' outside the context of the weblog itself.

A peer-to-peer search engine would make such microcommunities easier to find, and easier to sustain. Instead of relying on an Internet-wide portal like Google, you would run searches through a personal search client; this could be a Web application, or a more fully-featured desktop application, like a blog aggregator. The client would let you seek out searchable collections through a kind of meta-search, akin to the way Gnutella and other file sharing networks discover new nodes, and create "search lists" of interesting sites to send queries to, much like an iTunes playlist. You could also keep a list of favorite queries, which you would periodically send out to chosen blocks of search engines, to find newly added material.
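To make the playlist analogy a little more concrete, here is a minimal sketch in Python of how a client might store search lists and favorite queries. Nothing here comes from a real system: the names SearchList, SavedQuery, and plan_requests, and the example.com style URLs, are all made up for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class SearchList:
    """A named group of search endpoints, like a playlist of engines (hypothetical)."""
    name: str
    endpoints: list = field(default_factory=list)


@dataclass
class SavedQuery:
    """A favorite query tied to the search list it should be sent to (hypothetical)."""
    text: str
    list_name: str


def plan_requests(lists, queries):
    """Return (endpoint, query text) pairs for every saved query."""
    by_name = {sl.name: sl for sl in lists}
    plan = []
    for q in queries:
        target = by_name.get(q.list_name)
        if target is None:
            # Skip queries whose search list no longer exists.
            continue
        for endpoint in target.endpoints:
            plan.append((endpoint, q.text))
    return plan


# Placeholder endpoints; a real client would discover these through meta-search.
blogroll = SearchList(
    "blogroll",
    ["https://blog-a.example/search", "https://blog-b.example/search"],
)
favorites = [SavedQuery("distributed search", "blogroll")]
plan = plan_requests([blogroll], favorites)
```

A client could run plan_requests on a timer and send each pair out over whatever protocol the engines agree on, which is how the periodic "favorite queries" idea would work in practice.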

Queries would go out to each little search engine through a standardized API (most likely a web services protocol), and each engine would return a ranked list of relevant hits. The client could then recombine those into a single ranked list of hits, and allow you to do all the usual post-filtering: exact phrase matches, sorting by date, and everything else we're used to being able to do in a decent search engine.
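The recombining step is the only part that takes any thought, since relevance scores from different engines are not likely to be comparable. One way around that, sketched below, is to ignore the scores and merge on rank alone using reciprocal rank fusion. This is my choice of technique for the sake of the example, not something the design above requires; the function name, the constant k=60, and the sample URLs are all assumptions.

```python
from collections import defaultdict


def merge_ranked_lists(result_lists, k=60):
    """Merge several ranked lists of URLs using reciprocal rank fusion.

    Each URL earns 1 / (k + rank) for every list it appears in, so a hit
    ranked highly by several engines rises to the top of the merged list.
    """
    scores = defaultdict(float)
    for results in result_lists:
        for rank, url in enumerate(results, start=1):
            scores[url] += 1.0 / (k + rank)
    # Highest combined score first; ties are broken alphabetically.
    return sorted(scores, key=lambda url: (-scores[url], url))


# Made-up results from two hypothetical engines, best hit first.
engine_a = ["a.example/1", "b.example/2", "c.example/3"]
engine_b = ["b.example/2", "d.example/4", "a.example/1"]
merged = merge_ranked_lists([engine_a, engine_b])
```

Here merged puts b.example/2 first, because both engines rank it near the top, followed by a.example/1, then the two hits that only one engine returned. Post-filtering such as date sorting would then run on the merged list.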

The net result of this would be a search network whose topology would be just as interesting as the current network of hyperlinks, and clever people would find clever ways to combine the two to make it even easier to find and join interesting conversations.

This is truly a job for the LazyWeb: the technical hurdles are not that great, and the blogging community can be the first to benefit from a working system. Then, when Google puts up the mandatory 700-pixel portrait of John Ashcroft on its homepage and removes the search box, we'll at least have something to fall back on.

Your Host

Maciej Cegłowski


Please ask permission before reprinting full-text posts or I will crush you.