Escaped Thoughts

Thu, Sep 04, 2003

Polluting The Airwaves

Watching my referrer logs, I feel sort of guilty about blogging, or at least about having my blog indexed by search engines. Except for the handful of you who read regularly (props to you all), the people who end up here are usually searching for stuff that has nothing to do with anything I've ever written about. At first I didn't understand why they were showing up, since their queries were often quite elaborate ones that, in a world that made sense, would never have led them to my blog.

After some investigating, I realized the cause: weblogs (or at least the way they are presented on web pages) are in many ways totally alien to how many people (including me) expect content to be organized. For example, if I search for the words "tree", "explore", "discrete", and "artificial intelligence", I expect to get pages that relate somewhat to my research. Why? Because I expect a page of content to stick to one topic, since that's the way most content works in the Real World™ of web pages. So I assume that if someone manages to use all those words, there's a high probability that they are talking about applying research in tree-based discrete-space exploration to artificial intelligence problems.

In reality, I am relatively likely to get an archive of a month's worth of weblog entries on some random person's site, including a narrative about a hike through a new section of woods in their favorite park, a rant about how much they hated the movie "AI", and a story about telling an embarrassing secret to someone who turned out not to be trustworthy.

I think the ultimate solution would be an HTML division marker that most search engines (by which I mean Google) would recognize as signaling a fundamental shift in content. Weblogs and weblog archives could insert it between posts, and the search engine could index each section as if it were a separate page (one that just happens to share a URL with other pages), so all the words would have to occur within a single post for the page to be returned. It would help immensely, and it would be adopted instantly and widely if the major blogging services and software turned it on by default.
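To make the difference concrete, here is a minimal Python sketch of the two matching rules. It assumes a hypothetical `<!-- post-break -->` comment as the division marker (no search engine actually recognizes such a thing; the marker, the toy tokenizer, and the sample posts are all made up for illustration), treats a query as a set of single words, and ignores ranking entirely.

```python
import re

# Hypothetical division marker between posts; not a real standard.
POST_BREAK = "<!-- post-break -->"


def tokenize(text):
    """Lowercase the text and return the set of alphabetic words in it."""
    return set(re.findall(r"[a-z]+", text.lower()))


def page_matches(page, query_terms):
    """Page-level rule: every term may appear anywhere on the page."""
    words = tokenize(page)
    return all(term in words for term in query_terms)


def section_matches(page, query_terms):
    """Section-level rule: every term must appear within one single post."""
    return any(
        all(term in tokenize(section) for term in query_terms)
        for section in page.split(POST_BREAK)
    )


# A toy month-long archive made of three unrelated posts.
archive = POST_BREAK.join([
    "Hiked to explore a new trail; every tree was gorgeous.",
    "I hated the movie AI. So much for artificial intelligence.",
    "Broke the secret into discrete hints, but the story still got out.",
])
query = ["tree", "explore", "discrete", "intelligence"]

print(page_matches(archive, query))     # True: the words are spread across posts
print(section_matches(archive, query))  # False: no single post has them all
```

Under the section-level rule, the archive is only returned when one post really does mention all the terms, which is the whole point of the marker.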

Hopefully, Googlebot is still reading avidly and can incorporate my ideas into its programming. Until then, I'll keep watching people land on my archive from totally random queries.

If you are here because you wanted info on 'searching for alien intelligence in space', you've come to the wrong place. Next time, read the context preview Google gives you!

Category: Geek
