Friday, January 09, 2009

Finally, someone said it

What's behind this disparity? Word processors and search engines have different goals. The latter have to field queries as broad and varied as the Internet itself, so they need a very large vocabulary in order to differentiate spelling mistakes from legitimate search terms. Word processors are much more conservative, limiting their lexicons to words that are definitely legitimate. This way, a program like Word can catch virtually every typo, even if it means misidentifying some proper names and newer words. In other words, search engines put breadth first and spelling accuracy second, while word processors are the other way around. If you type in Monkees, Google will assume you're searching for the band; Word will give you a red squiggly line, thinking you've screwed up the word monkeys.
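To make the contrast concrete, here's a minimal sketch of the two philosophies in Python. The word lists, frequencies, and cutoff below are made up for illustration; they're not either product's actual dictionary or algorithm.

    # Word-processor style: a small, curated lexicon; anything outside it gets flagged.
    curated = {"monkeys", "band", "search", "the"}

    # Search-engine style: a broad lexicon mined from real-world text, with frequencies.
    seen_in_corpus = {"monkeys": 120_000, "monkees": 45_000, "band": 300_000}

    def word_processor_check(word):
        return "ok" if word.lower() in curated else "red squiggly line"

    def search_engine_check(word):
        # Accept any term the corpus has seen often enough to look legitimate.
        return "ok" if seen_in_corpus.get(word.lower(), 0) > 1_000 else "did you mean...?"

    print(word_processor_check("Monkees"))  # red squiggly line
    print(search_engine_check("Monkees"))   # ok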

Not surprisingly, search engines and word processors build their dictionaries differently. A search engine's lexicon is typically put together using words gathered from Web pages or old search queries—a huge corpus of real-world data that constitutes a list of valid words and their frequency in the language. Word-processing lexicons are more heavily chaperoned, and the pace at which new terms enter the dictionary is much slower.
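As a rough sketch of how such a frequency-based lexicon might be assembled, assuming a plain-text corpus of pages or old queries (the file name and the cutoff of five occurrences are assumptions for the sake of the example):

    import re
    from collections import Counter

    # Count how often each word turns up in a corpus of Web pages or old queries.
    counts = Counter()
    with open("corpus.txt", encoding="utf-8") as f:   # "corpus.txt" is a stand-in
        for line in f:
            counts.update(re.findall(r"[a-z']+", line.lower()))

    # Anything seen often enough is treated as a legitimate term; a word-processor
    # dictionary, by contrast, would be edited and expanded by hand, much more slowly.
    lexicon = {word: n for word, n in counts.items() if n >= 5}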
