Like most bloggers, I enjoy checking my stats. I especially like to see where my visitors have come from and where they are going. Some of my incoming traffic has been coming from searches on Google and Yahoo! for “The Geographer’s Library”, since I recently posted about that book. In looking at the search results from incoming visitors, I noticed that my post was number one on several search engines. My first thought was, “Well, it’s about damn time.” My second was, “This can’t be right.” Where was Amazon.com? Any time I do a search for a book, they are invariably first. Where was Barnes and Noble? ABE? Alibris? There should be lots of results before my paltry post.
I did the search myself. There was Amazon, Barnes and Noble, ABE, and Alibris. Ten pages into the results and I still couldn’t find Cup O’ Books. Huh?
As I went back and forth looking for what was different about the search terms, I finally noticed this:
The apostrophe in the incoming searches was different from the apostrophe in my search. Neither one got the red underline that Microsoft’s spell checker uses to flag a misspelled word. I was perplexed. In a fit of reason, I e-mailed our computer guy at school. Here is the e-mail:
I am wondering if either of you can help me understand the difference in the apostrophes found above. Both appear to be Times New Roman apostrophes, as neither is identified as a spelling error by the Microsoft Office spell check. However, if you do a Google or Yahoo search for the top one, you get the “Did you mean…” message identifying a spelling mistake. The second does not generate that message. Each search turns up unique results.
Here is his reply:
The difference is not grammatical (or “punctuational,” if you prefer). It’s a technological glitch. Every character you type looks on-screen like a letter or number or mark that you understand. In the background, however, each character is represented by a number that is part of the ASCII code (ASCII stands for American Standard Code for Information Interchange). The code is a 7-bit binary number, and the combinations of 1s and 0s allow up to 128 characters. These are letters (upper and lower case), numbers, standard marks of punctuation, diacritical marks, etc. You’d think that this would leave room for even more complicated punctuation (“curly” quotes instead of “straight” quotes, for instance), except that 33 of the codes (0 through 31, plus 127) are reserved for computer instructions such as tab, line feed, and carriage return.
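For the curious, a short Python sketch makes those numbers concrete. It assumes the standard convention that codes 0 through 31 and 127 are the control codes; everything else in the 0 to 127 range is a printable character.

```python
# 7-bit ASCII has 2**7 = 128 codes, numbered 0 through 127.
ascii_codes = range(2**7)

# Control codes: 0-31 plus 127 (DEL). These are instructions such as
# tab (9), line feed (10), and carriage return (13), not visible marks.
control = [c for c in ascii_codes if c < 32 or c == 127]

# Everything else (32-126) is printable: space, digits, letters, punctuation.
printable = [c for c in ascii_codes if c not in control]

print(len(control))    # 33
print(len(printable))  # 95
print("".join(chr(c) for c in printable))

# The keyboard apostrophe is one of the printable codes.
print(ord("'"))        # 39
```

The printed line of 95 characters contains the straight apostrophe and the straight double quote, but no curly quotes of either kind.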
Extended ASCII and Unicode are two of the codes people have produced to cover the less-common marks: copyright, registered trademark, etc.
The “curly” quotes don’t fit into standard ASCII. When a program like Microsoft Word substitutes the fancy quotation marks (it can be set to do this automatically) or turns two hyphens into an em dash, it looks great on-screen or on paper. But when you send this file to a program that doesn’t recognize Extended ASCII or Unicode, you get bad results.
When you see an email where an abbreviated word (a contraction such as “it’s”) has turned into gobbledygook, it is the result of the current program trying to translate some other program’s preferred code into its own.
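A quick way to see the two apostrophes side by side is to ask Python for their Unicode code points and names. The sketch below uses only the standard `unicodedata` module; the only assumption is that the curly mark Word inserts is U+2019, which is what its smart-quote feature normally produces for an apostrophe.

```python
import unicodedata

straight = "\u0027"  # the apostrophe key on the keyboard
curly = "\u2019"     # what "smart quotes" substitutes for it

for ch in (straight, curly):
    code = ord(ch)
    print(
        f"U+{code:04X}",
        unicodedata.name(ch),
        "inside ASCII" if code < 128 else "outside ASCII",
        ch.encode("utf-8"),  # the bytes actually stored in a UTF-8 file
    )

# A program limited to ASCII simply has no code for the curly mark.
try:
    curly.encode("ascii")
except UnicodeEncodeError as err:
    print("cannot encode as ASCII:", err.reason)
```

The straight apostrophe is code 39 (U+0027, named APOSTROPHE) and fits in a single byte. The curly one is U+2019, officially RIGHT SINGLE QUOTATION MARK, and in UTF-8 it takes three bytes: E2 80 99.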
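Here is a minimal sketch of how that gobbledygook happens. It assumes one common mismatch: the text was saved as UTF-8, but the receiving program reads the bytes as Windows-1252 (`cp1252` in Python). Other pairs of encodings garble text in other ways; `garble` is just a name made up for this illustration.

```python
def garble(text: str) -> str:
    """Simulate a program that receives UTF-8 bytes but reads them as Windows-1252."""
    raw = text.encode("utf-8")
    # errors="replace" because Windows-1252 leaves a few byte values
    # (such as 0x9D) undefined; those become U+FFFD instead of raising.
    return raw.decode("cp1252", errors="replace")

print(garble("it\u2019s"))            # itâ€™s
print(garble("Did you mean\u2026"))   # Did you meanâ€¦
print(repr(garble("\u00a0")))         # 'Â\xa0'
```

Each of the three bytes of the curly apostrophe (E2 80 99) gets read as its own Windows-1252 character, â, €, and ™. The same mechanism turns an invisible non-breaking space into a stray “Â” followed by a space.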
So here’s my guess: Google uses standard ASCII, so it recognizes “Geographer’s” typed with the straight apostrophe as the word you meant. But if a fancy apostrophe is used, Google treats it as a separate symbol and may see TWO terms, “Geographer” and “s”, and so give you different results.
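That guess can be tried out with a toy tokenizer. The rule below is hypothetical, made up for illustration; it is not how Google or Yahoo! actually split queries. It counts only ASCII letters, digits, and the straight apostrophe as parts of a word, so any other character, including the curly apostrophe, ends one word and starts the next.

```python
import re

# Hypothetical rule: a "word" is a run of ASCII letters, digits, and the
# straight apostrophe (U+0027). Any other character ends the word.
WORD = re.compile(r"[A-Za-z0-9']+")

def tokenize(query: str) -> list[str]:
    """Split a query into lowercase search terms under the rule above."""
    return WORD.findall(query.lower())

print(tokenize("The Geographer's Library"))
# ['the', "geographer's", 'library']
print(tokenize("The Geographer\u2019s Library"))
# ['the', 'geographer', 's', 'library']

def fold_apostrophes(query: str) -> str:
    """Map the curly apostrophe back to the straight one before tokenizing."""
    return query.replace("\u2019", "'")

print(tokenize(fold_apostrophes("The Geographer\u2019s Library")))
# ['the', "geographer's", 'library']
```

The last few lines show one way a search engine could treat the two marks alike: fold the curly apostrophe into the straight one before splitting the query. Whether a given engine does this is exactly the kind of design decision the e-mail speculates about.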
From all that I have heard and read, Google programmers are the cream of the crop. I expect this is the result of a conscious decision on their part to use standard ASCII for faster searches, and to try to avoid second-guessing which other codes programs might use.
If any technogeeky types want to respond to this, I’d be interested in any comments.