Word Separators (Google)


When you search with Google, you type your search terms into the search box. Google then looks for those exact words in the billions of pages of its index.

It is therefore important that Google can recognize each word in a page, a title or an address (URL).

Lower-case or upper-case letters

 
Google does not distinguish between lower-case and upper-case letters.
Example: Directory = DIRECTORY = directory = dIReCtORY

Singular or plural

 
As far as Google's search algorithm is concerned, a single different letter gives different search results. A singular word is therefore not the same as its plural.
Example: engine is different from engines
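
The two rules above can be sketched in a few lines of Python. This is only a toy model of the behaviour described here, not Google's actual code, and the function name same_word is made up for the illustration.

```python
def same_word(a: str, b: str) -> bool:
    """Toy model: case is ignored, but every letter must match."""
    return a.casefold() == b.casefold()


print(same_word("Directory", "dIReCtORY"))  # True
print(same_word("engine", "engines"))       # False
```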

Word separators in the texts and titles

 
In titles and text, two words can be separated by a space, a line break or any other page layout.

One can also use all the usual punctuation marks, like the point (.), the comma (,), the square brackets, the parentheses and the braces ([ ] ( ) { }), … and symbols, like “@”, “$”, “%”, “#”, “+”, “/”, “=”, etc.

The alphanumeric characters are not separators.

The alphanumeric characters are:

– the English small and capital letters (a A b B c C … x X y Y z Z);
– the foreign accented small and capital letters (à À é É è È ê Ê …);
– the ampersand (&);
– the decimal digits (0 1 2 … 7 8 9);
– the underscore (_).

If you put a word next to an underscore (_), an ampersand (&) or decimal digits, Google sees the whole run of characters as a single “word”!

Examples (text and title)

 
In Directory for search engine optimization (SEO-friendly), Google recognizes the words: directory, for, search, engine, optimization, seo and friendly.

In super_dir, directory of directories (free&effective), Google recognizes the words: super_dir, directory, of, directories and free&effective. It does not recognize the words super, dir, free and effective.

In LIST99, directory n°1, Google recognizes the words: list99, directory, n and 1. It does not recognize the words list and 99.
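
The three examples above can be reproduced with a small tokenizer. This is a minimal sketch of the rules as stated in this article, not a description of how Google is implemented. It assumes that a word is any run of letters, digits, underscores and ampersands, and that every other character is a separator. The names WORD and text_words are made up for the illustration.

```python
import re

# Assumption: \w (letters incl. accented ones, digits, underscore) plus
# the ampersand make up a word; everything else separates words.
WORD = re.compile(r"[\w&]+")


def text_words(text: str) -> list:
    """Split text into lower-cased words, following the article's rules."""
    return [w.lower() for w in WORD.findall(text)]


print(text_words("Directory for search engine optimization (SEO-friendly)"))
# ['directory', 'for', 'search', 'engine', 'optimization', 'seo', 'friendly']

print(text_words("super_dir, directory of directories (free&effective)"))
# ['super_dir', 'directory', 'of', 'directories', 'free&effective']

# "\u00b0" is the degree sign used in "n°1"; it acts as a separator.
print(text_words("LIST99, directory n\u00b01"))
# ['list99', 'directory', 'n', '1']
```

Python's \w also matches letters from other alphabets, so the sketch is slightly broader than the character list given above.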

Word separators in the address of a page (URL)

 
Special care must be taken when applying these rules to the address of a page. The ampersand (&), the slash (/) and the percent character (%) play special roles in a URL, and their use should be restricted to these special functions. We strongly advise against using spaces and accented or special characters in the URL.

Google has confirmed that the point (.), the comma (,) and the hyphen (-) are valid word separators in URL’s.

The recommended alphanumeric characters are:

– the English small and capital letters (a A b B c C … x X y Y z Z);
– the decimal digits (0 1 2 … 7 8 9);
– the underscore (_).
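
As an illustration, here is one possible way to turn a page title into a URL segment that only contains the recommended characters, with hyphens as word separators. It is a sketch under assumptions: the helper name make_slug is hypothetical, accents are simply stripped, and the result is lower-cased by choice (Google ignores case anyway).

```python
import re
import unicodedata


def make_slug(title: str) -> str:
    """Hypothetical helper: build a URL path segment from a title."""
    # Decompose accented letters, then drop the accents (é -> e).
    decomposed = unicodedata.normalize("NFKD", title)
    ascii_only = decomposed.encode("ascii", "ignore").decode("ascii")
    # Replace every run of non-recommended characters with one hyphen.
    slug = re.sub(r"[^A-Za-z0-9_]+", "-", ascii_only)
    return slug.strip("-").lower()


print(make_slug("Get Rich: café & crème"))  # get-rich-cafe-creme
```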

Examples (address of a page or URL)

 
In http://www.internetofficer.com/google/separator.html, Google recognizes the words: www, internetofficer, com, google, separator and html. It does not recognize the words internet and officer.

In http://www.example.com/GET RICH, which can also be written http://www.example.com/GET%20RICH, Google recognizes the words: www, example, com, get and 20rich. It does not recognize the word rich, although “%20” is supposed to represent a space.
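
The two URL examples above can be modelled in the same way. The sketch below is only an approximation of the behaviour described in this article. It assumes that the scheme (http://) is not counted as a word, since it does not appear in the lists above, and that “%20” is not decoded. The function name url_words is made up for the illustration.

```python
import re


def url_words(url: str) -> list:
    """Toy model of the URL rules described in this article."""
    # Assumption: the scheme ("http://") is not counted as a word.
    rest = re.sub(r"^[A-Za-z][A-Za-z0-9+.-]*://", "", url)
    # Letters, digits and the underscore form words; the rest separates.
    # "%20" is deliberately not decoded, so "20" sticks to the next word.
    return [w.lower() for w in re.findall(r"[A-Za-z0-9_]+", rest)]


print(url_words("http://www.internetofficer.com/google/separator.html"))
# ['www', 'internetofficer', 'com', 'google', 'separator', 'html']

print(url_words("http://www.example.com/GET%20RICH"))
# ['www', 'example', 'com', 'get', '20rich']
```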

12 Responses to “Word Separators (Google)”

  1. TheSeoDude says:

    In search results you see words bolded in url even if not separated in anyway. I think in the url search engines will check if the string is found not as an actual word requiring separators. And the presence of words in URL has a lot of weight.

    My 2cents.

  2. Jean-Luc says:

    Your statement “In search results you see words bolded in url even if not separated in anyway.” is true for Google, but this does not necessarily mean that Google indexes words which are not properly separated.

    Let us look at these two search results:

    [Screenshot: searching for the word “adsense” in this web site]

    [Screenshot: searching for the word “sense” in this web site]

    We see that although Google knows a page called http://www.internetofficer.com/web-robot/adsense.html, it says the word “sense” does not exist in the web site.

    We believe that the bolded characters in search results are probably misleading. It seems that they are decided by a program that only takes care of the user-interface and not by the search programs themselves.

    Jean-Luc

  3. TheSeoDude says:

    Indeed. But you can never know if a word found both in body and in URL is not treated differently. It didn’t find the word sense but maybe the page is considered more relevant if the word found in the body is also matched as a string in the url.
    So it’s not mandatory to index the url as strings but to consider you more important if you have the searched words in the url.

    This is really something not easily provable. But it still gets you wondering.

    Mysterious are the Search Engine ways.

  4. onSite internet service says:

    What about the underscore (“_”) in URLs? I learned that it behaves like the hyphen, but does not act like a “double” word as e.g. web-robot = webrobot and web_robot != web-robot

  5. David DeAngelo says:

    Thanks for this. I was thinking of rewriting a lot of my URLs for one of my sites because I had commas in them. After reading this I won’t be, since the comma is safe. Though I do prefer to use hyphens as separators in URLs since they look much cleaner. The URL is one of the most important things in SEO for keyword rankings.

  6. Iwenzo says:

    But what about the German ÄÜÖ? Is it the same as ae=ä / ue=ü / oe=ö ???

  7. HM2K says:

    In this text you say:

    “Google has confirmed that the point (.), the comma (,) and the hyphen (-) are valid word separators in URL’s.”

    Do you have citation for this?

  8. Jean-Luc says:

    A long time ago, Googleguy wrote:

    Yah, I’d stick to hyphens, periods, or commas. Most people seem to prefer hyphens. If you use an underscore ‘_’ character, then Google will combine the two words on either side into one word. So bla.com/kw1_kw2.html wouldn’t show up by itself for kw1 or kw2. You’d have to search for kw1_kw2 as a query term to bring up that page.

    This appeared in 2004 under ‘Illegal’ chars in address in WebmasterWorld (you need a WebmasterWorld account to view the page).

  9. hm2k says:

    Thanks, I included details of this in my article that fills in the gaps with regards to word separators in urls…

    http://www.hm2k.com/posts/word-separators-in-urls

  10. Malte Landwehr says:

    Even though other symbols are possible, I would always use the hyphen to separate words in a URL. It’s recognised not only by all the search engines but by humans as well!

  11. SEO DK says:

    searching with keywords: “Internet Officer” leaves internetofficer.com ranking at 1st spot in G with sitelinks and all.
    The whole word “InternetOfficer” is highlighted.

    So they do train those little spiders with new tricks :)

  12. Matt says:

    That’s very interesting about the ampersand in titles and headings, I had assumed Google would view that as 2 separate words, I’ll have to start using the word and.


 
