
Canonicalisation issues – why it’s bad and how to fix it.

In the past month I have been doing quite a few SEO evaluations and technical audits of websites, and I have really enjoyed it. After working as Head of Search for quite a while, I am genuinely loving getting back to doing more of the actual SEO work: the technical analysis, the digging into a site's link authority, and even keyword and competitor research. In general I'm loving figuring out how a site is doing in the SERPs, why it isn't ranking well and how it can be improved. This is why I love SEO after all: it's like a riddle or a puzzle, and I just have to figure it out =)

One of the things that keeps popping up over and over again in these evaluations is websites having canonicalisation issues. So I thought I would write my first proper blog post on Verve Search about this topic.

What is canonicalisation?

Apart from being one of the most difficult words to write and pronounce… straight from the horse's mouth: "Canonicalisation is the process of picking the best URL when there are several choices…"

Basically, quite often a web page will have several URLs for the same page, for example:

http://www.vervesearch.com

http://vervesearch.com (notice without the dub dub dub)

Both of these URLs load the same page: the homepage! There can also be other versions of the URL loading the same page, with additions to the path such as /index.php or even /home.php. In addition, the owner of a website might have bought several other domains (TLDs). For example, I also own the .co.uk TLD: http://www.vervesearch.co.uk. If this additional domain is simply pointed at the website/page, it will again load the same page. So potentially I could have eight different URLs loading the homepage for Verve Search.

This is a problem for several reasons, fundamentally because when a search engine visits your website its spiders are likely to be having this experience:

[Image: a confused search engine spider faced with all these URLs]
It would be even more complicated for the search engine spiders if, in addition to all these URLs, your website also contained URL-based session IDs (session IDs dynamically generate a separate URL for each user in each session, including the spiders), for example http://www.vervesearch.com/?PHPSESSID=123. Each page would then be likely to have hundreds, maybe even thousands, of separate URLs. The real problem comes when the spiders index one of these session ID URLs instead of your main URL. Yes, it will look rubbish, BUT worse, that URL is unlikely to have any link authority, as it's a unique URL generated just for the session in which the spider crawled the site. When loads of these session URLs find their way into the search engine index, with no link authority pointing at them, ranking within a competitive market becomes much harder; this could be holding your site back significantly. Worst case scenario: the spiders index a session ID URL instead of the main URL for a page.

Note: the reason some sites use session IDs is usually to be able to do in-depth tracking of each session. For those of you that do this, I would recommend using cookie-based sessions instead of URL-based session IDs. Yes, cookie-based tracking might not be as accurate if users disable cookies, but I believe it's better in the long run, as session-based URLs could potentially harm your SEO efforts and overcomplicate things.
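For example, if your site is built in PHP and runs under Apache with mod_php (an assumption on my part; your setup may well differ), you could force cookie-only sessions and stop PHP from putting session IDs in your URLs with a couple of lines in your .htaccess file:

# Use cookie-based sessions only (no PHPSESSID in URLs)
php_flag session.use_only_cookies on
# Stop PHP from rewriting links to include the session ID
php_flag session.use_trans_sid off

If PHP runs as CGI/FastCGI these php_flag directives won't work in .htaccess; in that case the equivalent settings (session.use_only_cookies = 1 and session.use_trans_sid = 0) would go in php.ini instead.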

How canonicalisation issues affect link authority!

In your mind http://www.yourdomain.com/ is usually your main URL, but don't assume this is obvious to users and search engines. If you haven't chosen a canonical URL (and implemented the appropriate redirects or rel=canonical tags; don't worry, an explanation will come) it is likely that some links will go to one of the other URLs. For example, a user types my website directly into the browser but uses the .co.uk TLD, finds the page they wanted to link to and links to it using the .co.uk. Another example: a user follows an internal link that goes to /page/index.php while your link builders are getting links to the main URL; now you have links going to both URLs and the link authority is being diluted. Still following me? Now imagine you also have session IDs on your site, and a user visits your site, gets a session ID, bookmarks the page (with the session ID) and then links to it from his/her blog. Now you have three different URLs to the same page with links. Imagine how much more powerful the page would be if all of those links went to one URL!

How to fix canonicalisation problems

There are now two different ways of fixing canonicalisation issues on your site. Quite recently Google announced support for a new "canonical tag" that lets you specify in the HTML head that the URL in question should be treated as a "copy", and names the canonical URL that all link authority and content metrics should flow back to.

Example:
Within the HTML head of the page loading on the URL http://www.vervesearch.com/index.php there would be a tag like this:

<link rel="canonical" href="http://www.vervesearch.com/" />

This "tells" the search engines that they should index the canonical URL specified in the tag and also pass any link authority from the /index.php URL to the canonical URL. The rel=canonical tag should be implemented on every URL you have that is loading the same page (except for the main canonical URL you want to use, of course).
This tag is really easy to implement and can solve a lot of canonicalisation issues, BUT it has its limitations. For example, you can't use it for your country-specific TLDs (which are essentially separate domains) or other additional domains you might have bought. There is also the issue that this tag only "redirects" the engines' attention to the correct URL: users will still be able to use all the different URLs, and within your analytics these are likely to show up as different pages.

My preferred method, and a pretty airtight solution for canonicalisation problems, is using 301 redirects. A 301 redirect is a permanent redirect from one URL to another; using a 301 redirect will carry over any link authority from one URL to the other, even from a different domain! This is as opposed to a 302 redirect, which is a "temporary" redirect that won't carry over the link authority and is generally just rubbish. Just don't use 302 redirects, OK?! With a 301 redirect you will also avoid any user complications, as even if the user types a non-canonical URL into the browser they will be redirected to the canonical URL. Want to check if a URL is 301 redirecting correctly? Try this redirect checker!
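To give you an idea of what this looks like in practice, on an Apache server a single-page 301 can be as simple as one line (a sketch, assuming the standard mod_alias module is available; more on where this line lives in the next paragraph):

# Permanently redirect /index.php to the canonical homepage URL
Redirect 301 /index.php http://www.vervesearch.com/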

The problem with 301 redirects is that they are generally much harder to implement than the rel=canonical tag. To create a 301 redirect on an Apache server you will need to create (if you don't already have one) an .htaccess file that you upload to the root of your server. More about how to implement 301 redirects in .htaccess files here. If you are not a programmer or very technical, I advise you to get your programmer to do this for you, as messing with the .htaccess file can really mess up your site. Some hosting companies provide 301 redirect capabilities within cPanel (mine does), in which case you can easily 301 redirect URLs and domains from there. If, on the other hand, your site is developed in ASP or, worse, ASP.NET (only joking), please check out this site for instructions on how to do a 301 redirect when using IIS servers.
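As a sketch of what the .htaccess rules could look like (assuming an Apache server with mod_rewrite enabled, and that your other domains point at the same server), the following would 301 redirect the non-www version, the .co.uk domain and any other host alias to the canonical www.vervesearch.com URL:

RewriteEngine On
# Any request that is not on the canonical host...
RewriteCond %{HTTP_HOST} !^www\.vervesearch\.com$ [NC]
# ...gets a permanent (301) redirect to the same path on www.vervesearch.com
RewriteRule ^(.*)$ http://www.vervesearch.com/$1 [R=301,L]

The R=301 flag is what makes it a permanent redirect; leave it out and Apache defaults to a temporary 302, which, as mentioned above, is exactly what you don't want.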

Making sure your website's URLs are organised and redirected appropriately, and choosing one canonical URL that all other URLs are redirected to (either by 301 or the rel=canonical tag), could potentially have a BIG impact on your SEO efforts. Don't confuse the search engine spiders or your users. Sort it out!

Please feel free to comment or add anything; I would love to hear about your experiences. And if you think you might have a canonicalisation issue, please comment and I will check it out for you, or alternatively email me: lisa [at] vervesearch.com

Further reading:

Google webmaster central blog on specifying your canonical tag
Matt Cutts on canonical tag and URL canonicalization
Obi-Wan Kenobi (Rand) at SEOmoz on the URL canonical tag


6 thoughts on “Canonicalisation issues – why it’s bad and how to fix it.”

  1. L. Mohan Arun

    My opinion is that, the canonicalization is something that the search engine spider should be intelligent enough to sort out on its own. Come to think of it, isn't this the same thing as exact match and broad match in AdWords? They just want to create a meme or want to get talked about so they come up with these kinds of things that require webmasters to go in and put in all the canonicalization tags in every page of the site so they can show they did this and that for SEO.

  2. Sam Page

    I have to say I agree with L. Mohan Arun. A search engine, with its massive indexing power and creeping robots, should really be able to establish that:

    http://www.vervesearch.com &
    http://www.vervesearch.com/index.php OR http://www.vervesearch.com/index.html OR
    http://www.vervesearch.com/index.asp

    are all the same page.

    Does anyone know why this doesn’t happen? Is it to do with the fact you can have different pages as your default document in a folder, so Google is never sure whether this is the case? Surely it could check?

    Cheers,
    Sam

  3. stuartpturner

    This is a very interesting post, Lisa; I haven’t seen anyone go into as much detail with regard to the canonical tag yet. I’m in two minds about it, however, as it seems to just be a fix for a problem that needs resolving either at a CMS or a code level.

    If you have a site which generates a number of URLs for the same page, this (IMO) shows a need for a CMS which would resolve this issue, or an investigation into how you can resolve the issue permanently, leaving just one page in the index. As you point out, 301 is the best way (or if you’re building a site from scratch – don’t duplicate your URLs :P ).

    I think that while Google have introduced this tag to make webmasters’ lives easier, they may have shot themselves in the foot by providing a tag that is very easy to implement incorrectly. I may be proved wrong though…

    @L. Mohan Arun

    “My opinion is that, the canonicalization is something that the search engine spider should be intelligent enough to sort out on its own.”

    The fact that they cannot do this is the very reason this tag was created. I’d suggest reading the ‘further reading’ links provided. It isn’t really just to do with SEO: if you have the kind of duplicate content issue Lisa describes, all those pages will simply be excluded from the search engine’s index.

    If you’d actually read this post you’d know that this comment “…that require webmasters to go in and put in all the canonicalization tags in every page of the site” is somewhat off the mark.

  4. Marc Wilson

    We have an issue with canonicalisation – due to a faulty sitemap in the past, there are multiple versions of some URLs, but they differ only in case.

    IIS is case-agnostic, of course, so treats them as the same. The problem is that Google doesn’t, and is penalising the site for duplicate meta descriptions.

    So, the obvious solution is (1) fix the sitemap, (2) have a rewritemap that will fix the known problematic URLs, forcing a rewrite to the all-lower-case “canonical” version.

    However, when I do this, the browser reports it as a rewrite loop. It fixes the URL, redisplaying it as the lower-case version, but then reports an error.

    Presumably because IIS doesn’t care about case?

  5. Matt O'Toole

    I know this is a fairly old one, Lisa, but I was looking for a post to help explain canonicalisation to a customer and this was one of the best written examples I came across. Good job! :)
