Canonicalisation issues – why they’re bad and how to fix them.
In the past month I have been doing quite a few SEO evaluations and technical audits of websites, and I have really enjoyed it. After working as Head of Search for quite a while, I am genuinely loving getting back to more of the actual SEO work: the technical analysis, digging into a site’s link authority and even keyword and competitor research. In general, I’m loving figuring out how a site is doing in the SERPs, why it’s not ranking well and how it can be improved. This is why I love SEO after all; it’s like a riddle or a puzzle, and I’ve just got to figure it out =)
One of the things that keeps popping up over and over again in these evaluations is websites having canonicalisation issues, so I thought I would write my first proper blog post on Verve Search about this topic.
What is canonicalisation?
Apart from being one of the most difficult words to write and pronounce… From the horse’s mouth: “Canonicalisation is the process of picking the best URL when there are several choices…”
Basically, quite often the same web page can be reached through several different URLs, for example:
http://www.vervesearch.com
http://vervesearch.com (notice: without the dub dub dub)
Both of these URLs load the same page: the homepage! There can also be other versions of the URL that load the same page, with extra paths such as /index.php or even /home.php. In addition, the owner of a website might have bought several domains (TLDs); for example, I also own the .co.uk TLD (http://www.vervesearch.co.uk). If this additional domain is simply pointed at the website, it will again load the same page. So potentially I could have 8 different URLs loading the homepage for Verve Search.
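To see how quickly these variants multiply, here is a minimal Python sketch that lists them. It assumes, for illustration only, the two domains and the www/non-www split from the examples above, plus just two paths (/ and /index.php); adding /home.php or other paths would grow the list further.

```python
from itertools import product

# Assumed for illustration: the two domains mentioned above, each with and
# without "www", and only two paths that both serve the homepage.
DOMAINS = ["vervesearch.com", "vervesearch.co.uk"]
SUBDOMAINS = ["www.", ""]
PATHS = ["/", "/index.php"]


def homepage_variants() -> list:
    """Return every URL that, in this scenario, loads the same homepage."""
    return [
        f"http://{sub}{domain}{path}"
        for domain, sub, path in product(DOMAINS, SUBDOMAINS, PATHS)
    ]


if __name__ == "__main__":
    variants = homepage_variants()
    for url in variants:
        print(url)
    # 2 domains x 2 host forms x 2 paths
    print(len(variants), "URLs for one page")
```

Running it prints eight URLs, every one of which serves the same homepage.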
This is a problem for several reasons, but fundamentally because when a search engine visits your website, its spiders find the same page on several different URLs and have no way of knowing which one is the “real” one.
It gets even more complicated for the search engine spiders if, in addition to all these URLs, your website also uses URL-based session IDs (a session ID dynamically generates a separate URL for each user in each session, including the spiders), for example http://www.vervesearch.com/?PHPSESSID=123. Each page would then be likely to have hundreds, maybe even thousands, of separate URLs. The real problem comes when the spiders index one of these session ID URLs instead of your main URL. Yes, it will look rubbish, BUT the bigger problem is that this URL is unlikely to have any link authority, as it’s a unique URL created just for the session in which the spider crawled the site. If loads of these URLs find their way into the search engine index and you are trying to rank within a competitive market, this could be holding your site back significantly.
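One way to see why session IDs multiply URLs is to strip them back out. This Python sketch removes session parameters from a URL’s query string, so that every per-session copy maps back to one address. The function name and the SESSION_PARAMS set are hypothetical; PHPSESSID is taken from the example above, and other platforms use other parameter names.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical set of query parameters that only carry session state.
# PHPSESSID comes from the example above; other platforms use other names.
SESSION_PARAMS = {"phpsessid"}


def strip_session_ids(url: str) -> str:
    """Remove session-ID parameters from url's query string.

    Every per-session copy of a page maps back to the same URL. The remaining
    parameters are re-encoded by urlencode, which may normalise their escaping.
    """
    parts = urlsplit(url)
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key.lower() not in SESSION_PARAMS
    ]
    return urlunsplit(parts._replace(query=urlencode(kept)))


if __name__ == "__main__":
    sessions = [
        "http://www.vervesearch.com/?PHPSESSID=123",
        "http://www.vervesearch.com/?PHPSESSID=456",
    ]
    # Two session URLs collapse into one address
    print({strip_session_ids(u) for u in sessions})  # {'http://www.vervesearch.com/'}
```

Any other query parameters (for example ?id=7) are kept, so only the session noise is removed.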
Note: the reason some sites use session IDs is usually to be able to do in-depth tracking of each session. For those of you who do this, I would recommend using cookie-based sessions instead of URL-based session IDs. Yes, cookie-based tracking might not be as accurate if users disable cookies, but I believe it’s better in the long run, as session-based URLs could potentially harm your SEO efforts and overcomplicate things.
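As a rough illustration of the difference, this Python sketch uses the standard library’s http.cookies module to put the session ID in a Set-Cookie header instead of the URL. The cookie name PHPSESSID and the function name are assumptions for illustration; in practice your platform’s own session handling would normally do this for you.

```python
import secrets
from http.cookies import SimpleCookie
from typing import Optional

# Assumed cookie name, reusing PHPSESSID from the example above.
COOKIE_NAME = "PHPSESSID"


def session_cookie_header(session_id: Optional[str] = None) -> str:
    """Build a Set-Cookie header value carrying the session ID.

    The session travels in the cookie, so the page URL stays the same for
    every visitor and every spider. A random ID is generated if none is given.
    """
    cookie = SimpleCookie()
    cookie[COOKIE_NAME] = session_id or secrets.token_hex(16)
    cookie[COOKIE_NAME]["path"] = "/"
    cookie[COOKIE_NAME]["httponly"] = True
    return cookie[COOKIE_NAME].OutputString()


if __name__ == "__main__":
    # The header carries the session; the URL stays http://www.vervesearch.com/
    print("Set-Cookie:", session_cookie_header("123"))
```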
How canonicalisation issues affect link authority!
In your mind, http://www.yourdomain.com/ is usually your main URL, but don’t assume this is obvious to users and search engines. If you haven’t chosen a canonical URL (and implemented the appropriate redirects or rel=canonical tags; don’t worry, the explanation is coming), it is likely that some links will point to one of the other URLs. For example, a user types my website directly into their browser but uses the .co.uk TLD, finds the page they want to link to and links to it using the .co.uk address. Another example could be a user following an internal link that goes to /page/index.php while your link builders are getting links to the main URL; now you have links going to both URLs and the link authority is being diluted. You still following me? Now imagine you also have session IDs on your site, and a user visits your site, gets a session ID, bookmarks the page (with the session ID) and then links to it from his/her blog. Now you have 3 different URLs with links to the same page. Imagine how much more powerful the page would be if all of those links went to one URL??!!
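To put numbers on the dilution, this Python sketch counts hypothetical inbound links per URL, then again after mapping each one to a single canonical form. The canonicalisation rules (one host, no index file, no query string), the host name and the sample links are all assumptions for illustration, not real link data.

```python
from collections import Counter
from urllib.parse import urlsplit

# Assumptions for this sketch: every inbound link points at one of our own
# hostnames, and the canonical form uses one host, no index file and no
# query string.
CANONICAL_HOST = "www.vervesearch.com"
INDEX_FILES = ("index.php", "home.php")


def canonicalise(url: str) -> str:
    """Map a linked URL to the single canonical URL for the same page."""
    path = urlsplit(url).path or "/"
    for index_file in INDEX_FILES:
        if path.endswith("/" + index_file):
            path = path[: -len(index_file)]  # keep the trailing slash
            break
    return f"http://{CANONICAL_HOST}{path}"


# Hypothetical inbound links matching the scenarios described above.
INBOUND_LINKS = [
    "http://www.vervesearch.com/page/",                # link builders
    "http://www.vervesearch.co.uk/page/",              # typed-in .co.uk
    "http://www.vervesearch.com/page/index.php",       # internal link
    "http://www.vervesearch.com/page/?PHPSESSID=123",  # bookmarked session
]

if __name__ == "__main__":
    print("Split:", Counter(INBOUND_LINKS))
    print("Consolidated:", Counter(canonicalise(u) for u in INBOUND_LINKS))
```

Before consolidation each of the four URLs has one link; afterwards a single URL has all four.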
How to fix canonicalisation problems
There are now 2 different ways of fixing canonicalisation issues on your site. Quite recently Google announced support for a new “canonical tag” that lets you specify in the HTML head that the URL in question should be treated as a “copy”, and names the canonical URL that all link authority and content metrics should flow back to.
Within the HTML head of the page loading on this URL, http://www.vervesearch.com/index.php, there would be a tag like this:
<link rel="canonical" href="http://www.vervesearch.com/" />
This would “tell” the search engines that they should index the canonical URL specified in this tag, and also pass any link authority from the /index.php URL to the canonical URL. The rel=canonical tag should be implemented on every URL that loads the same page (except the main canonical URL you want to use, of course).
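If you want to confirm which canonical URL a page actually declares, a minimal Python sketch using the standard library’s html.parser can pull it out of the page source. The class and function names are hypothetical, and fetching the HTML is left out so the example stays self-contained.

```python
from html.parser import HTMLParser
from typing import Optional


class CanonicalFinder(HTMLParser):
    """Collect the href of the first <link rel="canonical"> tag on a page."""

    def __init__(self) -> None:
        super().__init__()
        self.canonical: Optional[str] = None

    def handle_starttag(self, tag, attrs):
        # Self-closing <link ... /> tags also arrive here via the default
        # handle_startendtag implementation.
        if tag != "link" or self.canonical is not None:
            return
        attributes = dict(attrs)
        rel_values = (attributes.get("rel") or "").lower().split()
        if "canonical" in rel_values:
            self.canonical = attributes.get("href")


def find_canonical(html: str) -> Optional[str]:
    """Return the declared canonical URL, or None if the page has none."""
    finder = CanonicalFinder()
    finder.feed(html)
    finder.close()
    return finder.canonical


if __name__ == "__main__":
    page = '<head><link rel="canonical" href="http://www.vervesearch.com/" /></head>'
    print(find_canonical(page))
```

Run against the HTML served at /index.php, this should return the main URL; a None result means the tag is missing.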
This tag is really easy to implement and can solve a lot of canonicalisation issues, BUT it has its limitations. For example, you can’t use it for your country-specific TLDs (which are essentially separate domains) or other additional domains you might have bought. There might also be issues with the fact that this tag only “redirects” the engines’ attention to the correct URL: users will still be able to use all the different URLs, and within your analytics these are likely to show up as different pages.
My preferred method, and a pretty airtight solution for canonicalisation problems, is using 301 redirects. A 301 redirect is a permanent redirect from one URL to another, and it will carry over any link authority from one URL to the other, even from a different domain! A 302 redirect, on the other hand, is a “temporary” redirect that won’t carry over the link authority and is generally just rubbish. Just don’t use 302 redirects, OK!! With a 301 redirect you will also avoid any user complications, as even if a user types a non-canonical URL into the browser, it will redirect to the canonical URL! Want to check if a URL is 301 redirecting correctly? Try this redirect checker!
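If you would rather script the check yourself, here is a minimal Python sketch using the standard library’s http.client, which never follows redirects on its own, so you see exactly which status code (301 or 302) the server sends. The function names are hypothetical, it uses a HEAD request (some servers answer HEAD differently from GET), and it compares the Location header literally, so a relative Location would not match.

```python
import http.client
from typing import Optional, Tuple
from urllib.parse import urlsplit


def check_redirect(url: str, timeout: float = 10.0) -> Tuple[int, Optional[str]]:
    """Request url once with HEAD and return (status code, Location header).

    http.client does not follow redirects, so a 301 or 302 is reported
    exactly as the server sent it.
    """
    parts = urlsplit(url)
    if parts.scheme == "https":
        conn = http.client.HTTPSConnection(parts.netloc, timeout=timeout)
    else:
        conn = http.client.HTTPConnection(parts.netloc, timeout=timeout)
    path = parts.path or "/"
    if parts.query:
        path += "?" + parts.query
    try:
        conn.request("HEAD", path)
        response = conn.getresponse()
        return response.status, response.getheader("Location")
    finally:
        conn.close()


def is_good_redirect(url: str, canonical: str) -> bool:
    """True only for a permanent (301) redirect straight to the canonical URL."""
    status, location = check_redirect(url)
    return status == 301 and location == canonical
```

For example, is_good_redirect("http://vervesearch.com/", "http://www.vervesearch.com/") should be True if the bare domain is set up as described; a 302 status makes it return False.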
The problem with 301 redirects is that they are generally much harder to implement than the rel=canonical tag. On an Apache server, to create a 301 redirect you will need to create (if you don’t already have one) an .htaccess file that you upload to the root directory of your website. More about how to implement 301 redirects in .htaccess files here. If you are not a programmer or very technical, I advise you to get your programmer to do this for you, as messing with the .htaccess file can really mess with your site. Some hosting companies offer 301 redirect capabilities within cPanel (mine does), in which case you can easily 301 redirect URLs and domains from there. If, on the other hand, your site is developed in ASP or, worse, ASP.NET (only joking), please check out this site for instructions on how to do a 301 redirect on IIS servers.
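The routes above are .htaccess on Apache, cPanel or IIS. If your site instead happens to be served by a Python WSGI application, the same 301 logic could be sketched as a small middleware. This is a swapped-in technique, not an .htaccess rule, and the canonical host, the function name and the /index.php rule are assumptions for illustration.

```python
# Assumed canonical host for this sketch.
CANONICAL_HOST = "www.vervesearch.com"


def canonical_redirect(app):
    """Wrap a WSGI app so non-canonical requests get a 301 to the canonical URL.

    Assumed rules: any host other than CANONICAL_HOST (including the .co.uk
    domain and the bare domain) and any path ending in /index.php is
    redirected. HTTP_HOST may include a port; this sketch does not handle that.
    """

    def middleware(environ, start_response):
        host = environ.get("HTTP_HOST", "").lower()
        path = environ.get("PATH_INFO", "") or "/"
        if path.endswith("/index.php"):
            new_path = path[: -len("index.php")]
        else:
            new_path = path
        if host != CANONICAL_HOST or new_path != path:
            location = f"http://{CANONICAL_HOST}{new_path}"
            query = environ.get("QUERY_STRING", "")
            if query:
                location += "?" + query
            # 301, not 302: a permanent redirect passes the link authority on
            start_response(
                "301 Moved Permanently",
                [("Location", location), ("Content-Type", "text/plain")],
            )
            return [b"Moved Permanently"]
        return app(environ, start_response)

    return middleware
```

You would wrap your existing application once (application = canonical_redirect(application)), so every request passes through the same check before any page is rendered.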
Making sure your website’s URLs are organised and redirected appropriately, by choosing one canonical URL that all other URLs point to (either with a 301 redirect or a rel=canonical tag), could potentially have a BIG impact on your SEO efforts. Don’t confuse the search engine spiders or your users: sort it out!
Please feel free to comment or add anything; I would love to hear about your experiences. And if you think you might have a canonicalisation issue, please comment and I will check it out for you, or alternatively email me: lisa [at] vervesearch.com