
Fixing Google’s Keyword Search Volume Aggregation

James Finlayson


March 15, 2018

If you ask Google Keyword Planner for the search volume of ‘cheap windows laptop’ in the US, it’ll tell you that it’s 100-1k. Thanks for the help, Google! If, instead, you turn to the tool providers, you’ll get answers somewhere in the range of 1.5k (Searchmetrics) to 2.9k (SEOMonitor). Yet when you ask about variations of that keyword, each one comes back with the same figure.


Clearly, they don’t actually each have the same search volume.

What’s going on?

Since 2012, Google has included the search volume of misspelt and pluralised close variants within the exact match search volume of any keyword entered. In March 2017, this was expanded to include alternate orderings of the same words (e.g. ‘cheap windows laptops’ and ‘windows laptop cheap’ appear to have the same volume). Google also ignores stop-words (words like ‘as’, ‘in’ and ‘of’) and understands abbreviations (for example, that ‘lol’ is the same as ‘laugh out loud’).

If you’re conducting keyword research and putting together a list of 50 keywords, this is pretty easy to solve by spotting and removing the duplication. When you’re working on a list of tens or even hundreds of thousands of keywords, though, it is practically impossible to do manually.

That means those keywords could each appear in your list showing a search volume of 2.9k, so when you add up the total addressable audience you end up with a figure in excess of 11k. Any forecast based on that data will skew too high, making what would otherwise be a reasonable forecast potentially unreachable. In tests, we’ve found this to affect anywhere between 0.5% and 10% of search volume, depending on where the original keyword list comes from. 10% is the difference between confidently beating target and ending up below target.

Canonical Keywords

We fixed this with the concept of a ‘canonical keyword’. This is the simplest form a keyword could take, with all of its words in alphabetical order. That means no pluralisation, no conjugation, no misspellings and no pesky word order differences.

It turns out, this sounds a lot easier to implement than it is.

Removing pluralisation is hard because it’s not always a case of removing the ending ‘s’ – see, for example, woman/women, genius/geniuses and tooth/teeth.

There’s no ‘fix all’ button in Excel to fix spelling mistakes and, whilst VBA scripts exist to reorder the words in a cell alphabetically, those scripts are unwieldy and, frankly, at that stage you should be in Python or R in any case.
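To make the idea concrete, here is a minimal Python sketch of what a canonical keyword function could look like. It is not the code behind our tool: the stop-word list, the irregular plural table and the function names are all assumptions made for illustration, the singular rule is deliberately naive, and it makes no attempt to fix misspellings or conjugation.

```python
import re

# Assumed, deliberately tiny lookup tables; real lists would be far larger.
STOP_WORDS = {"as", "in", "of"}
IRREGULAR_PLURALS = {"women": "woman", "teeth": "tooth", "geniuses": "genius"}


def singularise(word: str) -> str:
    """Naive singulariser: irregular lookup first, then strip a trailing 's'."""
    if word in IRREGULAR_PLURALS:
        return IRREGULAR_PLURALS[word]
    # Leave short words and words ending in 'ss' or 'us' alone (e.g. 'genius').
    if len(word) > 3 and word.endswith("s") and not word.endswith(("ss", "us")):
        return word[:-1]
    return word


def canonical_keyword(keyword: str) -> str:
    """Lowercase, drop stop-words, singularise, then sort the words A to Z."""
    words = re.findall(r"[a-z0-9]+", keyword.lower())
    words = [singularise(w) for w in words if w not in STOP_WORDS]
    return " ".join(sorted(words))


print(canonical_keyword("cheap windows laptops"))  # cheap laptop window
print(canonical_keyword("windows laptop cheap"))   # cheap laptop window
```

Both variants collapse to the same string, which is all that matters for spotting duplicates. Note that the naive rule also turns ‘windows’ into ‘window’, a reminder of why false matches are possible and why pluralisation needs more than stripping an ‘s’.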

The Keyword Cleaner

As a result, we built the Keyword Cleaner, which is available for free here.


Simply enter your keyword list and then click ‘clean’. After a moment (it processes roughly 3k keywords a minute depending on how many people are using it) it’ll give you the canonical version of each ready for you to export.

Next, take those values and add them into a column next to your original keywords in Excel. You’ll then want to count how many times each canonical keyword appears in your keyword list where the search volume and landing page also match (this avoids reducing the search volume in cases of a false match). The formula will depend on how you’ve set up your table, but it should look roughly like this:

=COUNTIFS([Canonical],[@Canonical],[Search Volume],[@[Search Volume]],[URL],[@URL])

Next, you can simply divide the search volume for each keyword by the number of occurrences of that canonical keyword in the list (as computed above).
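If your list is too big for Excel, the same count-and-divide step can be sketched in Python. The rows below are hypothetical (the 2.9k figure is borrowed from the example above, the rest is made up), and the function and variable names are ours for illustration only.

```python
from collections import Counter

# Hypothetical rows: (keyword, canonical keyword, search volume, landing page).
rows = [
    ("cheap windows laptop", "cheap laptop window", 2900, "/laptops"),
    ("cheap windows laptops", "cheap laptop window", 2900, "/laptops"),
    ("windows laptop cheap", "cheap laptop window", 2900, "/laptops"),
    ("windows laptops cheap", "cheap laptop window", 2900, "/laptops"),
    ("gaming laptop", "gaming laptop", 1000, "/gaming"),
]


def adjusted_volumes(rows):
    """Divide each volume by the number of rows sharing the same
    canonical keyword, search volume and landing page (the COUNTIFS step)."""
    counts = Counter((canon, vol, url) for _, canon, vol, url in rows)
    return [(kw, vol / counts[(canon, vol, url)]) for kw, canon, vol, url in rows]


adjusted = adjusted_volumes(rows)
naive_total = sum(vol for _, _, vol, _ in rows)
adjusted_total = sum(vol for _, vol in adjusted)

print(naive_total)     # 12600
print(adjusted_total)  # 3900.0
```

In this made-up list, the four aggregated variants share one 2.9k between them (725 each) rather than each claiming the full figure, so the total drops from 12.6k to 3.9k.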

Now, obviously, that search volume won’t be accurate on a per-keyword basis. We know, for example, that misspellings get roughly 10% of the search volume of the correctly spelt variant. There are two things to remember though: 1) it’s still more accurate than the aggregated volume and 2) this is about getting an accurate total search volume, and therefore an accurate forecast, across all the keywords, which is exactly what this solution delivers.

In a future version, we’ll likely identify which canonical keywords were fixed misspellings so that you can reduce search volumes accordingly, but that’s for another time and another blog post. Have a play with the tool and leave some feedback below. We’d love to hear your thoughts.
