
Comparing Screaming Frog Crawl Files

Handing over technical recommendations often comes with some trepidation: how long will they take to be implemented, will they be implemented at all and, if so, will they be implemented correctly? That’s why understanding how development cycles work, how items are prioritised and who you need to get onside is as key to successful technical SEO as the recommendations themselves. However well you understand those, though, changes are often implemented without anyone telling you that they’re complete.

It’s for that reason that tools like ContentKing have sprung up: to keep an eye on the site and alert you to changes. It’s not always feasible to run SaaS crawlers on the site, though. As a result, many of us rely on running crawls with Screaming Frog’s crawler, and comparing its crawl files can be a pain. Usually, you’ll end up dumping the data into Excel and running a bunch of VLOOKUPs or INDEX/MATCH functions, only to find that no, the developer hasn’t implemented the changes.

Meanwhile, you’ll occasionally want to compare crawl files of different sites to:

  1. Compare a dev environment with a staging environment
  2. Make sure content has been ported to a new site correctly
  3. Run technical SEO competitive analysis/comparisons – we wrote about this recently here.

This has always been a pain, which is why, for a while now, we’ve had a tool that quickly compares crawl_overview files for us. Today, we’re making it available for free.

It’s a simple Python script. If you don’t have Python installed, you can read a guide for Windows here and for macOS here (you’ll need Python 2, rather than 3, for the script to work – though feel free to install both using virtual environments if you’re really keen on 3). The script itself is here:

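import pandas
import csv
import sys

from tqdm import tqdm

# ANSI escape codes for coloured terminal output
class color:
	PURPLE = '\033[95m'
	CYAN = '\033[96m'
	DARKCYAN = '\033[36m'
	BLUE = '\033[94m'
	GREEN = '\033[92m'
	YELLOW = '\033[93m'
	RED = '\033[91m'
	BOLD = '\033[1m'
	UNDERLINE = '\033[4m'
	END = '\033[0m'

def main(argv):
	# Expect the script name plus three arguments
	if len(argv) != 4:
		print('Usage: crawl_overview1.csv crawl_overview2.csv output.csv')
		sys.exit(1)

	headerrows = 5
	endline = 191

	fileone = get_csv(argv[1])
	filetwo = get_csv(argv[2])

	fileone = fileone[0:endline]
	filetwo = filetwo[0:endline]

	# The site name is read from the second cell of the second row
	fileonesite = fileone[1][1]
	filetwosite = filetwo[1][1]

	fileone = fileone[headerrows:]
	filetwo = filetwo[headerrows:]

	# NOTE: the published listing lost the lines that fill these lists and
	# write the rows. The column choices below (label in column 0, value in
	# column 1) and the header row are a reconstruction, not the original.
	firstcolumn = get_column(fileone, 0)
	fileonedata = get_column(fileone, 1)
	filetwodata = get_column(filetwo, 1)
	combineddata = list(zip(firstcolumn, fileonedata, filetwodata))

	with open(argv[3], 'w') as outputfile:
		outFile = csv.writer(outputfile)
		outFile.writerow(['', fileonesite, filetwosite])
		for i in tqdm(combineddata):
			outFile.writerow(i)

	if fileonedata == filetwodata:
		print(color.BOLD + color.RED + "Crawl files are identical" + color.END)
	else:
		print(color.BOLD + color.GREEN + "Crawl files are NOT identical" + color.END)

def get_csv(thefile):
	# Read the whole CSV into a list of rows
	data = []
	with open(thefile, 'r') as datafile:
		datareader = csv.reader(datafile, delimiter=",")
		for row in tqdm(datareader):
			data.append(row)
	return data

def get_column(thelist,thecolumn):
	# Pull one column out of the rows; short rows contribute an empty
	# string (an assumption) so that the columns stay aligned
	newlist =[]
	for row in tqdm(thelist):
		if len(row) >= thecolumn +1:
			newlist.append(row[thecolumn])
		else:
			newlist.append('')
	return newlist

if __name__ == '__main__':
	main(sys.argv)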

The only thing you might need to pip install is tqdm – which if you’re not already using we heartily recommend – it creates the nice little loading bars. If you’re new to Python and the script errors when you run it, mentioning tqdm, simply type:

pip install tqdm (on Windows)

sudo pip install tqdm (on Mac)

You’ll only ever need to do that once.
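If you’d rather check before running the script, a minimal sketch along these lines reports whether a module can be imported by the interpreter you’re using. The helper name is_installed is purely illustrative and isn’t part of the script itself.

```python
def is_installed(module_name):
    """Return True if module_name can be imported by this interpreter."""
    try:
        __import__(module_name)
    except ImportError:
        return False
    return True


if __name__ == "__main__":
    # tqdm is the module the article says you may need to install
    for name in ("tqdm",):
        if is_installed(name):
            print(name + " is installed")
        else:
            print(name + " is missing: run pip install " + name)
```

Run it with the same Python you plan to use for the comparison script, since each installation keeps its own set of packages.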

Save it in a folder, navigate to that folder using command prompt or terminal and then run it the same way you’d run any Python script (typically ‘Python <>’). It takes three inputs:

  1. The name of the first crawl_overview file
  2. The name of the second crawl_overview file
  3. The name of file you’d like to save the output as – it should be a csv, but doesn’t need to already exist

Both files should be in the same folder as the Python script and so a valid input would look something like this:

Python crawl_overview1.csv crawl_overview2.csv output.csv


The script’s pretty fast – it’ll chew through the files within seconds and then report that either ‘Crawl files are identical’ or ‘Crawl files are NOT identical’. It will have saved the comparison of both crawl files under the output filename you supplied (output.csv in the example above), in the same directory – ready for you to:

  1. Send onwards as proof of whether recommendations have or haven’t been implemented; or
  2. Create industry comparison graphs to show how the sites compare; or
  3. Do with as you please.
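If all you want to send onwards is what changed, a short sketch like the one below can trim the output down to the rows that differ. It assumes each row holds a metric label followed by the first and second crawl’s values, which may not match your file exactly; the function name differing_rows and the sample figures are made up for illustration.

```python
def differing_rows(rows):
    """Return the [label, first, second] rows whose two values differ.

    Assumes each row is laid out as: metric label, first crawl's value,
    second crawl's value. Rows with fewer than three cells are skipped.
    """
    changed = []
    for row in rows:
        if len(row) >= 3 and row[1] != row[2]:
            changed.append(row[:3])
    return changed


if __name__ == "__main__":
    # Hypothetical rows standing in for the contents of the saved CSV
    sample = [
        ["Total URLs", "100", "100"],
        ["Missing titles", "12", "0"],
        ["Section heading only"],
    ]
    for label, first, second in differing_rows(sample):
        print("%s: %s -> %s" % (label, first, second))
```

To use it on a real file, read the rows with csv.reader first and skip any header row, since a header naming the two sites would otherwise be reported as a difference.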


Future Updates

Now that the script is publicly available there are a few changes we plan to make to it. These include:

  1. Creating a front-end and installer for those who don’t like to mess around with Python
  2. Allowing for the comparison of multiple crawl_overview files at once
  3. Allowing for the comparison of other Screaming Frog outputs – not just crawl_overview files.

We’d love your feedback as to what features you’d like to see added.


2 thoughts on “Comparing Screaming Frog Crawl Files”

  1. Mike

    Hi James,
    I’m not familiar with Python, but got the script to run. However I get the following error:

    331it [00:00, 10677.45it/s]
    331it [00:00, 10343.73it/s]
    Traceback (most recent call last):
    File “”, line 81, in
    File “”, line 35, in main
    fileonesite = fileone[1][1]
    IndexError: list index out of range

    Not sure if my CSV files are in the wrong format? It’s all comma separated… but I have no idea what could be wrong.

    Cheers Michael

    1. James Finlayson Post author

      The two crawl comparison files should just be two crawl overview files from Screaming Frog. That error indicates that either the files that you’re feeding it aren’t crawl overview files or the crawl overview files have something strange going on with the formatting. What version of Screaming Frog are you running?