Custom technical audit tools to support SEO projects

December 2023

Screenshot of the seo-tools command line interface, featuring a --help flag that lets my team check syntax without complex documentation.

About the stack

  • Python
  • Web scraping
  • HTML
  • JSON APIs
  • sitemap.xml

I use custom data analysis tools to standardize my site audits across environments, while still tailoring my reports to the specific needs of each project.

Sometimes cookie-cutter tools don't work—or don't work fast enough. I build my own!

Custom command-line tools support my technical SEO audits, saving time and improving repeatability. Why waste hours hunting for errors when you can automate the search?

Rank trackers, accessibility checks, and other tools are great—why reinvent the wheel?

Rank trackers, accessibility checks, and other tools are great! And I encourage you to use the best tools in your toolkit, no matter what they are. SEOs can do a lot of great work with Google Trends (free) for big-picture research, Google Ads (free if you’re not running ads) as a keyword research tool, “People also ask” and related SERP features (free) for intent-related research, and out-of-the-box site audit tools like SiteImprove (paid) or Monsido (paid) for quality assurance and accessibility checks. That said, some of these tools have pitfalls, especially when working in a staging, test, or dev environment.

Rank trackers are based, largely, on the same core concepts as Google Ads purchasing. They assume that “competition” for a keyword fully encompasses ranking signals and systems for all content strategies (hint: it doesn’t), and they tend to ignore long-tail keyphrases entirely. Rank trackers are great for exploring related keywords, tracking keywords that you’re targeting over time, and identifying opportunities for content creation in niche areas. However, they shouldn’t be the only tool in your kit for content strategy and keyword research.

Rank trackers also tend to push you toward “SEO fixes” that weren’t really problems to begin with. For example, toxic backlink audits have an important role to play if you’ve been purchasing spammy links from a link farm (violating search engine policies to try to rank better) or if you are overwhelmed with low-quality links that are damaging your brand reputation. But if it were that easy to lose rankings due to spammy links, I could dominate my competitors overnight by purchasing a bunch of spammy links to their sites.

Quality assurance and accessibility checking tools are great for your site in production. I always want to know where my broken links are, where color contrast is off, or where aria labels are missing, especially if I’m managing an enterprise-scale site with thousands of individually managed pages, posts, and documents. However, I spend a lot of my time reviewing sites that have yet to be pushed to production. We all want to believe that nobody’s making errors in the process, but we also know from practice that it’s easy to launch a site with human errors.

Build properly contextualized tools to match your audit needs

To get around these challenges, and extend my toolkit with little to no extra cost, I am passionate about building custom audit tools that help me search for errors, identify content opportunities, and plan ahead before launch instead of identifying problems afterward. My post about mapping Google’s “People also ask” feature as a research strategy highlights how I’ve implemented this approach in a strategy research context, but I also use this approach to audit sites that already have a content strategy in place.

Because my process is automated and client-specific, I can check for broken or out-of-compliance links with one command. Something like this:

seo-tools links-status --xml-index https://geofflambeth.com/sitemap.xml --output-csv ~/Downloads/my-links-status.csv

A common challenge I face on large migrations is verifying links. Even on large sites, though, I can check the vast majority of links and their destinations for HTTP errors in a matter of minutes. To do so, I’ve built the feature into a home-grown “seo-tools” command-line application written in Python. I point it at an XML sitemap URL accessible from my machine (the sitemap has to be up to date first!), which lets me run the tool even on local dev environments, and specify an output file. The output looks something like this:

Screenshot of spreadsheet output from seo-tools links-status report.
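Conceptually, the crawl behind that report boils down to two steps: pull page URLs out of the sitemap, then pull every anchor off each page. Here is a minimal sketch of that step using requests and BeautifulSoup; the function names are illustrative, not the actual seo-tools internals.

# Rough sketch of the crawl step (not the real seo-tools code):
# collect page URLs from the sitemap, then collect every anchor on each page.
import xml.etree.ElementTree as ET
from urllib.parse import urljoin

import requests
from bs4 import BeautifulSoup


def page_urls_from_sitemap(sitemap_url: str) -> list[str]:
    """Return the <loc> URLs in a sitemap, recursing into sitemap index files."""
    root = ET.fromstring(requests.get(sitemap_url, timeout=30).content)
    locs = [el.text.strip() for el in root.iter() if el.tag.endswith("loc") and el.text]
    if root.tag.endswith("sitemapindex"):  # each <loc> points at another sitemap
        return [url for child in locs for url in page_urls_from_sitemap(child)]
    return locs


def links_on_page(page_url: str) -> list[dict]:
    """Return every anchor on a page as a source/destination/new-tab record."""
    soup = BeautifulSoup(requests.get(page_url, timeout=30).text, "html.parser")
    return [
        {
            "source_url": page_url,
            "destination_url": urljoin(page_url, a["href"]),  # resolve relative hrefs
            "opens_new_tab": a.get("target") == "_blank",
        }
        for a in soup.find_all("a", href=True)
    ]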

The command returns a useful link status report, including a “source URL” (the page where I found a link), a “destination URL” (the link I found), whether the link opens in a new tab, the HTTP response status (200, 404, 301, etc.), the final URL in the case of an HTTP redirect, and other technical details.
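Filling in the status and redirect columns is mostly a matter of following each destination and recording what comes back. A rough sketch of that per-link check follows; it is not the real tool, and a production version would also need retries, rate limiting, and a fallback for servers that refuse HEAD requests.

# Sketch of the per-link check that fills in those columns (not the real tool).
import requests


def check_destination(destination_url: str) -> dict:
    try:
        resp = requests.head(destination_url, allow_redirects=True, timeout=15)
        first = resp.history[0] if resp.history else resp
        return {
            "destination_url": destination_url,
            "http_status": first.status_code,                   # 200, 404, 301, ...
            "redirected_to": resp.url if resp.history else "",  # final URL after redirects
        }
    except requests.RequestException as exc:  # DNS failure, timeout, SSL error, ...
        return {"destination_url": destination_url, "http_status": "ERROR", "redirected_to": str(exc)}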

Based on this report, my team and I can quickly implement essential link fixes before they hit production, regardless of the CMS platform. With this information, we can also filter the report to help ensure compliance with site-specific standards. For example, if a client wants all external links to open in a new tab but zero internal links opening in a new tab, I can filter for that specific case and identify any outliers to follow up with the site’s publishers or with our front-end developers.
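That kind of compliance check is only a few lines of Python on top of the CSV. A sketch, assuming the column names described above; the hostname constant is a placeholder for the client’s production domain.

# Sketch: flag rows that break "external links open in a new tab, internal links don't."
import csv
from urllib.parse import urlparse

SITE_HOST = "geofflambeth.com"  # placeholder: the client's production hostname


def new_tab_outliers(report_csv: str) -> list[dict]:
    outliers = []
    with open(report_csv, newline="", encoding="utf-8") as f:
        for row in csv.DictReader(f):
            internal = urlparse(row["destination_url"]).hostname == SITE_HOST
            new_tab = row["opens_new_tab"].strip().lower() == "true"
            if internal == new_tab:  # internal link in a new tab, or external link in the same tab
                outliers.append(row)
    return outliers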

Creating “red yellow green” exercises for clients—or keyword mapping documents for me

Another common auditing process I participate in is a “red yellow green” exercise, where we collaborate with clients to determine which content they want to keep and which content they want to get rid of. I use the same base report to begin keyword mapping exercises, recommending updated title tags, meta descriptions, article h1s, and other high-priority content to best align with users’ search intent for a page.

Screenshot of spreadsheet output from seo-tools links-status report.

Before I started building an always-on SEO requests platform, which now handles this work for me, I built a custom “RYG Scrape Tool” for these kinds of requests. This was a big improvement over the previous process, where someone had to sign into a client’s CMS and copy links for every single page, post, and document on a site, and it has helped streamline large migrations on my team.
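Conceptually, that kind of scrape is simple: for each URL, grab the title tag, meta description, and h1, then write one row per page for reviewers to mark up. A rough sketch (not the actual RYG Scrape Tool) might look like this:

# Sketch: pull the fields a red/yellow/green review or keyword mapping exercise starts from.
import csv

import requests
from bs4 import BeautifulSoup


def page_summary(page_url: str) -> dict:
    soup = BeautifulSoup(requests.get(page_url, timeout=30).text, "html.parser")
    meta = soup.find("meta", attrs={"name": "description"})
    return {
        "url": page_url,
        "title_tag": soup.title.get_text(strip=True) if soup.title else "",
        "meta_description": meta.get("content", "").strip() if meta else "",
        "h1": soup.h1.get_text(strip=True) if soup.h1 else "",
    }


def write_ryg_worksheet(page_urls: list[str], output_csv: str) -> None:
    fields = ["url", "title_tag", "meta_description", "h1", "keep_or_kill"]
    with open(output_csv, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=fields)
        writer.writeheader()
        for url in page_urls:
            writer.writerow({**page_summary(url), "keep_or_kill": ""})  # reviewers fill in the call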

Should I get pages from a sitemap.xml, from an analytics platform, or from the CMS?

I’ve used all these options! A sitemap.xml can be great and simple to work with, but in many (if not most) cases I’ve found that it doesn’t include every single page I want to audit for a large migration. Analytics platforms are also great, but if you’ve made recent changes or have zero-traffic URLs, they won’t be representative of the current state of your site. In my experience, the best solution is to get the content you want to audit directly from the source. In my case, I frequently use the WordPress JSON API (easiest if your client’s WordPress version is up to date and supports API keys, but that’s worth its own post) to generate a list of all pages, posts, and documents on a site, including draft and private URLs that likely won’t show up in either a sitemap.xml or an analytics platform.
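To make that concrete, here is a minimal sketch of pulling every page and post, drafts and private content included, from the WordPress REST API. It assumes WordPress 5.6 or newer with an application password (one common way to get the kind of API access mentioned above) for an account that can edit content; the site URL and credentials below are placeholders.

# Sketch: list every page and post from the WordPress REST API, drafts included.
import requests

SITE = "https://example.com"
AUTH = ("audit-user", "xxxx xxxx xxxx xxxx xxxx xxxx")  # WordPress application password


def all_content(post_type: str = "pages") -> list[dict]:
    items, page = [], 1
    while True:
        resp = requests.get(
            f"{SITE}/wp-json/wp/v2/{post_type}",
            params={"per_page": 100, "page": page, "status": "any"},  # unpublished statuses require auth
            auth=AUTH,
            timeout=30,
        )
        resp.raise_for_status()
        items.extend(resp.json())
        if page >= int(resp.headers.get("X-WP-TotalPages", 1)):  # pagination info from the API
            break
        page += 1
    return items


# Quick usage example: print the status and URL of everything the API returns.
for item in all_content("pages") + all_content("posts"):
    print(item["status"], item["link"])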