Most professional SEO audits cover a number of critical checkpoints, ranging from accessibility and indexation essentials to penalty audits and even SERP or competitor analysis, depending on scope and budget.
In this article I aim to highlight specific areas that, in my opinion, should get a little more attention.
These areas or checkpoints may be less common, yet they are essential to avoid SEO disasters, particularly for websites targeting international markets:
- Inconsistent url casing
- Inconsistent encoding on url formations
- Incorrectly formatted robots.txt files
- Assessing TTFB (Time to First Byte)
- CDN optimisation
- Chain redirects or Redirect Loops
- Multi XML sitemaps
- Semantic mark-up
While some of the technical SEO audits I’ve been able to review were decent and well researched, others lacked depth or focus, despite being expensive and carried out by large, well-known SEO agencies.
This happens because SEO audits are frequently carried out from the same checklist template, rather than being tailored to the needs of the client.
The list below is an attempt at identifying points that templated SEO audits rarely cover, yet are essential to large, complex websites:
1. Inconsistent url casing
How often do you hear the term ‘canonicalisation’? It’s a common term among SEOs and can be applied to many different contexts, most of them revolving around the idea of ‘duplicate content and how to avoid it’.
This is where establishing a strategy to define urls is key, and in particular, deciding on the casing is very important. In doing so, you will avoid having to patch things up with rel canonicals, 301 redirects and other hacks in future.
Let’s take a look at these two scenarios for urls:
Both are valid and both exist. Although forcing lowercase is becoming more ‘standard’ these days, there are still plenty of websites with both capital and lowercase letters in their urls, especially .NET-powered websites. This is fine, so long as the rules for url formation are established from the beginning (eg: only nouns have their first letter capitalised), so that there is a set of rules defining how it will be done.
The problem starts to get out of hand when:
- inbound links start pouring in with the wrong syntax on urls, eg: capital letters everywhere instead of just the first letter
- internal links point, due to some flaw, to the url with the incorrect syntax
This is when we enter the realm of duplicate content.
All this can be avoided by strictly defining how the system should deal with incorrect url formations.
For example, a typical url syntax plan can be:
- all urls in lowercase
- no trailing forward slashes allowed (this is just my preference as I think it looks nicer, but your choice)
- a hyphen ( - ) will replace white space between words
From those 3 rules you could create what I have always called the golden rule of 3 for urls.
But that’s not enough. If we want to avoid deviations from the above rules, we should also specify stability-driven code rules:
- Lowercase urls are a must
- Programmatically, the platform converts any url containing capital letters to lowercase.
Eg url: /rainy-days-are-over. If you try typing it as: /rainy-days-are-Over
the system will automatically redirect to: /rainy-days-are-over via a 301 redirect
This feature works as a shield against duplicate content, as it minimises the chances that someone may link to us with the wrong syntax. If someone links to the site with a capital letter by mistake, the system automatically redirects to the lowercase url.
See where I’m going? Unless these prevention systems are put in place to help minimise potential misuse of the urls, their integrity will always be on the line. I rarely see advice like this in SEO audit reports in the form of actionable recommendations.
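To make the rules above concrete, here is a minimal Python sketch of how a platform could apply them. The function names (normalise_path, redirect_target) are hypothetical, and it assumes plain ASCII paths with no query string; a real platform would wire this into its routing layer and issue the 301 itself.

```python
import re
from typing import Optional


def normalise_path(path: str) -> str:
    """Apply the three url rules: lowercase, hyphens for spaces, no trailing slash."""
    path = path.lower()
    # Raw or percent-encoded spaces become a single hyphen.
    path = re.sub(r"(\s|%20)+", "-", path)
    # Drop trailing slashes, but keep the root path "/" intact.
    if len(path) > 1:
        path = path.rstrip("/")
    return path or "/"


def redirect_target(path: str) -> Optional[str]:
    """Return the url to 301-redirect to, or None if the path is already canonical."""
    canonical = normalise_path(path)
    return canonical if canonical != path else None
```

With this in place, redirect_target("/rainy-days-are-Over") returns "/rainy-days-are-over", while the already correct url returns None and is served as normal.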
2. Inconsistent Encoding on url formations
When the website deals with non-Latin or special characters, it is important to decide on the right strategy for url encoding. Is the whole product name going to be encoded, even spaces? Or will spaces be replaced by a dash as an exception, or by an underscore like Wikipedia does? Does the rel canonical url need encoding too? Are there inconsistencies in the way encoding has been performed?
This is quite a technical topic, and one that often must be reviewed as part of an audit of multilingual websites displaying names with accented characters, non-Latin characters (eg: Cyrillic) or special characters.
This is a matter that understandably requires thought, consultation, and patience to get right, so I can see why it is often only superficially mentioned in audits, but it does deserve due consideration and assessment. The decision should be documented and, if possible, included in a more overarching strategy document such as the ‘URL strategy’ or ‘URL Policy’ document, or the ‘SEO Strategy’ for that matter.
The right questions to ask, in the framework of an audit, are:
- What parts of a url are we going to encode (the full url or just the database-powered part of it)?
- Is there also transliteration to do?
- What encoding system are we going to use?
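- the ampersand character (“&”) is a special character and should be encoded: as “&amp;” when the url is written into HTML, or as “%26” when the ampersand is part of a name rather than a parameter separator. The same goes for the rel canonical url, which should appear encoded.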
- the accented “é” in some European languages like Spanish or French should be encoded as “%C3%A9”. Eg: “développement” in French
This kind of encoding is critical for websites that display database-powered products or services: for example, e-commerce websites with listing categories that display products in different languages, or job board websites listing jobs with technical names that contain special characters. Take ‘C# developer’ on Indeed. They have had to encode the symbol that accompanies the letter ‘C’; otherwise, the results would display job offers for developers in the C programming language, as opposed to C sharp (C#).
It is important to decide on the best encoding for the purpose of the website.
UTF-8, the most widely used Unicode encoding, is a safe choice since a single encoding can handle pretty much any character you are likely to encounter. This greatly simplifies things. Deciding on a system for url encoding can help avoid a whole host of issues in the future.
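As an illustration of the encoding decisions discussed above, here is a small Python sketch using only the standard library. The helper names (encode_slug, canonical_tag) and the example.com url are my own assumptions, and replacing spaces with hyphens is just one of the possible policies.

```python
import html
from urllib.parse import quote


def encode_slug(name: str) -> str:
    """Percent-encode a product or job name as one url path segment (UTF-8)."""
    # safe="" also encodes "/", so a name can never add extra path levels.
    return quote(name.replace(" ", "-"), safe="")


def canonical_tag(url: str) -> str:
    """Build a rel canonical tag, escaping "&" and quotes for the HTML attribute."""
    return f'<link rel="canonical" href="{html.escape(url, quote=True)}">'


print(encode_slug("développement"))  # d%C3%A9veloppement
print(encode_slug("C# developer"))   # C%23-developer
print(canonical_tag("https://www.example.com/jobs?q=c%23&l=paris"))
```

Note the two layers: the “#” is percent-encoded inside the url itself, while the “&” separating the parameters is only escaped as “&amp;” when the url is written into the HTML.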
Read more about url encoding here:
3. Incorrectly formatted Robots.txt files
The robots.txt file is a text file intended for search engine bots or any type of bots visiting the site. In principle, bots start exploring websites by looking for the robots.txt file at the root of the site.
The directives in the robots.txt file tell search engine robots which pages or content paths they may or may not crawl. Bots do not always obey those directives, but there is still value in defining the structure of a robots file.
For an SEO consultant who’s performing a technical audit, the task of auditing and assessing whether a robots.txt file contains the right directives is not easy. It should be carefully carried out allowing sufficient time to consider every aspect of the site architecture.
The way I usually advise on the set up and configuration of the robots file is by
- making small annotations of the different website paths throughout the audit.
- noting down parameters, filters, and fractions that should definitely not be crawled
- gathering, by the end of the audit, enough intel to advise on the most suitable config for the robots.txt file.
One needs to have first-hand experience of the website structure and its features, such as faceted navigation, pagination, etc, before advising on a suitable configuration for the robots.txt file.
Technical SEO audits are often delivered without much input on the robots.txt file other than general syntax corrections.
While it is good to correct the syntax and ensure that UTF-8 is used, advising on the right configuration for the robots file is key to maximising crawlability on the site in question. Don’t let robots crawl more than they should.
I have seen audits where great advice is given as to the need to deindex content deemed as duplicated or thin (eg: e-commerce urls boasting appended parameters), but then no advice is given to help declutter the robots file or even set it up properly with an eye on the taxonomy of the website.
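One way to sanity-check a proposed robots.txt configuration against the paths noted down during the audit is Python’s built-in urllib.robotparser. The sketch below is only that: the rules, the urls and the check_robots helper are made up for illustration. Bear in mind that this parser applies the first matching rule and does not understand the wildcard patterns that Google supports, so results can differ from Googlebot’s behaviour.

```python
from typing import Dict, Iterable
from urllib.robotparser import RobotFileParser

# Hypothetical rules for an e-commerce style site.
ROBOTS_TXT = """\
User-agent: *
Disallow: /search
Disallow: /basket/
"""


def check_robots(robots_txt: str, urls: Iterable[str], user_agent: str = "*") -> Dict[str, bool]:
    """Map each url to True if the given user agent may crawl it."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return {url: parser.can_fetch(user_agent, url) for url in urls}


results = check_robots(ROBOTS_TXT, [
    "https://www.example.com/jobs/c-sharp-developer",
    "https://www.example.com/search?q=shoes",
    "https://www.example.com/basket/checkout",
])
for url, allowed in results.items():
    print("crawlable" if allowed else "blocked  ", url)
```

Running the list of annotated paths through a check like this makes it easier to spot parameters or filters that the current file still leaves open to crawling.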
4. Assessing TTFB (Time to First Byte)
Google recommends a server response time under 200 milliseconds. People want fast websites and they get frustrated when they have to wait until the page is downloaded. Therefore TTFB should be a must in technical SEO audits.
I don’t intend to cover the whole subject of website speed and all its server-side and client-side factors. But I feel that TTFB carries a lot of weight in the overall picture of website speed.
The question would be: how long does the browser wait before receiving its first byte of data from the server? That critical ‘wait’ state should ideally be optimised down to below 200 ms. Not that I’m proud of that score of 662.4 on my own website.
So, where is the bottleneck in there?
Let’s summarise how TTFB takes place in three easy steps:
A. Request to server: on every website visit, the client (browser) has to make a request to the server. This request can be fast or slow depending on several factors. One of them is the speed of the DNS lookup. Are you using the DNS service that came with your domain name, or a premium one? This can make a big difference to how quickly the request reaches the server.
You can dramatically improve DNS lookups and load time by moving as many resources as possible to the CDN provider. Yet this is rarely covered in technical SEO audits, despite large numbers of resources (eg: multimedia content) often being hosted on the website server itself.
B. Server processing: This is the part where the server has to deal with things like database calls, caching, parsing of 3rd party scripts, quality of code. Server resources really play a role here: server processing power, memory, ability to perform compression (eg: gzip).
This is where having a good solid host can really make a difference. Some hosts claim to have very fast servers, but they don’t. At some point, moving one of my affiliate sites from WPEngine onto Kinsta saw a speed increase of over 125%.
If your site is running WordPress there are plenty of good options today. Even if you are on a budget you should be able to find good WordPress hosts.
I see websites backed by solid, well-established businesses, some even running with VC help, that are hosted on unresponsive or less than ideal hosting solutions.
C. Server response to client: here again, having a good host counts, as the network used by the server to send the response has to be fast. This is not to be confused with the client’s internet connection, wifi issues, etc, which would also affect the TTFB but are out of scope here.
In short, assessing how robust the server is, the CDN set up and the effectiveness of the DNS lookup are key things that can dramatically affect overall website speed. These areas should genuinely be included in a technical SEO audit.
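For a rough first measurement during an audit, the three steps above can be approximated with the Python standard library. This is a sketch under assumptions: measure_ttfb_ms and within_budget are hypothetical helper names, and the timer stops when the response headers have been read, which is close to, but not exactly, the first byte. The figure also includes DNS lookup, connection and TLS set up, and depends on where you measure from.

```python
import http.client
import time


def measure_ttfb_ms(host: str, path: str = "/", use_tls: bool = True,
                    timeout: float = 10.0) -> float:
    """Approximate TTFB in milliseconds for a single GET request."""
    conn_cls = http.client.HTTPSConnection if use_tls else http.client.HTTPConnection
    conn = conn_cls(host, timeout=timeout)
    try:
        start = time.perf_counter()
        conn.request("GET", path)      # DNS lookup, connect and TLS handshake happen here
        response = conn.getresponse()  # returns once the status line and headers arrived
        elapsed = time.perf_counter() - start
        response.read()                # drain the body so the connection closes cleanly
        return elapsed * 1000.0
    finally:
        conn.close()


def within_budget(ttfb_ms: float, budget_ms: float = 200.0) -> bool:
    """Compare a measurement against the 200 ms target mentioned above."""
    return ttfb_ms < budget_ms


# Example (needs network access):
# print(within_budget(measure_ttfb_ms("www.example.com")))
```

A single request says little, so in practice I would repeat the measurement several times, from more than one location, and look at the median.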
5. CDN optimisation
Content delivery networks have a lot to do with the site’s overall performance in search, especially if we are looking at a website that has international reach. Yet, CDN auditing is rarely covered in SEO audits.
CDNs (Content delivery networks) can improve SEO in many ways:
- Accelerating webpage load time across the globe. Webpage loading time is an important criterion for SEO. CDNs decrease a webpage’s response time and therefore improve its loading speed
- Decreasing the image size sent to the user whenever needed, depending on the device
- Keeping the website stable at times of heavy traffic
- Helping reduce the number of server calls
In short, CDNs can be highly beneficial to websites with high levels of traffic or sites with geographically dispersed traffic.
However, there are risks associated with CDNs too.
- 1. Misconfiguration of the CDN. Be careful when installing your CDN: do it cautiously or seek the help of a professional. If you do not install it correctly, it will damage your SEO and your user experience.
- 2. The CDN provider may not have servers in some of the locations where your company operates. Your customers there will then experience higher latency and longer load times, and user experience will be affected.
CDN set up and configuration, as well as possible interference with other aspects of the web architecture, should be audited in the same way as other elements.
6. Chain redirects or redirect loops
This is more often visible in SEO audits, as it’s kind of a must, but I have still seen audits that did not properly address the different types of chain redirects, or where chain redirects were acknowledged but redirect loops were left aside. They are two different things and both contribute largely to wasting crawl time on the site.
Chain redirects can be caused by development changes made to the internal link structure, by renaming urls or by platform migrations. It is worth properly assessing how serious the matter of chain redirects is during the technical SEO audit, and advising on prioritisation accordingly in the list of recommendations in the audit report.
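To show the difference between the two, here is a minimal Python sketch that walks a redirect map, for instance one exported from a crawler, and labels each starting url. The trace_redirects function, the labels and the sample urls are all hypothetical.

```python
from typing import Dict, List, Tuple


def trace_redirects(start: str, redirects: Dict[str, str]) -> Tuple[List[str], str]:
    """Follow redirects from start and label the result "ok", "chain" or "loop"."""
    chain = [start]
    seen = {start}
    current = start
    while current in redirects:
        current = redirects[current]
        chain.append(current)
        if current in seen:
            # We are back at a url already visited: this never resolves.
            return chain, "loop"
        seen.add(current)
    hops = len(chain) - 1
    # Zero or one hop is fine; two or more hops make a chain.
    return chain, ("chain" if hops > 1 else "ok")


REDIRECTS = {
    "/old": "/newer",
    "/newer": "/newest",
    "/a": "/b",
    "/b": "/a",
}
print(trace_redirects("/old", REDIRECTS))  # (['/old', '/newer', '/newest'], 'chain')
print(trace_redirects("/a", REDIRECTS))    # (['/a', '/b', '/a'], 'loop')
```

The length of each chain gives a simple way to prioritise: loops first, then the longest chains on the most linked urls.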
7. XML Multi-sitemaps
What can be better than a sitemap that acknowledges the taxonomy of the website? A cluster of xml sitemaps comprised of identifiable child sitemaps that represent each of the key categories or sections of a large website is like a treasure map to indexation success.
XML multi-sitemaps act as a roadmap for diagnosing indexation issues, bloated sections or even bloating in the index itself. Using xml sitemaps and matching them against content already in the index presents a phenomenal tool to help troubleshoot or optimize crawlability and indexation.
Let me clarify: I do agree that sitemaps will not give you an SEO boost, but they will help you get indexation visibility across the horizon in order to improve your technical SEO strategies. These xml files in combination with crawling data and log file data can massively inform your SEO to help you establish the next big move.
XML sitemaps get a lot of criticism, often unfairly called ‘useless’, yet they are an extremely useful SEO tool, in my opinion. I have used them to good effect in my career as an SEO and today I still use them to diagnose, test and improve technical SEO.
More often than not, if the website being audited has over 5K-6K urls, there is, in my opinion, already enough justification to recommend splitting the xml sitemaps logically into sensible topics, sections or categories. Instead, audits usually just recommend having an xml sitemap or, if one is present, tick it off as if nothing else is to be done.
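As a sketch of what such a split could look like, the Python below groups urls by their first path segment and builds a sitemap index pointing at one child sitemap per section. The grouping rule, the /sitemaps/ location and the function names are assumptions for illustration; a real site would group by its own taxonomy.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict
from typing import Dict, Iterable, List
from urllib.parse import urlparse

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def group_by_section(urls: Iterable[str]) -> Dict[str, List[str]]:
    """Group urls by first path segment; shallow urls go into a "pages" bucket."""
    sections = defaultdict(list)
    for url in urls:
        segments = [s for s in urlparse(url).path.split("/") if s]
        section = segments[0] if len(segments) > 1 else "pages"
        sections[section].append(url)
    return dict(sections)


def build_sitemap_index(base_url: str, sections: Iterable[str]) -> str:
    """Return a sitemap index with one child sitemap entry per section."""
    root = ET.Element("sitemapindex", xmlns=SITEMAP_NS)
    for section in sorted(sections):
        sitemap = ET.SubElement(root, "sitemap")
        ET.SubElement(sitemap, "loc").text = f"{base_url}/sitemaps/{section}.xml"
    return ET.tostring(root, encoding="unicode")


URLS = [
    "https://www.example.com/",
    "https://www.example.com/about",
    "https://www.example.com/jobs/python-developer",
    "https://www.example.com/jobs/seo-manager",
    "https://www.example.com/blog/technical-seo-audit",
]
grouped = group_by_section(URLS)
index_xml = build_sitemap_index("https://www.example.com", grouped)
```

With one child sitemap per section, the submitted versus indexed counts can be compared section by section, which is where the diagnostic value comes from.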
8. Semantic mark-up
Despite widely available industry advice to prioritise structured mark-up, and search engines like Google encouraging webmasters to embrace it, there is still little adoption of structured data today, especially by big brands.
The main goal for structured data is to declare specific elements in the websites’ HTML to help the search engines better understand what the content is about: video, images, ratings, events information, recipes, organization information, job offers.
Read Google’s guidelines for the use of Schema structured data and 3rd party resources like Moz’s series on structured data for SEO to start getting acquainted with it.
There are three formats for expressing structured data with the Schema.org vocabulary: JSON-LD, Microdata and RDFa.
However, other types of structured mark-up exist, such as the Dublin Core initiative, Facebook Open Graph and Twitter Cards.
In some verticals this is huge, to the point that unless you implement structured data effectively, you will lose out big in the race to be visible. For example, in order to appear in the ‘Google for Jobs’ boxes, every individual job posting needs to follow strict Schema parameters, either with JSON-LD or microdata.
Here’s an example:
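As a minimal sketch, the Python below generates JSON-LD for a single job posting. The property names come from Schema.org’s JobPosting type; the job_posting_jsonld helper and every value passed to it are made up for illustration, and Google’s documentation lists further recommended properties that are left out here.

```python
import json


def job_posting_jsonld(title: str, description: str, date_posted: str,
                       company: str, city: str, country: str) -> str:
    """Return a script tag holding Schema.org JobPosting mark-up as JSON-LD."""
    data = {
        "@context": "https://schema.org",
        "@type": "JobPosting",
        "title": title,
        "description": description,
        "datePosted": date_posted,
        "hiringOrganization": {"@type": "Organization", "name": company},
        "jobLocation": {
            "@type": "Place",
            "address": {
                "@type": "PostalAddress",
                "addressLocality": city,
                "addressCountry": country,
            },
        },
    }
    payload = json.dumps(data, indent=2, ensure_ascii=False)
    # Escape "</" so text in the data cannot close the script tag early.
    payload = payload.replace("</", "<\\/")
    return f'<script type="application/ld+json">\n{payload}\n</script>'


snippet = job_posting_jsonld(
    "C# Developer", "Build and maintain .NET services.",
    "2024-01-15", "Example Ltd", "Paris", "FR",
)
print(snippet)
```

Generating the mark-up from the same database fields that populate the page keeps the two in sync, which matters when it has to be rolled out across thousands of postings.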
Pop over to Merkle to get your schema code automatically generated.
The benefits of implementing mark-up at scale on large website networks are clear. However, this is still not fully embraced by large organisations today, whether due to lack of internal knowledge or resources, a clumsy old CMS or other corporate SEO impediments.
Here are some tools that you can use to help validate your mark up data:
1. Google’s Structured Data Testing tool: https://search.google.com/structured-data/testing-tool/u/0/
2. Rich result test (Google): https://search.google.com/test/rich-results
3. Validation tool for Bing: https://www.bing.com/toolbox/markup-validator
Researching the feasibility of structured mark-up, and making recommendations for its implementation if scope exists, should be included in technical SEO audits.
Sometimes though, key insights on an audit report can be missed as the auditor hasn’t really spent enough time understanding how the business works and how the website delivers against the business objectives. Internal knowledge about the company can really help formulate better recommendations upon the technical SEO audit exercise.
Whatever use you make of the report and the recommendations, it is advisable to find an SEO who can work with your technical teams to implement the recommendations efficiently, always keeping the 20/80 rule in mind: trying to achieve 80% of the results by investing 20% of the time and resources.