A website content audit gives you both a broad overview and in-depth knowledge of the current state of your website content. To do a content audit, you’ll need a list of URLs and page titles plus your analytics, all pulled into a spreadsheet. Once the data is in a spreadsheet, you can structure the page levels, identify content types and audiences, and pick out keywords. This article describes how to do a content audit that will eventually inform your site map.
What Are We Trying to Get?
Before diving into describing a content audit, it helps to have a picture of what this could look like. Here’s an example of a content audit spreadsheet with its columns, rows, and data. You’ll see all the levels, the analytics, and the audit information combined.
Get a List of URLs and Page Titles
To review the content on your site, you’ll need a list of the URLs and page titles currently on it. For a small site, you could compile this manually, but for a larger site you should track down a spreadsheet that lists this information.
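If no such spreadsheet exists, one possible starting point is the site’s sitemap.xml file, if it has one. The sketch below is only an illustration under that assumption: `urls_from_sitemap` and `title_from_html` are hypothetical helper names, the sample sitemap is made up, and fetching each page’s HTML is left out.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser

# Namespace used by the standard sitemap protocol.
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def urls_from_sitemap(sitemap_xml: str) -> list[str]:
    """Return every <loc> URL listed in a sitemap.xml document."""
    root = ET.fromstring(sitemap_xml)
    return [
        loc.text.strip()
        for loc in root.findall("sm:url/sm:loc", SITEMAP_NS)
        if loc.text
    ]


class TitleParser(HTMLParser):
    """Collects the text inside the page's <title> element."""

    def __init__(self):
        super().__init__()
        self._in_title = False
        self.title = ""

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._in_title = True

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


def title_from_html(html: str) -> str:
    """Return the page title from an HTML string, or "" if none is found."""
    parser = TitleParser()
    parser.feed(html)
    return parser.title.strip()


# Made-up sample data for illustration only.
SAMPLE_SITEMAP = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://example.com/</loc></url>
  <url><loc>https://example.com/about/</loc></url>
</urlset>"""

print(urls_from_sitemap(SAMPLE_SITEMAP))
print(title_from_html("<html><head><title>About Us</title></head></html>"))
```

A sitemap only lists what the site owner chose to publish in it, so treat the result as a first draft of the URL list rather than a complete inventory.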
Merge in Your Analytics
Analytics show which pages people actually use, which makes them essential to a content audit. Associate each page’s statistics with the matching URL and title in your spreadsheet.
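If you’d rather not match rows by hand, the association can be sketched as a lookup keyed on the URL. This is a minimal example, assuming both lists use full URLs; the field names (`url`, `title`, `page_views`, `avg_time_on_page`) and the `merge_analytics` helper are hypothetical, and your analytics export may use paths or different column names.

```python
def normalize_url(url: str) -> str:
    """Trim whitespace and a trailing slash so the same page matches across exports."""
    return url.strip().rstrip("/")


def merge_analytics(pages, analytics):
    """Attach analytics fields to each page row, matching on the normalized URL."""
    stats_by_url = {normalize_url(row["url"]): row for row in analytics}
    merged = []
    for page in pages:
        stats = stats_by_url.get(normalize_url(page["url"]), {})
        merged.append({
            **page,
            # Pages missing from the analytics export get zero views.
            "page_views": stats.get("page_views", 0),
            "avg_time_on_page": stats.get("avg_time_on_page"),
        })
    return merged


# Made-up sample rows for illustration only.
pages = [
    {"url": "https://example.com/about/", "title": "About"},
    {"url": "https://example.com/old-page", "title": "Old Page"},
]
analytics = [
    {"url": "https://example.com/about", "page_views": 120, "avg_time_on_page": 45.0},
]

for row in merge_analytics(pages, analytics):
    print(row)
```

Pages that appear in the URL list but not in the analytics export are worth a second look, since they may be pages that are never visited.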
Content audits can look at many different aspects, and you can group your columns into sections for:
- Publishing metadata, such as content owners or last revised dates
- Taxonomy focused information, such as keywords and content types
- User experience information, such as the targeted persona and intended audiences
- Editorial information, such as the quality of page layout and writing, and adherence to tone and voice
- Analytics data for page views, percentage of total page views (you may need to make this column yourself), time on page, and any other meaningful data point
- A decision for each page: Delete, Keep, or Rewrite
Which columns you add depends on the kind of content audit you’re doing.
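The percentage of total page views column mentioned in the list is a simple calculation: each page’s views divided by the sum of all page views. Here is a small sketch; the `page_views` and `pct_of_total` field names and the `add_percent_of_total` helper are assumptions for illustration.

```python
def add_percent_of_total(rows):
    """Add a pct_of_total field: this page's share of all page views, in percent."""
    total = sum(row["page_views"] for row in rows)
    for row in rows:
        # Guard against an empty or all-zero export.
        row["pct_of_total"] = round(100 * row["page_views"] / total, 2) if total else 0.0
    return rows


# Made-up sample rows for illustration only.
rows = [
    {"url": "/", "page_views": 750},
    {"url": "/about", "page_views": 200},
    {"url": "/contact", "page_views": 50},
]

for row in add_percent_of_total(rows):
    print(row["url"], row["pct_of_total"])
```

In a spreadsheet, the same column is the page’s views cell divided by the sum of the views column, formatted as a percentage.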
Select a Subset of Content
With a list of all the pages on your site in hand, you can determine whether to audit all the content or only a subset. For a site that has 100 pages, you could easily go through each page and make your audit notes. For a site that has 2,000 pages, where 500 are news and events and many others repeat the same layout patterns, you may be able to audit only a portion.
For a site that has 10,000 pages, you’ll need to determine either which type of content you’ll audit, or which section of the site. It isn’t practical or necessary to go through all 10,000 pages to get a handle on the problems with the site structure and content.
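One way to pick a subset is to group pages by site section and pull a few pages from each group. The sketch below assumes the first URL path segment is a reasonable stand-in for the section, which won’t be true of every site; `section_of` and `sample_per_section` are hypothetical helpers, not a prescribed sampling method.

```python
import random
from collections import defaultdict
from urllib.parse import urlparse


def section_of(url: str) -> str:
    """Use the first path segment as the section name (an assumption about the site)."""
    segments = [s for s in urlparse(url).path.split("/") if s]
    return segments[0] if segments else "(home)"


def sample_per_section(urls, per_section, seed=0):
    """Pick up to per_section URLs at random from each section."""
    groups = defaultdict(list)
    for url in urls:
        groups[section_of(url)].append(url)
    rng = random.Random(seed)  # fixed seed so the sample can be reproduced
    return {
        section: rng.sample(members, min(per_section, len(members)))
        for section, members in groups.items()
    }


# Made-up URLs for illustration only.
urls = ["https://example.com/", "https://example.com/about"]
urls += [f"https://example.com/news/item-{i}" for i in range(1, 6)]

for section, picked in sample_per_section(urls, per_section=2).items():
    print(section, picked)
```

Counting the pages in each group first can also help you decide which sections or content types deserve a full review and which can be sampled.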
Structure the List
In the example spreadsheet above, the page titles are separated into Level 1, Level 2, Level 3, and so on. Chances are you’ll need to do this in your own spreadsheet. Put each title in the column that matches the level at which it appears on the website. You may need to go to the website to determine the hierarchy.
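If the URL paths on your site mirror the navigation, you can get a first pass at the levels from the URL depth and then correct it by checking the site. The sketch below rests on that assumption, and on the further assumption that the home page and top-level pages both belong in Level 1; `level_columns` is a hypothetical helper.

```python
from urllib.parse import urlparse


def level_columns(url: str, title: str, max_levels: int = 4) -> dict:
    """Place the title in the Level N column that matches the URL path depth."""
    depth = len([s for s in urlparse(url).path.split("/") if s])
    # Home page (depth 0) goes in Level 1; very deep pages go in the last column.
    level = min(max(depth, 1), max_levels)
    row = {f"Level {i}": "" for i in range(1, max_levels + 1)}
    row[f"Level {level}"] = title
    return row


# Made-up pages for illustration only.
print(level_columns("https://example.com/services/", "Services"))
print(level_columns("https://example.com/services/consulting/", "Consulting"))
```

URL depth and navigation depth often disagree, so treat these columns as a draft to verify against the live site.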
Now that you’ve got your list of pages and have (potentially) narrowed your focus, you can start clicking away. Based on the columns you added, you can look at:
- Publishing metadata: When was the last time it was revised? Does the content owner still work at the company? Is the content owner unassigned?
- Taxonomy: What is the observable content type? What keywords appear on the page? Does the content refer to things like topics, organizations, or departments? List what is important to your company.
- User experience: Who is the targeted audience for this content? Can you identify it? If so, note it. If not, note that too!
- Editorial information: Is the page content well structured? Does it follow the style guide? Editorial guidelines? How well written is it? Does it use plain language?
- Analytics: What is the percentage of visits to this page? How long do people spend on the page? Is there an important page that’s never accessed? Or an unimportant page that receives a lot of traffic?
- Delete, Keep-as-is, Keep-and-Rewrite: Do you want to keep the content? If so, should it be rewritten first?
Identifying the page structure, and which pages aren’t used, aren’t written to the editorial standard, or are lacking owners, is a huge step toward improving the content on your site. You’ll find low-hanging fruit and can immediately address those items without having to consult anyone. Is 10% of your content NEVER used? And by never, I mean it receives less than 0.01% of website traffic? Are you holding onto it only because you’ve just never done anything to get rid of it? If so, delete it already! Or at least archive it so it’s not interfering with your SEO.
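The “never used” check can be sketched as a filter on each page’s share of total traffic, using the 0.01% threshold mentioned above. The `page_views` field name and the `low_traffic_pages` helper are assumptions for illustration.

```python
def low_traffic_pages(rows, threshold_pct=0.01):
    """Return URLs whose share of total page views is below threshold_pct percent."""
    total = sum(row["page_views"] for row in rows)
    if total == 0:
        return []  # no traffic data at all, so nothing can be judged
    return [
        row["url"]
        for row in rows
        if 100 * row["page_views"] / total < threshold_pct
    ]


# Made-up sample rows for illustration only.
rows = [
    {"url": "/", "page_views": 99_990},
    {"url": "/2009-picnic", "page_views": 9},
    {"url": "/old-press-release", "page_views": 1},
]

flagged = low_traffic_pages(rows)
print(flagged)
print(f"{100 * len(flagged) / len(rows):.0f}% of pages fall below the threshold")
```

A flagged page is a candidate for deletion or archiving, not an automatic one: an important page that is never accessed may point to a findability problem instead.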
Use Search Terms in Your Content Audit
Using search terms as part of a content audit can give insight into the structure of your site. You can analyze these search terms to inform your taxonomy and future content efforts, and to evaluate the findability of content on your site.
Here, search terms means on-site search terms: what users type into the search box on your site.
Search Terms Not in Your Taxonomy
One way to use on-site search terms to improve both your taxonomy and your search results is to check how the searched terms are accommodated in the taxonomy. To do this, take each searched term and identify where it sits in the taxonomy: as a preferred term, a synonym, or a keyword or keyphrase.
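As a rough illustration of that lookup, the sketch below checks each searched term against a tiny taxonomy. The taxonomy structure (a preferred term mapped to sets of synonyms and keywords), the sample terms, and the `classify_term` helper are all made up; a real taxonomy tool will store and expose this differently.

```python
# Hypothetical taxonomy: preferred term -> its synonyms and keywords/keyphrases.
TAXONOMY = {
    "careers": {"synonyms": {"jobs", "employment"}, "keywords": {"job openings"}},
    "contact us": {"synonyms": {"contact"}, "keywords": {"phone number"}},
}


def classify_term(term, taxonomy):
    """Return (where the term was found, the preferred term it maps to)."""
    needle = term.strip().lower()
    for preferred, entry in taxonomy.items():
        if needle == preferred:
            return ("preferred term", preferred)
        if needle in entry["synonyms"]:
            return ("synonym", preferred)
        if needle in entry["keywords"]:
            return ("keyword", preferred)
    return ("not in taxonomy", None)


# Made-up search terms for illustration only.
for term in ["Jobs", "phone number", "parking"]:
    print(term, "->", classify_term(term, TAXONOMY))
```

The terms that come back as not in the taxonomy are the ones to review by hand.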
You may not want all of the search terms in the taxonomy, and for some terms you may have no content. If users search for a term that you have no content for and would never talk about, it’s not appropriate to add it to the taxonomy or to create content for it.
After reviewing the search terms, some of these terms may align with your organization’s content strategy. You can feed these search terms into the content creation life cycle.
If these search terms are within your organization’s domain, then creating content around them can be a longer-term goal.
Note that improving search by adding preferred terms, synonyms, and keywords and phrases isn’t as simple as adding these terms to your taxonomy. You’ll still need to ensure that the search tool uses the taxonomy to improve search results.
Each search tool is different, so you’ll need to investigate based on the tool and talk with developers.
Can’t Find Page through Browse
Browsing and searching are two ways to access content on a website. While users might start with the search box, they can also turn to search when they can’t find content through browse. On your site, browsing may be more typical than using the search box. An external search engine, such as Google or Bing, is a different use case: it directs users to content on your site from outside it.
Going through the search terms to see what users have searched for can potentially tell you what people can’t find on your site. We say “potentially” because you really need to dig into these search terms to see what’s happening on your site.
Ways to Investigate
Some ways to investigate what is happening with search terms include:
- Do the search term and its variants appear frequently? If the term comes up frequently in the search terms, this may point to popular content that can’t be found through browsing.
- Do the search term and its variants appear infrequently? If the term doesn’t come up often, it may point to content that is easier to find through browse or that is not popular with users.
- Compare the page hits: If the content isn’t popular but is searched for frequently, this can point to the content not being findable on the site. If the content isn’t popular and not searched frequently, this can point to the content being irrelevant to the site.
- Reenact the search: Take some of the more frequently searched terms and perform that search on your site. What results appear? Do the results match the intent of the search? Do odd results appear? Evaluate the results and list the ways the content, taxonomy, and search tool could be changed to improve the search results.
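The page hits comparison from the list above can be sketched as a small rule. The thresholds for “searched frequently” and “popular” are placeholders you would need to set from your own data, and `findability_signal` is a hypothetical helper. It only suggests where to look, in keeping with the “potentially” caveat.

```python
def findability_signal(search_count, page_views,
                       frequent_searches=50, popular_views=500):
    """Apply the page-hits comparison; thresholds are placeholder assumptions."""
    searched_often = search_count >= frequent_searches
    popular = page_views >= popular_views
    if searched_often and not popular:
        return "possibly hard to find"
    if not searched_often and not popular:
        return "possibly irrelevant"
    # Popular content is not covered by the comparison rule.
    return "no clear signal"


# Made-up numbers for illustration only: (term, searches, views of the matching page).
examples = [("parking permit", 120, 40), ("2009 picnic", 2, 10), ("careers", 2, 5000)]
for term, searches, views in examples:
    print(term, "->", findability_signal(searches, views))
```

Whatever the rule returns, reenacting the search for that term is still the way to confirm what users actually see.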
Build the Site Map
The content audit will have helped you cull your content. Use your personas to identify content gaps, then take the list of content you’re keeping and create another spreadsheet that lists the pages for the restructured site. There’s a lot of nuance to this, but essentially you’re slotting the new and existing content into the revised structure.
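A bare-bones version of that second spreadsheet can be sketched as follows. It assumes each audited row already has a decision (Delete, Keep, or Rewrite) and a section in the revised structure; the field names, sample pages, and the `build_site_map_rows` and `to_csv` helpers are all hypothetical, and the sketch skips the nuance mentioned above.

```python
import csv
import io


def build_site_map_rows(audited, new_pages):
    """Combine kept pages with gap-filling new pages, grouped by new section."""
    kept = [
        {"new_section": r["new_section"], "title": r["title"], "source": r["decision"]}
        for r in audited
        if r["decision"] != "Delete"
    ]
    added = [
        {"new_section": p["new_section"], "title": p["title"], "source": "New"}
        for p in new_pages
    ]
    return sorted(kept + added, key=lambda r: (r["new_section"], r["title"]))


def to_csv(rows):
    """Serialize the site map rows as CSV text that a spreadsheet can open."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=["new_section", "title", "source"])
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


# Made-up audit results and gap pages for illustration only.
audited = [
    {"title": "About", "new_section": "Company", "decision": "Keep"},
    {"title": "2009 Picnic", "new_section": "News", "decision": "Delete"},
    {"title": "Pricing", "new_section": "Products", "decision": "Rewrite"},
]
new_pages = [{"title": "Careers", "new_section": "Company"}]

print(to_csv(build_site_map_rows(audited, new_pages)))
```

The source column keeps track of whether each page is carried over as is, needs a rewrite, or still has to be created.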