Implementing a taxonomy process in a CMS

A triangle showing the various components of taxonomy.
Taxonomy should incorporate both structured and unstructured content.

Taxonomy, in a web context, is a conceptual framework for organising information that your CMS uses to improve searching. It is tightly bound to the defined scope and context of the business. Together with keywords and other elements, it is a key component of a website’s information architecture.

As more and more content is added to your website, searching for relevant information becomes more difficult. Although search engines have become much more sophisticated in recent years, finding information on the internet is often hit or miss because the content is poorly defined.

To help users find the most relevant content, your CMS should mandate a combination of structured terms (referred to as “taxonomy”) and unstructured terms (referred to as “keywords”) to categorise your content.

Taxonomy can be defined as the words used to describe a theme or categorise the content. The words are predefined and authorised by the content owner for use in that theme or category. They can follow a tree structure, so that one word leads to a number of related words, up to a maximum of three levels deep.

For example, taxonomy for “animals” might look like this:

Animals
    Vertebrates
        Mammals
        Birds
        Fish
        Reptiles
        Amphibians
    Invertebrates
Plants
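
As an illustration only, the tree above could be held in a CMS as a nested structure and checked against the three-level limit. The sketch below is written in Python and is an assumption, not how any particular CMS stores taxonomy; the names TAXONOMY, MAX_DEPTH, depth and paths are hypothetical.

```python
# Hypothetical representation: nested dicts, where each key is an approved
# term and its value holds the narrower terms beneath it.
TAXONOMY = {
    "Animals": {
        "Vertebrates": {
            "Mammals": {},
            "Birds": {},
            "Fish": {},
            "Reptiles": {},
            "Amphibians": {},
        },
        "Invertebrates": {},
    },
    "Plants": {},
}

MAX_DEPTH = 3  # the three-level limit described in the text


def depth(tree):
    """Return the number of levels in the tree (an empty tree has depth 0)."""
    if not tree:
        return 0
    return 1 + max(depth(children) for children in tree.values())


def paths(tree, prefix=()):
    """Yield every term as a path from the top level down to that term."""
    for term, children in tree.items():
        path = prefix + (term,)
        yield path
        yield from paths(children, path)


# Check the limit, then list each term with its position in the tree.
assert depth(TAXONOMY) <= MAX_DEPTH
for path in paths(TAXONOMY):
    print(" > ".join(path))
```

Storing each term as a full path (for example, Animals > Vertebrates > Fish) keeps the parent terms attached to the page, which is what allows broader and narrower searches later on.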

Taxonomy must be based on the content (or subject) area and not on your organisational structure. However, you still need to consider your business context and your users and, where appropriate, reflect them in the terms of the taxonomy.

It’s important that the taxonomy describes a theme or category, as this is semantically different from the keywords of the content on a page. In the above example, if the content of the page is a “fish and chips” recipe, then its taxonomy would start with “food” and not “animals”.

Searching advantage

The most important element is content. All your editors and publishers should be trained on the key elements of writing content suitable for the web. Supporting this content will be the other elements in this diagram. When a page is constructed with all the elements in the diagram, the most appropriate content is far more likely to surface higher in the search results.


By integrating a taxonomy framework into the content management processes, you will increase the search utility of the CMS. The CMS is better able to understand the website structure and can therefore make better use of dynamic page creation.

Both the taxonomy structure and the keyword view are involved at different stages of the web search process. Prior to search execution, a search engine spiders and indexes web content within the taxonomy framework. 

Faceted search techniques

These are search techniques that let the user navigate your content along multiple paths. For example, in the above scenario, animals can be grouped separately by the bird facet, the fish facet and so on.
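
A minimal sketch of the idea in Python, assuming each page is tagged with a single taxonomy path. The page titles and the names PAGES, facet_counts and filter_by_facet are made up for illustration and do not come from any particular CMS or search engine.

```python
# Hypothetical pages, each tagged with one taxonomy path (top level first).
PAGES = [
    {"title": "Garden birds in winter",
     "taxonomy": ("Animals", "Vertebrates", "Birds")},
    {"title": "Freshwater fish guide",
     "taxonomy": ("Animals", "Vertebrates", "Fish")},
    {"title": "Migrating seabirds",
     "taxonomy": ("Animals", "Vertebrates", "Birds")},
]


def facet_counts(pages, level):
    """Count pages per term at the given taxonomy level (0 = top level)."""
    counts = {}
    for page in pages:
        path = page["taxonomy"]
        if level < len(path):
            counts[path[level]] = counts.get(path[level], 0) + 1
    return counts


def filter_by_facet(pages, term):
    """Return the pages whose taxonomy path contains the chosen term."""
    return [page for page in pages if term in page["taxonomy"]]


# Offer the third-level terms as facets, then narrow to one of them.
print(facet_counts(PAGES, 2))
for page in filter_by_facet(PAGES, "Fish"):
    print(page["title"])
```

The counts give the user a set of choices to click on, and each choice narrows the result list without the user having to type a new search.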

Keywords

These are words that summarise the subject matter of the content. They can also be words that are related to the content but do not necessarily appear in it. In the CMS, these words are free text (i.e. the author is able to specify the words) and are therefore considered unstructured content classification. There should be a limit on the number of words that can be used so as to optimise searching.
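
As a rough sketch of how a CMS might tidy free-text keywords and enforce a limit, consider the Python below. The limit of 10 and the names MAX_KEYWORDS and clean_keywords are assumptions for illustration; the right limit will depend on your own CMS and search engine.

```python
MAX_KEYWORDS = 10  # hypothetical limit, not a recommended value


def clean_keywords(raw, limit=MAX_KEYWORDS):
    """Split comma-separated free text into unique lower-case keywords."""
    seen = []
    for word in raw.split(","):
        word = " ".join(word.split()).lower()  # trim and collapse whitespace
        if word and word not in seen:
            seen.append(word)
    if len(seen) > limit:
        raise ValueError(f"Too many keywords: {len(seen)} (limit is {limit})")
    return seen


# Duplicates and stray spaces are removed before the limit is checked.
print(clean_keywords("Cod,  batter , cod, Chips"))
```

Rejecting the entry when the limit is exceeded, rather than silently dropping the extra words, leaves the choice of which keywords matter most with the author.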

At the Met Office

At the Met Office I implemented a taxonomy process for the Research content of their website. In this process, I identified a content owner (or subject matter expert) for each area of Research, who became the owner of the taxonomy for their content. To support the content owners, I also engaged specialist librarians to help with the initial training in the concept of taxonomy and later to peer review the proposed taxonomy.

Once defined and agreed, the taxonomy was fixed for that area and the editors were only able to use those terms. If the need arose, the taxonomy could be changed at an administrator level and the changes would apply to all subsequent pages.

As a final step, I set up a review four months from the date of the full site launch to ensure that the taxonomy terms were correct. This review would be a combination of user and editor feedback, web analytics, SEO analysis and internal search term analysis. I would expect the review to result in minor changes to the taxonomy, as you can never define your content perfectly first time!

This page was last updated on Sunday, 5th September 2010.