Apex Internet Blog

Duplicate content - what is it, why does it matter and what can you do about it?

Duplicate content – it’s one of the most misunderstood areas of Search Engine Optimisation and one which is often the subject of misinformation. In this post we’ll do our best to shed some light on the concept of duplicate content, explaining what it is, why it occurs, why it can be a problem and what you can do to reduce or rectify duplicate content issues on your site.

What is duplicate content?

Duplicate content describes a situation where multiple URLs (pages) contain the same (or almost the same) content. Pages with duplicate content can be part of the same website, or they can be on different websites.

Now before we dive too deep into the subject of duplicate content, it’s important to clarify that duplicate content that is created as a result of accidental or non-malicious actions will not be subject to any duplicate content penalty. In fact, most SEO experts and webmasters agree that a duplicate content penalty doesn’t even exist, unless the Search Engines determine that the duplicate content has come about as a result of manipulative or malicious actions.

If malicious duplicate content is identified, it’s likely that the Search Engines will adjust their indexing and ranking of the sites involved, causing those sites’ rankings to suffer or, worse still, causing the sites to drop out of the Search Engine’s index completely.

Why is duplicate content a problem?

The job of the Search Engines is to crawl, index and provide their users with as much unique and relevant content as possible. A Search Engine wouldn’t be doing its job properly if it provided its searchers with multiple pages of results (URLs) that each contained essentially identical information.

In order to supply searchers with a wide selection of relevant information, the Search Engines have created duplicate content 'filters' designed to filter out information they already know about. If your page is one of those that is filtered out of the search results because it contains the same (or considerably similar) content as another URL, then while it may feel as though your site has been penalised, in reality it’s just the Search Engines choosing not to display multiple versions of the same content. For more information about the duplicate content filter take a look at our earlier post titled 'What is the Duplicate Content Filter?'.
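
If you'd like a rough feel for how similar two of your own pages are, the sketch below compares their text using overlapping three-word phrases ("shingles") and Jaccard similarity. It only illustrates the idea of "almost the same" content: the Search Engines don't publish how their filters work, and the function names and the shingle size of three are our own assumptions.

```python
import re


def shingles(text: str, size: int = 3) -> set:
    """Return the set of overlapping word n-grams ("shingles") in text."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    if len(words) < size:
        # Text shorter than one shingle: treat the whole text as a single shingle.
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + size]) for i in range(len(words) - size + 1)}


def similarity(text_a: str, text_b: str, size: int = 3) -> float:
    """Jaccard similarity of the two texts' shingle sets, from 0.0 to 1.0."""
    a, b = shingles(text_a, size), shingles(text_b, size)
    if not a and not b:
        return 1.0  # two empty texts are treated as identical
    return len(a & b) / len(a | b)
```

A score near 1.0 suggests two pages share most of their wording, while a score near 0.0 suggests they have little in common. Where you draw the line between "similar" and "duplicate" is a judgment call.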

What impact can duplicate content have on my site?

If the Search Engines identify duplicate content they will attempt to identify the original source of the content and determine the best version of the page to display in their search results. Usually this choice is based on factors such as page age, domain authority, number and quality of incoming links, PageRank and so on. Hence, if there are pages in your site that contain large amounts of duplicate content then there is a good chance that those pages will not appear in the search results, thus reducing your online visibility and reducing the overall quality of your site in the eyes of the Search Engines.

What causes duplicate content?

Now that we've dispelled the myth of the 'duplicate content penalty' (except in instances of malicious duplicate content), we'll take a look at some of the common causes of non-malicious duplicate content and what can be done to prevent it from occurring.

  • Multiple domain names - this problem arises when a company purchases and uses multiple domain names for the same website without first putting the correct measures in place. For example, if www.example.com and www.example.co.nz both serve the same website, it is likely that one of these sites will face duplicate content issues. To prevent this, use a 301 permanent redirect from each additional domain to your preferred one, so that every page is served from only one domain.

  • URL canonicalisation - canonicalisation problems arise when one page can be accessed through multiple different URLs. When a site has a number of different URLs that display what is essentially the same content, the Search Engines may split the overall link popularity of the page across all of the URLs. This is neither ideal nor recommended as it can dilute the strength of that content's ranking signals and create confusion for the Search Engines as to which page they should include in their index.

    For example, the following URLs could all point to the same page containing the same content, thus creating duplicate content issues:

    • http://example.co.nz/
    • http://www.example.co.nz/
    • http://www.example.co.nz/index.html
    • http://www.example.co.nz/index.htm
    • http://www.example.co.nz/index.html?somevar

    To prevent URL canonicalisation problems and the duplicate content issues that can result from them, in the situation described above we advise implementing a 301 permanent redirect from each variant to your preferred version of the URL.

    Lastly, in Google's Webmaster Tools interface you should also specify your preferred domain - the one you would like used to index your site's pages.

  • e-Commerce based duplicate content issues - e-Commerce sites often use a combination of tracking IDs, pagination, navigation breadcrumbs, sorting options and other filters such as price. When these are combined to generate distinct URLs that control what information is requested from the database, it's not hard to see how multiple different pages can be created with exactly the same (or very similar) content on each page.

    To solve these kinds of e-commerce based duplicate content issues we suggest implementing the canonical tag.

    The canonical tag is placed in the <head> section of a page's HTML and is used to notify the Search Engines about your preferred URL for that particular page. It's a great tool to use when 301 redirects are either not possible or not appropriate, and best of all it still passes the link value that would otherwise be lost to the duplicates back to your preferred page.

    Another problem with e-Commerce sites is that many site owners publish manufacturer’s generic product descriptions on their product pages. If no additional unique content is located on these pages, then the Search Engines may consider that the page is the same as many other pages that also contain the generic product description and subsequently not display that page in their search results. To prevent this from occurring we recommend writing original descriptions for your products, which although time consuming can be a practical way of avoiding the duplicate content filter.

  • Printer-only versions of web pages - Many sites offer print-friendly versions of the same content. In this case a meta “noindex” tag should be placed on the printer-friendly versions of these pages to prevent Search Engines from indexing them.

  • Use of the same or very similar content on different subdomains or different country top level domains (TLDs) - Different subdomains and different top level domains are often used as a way for site owners to promote different brands, focus on different types of services, or target audiences in different geographic locations. By taking this approach and simply duplicating content from one site to another, you run the risk that some of your pages may not get indexed by the Search Engines, or may be filtered out of search results.

    We recommend you don’t create multiple pages, subdomains, or domains without ensuring that they contain significantly unique content.

    Furthermore, if you're trying to target a different geographic market and serve it with country-specific content, we suggest using country-specific top-level domains to do so whenever possible. According to Google, Search Engines are more likely to know that http://www.example.com.au contains Australian-focused content, for instance, than http://www.example.com/au or http://au.example.com.
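
To make the fixes in the list above more concrete, here is a minimal Python sketch of the logic behind them: working out the preferred URL for any variant, deciding whether a request should get a 301 redirect, and producing the canonical and noindex tags. The hostnames come from the examples above. The alias list, the parameter names `sort` and `trackingid`, and all function names are hypothetical and are there for illustration only. In practice the redirect itself would usually be configured in your web server or CMS rather than written by hand.

```python
from html import escape
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Assumptions for illustration: adjust these to suit your own site.
PREFERRED_HOST = "www.example.co.nz"
ALIAS_HOSTS = {"example.co.nz", "example.com", "www.example.com"}
INDEX_FILES = {"index.html", "index.htm"}
NON_CONTENT_PARAMS = {"somevar", "sort", "trackingid"}  # hypothetical names

# Tag for the <head> of printer-friendly pages that should stay out of the index.
NOINDEX_META_TAG = '<meta name="robots" content="noindex">'


def preferred_url(url: str) -> str:
    """Map any variant of a URL onto the single version we want indexed."""
    parts = urlsplit(url)
    # Simplification: any port or login details in the URL are dropped.
    host = (parts.hostname or "").lower()
    if host in ALIAS_HOSTS:
        host = PREFERRED_HOST
    # Collapse /index.html and /index.htm onto the directory URL.
    directory, _, filename = parts.path.rpartition("/")
    if filename.lower() in INDEX_FILES:
        path = directory + "/"
    else:
        path = parts.path or "/"
    # Keep only the query parameters that actually change the page content.
    kept = [(key, value)
            for key, value in parse_qsl(parts.query, keep_blank_values=True)
            if key.lower() not in NON_CONTENT_PARAMS]
    return urlunsplit((parts.scheme, host, path, urlencode(kept), ""))


def redirect_for(url: str):
    """Return (301, target) if the URL should permanently redirect, else None."""
    target = preferred_url(url)
    return (301, target) if target != url else None


def canonical_link_tag(url: str) -> str:
    """Build the canonical tag to place in the <head> of the page at this URL."""
    href = escape(preferred_url(url), quote=True)
    return f'<link rel="canonical" href="{href}">'
```

With these settings, each of the five example URLs listed earlier maps to http://www.example.co.nz/, so `redirect_for` returns a 301 for the four variants and `None` for the preferred version. Whether you use the redirect or the canonical tag for a given page depends on whether redirecting is possible and appropriate there.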

A final word…

Provided that you're not attempting to deceive the Search Engines and manipulate the results they deliver to their users, then having some duplicate content on your site is not the end of the world. Nevertheless, if you can clean up your site's content, URLs and site architecture to reduce occurrences of duplicate content, you'll have more control over which pages of your site the Search Engines choose to display and more control over your site's presence in the Search Engines.


Categories: Search Engine Marketing, Search
Posted on: 13th Sep 10 by Mark Vassiliou