Duplicate Content Tutorials – Business Review Center

Reviewed on November 14, 2012
Last modified: November 27, 2012


Don’t Be Duped, Be Informed

Because no one knows exactly what duplicate content is, it is always a hot topic among webmasters.
Google does not help much; I sometimes think of it as a hyperactive three-year-old who is really sharp in some areas but not in others.
So the best approach is to keep it simple, stay under the radar, and shoot for the middle of the road.
Now, let’s figure out what duplicate content really is, what it isn’t, and what you should do to stay on top of it.

What Duplicate Content Is

Here it is, from the horse’s mouth: duplicate content generally refers to substantive blocks of content, within or across domains, that either completely match other content or are appreciably similar. Mostly, this is not deceptive in origin. Below are possible examples of non-malicious duplicate content:

  • Discussion forums that generate both regular pages and stripped-down pages targeted at mobile devices.
  • Store items that are shown or linked via multiple distinct URLs.
  • Printer-only versions of web pages.

Identical or substantially similar content exists within your domain and across others, and most of it is normal and acceptable.

Duplicate Content examples:

1. URL Parameters
Click tracking and analytics code can cause SEO duplicate content issues.


2. Printer-Friendly Versions
Printer-friendly pages can cause duplicate content issues when multiple versions of the same page get indexed.


3. Session IDs
Session IDs are another common creator of duplicate content. This occurs when each user who visits a site is assigned a different session ID that is stored in the URL.
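
To make the three examples above concrete, here is a minimal Python sketch that strips tracking, session, and print parameters from URLs and groups the variants that collapse into one page. The parameter names in NOISE_PARAMS are assumptions for illustration, not a standard list; use whatever names your own site and analytics tool actually add.

```python
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode

# Hypothetical parameter names; adjust to whatever your own site actually uses.
NOISE_PARAMS = {"utm_source", "utm_medium", "utm_campaign", "sessionid", "sid", "print"}


def normalize_url(url: str) -> str:
    """Return the URL with tracking, session, and print parameters removed."""
    parts = urlsplit(url)
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key.lower() not in NOISE_PARAMS
    ]
    # Sort the remaining parameters so their order does not create new variants.
    query = urlencode(sorted(kept))
    # The fragment is dropped as well, since it never reaches the server.
    return urlunsplit((parts.scheme, parts.netloc, parts.path, query, ""))


def group_variants(urls):
    """Map each normalized URL to the raw variants that collapse into it."""
    groups = {}
    for url in urls:
        groups.setdefault(normalize_url(url), []).append(url)
    return groups
```

Any group that holds more than one raw URL is a set of addresses that serve the same page, which is exactly what a search engine may see as duplicates.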


Why Duplicate Content Is a Problem

How would you like to search for the best pecan pie recipe and realize that every single result on the first page turned out to be the exact same recipe?
Obviously, users don’t like seeing the same result over and over, and Google certainly does not like crawling the same content over and over.
Duplicate content is a problem because, when more than one copy of an identical piece of content exists on the Internet, it is difficult for search engines to decide which version is more relevant to a given search query. To provide the best search experience, search engines rarely show multiple duplicate pieces of content, so they are forced to choose which version is the best or most likely to be the original.
For a search engine, it’s also a processing consideration. If there is substantial duplication, crawl or indexation rates might be dampened. In short, the website can lose some of its ‘trust’.

Here are three of the biggest issues with duplicate content:

  1. Search engines don’t know which version to include in or exclude from their indices.
  2. Search engines don’t know whether to direct the link metrics (trust, authority, anchor text, link juice, …) to one page or keep them separated across multiple versions.
  3. Search engines don’t know which version to rank for query results.

When duplicate content is present, search engines provide less relevant results, and site owners suffer ranking and traffic losses.

Two Types of Duplicate Content

We all have our own ideas about duplicate content. Most of the time, they boil down to this: don’t republish the same article to multiple directories. Instead, spend hours spinning that same article to the point where it no longer makes sense, then publish it to a zillion directories.
Now, let’s see the two types of duplicate content, in the spirit of being informed.

  • The cross-domain type is the one most people think of: the same content appears, often unintentionally, on several external sites.
  • The within-your-domain type is the one Google is mostly concerned about: the same content appears, often unintentionally, on several different pages or places within your website.

Now, let’s explore each type and see what Google thinks about it.

Off-Site Content Syndication

There is absolutely nothing wrong with syndicating your content to different sites per se.
When your content gets syndicated, Google simply goes through all the available versions and shows the one it finds most appropriate for a specific search.
You should know that the most appropriate version might not be the one you would prefer to have ranked. That’s why it is very important that each piece of syndicated content includes a link back to your original post, which I assume is on your site. That way, Google can trace the original version and will most likely display it in its search results.

Be mindful that taking all your articles and submitting them for syndication all over the place can make it difficult to determine how much of its content a site wrote itself and how much is just syndicated. The advice is to 1) avoid over-syndicating your articles, and 2) make sure that you include a link to the original content if you do syndicate. That helps ensure your original content gains more PageRank, which helps Google pick the best documents for its index. (Source: Matt Cutts)

Black Hat Syndication

However, the other side of the content syndication coin is content that is deliberately duplicated across the web to manipulate search rankings or to generate more SEO traffic.
This results in repeated content showing up in SERPs, which not only upsets searchers but also forces Google to clean house.
In the rare cases in which Google perceives that duplicate content may be displayed to manipulate rankings and deceive users, it also makes appropriate adjustments in the indexing and ranking of the sites involved. As a result, the site’s ranking may suffer, or the site might be removed entirely from the Google index, in which case it will no longer appear in search results. (Source: Google Webmaster Tools Help)

On-Site Duplicate Content

On-site duplicate content issues are more common and entirely under your control, which makes them easy to fix.
Your first step should be to learn more about your content management system so you can identify the potential weak spots on your blog.
For instance, a blog post can be displayed on your blog home page as well as on category pages, tag pages, archives, etc. That’s the true definition of duplicate content.
Users have the common sense to see that it’s the same post; they just get to it via different URLs. Search engines, however, see unique web pages with the same content, which equals duplicate content.
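
One rough way to spot this on your own blog is to fingerprint the text of each page and group the URLs that share a fingerprint. The Python sketch below is only an illustration under stated assumptions: the URLs and texts in pages are made up, and the hash catches exact copies only (after whitespace and case are normalized), not the near-duplicates a search engine may also notice.

```python
import hashlib
import re
from collections import defaultdict


def fingerprint(text: str) -> str:
    """Hash the text after lowercasing and collapsing whitespace."""
    normalized = re.sub(r"\s+", " ", text).strip().lower()
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def find_duplicates(pages: dict) -> list:
    """Return sorted lists of URLs whose body text is identical."""
    by_hash = defaultdict(list)
    for url, text in pages.items():
        by_hash[fingerprint(text)].append(url)
    return [sorted(urls) for urls in by_hash.values() if len(urls) > 1]


# Hypothetical URLs and text standing in for pages fetched from your own blog.
pages = {
    "/2012/11/pecan-pie/": "The best pecan pie recipe.",
    "/category/recipes/": "The best  pecan pie recipe.\n",
    "/tag/pie/": "The best pecan pie recipe.",
    "/about/": "About this blog.",
}
```

Calling find_duplicates(pages) here groups the post, category, and tag URLs together, because all three serve the same text.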

How to Take Matters into Your Hands

To minimize the presence of dupe content on your site, here are some practical “non-techie” steps you can take:

  1. Mind your canonicalization issues. In other words, http://businessreviewcenter.com, businessreviewcenter.com, and businessreviewcenter.com/index.html are one and the same site as far as we are concerned, but three different sites as far as search engines are concerned.
  2. Be consistent in internal link building: you should not link to /page/ and /page and /page/index.htm interchangeably. If links to your pages are split among various versions, it can lower per-page PageRank.
  3. Include preferred URLs in the sitemap. This should be done for every website. It is a simple way to show Google whether a given site should be displayed with or without www in the search result pages.
  4. Use 301 redirects. If you have restructured your site (for example, changed your permalink structure to an SEO-friendly one), use 301 redirects (“RedirectPermanent”) in your .htaccess file, or use one of the many redirection plugins available in the WordPress plugin directory. In many cases, the most effective way to combat duplicate content is to set up a 301 redirect from your “duplicate” page to the original content page. When multiple pages with the potential to rank well are combined into a single one, they not only stop competing with one another but also build a stronger relevancy and popularity signal overall. This positively impacts their ability to rank higher in the search engines and drive more traffic.
  5. Use rel="canonical". Another way to solve duplicate content is the rel=canonical tag. Like a 301 redirect, rel=canonical passes the same amount of link juice (ranking power), and it often takes less development time to implement.
    This tag is part of the HTML head of a web page. The link tag itself isn’t new but, like nofollow, it uses a new rel parameter. For instance:
    <link href="http://www.example.com/canonical-version-of-page/" rel="canonical" />
    This tag tells Bing and Google that the given page should be treated as though it were a copy of the URL www.example.com/canonical-version-of-page/ and that all of the link and content metrics the search engines apply should actually be credited toward the provided URL.
  6. Noindex, follow: The meta robots tag with the values “noindex, follow” can be implemented on pages that should not be included in a search engine’s index. This permits the search engine bots to crawl the links on the specified page but keeps them from including the page in their index. This works particularly well for pagination issues.
  7. Use the Google Webmaster Tools parameter handling tool. Google Webmaster Tools lets you set the preferred domain of your website and handle various URL parameters differently. However, the main drawback of these methods is that they only work for Google. Any change that you make here does not affect Bing or other search engines.
  8. Minimize repetition: instead of posting your affiliate disclaimer on every page, create a separate page for it and link to it when needed.
  9. Manage the archive pages: By displaying excerpts on your archive pages instead of full posts, you can keep duplicate content in check and avoid the issue. You want to give your readers a hint of the content and direct them to the original posts. To accomplish that, open your theme’s archive.php and replace the_content with the_excerpt. Keep in mind that your tag and category pages should show excerpts only as well.
  10. Country-specific content: Google is more likely to know that .de indicates Germany-focused content than /de or de.example.com.
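
Steps 1, 2, and 5 above all come down to choosing one preferred form of each URL and sticking to it. Here is a minimal Python sketch of that idea. It rests on several assumptions that are mine, not Google’s: the preferred form is http with www and a trailing slash, every input URL belongs to the same site, and the site uses directory-style permalinks (so the trailing-slash rule would be wrong for file URLs such as PDFs).

```python
from html import escape
from urllib.parse import urlsplit, urlunsplit

# Assumptions for this sketch: the preferred form is http + www + trailing slash.
PREFERRED_SCHEME = "http"
PREFERRED_HOST = "www.example.com"
INDEX_FILES = ("index.html", "index.htm", "index.php")


def canonical_url(url: str) -> str:
    """Collapse scheme, host, and index-file variants into one preferred URL."""
    if "://" not in url:
        url = "//" + url  # let urlsplit treat a bare "example.com/..." as a host
    parts = urlsplit(url)
    path = parts.path or "/"
    for name in INDEX_FILES:
        if path.endswith("/" + name):
            path = path[: -len(name)]  # keep the directory, drop the file name
            break
    if not path.endswith("/"):
        path += "/"
    return urlunsplit((PREFERRED_SCHEME, PREFERRED_HOST, path, "", ""))


def canonical_link_tag(url: str) -> str:
    """Build the <link rel="canonical"> tag for the page's <head>."""
    return '<link rel="canonical" href="%s" />' % escape(canonical_url(url), quote=True)
```

The same canonical_url result could serve as the target of a 301 redirect, as the href of the canonical tag, and as the URL you list in your sitemap, which keeps all three signals consistent.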

Oh, wait, one more important issue: the robots.txt file.
Google does not recommend blocking duplicate URLs with robots.txt, because if it cannot crawl a URL, it has to assume the URL is unique. It is better to let everything get crawled and clearly indicate which URLs are duplicates. Robots.txt controls crawling, not indexing. Google can index a URL because of a link to it from an external site without ever crawling it, and that may create a duplicate content issue. (Source: TopRankBlog)
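
To see the crawling side of this, the Python sketch below uses the standard library’s robots.txt parser. The robots.txt content and the URLs are hypothetical; the point is only that a blocked URL cannot be fetched, so a crawler that obeys the file never gets to read a canonical tag or a “noindex, follow” tag placed on that page.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt that tries to hide printer-friendly duplicates.
ROBOTS_TXT = """\
User-agent: *
Disallow: /print/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())


def can_crawl(url: str, user_agent: str = "Googlebot") -> bool:
    """Return True if robots.txt lets the given crawler fetch the URL."""
    return parser.can_fetch(user_agent, url)
```

With this file, can_crawl returns False for anything under /print/ and True for the regular post URL, yet nothing in robots.txt stops a blocked URL from being indexed if another site links to it.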
Now, let’s move on.

Duplicate Content Penalty

Let’s have Google answer the daunting question: is there a duplicate content penalty?
Below is a quote from Susan Moskwa, Webmaster Trends Analyst at Google:
A lot of people think that they will be penalized if they have duplicate content. The fact is that, in most cases, Google does not penalize websites for accidental duplication. Many, many sites have duplicate content.
Google may penalize websites for duplication that is deliberate or manipulative, for instance link networks, auto-generated content, or similar tactics designed to be manipulative.
She also explained when webmasters do not need to worry about duplicate content:

  • When the duplication is common and minimal.
  • When you think the benefits outweigh potential ranking concerns: weigh the cost of fixing the duplicate content situation against the benefits you would receive.
  • Keep in mind that duplication is rather common, and search engines can easily handle it.


How Exactly Does Google Handle It?

When pulling up search results, Google basically collapses the duplicates, leaving only the page that is, in its opinion, the most relevant for the specific query. Google determines the most relevant result based on a myriad of factors, and the thing you can do for your part is to always link back to your original post.

Scraping Be Gone!

A few words on the recent Google algorithm change from Matt Cutts:
My post mentioned that multiple changes should help drive spam levels even lower, including one change that primarily affects websites that copy others’ content and sites with low levels of original content. (source)
So, what does it mean for the average webmaster?
We can do a little chicken dance, since the probability of scraped (in other words, stolen) content ranking above the original posts that we put blood, sweat, and tears into is minimal.
Rightfully, Google is going to war against all the auto-blogs that do not have what it takes to produce their own content. All they do is republish other people’s work, hoping to rank highly in search engines, drive traffic to their crappy sites, and then make some money off paid advertising, AdSense, and such. (To learn how to make money with AdSense, you can check out our reviews of The Adsense 100k Blueprint and AdSense Recipe.)

Good riddance!
If you realize that another website is duplicating your content by scraping (misappropriating and republishing) it, it is unlikely that this will negatively impact your site’s ranking in Google search results pages. If you do spot a case, you are welcome to file a DMCA request to claim ownership of the content and request removal of the other website from Google’s index.


Duplicate Content on the Blog

This SEO duplicate content topic is so important that I wanted to post about it on Business Review Center to make sure that you have your questions on the subject answered once and for all, well, until Google changes something again.
Of course, I have some special additions to the post just for my readers – you, that is.

Here are some examples of duplicate content on your blog that you might not be aware of:

  1. Comment pagination: Believe it or not, you create duplicate content when you break your blog comments down into pages.
  2. Tag/category pages: If you display full posts on those pages, then you create duplicate content. This is the same issue as with archives, so make sure you switch to showing excerpts only, which solves the issue.
  3. Author pages: If your blog has one or many authors, each one has an archive page that collects all the posts he or she has written. You guessed it: make sure to display excerpts.

If I thought about it, I am sure I could find many other examples of dupe content on your site.
Again, remember what I said: most of these examples are considered natural, and Google will most likely deal with them without a glitch.
Does it mean that you should forget about SEO duplicate content and live a crazy life?
No, definitely not.
After all, Google gives your website only so much “crawling” attention, and you should spend those precious seconds wisely rather than make Google figure out which content it should pay attention to.

Duplicate Content – Marketing Takeaway

  1. Accidental SEO duplicate content does not, in most cases, cause your website to be penalized.
  2. Google is really good at picking the best version of your content to display in SERPs and ignoring the rest.
  3. Almost all dupe content issues are easy to fix, and they should be fixed.
  4. Don’t worry and don’t be afraid, be informed.

So, love it or not? Leave your comment to tell me that you’re alive! 

Image source: seomoz.org

About Tony Nguyen

Tony Nguyen is the founder of Business Review Center. Since 2011, he has managed a team that has collected customer feedback and complaints on digital products, then tested products, and written product reviews. Learn more about him here and connect with him on Twitter, at Facebook, Google+, and LinkedIn.
