Around half the people I talk to about Search Engine Optimisation are terrified of duplicate content on their own websites. But 80% or more of the spam that I see is massively duplicated. Why do real site owners, with valuable content but multiple paths to it, fear duplicate penalties from Google, while spammers who endlessly duplicate useless garbage can fearlessly sell link building services to those terrified site owners, on the basis of their ability to massively distribute spammy duplicate links?
I think the reason is that site owners who “invest” in spammy paid links rarely understand in any depth what they are buying. And link spammers don’t really care whether what they do is effective, so long as there are people prepared to buy, and so long as Google and Bing mistake the links for valid ones in the early days. Businesses usually evaluate the impact of an activity fairly early – so if they are told that the search engine impact will be most visible a few weeks or months after starting, then that’s when they’ll measure. A spammy link buyer will keep buying for years, because the impact is positive at the time of the measurement, even though the value declines with time as Google and Bing detect the patterns of spamming.
The very worst link spammers will submit your site to places that are already known to the search engines as sources of low quality links, and that already offer no value. So the most value that you get from the service is a list of places that won’t have any impact… Useful if you need to go back and clean up the spammy links later.
Take, for example, this piece of spam, submitted to this blog:

It looks appropriate. It’s about H1N1 and it has been submitted as a comment to an article about H1N1. But the URL given is for a product, even though the name offered is not a keyword. Is there any way to tell that it is spam? We could search for a key piece of text that seems unlikely to be in other comments. And here are the traces that show this is a piece of spam:

We can see the same author ID, with the same comment in many blogs – 314 blogs identified as carrying that precise piece of text, presumably with the same link to the “Fish Oil FAQ”. It is definitely spam.
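The manual check above (search for a distinctive phrase, then count how many blogs carry it) can be roughed out in code. This is only a minimal sketch of the idea, not how Google or Bing actually do it: the function names, the min_blogs threshold and the sample domains are all made up for illustration, and it only catches near-verbatim copies.

```python
import hashlib
import re
from collections import defaultdict


def fingerprint(comment_text: str) -> str:
    """Hash a comment after lowercasing it and dropping punctuation and extra spaces."""
    words = re.findall(r"[a-z0-9]+", comment_text.lower())
    return hashlib.sha256(" ".join(words).encode("utf-8")).hexdigest()


def find_duplicated_comments(comments, min_blogs=3):
    """Map each fingerprint to the blogs carrying it, keeping only widely copied text.

    `comments` is an iterable of (blog_domain, comment_text) pairs.
    `min_blogs` is an arbitrary, hypothetical threshold for "widely copied".
    """
    blogs_by_fingerprint = defaultdict(set)
    for blog_domain, comment_text in comments:
        blogs_by_fingerprint[fingerprint(comment_text)].add(blog_domain)
    return {
        fp: sorted(blogs)
        for fp, blogs in blogs_by_fingerprint.items()
        if len(blogs) >= min_blogs
    }


# Hypothetical sample data: the domains and the comment text are invented.
sample = [
    ("blog-a.example", "Great post! Fish oil really helps."),
    ("blog-b.example", "great post!  fish oil really helps"),
    ("blog-c.example", "Great post! Fish oil really helps."),
    ("blog-c.example", "I disagree with the second paragraph."),
]
duplicates = find_duplicated_comments(sample)
for blogs in duplicates.values():
    print(len(blogs), "blogs carry the same comment:", ", ".join(blogs))
```

If something this crude can group the copies, it is hard to believe that a search engine with a full crawl of those blogs can’t.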
In what way are Google and Bing so stupid that they can’t detect the same piece of writing in comments, when they can tell that a site has two or three paths that lead to the same product, wrapped in a templated page? It doesn’t add up that Google and Bing would penalise a site owner for multiple paths to a product that customers buy, but not penalise spammy links. So, do the search engines penalise spam?
Why are pages containing spam reported in search results, if the content is treated as spam?
Search engines are looking at the overall quality of the site and its pages. Some spammy comments on a blog or a discussion forum won’t kill the pages’ value. If users are finding that the whole page is useful, then the whole page isn’t deranked – unless the web spam teams decide that the only reason for the page is to host, or be the target of, spammy links. So you can find spammy postings on pages that have weight. A few spammy links on an otherwise useful page won’t kill the page. That’s why we can still find spammy comments – they are a part of a page that is valuable.
The reverse isn’t true, though: the fact that spammy comments can be found doesn’t mean that the links in the spam carry any weight. If they did carry weight, then we should find that at least 314 sites are passing weight to the Fish Oil FAQ. So… where’s the site in the listings?
Interestingly, you can’t find the site named in the spammy posting. Yup. All that spamming and link dropping has had no useful effect at all – just try the search for links for fishoilfaq.com. Which just goes to show that the technique is pointless – it doesn’t work.
Back To Duplicate Content
Google and Bing are tolerant of genuine duplication within a web site. The large search engines even have a mechanism to help webmasters signal that they are aware of duplication in their sites, and have a preferred path to that resource: the canonical link reference, a signal agreed to and used by the major search engines. It’s been so successful that search engines are now respecting the canonical link reference tag across domains (with some limitations).
But identical postings, across a range of blogs and discussion forums, with keyword-laden author names? Somehow that pretty obvious technique is supposed to defeat the search engines with wicked cleverness? It does, for a while. Then the web spam teams notice, zero-weight the spam, and decrease their trust in your business. And that’s why duplicated postings in user generated content don’t work – blog spamming is an ultimately sterile exercise. If you’re going to comment, comment because you are a part of the discussion. Be interesting enough, and people will write about you and what you’ve written, in their articles – just as I’ve written about Danny Sullivan, below.
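To make the canonical link reference concrete: it is a link element with rel="canonical" in the head of each duplicate page, pointing at the preferred URL. Here is a small sketch that reads that element back out of two pages reached by different paths, to check that they both declare the same preferred URL. The shop.example URLs and the helper names are hypothetical, and this only checks your own markup; it says nothing about how the search engines weigh the signal.

```python
from html.parser import HTMLParser


class CanonicalLinkParser(HTMLParser):
    """Remember the href of the first <link rel="canonical"> element seen."""

    def __init__(self):
        super().__init__()
        self.canonical = None

    def handle_starttag(self, tag, attrs):
        if tag != "link" or self.canonical is not None:
            return
        attributes = dict(attrs)
        # rel may hold several space-separated values, in any letter case.
        rel_values = (attributes.get("rel") or "").lower().split()
        if "canonical" in rel_values:
            self.canonical = attributes.get("href")


def canonical_url(html_text):
    """Return the declared canonical URL of a page, or None if there is none."""
    parser = CanonicalLinkParser()
    parser.feed(html_text)
    return parser.canonical


# Two hypothetical paths to the same product, each declaring the preferred URL.
page_via_category = (
    '<html><head><title>Blue widget</title>'
    '<link rel="canonical" href="https://shop.example/widgets/blue-widget">'
    '</head><body>Reached via the category listing</body></html>'
)
page_via_search = (
    '<html><head><title>Blue widget</title>'
    '<link rel="canonical" href="https://shop.example/widgets/blue-widget" />'
    '</head><body>Reached via the on-site search</body></html>'
)

same_target = canonical_url(page_via_category) == canonical_url(page_via_search)
print("Both paths declare the same canonical URL:", same_target)
```

Running a check like this over the URLs that reach the same product is a cheap way to confirm that every path names one preferred address.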
This decay in the value of blog spam (and other types of undeclared paid links) is why we hear the repeated refrain from businesses that:
“we hired an SEO agency for link building, it made an impact at first, but since we terminated the contract there’s been no impact on our business”.
Spammy link building has a positive effect on visitor volumes in the first few months or even years. But after a while, the search engines downgrade it, and decrease trust in the sites that gave you links (because those sites host undeclared paid links) and in your site (as a business that buys undeclared paid links). The activities have less and less impact with increasing time, and it is harder for your business to make headway once the search engines suspect that you are focusing on spamming as a link building strategy. You can even find that an entire chunk of your website is not being given any credibility for inbound links.
Blog spam has an unpleasant impact on the blogs it is dumped on, too. Read Danny Sullivan’s article about the way that blog spam affected a nascent site that may have been useful to a specific online segment. The site did have some spam defences in place, but doesn’t it seem just a tad nightmarish that a site offering some long lasting value is taken out of action through activities that ultimately have little or no value? The economic equation is imbalanced. When that happens, as Danny implies, there is time and space for ethics and morality to play a part. Law? I don’t hold out a lot of hope for that – all that a US based law would do is drive the targets offshore, or push them to use anonymising proxies, etc. (See our ancient article about tracking the steps in a spamming effort, apparently by some Ukrainians).
Summary
Spend less time worrying about duplication on your own site – use the canonical link reference to help yourself and the search engines. Read Google’s official description of the canonical link reference, and how they have coordinated with Bing and Yahoo to understand the tag.
Spend more time worrying about what your linking strategy is telling Google and Bing – are you telling the search engines that your business will lie and deceive? Here’s Bing’s statement about spam and what they do in response to detecting spammy links. You really want those outcomes? You really want to pay people to cause work for other site owners, work that has no long term benefit and may have disastrous repercussions on your own site? And when you find that the search engines no longer trust you, you’re going to face a higher bill to remove links – link placement can be automated, but link removal is largely manual, and hence more expensive.
It’s pretty easy to establish the practices that lead to long lasting, higher ranking web sites. Start engaging with your prospects and clients, or find another way to engage with an online audience – at this point in the search engine optimisation game, they don’t have to be the audience that you sell to!


