Effective Internet Marketing Strategy and Technique Through Experiments, Measurement and Audit

Spam in Comments, Unattributed Content

This article was triggered by a possibly spammy comment, and led to an investigation of the quality of the site offered in the link. Somewhat unexpectedly, that led to further reflections on Google and the Content Network. I struggle to make the Content Network work for my clients at a reasonable ROI, and Google itself turns out to be part of the problem.

I use Akismet for this blog, and it traps most of the spam, most of the time. Occasionally it traps a real comment, but the most frequent problem is a comment that is unique but uninformative. Was the comment made just to gain rank for a lesser-known blog? One of the clues that I use is the age of the post. By and large, the older the post, the more meaningful the comment has to be. So for a brand new post, I might, barely, let a “great article, you are always interesting” through. When the article is six months old, you need a substantive comment with some real content, or a proper question.
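I apply this age rule by eye. Here is a minimal sketch of how it could be automated. It uses a crude word-count and question-mark proxy for "substance" and hypothetical age tiers. None of these numbers come from Akismet or from my own moderation.

```python
from datetime import date


def substance_score(comment: str) -> int:
    """Crude, hypothetical proxy for how substantive a comment is."""
    words = comment.split()
    if len(words) < 5:
        return 0          # too short to say anything
    score = 1             # a short remark, e.g. a compliment
    if len(words) >= 20:
        score += 1        # assumed: some length suggests some content
    if len(words) >= 40:
        score += 1        # assumed: more length as a stand-in for real content
    if "?" in comment:
        score += 2        # assumed: a question mark as a stand-in for a proper question
    return score


def required_score(post_age_days: int) -> int:
    """Hypothetical tiers: the older the post, the higher the bar."""
    if post_age_days <= 14:
        return 1          # brand-new post: a compliment can barely pass
    if post_age_days < 180:
        return 2
    return 3              # roughly six months or older


def should_approve(comment: str, posted: date, commented: date) -> bool:
    """Approve only if the comment's substance meets the bar for the post's age."""
    age_days = (commented - posted).days
    return substance_score(comment) >= required_score(age_days)
```

The design point is that the bar rises with the post's age. The same six-word compliment passes on a two-day-old post and fails on one that is six months old, while a genuine question still gets through.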

Another clue that I use, but that is annoyingly hard to extract from FeedBurner and Akismet, is what brought the commenter to the site. I’ve just found a comment that fell into both my “it’s an old article” and “uninformative comment” categories, so I wanted to see if it was just linkspam. The webserver logfiles show that the search this user made was ‘[SEO] “powered by wordpress” “leave a comment” “no comment”’. So, it’s spam. The site itself was another clue.
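Digging the search phrase out of the logs can be scripted. This sketch makes three assumptions:

- the server writes an Apache-style "combined" log format;
- the commenter's IP address is known from the comment record;
- the referring search engine puts the query in a `q` parameter, as Google's result URLs do.

The function names are hypothetical.

```python
import re
from typing import Iterable, List, Optional
from urllib.parse import parse_qs, urlparse

# Assumed: Apache "combined" log format (IP, ident, user, [time], "request",
# status, size, "referrer", "user agent").
COMBINED = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] "(?P<request>[^"]*)" '
    r'(?P<status>\d{3}) (?P<size>\S+) "(?P<referrer>[^"]*)" "(?P<agent>[^"]*)"'
)


def search_query(referrer: str) -> Optional[str]:
    """Return the decoded search query if the referrer is a Google search URL."""
    parsed = urlparse(referrer)
    if "google." not in parsed.netloc:
        return None
    values = parse_qs(parsed.query).get("q")  # assumed: query lives in 'q'
    return values[0] if values else None


def queries_for_ip(lines: Iterable[str], ip: str) -> List[str]:
    """Collect every search query that brought the given IP address to the site."""
    found = []
    for line in lines:
        m = COMBINED.match(line)
        if m and m.group("ip") == ip:
            q = search_query(m.group("referrer"))
            if q:
                found.append(q)
    return found
```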

Why do I care?

Two reasons: link neighbourhood, and signal-to-noise ratio. I really don’t like linking to Get Rich Quick sites, which are mostly splogs. Most of them are, IMO, a way for the owner to get rich, not the visitors. I hope that I’m a responsible marketeer, so I want to lose the scammers and make them work harder for their money.

The other issue is signal-to-noise ratio. Blogs that repeat the same tired information just annoy me. Don’t they annoy you?

I was researching a client’s problem, and the search showed the information from one original article popping up in blog after blog. Very few of the repeats gave attribution, and none of those that I read expanded on the information in the original. The original example would give away too much about my client’s problem, so here is a weaker one: check out a search for “akismet-admin workaround”. See how the same info is repeated, sometimes without attribution, with exactly the same edit proposed? Notice how Google suppresses duplicate information: it is indexed and ranked, but not shown unless you click “Repeat the search with the omitted results included”.

I don’t really know how to solve the problem of other sites duplicating articles without attribution, at *this* end. As a single blog, any action I can take, such as denying links to sites that exist mostly for self-promotion, is pretty ineffective. However, Google could apply harder filters. Even in the weak example search that I cite, blog entries like the one in this screenshot are plentiful:

Content Adverts Shown Before Duplicate Blog Article

This is pretty typical of sites where I wouldn’t want my clients’ adverts shown, and that offer little value to users (because the information is already available elsewhere, better presented, and as part of an active community). Note just how irrelevant that advert is to the search and to the article content. It’s still an impression, though. If you clicked on something that irrelevant, imagine how interested you are likely to be in the product… I suggest (and conversion rates support this) that you have no immediate interest, or at least nothing that fits in a 30-day cookie window.

The problem for Google is that sites like these generate revenue. Google has been strict about preventing AdWords advertisers from affecting search results, and there really does seem to be a strong wall between AdWords and the organic results. But Google should probably recognise that part of the reason for duplicating information is to gain rank, so that visitors will be exposed to Google’s own advertising system. AdWords isn’t directly affecting search results, but the publishing system is.

Conclusions

Personally, I’m going to continue rejecting lightweight comments that lead to splogs or splog-like sites. This is more work than I’d like, but until de-spamming tools can recognise and mark splogs in links, I’ll do it by hand. This slightly increases the value of this blog to real readers.

Since the point of splogs is to take popular (or potentially popular) articles and republish them in the hope that the search engines will reward them, the main cause of their rise looks like search engine ranking systems. Changes to those systems will also be what makes them disappear again. So the main effort to reduce splogs will have to come from the search engines. That will in turn reduce the comments left on real blogs purely as part of sploggers’ SEO strategy.

I will continue to eject splogs from my clients’ Content Networks, by using site exclusion. Conversion from splogs is the lowest that I have measured. (Yes, I have now seen a conversion from a domain park).
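Site exclusion is a judgement made from measured data. Here is a minimal sketch of that judgement. It assumes per-placement click and conversion counts have already been exported from the account. The thresholds `min_clicks` and `min_rate` are hypothetical, and you would set them from your own target ROI.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class PlacementStats:
    clicks: int
    conversions: int


def exclusion_candidates(
    stats: Dict[str, PlacementStats],
    min_clicks: int,
    min_rate: float,
) -> List[str]:
    """Return sites with enough clicks whose conversion rate is below min_rate."""
    candidates = []
    for site, s in stats.items():
        if s.clicks < max(min_clicks, 1):
            continue  # not enough data yet to judge this placement
        if s.conversions / s.clicks < min_rate:
            candidates.append(site)
    return sorted(candidates)
```

The `min_clicks` guard matters. A site with a handful of clicks and no sales tells you very little, so it is left alone until enough clicks accumulate.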

Google could reduce the frequency of duplicate content entries in its listings with a filter that requires a proper reference to an earlier site with similar content. This would force the “lowest possible effort” Get Rich Quick site builders to do some work. Since the original article should usually be the earliest copy, the reward for duplicating articles would *mostly* be removed. Will this be hard to do? Yes. But that’s what happens when you rely on a citation model for PageRank: Google have made the bed…
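Here is a sketch of one way the proposed filter could work. It rests on three assumptions:

- near-duplicates are detected with word shingles and Jaccard similarity, a standard technique and not anything Google has described;
- the publication date identifies the presumed original;
- a later copy passes only if it links to at least one earlier similar article.

The `Article` type and the 0.8 threshold are hypothetical.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import FrozenSet, List, Set, Tuple


@dataclass
class Article:
    url: str
    published: date
    text: str
    links: Set[str] = field(default_factory=set)  # URLs this article links to


def shingles(text: str, k: int = 5) -> FrozenSet[Tuple[str, ...]]:
    """Overlapping k-word sequences; short texts yield a single shingle."""
    words = text.lower().split()
    return frozenset(tuple(words[i:i + k]) for i in range(max(len(words) - k + 1, 1)))


def jaccard(a: FrozenSet, b: FrozenSet) -> float:
    """Size of the intersection over size of the union (0.0 for two empty sets)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def unattributed_copies(articles: List[Article], threshold: float = 0.8) -> List[str]:
    """URLs of near-duplicates that link to none of the earlier similar articles."""
    ordered = sorted(articles, key=lambda a: a.published)
    sh = {a.url: shingles(a.text) for a in ordered}
    flagged = []
    for i, later in enumerate(ordered):
        earlier_similar = [
            e for e in ordered[:i] if jaccard(sh[e.url], sh[later.url]) >= threshold
        ]
        # The earliest copy has no earlier match, so the presumed original always passes.
        if earlier_similar and not any(e.url in later.links for e in earlier_similar):
            flagged.append(later.url)
    return flagged
```

Accepting a link to *any* earlier similar article, rather than only the earliest, follows the wording of the proposal ("a site with similar content"). Requiring a link to the earliest copy would be a stricter variant.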

Google should demote the rank of blogs that place AdSense content *visually* before the article. I’ve yet to come across a blog that is valuable to users and that devotes the top 600 pixels to advertising. If you can find me some counter-examples, I’d be delighted to see them! :) I believe this would increase CTR in the Content Network. Evidence: none. But I do have experience of working with large clients and building websites. When people linger on a site rather than bouncing in a few seconds, there’s a higher chance that they’ll engage and click on something else that matters to them.

I’d like to say that one of the clues to a good blog is whether the articles have substantive comments (not just pingbacks) - but if scammers knew this was important, then it’d be enough to generate loads of comments (automata for this would be *easy* to write and hard to detect). And “substantive” is terrifically difficult for software to detect.

Controlling the quality of content match sites will have a beneficial effect for Google, eventually. Right now, Google is like a junkie: it needs its fix of advertising revenue, and Wall Street expects continual growth. However, adding new publishers who simply duplicate content, often without attribution, makes crawling the web harder and fills organic search results with useless listings. If Google is strict about reducing spammy publishers, then its search results will improve, and revenues will at least stay in the same range… and overall CTR in the Content Network is likely to rise as users spend more time on useful sites.

Perhaps Akismet’s comment spam decisions could be improved if it included the first-seen referrer_info for the commenter. This would make it easier to work out why a commenter is submitting comments. I’ve no problem with marketeers promoting their sites here, if they make a substantive comment based on solving real problems. I just don’t like degenerate freeloaders. Call me biased.
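Capturing the first-seen referrer is straightforward on the blog side. This sketch assumes a hypothetical per-visitor ID, for example taken from a cookie. It also assumes a hypothetical `first_seen_referrer` field in the data passed to the spam checker. The field name is illustrative and is not part of Akismet's documented API; adding something like it is the point of the suggestion.

```python
from typing import Dict, Optional


class FirstSeenReferrers:
    """Remember the referrer from each visitor's first request (hypothetical helper)."""

    def __init__(self) -> None:
        self._first: Dict[str, str] = {}

    def record(self, visitor_id: str, referrer: Optional[str]) -> None:
        # Keep only the first non-empty referrer; later internal navigation
        # (e.g. article page -> comment form) must not overwrite it.
        if referrer and visitor_id not in self._first:
            self._first[visitor_id] = referrer

    def get(self, visitor_id: str) -> Optional[str]:
        return self._first.get(visitor_id)


def build_spam_check_payload(
    comment: Dict[str, str], visitor_id: str, store: FirstSeenReferrers
) -> Dict[str, str]:
    """Copy the comment data and attach the visitor's first-seen referrer, if known."""
    payload = dict(comment)
    first = store.get(visitor_id)
    if first:
        payload["first_seen_referrer"] = first  # hypothetical field name
    return payload
```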

Update

2008-02-08: An older article has an interesting, recent comment by John Nagle about a system to reduce exposure to spammy sites. Personally, I don’t think it will work for most advertisers. IME, the sheer number of low-volume sites means that such a system will expend a lot of effort denying sites that wouldn’t reappear anyway. Put it like this: in a few tens of minutes I can sign up with a new free host and be publishing a spammy site, with an aged domain. However, getting enough traffic from each advertiser to trigger a rejection is a slow process, and in the meantime the site makes money. I believe that as long as making new, ranked spam sites is faster than detecting and removing them (for all advertisers, not just one), the problem will continue to drag down all content advertising.
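A toy model makes the rate argument concrete. It assumes, hypothetically, that a fixed number of new spam sites appear each day. It also assumes that all advertisers combined can detect and remove a fixed number per day. Under those assumptions, the count of live sites stays at zero only when removal keeps pace with creation.

```python
from typing import List


def live_spam_sites(days: int, created_per_day: int, removed_per_day: int) -> List[int]:
    """Toy model: number of live spam sites at the end of each day.

    Both rates are hypothetical constants; removals on a given day cannot
    exceed the number of sites that are actually live.
    """
    live = 0
    history = []
    for _ in range(days):
        live += created_per_day
        live -= min(live, removed_per_day)
        history.append(live)
    return history
```

When creation outpaces removal, the backlog grows by the difference every day, and each of those sites earns in the meantime. When removal is ahead, the count stays at zero.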

"Spam in Comments, Unattributed Content" was published on February 7th, 2008 and is listed in adwords, content match, spamfighting, microeconomics.

Spam in Comments, Unattributed Content: 8 Comments

  1. Automating Content Network Management - Part 1 | Merjis Internet Marketing Blog wrote,

    […] For simplicity, let’s double the target ROI click volume… In this example, we must see 200 clicks and no sales at all in order to decide that this site is unsuitable *for this offer*. It may be suitable for a different offer, of course… Because Contextual Matching is fuzzy, irrelevant advertising is a frequent hazard (look at the screenshot of a poor match to a page resulting from a search for “akismet-admin”, offering Windows XP Registry tweaking - completely irrelevant). […]

  2. Gradiva Couzin wrote,

    i like post. thankyou. link:http://www.nephrop0rn.com/

    JUST KIDDING! hee hee. Seriously though, I’m wondering why you don’t use a captcha on your comments? Or, perhaps you could do one of those Q & A text fields. Often, they ask something like “What is 8 + 2?”, but you could even ask something about your industry. Perhaps, hmm… something like: “What is the correct spelling of the phrase Mat Cuts?”

    -Gradiva

  3. Jeremy Chatfield wrote,

    Hi Gradiva -

    I use Akismet, and it catches up to a thousand spammy submissions every day, of which I habitually check a few dozen a day, to look for new trends in SEO and products being promoted.

    IMO, obstacles to user interaction reduce the conversion rate. So Captcha reduces the likelihood of valid user interaction, as well as decreasing spam. I’d rather kill the spam and increase the ease of user interaction. Akismet better satisfies that goal.

    Also Captcha should be accessible - and when I looked at Captcha a few years ago, most implementations were not designed for accessible use. Google does have someone looking at accessible Captcha, though.

    I know that Matt Cutts’ blog uses a computed response field, but I can imagine natural language parsers that would be able to handle the arithmetic, if the blogs it protects become valuable enough.

    For cost effectiveness, most current blog spam will be duplicated. Unique comments from unique accounts and unique IP addresses are harder to generate, and way too many blogs have absolutely no protection. The result is that most blog spam is highly redundant. IOW, the evolutionary pressure to evolve more complex responses is currently low - so there’s not much financial pressure to solve written arithmetic problems, and duplication makes tools like Akismet more effective.

    I’m less concerned about the impact on my blog and my blog admin, and more concerned about the effect on my clients’ advertising and SEO efforts.

  4. Marketing Articles wrote,

    Why didn’t you put captcha on your comments? And you’re talking about spam comments? What are you trying to show here?

    -Jan

  5. Jeremy Chatfield wrote,

    I don’t use captcha because it and similar techniques are intrusive. Making life harder for real users, while failing to add significant costs for spammers, is the wrong answer, IMO.

    The elegant design solution is to make life easy for real commenters, and preferably computationally hard but at least uneconomic for spammers. I use Akismet, which comes closest to meeting that design goal - it offers no obstacle to real commenters, and stops spammers from using this resource.

    There was no single point - there were a group of points, all related to spam, the reasons why there is spam, and the effects that it has on internet marketing.

  6. Maurice (TheCaymanHost) wrote,

    I have to agree with other comments on this thread - using a captcha of some kind is the only solution I have found to prevent the bulk of spam comments. Used in conjunction with Akismet plus the Simple Trackback Validation plugin and forced moderation keeps me pretty much spam free.

    With the introduction of a lot of commenting software over the past few months, particularly those that seek out “DoFollow” blogs, I think these lines of defence are essential.

    Captchas will normally only deter the drive-by commenters who you don’t particularly want anyway - if people are keen enough to contribute, a simple captcha does not make it harder to leave their thoughts IMO.

  7. Jeremy Chatfield wrote,

    Some stats: Akismet has caught around 60,000 spam comments so far. I’ve personally marked around 100 comments as spam, when Akismet couldn’t decide. One person made a real comment and followed that up with spam. I’m still not seeing a compelling reason to add Captcha.

  8. search engine marketing wrote,

    The benefits of free advertising is that they not only provide you the thrill of seeing your efforts go Online, but at the same time they also give you a more stable and permanent position in the search engine rankings. Even though the results of free search engine advertising are comparatively slower than the paid searches, they are more effective because these results are permanent.


Merjis Internet Marketing Blog is powered by WordPress and the YUI-Mainstream Theme by Buzzdroid.com