Effective Internet Marketing Strategy and Technique Through Experiments, Measurement and Audit

Malware Detection Breaks Web Metrics?

The Register has an article about web analytics problems caused by the malware detection in an anti-virus package. This may have implications for advertisers and SEO, too. I have not yet downloaded and tried this anti-virus package. I didn’t see any white papers or a clear explanation of the AVG LinkScanner technology on the AVG site. Come Monday, when I sit in front of a Mac with a Windows license, I’ll investigate further.

If Cade Metz is correct, then this software might be causing some interesting problems for Google and Google’s customers and possibly even for SEOs. Why? Because every time a user of Version 8 of the AVG Technologies LinkScanner tool does a search, the tool checks all the links on the returned results page. It does so by emulating user behaviour. That means that web analytics packages based on web server log files may have problems, because the interaction looks like users who visit and mostly bounce immediately. The volume of apparent users would increase, and all of the increase would be associated with increasing bounce rates. I can imagine spending a lot of time working on client web sites to try to solve that apparent conversion decrease - only to discover that my time was wasted by this tool.

I can understand why AVG Technologies have gone this route - but I think there are better ways to implement what is needed. The current implementation apparently causes a lot of other problems. Let’s have a look at what the problems might be - admittedly this is speculative, until I actually test the wretched stuff… No, I don’t deeply trust reporters to get things completely accurate. I’ve been quoted in the news myself, and so have some of my colleagues and clients; I know how the quest for a story can lead to implications that weren’t in the technical origins of the item. That link for Cade Metz above takes you to a page that says he’s pretty good and trustworthy, though.

Things That Jump Out As Problems

Apart from the suspected clicking on paid search adverts? Well, this software visits sites and tries to look like a real user. So if Google doesn’t have a threat signature for this activity, it’ll feed AdSense adverts to those pages. You won’t get clicks on those adverts - driving CTR down - but this will affect all advertisers, so you should not *relatively* suffer. But there will still be ugly questions asked of Google about decreasing CTR and to the web marketing team about why the latest innocuous changes cause response rates to collapse…

And, of course, advertisers running CPM adverts (e.g. placement targeted adverts) will pay for every thousand impressions - even if they are just bogus web page loads generated by software. So advertisers will have to be more vigilant, and allow for the possibility that Google might be missing a chunk of fake clicks. Now, while Google could find a signature for this (ten or twenty requests for links off the same page), advertisers can’t - they just get an extra hit each time there’s a search results page on which they appear. Advertisers have no clue that everyone else on that page also gets a hit… which means that Google could hide the problem and just reap higher impression rates and more clicks.
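
As a rough illustration of the kind of signature that is visible from the search engine's side but not from any single advertiser's side, here is a minimal Python sketch. It assumes a click log of (results page id, timestamp in seconds, target URL) records; that layout, the name flag_burst_clicks and both thresholds are invented for this example, and nothing here reflects how Google actually detects invalid clicks.

```python
from collections import defaultdict
from typing import Iterable, Set, Tuple

Click = Tuple[str, float, str]  # (results_page_id, timestamp_seconds, target_url)


def flag_burst_clicks(
    clicks: Iterable[Click],
    min_links: int = 10,          # assumed threshold: "ten or twenty" links
    window_seconds: float = 5.0,  # assumed threshold: no human clicks this fast
) -> Set[str]:
    """Return ids of results pages whose links were nearly all followed at once."""
    by_page = defaultdict(list)
    for page_id, timestamp, url in clicks:
        by_page[page_id].append((timestamp, url))

    flagged = set()
    for page_id, events in by_page.items():
        events.sort()
        distinct_targets = {url for _, url in events}
        span = events[-1][0] - events[0][0]
        # Many different targets from one results page in a few seconds
        # looks like a scanner, not a person.
        if len(distinct_targets) >= min_links and span <= window_seconds:
            flagged.add(page_id)
    return flagged


if __name__ == "__main__":
    clicks = [("page-1", 0.1 * i, f"http://site{i}.example/") for i in range(10)]
    clicks.append(("page-2", 0.0, "http://site0.example/"))
    print(flag_burst_clicks(clicks))  # {'page-1'}
```

Each advertiser only ever sees its own row of that log, which is why the pattern cannot be spotted from one site's data.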

So the key advertiser problems are, possibly, click-throughs on paid adverts, which would cost advertisers additional clicks that should be identifiable as invalid, and additional CPM payments for content match adverts shown on the pages the tool clicks through to.

Tough On Threats. Easy On You. Looks Like Malware.

AVG has focused on servicing user needs. Normally that’s a good thing. Google also focuses on user needs. One of the golden goals of marketing is to satisfy an unstated need. “Needs” are crucial to effective marketing.

However, destroying the value of analytics systems, and causing additional advertiser costs and provoking web management teams worldwide to go into a tizzy over declining conversion rates and increased bounce rates… could be seen as less than helpful.

This smacks more of an unwillingness to look at the problem properly, or ignorance, than of malice. However, the effects on advertisers, web analytics providers, Google, hosting services and so on - well, that’s quite a substantial group of offended suppliers.

This section title, BTW, is derived from AVG’s corporate tagline… I thought it was mildly humorous, anyway, in a satirical way.

Better Ways To Build A Mousetrap

I’m a fan of Akismet. This is the tool that protects comments for this blog. Akismet has a central repository of known bad comments. When a comment is submitted to this blog, Akismet looks for the same comment in the repository. If found, then the comment is tagged as spam. Frequent and worthwhile commenters get passed by Akismet. However, I sometimes get new comments in my moderation queue from people that the software wants me to review. If I approve the comment, then it is added to the central repository as a positive for that commenter - but if it is marked as spam, then it joins the other spammy comments in the DB. That way the community gets to check and rate, and not everyone has to look at every comment - only those freshly exposed to new comments and commenters are asked to evaluate. It is an effective tool - though I can think of techniques that might undermine it. Of well over 10,000 comments tagged as spammy, I’ve personally been asked to review fewer than a hundred. That’s an insignificant ratio and effective protection.

Using a similar technology might slow lookup for AVG (one query with ten items to a central DB, followed by some number of visits for sites with insufficient records; versus at least ten visits per search query) or speed it up - I haven’t done the math; it isn’t my product and I’m more concerned to make sure I have usable analytics. With my InfoSec-stuff hat on top of my AdWords and Internet Marketing hats, I’d prefer something that would allow a statistical sampling of a site by a range of browsers from different IP addresses, by means that reduce the load on web servers and minimise the perversion of web analytics.
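
To make the comparison concrete, here is a minimal Python sketch of the "ask the central repository first, visit only the unknowns" idea. It is my own illustration, not AVG's or Akismet's design: the dictionary standing in for the central DB, the scan_site callback and the verdict strings are all hypothetical.

```python
from typing import Callable, Dict, Iterable, List


def check_links(
    urls: Iterable[str],
    reputation_db: Dict[str, str],    # hypothetical stand-in for the central DB
    scan_site: Callable[[str], str],  # hypothetical scanner, returns a verdict
) -> Dict[str, str]:
    """Return a verdict per URL, visiting only URLs the shared store lacks."""
    verdicts: Dict[str, str] = {}
    unknown: List[str] = []

    # One batched lookup for all links on the results page.
    for url in dict.fromkeys(urls):  # de-duplicate, keep order
        known = reputation_db.get(url)
        if known is None:
            unknown.append(url)
        else:
            verdicts[url] = known

    # Only sites with no record are fetched; the result is shared
    # so the next user does not need to visit them again.
    for url in unknown:
        verdict = scan_site(url)
        reputation_db[url] = verdict
        verdicts[url] = verdict
    return verdicts
```

With a well populated repository, most results pages would cause no visits to the listed sites at all, so server logs and analytics would only see the occasional sampling visit.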

Things That I Need To Check

Size of the problem… Cade Metz estimates up to 20 million users of AVG. AVG claim to be fourth in size in the AV market. In reasonably mature global markets like AntiVirus products, I usually assume that fourth typically means a single-figure percentage - somewhere in the 5-9% share range. That may not sound like a lot, but 20 million seats is still a good sized business. If all 20 million used this link checker, then with around ten links checked per search, it looks like 200 million or so users arriving from search. That’s noticeable, especially given that for many businesses search (organic and paid) is a substantial fraction of all visitors. So this doesn’t look like a small problem - but I do want to invest a little time in confirming those numbers from some recent market share data and something that gives the market size. If it proves that AVG only have 1% of the market and less than 10% of all users use AV, the problem is a non-issue. :)
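
The back-of-envelope arithmetic behind that 200 million figure can be written down as a tiny Python sketch. The ten links per results page is an assumption (a typical default page of results), and the function name is made up for the example.

```python
def apparent_visitors(scanner_users: int, links_per_results_page: int = 10) -> int:
    """Apparent visitors generated when every listed link is pre-visited once."""
    return scanner_users * links_per_results_page


if __name__ == "__main__":
    # 20 million users, each running one search with ten results checked.
    print(apparent_visitors(20_000_000))  # 200000000
```

This is per search, so users who search several times a day multiply the figure again.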

Cade Metz didn’t clearly state whether the AVG link checker executes JavaScript. If it is a browser plugin, I can imagine that it might hook to deep layers that allow it to look at the results of executing JavaScript. That would mean that it could also submit the image requests used by JS based web beacons/Page Bugs as part of its investigation to discover malware on the target site. So even Google Analytics, NedStat, CoreMetrics and Omniture would not be immune from perverted statistics. If the LinkChecker doesn’t check for images fed by JS based Page Bugs, then it misses a source of possibly compromised image files - so good InfoSec practice should be to check those servers, because if I were malicious, that’s where I’d hide a payload.

It isn’t clear from the article whether this LinkChecker does anything with Flash. Web Analytics and user tracking with Flash Cookies are increasingly popular - users mostly don’t know about them, and web browsers don’t have mechanisms to clear them, unlike ordinary cookies. If this malware checker is to be effective, it probably should be looking at Flash Cookies, as I can imagine that these might be used as an attack vector. So even Flash Cookie based web analytics could be affected.

It isn’t clear from the article whether the AVG LinkChecker *only* looks at organic search results or whether it also clicks through paid search links. If it does, then advertisers will see unusually high CTR from users with the LinkChecker installed, and will see conversion rates decrease and conversion costs increase. This is… undesirable… it’s a form of invalid click, normally a result of some type of malware!

However, if this tool is supposed to defend against malware, then the programmers that make malware can adapt by using low cost adverts - because if the LinkChecker *doesn’t* check adverts, that’s where anyone with the wit to write effective code will hide the payload. Duplicate an OK site under a new URL, submit 2 cent adverts and hope to pay 1 cent if the volume and CTR can be made high enough. Takes a bit of work, but I can imagine doing it.

That’s a bit of a nasty problem. If you don’t check the advertised target site, then you might offer malware loaded sites to users. If you do check, you increase advertisers’ costs and increase the accusations against Google that it is a rip-off. Pragmatically - AVG is responsible for the implementation - if they click on my clients’ adverts with no intention of purchasing, and cause cash to flow to Google… if the pattern is visible to Google it should be an invalid click and no cost should be paid by my client. So AVG’s implementation has a cost implication to Google (tracking and denying this stuff costs), and an indirect cost resulting from further increased mismatch between Google Analytics and Google AdWords, thereby increasing fears of click fraud.

More subtly - what is it that triggers this code into doing a malware check? Is it the name of the site (so have AVG hardcoded all the Google domain variants?) and how do they recognise stuff like Google Custom Search Engines? What about the Google Search Box on various sites - do they recognise those results pages? What about Yahoo and MSN Live? My guess is that if this has been implemented to work as widely as possible, then they’ll be looking at the URL parameters (tags) to see whether they look like Search Engine tags, and perhaps coupling it with some kind of wildcard (regex) matching for a built in list of major search engines. If they haven’t, then they are missing at least 30% of search activity.

If they do check on signatures of search, then they may also catch some non-search sites - e.g. CMS and product catalogue sites and in-site searches, identifying links on those pages as being worth checking for malware. Oh dear.
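
Here is a minimal Python sketch of the two detection styles guessed at above. It is purely speculative: the host pattern, the parameter names and the function name are my own assumptions, not anything taken from AVG's product.

```python
import re
from urllib.parse import parse_qs, urlparse

# Hypothetical built-in list of search hosts; the wildcard is deliberately loose.
SEARCH_HOST_PATTERN = re.compile(
    r"(^|\.)(google\.[a-z.]+|search\.yahoo\.com|search\.live\.com)$"
)
# Hypothetical query parameters ("tags") that carry the search terms.
QUERY_PARAMS = {"q", "p", "query"}


def looks_like_search_results(url: str, hosts_only: bool = True) -> bool:
    """Guess whether a URL is a search results page.

    hosts_only=True  requires a known search host AND a search parameter.
    hosts_only=False trusts the parameter alone, which casts a wider net.
    """
    parts = urlparse(url)
    host = (parts.hostname or "").lower()
    has_query_param = bool(QUERY_PARAMS.intersection(parse_qs(parts.query)))
    if hosts_only:
        return has_query_param and bool(SEARCH_HOST_PATTERN.search(host))
    return has_query_param


if __name__ == "__main__":
    print(looks_like_search_results("http://www.google.co.uk/search?q=avg"))   # True
    print(looks_like_search_results("http://shop.example/catalogue?q=shoes"))  # False
    print(looks_like_search_results("http://shop.example/catalogue?q=shoes",
                                    hosts_only=False))                          # True
```

The last call shows the trade-off: matching on parameters alone catches custom search boxes, but it also treats an ordinary product catalogue search as a results page worth scanning.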

We do have some tools for checking web server log files and we have clients with multiple GB of compressed log files per day… So I can check for the signature and get some clue as to the magnitude of the problem. However, chances are that AVG has focused on specific segments. If those segments don’t overlap with my clients’ segments, then I will underestimate the global impact. If the segments overlap significantly, then I’ll overestimate the effect.
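
A log check of that kind might look like the following Python sketch. It assumes logs in the Apache "combined" format, and the signature itself is a placeholder: I have not yet seen what the tool's requests look like, so EXAMPLE-SCANNER-AGENT and the is_suspect rule are invented for illustration only.

```python
import re
from typing import Callable, Iterable, Optional

# Apache/NCSA "combined" log format.
LOG_LINE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) [^"]*" (?P<status>\d{3}) \S+ '
    r'"(?P<referrer>[^"]*)" "(?P<agent>[^"]*)"$'
)


def parse_line(line: str) -> Optional[dict]:
    """Split one combined-format log line into named fields, or None."""
    match = LOG_LINE.match(line.strip())
    return match.groupdict() if match else None


def suspect_share(lines: Iterable[str], is_suspect: Callable[[dict], bool]) -> float:
    """Fraction of parseable hits that match the supplied signature test."""
    total = suspects = 0
    for line in lines:
        hit = parse_line(line)
        if hit is None:
            continue  # skip lines in another format
        total += 1
        if is_suspect(hit):
            suspects += 1
    return suspects / total if total else 0.0


def is_suspect(hit: dict) -> bool:
    # Placeholder rule: search referrer plus a made-up user agent marker.
    return "google." in hit["referrer"] and "EXAMPLE-SCANNER-AGENT" in hit["agent"]


if __name__ == "__main__":
    sample = [
        '192.0.2.1 - - [16/Jun/2008:10:00:00 +0000] "GET / HTTP/1.1" 200 5120 '
        '"http://www.google.com/search?q=widgets" "EXAMPLE-SCANNER-AGENT"',
        '192.0.2.2 - - [16/Jun/2008:10:00:05 +0000] "GET /about HTTP/1.1" 200 2048 '
        '"-" "Mozilla/5.0"',
    ]
    print(suspect_share(sample, is_suspect))  # 0.5
```

Because the function reads lines lazily, it can be fed a decompressed stream (for example from gzip.open in text mode) rather than loading multi-GB files into memory.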

Actions?

Apart from firing up Windows on Monday, I’m going to write to AVG. If their customer base is the size they claim (fourth largest AV solution) then this malware tool is likely to account for something in the range of 25% to 50% of search engine traffic from users with an active AV installed. That’s possibly quite a lot.

If The Register article is based on anything real, then this could have a significant adverse effect on metrics. Even worse, the effect will involve an escalating number of users. You can’t just apply a fixed offset (e.g. “subtract 5%”) that works for all time - it’ll need tweaking as the AVG customers upgrade. I need to check some log files on Monday and get some idea of the impact.
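
A small worked example shows why a fixed offset fails. The Python sketch below assumes, for simplicity, that every scanner hit is recorded as a single-page bounced visit and that the number of scanner hits can be estimated from the logs; the function name and all the numbers are illustrative, not measured.

```python
def corrected_bounce_rate(visits: int, bounces: int, scanner_hits: int) -> float:
    """Bounce rate after removing scanner hits, assuming each one bounced."""
    real_visits = visits - scanner_hits
    if real_visits <= 0 or scanner_hits > bounces:
        raise ValueError("scanner estimate exceeds observed visits or bounces")
    return (bounces - scanner_hits) / real_visits


if __name__ == "__main__":
    # Same 800 real visits with 400 real bounces in both months;
    # only the number of scanner hits grows as more users upgrade.
    for month, scanner_hits in (("month 1", 100), ("month 2", 200)):
        visits = 800 + scanner_hits
        bounces = 400 + scanner_hits
        observed = bounces / visits
        corrected = corrected_bounce_rate(visits, bounces, scanner_hits)
        print(month, round(observed, 3), corrected)
```

The real bounce rate stays at 50% in both months, but the observed rate drifts upwards as the scanner's share grows, so the correction has to be recomputed from the scanner count for each period.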

This may change some of the SEO and conversion improvement activities that I have ongoing, at least until we have retrospectively cleaned recent months’ analysis of user behaviour. Some analytics packages can’t (won’t) do this. For example, a Google Analytics filter added today to remove the signature takes effect from the time it was written, not retrospectively. So my client stats only work properly from today forwards, and only until AVG change the signature.

Conclusions

The web is getting complex. Complex enough that efforts intended to achieve protection against malware can be interpreted by a different community also using the web, as malware.

More investigation is needed to see how this product handles JavaScript, Flash, other search engines and paid search adverts.

The original report suggests that this will make a difference to some advertising metrics - mostly making it look as though there are more searches on at least organic rankings than before, and that a very low percentage of these will convert. This could mean increased advertising costs, until Google add an invalid click pattern. Google has to do this as individual advertisers will be unable to see the coordinated clicking on Google. It would be good to hear from Google (Analytics Blog, Inside AdWords blog, Ghosemajumder’s blog) how malware like this is handled.

If Google handle this properly, and the impact is as large as it seems, advertisers should be looking for an increased invalid click and impression rate - possibly both on content match and search.

The link checker could imply increased CTR and decreased ROI, by reducing the likelihood of sale in response to a visit.

I’m still thinking about this one. It could account for some stuff that I’ve seen on some client sites over the last month or so. I’ll just have to wait until Monday to get a good start on it. I’ll try to flag any updates…

Related Links

WebMasterWorld thread about signatures and webmaster responses to link scanners.

SEOmoz article about malware abuses of Web Analytics.

Updates

2008-06-15 - even before I publish… I’ve heard back from Pat Bitton, Head of Global Communications at AVG. I’m impressed that they are actively involved in the emerging news about this, rather than hiding till it blows over.

2008-06-16 - Fixed some typos. Edits for clarity. Heard back a second time from Pat Bitton - AVG have read this blog article. I’m beginning to think that Cade Metz should be on my watch list. Very interesting issue he dug up. AVG 8 Trial Version download now complete. Installation in progress.

2008-06-17 - Installed AVG 8 Trial. AVG 8 offers a toolbar with a Yahoo! Search default. Ran a few tests. LinkScanner shows site check mark in MSIE for Google, but doesn’t for Yahoo. Why would the preferred search engine not have link checks? Odd. Is this a way to penalise Google in some weird way? Or an oversight to neglect checking Yahoo? Must do some more thinking about what is happening here, and some more experiments and reading of log files.

"Malware Detection Breaks Web Metrics?" was published on June 15th, 2008 and is listed in adwords, click fraud, web analytics, malware.
