Click Fraud Reconstructed - keyword search

I’ll state it clearly - we do not claim to be click fraud specialists. Our research into improving internet marketing, part of which is given here, illuminates click fraud mostly as a side effect of trying to identify user behaviour on web sites, in order to make sites work better for users and advertisers. There are probably some types of click fraud that we simply can’t assess - we don’t have enough clients and enough overlapping clients to have a statistically valid sample of all of Google’s activity. We’d need thousands of client accounts… we aren’t that big, yet!

What types of click fraud can or do we identify? We do a lot of web server log file analysis. The type of fraud we can find in web server log files appears to vary significantly with the type of targeting system. This article focuses mostly on keyword search - we’ve a lot more to say about content match, and that will go into at least one more article.

For keyword search, the most important types that we can identify or that we’ve been told about are:

  • Competitors - consuming budget
  • Resold clicks - search adverts sold on to third party sites
  • Search engines - unsuitable searches

For contextual targeting (content match) the main types are:

  • Get Rich Quick - clicking adverts on sites you and your friends run
  • Search Engines - mistargeting and low quality sites
  • Spambots - automatons looking for email addresses

Does Click Fraud matter?

It depends on the type of click. If click fraud is endemic and everyone suffers from some, then it just means that the advertising channel is of lower quality. All that happens (in a rational market) is that the bids get reduced and the clicks per action increase. The total cost to win a customer is the same in the presence of click fraud as in its absence - but only in a rational market (cf. the Nash Equilibrium).

If evenly spread click fraud, in a rational market, were suddenly stopped, what would that do? It would result in advertisers paying more per click. Advertisers would see that they were getting a better return and raise their bids to capture more traffic. Pretty soon, your cost per action and ROI would be back exactly where they were before - you’d just be seeing fewer clicks, and it would take fewer clicks before you sell.
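Here is a minimal sketch of that arithmetic. The numbers (a £50 target cost per action, a 2% conversion rate on genuine clicks, 20% evenly spread fraud) are invented for illustration, and the function names are ours, not anything a search engine provides.

```python
def rational_bid(target_cpa, genuine_conversion_rate, fraud_share):
    """Highest bid per click that still hits the target cost per action.

    fraud_share is the fraction of paid clicks that can never convert.
    """
    effective_rate = genuine_conversion_rate * (1.0 - fraud_share)
    return target_cpa * effective_rate


def clicks_per_action(genuine_conversion_rate, fraud_share):
    """Average number of paid clicks needed to win one customer."""
    return 1.0 / (genuine_conversion_rate * (1.0 - fraud_share))


if __name__ == "__main__":
    target_cpa = 50.0   # hypothetical: pounds we can afford per sale
    conversion = 0.02   # hypothetical: 2% of genuine clicks buy
    for fraud in (0.0, 0.2):
        bid = rational_bid(target_cpa, conversion, fraud)
        clicks = clicks_per_action(conversion, fraud)
        # Cost per action is bid * clicks, and comes out the same both times.
        print(fraud, round(bid, 2), round(clicks, 1), round(bid * clicks, 2))
```

With those assumed figures the rational bid drops from £1.00 to £0.80 per click and the clicks per sale rise from 50 to 62.5, while the cost per sale stays at £50. That only holds if the fraud is evenly spread and everyone bids on measured performance.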

In an irrational market, or one where the advertisers cannot or do not routinely measure performance or use rational performance measures, even untargeted click fraud results in serious distortion. Eric Schmidt (Google CEO) recently opined that, although Google took click fraud seriously, it was not a major problem, because PPC is a rational market. In the video of his speech, the point comes up about 31 minutes into the 37 minute recording, so learn to use that fast forward feature, folks.

However, anti-competitive clicking, the presence of affiliates, keyword arbitrage, prevalent advice from affiliate marketing advisors to target a specific position, and advertising without measuring performance rationally all distort the market and mean that many of the bids are overheated. This puts money in the pockets of the search engines but also helps Get-Rich-Quick fraudsters, who can click fewer, higher paid adverts and so reduce the risk of detection. Google would be right, if the market were rational. So - who, reading this, will put up their hands to say they act irrationally, with the company budget? I thought so - no one, eh? :)

The types of click fraud that really matter are those that happen unevenly in the market, and those about which you can’t do anything or don’t know how to do anything. These fraudulent clicks will mostly be anti-competitive or will mostly target the most expensive keywords. Those asymmetries aren’t compensated for in the Nash Equilibrium. AdWords and Overture are not primarily rational markets, yet. So click fraud hurts…

Yes, all very rational, but does click fraud matter?

Deeply, it does. Whoever does click fraud is cheating. People hate cheats. Study after study in psychology and game theory shows that people will damage themselves in order to inflict penalties on a cheat. So, whether it is a rational response or not (and many would argue that it is rational behaviour to penalise cheats), click fraud is an affront that most of us will try to prevent.

How much fraud is there?

It’s literally impossible for someone outside the search engines to state. You’d need access to a lot of accounts to get a proper handle on it. But you can infer it by using economics! You can also use the statistics that Google gives you for invalid clicks, and use your own web server log files to investigate the problem.

Here’s how you can judge the level of click fraud in a mature advertising channel, using economics. You look at the price. If the price per click is low and there are competitors, then the channel probably delivers low quality clicks - because you need a lot of them to make a sale, and the competitors know this, so they don’t bid high. In rare cases (like MSN AdCenter), you find yourself looking at a low cost channel, with high quality - that’s usually because the channel is too new to have stable bidding, yet.

Even when the market is irrational or immature, you can still get a sense of the scale, because the very worst excesses are controlled by the costs. If you had a competitor who was prepared to pay ten times as much as you, they’d better have a much more expensive product, better conversion or deeper pockets - or they will go bust. If their business model is basically the same as yours, then they will be facing similar costs and similar conversions - or there is an anti-competitive bias in effect.

What practical measures can you use to defend yourself, and what else can you do?

There are a few things that advertisers can do to help themselves, but they involve becoming more aware of your own data, or having someone else review the data that you collect. The two main sources that you have are the search engine’s own reporting system, and your web server log files.

Standard web servers collect a history of who visits the site and what they do - the web server log file. It isn’t complete, and there are pieces of information it can’t collect, but it does serve a great role in identifying unusual behaviour.
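As a rough illustration of what is in there, here is a minimal sketch that pulls the useful fields out of one log line. It assumes your server writes the common Apache "combined" log format; if yours is configured differently, the pattern will need adjusting. The sample line and host names are made up.

```python
import re
from datetime import datetime

# Assumes the Apache "combined" format:
# host ident user [time] "request" status bytes "referrer" "user agent"
LOG_PATTERN = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) [^"]*" '
    r'(?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referrer>[^"]*)" "(?P<agent>[^"]*)"'
)


def parse_line(line):
    """Return a dict of fields for one log line, or None if it doesn't match."""
    match = LOG_PATTERN.match(line)
    if match is None:
        return None
    hit = match.groupdict()
    # Timestamps look like 23/Oct/2006:10:15:32 +0000
    hit["time"] = datetime.strptime(hit["time"], "%d/%b/%Y:%H:%M:%S %z")
    hit["status"] = int(hit["status"])
    return hit


if __name__ == "__main__":
    sample = (
        '203.0.113.7 - - [23/Oct/2006:10:15:32 +0000] '
        '"GET /landing.html?kw=red+lawnmower HTTP/1.1" 200 5120 '
        '"http://search.example/results?q=red+lawnmower" "Mozilla/5.0"'
    )
    print(parse_line(sample))
```

The IP address, timestamp, requested file and referrer are the fields that the checks in the rest of this article lean on.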

For keyword search, you look for important patterns:

  • Is the IP address of the click inside the target geoterritory?
  • When was the click generated?
  • Is this a known visitor?
  • What files did they pull down, and in which order?
  • What was the search?

IP Location Mismatch

The web server log file includes the IP address of the web browser that made the request. There are both free IP Location and paid IP Location databases, which will tell you where on the planet the IP address is located. The precision of the databases is high - often to about 10 metres (30 feet) or so - but the accuracy is much worse. The result is that you can usually tell what country someone is in, with the cheapest IP Location databases, but the city is as likely to be wrong as it is to be right.

Amongst our clients, almost all use national level targeting. So comparing where the click came from, with where we told the search engine to target, is a pretty good check on one type of click fraud - mistargeting. Here’s some data we collected earlier this year, for a UK targeted advertiser:

  • Google, at £3 to £7/click - 97% in target - much of the remainder was AOL users with a US proxy
  • Overture (Yahoo! Search Marketing) at £1.50 to £3.50/click - 95% in target, plus some AOL users with a US proxy/cache
  • Mirago at £0.50 to £2/click - 55% in target
  • Search123 at £0.07 to £0.50/click - 0% in target (yes, none at all, and few or no AOL users)

[Figure: Price versus geotarget accuracy]

Note how the geotargeting quality and the price decline together. This particular client now uses mostly Google and occasionally Overture, and has received significant refunds from some of the other paid search vendors. Mirago were helpful, courteous and prompt - but still had significant mistargeting months after the initial report, and the volume of uncontested search was too low to justify the management activity to recover the costs. Some of the others repeatedly failed to answer questions about click fraud and mistargeting.
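An "in target" percentage like the ones above can be worked out along these lines. This is a sketch, not our production tooling: lookup_country is a hypothetical stand-in for whichever IP Location database you use, and the toy table in the example is invented.

```python
def in_target_share(click_ips, target_country, lookup_country):
    """Fraction of paid clicks whose IP address resolves to the target country.

    lookup_country is a caller-supplied function (hypothetical here) that
    maps an IP address to a country code, or None if it is unknown.
    Returns None when there are no clicks to judge.
    """
    total = 0
    in_target = 0
    for ip in click_ips:
        total += 1
        if lookup_country(ip) == target_country:
            in_target += 1
    return in_target / total if total else None


if __name__ == "__main__":
    # Toy lookup table standing in for a real IP Location database.
    toy_database = {
        "192.0.2.1": "GB",
        "192.0.2.2": "GB",
        "192.0.2.3": "GB",
        "198.51.100.9": "US",
    }
    share = in_target_share(toy_database.keys(), "GB", toy_database.get)
    print(share)
```

Remember the AOL caveat from the list above: a visitor behind a proxy in another country will look mistargeted by this check even when they aren’t, so treat a few percent out of target as noise rather than proof.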

Click Frequency

If you click on an advert repeatedly (make it one of your own, in a test account, on an unimportant keyword), you’ll see a warning message from Google, eventually. They are watching what fraudsters and children do…

If the same IP address clicks repeatedly on the same advert, Google will no longer bill the clicks. But you will get billed if the same user does multiple searches over a period and sometimes clicks on your advert - it is behaviour indistinguishable from an absent minded user that repeats searches and can’t remember what they’ve seen.

Most telling is when the clicks arrive at a rate that can’t be matched by the human nervous system. When you see multiple clicks in sub-second intervals from the same source, over a period of several seconds, it is not a user who is double clicking - it is probably a piece of automation.
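A minimal sketch of that check follows. It assumes you already have (source, timestamp) pairs for your paid clicks, with timestamps in seconds to sub-second resolution; many default log formats only record whole seconds, so you may need to change the logging first. The one second gap and the three click threshold are our illustrative choices, not anything Google publishes.

```python
from collections import defaultdict


def find_click_bursts(clicks, max_gap=1.0, min_clicks=3):
    """Find sources that produce runs of clicks less than max_gap seconds apart.

    clicks is an iterable of (source, timestamp_in_seconds) pairs.
    Returns {source: length of its longest run} for runs of at least
    min_clicks clicks.
    """
    by_source = defaultdict(list)
    for source, timestamp in clicks:
        by_source[source].append(timestamp)

    suspects = {}
    for source, times in by_source.items():
        times.sort()
        longest = run = 1
        for previous, current in zip(times, times[1:]):
            # Extend the run while the gap is too short for a human,
            # otherwise start counting again from this click.
            run = run + 1 if current - previous < max_gap else 1
            longest = max(longest, run)
        if longest >= min_clicks:
            suspects[source] = longest
    return suspects


if __name__ == "__main__":
    clicks = [
        ("203.0.113.7", 0.0), ("203.0.113.7", 0.3),
        ("203.0.113.7", 0.7), ("203.0.113.7", 1.1),
        ("192.0.2.1", 0.0), ("192.0.2.1", 30.0), ("192.0.2.1", 95.0),
    ]
    print(find_click_bursts(clicks))
```

A single double click gives a run of two, which is why the threshold starts at three.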

Known visitor?

The web server log files, with some other configuration, can give you session information and even tell you if a visitor has returned to the site. This is usually done with a first party cookie (delivered by your web server, not a third party web server such as an ASP tracking system, for example DoubleClick or Google Analytics).

If the visitor rejects cookies, then you raise the likelihood that this is a fraudulent click.

There are several different types of tracking and cookie information, such as using JavaScript (most automation doesn’t run JavaScript) or Flash (even less runs Flash). The absence of these cookies or server requests is a flag that it may not be a human doing the clicking.

If the cookies are present, then you may be able to infer that it is the same person repeatedly clicking, even if they try using tricks to disguise their IP address.
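As a sketch of that inference, suppose each paid click has been reduced to a (cookie ID, IP address) pair, where the cookie ID is whatever value your own first party cookie carries. Both the pairing and the three click threshold are assumptions made for illustration.

```python
from collections import defaultdict


def repeat_clickers(clicks, min_clicks=3):
    """Group paid clicks by first party cookie ID.

    clicks is an iterable of (cookie_id, ip_address) pairs; clicks with no
    cookie are skipped here and need judging some other way.
    Returns {cookie_id: (number of clicks, number of distinct IP addresses)}
    for cookies with at least min_clicks clicks.
    """
    ips_by_cookie = defaultdict(list)
    for cookie_id, ip in clicks:
        if cookie_id:
            ips_by_cookie[cookie_id].append(ip)
    return {
        cookie_id: (len(ips), len(set(ips)))
        for cookie_id, ips in ips_by_cookie.items()
        if len(ips) >= min_clicks
    }


if __name__ == "__main__":
    clicks = [
        ("v1", "192.0.2.1"), ("v1", "198.51.100.4"), ("v1", "203.0.113.9"),
        ("v2", "192.0.2.1"),
        (None, "192.0.2.8"),
    ]
    print(repeat_clickers(clicks))
```

One cookie turning up with many clicks from many different IP addresses is the pattern to look at more closely; it is a prompt for investigation, not proof.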

What files?

Different web browsers will parse the web page slightly differently. This will result in a characteristic order for requesting files for each browser. If the order of download is different or, more markedly, only the HTML is requested and not the other files, it raises suspicions about the click source.
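The "HTML only" case is the easier one to check. The sketch below assumes you have (source, requested path) pairs from the log file and that the page references at least one stylesheet, script or image; the extension list is our guess at what a typical site serves.

```python
import posixpath
from urllib.parse import urlsplit

# Assumed set of supporting file types that a real browser would fetch.
ASSET_EXTENSIONS = {".css", ".js", ".gif", ".jpg", ".jpeg", ".png", ".ico"}


def html_only_sources(requests):
    """Return the sources that never requested a supporting file.

    requests is an iterable of (source, path) pairs, where path may
    include a query string.
    """
    fetched_asset = {}
    for source, path in requests:
        # Strip any query string, then take the file extension.
        extension = posixpath.splitext(urlsplit(path).path)[1].lower()
        fetched_asset.setdefault(source, False)
        if extension in ASSET_EXTENSIONS:
            fetched_asset[source] = True
    return sorted(s for s, got_asset in fetched_asset.items() if not got_asset)


if __name__ == "__main__":
    requests = [
        ("192.0.2.1", "/landing.html?kw=red+lawnmower"),
        ("192.0.2.1", "/style.css"),
        ("192.0.2.1", "/logo.gif"),
        ("203.0.113.7", "/landing.html?kw=red+lawnmower"),
    ]
    print(html_only_sources(requests))
```

A returning visitor with a warm browser cache may also skip the supporting files, so this is a flag to combine with the other checks, not a verdict on its own.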

What search?

If you offer a keyword like “red lawnmower” and the search engine offers your advert to someone who searches for “lawnmower repair”, is that click fraud? Well, what about offering your “red lawnmower” advert to someone looking for “lawn care”? How about “hair loss clinic”?

We’ve done some testing of search engine matching. I’ll summarise it briefly here. It’s several articles’ worth of discussion and testing…

Different search engines offer different matching systems. The more relaxed you are (Google’s Broad Match, Overture’s Advanced Match) the more likely you are to stray from an exact or phrase match. The more you bid, the further the match is pushed. We’ve seen £3 bids on Overture for a financial services product being matched to “weight loss clinic” and “hair loss tonic” - presumably because the word “loss” is also part of the conceptual cluster for financial services.

Because users do not understand how sponsored links are ranked, they tend to assume that they are organised like organic links - the most relevant advert is at the top of the list. So they’ll gleefully click on the most irrelevant adverts so long as they come top of the listings. They’ll click less on an irrelevant advert, but they will click.

So, if your financial product is offered against “hair loss tonic”, and someone clicks, who is to blame? I tend to the belief that just because advertisers are prepared to offer a lot to compete in their industry, it isn’t a justification for search engines to offer inappropriate matches.

Resold Clicks

Overture, in particular, seems to suffer from this problem. Adverts can be sold on to third parties by publishers. Overture has no control over the quality of those third parties and seemingly does not constrain its publishers to control them, either. The only way that you can tell is to look at the referrer information and try to find where the advert really came from. Some companies go to great effort to disguise this, especially with content match. It seems moderately easy to find in keyword search, but it may be that we’re only finding a fraction of the activity.

Again, web server log file analysis skills are needed here, to get lists of the sites that carry your adverts. Then you need to review each site to see whether it should be carrying your advert. If there’s a relationship with a paid search vendor, and especially if your advert appeared deep in the site, you have to ask whether this is legitimate. If the site is in a different language to the advert… well, you really do start to wonder.
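Getting that list of sites is mostly a matter of tallying referrer host names for your paid landing pages. A minimal sketch, assuming you have already extracted the referrer field for each paid click (the host names in the example are made up):

```python
from collections import Counter
from urllib.parse import urlsplit


def referrer_hosts(referrers):
    """Count paid clicks per referring host, most frequent first.

    referrers is an iterable of referrer strings as logged; an empty
    string or "-" means the browser sent no referrer.
    """
    counts = Counter()
    for referrer in referrers:
        host = None
        if referrer and referrer != "-":
            # hostname is lower-cased and has any port removed
            host = urlsplit(referrer).hostname
        counts[host or "(none)"] += 1
    return counts.most_common()


if __name__ == "__main__":
    referrers = [
        "http://search.example/results?q=red+lawnmower",
        "http://search.example/results?q=lawnmower",
        "-",
        "http://Parked.Example/page",
    ]
    for host, clicks in referrer_hosts(referrers):
        print(clicks, host)
```

The hosts you don’t recognise are the ones to go and visit. A large share of clicks with no referrer at all is worth a question to your account rep, too.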

Getting money back

In most cases, we’ve had money back from search engines when we’ve demonstrated bad geotargeting. Getting money back for other types of click fraud is harder.

Try convincing a search engine rep that their match algorithm is useless, and they’ll point to clicks on your advert in response to search - “those users”, they’ll say “chose to click on your advert, that shows it is deemed relevant”. “Yeah - and being prepared to pay more than the other adverts you could have run had nothing to do with it?” You won’t get a printable or informative reply.

There is no industry wide check that forces a good match between keyword and search. There is no commitment from paid search vendors that, if they offer your advert to inappropriate outlets, they’ll recompense you. This is probably because, IMO, most agencies can’t detect the outlets (they don’t do the web server log file analysis that reveals this info) and they don’t do the keyword concept matching (because that requires some kind of work in areas like Artificial Intelligence (AI), way outside the competence of most marketing agencies).

Google has a click fraud investigation team. For Yahoo, we use the account reps. Mirago has click fraud investigators, who you reach through the account reps. Search123, Kanoodle and the like - we’ve not found anyone who cares and we’re reciprocating by not using them unless specifically directed to do so by a client.

We’re developing ever more sophisticated techniques and technologies to identify paid search vendor misbehaviour - but we’re a small, research led agency. We don’t exert much pressure on Google and Overture and the others to clean up their acts. While people continue to buy low quality clicks, and let the paid search vendors offer whatever quality they choose, you’ll have click fraud of multiple types. We do have some ROI and AI based tools for improving content match, but the reasons for developing those won’t become clear until the next articles in this series.

Recommendations

Google and Yahoo! Search Marketing seem to be on the right track, but have some weaknesses and need monitoring and pressure to maintain standards. Before you spend any more with another paid search vendor, isolate a definite problem and see if you can find anyone who’ll accept the problem and try to resolve it. If you can’t find anyone, or your case isn’t handled well, I think that tells you a lot about quality control of the traffic you are buying. Buy some traffic elsewhere, where they care.

Next up

We’ve been doing some work on invalid clicks on Google’s various targeting systems…

"Click Fraud Reconstructed - keyword search" was published on October 23rd, 2006 and is listed in marketing, google, adwords, click fraud, web analytics.
