Effective Internet Marketing Strategy and Technique Through Experiments, Measurement and Audit

Twelve ways, plus a bit, less a few.

Interesting click fraud chorus starting up here. This thought provoking piece, The 12 Ways of Click Fraud by Mike O’Krongli, is a response to the Andy Beal/Shuman Ghosemajumder articles. See also Andrew Goodman’s piece and his other one.

I’m not entirely convinced by all of Mike’s arguments. In the interests of trying to refine the message to Google from advertisers and agencies, I’ll try to tease out the bits that I think are right and those I think are misguided. My opinions only… so if I’ve done any work and experiments, I’ll try to make it clear, otherwise anything I say is just guesswork. With the data that Google give you to work on, it is and can only be guesswork. That’s the biggest part of Google’s problem, but I’ll get back to that bonnet-buzzer in a bit.

Shuman Ghosemajumder’s article is also slightly specious. The key criticism that I have is that, for an advertiser, a fraudulent channel offers a poor return on investment. This is nothing like the very dry and technical description of fraud used by Google. ROI is demonstrably the way that at least some vocal fraction of Google’s advertiser base thinks about clicks and fraud. Google’s definition is appropriate for themselves, but isn’t a practical definition for a small advertiser.

So, to Mike’s 12 ways and let’s look for the nuggets.

1/ Earnings. Not a problem for me. We have some clients for whom Google estimates a number one position on specific keywords as being about $90/click. One client has seen the cost to appear on page one increase from around $5 in January to around $20 over 11 months. The number of different advertisers is larger (by memory - no data collected) and the number of additional pages of advertisers (click on the link to just see adverts and see how many pages of adverts you get) is also larger. More competition, more advertisers, growing international market of users (not all international markets are as internet saturated as the US) - this all suggests higher revenues to Google.

See also the Nash Equilibrium. Google was trying to hire Nash Equilibrium expertise a year or so ago. And also see this lovely paper on Google’s auction which explains why Panama is the way it is. These both demonstrate why Google is a money making machine.

I do have a problem with the auction. It is asserted by Google that advertising is subject to a bidding Quality Score. This allows a behind-the-scenes fiddle factor, without oversight. Is the auction fair? No-one outside Google can tell. There is insufficient evidence that can be gathered by a single advertiser to judge. A group of competing advertisers could tell, but that would require that competitors share sensitive data… Unlikely, I think. So Google wins in an asymmetric information market with players that compete and won’t co-operate. Straight game theory, or simple microeconomics.

2/ Other things to do? There’s always been other things to do. It used to be baseball and football. When it comes time to purchase a glass angel for Auntie Myrtle, the web’s the way to go and the first advert that offers a glass angel is good enough, if they deliver in time. We do a lot of web server log file analysis, looking for the ways in which users interact with advertising and the site. This shows that there are larger numbers of widespread IP addresses generating traffic, accepting first party cookies. There are a lot of real users out there buying. Additionally our clients, many of them businesses that sell stuff online, attest that they are selling more online this year than they did last year. Some reached the Christmas 2005 last-minute frenzy in mid November this year, so significant growth, fuelled by AdWords.

3/ PPC is nowhere near mature - it’ll be at least a year before it peaks and there’s no good sign of an alternative (something like “Pay For Action”). The revenues will continue to increase and should continue to increase. When you have a specific product or need, PPC can deliver more effective links than organic. Evidence, again - I have some clients who have sustained five+ figure sales off single keywords, exact matched, with a CTR of significantly more than 50%. That easily rivals organic for CTR. The ROI is such that the paid price could increase by a factor of four and they’d still have good margin… meaning that the Nash Equilibrium still has room to swing cats in some markets.

4/ Invalid clicks are germane to click fraud. In doing web server log file analysis, we divide the data into streams of candidate users. I won’t explain how. It’s boring, even for me. We can then allocate probabilities of various types of activity.

It might be “user agent says it is a spider and the IP address is a well known spider, this is 100% a robot crawling the web, for non-fraudulent purposes - wonder if we were charged for the click?”

It might be “user agent looks valid but I don’t like that FunWebProducts in there… IP is shared, probably a cache/proxy… yes it is… {muttering over other factors}… yes this is probably a real user from AOL who looked at one page and went away”.

At the end of that, you can say (entirely made up numbers - totally fictitious, does not represent a real account and with tongue shoved firmly in cheek) “Google says 20% invalid clicks on this date”. “I see 10% of requests that I can identify as advert URL re-use, 10% that I see are double clicks (probably, Google doesn’t reveal the metric but a cross correlation can give you a handle on the likely metrics), 10% self-identified bots and 10% likely spambots and 10% that are probably bots but I have no idea what they are doing and 10% real users who did nothing significant and the rest are entirely wonderful or indistinguishable from an interested human”.

Invalid clicks become an important part of the story, because the AdWords UI represents the billed clicks. Everything else is a conflation of double clicks, ignored spiders, idiot get-rich-quick AdSense publishers too stupid to hide their theft, clever fraudsters who look just like real users and real users who may or may not be interested in the page they’ve landed on. It’s all about intent.
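
The sort of triage described above can be sketched in a few lines of Python. This is an illustration only, not our actual analysis: the field names, the short list of spider user agent hints, the placeholder IP prefix, the per-click token and the two second double-click window are all assumptions made up for the example.

```python
from dataclasses import dataclass

# Assumed, illustrative values only; a real list would be far longer.
SPIDER_UA_HINTS = ("googlebot", "slurp", "msnbot")
KNOWN_SPIDER_IP_PREFIXES = ("192.0.2.",)  # documentation-range placeholder
DOUBLE_CLICK_WINDOW_SECONDS = 2.0  # a guess; Google does not publish its metric


@dataclass
class Request:
    ip: str
    user_agent: str
    timestamp: float  # seconds since epoch
    click_id: str     # hypothetical per-click token carried in the landing URL


def classify(requests):
    """Label each advert-click request with a rough category."""
    labels = []
    last_seen = {}  # click_id -> timestamp of the most recent request
    for req in sorted(requests, key=lambda r: r.timestamp):
        ua = req.user_agent.lower()
        if any(hint in ua for hint in SPIDER_UA_HINTS):
            if req.ip.startswith(KNOWN_SPIDER_IP_PREFIXES):
                label = "self-identified bot"
            else:
                label = "claims to be a bot, unknown IP"
        elif req.click_id in last_seen:
            gap = req.timestamp - last_seen[req.click_id]
            if gap <= DOUBLE_CLICK_WINDOW_SECONDS:
                label = "double click"
            else:
                label = "advert URL re-use"
        else:
            label = "candidate human"
        last_seen[req.click_id] = req.timestamp
        labels.append((req, label))
    return labels
```

Counting the labels per day gives the kind of breakdown quoted above, which can then be set against Google's reported invalid click percentage. The "candidate human" bucket is exactly where the clever fraudsters hide, so this kind of sorting only bounds the problem.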

5/ I don’t understand this section about bid rates. What is a “bid rate” and why would a declining one send advertisers looking at the long tail? There certainly is a trend to look at the long tail, but this is driven by ROI and would have been true last year and will be true next year. If you service racing lawn mowers and all racing lawn mowers are painted purple, then “lawn” will get you some lawn mower racing enthusiasts and a lot of people who have needs for rakes, sand, turf, ant killing, golf, etc. Using “purple lawn mower” will get you racing enthusiasts and people who wonder if there are people who race purple lawn mowers… A much more select group, who will give you a better CTR. The use of the long tail is an emergent property, not a crime scene.

6/ I tend to agree that the long tail is a good place to hide, but so are the mass keywords - picking an AOL user stream with varying IP addresses of a proxy/cache out of the mass using “cheap holidays”? Yuck. But if I was writing a botnet click fraud machine, it’d probably be using long keywords gleaned from web sites that it visited. Can this be identified though? If all else were constant, you could measure changing conversion rates.

If there were declining conversion rates, would that indicate fraud or increased competition? Very hard to say. PPC is a substitute for the real measures which revolve around ROI and profit. I think there are some techniques to identify the bots and the humans, but they will be pretty subtle and usable only by the largest advertisers. Actually, I’ve just thought of some honeypots and traps… Hmm. I’m not telling, but I may try to set this up on a larger account to see if I can detect anything. I’ll need to negotiate with their content management team, so it’ll be a while…

7/ AdWords Arbitrage - Erm, no. The arbitrage exists because it can. It’s not a consequence of advertisers looking at long tail keywords or searchers being forced to longer and longer searches. It’s an intrinsic property. If Arbitrage isn’t controlled, it will exist. Some advertisers will pay tens of dollars to get a high quality client connected but there are also cheaper products that the same consumer will use. If some arbitrager is allowed to put a high cost advert on a low cost site, even if it isn’t relevant (think cloaking), then my client gets low quality clicks and an arbitrager has benefited - not the advertiser and not the search user. Google is actually doing a tolerable job of destroying the arbitrage market; see the “Google Slap” and people claiming “the death of internet marketing” and other such, mostly “affiliate” a.k.a. “get rich quick on the internet” operations.

Google is reducing arbitrage through the raised MinCPC on landing page Quality Score failures. But, is the raised MinCPC doing enough? I have clients for whom $5 would be a bargain basement price for a click. They apparently still suffer from arbitrage, because it is worth doing for those products.

8/ Arbitrage does encourage multiple system use. This is why Google should be looking at using MinCPC Quality Scores that take into account the use of all affiliate advertising on a landing page, not just AdSense. I have no evidence that Google does or does not look at Yahoo!Publisher Network presence. There may be some interesting law to be made either way… Sticky area… probably why they don’t discuss it.

OTOH, an advertiser should be using multiple systems anyway. Searchers tend to stick with a search engine they know. MSN users use MSN Live, Yahoo lovers use Yahoo, and Ask users keep using Ask. So if you don’t use PPC systems that reach the users, why on earth not? General rule in marketing - Pareto - is to use the systems that get to 80% of the audience. If that means that you use AdWords, MSN and Yahoo, then that’s what you should use.

9/ Click fraud bots do have an evidence trail. But it is in the form of the instructions to the botnet. That means doing packet analysis on a wide scale. Not sure how to make this happen. It’s certainly beyond my competence to fund. :)

10/ The Cult Of Secrecy. Agreed. I’m going to expand and expound on this at the end.

11/ Paying a publisher… So many points here. It *is* possible to get the CTR and conversion rate from Google properties alone - that’s the “Google Search Pages”. You can duplicate the campaigns and adjust the bids to get the Google Search Network (third party publishers taking part in keyword search) and another campaign for “Content Network” - the AdSense publishers.

Note that even keyword search means paying a publisher, by default. Some keyword search is done on Google. A substantial fraction happens on AOL, Ask, SouthWestern Bell, MSN and other companies that have licensed advertising and search results from Google. There are even sites that do “fake search” offering pages that contain keyword search derived adverts that are shown on content filled pages and, IIRC, even some domain parks. Actually, I ought to check that, but I’m pretty sure that some domain parks like Sedoparking come up under Google Search Network, and not the Content Network.

Even keyword search can be subject to some fraud - publisher inspired. There’s an incentive to do so.

If you split out the traffic, then you can get Google’s estimate of invalid clicks on each. And that tells you where the problems are. Inevitably, it is the content network… where the incentives for malpractice are the highest.

In Keyword Search, anti-competitive click fraud is actually the major fear. If your competitors face the same landscape of increasing bids and decreasing conversion rates (the likely consequences of more advertisers and higher click fraud rates) then the rational response for all advertisers is to reduce bids to economic levels. As the click fraud doubles, the click cost should halve. Simple economics in a rational market…
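
The arithmetic behind that last claim can be worked through with made-up figures. The sketch below assumes an advertiser who bids the break-even value of a click, and a bot network that doubles total click volume without buying anything; the function name and all the numbers are invented for illustration.

```python
def break_even_cpc(sales, clicks, margin_per_sale):
    """Highest cost per click at which the campaign still breaks even."""
    return sales * margin_per_sale / clicks


# Made-up figures: 1,000 genuine clicks, 20 sales, 50 margin per sale.
before = break_even_cpc(sales=20, clicks=1000, margin_per_sale=50.0)

# A bot network doubles the clicks but buys nothing.
after = break_even_cpc(sales=20, clicks=2000, margin_per_sale=50.0)

spend_before = before * 1000  # total spend at the old bid and volume
spend_after = after * 2000    # total spend at the new bid and volume
```

Under these assumptions the rational bid halves and the total spend stays the same. The advertiser is no worse off; the difference is where the money ends up.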

An anti-competitive click is more damaging - and it can’t be easily identified by a single advertiser and Google can’t help either (no sharing of web server log files from the advertisers, except via Google Analytics). Identifying a simple minded (click and leave) human fraudster is fairly hard - as in “not easy for most small businesses”. They can be identified, if you had the stats that showed a particular machine or network of machines clicked on a site with an unusually low purchase rate compared with the other users.
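
To make “unusually low purchase rate” concrete, here is a minimal sketch using a one-sided binomial test. It assumes you already have clicks and purchases grouped by source (an IP block, say) and a baseline conversion rate for everyone else; the grouping, the 1% threshold and the function names are assumptions for the example, not a description of any real detection system.

```python
from math import comb


def prob_at_most(purchases, clicks, baseline_rate):
    """P(X <= purchases) for X ~ Binomial(clicks, baseline_rate)."""
    return sum(
        comb(clicks, k) * baseline_rate**k * (1 - baseline_rate) ** (clicks - k)
        for k in range(purchases + 1)
    )


def suspicious_sources(stats, baseline_rate, threshold=0.01):
    """Return sources whose purchase count is improbably low.

    stats maps a source label to a (clicks, purchases) pair.
    """
    return sorted(
        source
        for source, (clicks, purchases) in stats.items()
        if prob_at_most(purchases, clicks, baseline_rate) < threshold
    )
```

With a 2% baseline, a source with 300 clicks and no purchases is flagged, but a source with 50 clicks and no purchases is not: there simply isn't enough data to tell bad luck from bad intent. That is why the volume of data available to a single small advertiser matters so much.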

That means a hypothetical click fraud detection system can’t be an advertiser and can’t be Google. Advertisers are often unwilling to let Google see both the advertising budget and the return on that spend. Google is unwilling to share data with, well, anyone, unless it is a list of web sites. Whoever did the analysis would need the trust of Google (that the data wasn’t shared with advertisers) and of the advertisers (that the data wouldn’t be shared with Google or other advertisers). Again, beyond my competence to organise - probably needs to be international in scope, as the fraud is clearly international.

12/ Uh, can’t very well argue with that, can I. :) Thanks for the attribution!

How Does Google View The World?

My feeling is that Google is largely run by two groups of people, neither of whom love advertisers.

There’s a group of techies, often ethical purists, who want totally clean organic search lists and who do worry about morality. They don’t like advertisers because advertising is part of that nasty commercial world. I empathise with these guys. I used to be an academic and it was very scary turning from academia to commerce…

Then there’s a group of financiers. Information is a weapon. There’s a terrific paper that I read years ago, and can’t find again, about the free market in the presence of asymmetries in information. Basically, the party with the information wins. So Google doesn’t hand out information. Information is power and money.

This causes problems. I cannot easily identify a whole bunch of sites that probably represent poor value for my clients. Google offers a mechanism to exclude sites that you don’t want to carry your adverts. The only way that the information is carried is in the search users’ web browser, passed as the “referer_info” field - if the web browser passes that on. Without that field, I can’t tell the publisher. With that field, I still have a low estimate of safety in identifying the publisher, because the field can be faked (e.g. how about poisoning the field with fake ID so I try to penalise a site that I should be on?). I want a tag in my destination URL so that I can identify the publisher and the site unambiguously.
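
As a sketch of what I am asking for, suppose the advert system appended a publisher tag to the destination URL. The `pub` parameter below is purely hypothetical, as are the function name and example URLs; the point is only that a tag set by the advert system could be trusted, whereas the browser-supplied referrer is a fallback that can be missing or faked.

```python
from urllib.parse import urlparse, parse_qs


def publisher_from_request(landing_url, referrer=None, tag="pub"):
    """Return (publisher, trusted) for one advert click.

    `tag` is a hypothetical query parameter that the advert system
    would append to the destination URL; it is an assumption for this
    example, not an existing AdWords feature.
    """
    params = parse_qs(urlparse(landing_url).query)
    if tag in params:
        return params[tag][0], True
    if referrer:
        host = urlparse(referrer).hostname
        if host:
            return host, False  # browser-supplied, so it can be faked
    return None, False
```

For example, `publisher_from_request("http://example.com/angels?pub=site123")` identifies the publisher with confidence, while a request with only a referrer gives a host name that should be treated with suspicion.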

There’s no point in offering me a facility to reject sites, if I can’t identify sites I don’t like. Even more, if I can identify the publisher, I might want to pre-emptively reject that publisher before seeing any more clicks from low value sites. I can’t exclude a publisher. If Google moves a publisher into the Search Network, I can’t reject them without rejecting all Search Network publishers.

Now, the point of Google is to be an “honest broker” offering an advertiser a quality channel to publishers and offering to publishers that adverts will meet some minimal standards. If advertisers lose confidence in the publisher network, then the asymmetries that work for Google now work against them. Hiding information about publisher quality makes Google an untrustworthy player and decreases the confidence that advertisers have in bids bringing real users.

Hiding information about simple stuff, like whether self-identified spiders from well known IP addresses are counted as billed clicks or invalid clicks, doesn’t make Google look like a clever organisation hiding information from sneaky fraudsters. It makes Google look as though the advertiser is treated as an adversary, and that Google has something to hide.

Google might not be hiding malpractice. The point is that without oversight from a trustworthy third party, or insight from data shared between advertiser and Google, there is no way to know whether Google is hiding something.

Fundamentally, the issue is one of marketing.

Perception is reality

Google has pretended that the Content Network offers a fair advertising forum. It doesn’t tell advertisers that the majority of the budget can go to the content network. The promo pull on the AdWords login page is all about keyword search. It doesn’t say, “shown on thousands of sites built by people who want to get rich quick, oh and a few sites built by genuine enthusiasts who have no other monetisation”.

That is where the primary fraud, the misrepresentation of the channel, happens. And that is what users see.

Absence of evidence is not evidence of absence

I’m one of the top contributors to the AdWords Help Forum. One of the most common refrains over there is that “Google is a scam”. On questioning, it often turns out that the advertiser has spent the majority of their budget on Content Match and achieved poor results. I believe (as a result of experiments) that the quality of the Content network is controllable and that decent results can be achieved. But not by using AdWords in the default set up that Google gives you, and not without additional information that Google does not publish.

The result is that advertisers feel ripped off. That Google itself is the fraudster.

IMO, the pursuit of wealth has blinded Google’s ethicists to the treatment of advertisers and condemned many small businesses to wasting time and money. SMEs who lose funds to badly targeted content network clicks will get no sympathy from Google, because the clicks are valid by Google’s definition - they are from living breathing users who just wanted to see what the advert was about. There was no intent to buy… because the site was poorly targeted, and that fact was not explained to the advertiser; the sign up process never mentioned it in any way that makes sense.

That’s the big Google fraud. Pretending that the content network offers as high a return as the search network, and then Google usurping the advertiser’s right to choose how to apportion the clicks.

I have done some minor work on click fraud levels. It appears generally low, or of types that can’t be detected by a relatively small agency. We only affect about $2M of spend per annum - a back of the envelope calculation suggests that we’d need to be about 100 times bigger to stand any chance of detecting sophisticated fraud without Google’s help.

I have a lot more evidence in the AdWords Help Forum of advertisers that have wasted substantial sums, for them, on advertising that is predictably of low sales value. I know it, before I even look at the advertisers account. Google should know it. If they don’t, well, that’s real news.

Updates

2007/04/10

We now have evidence that Google has improved the quality of content match. We have some clients for whom we have set up content match only campaigns. These are now achieving low volumes of sales at ROIs that match or exceed the ROI from long-established keyword search. Similar campaigns, run around nine months ago, were failing to achieve any sales in similar (not identical) conditions.

We infer that Google has taken action to improve the quality of content match. We do not know what Google has done to improve quality. We need to do more analysis before we can understand why content match results are now vastly improved for these clients.

Note that the volume of impressions, and sales, is quite low. Strictly speaking, it would fail our standards for management time expended to achieve the sales. However, it is suggestive that we could construct more and more effective content matched campaigns.

2007/04/11

Google have offered a paper analysing the behaviour of a clickbot, found in the AdWords Blog.

"Twelve ways, plus a bit, less a few." was published on December 15th, 2006 and is listed in google, intent, adwords, click fraud, yahoo!, content match.

Twelve ways, plus a bit, less a few.: 4 Comments

  1. MikeOK wrote,

    Hi Jeremy

    Thanks for taking such a close look at my click fraud article. The focus of the article was to show some circumstantial evidence for the existence of a zombie click fraud network and Google knowing about it. Even you point out some areas where Google is less than honest about the AdWords program. The real argument here lies in ROI. I agree that if a customer is getting a decent ROI, you don’t need to worry about how many of the unconverted clicks are fraud. In the case of a zombie network, you cannot tell whether an unconverted click came from a human or a zombie. This type of fraud potentially has hundreds of thousands of slaves doing the work. As you point out, you would need to affect close to $200 million per year to see fraud without Google’s assistance. I could live with unconverted human views since it has everything to do with me and the site. The zombie clicks however are outright theft.

    When you can spread out the fraud over 1,000’s of machines and over multiple fraudulent AdSense accounts it makes it extremely hard to detect. My original zombie network article outlines how to make a semi-legit business to hide a functioning zombie network.

    Since you are so involved with AdWords, can you tell me of the largest AdSense account that Google has suspended? In my research, I have only found very small time accounts that have got banned. You would think that the large ones would have more to complain about if banned.

    Jeremy, do you think a zombie network exists or can be created? You mentioned how much you examine web server log files. Can you tell me when you consider a click through legit? Is it viewing multiple pages on the AdWord customer account? The length of time spent viewing?? These things are easily programmed. My own spider takes a web page, takes out the links, parses the text, waits an acceptable period of time and does it again by following another link. What makes that any different than what a human does?? By setting the zombie to view x number of pages on a site, leaving a random amount of time between clicks, could you tell that it was fraudulent? Would it even raise a flag? With the number of people trying to game Google on a semi-legit basis, why would a click fraud network not be created? We have already seen multiple accounts of human click fraud networks and these are no different. Only instead of a couple hundred workers, you now have thousands working without paying them. What a bonus.

    I feel people should remember that Google is not the first to try this type of advertising. Others who were successful for a short period of time ended up losing advertiser confidence. Google is close to a tipping point. As the amount of circumstantial evidence grows, Google’s goose might just start laying ordinary eggs instead of the current gold.

  2. Jeremy Chatfield wrote,

    Hi Mike,

    My degree is in Cybernetics Science and I’ve fiddled with my own web crawlers over the years, as well as worrying about the Turing Test… I’m pretty sure that I can’t detect a sufficiently well crafted bot. I don’t want to detail all the things that I’d think about, in a public article, though I suspect that anyone well versed in web analytics would have the knowledge to emulate a real human’s behaviour.

    If click fraud is fairly balanced - that is, it doesn’t selectively click on one advertiser, it isn’t really a problem for advertisers. It is a social, legal and criminal or even terrorist problem, but it is not an advertising problem. At least, not for rational advertisers.

    Imagine that someone turned on a promiscuous high volume robot network tomorrow, that doubles click volume and impression volume. By Tuesday or maybe even Monday, our clients’ bids would have halved, because the conversion metrics would be half what they were today. Eventually, in the steady state, the budget would be the same, and the Average CPC will be half what it is now. The difference would be that a big chunk of money would be in the hands of fraudsters.

    The way that you’d tell is using signal analysis. The number of data points you need depends on assumptions about the number of bots in the net and the number of impressions and clicks that you can gather. More particularly, it depends on identifying the clickers that have an unusually low purchase rate… And that means access to advertisers sales data. Something that suspicion of Google’s motives is unlikely to lead to. So raising suspicion of Google actually increases the chances of successful fraud… maybe. I’ll have to think about that a bit more.

    Right now, I have to leave for a Christmas concert… More later. Thanks for the response!

  3. CPCcurmudgeon wrote,

    MikeOK,

    You are correct in your hypotheses about botnets. In fact, there is a considerable amount of information available on the botnet mailing list and from Gadi Evron at SecuriTeam.

    Among other things, what puzzled me is how AdWords and AdSense were ever released without considerations as to the degree of click fraud possible. It seemed obvious to me that charging per click (or impression, for that matter) with minimal (if any) checks would make click fraud trivially easy. Others in the Internet technical community have made similar comments, such as Bruce Schneier and Lauren Weinstein.

    [comment edited by Jeremy Chatfield to embed the links]

  4. Automating Content Network Management - Part 1 | Merjis Internet Marketing Blog wrote,

    […] Managing keyword search and content match in a single campaign is fraught with problems - detailed in other articles, here and elsewhere. As long as three years ago, we’d recognised the issues and were already optimising separated Content Network campaigns. The problem that we were trying to resolve was that, even with separated content match campaigns, we had great difficulty in achieving a good volume of conversions at a competitive ROI. Every time we drove up the volume, the ROI got worse. Every time we drove to a great ROI, we lost volume. […]
