Reducing the search engine ranking of competitors, known as “Google Bowling”, is getting some air time. A brief summary article “8 Ways a Competitor Can Sabotage Your Site” describes some techniques that can damage rank and steal or reduce traffic to a site. Of the eight techniques described, seven depend on weak identity online, and three aren’t really Google Bowling…
Update
2007/09/23 This recent TechCrunch article points to Google providing access to social information. Now, I do know and respect Patrick Chanezon, from his days as the AdWords API evangelist. And I have suggested to him, in email, that social networks need open APIs… And a lot of the issues below can be tackled when you have a citation/trust network…
Who Is The Annoying Spammer?
Precisely. Having a weak online identity means that your identity is private, but that doesn’t make it secure. A culture of strong privacy that ignores the authentication of identity makes it easier for those with poor intent to adopt an identity - even yours.
Getting Your Domain Banned
Identity, again. Most social media don’t even publish an IP Address, which is a pretty weak authentication mechanism, but stronger than most identity proving systems on most Social Networks. In many cases, email addresses for accounts are concealed, for reasons of privacy or to reduce spam. So there’s simply no way other than usage of a name to link a real identity to an online identity. That’s obviously going to be a problem… and since an identity attack can, and probably will be, extraterritorial, most police investigations will falter at national boundaries.
Spammy Link Buying
If identity authentication for links was required, then linkspam attacks would become more difficult. However, this would require some interesting work for search engines. If an SE only believed links from sites with a credible identity chain, treated social networking sites as a different type of source, and only used social media with stronger forms of identity, the landscape would change overnight - because much link weight is ultimately derived at the moment from poorly authenticated sources. I suspect that Google *could not* make this change - not without phasing in the new weightings. The uproar on massive position changes would probably be, hmm, boisterous.
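One way to picture "phasing in the new weightings" is as a blend of today's link weight and an identity-adjusted weight. This is only a sketch: the `identity_strength` score, the `phase` parameter and the function name are my own assumptions for illustration, not anything a search engine is known to use.

```python
def phased_link_weight(current_weight: float,
                       identity_strength: float,
                       phase: float) -> float:
    """Blend the current link weight with an identity-adjusted weight.

    identity_strength: hypothetical score in [0, 1]; 0 means the linking
        source has no authenticated identity at all.
    phase: rollout progress in [0, 1]; 0 keeps the old weighting,
        1 uses the identity-based weighting only.
    """
    if not 0.0 <= identity_strength <= 1.0:
        raise ValueError("identity_strength must be between 0 and 1")
    if not 0.0 <= phase <= 1.0:
        raise ValueError("phase must be between 0 and 1")

    # Weight the link would carry if only authenticated identity counted.
    identity_weight = current_weight * identity_strength

    # Linear blend, so rankings drift gradually instead of jumping.
    return (1.0 - phase) * current_weight + phase * identity_weight
```

With `phase` at 0.5, a link from an unauthenticated source keeps half its weight, while a link from a fully authenticated source is unchanged.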
Duplicate Content
This is a little harder, as it would require authentication of both identity and time of posting. But those are basic needs in any polity. While postings have an economic value, they effectively become a currency, and a currency has needs if it is to be recognised. At present, SEs neither recognise nor require proof of time of posting and proof of the identity of an owner. So not just duplicate content but also site scraping of popular content are common practices.
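If both proofs existed, picking the original among duplicates would be simple. The sketch below assumes a hypothetical `Posting` record in which some outside mechanism has already verified the owner and the posting time; how that verification would be done is exactly the missing piece, and is not shown.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional, Sequence


@dataclass(frozen=True)
class Posting:
    url: str
    owner_verified: bool                    # assumed: identity proven elsewhere
    verified_posted_at: Optional[datetime]  # assumed: proven time, or None


def likely_original(postings: Sequence[Posting]) -> Optional[Posting]:
    """Return the earliest posting that has both proofs, or None."""
    proven = [
        p for p in postings
        if p.owner_verified and p.verified_posted_at is not None
    ]
    if not proven:
        return None  # nothing is provable, so nothing can be preferred
    return min(proven, key=lambda p: p.verified_posted_at)
```

A scraped copy with no proof of ownership or time is simply never eligible, however early it claims to have been posted.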
301/302 Hijacking
Even harder - but also based on identity spoofing and the trusting nature of current SEs. For this attack to work, the SE must assume that only well intentioned people use 302s. Manifestly not true. A 302 should only be believed if the identity of the link is related to the identity of the target of the link… and the net offers few ways to make that happen with any proof of identity, other than domain records.
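The rule suggested above can be sketched as a small check. Everything here is an assumption for illustration: `owners` is a hypothetical mapping from hostname to a verified owner (for example, derived from domain records), and `url_to_credit` is an invented name for the decision a crawler would make.

```python
from typing import Mapping
from urllib.parse import urlsplit


def same_verified_owner(source_url: str, target_url: str,
                        owners: Mapping[str, str]) -> bool:
    """True only if both hosts map to the same known owner."""
    source_owner = owners.get(urlsplit(source_url).hostname or "")
    target_owner = owners.get(urlsplit(target_url).hostname or "")
    return source_owner is not None and source_owner == target_owner


def url_to_credit(source_url: str, target_url: str,
                  owners: Mapping[str, str]) -> str:
    """Decide which URL gets credit for content reached via a 302.

    A temporary redirect asks the crawler to keep the source URL. Here
    that request is honoured only when source and target share a
    verified owner; otherwise the content is credited to the target.
    """
    if same_verified_owner(source_url, target_url, owners):
        return source_url
    return target_url
```

Under this rule a hijacker's 302 pointing at your page earns the hijacker nothing, because the two hosts have no shared owner.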
Denial Of Service (DOS & DDoS) Attacks
Superficially, nothing to do with identity. Except… the reason that people can get away with these attacks is the weak association of internet activity and identity. If the source of these attacks was traceable and identifiable and an attack like this was recognised as internationally legally significant, then the commercial and social impact could be reduced. I rate the chances of this being addressed as very low, for reasons I may go into elsewhen.
Kicked Out Of AdSense
Weak notions of identity authentication, and an assumption of identity by Google, lead to this problem. Note that the primary identity authentication mechanisms used by Google are IP Address and Cookies. Cookies can be deleted and IP Addresses masked by using Botnets or Proxies. Failure to prevent fraudulent clicks would put Google under threat from annoyed advertisers. So Google is somewhat in a quandary without better notions of identity.
Click Fraud
Identity strikes. If you aren’t who you say you are, why are you allowed to spend an advertiser’s money? There are two main motivations (direct gain, or anti-competitive activity) for three types of click fraud:
- Clicking on adverts on content match or other publisher sites to generate revenue for the site
- Clicking on adverts, anywhere, to burn up budget and reduce the flow of real users (a plausible attack on low budget businesses)
- Not clicking on adverts, in order to reduce CTR and thereby increase costs - clicking instead on other less relevant (and hence lower converting) adverts or your own adverts to skew the CTR history.
Defending against these comes down to identity and leads to some interesting thinking about the value of a click.
AdWords And Click Value
The question is… if your identity is unknown, then why show these adverts? In fact, the more strongly your identity is known, then the more that I want to pay Google to show you an advert. Just as Google AdWords uses “Smart Pricing” to adjust the value of adverts on the Content Network, I think there’s an argument for using variable pricing on the keyword search network. When the user clicks, if they have a history of activity with Google, including conversion tracking records from authenticated businesses, then that is a high quality click. If the click is from a cookieless browser with no history, then I want to pay less for that click - as it may be fraudulent.
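As a rough sketch of what variable pricing might look like: the fields, thresholds and weights below are arbitrary assumptions chosen for illustration, and this is not a description of how AdWords or Smart Pricing actually works.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ClickContext:
    has_cookie: bool           # browser presented a known cookie
    prior_sessions: int        # history of activity with the network
    verified_conversions: int  # conversions tracked at authenticated businesses


def click_quality(ctx: ClickContext) -> float:
    """Hypothetical quality score between 0.1 and 1.0."""
    if not ctx.has_cookie:
        return 0.1  # cookieless, no history: may be fraudulent
    # Cap each signal so a long history cannot push the score past 1.0.
    history = min(max(ctx.prior_sessions, 0), 20) / 20
    conversions = min(max(ctx.verified_conversions, 0), 5) / 5
    return 0.3 + 0.3 * history + 0.4 * conversions


def price_for_click(max_cpc: float, ctx: ClickContext) -> float:
    """Charge a fraction of the advertiser's maximum bid."""
    return max_cpc * click_quality(ctx)
```

With a maximum bid of 1.00, a click from a cookieless browser would cost about 0.10, while a click from a user with a long history and verified conversions would cost close to the full bid.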
Thinking about identity and adverts may actually yield an improvement in the internet for most users. If you want anonymity, I suspect that you also want an advert-free experience. While there may be people who want to browse anonymously and still have financial interactions, the businesses dealing with that market will have their own risk assessments and ways of handling the risk of dealing with unknown and unknowable identities. As an advertiser or agent for advertisers, I would want to know that I’m dealing with a deliberately anonymised user - because the risk is different. Mostly, I’d want to pay less or even not have my advert shown.
Rather than behavioural measures, Google probably could and should be thinking about how to strengthen privacy while maintaining or improving identity. If I could send adverts, at a premium, to known good buyers, rather than unknown and possibly fraudulent clickers with no history… I could probably convince most of my clients to switch to that network.
This is a scaling game. Only a large enough internet entity, with a measurable fraction of the internet’s traffic, could seriously offer authenticated advertising delivery.
Will it come? Probably.
Look at the offline world and the role of organisations like the Audit Bureau of Circulations and their electronic version (ABC Electronic). These guys evolved to assure advertisers that magazine and newspaper circulation figures could be trusted. There is no equivalent, yet, in the online world to assure advertisers that clicks are delivered to real users rather than bots, or to users interested in seeing adverts as part of their browsing experience.
User identity, or rather privacy, has been a major concern on the internet. Early internet adopters certainly didn’t like strong authentication systems. They get in the way. And with early usage being largely focused on offering information from individuals who were not looking for financial gain, this was just fine. You even had characters significant to the early development of the philosophy of the internet (peripherally in this example, but it is well documented), such as Richard Stallman, operating without a password. The assumption of good intent was built in. It was the basis of early search engines.
I was using the precursor of the Internet - the modem-connected world of UUCP transfers, back in, ohh, 1978. The idea of linking identity more closely than the originating email simply wasn’t part of the thinking. Machines usually connected to other, physically close machines, and you knew the identities of the administrators. That didn’t prevent email from “kremvax” - the spoofed Kremlin VAX - from appearing, as I recall. Or the meeting for the Guru of Unix Meeting of the Minds sent from /dev/null (it’s a Unix joke) that didn’t announce the place of the meeting - because a true Unix Guru would just know the location… And that model of assuming you knew the links was true of the early Internet - only a small band of researchers used it. And the larger research network was still trustworthy - because they depended on reputation management. And that was even true at the point at which Google came into being, based on a network of largely trusted machines with users who were publishing correct material in academic exercises of respect and trust.
Even after the first spam attack on Usenet Newsgroups, there was still no call to authenticate users. And this attitude has carried over beyond the point at which it was valid. Back then, the daily value of business over the internet was probably measurable in dollars - the communications costs exceeded the transaction value, by (probably) orders of magnitude. From 1994, or so, the transaction value on the web has probably exceeded the costs of the communications, and the ratio is increasing. Just like a material phase change, the pressure/temperature of higher economic activity will cause a phase change - a difference in the way in which things happen.
Now that money flows online, the assumption of good intent can no longer dominate thinking. There has to be a default style of thinking in which you assume that someone wants to grab the money. Free organic search listings have an economic value. So limply waving your hand around and saying that authenticating data isn’t needed will become a flame-dead idea, in due course. Now? It’s controversial and I have little doubt that this posting will be derided by some readers. In ten years? This posting will be regarded as so blindingly obviously a statement of truth that it’ll be ignored ;)
Summary
Ignoring identity online was permissible and was regarded as desirable by early internet adopters. As the value of economic activity online increases, it becomes less and less acceptable to both businesses and users that full anonymity and privacy are completely assured - because that leads to abuses of identity. However, current mechanisms to authenticate identity are weak, rarely used and are unused in the area of advertising. Only at purchase time is some basic notion of identity established. This would not be accepted offline and I suspect that it will not continue to be acceptable that random strangers can defame you by stealing your identity, online.
What To Do?
Make sure your social network connections are reputable - don’t allow random strangers to link with your profile.
Search for your own name and online identities and verify that you aren’t in a bad neighbourhood. If you are, you have little effective recourse under law and almost no likelihood of correction. Your best bet is to ask site owners to take down material, but depending on the site and location that may be slow.
Look for Search Engines and advertising systems that offer authenticated users and links. While this is a niche, the economic value and assurance that you reach real users who aren’t indulging in click fraud, should be a significant economic motivator and column inch grabbing business idea. Incumbents would find it hard to react, with the installed base and assured revenue streams that they now have. So advertisers defecting would find better ROAS, albeit at initially low volumes. I expect that a business like AOL (large user volumes, already operating search, some kind of subscription arrangement in place with many users - which means some fraction of the user base is authenticated as real and with a purchase history on at least one medium) might make the jump, but it’s more likely to be a startup that is then acquired, I suspect. Ideas like this are… disruptive. Businesses don’t like to destroy themselves, even if it means survival. :)

Kristine Buenavista wrote,
This blog post is really worth-reading. I have felt more educated about Google. It’s a kind of learning that I don’t get too much from other readings so far.
Thanks for sharing. I’d bookmarked you.
Link | January 4th, 2008 at 2:07 am
Matt | Advertising Course Internet Marketing Online wrote,
Great post, thanks for opening up my eyes to the kinds of potential problems are out there! Bookmarked!
Link | February 27th, 2008 at 8:21 am
Brisbane SEO Consultant wrote,
I was under the impression that Google mostly ignores stuff that your competitors can do to your site like spammy link buying and links from bad neighbourhoods.
Link | July 14th, 2008 at 1:02 am
Jeremy Chatfield wrote,
Hi - depends on the techniques that are being used. 301/302 hijacking isn’t strictly a Google problem but a DNS problem. Spammy link buying has been progressively frowned on - but that doesn’t mean that the idea of link buying is dead - it means a shift to using techniques that are less visible. Smarter Black Hat SEOs are already using signal analysis techniques that increasingly mimic natural user behaviour, but with bots to plant links. The more like real users the behaviour, the harder it is to tell.
This is essentially an operation of the oft-touted “Turing Test”, to determine whether a real human or a masquerading AI is on the other side of a conversation. When all you have to assess the other party is responses, at some point the messages from a sufficiently smart correspondent become indistinguishable from those of a real human. Hide your signal in the noise of normal looking transactions and then Google has been gamed - until the rules are changed.
If you can track my comment back to my birth certificate, you have a strong chain that this posting was made by me, and not a sufficiently well informed AI. :)
What I didn’t talk about - because this was a response to someone else’s article about Google Bowling - were the many other techniques that I’ve thought of, or learned from SEO friends.
Link | July 14th, 2008 at 8:49 pm