Clickstream Raises Questions About Online Privacy
On the first of the month, Search Engine Land’s Danny Sullivan alerted the SEO community to apparent accusations from Google that Bing was copying its search results. Although the accusations caused many to question the credibility of Bing, the explanation the public received for the similarities in results may be even more frightening for online privacy advocates.
The copying allegations surfaced after a successful sting operation, in which Google artificially altered its algorithm to display irrelevant results for a select number of extremely obscure search queries. Two weeks after Google modified its algorithm, Bing started displaying the same results for those obscure search queries. To read more about the sting, check out the original blog post that broke the news on Search Engine Land.
Now, several weeks after the initial allegations, Bing has come out with its side of the story, revealing not only its frustration with the accusations but also actual insight into a variable in its algorithm that may involve monitoring your online browsing behavior.
As online marketers, we are all aware that the search engines’ primary goal is to serve the most authoritative and relevant results for any given search query. We are aware that there are numerous on-page, off-page, URL, and more recently social factors that need to be addressed to ensure visibility in the search engines. But what everyone may not realize is that the search engines are possibly utilizing our own online browsing behavior as a variable in their ranking algorithms.
In reference to the accusation that Bing was copying Google, Bing’s director Stefan Weitz stated,
“The word ‘copy’ has a very specific connotation, and it’s wrong. We get the clickstream. We’re going to see it. We may choose to show it or not.”
What is the Clickstream?
To truly understand Weitz’s response, and Bing’s explanation for the apparent “copying,” we need to explore what the heck the “Clickstream” is. To make it as simple as possible, the clickstream is essentially a roadmap of your online browsing history.
How Does the Clickstream Work?
Bing vehemently denies that it copies Google’s results, but it openly admits that its algorithm considers “clickstream data,” which may include activity from Google. In other words, Bing isn’t specifically monitoring what happens at Google; it is monitoring what people do as they travel across the entire web, which just happens to include Google.
For individuals who have the Bing toolbar installed or who have the IE Suggested Sites feature switched on, Bing is able to see what they are browsing.
If you visit eBay with either of these features activated, Bing may see that. If you click a result after performing a search on eBay, Bing may see that search. If you then click an external link to the manufacturer’s website, Bing may see that. And finally, if you were to search on Google for reviews of that product, Bing may see that.
So how does Bing actually extract information from the clickstream? It all comes down to URL factors, which we explained in our URL Boot Camp from February 2nd. Every search performed on Google carries the specific search query in its URL. A search for “The Shadow,” for example, leads to a URL containing both “the” and “shadow.”
After performing that search, an individual will likely click one of the results. This action can also be observed using the same method. By analyzing the URLs, Bing is able to determine which results users think are relevant.
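To make that concrete, here is a rough Python sketch of how a query and the result clicked afterward could be pulled out of a list of visited URLs. This is only an illustration, not Bing's actual method: the function names are made up for this post, and it assumes the search terms sit in a `q` parameter on a `/search` path, which is how Google search URLs commonly look.

```python
from urllib.parse import urlparse, parse_qs


def search_query(url):
    """Return the search terms from a URL's `q` parameter, or None.

    Assumption: search pages live under a /search path and carry
    the query in `q`, as Google search URLs commonly do.
    """
    parsed = urlparse(url)
    if not parsed.path.startswith("/search"):
        return None
    terms = parse_qs(parsed.query).get("q")
    return terms[0] if terms else None


def query_click_pairs(visits):
    """Pair each search with the next non-search page visited.

    `visits` is an ordered list of URLs, standing in for the
    clickstream (hypothetical input format).
    """
    pairs = []
    pending = None
    for url in visits:
        query = search_query(url)
        if query is not None:
            pending = query              # remember the latest search
        elif pending is not None:
            pairs.append((pending, url))  # first page visited after it
            pending = None
    return pairs


visits = [
    "https://www.google.com/search?q=the+shadow",
    "https://example.com/the-shadow-review",
    "https://example.org/unrelated",
]
pairs = query_click_pairs(visits)
```

With the sample visits above, the search for "the shadow" gets paired with the first page visited after it, and the unrelated page that follows is ignored. Collect enough of those pairs across many users and you have a picture of which pages people pick for which searches.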
Danny Sullivan explains Bing’s rationale for the clickstream rather eloquently in this quote,
“… Bing has crowdsourced the most popular pages linked to search terms from a wide range of search engines…This search signal, which isn’t purely from Google but does contain many Google searches (because Google’s is searched on so much), is then added into the wide range of other signals that Bing uses to rank pages.”
For most “head searches,” the impact of the clickstream is negligible, but for long-tail keywords (such as the obscure search queries Google created) the clickstream may be more meaningful in determining rank. This would explain why a small percentage of the obscure search queries resulted in similar results – there just wasn’t enough information on any of the other variables in their algorithm to offset the impact of the clickstream.
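A toy example shows why the same signal can be negligible for one query and decisive for another. The sketch below is purely hypothetical: the signal names, the weights, and the scores are all invented for illustration, and nobody outside Bing knows how its signals are actually combined.

```python
# Invented weights: clickstream is deliberately the smallest signal.
WEIGHTS = {"on_page": 0.4, "links": 0.5, "clickstream": 0.1}


def blended_score(signals, weights=WEIGHTS):
    """Weighted sum of a page's signal values (hypothetical model)."""
    return sum(weights[name] * value for name, value in signals.items())


# Head query: plenty of on-page and link data, so those signals decide.
head_a = {"on_page": 0.9, "links": 0.8, "clickstream": 0.0}
head_b = {"on_page": 0.3, "links": 0.2, "clickstream": 1.0}

# Obscure long-tail query: no other data exists, so clickstream decides.
tail_a = {"on_page": 0.0, "links": 0.0, "clickstream": 0.0}
tail_b = {"on_page": 0.0, "links": 0.0, "clickstream": 1.0}

head_winner = "a" if blended_score(head_a) > blended_score(head_b) else "b"
tail_winner = "a" if blended_score(tail_a) > blended_score(tail_b) else "b"
```

For the head query, page A wins on its other signals even though page B has all the clickstream activity. For the obscure query, every other signal is zero, so the small clickstream weight is the only thing left to break the tie and page B comes out on top.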
It’s really quite genius, but is it too invasive? We naturally assume that the search engines track our behaviors when interacting with their product, but when the monitoring extends past their figurative property lines, is it still acceptable?
I don’t really have the answer to that question, but to me, it seems a little “Big Brother-ish.” Being an online marketer, I naturally have a little more acceptance of these issues than the general public, and I am honestly a bit surprised that privacy advocates have not cried bloody murder after Bing’s explanation.
Think you are safe because you don’t use Internet Explorer or the Bing toolbar? I wouldn’t be so sure. Google offers their own Google Toolbar, and although they have historically denied any usage of toolbar data to influence the rankings, the discovery that site speed measurements done by the toolbar DO play a role in rankings raises questions about the truth of their statements. I’d be willing to bet that Google knows more about me than Bing and Facebook combined.
Regardless of your stance on online privacy, it will be interesting to see what comes of this mess. I think both parties – Google and Bing – have valid arguments. Ultimately, the success of either party will be determined by which product can produce the most relevant results. And for that, only time will tell.