Understanding Search Engine Privacy and How to Prevent Snooping Part I

by Shanmuga

Search engines keep a record of every search ever made, in a way that can be traced to individuals. They retain users’ search data, anonymized or not, indefinitely. They place web cookies on users’ computers that make it possible to match search queries to a computer’s address, to the user’s name if he or she is logged in, and possibly to more, depending on how much information the user has shared with the particular search engine.

How do search engines collect data?

The major search engines, such as Google, Yahoo, Live, Ask and AOL, collect data that are personally identifiable or can be made personally identifiable. IP addresses, search query analysis, and persistent cookies that expire only after a ridiculously long period are the primary means by which user-submitted information is made personally identifiable.

Web cookies are small text files that a web site places in your web browser. They are used for user authentication, for remembering user preferences, and for tracking behavior across a web site.
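To make the session/persistent distinction concrete, here is a minimal Python sketch that classifies cookies from `Set-Cookie` header strings. It assumes the usual browser behavior that a cookie without an `Expires` or `Max-Age` attribute lasts only for the browser session. The function name, cookie names, values and lifetime are made up for illustration and are not taken from any real search engine.

```python
from http.cookies import SimpleCookie


def classify_cookies(set_cookie_headers):
    """Map each cookie name to 'session' or 'persistent'."""
    kinds = {}
    for header in set_cookie_headers:
        jar = SimpleCookie()
        jar.load(header)  # parses "name=value; Attr=...; ..." strings
        for name, morsel in jar.items():
            # A cookie with no Expires/Max-Age attribute is a session cookie.
            has_lifetime = bool(morsel["expires"] or morsel["max-age"])
            kinds[name] = "persistent" if has_lifetime else "session"
    return kinds


# Hypothetical headers: names, values and lifetime are illustrative only.
headers = [
    "sid=abc123; Path=/",
    "uid=xyz789; Max-Age=63072000; Path=/",  # 63,072,000 s = 730 days
]
print(classify_cookies(headers))
```

Only the persistent kind survives a browser restart, which is why it is the one that matters for long-term tracking.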

What do they do with search data?

The major search engines have long said that they need to retain data for a number of reasons, including:

  • to refine search results
  • to fight click fraud, Web spam, phishing, botnet attacks, denial-of-service attacks and worms
  • to deliver advertisements relevant to users’ interests
  • to respond to valid legal orders from law enforcement as they investigate and prosecute serious crimes like child exploitation
  • to comply with data retention legal obligations
  • to launch new services

Why should you worry?

Personally identifiable information can be used in a number of ways that were never imagined or sanctioned by the person who entered the search terms. Major search engines make search data available to law enforcement authorities in response to subpoenas. While nobody can have any qualms about that, there have also been reports that even divorce lawyers are subpoenaing search engines, looking for dirt on spouses at war with one another.

There is also the risk of accidental breaches. A couple of years ago, researchers from AOL released millions of search queries from more than 650,000 users. Although the personal names were replaced with numbers, reporters from the NY Times and others were able to piece together enough information to tie the search terms to several people. Some of the search queries revealed highly personal information, including financial data, social security numbers, medical conditions, and illegal and illicit activities.

With the ever-increasing number of services that search engines offer today, they are in a position, theoretically at least, to correlate your IP address with everything you do online through their services. For example, if your preferred search provider is Google and you also happen to use many of its other services, it is in a position to tie the search queries collected via Web Search, Blog Search, Book Search, University Search and so on to your email (GMail), the videos you prefer to watch (YouTube), the news you prefer to read (Google News), the images you prefer to view (Image Search), the topics you are interested in (Google Alerts), and even the words you misspell in queries (Google Suggest). Google search boxes and Google ads built into websites, including this one, can also record users’ behavior for Google.

Users creating an account with a search engine divulge more data to the firms, including search history. Some search engines also enrich personal data held on their users with information from third parties.

The data are then linked far more closely to you personally, and you have no control over them.

According to the Electronic Privacy Information Center, “[in] the United States, federal law does not provide uniform privacy protections for personal data submitted to search engines or for IP addresses. Some federal regulations (i.e. 45 C.F.R. § 164.514(b)(O)) treat IP addresses as “individually identifiable” information for specific purposes, but such treatment is not comprehensive.”

Whereas “The European Commission classifies IP addresses as personal data…The European Union Data Protection Directive requires search engines to “delete or irreversibly anonymise personal data once they no longer serve the specified and legitimate purpose” for which they were collected. Retention of personal data by search engines for more than six months is presumed to be unnecessary. Search engines that retain personal data for longer periods must “demonstrate comprehensively that it is strictly necessary for the service.” This requirement applies to IP address data, which virtually all search engines collect each time a user runs a search. The EU also imposes limits on the lifetime of search engines’ cookies – small computer files that can track users between multiple sessions and web sites. As a technical matter, every cookie expires eventually, and web sites can easily select the expiration dates for their cookies. EU guidelines prohibit search engines from setting expiration dates farther in the future than necessary to provide search services.”

The Article 29 Working Party’s report of April 4, 2008 set out a number of obligations for search engine firms, including:

  • Search engines should get informed consent from users if they correlate personal data across different services, such as desktop search;
  • Search engine providers must delete or anonymise (in an irreversible and efficient way) personal data once they are no longer necessary for the purpose for which they were collected;
  • Personal data should not be held by search engines for longer than six months;
  • In case search engine providers retain personal data longer than six months, they must demonstrate comprehensively that it is strictly necessary for the service;
  • It is not necessary to collect additional personal data from individual users in order to be able to perform the service of delivering search results and advertisements;
  • If search engine providers use cookies, their lifetime should be no longer than demonstrably necessary;
  • Search engine providers must give users clear and intelligible information about their identity and location and about the data they intend to collect, store, or transmit, as well as the purpose for which they are collected

I have made an attempt to summarize the data collection and retention patterns of major search engines from information available in the public domain.

Search Engine | No. of Cookies | Session | Persistent | Cookie Expiration            | Data Retention
Google        | 1              | 0       | 1          | 24 months                    | 9 months
Yahoo         | 1              | 0       | 1          | More than 28 years           | 13 months
Live          | 10             | 4       | 6          | 3 days to more than 12 years | 18 months
Ask           | 3              | 1       | 2          | 12 months                    | 12 months
AOL           | 5              | 2       | 3          | 13 months                    | 12 months


Google

Cookies – Going to google.com by typing the address in the browser address bar sets one persistent cookie that expires exactly 24 months later, to the date and time.

Data Retention – Google says that it will anonymize IP addresses on server logs after 9 months. More specifically “we can confirm that we will delete some of the bits in logged IP addresses (ie, the final octet) to make it less likely that an IP address can be associated with a specific computer or user. And while it is difficult to guarantee complete anonymisation, the network prefixes of IP addresses do not identify individual users. We will also obfuscate cookie IDs.”
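Dropping the final octet, as the quote describes, can be sketched in a few lines of Python. This is only an illustration of the idea for IPv4 addresses, not Google’s actual implementation, which is not public. The function name and the sample address (taken from a range reserved for documentation) are assumptions.

```python
import ipaddress


def drop_final_octet(ip: str) -> str:
    """Zero the last 8 bits of an IPv4 address, keeping the /24 prefix."""
    network = ipaddress.ip_network(f"{ip}/24", strict=False)
    return str(network.network_address)


print(drop_final_octet("203.0.113.42"))  # 203.0.113.0
```

All 256 addresses that share the same first three octets collapse to the same value, which is why the result is less likely to point at a single computer yet still reveals the network prefix.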

Yahoo.com

Cookies – Going to Yahoo search by typing search.yahoo.com in the browser address bar sets one persistent cookie expiring on June 3, 2037. That is a whopping 10,491 days, or 28 years, 8 months and 22 days, away. If you know the significance of such a seemingly random expiry date, please let us know in the comments.

Data Retention – From their privacy policy: For Yahoo! Search specifically, Yahoo! has implemented its search data retention policy, under which search logs are destroyed or anonymized after 13 months from the moment when the data was collected, except where: (i) users request to keep the information for a longer period or (ii) where Yahoo! is required to retain the information to comply with legal obligations (for example, for tax purposes or in connection with ongoing litigation). “It is anonymised after 13 months. We remove portions of the IP address and personally identifiable cookie IDs.”

Live.com

Cookies – Going to Live search by typing live.com in the browser address bar gets you a blast of ten session and persistent cookies: one from c.live.com, eight from live.com and one from msn.com. The persistent ones are a c.live.com cookie expiring after 3 days; one live.com cookie expiring after 3,650 days (10 years); two live.com cookies expiring on July 21, 2015 (2,503 days, or 6 years, 10 months and 9 days, away); and one live.com cookie and one msn.com cookie expiring on Jan 1, 2021 (4,494 days, or 12 years, 3 months and 20 days, away)!
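The day counts quoted for the Yahoo and Live cookies are simple date arithmetic. The Python sketch below reproduces them, assuming the cookies were observed on September 12, 2008. That date is not stated in the article; it is inferred here because it is the one consistent with the figures given. The dictionary labels and function name are likewise illustrative.

```python
from datetime import date

# Assumption: observation date inferred from the article's day counts.
OBSERVED = date(2008, 9, 12)

EXPIRY_DATES = {
    "Yahoo (search.yahoo.com)": date(2037, 6, 3),
    "Live (two live.com cookies)": date(2015, 7, 21),
    "Live (live.com and msn.com)": date(2021, 1, 1),
}


def days_until(expiry: date, observed: date = OBSERVED) -> int:
    """Number of days between the observation date and the expiry date."""
    return (expiry - observed).days


for label, expiry in EXPIRY_DATES.items():
    print(f"{label}: {days_until(expiry):,} days")
```

Under that assumption the script prints 10,491, 2,503 and 4,494 days, matching the numbers in the text.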

Data Retention – Microsoft says it will make data from Live Search queries anonymous after 18 months, unless individual users want them to be stored longer. According to the company, “we will permanently remove the entirety of the IP address and all other cross-session identifiers, such as cookie IDs, from the search terms.”

Ask.com

Cookies – Ask.com sets three cookies, two of them persistent and expiring exactly 1 year later, to the date and time.

Data Retention – Ask.com privacy policy states that “we disassociate your query text data from your IP address after a period of 18 months, except in limited circumstances, such as if requested to retain the information by law enforcement. When AskEraser is enabled, all information described above is removed from Ask.com servers within a number of hours.”

Aol.com

Cookies – search.aol.com drops five cookies, three of which are persistent and expire in 13 months.

Data Retention – AOL retains search requests for 13 months; after that, only aggregate search terms are retained.

These cookies were collected without logging in to an account, by visiting each website after typing its address directly into the address bar.

One thing to note about these cookie expiration dates is that the lifespan is automatically renewed whenever any of the particular search engine’s services is used during this period.
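This renewal behavior is often called a sliding expiration. A minimal Python sketch follows, assuming a fixed 730-day lifetime (roughly the 24 months observed for Google above) that is reset on every visit. The visit dates and names are invented for illustration, and real sites may renew their cookies differently.

```python
from datetime import date, timedelta

LIFETIME = timedelta(days=730)  # assumption: roughly 24 months


def expiry_after_visits(visits):
    """Return the cookie expiry date once every visit has reset the clock."""
    expiry = None
    for visit in sorted(visits):
        # Each visit re-sends the cookie with a fresh lifetime.
        expiry = visit + LIFETIME
    return expiry


# Hypothetical visit dates.
print(expiry_after_visits([date(2008, 9, 12)]))                    # 2010-09-12
print(expiry_after_visits([date(2008, 9, 12), date(2009, 3, 1)]))  # 2011-03-01
```

As long as the user returns at least once within the lifetime, the cookie never actually expires.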

Privacy experts suggest that you need to understand what might happen to information about your search terms before you start searching. How would you choose a search engine…based on what it stores -or doesn’t store- about you?

The next and final part on this topic will contain tips to prevent search engine snooping.
