You have probably heard of the ongoing controversy surrounding browser cookies and their use. Browsers are gradually phasing out cookie technology, and cookies are heavily regulated by privacy laws such as the GDPR and the CCPA. This is a significant step towards a privacy-oriented internet. However, it also affects the core functionality of many websites, their UX, and the economy of the internet. Although the end of the browser cookie is certain, other web technologies such as ETags can also be used to store information and to track users.
Web caching means storing web data on your device so that the browser can reuse it later. When a user loads a page for the first time, the server sends the entire page to the browser. The browser keeps a cached copy, so the next time the user requests the same page, it can be displayed right away from the browser cache without the server having to send it again. This is much quicker and saves bandwidth: caching speeds up web content delivery and reduces server-side work.
ETags play an important role in caching content. ETags are IDs that servers attach to any resource they provide, such as images. When a resource changes on the server, a new ETag ID is created for it. This allows the server to determine whether the user has already stored the latest version of the resource.
The process is actually quite simple. This example illustrates how ETags work:
1. The user requests a website on Wednesday. There is no ETag in the request, so the server delivers the site together with the ETag 4711, and the browser stores both.
2. On Saturday, the user requests the same site again. This time, the browser includes the ETag 4711 in the request (in the If-None-Match header). The server checks whether the resource has changed, i.e., whether its ETag on the server is still 4711. If it has not changed, the server instructs the browser to use the copy that was delivered on Wednesday (a 304 Not Modified response). This process is very efficient for saving bandwidth and speeding up content delivery.
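The two steps above can be sketched in a few lines of Python. This is only an illustration: the function names make_etag and respond are made up for this example, and deriving the ETag from a hash of the content is one common approach, not something every server does.

```python
import hashlib


def make_etag(body: bytes) -> str:
    """Derive a quoted ETag from the content, so it changes when the content changes.

    Hashing the body is an assumption for this sketch; servers may also use
    version numbers or timestamps.
    """
    return '"' + hashlib.sha256(body).hexdigest()[:16] + '"'


def respond(request_headers: dict, body: bytes) -> tuple:
    """Return (status, response_headers, payload) for a request to one resource."""
    etag = make_etag(body)
    if request_headers.get("If-None-Match") == etag:
        # The browser already holds this version: send no body at all.
        return 304, {"ETag": etag}, b""
    # First visit, or the resource changed: send the full content.
    return 200, {"ETag": etag}, body


page = b"<html>version 1</html>"

# Wednesday: no ETag in the request, the full page is delivered.
status_wed, headers_wed, payload_wed = respond({}, page)

# Saturday: the browser sends the stored ETag back, the server answers 304.
status_sat, _, payload_sat = respond({"If-None-Match": headers_wed["ETag"]}, page)

# If the page changes in the meantime, the old ETag no longer matches.
status_new, _, _ = respond({"If-None-Match": headers_wed["ETag"]}, b"<html>version 2</html>")
```

On Saturday the response carries no payload, which is where the bandwidth saving comes from.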
How Cache Technology is Used To Track and Identify Users
ETags are there to store information related to caching. But they can also be misused to track users. Here's how this works:
1. Add the same iframe to different webpages. The iframe simply contains a 1x1 pixel that is completely invisible to the user.
2. Create a random ID when the iframe resource is first requested. This ID overrides the ETag that the server would normally generate automatically for the iframe.
3. From now on, the browser includes this ETag ID in every request for the iframe, whichever of the pages it is loaded from. Check on the server side whether the ID is present or whether it is a new request without an ETag.
- If the ETag exists, you have a returning visitor. Send the same ID back.
- If the ETag doesn't exist, you have a new visitor. Create a new ID. This ID will then be included in all subsequent requests for the iframe.
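The server-side check from step 3 might look roughly like the following Python sketch. The function name handle_pixel_request, the ID format, and the Cache-Control value are assumptions made for this illustration; they are not taken from any particular tracking product.

```python
import secrets


def handle_pixel_request(request_headers: dict) -> tuple:
    """Return (visitor_id, is_returning, response_headers) for one iframe request."""
    incoming = request_headers.get("If-None-Match")
    if incoming:
        # The browser sent back an ETag we issued earlier: returning visitor.
        visitor_id = incoming.strip('"')
        is_returning = True
    else:
        # No ETag in the request: new visitor, so create a fresh random ID.
        visitor_id = secrets.token_hex(8)
        is_returning = False
    response_headers = {
        # The "ETag" is really the visitor ID, not a version of the resource.
        "ETag": '"' + visitor_id + '"',
        # Assumption: "no-cache" makes the browser revalidate on every load,
        # so the ID is sent back each time.
        "Cache-Control": "private, no-cache",
    }
    return visitor_id, is_returning, response_headers


# First visit on any of the pages embedding the iframe.
first_id, first_returning, first_headers = handle_pixel_request({})

# Later visit: the browser replays the stored ETag in If-None-Match.
second_id, second_returning, _ = handle_pixel_request(
    {"If-None-Match": first_headers["ETag"]}
)
```

Because the same ID comes back on the second request, the server can link both visits to the same browser without setting a cookie.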
How to Prevent ETag Tracking
There are a few ways that users can protect themselves from ETag tracking.
1. Turn off the cache in your browser settings. This can have negative side effects, such as making repeated visits to pages slower.
2. Modify headers using a browser add-on. A user can override the If-None-Match header and leave it blank for every request. As a result, a new ETag value is generated for every page request, which prevents the user's device from being identified.
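To illustrate the second option, here is a small Python sketch of what such a header-modifying step does to an outgoing request. The function name strip_etag_validator is hypothetical, and the sketch drops the If-None-Match header entirely instead of sending it blank, on the assumption that the server treats both cases as a request without an ETag.

```python
def strip_etag_validator(request_headers: dict) -> dict:
    """Return a copy of the headers without If-None-Match.

    HTTP header names are case-insensitive, so the comparison is done
    in lower case.
    """
    return {
        name: value
        for name, value in request_headers.items()
        if name.lower() != "if-none-match"
    }


outgoing = {
    "Host": "example.com",
    "If-None-Match": '"4711"',
    "Accept": "text/html",
}
cleaned = strip_etag_validator(outgoing)
```

With the header gone, every request looks like a first visit, so the server has no stored ID to recognize.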
HTML5 supports several structured data storage options on the client side. This category includes localStorage and IndexedDB.
Local Storage allows objects to be stored on the client side using a mechanism similar to cookies. These objects (key-value pairs) are stored permanently and persist until they are deleted by the website or the user. Browsers typically allow around 5 MB of storage per origin. Data is bound to the respective domain: there is no sharing between different domains. But there is a solution to this problem: postMessage.
The postMessage method allows cross-origin communication between pages and iframes embedded within them. Post-messaging functionality allows data sharing between documents in different domains, while remaining secure.
You can embed an iframe in all your domains and use it to save data in localStorage. All domains will then be able to access the same storage via this iframe.
But some browsers, such as Safari, prevent third parties from setting and reading storage, regardless of whether it is a cookie, Local Storage, or another mechanism.
You can bypass this "first-party-only" restriction by redirecting the user directly to the third-party website. Because its content is then loaded in a first-party context, this intermediate site can set and read cookies.
Next, the user will be redirected back to the original site.
HTML5 Session Storage works in the same way as Local Storage. However, stored objects are only available to the current browser tab or window and are deleted when it is closed. Local Storage and Session Storage are both part of the same standard.
IndexedDB is a NoSQL database that is integrated into the browser. It is much more powerful than Local Storage, but it is slightly more difficult to use than Local Storage or cookies. Different domains cannot access the same databases. However, the postMessage method can also be used for sharing IndexedDB data across different domains - with the same restrictions.
There are tracking mechanisms that do not need cookies. However, it is important to emphasize that the use of these mechanisms also falls under the GDPR. Using these techniques without asking the user for consent is illegal, even though they are difficult to spot and to block. It is quite possible that methods like ETag tracking will be rediscovered by a frightened advertising industry, which is about to see one of its cornerstones fall in the near future: the cookie.