Tracking pixels, tags, web beacons… Do you really think you control your own data?
pujoljulia May 28, 2016

In my last post I detailed the main aspects of cookies: their capabilities, their limitations and how to manage them. That post was mainly targeted at folks like us, digital marketers who deploy tools through Tag Management Systems and could use cookies to improve their tracking strategies and extract better insights.

There is another kind of marketer, pretty similar to us, who instead of collecting data and using cookies to improve site performance collects data, and uses cookies, to increase the number of visits to websites. Both things are really important, since we could have the most optimized and beautiful site with no one knowing about it (fail!), or we could get thousands of visits but 0 conversions (fail, fail!).

However important these things may be (they are!), as the manager of a website you should know what is going on in your site in terms of Digital Marketing Technologies. By ‘Digital Marketing Technologies’ I’m not talking about their names (AdWords, Adobe Analytics, Optimizely…) and what they apparently do, no no, I’m talking about what they are doing behind the scenes. As you know, marketing technologies are usually deployed through pieces of code (tracking pixels, tags, web beacons…), and you should ask the technology provider some questions before adding any of these pieces of code:

1. What data does the code pick up?

2. How many cookies/storage techniques does the code create?

3. What are the purposes of those cookies/storage techniques, and when do they expire?

4. What is going to happen with my site data and the cookies/storage techniques spread among my users once the contract with the technology provider has finished?

Without these points clear, do you really think you control your own data? Well, if you do not know what these codes are doing behind the scenes I would say NO, you do not control your data at all.

Nowadays marketers are adding hundreds of pixels to their websites without even knowing what they do. Almost always the code’s purpose is just to track conversions or collect data to then improve campaign performance or analyze that data… but are you sure all the technologies installed on your site have a good purpose? Not sure? Keep reading!

From the page to the server: The Data Dealers

Whether you have been working in digital marketing for years or you’re new to this field, you have probably heard about pixels, tags, beacons, web bugs… are they the same? If so, why all those names for the same concept? What a mess! If you ask for my opinion, I would say yes, all of them are the same: just pieces of code that send data to databases, like the Adobe or Google Analytics tags do.

All technologies that you directly or indirectly install on a website set up a way to send data to their platforms (platform = database). I like to call them ‘data dealers’. There are three main types of data dealers: the image (using an Image object), the XML HTTP request (using an XMLHttpRequest object) and the beacon (using the new navigator.sendBeacon method).

1. Image/pixel tracking

Images have been the most widely used way to send information from pages to platforms. When you want to show an image on a certain web page, the HTML code has to be something like this:

  • The ‘alt’ parameter describes the image; it is basically used by search engines to position images in their search results.
  • ‘width’ and ‘height’ define the size of the image.

Good! Now that we understand the image structure, here is the trick: each time the browser loads a page, it goes to the server, grabs all the necessary stuff, comes back and finally shows everything it picked up. That, of course, includes the images, so taking advantage of the fact that the browser makes a trip to the server, let’s use it to do some stuff. Look at the code below:
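As a rough sketch of the two tags being compared here, the small JavaScript helper below builds both HTML strings. The file name photo.jpg, its alt text and sizes, and the exact pixel URL are assumptions pieced together from the names used in this post, not a copy of the original snippet.

```javascript
// Hypothetical helper: turns an attribute map into an <img> tag string.
function imageTag(attrs) {
  const parts = Object.entries(attrs).map(([name, value]) => `${name}="${value}"`);
  return `<img ${parts.join(" ")}>`;
}

// A regular, visible image: it has a description and a real size.
const regularImage = imageTag({
  src: "photo.jpg", // assumed file name
  alt: "A cup of coffee",
  width: 300,
  height: 200,
});

// A tracking pixel: no 'alt', and a size of zero so nobody sees it.
const trackingPixel = imageTag({
  src: "http://www.pujoljulia.com/tracking/cookies.jpg", // assumed URL
  width: 0,
  height: 0,
});

console.log(regularImage);
console.log(trackingPixel);
```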

It’s pretty much the same as the other image, right? This one just doesn’t have the ‘alt’ parameter, and the values for ‘height’ and ‘width’ are zero. This is because we do not want search engines to read the image, as it’s not a real image, nor users to see it, as it’s just for tracking purposes. Well, let’s pick up the URL and paste it into the browser:

tracking-pixel

Ouu… this is not precisely an image, right? But the browser still processes it as such, so it goes to the server looking for an image and instead is obliged to visit another URL (notice that when you paste the URL into the browser, the end of the URL changes to script.php).

To figure out why that happens, let’s see where the image is on my server. To access the server, in my case pujoljulia.com, you need an FTP client; FileZilla is one of the most popular, so let’s open it:

ftp-tracking-cookie

Curious: in the same folder (tracking) there is the cookie image along with the strange script.php where the browser finally ends up. Notice that the cookies.jpg image (left part of the above picture) really exists, so why does the browser go straight to script.php? Basically because we’ve ordered it to by means of .htaccess (a file that manages things like redirections or access rights to folders, among others). This is the code that my .htaccess file contains:

The first line makes sure the comment mode is off, just to avoid problems on the second line, where RewriteRule imposes a rule on the server: “Every file in that folder that ends with .png, .jpg or .gif, please redirect to the script.php file” (in other words, a 301 redirect). You may be wondering why we do that instead of going straight to script.php… that’s because some browser plugins, aiming to block tracking activity, prevent image requests to URLs that clearly aren’t images.
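The effect of that rule can be mimicked in a few lines of JavaScript. This is only an illustration of the matching logic, not the .htaccess file itself; the function name resolveRequest and the /tracking/script.php target path are assumptions.

```javascript
// Assumption: the rule matches any requested path ending in .png, .jpg or .gif.
const IMAGE_PATTERN = /\.(png|jpg|gif)$/;

// Hypothetical stand-in for what the server decides for each request.
function resolveRequest(path) {
  if (IMAGE_PATTERN.test(path)) {
    // Image-looking URLs are sent on to the script with a 301 redirect.
    return { status: 301, location: "/tracking/script.php" };
  }
  // Anything else is served as requested.
  return { status: 200, location: path };
}

console.log(resolveRequest("/tracking/cookies.jpg"));
console.log(resolveRequest("/tracking/script.php"));
```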

So far so good, we have already tricked the browser, so it is now in the script.php file, which means everything coded in the file will be executed. Here is the code in script.php:

The code is enclosed in an ‘if’ statement, which checks whether the user’s browser lacks a cookie called ‘userID’; if so, the cookie is created and stored with a random number ID. Then the cookie value, along with the landing page and the page’s language, is recorded in the ‘data’ document (a file which is on my server as well). To check that it works, you would have to place my pixel on your own or someone else’s page; let’s do it through Google Tag Manager (GTM), for example:
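To make that flow easier to follow, here is the same logic sketched as a plain JavaScript function rather than PHP. Several things are assumptions: that the landing page comes from the Referer header and the language from Accept-Language, that the line is only written when the cookie is newly created, and the ' | ' separator used in the data file.

```javascript
// Sketch of the described script.php logic as a pure function (not the real PHP).
// request: { cookies, referer, acceptLanguage } are hypothetical field names.
function handlePixelRequest(request, randomId = () => Math.floor(Math.random() * 1e9)) {
  const result = { setCookie: null, logLine: null };

  // Only act when the browser does not carry a 'userID' cookie yet.
  if (!("userID" in request.cookies)) {
    const id = String(randomId());
    result.setCookie = { name: "userID", value: id };
    // Cookie value + landing page + language: one line for the 'data' file.
    result.logLine = [id, request.referer, request.acceptLanguage].join(" | ");
  }
  return result;
}

const firstVisit = handlePixelRequest(
  { cookies: {}, referer: "http://example.com/landing", acceptLanguage: "en" },
  () => 42 // fixed ID so the example is reproducible
);
console.log(firstVisit);
```

A returning browser that already sends the userID cookie gets neither a new cookie nor a new line under these assumptions.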

gtm-user-id

Once the pixel has been published, the first thing to notice is that a cookie associated with ‘pujoljulia.com’ has indeed been set while browsing your/someone’s domain:

third-party-cookie-tracking-pixel

The next thing to do is check the redirect (from cookies.jpg to script.php), which can easily be done by going to the browser console (to open the Google Chrome console, press F12 on Windows and alt+cmd+i on Mac), opening the Network tab and filtering by ‘pujoljulia’:

pixe-image-tracking-redirection

On the left you can see everything loaded from the ‘pujoljulia.com’ server, and… SURPRISE! We only asked for the cookies.jpg image (as we only placed an image tag), yet the script.php file has also been loaded… Ou ou ou man, this is a mess! Calm down! Just look at the ‘Initiator’ column, which tells us ‘who’ ‘initiated’ each file. ‘Other’ normally means the file was initiated by the page itself, which is the case for ‘cookies.jpg’, but what about script.php? Exactly, it is initiated by ‘cookies.jpg’ (as explained, this is exactly what we ordered: a 301 redirect through the .htaccess file).

Finally, let’s check what happens with the data file. You can access it just by entering this URL http://www.pujoljulia.com/tracking/data in your browser.

info-tracking

Wow! So are you telling me that with a simple image, a 3rd party cookie has been stored in my browser, and on top of that you grab the landing page and its language? Yes, this is exactly what I’m telling you! Obviously tracking platforms do not use a text file to store data, they use structured databases, but a text file is enough to understand the concept.

Now let’s think BIG. Imagine that everyone who reads my blog grabs my pixel and adds it to their pages just to test how it works. In that case, the data file stored on my server is going to record the landing pages, along with their language, of all users who visit my readers’ websites, and of course drop a cookie in their browsers. If my blog were written in different languages, I could use the language information stored in my data file to automatically show the blog in the same language as the sites those users visited, without those users ever having visited my website before.

That’s very powerful, isn’t it? Maybe not with my example, but what if instead of saving the landing page’s language I started to grab interests, preferences, navigation behaviors…? I could even adapt my site to each user’s preferences, or attract them through banners that fit those preferences, which would probably improve my results. As you can start to imagine, that is basically the way online advertising works.

QUESTION: Do you know everything your tracking pixels are doing?

2. XMLHttpRequest tracking

Things improve, and JavaScript is no exception. Some time ago JavaScript incorporated the XMLHttpRequest API, which enables a web page to make an HTTP request and update just a part of the page without a full page refresh; we could say that the XMLHttpRequest API is the core of the AJAX programming technique.

As already explained, each time a page loads, the browser goes to the server(s) and grabs all the stuff the page requires, so we agreed that loading an image is an HTTP request, right? Okay! The XMLHttpRequest API will help us do the same, but without going through an image.

With this approach, the Access-Control-Allow-Origin HTTP header has to be present in the server response, and since headers can only be set through a server-side language like PHP, and it is impossible to place code in an image, the cookies.jpg image cannot be used to trick the browser (it is not necessary here anyway, as the tracking-prevention plugins mentioned earlier are always looking for an <img> tag). Instead, we have to point straight to script.php. Here is the code:
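A minimal sketch of such a call, wrapped in a function so it can be tried outside the browser too. The wrapper name sendTrackingRequest and its second parameter are assumptions added for illustration; in a real page the default XMLHttpRequest is used.

```javascript
// Hypothetical wrapper; XhrClass defaults to the browser's XMLHttpRequest.
function sendTrackingRequest(url, XhrClass = globalThis.XMLHttpRequest) {
  const xhttp = new XhrClass();
  xhttp.open("GET", url, true);  // HTTP method, URL to call, true = asynchronous
  xhttp.withCredentials = true;  // needed so the cross-site response can set a cookie
  xhttp.send();                  // fires the request to the server
  return xhttp;
}

// In a browser (URL assumed from the names used in this post):
// sendTrackingRequest("http://www.pujoljulia.com/tracking/script.php");
```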

As you can see, the code is really simple. In the first line the xhttp variable is an instance of the XMLHttpRequest object, which is then used with the open method, which takes the HTTP method (mainly GET or POST), the URL to call and the call mode (true = asynchronous, false = synchronous). The withCredentials property has to be set to true to be able to drop the cookie in the user’s browser. Finally, the send method makes the request to the server. Some changes also have to be made to the script.php file:

Basically we have added the headers in the first two lines: Access-Control-Allow-Origin, which allows the connection between the page and the server (in this case pujoljulia.com), and Access-Control-Allow-Credentials, which allows the cookie to be set (notice that credentials have to be enabled on both sides: in the request, through the ‘withCredentials = true’ code, and here, in the response).
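The two response headers can be pictured like this. It is a sketch in JavaScript rather than the PHP used on my server, and it assumes the server simply echoes back the origin of the calling page; the function name corsHeaders is made up for the example.

```javascript
// Assumption: the server answers with the Origin of the page that made the call.
// Browsers reject a wildcard "*" origin when the request carries credentials.
function corsHeaders(requestOrigin) {
  return {
    "Access-Control-Allow-Origin": requestOrigin,
    "Access-Control-Allow-Credentials": "true",
  };
}

console.log(corsHeaders("http://www.example.com"));
```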

As with the previous one, let’s place this script on another page to check that it works, this time through Dynamic Tag Manager (DTM):

XMLHttpRequest-DTM

And here is the communication established (to open the Google Chrome console, press F12 on Windows and alt+cmd+i on Mac):

XMLHttpRequest - Chrome Console

In the image above, the most important things to keep in mind are the two headers and the cookie we implemented in script.php; as you can see, everything from the PHP file is reflected here. If you follow the same steps as with the <img>, you will see that everything is repeated: a cookie is set for the pujoljulia.com domain and the data file records the proper data.

QUESTION: Do you know how many XMLHttpRequest calls for tracking purposes are made on your site?

3. Beacon tracking

In the previous section I gave you an overview of how XMLHttpRequest works along with its open method, which accepts three parameters. Remember that we set the last one to true; as per its definition, it makes the request asynchronous, meaning the request does not block the rest of the page. This option helps in situations where we might want to make a request just when the user is about to leave the page.

Let’s say that we want to record the names of the links through which users are most likely to leave our site (external links). As you have to make the request after the user clicks the link, at that moment some browsers are already starting to leave the page to load the new one (this is when the onbeforeunload event fires), so they cut off JavaScript execution (even an asynchronous XMLHttpRequest call), making it impossible to make the request with the methods seen before (there are some workarounds, but they usually affect page performance). The sendBeacon method of the Navigator object was introduced to solve this kind of situation, since it always makes the request asynchronously and is not cut off when the page unloads. Take a look at the next code:
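A minimal sketch of that idea, assuming a function name (trackOutboundClick), a ‘link’ field and an endpoint URL that are all made up for the example. The navigator object is passed in as an optional parameter only so the function can be tried outside a browser.

```javascript
// Hypothetical helper: reports the name of the clicked external link.
function trackOutboundClick(linkName, endpoint, nav = globalThis.navigator) {
  if (!nav || typeof nav.sendBeacon !== "function") {
    return false; // browser without sendBeacon support: nothing is sent
  }
  const payload = new URLSearchParams({ link: linkName }).toString();
  // sendBeacon queues a POST that the browser delivers even while the page unloads;
  // it returns true when the data was accepted for delivery.
  return nav.sendBeacon(endpoint, payload);
}

// In a browser, from a click handler on an external link:
// trackOutboundClick("partner-site", "http://www.pujoljulia.com/tracking/script.php");
```

Because sendBeacon sends a POST, the receiving script would have to read the request body instead of GET parameters.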

Easy, right? I think there is no need for explanation. The only downside of this technique is browser support; you may still run into trouble with a few users:

send-beacon-navigator-support

Almost all browsers support it in their latest versions; it’s really a shame that Safari and IE don’t follow the W3C standards.

QUESTION: Do you know how many navigator.sendBeacon calls are made on your site?

The script.php file used to show how data dealers work only grabs the landing page and the page’s language; that is because we only used PHP and basic JavaScript code. Usually the platforms that provide the pixels offer the possibility to enrich them with user behaviour data through GET parameters, also known as the query string part of the URL:

url-structure

This part of the URL is used to send information between servers/pages, as all parameters can be read or appended by both server-side languages (like PHP) and client-side languages (like JavaScript). To demonstrate that, let’s go back to the image/pixel example and append some parameters to the image URL:

Just to make sure we are on the same page, things to know:

  • The value of param1 is value1, and the value of param2 is value2.
  • The query string starts at the first question mark ‘?’ in the URL.
  • Each parameter/value pair is separated by ‘&’, which means a new pair follows.
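These three points can be checked with the standard URL object in JavaScript. The pixel URL below is an assumption built from the file names and parameter names used in this post.

```javascript
// Assumed pixel URL carrying the two example parameters.
const pixelUrl = new URL(
  "http://www.pujoljulia.com/tracking/cookies.jpg?param1=value1&param2=value2"
);

// Everything from the '?' onwards is the query string.
console.log(pixelUrl.search);

// Each 'name=value' pair, split on '&', becomes one entry.
const params = Object.fromEntries(pixelUrl.searchParams);
console.log(params);
```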

Okay, let’s make the proper changes to script.php so that it reads and stores the query string parameters:

The only change is at line 13, where I append a couple of $_GET[] expressions, one per parameter, to read the parameters’ values and write them to the data file:

data-get-parameters

The parameters are now recorded in the data file. Obviously, as this is just an example, the same values are written for all users, but in real scenarios, where the values are meaningful data picked up from the page/user behaviour, the values are appended dynamically through JavaScript and may differ for each user. So… JavaScript is needed to gather the data, right? Ummm… looks like I have to keep writing!
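Appending values dynamically could look something like the sketch below. The helper name buildPixelUrl and the ‘page’ and ‘lang’ parameters are hypothetical; a real platform decides its own parameter names.

```javascript
// Hypothetical helper: adds one query string parameter per collected value.
function buildPixelUrl(baseUrl, data) {
  const url = new URL(baseUrl);
  for (const [name, value] of Object.entries(data)) {
    url.searchParams.append(name, String(value)); // values are URL-encoded for us
  }
  return url.toString();
}

// In a real page these values would be read from the page or the user's behaviour.
const dynamicPixel = buildPixelUrl("http://www.pujoljulia.com/tracking/cookies.jpg", {
  page: "/checkout",
  lang: "en",
});
console.log(dynamicPixel);
```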

Gather and deliver all data to Data Dealers: JavaScript

At this point you may be wondering how weird this is, as the code you’ve probably seen or placed on your own or others’ websites doesn’t look like any of the three ‘data dealers’ introduced before. Most probably the code you have placed looks like this:

They are very different, right? But they have something in common: somehow they always call an external JavaScript file (‘call’ means that all the code inside these external JavaScript files is executed as if it had been placed directly by the site developers):

If you take some of these URLs and paste them into your browser’s address bar, you’re gonna see an ugly mountain of code. Each of these JavaScript files makes a call to pass all the information it has gathered to the data dealers, which, as explained, travel to the server to do whatever they have to do there, among other things, store data from the site. I’m not gonna go inside each script, but just to show you an example, let’s choose Google AdWords:

adwords-code

As seen in the above piece of AdWords code, through JavaScript it is possible to build an image tag and inject it dynamically into the page source once it loads, so Google AdWords uses the image/pixel tracking data dealer to make the HTTP request to the server and send the proper information. If I place the AdWords code on my site, here you can see how the browser processes the image, which means it sends the data to the server:
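The general technique, building an image with JavaScript and injecting it into the page, can be sketched as follows. This is not the AdWords code: the function name firePixel and the URL are assumptions, and the document is passed in as an optional parameter only so the function can be tried outside a browser.

```javascript
// Hypothetical helper: creates an invisible <img> and adds it to the page.
function firePixel(url, doc = globalThis.document) {
  const img = doc.createElement("img");
  img.width = 0;
  img.height = 0;
  img.src = url; // in a browser, setting src is what makes the HTTP request go out
  doc.body.appendChild(img);
  return img;
}

// In a browser:
// firePixel("http://www.pujoljulia.com/tracking/cookies.jpg?param1=value1");
```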

AdWords-pixel

Good stuff! Now we know that JavaScript collects and organizes all the data and, when ready, gets in touch with a data dealer to send the information to the server. Knowing this, we can ask ourselves: what if the JavaScript calls more than one data dealer? Ummm… that means more HTTP requests, so more data sent to the server, or ‘servers’… Let’s take a look at the next picture:

javascript-fire-pixels

If this JavaScript code were placed on a conversion page, all conversions would now be recorded on 5 different servers/platforms. Is this bad? It depends: was the marketing manager aware of that? No? Then yes, it’s bad. Yes? Then does the marketing manager know what the collected data is used for?

Some of these platforms, magnificent platforms by the way, offer the ability to use this data to retarget people (thanks to the cookies that each pixel has dropped in the users’ browsers), which certainly increases the odds that users convert.

Let’s say I’m a company that helps your coffee machine ecommerce with digital marketing, and I provide the JavaScript code seen in the image above. Now that I have recorded a handful of users who have bought a coffee machine, could I use this information for another company that wants to sell coffee capsules online? Of course: since each platform has dropped a cookie in the users’ browsers, I can now reach them through 5 different platforms, among them Google Search text ads, Facebook Ads, the Google Display Network…

In this situation, is your coffee machine company profiting from the increased sales of the coffee capsule company? No, but… it should, right? And what if some day your coffee machine company decides to sell coffee capsules as well?

ADVICE: Take care of your data, otherwise some day you will find yourself paying for your own data.

Final thoughts

If you’ve got this far, that means you really care about your data, which makes me happy! The goal of this post is not to frighten you, but to make you aware of how important this subject is for your website. There are some Chrome plugins like Observe Point, Ghostery or Wasp that could help you figure out what technologies your site has and what data they are grabbing, but in my experience the most reliable tool is the browser’s developer console.
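As one possible starting point in the developer console, the sketch below counts the requests a page has made to hosts other than its own. The function name thirdPartyHosts is made up; it expects the list that the browser’s performance.getEntriesByType("resource") call returns, where each entry’s name is the requested URL.

```javascript
// Hypothetical helper: counts requests per host, ignoring the site's own host.
function thirdPartyHosts(resourceEntries, ownHost) {
  const counts = {};
  for (const entry of resourceEntries) {
    const host = new URL(entry.name).hostname;
    if (host !== ownHost) {
      counts[host] = (counts[host] || 0) + 1;
    }
  }
  return counts;
}

// In the browser console:
// thirdPartyHosts(performance.getEntriesByType("resource"), location.hostname);
```

It only lists where requests went; what each request carried still has to be inspected in the Network tab.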

Some agencies could help you with this task, but my advice is to always have someone around who is very good at programming, as he/she can take care of something as important as your website.

I really hope this post was worth it for you! Feel free to leave a comment if you are unsure about what technologies are deployed on your site or what they are doing 😉
