[Update 2: I just tested with a newly-created Gmail account and the feature did not seem to have been rolled out to the new account yet.]
[Update: I'm not sure when this feature will actually be rolled out. I think my test below automatically displayed the image because my own email address appears to be implicitly a whitelisted sender (even though "images from this sender are always displayed" doesn't appear for it). Whether Google will alter the behavior when they actually deploy this feature, I don't know.]
[Original message:]
I just tested and, yes, Gmail only loaded the referenced image when I clicked on the message to open it within Gmail. I can't be sure, because perhaps if I had waited an hour without opening the message, Gmail would have automatically loaded the image anyway. But in reply to mherdeg below, the evidence suggests that, yes, Gmail plans to opt everybody in to sending "read receipts" by default for HTML messages that reference images.
I'm surprised by Google's statement that the previous behavior of prompting was "to protect you from unknown senders who might try to use images to compromise the security of your computer or mobile device."
I realize this was a benefit, but I always thought the main purpose was for privacy --- not to betray to the email sender when I opened the email. My guess is that Google did not view this as a privacy setting, or they probably would not have forcibly changed everybody's setting.
It's doubly strange that they did so without a notice inside Gmail that they did so -- just a blog post.
I just ran the same test and can confirm the results. Google will only load your image if you open the email, which means Google has just opted-in all users to mail receipts.
I don't use any Google services outside of small tests like this, but it still makes me concerned for how this will affect the privacy of people I know.
"Email open" tracking just got a lot more reliable for all mass email & marketing automation vendors.
On the flip side, those same solutions can no longer set a persistent cookie with the image, so persistent tracking based on the initial email open will stop working.
> "Email open" tracking just got a lot more reliable for all mass email & marketing automation vendors.
Has it? If Google's proxy is caching images, then "email open" tracking might have broken entirely. All the sender would see is that their email has been opened once by the proxy -- for all gmail addresses put together.
I'd imagine that they are going to de-dup the images they proxy which means email marketers need to generate unique images per mail and that means no more 1-pixel tracking images.
A solution would be 1-pixel-high tracking lines: a 1-pixel-high, 128-pixel-wide image that encodes 0 and 1 as two RGB colors adjacent to the mail's background color, so the difference isn't noticeable. The line would encode a SHA-1 hash that is also placed in the URL.
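A rough sketch of that pixel-line idea, with no image library involved: each bit of a hash becomes one pixel in a single row, using two colours one step apart. The colour values and the address are made up for illustration. Note that a full SHA-1 digest is 160 bits, so it needs 160 pixels rather than 128 unless it is truncated.

```python
import hashlib

# Two hypothetical colours one step apart; a real sender would pick values
# next to the email's background colour.
ZERO = (255, 255, 255)
ONE = (254, 255, 255)


def encode_line(digest: bytes) -> list[tuple[int, int, int]]:
    """Turn each bit of the digest into one pixel of a 1-pixel-high row."""
    pixels = []
    for byte in digest:
        for shift in range(7, -1, -1):  # most significant bit first
            pixels.append(ONE if (byte >> shift) & 1 else ZERO)
    return pixels


def decode_line(pixels: list[tuple[int, int, int]]) -> bytes:
    """Recover the digest bytes from a row produced by encode_line."""
    out = bytearray()
    for i in range(0, len(pixels), 8):
        byte = 0
        for pixel in pixels[i:i + 8]:
            byte = (byte << 1) | (1 if pixel == ONE else 0)
        out.append(byte)
    return bytes(out)


digest = hashlib.sha1(b"you@example.com").digest()  # 20 bytes = 160 bits
row = encode_line(digest)
print(len(row), decode_line(row) == digest)  # prints: 160 True
```

Writing the row out as an actual GIF or PNG is left out here; the point is only that the pixel data itself can carry the identifier.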
Mass-email senders probably would put a unique identifier in the image url (different for all users), so Google will open each image, because it can't know before loading them that it's the same image.
> "Email open" tracking just got a lot more reliable for all mass email & marketing automation vendors.
No, it didn't. If you had chosen the option to ask before displaying external content -- which existed and applied to non-image content and, without which selection, email-open tracking by external non-image content was already reliable -- then the new setting to ask before displaying external images is selected for you by default.
If you hadn't selected that option before, you weren't protected from "email open" tracking.
Interesting. The fact that they don't even address this aspect of the change in the blog post makes you wonder if this is a deliberate or incompetent move. This should be obvious for anyone who works with email and easy enough to describe in layman's terms in the blog post. Who is the target group for the blog?
I assume the target of the blog is Gmail power users more so than email marketers. I highly doubt that the Gmail team didn't think this through before launching. As for the reason for not explaining how this works, who knows?
> Google will only load your image if you open the email, which means Google has just opted-in all users to mail receipts.
If you didn't have the "ask before displaying external content" option set before this change, you were "opted-in" to read receipts already -- it's just that, due to protections designed to stop other malicious use of images, you were incidentally protected against images as the vector for silent read receipts.
With this change, you are better protected against the malicious uses of images the default-not-to-display option was designed to protect against, but exposed to external images as a vector for read receipts if you hadn't chosen to display external content only after confirmation. If you did choose that previously, then you also got the new "ask before displaying external images" chosen by default -- so if you were protected from senders injecting read receipts before, you still are now. If you weren't before, you aren't now, but then that's not really a change.
It's also weird that they didn't explain the how behind this line:
> Instead of serving images directly from their original external host servers, Gmail will now serve all images through Google’s own secure proxy servers.
In most cases, the unique identifiers are embedded in the URLs themselves, so simply serving through a proxy is ineffective. Should I blindly trust that you, Google, did the right thing?
I wonder if this change is a result of backlash over the promotions tab. These types of referenced images are most commonly used in marketing campaigns and come from businesses likely to pay good money to AdWords. As a concession for fewer overall impressions, perhaps, these groups got Google to let them track more easily? The whole thing smells fishy.
They'll probably retrieve and cache every image as soon as the email is received which would effectively render open statistics meaningless for GMail addresses.
They don't yet. I wouldn't be surprised if they do the same as with 'not provided' where they slowly make the data less reliable until they finally just turn it off altogether.
I appreciated the lack of pictures of large penises that accompanied spam. And of course the fact that you didn't get a tracking pixel fetched. So I wonder if they are going to fetch the image from their servers, cache it, and then show it. Cutting off a supply of information for email marketers, whom they will offer to supply 'opening' information for people who use the new Gmail Promotions feature. (ok that is a lot cynical)
Uniquely name at least one image per outgoing email, where the image name is tied to a recipient, i.e. a316f002a5d080a613dce89a4ad8f9a9.gif uniquely identifies myemail@gmail.com. If Google doesn't fetch the image until you open the email, you can also determine the open time. If they request and cache all images at the time the email is received, regardless of its having been opened, then this doesn't work.
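For anyone wondering what that looks like on the sending side, here is a minimal sketch. The secret key, image host and campaign name are invented for the example, and using a truncated HMAC as the file name is just one plausible choice, not a claim about what any particular vendor does.

```python
import hashlib
import hmac

SECRET_KEY = b"change-me"               # hypothetical sender-side secret
BASE_URL = "https://img.example.com/"   # hypothetical image host


def token_for(recipient: str, campaign: str) -> str:
    """Derive a 32-hex-character file name that is unique per recipient."""
    message = f"{campaign}:{recipient.lower()}".encode()
    return hmac.new(SECRET_KEY, message, hashlib.sha256).hexdigest()[:32]


def tracking_url(recipient: str, campaign: str) -> str:
    """Build the image URL to embed in this recipient's copy of the email."""
    return f"{BASE_URL}{token_for(recipient, campaign)}.gif"


def build_lookup(recipients: list[str], campaign: str) -> dict[str, str]:
    """Map each token back to its recipient for use when requests arrive."""
    return {token_for(r, campaign): r for r in recipients}


lookup = build_lookup(["myemail@gmail.com", "other@gmail.com"], "dec2013")
print(tracking_url("myemail@gmail.com", "dec2013"))
```

The file name reveals nothing by itself; only the sender's lookup table ties it back to an address.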
Thanks, the next question is if gmail sees a bunch of emails from the same sender with these hash-named images, I wonder whether they will squash them.
They are already scanning the content of the emails so there's nothing stopping them from determining if the images between emails are the same regardless of name, and even then just sending down whatever they have cached, 1px transparent gifs are still common for this. Could break some a/b testing software though. You could see marketers move to including a unique image per recipient like a gravatar. If it were me I'd just include something like the github avatars in the footer of every message. Google can learn that those are tracking images and block them but will they?
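To make the "same image regardless of name" idea concrete, here is a toy content-addressed cache. The class and the fake fetch function are hypothetical, and nothing here reflects how Google's proxy actually works. It does show one limitation: the proxy can only discover that two URLs return identical bytes after it has requested each of them at least once.

```python
import hashlib
from typing import Callable


class DedupingProxy:
    """Toy proxy cache that stores each distinct image body only once."""

    def __init__(self, fetch: Callable[[str], bytes]) -> None:
        self._fetch = fetch
        self._hash_by_url: dict[str, str] = {}     # URL -> content hash
        self._body_by_hash: dict[str, bytes] = {}  # content hash -> bytes
        self.origin_requests = 0

    def get(self, url: str) -> bytes:
        if url not in self._hash_by_url:
            body = self._fetch(url)  # the sender's server sees this request
            self.origin_requests += 1
            digest = hashlib.sha256(body).hexdigest()
            self._body_by_hash.setdefault(digest, body)
            self._hash_by_url[url] = digest
        return self._body_by_hash[self._hash_by_url[url]]

    @property
    def stored_images(self) -> int:
        return len(self._body_by_hash)


PIXEL = b"stand-in bytes for a 1px transparent gif"
proxy = DedupingProxy(lambda url: PIXEL)  # every URL returns the same body
for token in ("aaa", "bbb", "ccc"):
    proxy.get(f"https://img.example.com/{token}.gif")
print(proxy.origin_requests, proxy.stored_images)  # prints: 3 1
```

So content-based de-duplication saves storage, but on its own it does not hide the per-URL requests from the sender.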
"Thanks, the next question is if gmail sees a bunch of emails from the same sender with these hash-named images, I wonder whether they will squash them."
Are you saying squash the sender or squash the tracking images?
I hope gmail doesn't start squashing my emails because it contains a tracking pixel.
I don’t think you quite understand the changes here. No "read receipts" are sent; any analytics sent only point to the Google proxy processing the images, and no individual recipients or their actions are revealed.
See how marketers are scrambling to adjust to this change:
a.) Gmail is now requesting all images from proxy servers (googleusercontent.com), which incorrectly situates users in its headquarters in Mountain View, California when images are downloaded. This impacts the ability to geo-target image content for those Gmail users who are affected by the changes. (Note: Local Maps using zip codes appended as query parameters are unaffected.)
b.) Gmail is stripping the user-agent headers from the client request, which eliminates the ability to determine the Gmail user’s device and target image content appropriately.
c.) Gmail is removing the cache-control headers from the responses, which forces the user’s images to be stored in their browser’s cache for up to a day. This only impacts live image content if a Gmail user re-opens the email after the first open.
OP was using "read receipts" colloquially, to include "tracking images with a unique code embedded in them".
And, as such, OP's claims are exactly correct.
The only way this would not be true is if GMail pulled every image in every email, even if it's not read by the recipient. Given GMail's usage of the term "proxy server" in their blog post, as well as the tests by the OP and others on this thread, this appears not to be the case.
Gmail seems to be proxying the images through https://ci5.googleusercontent.com/proxy/, and my understanding is that the polling happens when Google receives the email, not when it's opened.
> I don't understand how a proxy will protect me from an image loaded as
It will protect you from it being used as a read receipt if (and I'm not sure if this is the case, though it should be trivial to test by sending email with images served from a site you control to an email you control without opening it) Google requests the image once it has received the email.
Now it will seem to the sender that 100% of e-mails sent to Gmail recipients have been opened, rendering actual measurement impossible. A unique ID is useless if all unique IDs are requested all the time.
I'm sure they built-in rate-limiting to prevent DDOSing the sender's image server...
Apparently not. First, this only applies to messages read in gmail.com and the Gmail mobile clients (not any other email clients). Second, apparently the image is not requested from the sender until the message is opened. Finally, apparently any querystring parameters will continue to function as usual.
Even if Gmail tells the marketer that you have opened the message (as people are pointing out, it's a bit unclear), there are still some advantages.
It protects you from the guys at the marketer tracking your user-agent, your IP address (which gives a rough geo-IP), the number of times you opened the email, etc. It's unclear to me whether the images could set cookies before (they probably did), but even without that, they could just use ETag tricks, or stuff like that, to track you across sites.
Marketers might now know when you open the message, but proxying the image prevents them from getting more precise information.
(No idea exactly how much of this and more Google does, obviously, but they put themselves in position to do it)
This proxying actually rolled out on December 3rd to most gmail.com users. We, Streak, happened to launch an email tracking feature on the same day (http://www.streak.com/email-tracking-in-gmail).
Here's what we've learned:
- the proxying of requests only happens when a user is viewing the mail inside Gmail (i.e. Gmail does not actually affect the message body, it's just proxying at render time)
- Gmail only caches the image for a few minutes. So for email marketers, if your recipient views the email and then views it again a few hours later, the marketer will see two requests for the image
- there is basically no personally identifiable information in the request that Google's proxy sends to the server hosting the image. So from that perspective this is actually a boost in privacy over the previous state of the world. None of the headers, cache controls, cookies, IP addresses, referrers or user agents are passed to the original image server
- obviously you can encode some ID into the image URL itself but all that lets you do is identify the email address of the user that opened the email. But you already had their email address because you sent them an email - so again, no PII gets disclosed
- it is true that marketers will see a more accurate count of opens (because displaying images is on by default)
- there seem to be several ways to get Gmail's proxy to NOT cache the image and simply proxy the request every time the user opens the email
> - obviously you can encode some ID into the image URL itself but all that lets you do is identify the email address of the user that opened the email. But you already had their email address because you sent them an email - so again, no PII gets disclosed
This functions as a read receipt (like the tracking pixels).
The image might be cached later, but since it is initially loaded the first time an email containing it is opened, implementing read receipts on all outgoing emails is as simple as making the URL for each image unique to the user.
Thus, the marketer knows:
1) That the email was opened
2) When the email was opened
along with whatever information they already have about the user.
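To illustrate how little work that join is, here is a sketch of turning an image-server request log into per-recipient open times. The log format, tokens and addresses are all invented for the example; it assumes the image is only requested when the message is opened, which is what the tests in this thread suggest.

```python
from datetime import datetime


def first_opens(requests, token_to_recipient):
    """Return {recipient: earliest request time} from (timestamp, token) pairs.

    Requests whose token is not in the lookup table are ignored.
    """
    opens = {}
    for stamp, token in requests:
        recipient = token_to_recipient.get(token)
        if recipient is None:
            continue
        when = datetime.fromisoformat(stamp)
        if recipient not in opens or when < opens[recipient]:
            opens[recipient] = when
    return opens


# Hypothetical lookup table and request log.
lookup = {"tok-a": "a@example.com", "tok-b": "b@example.com"}
log = [
    ("2013-12-12T10:30:00", "tok-a"),
    ("2013-12-12T09:15:00", "tok-a"),
    ("2013-12-12T11:00:00", "unknown"),
]
for recipient, when in first_opens(log, lookup).items():
    print(recipient, when.isoformat())
```

Anyone absent from the result (b@example.com here) either has not opened the message or has images turned off.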
This is a HUGE privacy implication. Even if "no [further] PII gets disclosed", it discloses a lot of information that is both sensitive and easy for marketers to join with existing identifying information.
You might consider this to be a violation of privacy but you're not disclosing any PII that the sender didn't already have. And in fact it's decreasing the amount of PII that's being disclosed because you're no longer sending any browser information when the image is loaded.
It's disclosing that the spammer (or "marketer") has a valid address. Since I only load images in email from people I trust, there is no way this can increase my privacy.
> obviously you can encode some ID into the image URL itself but all that lets you do is identify the email address of the user that opened the email. But you already had their email address because you sent them an email - so again, no PII gets disclosed
All that lets you do is...confirm that the email address exists. Until this change, that was a very difficult thing to do; now, it's equivalent to getting a message through a spam filter. This is going to help spammers _a lot_.
It wasn't very difficult to do - in our testing, approx 60% of emails we sent out had images requested for them, meaning users clicked on the display images link. Maybe we nerds don't do it, but regular users seem more likely to.
That's assuming that people are just as likely to allow images from a message that kinda looks like spam as they are from a message that's from a person/service they know.
By "exists" I really meant "is actively used". There are probably millions of gmail addresses that "exist" but no one ever reads (I personally own over a dozen).
> all that lets you do is identify the email address of the user that opened the email. But you already had their email address because you sent them an email
You just have an email address. Once that image is downloaded, you know it's active, and the sky basically falls down.
Thanks for the information! So if you want to display a dynamic image containing real-time information when the recipient opens the email, it will still work since Gmail only caches the image for a short time, right?
Depends what you want to base the dynamic information on. If you want it to be time based, i.e. the current stock price, then that should still be possible. But say you want it to be based on user location, like the weather in the user's location: that won't work, because you won't have the user's original IP address, so you won't be able to roughly detect their location.
Is Google loading these images when a user opens mail? (Have they just automatically opted users in to "read receipts"?)
Or are these images pre-cached when mail is delivered (so that the fact that an image was loaded is just proof of delivery, which you should be able to get anyway?)
This is a step in the right direction. However, please understand that it doesn't really make a difference in terms of real privacy, Googlers!
The e-mail spam/list creators are a different kind of adversary than, for example, web trackers.
They will do something like this:
http://www.theirimageserver.com/images/img53.jpg?to=you@email.com
(Obviously they will obfuscate and use some kind of hash instead of cleartext e-mail to disguise their tracking ways).
Regardless of whether some.google.ip loads it, or your.home.ip loads it, it won't change the fact that you@email.com loaded it and your email is very active, not just active in that it didn't bounce, but active in that you actually read it.
Once again, it's a step in the right direction though, and I'm looking forward to seeing greater innovations from Google in the privacy space, because I'm confident that there are Googlers who understand that privacy is not a feature nor a PR thing... it's the difference between the preservation of humanity and society versus not.
They can mitigate that by downloading all images that come to any address @gmail.com. That way spammers won't know if you@gmail.com is real or not, and still be at step 0 (and hopefully taking some bandwidth/processing time/log space from the spammer at the same time!).
This is probably obvious to people who do email marketing, but I just tested whether gmail bounces bad addresses. They do bounce addresses that have never been registered and addresses that have been deleted. This means that even before this change, spammers could find out if an address is active.
Bouncing badly typed addresses is more useful for users than stymieing spammers. However, a spammer doesn't need to know if an existing gmail address is being actively used or not.
Correct me if I'm talking directly from my rear end:
For that scenario, Google could ignore the arguments and just request the file. However, I'm not sure I get Google's implementation of the proxy, but if the images are grabbed only when the email is read, it's easy to track by adding a hash to the filename itself.
So, in the email company's server, they'd have a canonical file, for instance, acme-co/dec2013news/img53.jpg. For each subscriber, they'd have something like a symbolic link to img53.jpg specific to the subscriber, for instance img53-5baa61e4c9b93f3f0682250b6cf8331b7ee68fd8.jpg.
EDIT: Checked a newsletter's source and that's what they already do. Since I don't deal with email marketing much, I honestly had no idea.
You've got that backward. If google fires off a request when you open the email, the companies can track whether you opened the email.
If google fires off a request when they get the email, the company has no way to track if you opened their email.
If google fires off requests for all emails they receive at @gmail.com, then the marketers won't even know if it's an active @gmail.com account; they'll just know that mail to *@gmail.com is received, which isn't information, it's an easy assumption.
Actually, if they do it on delivery, it ruins the signal (since there is no longer any distinguishing factor between active accounts and inactive accounts), so the latter protects privacy more effectively.
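The difference between the two policies fits in a few lines. This is only a model of the argument above, with made-up addresses, not a description of what Gmail does.

```python
def requests_seen(delivered: set[str], opened: set[str],
                  prefetch_on_delivery: bool) -> set[str]:
    """Return the addresses whose unique image the sender sees requested."""
    if prefetch_on_delivery:
        # Every delivered message triggers a fetch, opened or not.
        return set(delivered)
    # Otherwise only messages that were actually opened trigger a fetch.
    return delivered & opened


delivered = {"active@example.com", "dormant@example.com"}
opened = {"active@example.com"}

on_open = requests_seen(delivered, opened, prefetch_on_delivery=False)
on_delivery = requests_seen(delivered, opened, prefetch_on_delivery=True)

print(sorted(on_open))      # prints: ['active@example.com']
print(sorted(on_delivery))  # prints: ['active@example.com', 'dormant@example.com']
```

With fetch-on-open the request log separates active readers from dormant accounts; with fetch-on-delivery every address looks the same, so the log carries no signal.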
I hate to be "that guy", but does this mean Google is storing every one of our pics on their proxy servers? For how long do they store them, and what is their data retention policy?
Also, remembering that Google has no obligation to protect non-American users, does that give the NSA access to them, to run things like facial recognition, etc?
Because if you stop serving up the remote reference before it gets requested by a particular client, it won't be there.
If it's archived {immediately,on first request} permanently thereafter, there's now another copy outwith your control.
Edit: moreover, the images are requested locally by the user's browser. Google doesn't get a look at them at any point[1]. Whereas now, they get a permanent copy for free, because they're doing you such a favour!
[1] Directly, anyway. I guess they could use js to get the requests and headers and submit them back to the mothership, but on slow connections that could be pretty obvious.
Yeah, OK, but again, it's not really an invasion of privacy because they get the mails and the URLs in the first place. If you don't want your e-mails read by Google, you probably don't use Gmail.
Assuming worst-case NSA scenario, even if the images weren't hosted by Google, the NSA would just pull the external images anyway for facial recognition/what not.
Another commenter above noted that the cache duration is on the order of "a few minutes". If the email is opened a second time later on in the day, Google's cache hits the server again.
(Of course, this doesn't rule out the possibility that Google is actually holding onto each version they cache. I imagine it's not actually worth it for them to do so.)
That was my initial reaction too. I suppose it has its pros and cons just like anything else. However, with recent Google (among other companies) and NSA dealings I'm looking at this with my cynical eyes as another data retention policy.
Yes. Whilst you can embed images as MIME attachments (or data:// uri trickery and the like), the vast majority are <img> tags referencing external http:// uris. A message with remote image references is not a complete document, and won't render correctly unless those references can be followed. Google are instead fetching and caching those remote references, then serving them indirectly to you.
As comparison, Firefox has a 'save page' function which distinguishes between 'html only' (akin to the mail message), and 'complete', which would include all images, stylesheets, external js files, etc.
To me, it sounds like they did all the heavy lifting for the NSA, and then packaged it up as a "feature" for us.
If Google handed over the emails themselves to the NSA, it would contain a lot more noise, but now it's really simple for the NSA.
The work is obviously to sift through all the emails, remove the photos, and separate/store them on a separate server. Now all NSA needs to do is get a daily dump of all the pics, and then run their data collection/facial recognition on them.
I'm curious if this will help spammers. AFAIK, loading a tracking pixel helps validate an email address as active (since, by design, bounce messages probably wouldn't make it back to the spammer), even if the recipient didn't otherwise respond to the message. AFAIK, "Validated" email address lists are worth more than unchecked lists and if Google is preloading images for valid accounts, then that seems to make validation even easier for spammers.
There's still an "Ask before displaying external images" setting, and based on the description in the article, it looks like images are only requested when the email is opened, not when it is delivered to the inbox. But, this new system looks to be enabled by default, so more people will have images enabled, which means the web bugs will mostly get through now.
However, if they make requests (that they don't necessarily have to keep) for images for all accounts (preloading on receive and not read), it does the opposite, which is a good thing.
The problem is, if the filename/URL is unique to the user like "spammersite.tld/images/50093825343.jpg" and "50093825343" is tied to my unique email, then on Gmail's download and caching of the image, they've validated my email. If another email has 023503850485.jpg, Gmail wouldn't know that the underlying file is the same unless it loads it. I don't even have to have checked my mail for this to happen.
I kinda like this. More often than not, I never click the "Display Images" link because I know doing so will essentially feed information back to the people who sent the email (the fact that I opened it, my geoIP location, browser, etc). I've been fine with a slightly degraded and less pretty email experience in order to give me some extra privacy and keep my information out of marketer's systems.
With this I guess I can see emails with all the images and other goodies without that worry. Works for me.
On the other hand... There's a good chance Google is caching all this now. But seeing as they're running the mail system I don't feel like it's too major of an intrusion beyond what they already have.
Since images usually contain tracking information, I wonder how this helps? Maybe the proxy automatically gets and stores the image as soon as the email is received? So in this manner, they will not be able to tell if/when a user actually clicks on a link?
It devalues other tracking solutions, and makes Google's own paid analytics services more compelling. So users will still be tracked, but the trackers are more likely to have to share some of the money and data with Google.
Google offer paid email analytics? Out of interest do they have a free version like they do for websites too? All I can find online are things like [0] showing how to use web analytics to track emails etc.
Tracking is currently only useful insofar as receiving a URL request for the image indicates that someone opened the email, as well as providing whatever metadata is available via the request, through geo-locating the request IP or what-have-you.
This change makes all of that impossible: Google will (presumably) always request your image URL, whether the user opens the email or not, and the request will come from Google, with their metadata, not your target.
Because relative to the quantity of email Gmail has to process, the images returned from tracking links are likely a drop in the bucket, and requesting them all provides their users with even better privacy protection than the "Show Images" toggle did.
If you want to be cynical, you can note that Google will still know which emails you opened and which you did not. Does the current Gmail TOS restrict them from selling that information to advertisers, or (more likely) using it to target ads? Probably not!
I suppose if they're clever, they'll figure out when a sender is serving a million copies of the same image to slightly altered URLs in the same email template, and forgo the requests, but either way, the sender loses the analytics.
If you want to be cynical (or perhaps realist), you will see this as just another effort to push marketers to paying for expanded analytics data from Google.
I doubt any of this is done with the privacy of users in mind.
It doesn't sound that bad. First, apparently only affecting 2-5% of mail. Secondly, everything works the same except for the absence of useragent and cache control headers (which aren't that helpful anyhow for analytics, especially when you're still getting those on 90%+ of your messages).
If the Gmail proxy caches every image that is sent to a Gmail address, then this is bad for spammers, and good for everyone else. That's one less mechanism to verify a valid (or active) email address.
Just tried this out with https://emailprivacytester.com/ - The proxy request only happened when I viewed the email. It came from 66.249.88.50 (google-proxy-66-249-88-50.google.com) and had the User-Agent:
Mozilla/5.0 (Windows; U; Windows NT 5.1; de; rv:1.9.0.7) Gecko/2009021910 Firefox/3.0.7 (via ggpht.com)
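If a sender wanted to flag these proxied requests in their logs, a check based purely on the single observation above might look like this. The "(via ggpht.com)" suffix and the google-proxy-... hostname pattern come from that one test; whether Google keeps either stable is unknown, so treat both as assumptions.

```python
OBSERVED_UA = (
    "Mozilla/5.0 (Windows; U; Windows NT 5.1; de; rv:1.9.0.7) "
    "Gecko/2009021910 Firefox/3.0.7 (via ggpht.com)"
)


def looks_like_gmail_proxy(user_agent: str, reverse_dns: str, ip: str) -> bool:
    """Heuristic match against the one request observed in this thread."""
    ua_matches = user_agent.strip().endswith("(via ggpht.com)")
    # Hostname pattern inferred from google-proxy-66-249-88-50.google.com.
    expected_host = "google-proxy-" + ip.replace(".", "-") + ".google.com"
    return ua_matches and reverse_dns.lower() == expected_host


print(looks_like_gmail_proxy(
    OBSERVED_UA, "google-proxy-66-249-88-50.google.com", "66.249.88.50"
))  # prints: True
```

The reverse DNS name is passed in rather than looked up, so the function makes no network calls; a real log processor would have to resolve it separately.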
I guess the compromise is giving the user the option to choose: (i) I would like to be tracked, (ii) I don't want to be tracked. I genuinely dislike people who send me personal/work emails with tracking mechanisms...
So if you send a different picture to each recipient, you can track if they open your message? Or does Google pull all images, whether you view the message or not?
Yea, it's hard to not think this is ultimately not a great move. It seems rather shortsighted from a Gmail user's perspective if Google can't address concerns over spammers being able to verify email addresses, or even just analytic trackers. It is also vastly less appealing to hear that Google plans to cache all images, which we know Big Brother is grabbing as well. And in the case where someone may want to actually see images for a marketing email (albeit, extremely rare for me), it actually hamstrings the source of the email from possibly providing customized images/content based on geolocation/browser/etc. that I could be interested in seeing.
So, what's the upside vs. just having the option to display images as desired and NOT have Google cache them?
That feels a bit like the "I have nothing to hide argument." Should we just accept that probably spammers no longer send address verifying emails, and tacitly approve this change helping them out if they do? Ultimately, the default should be to let people decide for themselves to opt in to something like this. Forcing this with some pretty big head scratching holes in the benefits seems rather evil.
In some cases, senders may be able to know whether an individual has opened
a message with unique image links. As always, Gmail scans every message for
suspicious content and if Gmail considers a sender or message potentially
suspicious, images won’t be displayed and you’ll be asked whether you want
to see the images.
A mailer can set up a URL that is composed of random words and is unique per email.
Ex: www.tracker.com/weather-dog-city-nice.jpg could identify you + the timestamp of the request and bam, you have a record of: a valid email, an address that isn't blocked, and a user who reads emails from the sender. No proxy in the world would be able to make this request anonymous.
No idea what advantage of this is apart from google eventually offering an alternative to gmail (think of comment system being replaced on youtube)
> Of course, those who prefer to authorize image display on a per message basis can choose the option “Ask before displaying external images” under the General tab in Settings. That option will also be the default for users who previously selected “Ask before displaying external content”.
I think that's a great feature. Loading images manually is always annoying and isn't a great user experience. Novice users don't always understand why it's risky and they probably opt to display the images anyway.
However, one interesting question is how third-party analytics will work around this. Is there a way? Given that Gmail holds a large market share of email users, this is really going to negatively affect the usefulness of such services.
Seems it's being rolled out in a sloppy fashion. In my Gmail account, images in email have suddenly stopped working. There is still a "display images" button, and when I click it, the images do not load. When I attempt to open an image link from the email message in another tab, I get a 404 error on Google's proxy server, with a super long URL.
Disclosure that I'm from Campaign Monitor and wrote that first post. As you can imagine, we've been getting a fair few enquiries about image tracking and opens today...!
Just because it's through a proxy doesn't mean identifying information can't be embedded in the URL.... This is used as a "customer read email" callback.