Dropbox opening my docs?

yesbabyyes · on Sept 12, 2013

LibreOffice has a pretty powerful document conversion, which you can run headless. I'm guessing they are converting to HTML and perhaps other formats -- do they offer anything like that?

Edit: You can invoke it something like this:

    soffice --headless --convert-to html file.doc

I'm just speculating, but it seems reasonable that it would open the document just like the regular LibreOffice, fetch external resources and so on.

gjulianm · on Sept 12, 2013

It seems like they're converting it to PDF. When you click the .doc file in Dropbox, they open a PDF preview. It fits with the fact that the buzzes I've seen are only for the .doc file, not the HTML or the XLS.

the_mitsuhiko · on Sept 12, 2013

That's pretty much exactly what is happening. DropBox converts documents into HTML for easy viewing on the web interface.

skeletonjelly · on Sept 13, 2013

You seem pretty confident! Is your source you? A DropBox employee?

icoder · on Sept 13, 2013

abortz from DropBox has stated just this, in this thread, 4hrs before your post

davidu · on Sept 12, 2013

Dropbox used to use Crocodoc which was just acquired by Box... And now Dropbox doesn't use Crocodoc. And now we learn about LibreOffice...

Coincidence?

Just pointing it out...

altrego99 · on Sept 12, 2013

Why would LibreOffice allow the .doc to run whatever embedded macro it contains to call home?

milkshakes · on Sept 12, 2013

really?

you've already determined that it's running on an ec2 instance, but it's somehow "suspicious" that the user-agent is libreoffice? and you're a "security researcher" but "curious if this is an automated process"? please.

sure, dropbox might owe an explanation (even though you certainly gave them permission to do this in their TOS), and you can call me cynical and jaded, but this seems like pretty shameless FUD that appears to be tied to an effort to shill a new product.

EDIT: first i thought this was written by the HoneyDocs founder. now i'm actually unsure who the author is.

gjulianm · on Sept 12, 2013

The author of this post seems to be a friend of the founder: http://blog.threatagent.com/2013/09/whos-that-peeking-in-my-...

danielweber · on Sept 12, 2013

There's really no need to attack him.

milkshakes · on Sept 13, 2013

i'm not attacking him as a person, i'm attacking his actions, specifically in this case his shamelessly disingenuous linkbait guerrilla marketing masquerading as a public service announcement

brown9-2 · on Sept 12, 2013

It really does read like an unsubtle ad for the honey service.

hoffcoder · on Sept 13, 2013

It really does not matter, if something more serious like a vulnerability is in question. And so what if the link was a covert advertisement? The community still benefits when a popular product is put under the scanner. It is a win-win.

milkshakes · on Sept 13, 2013

except for the part where there is no actual vulnerability, it was actually a design decision[1], and the author of the article was raising ridiculous scaremongering questions that he knew the answers to in order to attract more attention.

[1]: https://news.ycombinator.com/item?id=6377712

sspiff · on Sept 12, 2013

> Further digging into the HoneyDocs data reveals a suspicious User Agent, LibreOffice. Now I’m curious if this is still an automated process or one that involves human interaction?

Yes, because humans use LibreOffice over SSH/X11 from an EC2 instance. Probably LibreOffice is being used for the parsing/rendering on a server. Probably for something innocent like generating thumbnails or text-only previews.

nigma · on Sept 12, 2013

They are generating PDF for online viewing. Go to your files on dropbox site and click on a .doc file. A preview popup will appear.

Open/LibreOffice with its Python bridge is quite handy for converting documents to PDF, and it can be run in headless mode (using a virtual frame buffer such as Xvfb) on a server.
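
Here is a minimal Python sketch of a server-side conversion step like this. It assumes the `soffice` binary is on `PATH` and uses LibreOffice's `--headless`, `--convert-to` and `--outdir` command-line options. The wrapper function names are hypothetical, and nothing here claims to show how Dropbox's own pipeline is built.

```python
from __future__ import annotations

import subprocess
from pathlib import Path


def build_convert_command(src: Path, outdir: Path, fmt: str = "pdf") -> list[str]:
    """Argument list for a headless LibreOffice conversion of one file."""
    return [
        "soffice",
        "--headless",            # no GUI
        "--convert-to", fmt,     # plain target extension, e.g. "pdf" or "html"
        "--outdir", str(outdir),
        str(src),
    ]


def convert(src: Path, outdir: Path, fmt: str = "pdf", timeout: float = 60.0) -> Path:
    """Run the conversion; return where LibreOffice should have written the output."""
    outdir.mkdir(parents=True, exist_ok=True)
    subprocess.run(
        build_convert_command(src, outdir, fmt),
        check=True,          # raise CalledProcessError on a non-zero exit
        timeout=timeout,     # keep one pathological document from hanging the worker
        capture_output=True,
    )
    # Assumes fmt is a bare extension (no ":FilterName" suffix).
    return outdir / f"{src.stem}.{fmt}"
```

For example, `convert(Path("report.doc"), Path("previews"))` should leave `previews/report.pdf` behind when LibreOffice is installed. The timeout is a design choice of this sketch, not something described in the thread.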

dweekly · on Sept 12, 2013

Dropbox uses (used?) Crocodoc to do its document previews, which would be interesting now that Crocodoc has been acquired by Box (a Dropbox competitor). Crocodoc actually ran full Windows VMs to have Word interpret Word, unlike what was speculated elsewhere here (using LibreOffice) - it turns out pretty much everything else sucks pretty badly at rendering Word docs, largely because the format is a bloody nightmare of binary encoded blobs including OLE embeds, etc. My understanding was that these VMs were run on AWS Windows instances, which explains why the document was seen opened on an AWS cluster. I know they had a fun nightmare of a time getting the right licenses from Microsoft to do this.

dweekly · on Sept 13, 2013

Whoops; I'm an idiot. The request had a UA of LibreOffice. Looks like Dropbox has indeed moved on from Crocodoc. My bad.

pwg · on Sept 12, 2013

Much ado about nothing.

If you don't want your cloud storage provider reading the data you give them, then _encrypt_ that data _before_ you upload it.

mikeash · on Sept 12, 2013

I disagree. If you must not have your cloud storage provider reading the data you give them, then encrypt it before you upload it. However, if you merely don't want them to, but it's not a big deal if they do, there's nothing wrong with expecting them not to go trawling through your data just because they can.

rexreed · on Sept 12, 2013

You might want to check out SafeMonk, which does exactly this. http://www.safemonk.com

gboudrias · on Sept 12, 2013

Am I reading this right? A third-party service that protects you from third-party services? And you have to install it everywhere? And it's not FLOSS? Please tell me I'm reading this wrong.

Edit: Okay I see it's based on FLOSS and that's great, but as far as I can tell they're still asking you to install binary blobs, which makes the whole thing pointless.

blcknight · on Sept 12, 2013

Install EncFS and use it on your Dropbox. No account, no binary blobs needed. You can compile all the bits for EncFS yourself, if you want.

bigiain · on Sept 12, 2013

This works, and works well.

I had a little trouble getting it to run on one of my older Mac OS X machines, but I'm pretty sure that was because the remains of a previous MacFUSE installation were messing things up.

There's also a MacOSX/iOS/Android/Windows commercial "wrapper" around EncFS which is fully compatible with the compiled-from-source versions of EncFS I've got running on Mac OSX and Linux (ARM and x86) - it's a "binary blob", but if your security/convenience tradeoff lets you consider that, have a look at BoxCryptor Classic:

https://www.boxcryptor.com/boxcryptor-classic

For me, the tradeoff of having secured/encrypted files available on iOS is worth the decrease in security from relying on Secomba GmbH not backdooring me at the request of the NSA or ASIO (my local security agency), or anybody further down the security agency or law enforcement food chain. I'm not actually trying to protect myself against targeted surveillance by any sufficiently powerful nation-state, but I feel good knowing I'm not quite so readily caught up in "dragnet" surveillance…

CCs · on Sept 12, 2013

TrueCrypt would work too: it does block-level encryption, and Dropbox syncs at the block level too.
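
To show why block-level sync and a block-encrypted container work together, here is a minimal sketch. It splits data into fixed-size blocks, hashes each block, and reports which blocks differ between two versions. Only those blocks would need to be re-uploaded. The 4 MiB block size is an assumption chosen for illustration.

```python
from __future__ import annotations

import hashlib

BLOCK_SIZE = 4 * 1024 * 1024  # assumed block size, for illustration only


def block_hashes(data: bytes, block_size: int = BLOCK_SIZE) -> list[str]:
    """SHA-256 digest of each fixed-size block of `data`."""
    return [
        hashlib.sha256(data[i:i + block_size]).hexdigest()
        for i in range(0, len(data), block_size)
    ]


def changed_blocks(old: bytes, new: bytes, block_size: int = BLOCK_SIZE) -> list[int]:
    """Indices of blocks in `new` that differ from (or did not exist in) `old`."""
    old_h = block_hashes(old, block_size)
    new_h = block_hashes(new, block_size)
    return [i for i, h in enumerate(new_h) if i >= len(old_h) or old_h[i] != h]
```

A sector-encrypted volume behaves roughly the same way. A small edit inside it rewrites only the encrypted sectors it touches, so most block hashes stay unchanged.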

bigiain · on Sept 12, 2013

Hmmm, I wonder if TrueCrypt adequately secures a hidden volume's existence from an attacker (Dropbox) who can watch the patterns of your block level writes?

CCs · on Sept 16, 2013

It's a content encryption app, not the Rubberhose file system or similar.

gboudrias · on Sept 12, 2013

That's more like it :)

rexreed · on Sept 12, 2013

From what I understand, and again, I'm not with the company, it's not a cloud service, but rather downloadable / installable software that encrypts prior to storage on the disk.

gboudrias · on Sept 12, 2013

Hmm, but then why do you need an account?

And yeah, I see what you mean, but if you don't have access to the source, you don't know what they're making you install. I'm a huge FLOSS advocate, but in this specific instance it's more my paranoia talking. I believe I can trust them now, but how many clients will they need to have before the NSA blackmails them?

It's still a step forward, somewhat, but I find it hard to believe that there could be a successful product based on putting the user in full control (which is needed for real security).

rexreed · on Sept 12, 2013

The account is needed so you can grant authorization to others to access your files. EncFS is great if you are the only user. If you want to grant others access to your encrypted files, you need an authorization / authentication mechanism, which is why there's an account needed - for permission control.

But if you're the only user / potential accessor of the files, single-user strong file-system based encryption works.

gboudrias · on Sept 13, 2013

Good point. I guess you can't share data without trusting someone.

eli · on Sept 12, 2013

Did you bother asking Dropbox what's going on?

This kinda reads like an ad for HoneyDocs...

B-Con · on Sept 12, 2013

I hate it whenever an article mentions a service or drops an affiliate link and someone's verdict is that the article looks like advertising. Do you prefer your reading content to be devoid of mentioning any products or brands? Should bloggers never make a dime off affiliate links?

Be concerned with the content and only the content. If the article has it, it's legit.

biot · on Sept 12, 2013

I translated your comment to Galician:

  Eu odio iso, cada vez que un artigo menciona un servizo ou cae dun
  enlace de afiliado e veredicto de alguén é que o artigo parece
  publicidade. Prefire o seu contido de lectura a ser desprovisto de
  mencionar os produtos ou marcas? Se bloggers nunca facer un centavo
  off ligazóns afiliados?
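  Estar preocupado co contido e só o contido. O artigo ten iso, é
  lexítimo.

(Roughly, in English: "I hate that, every time an article mentions a service or drops an affiliate link and someone's verdict is that the article looks like advertising. Do you prefer your reading content to be devoid of mentioning the products or brands? Should bloggers never make a cent off affiliate links? Be concerned with the content and only the content. The article has it, it's legitimate.")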

If you are curious about other languages, I encourage you to test it out on your own! You can sign-up for a demo account at https://translate-o-rama.io/

B-Con · on Sept 13, 2013

That's spam, not useful data. No one upvoted that hypothetical offering of data. People did find this article interesting.

And hey, maybe I am interested in learning that language. Assuming your forum was appropriate (say, at a Galician convention), that could be a useful thing.

biot · on Sept 13, 2013

With apologies to Arthur C. Clarke: Sufficiently advanced spam is indistinguishable from useful content.

B-Con · on Sept 14, 2013

That's actually a good quote-analogy.

I'm not the most pragmatic person ever, but let me pose this: If you can't tell the difference, do you honestly care?

biot · on Sept 14, 2013

If the utility is high enough, then I might not mind. The problem is when the content's utility is insufficiently advanced and it is followed by a plug for a very related product/service that solves the problem identified in the content. That scenario casts suspicion on the content: "Did the author write this only to plug the product/service?". In this instance, the Dropbox info was sufficiently useful (it appeared to be an original discovery) that I don't see it as a huge problem.

B-Con · on Sept 15, 2013

That's an entirely reasonable position.

smtddr · on Sept 12, 2013

> Be concerned with the content and only the content. If the article has it, it's legit.

Wrong. Context is everything. You cannot look at data in a vacuum. You need to look at where, why, when and how - especially when it's sensational; i.e. something that may cause someone to take action.

B-Con · on Sept 13, 2013

> You cannot look at data in a vacuum.

I never said you could. We're not discussing the philosophy of objective statements that don't require context, we're discussing whether true facts are tainted by subjective elements around them. By definition, they can't be.

If the content is good, it doesn't really matter why it's there. But if you believe the context implies that the data is incomplete (aka, biased) or actually wrong, that's obviously relevant.

Even if something was an ad, who cares? The data speaks for itself. You just need to verify the ad is correct and isn't misrepresenting itself. In this case, the data is fairly objective: they aren't comparing their product to someone else's, just pointing out an action that a product was able to help perform, and that action produced interesting data about another service.

The article didn't come off as an ad, either. My main gripe was that some people complain over any mention of affiliated services. (People also gripe about non-labeled affiliate links by independent-millionaire, popular, respected bloggers.)

eli · on Sept 12, 2013

I'm not a big fan of the content either. If you're going to imply Dropbox is doing something sneaky then I think you owe them the basic courtesy of a chance to comment or explain before you hit publish.

bigiain · on Sept 12, 2013

"Sneaky" or not – it's definitely unexpected behaviour. I certainly didn't expect .doc files I store in a Dropbox folder to magically go and fetch remote resources. Whether it's "explainable" or not (and I guess the "they're generating PDF thumbnails" is a plausible explanation), Dropbox haven't taken any steps to inform users that this happens.

There's little doubt that somebody will think up a way to take advantage of this to "leak" information.

I wonder if this happens before or after Dropbox's dedupe step? I wonder if that provides an avenue to extract useful data?

mhurron · on Sept 12, 2013

Content is modified by the context. Someone trying to raise warnings about a competitor's product should make you question the motives.

_3u10 · on Sept 12, 2013

This is equivalent to an appeal to authority.

Content is not modified by the context; a fact is either true or it is not. Everyone has a motive. It reminds me of how people call into question research sponsored by corporations, as if people who work in government-sponsored research are somehow automatically saints with no ulterior motive.

Judging someone's trustworthiness by their affiliate links is quite a silly line of deductive reasoning.

From the information provided, it seems simple enough to verify: embed an image via URL in a .doc file, upload it to Dropbox, and see if the URL is accessed. No need to argue about motive.
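
Here is a minimal sketch of the first step of that experiment. It follows the approach described in this thread: an HTML body saved with a .doc extension, ending in a remote 1x1 image whose URL carries a unique token. The host `canary.example.com`, the `/img/<token>.gif` path layout and the function names are all hypothetical. Whether a given reader or converter actually fetches the image depends on its settings.

```python
import secrets
from html import escape
from pathlib import Path

# Hypothetical callback endpoint; replace with a server you control.
CALLBACK_BASE = "https://canary.example.com/img"


def render_canary_html(body_text: str, token: str) -> str:
    """HTML body meant to be saved with a .doc extension."""
    return (
        "<html><body>\n"
        f"<p>{escape(body_text)}</p>\n"
        # 1x1 remote image: any reader that renders it issues a GET carrying the token
        f'<img src="{CALLBACK_BASE}/{token}.gif" width="1" height="1">\n'
        "</body></html>\n"
    )


def make_canary_doc(path: Path, body_text: str) -> str:
    """Write the canary file and return its token so hits can be matched to it."""
    token = secrets.token_hex(16)  # 32 hex chars, unique per document
    path.write_text(render_canary_html(body_text, token), encoding="utf-8")
    return token
```

Because each file gets its own token, a hit on the callback server identifies exactly which uploaded copy was opened.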

takluyver · on Sept 13, 2013

Certainly context is relevant. Understanding who is saying something and what their motives are helps you to judge how likely facts are to be true, and how much weight you should attach to opinions.

Calling some work into question because of the authors' motives isn't a claim that some other group has no ulterior motives at all. Certainly everyone has some motives, otherwise we would never get out of bed. But some of those motives will change the discussion more than others. E.g. when HP sponsors a study that finds that their own ink works out cheaper than buying remanufactured cartridges, it's perfectly sensible to be more suspicious of that than if a study by a consumer organisation found the same thing.

tedunangst · on Sept 12, 2013

HoneyDocs is a DropBox competitor?

meowface · on Sept 13, 2013

It does read like an ad for HoneyDocs, but this was actually the first time I've ever heard of it, and upon checking out their website it actually seems like an incredibly useful service. So if it is an ad, they appear to have succeeded.

amvp · on Sept 12, 2013

LibreOffice is commonly used as part of a system to convert and generate previews for MS Office files. I would assume it has something to do with thumbnail or preview generation. However, I don't see thumbnails or previews of .doc files (I do for images, for example) in the Dropbox web app, so maybe it's something they're testing?

Guillaume86 · on Sept 12, 2013

Isn't that a thumbnail generation from dropbox? I remember a thumbnail entry in their API.

hoopism · on Sept 12, 2013

That would make sense to me, especially since he only sees it on .doc files. Probably a thumbnail generator utility that uses a LibreOffice plugin. Very interested to find out...

There's a saying that likely applies here:

"When you hear hoofbeats, think of horses not zebras"

robinduckett · on Sept 12, 2013

Thumbnail generator is most likely. I'm surprised LibreOffice will allow a GET call to an external website like that, though.

VBprogrammer · on Sept 12, 2013

I hope that the servers running LibreOffice only have that job. LibreOffice has a pretty massive attack surface, and it's not the kind of thing I'd like to leave running on a server with another purpose while accepting documents from pretty much anyone.

The only thing to see here is that DropBox is potentially opening itself up to a vulnerability. It would be interesting to see if a GET of file:///etc/passwd worked...

steven777400 · on Sept 12, 2013

On the one hand, it seems unlikely that an automated process would trigger external resource retrieval. In the same way, most processes that scan webpages for content or similarities don't run JavaScript, unless they are very sophisticated (this used to be a good way to protect against spam bots, for instance).

On the other hand, given how many files are uploaded to Dropbox every hour, it's inconceivable that a human, whether through deliberate management direction or mischief, is opening all these documents. I would be more concerned about human intervention if, occasionally, a document triggered a buzz some days after it had been uploaded.

If all documents are showing as opened within 10 minutes, then surely it is just an anti-duplication automated agent at work.

blcknight · on Sept 12, 2013

I just tried this with a doc file, and the buzz was nearly instantaneous, within seconds -- 3 buzzes total.

Certainly it's automated.

Paranoid part of me says it's NSA keyword scanning. I feel a little insane suggesting that, but it's certainly conceivable these days.

The other possibility is Dropbox is indexing the files for search?

Anyway, using Dropbox unencrypted is a terrible idea. EncFS has user-friendly frontends like Boxcryptor.

gjulianm · on Sept 12, 2013

Keyword scanning shouldn't resolve images, unless they're using OCR to read any text they have. In that case, they'd be wasting a lot of resources.

blcknight · on Sept 12, 2013

Where do you get that this is using images?

Honey Docs doesn't actually explain what the callback looks like in the doc file, but it doesn't look like it has anything to do with images.

mkopinsky · on Sept 12, 2013

The HTML, DOC, and XLS files all have identical structure (though different content). They are all HTML, and Honey Docs relies on Word/Excel parsing the HTML in those files and fetching the image (a 1px GIF).

I downloaded the credit card Honeydoc. The content looks like:

    <html>Nicole  Davis  4556062729618215<br />
    Brian  Baker  4556767839126624<br />
    Patrick  Jones  4916615717158539<br />
    ....
    <br>
    <br>
    <br>
    ....
    <img src="https://honeydocs.herokuapp.com/img/html/202719bb5717d5621068780180abc593b0fedda692bd63727a510911d21fdcbf.gif">
    </html>

mkopinsky · on Sept 12, 2013

Oh, and FTR, Excel gives a warning before opening the file. So they at least have thought through this vector (if you want to call it that).

rgbrenner · on Sept 12, 2013

> If all documents are showing as opened within 10 minutes, then surely it is just an anti-duplication automated agent at work.

It could perhaps be from generating a thumbnail... but dedup wouldn't work like that. I would be very concerned about a dedup algorithm that requires interpreting the contents of a file and dedups based on that.

snowwrestler · on Sept 12, 2013

As I read it, the whole point of the "HoneyDoc" concept is that any access to the file generates a GET request. In other words it is specially crafted to ensure external resource retrieval.

Understanding the nature of the DropBox access would start with understanding how a "HoneyDoc" does what it claims it does.
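
To see what a HoneyDoc-style callback looks like from the receiving end, here is a minimal listener built only on Python's standard `http.server`. It logs the token, the client address and the User-Agent of each image request, and answers with a 1x1 GIF. The User-Agent is how the article identified LibreOffice. The `/img/<token>.gif` layout is a hypothetical convention, not HoneyDocs' actual scheme.

```python
import base64
import re
from http.server import BaseHTTPRequestHandler, HTTPServer
from typing import Optional

# Standard 1x1 transparent GIF.
PIXEL = base64.b64decode("R0lGODlhAQABAIAAAAAAAP///yH5BAEAAAAALAAAAAABAAEAAAIBRAA7")

# Hypothetical URL layout: /img/<32 hex chars>.gif
TOKEN_RE = re.compile(r"/img/([0-9a-f]{32})\.gif")


def parse_token(path: str) -> Optional[str]:
    """Return the token in a request path, ignoring any query string."""
    match = TOKEN_RE.fullmatch(path.split("?", 1)[0])
    return match.group(1) if match else None


class CanaryHandler(BaseHTTPRequestHandler):
    def do_GET(self) -> None:
        token = parse_token(self.path)
        if token is None:
            self.send_error(404)
            return
        # log_message prefixes the client address; we add the token and User-Agent.
        self.log_message("hit token=%s ua=%r", token, self.headers.get("User-Agent"))
        self.send_response(200)
        self.send_header("Content-Type", "image/gif")
        self.send_header("Content-Length", str(len(PIXEL)))
        self.end_headers()
        self.wfile.write(PIXEL)


def serve(port: int = 8000) -> None:
    """Block forever, logging hits to stderr."""
    HTTPServer(("", port), CanaryHandler).serve_forever()
```

Calling `serve()` and then opening the document in any renderer that fetches remote images should produce one log line per fetch. That is the "buzz" the article describes.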

BWStearns · on Sept 12, 2013

Dedup could just use hashing, which wouldn't require opening the file with LibreOffice.

rgbrenner · on Sept 12, 2013

Most dedup algos do compare the data. If you just use the hash, there's a very small chance of a hash collision causing file corruption / data loss. And depending on how common that block is, the corruption could affect a large number of files.

It's a byte comparison, so you still wouldn't use LibreOffice to compare files.
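
Here is a minimal in-memory sketch of the dedup scheme described above. Blocks are keyed by SHA-256, and a hash hit is trusted only after a byte-for-byte comparison. The class and method names are hypothetical. Nothing in it parses document contents, which is why dedup alone would not explain a LibreOffice User-Agent.

```python
from __future__ import annotations

import hashlib
from collections import defaultdict


class DedupStore:
    """Content-addressed block store that verifies hash hits byte for byte."""

    def __init__(self) -> None:
        # digest -> distinct blocks sharing that digest (normally exactly one)
        self._buckets: dict[str, list[bytes]] = defaultdict(list)

    def put(self, block: bytes) -> tuple[str, int]:
        """Store `block` if it is new; return a (digest, slot) reference to it."""
        digest = hashlib.sha256(block).hexdigest()
        bucket = self._buckets[digest]
        for slot, existing in enumerate(bucket):
            if existing == block:  # byte comparison guards against hash collisions
                return digest, slot
        bucket.append(block)
        return digest, len(bucket) - 1

    def get(self, ref: tuple[str, int]) -> bytes:
        digest, slot = ref
        return self._buckets[digest][slot]

    def unique_blocks(self) -> int:
        return sum(len(bucket) for bucket in self._buckets.values())
```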

0x0 · on Sept 12, 2013

Many XML parsers will resolve external entities by default.

dotmanish · on Sept 12, 2013

Could be this (thumbnails): https://www.dropbox.com/developers/core/docs#thumbnails

phaer · on Sept 12, 2013

LibreOffice is not necessarily a sign of human involvement in the process, as it comes with a command-line interface for converting documents between various formats. So it could be thumbnail generation, as Guillaume86 suggested.

nonchalance · on Sept 12, 2013

If someone from honeydocs is reading this ...

The tracking behavior depends on a tracking pixel which may not always be processed by the client.

For example, with the credit_cards sample, the xls file is actually an HTML file with an img at the end (url linking to https://honeydocs.herokuapp.com/img/xls/...) and a client that only reads the plaintext (there are a boatload of command line utilities that fit the bill) won't fetch the image.
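
As a sketch of such a plaintext-only client, Python's standard `html.parser` can pull the text out of a file like that sample without ever requesting the `<img>` URL. The parser only tokenizes markup and performs no network I/O. The class and function names are hypothetical.

```python
from __future__ import annotations

from html.parser import HTMLParser


class TextOnly(HTMLParser):
    """Collect text nodes; <img> tags are noted but never fetched."""

    def __init__(self) -> None:
        super().__init__()
        self.chunks: list[str] = []
        self.skipped_urls: list[str] = []

    def handle_data(self, data: str) -> None:
        if data.strip():
            self.chunks.append(data.strip())

    def handle_starttag(self, tag, attrs) -> None:
        # Record (but do not request) any remote image reference.
        if tag == "img":
            self.skipped_urls.extend(v for k, v in attrs if k == "src" and v)


def extract_text(markup: str) -> tuple[list[str], list[str]]:
    parser = TextOnly()
    parser.feed(markup)
    parser.close()
    return parser.chunks, parser.skipped_urls
```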

ryanackley · on Sept 12, 2013

Dropbox uses crocodoc for MS Office file previews in the browser as html and my guess is crocodoc's tech is based on a custom print driver for LibreOffice that converts it into html.

mrbill · on Sept 12, 2013

Dedupe (at least for NetApp systems) only cares about data blocks; it wouldn't "open" a document or parse contents.

https://communities.netapp.com/community/netapp-blogs/drdedu...

guiambros · on Sept 13, 2013

Coincidentally (or not), I just received the invite to beta test Sync.com [1] today. Seems a Dropbox-clone, for the privacy conscious user. They claim that all files are encrypted, and they don't have access to the keys. The encryption algorithm is still private, but they say they'll open source it soon.

While I like the approach a lot more than Dropbox (that fights to obfuscate its own algorithm), I still don't feel safe. Anyone with access to the server could intercept your keys, and thus have access to your data.

TrueCrypt over some cloud-based solution is still the ideal option, but the lack of support for sparse images makes me hesitant.

EDIT: no affiliation with Sync.com (or Dropbox, for that matter). Just trying to find a decent cloud-based storage solution that fixes the exact problem exposed by the OP.

[1] http://www.sync.com/your-privacy

yk · on Sept 12, 2013

For further analysis, I would suggest embedding something nasty into a .doc. [1] Seriously, why would Dropbox execute code in arbitrary files? The only reason I can see is some virus scanner heuristic. They could then spin up a new VM, load the file, and diff the VM against a clean one. Or, as others suggested, they generate thumbnails; that, together with the 10-minute delay, would imply that they are running remote code on some batch processing machine (where a lot of other files are up for grabs). Either way, it does smell somewhat.

[1] I am not sure how LibreOffice handles active content, and furthermore I am not sure if there is a way to generate a ping back from LibreOffice without some kind of active content embedded. But to me at least, it somewhat implies that Dropbox, or whoever, runs LibreOffice in a not maximally locked-down configuration.

sdfjkl · on Sept 13, 2013

When you click on a .doc file in the Dropbox web interface, you get a preview of the file in PDF format. To do this, Dropbox must open and convert the file. LibreOffice is popular for this, as it can be run in a headless API mode, reads a wide range of files and can output PDF format. So this is what happens here.

The wisdom of executing "active" content embedded in such files is of course doubtful and something Dropbox should investigate. But if you want your files to be safe, you should instead use a service that encrypts them client side, which has the downside of losing the web interface that Dropbox offers (as this requires it to be able to access the decrypted files in order to serve them to you).

rexreed · on Sept 12, 2013

Posted this reply elsewhere, but SafeMonk encrypts your files before they hit your hard drive and keeps them encrypted in the Dropbox cloud. It's free for personal use: http://www.safemonk.com. Note: this is not my product, just using it after I saw it demo'd at a TechBreakfast.

hoopism · on Sept 12, 2013

In retrospect this was a very well done ad for HoneyDocs... I checked out the service and thought it was novel... wouldn't have looked if not for this.

The article is written in such a way that they are saying a lot by playing dumb... so it's hard to say it's misleading... but I know few security people who'd write something up in this tone.

VuongN · on Sept 12, 2013

I think this is a great example of why we should ask questions about cloud security & privacy. I've written down some thoughts about this: http://vuongnguyen.com/personal-business-cloud-security.html

-V.

four12 · on Sept 12, 2013

Yay Little Snitch...

http://imgur.com/FfbenAb

ValG · on Sept 12, 2013

Site is up and down. Quick Cache: http://webcache.googleusercontent.com/search?q=cache:http://...

jayd16 · on Sept 13, 2013

To me, the interesting part isn't that the file was read. What has me interested is that this is a clear attack vector.

Want some free EC2 time? Wrap your workload in a .doc and have Dropbox foot the bill.

jasonj79 · on Sept 13, 2013

https://crocodoc.com/customers/

Crocodoc is likely generating web previews of your documents.

Michael_Murray · on Sept 12, 2013

What was that article the other day about "Stealing Traction" from an established player in an adjunct space?

Well played, HoneyDocs... Well played.

gocard · on Sept 12, 2013

In case you were wondering, I descrambled the blanked out "png" files and the filenames were "jennymccarthy[01-04].png"

devx · on Sept 12, 2013

It's so annoying when Google completely opens up archive files in Drive, too. Why would they do that?!

whywhywhy5 · on Sept 12, 2013

I'm sure it's just the perfectly legal NSA browsing through your files. No need to worry.

ck2 · on Sept 12, 2013

It's probably just a MITM review by the NSA Flying Pig

http://www.techdirt.com/articles/20130910/10470024468/flying...

shmageggy · on Sept 12, 2013

google cached version:

http://webcache.googleusercontent.com/search?q=cache:www.wnc...

jlkinsel · on Sept 12, 2013

Time to write a little VBScript to port scan me some Dropbox servers...

atmosx · on Sept 12, 2013

I don't know how to do that in VB and I'm sooo proud of it! :-P

madaxe · on Sept 12, 2013

I would wager that they're opening it in order to generate a thumb or preview, or maybe for search indexing, and libreoffice is a good way to achieve this on linux - particularly if they're only opening it once, as they probably use the hash of the file.

We do exactly this on our eCommerce platform, before wanging stuff into s3 or glacier and just keeping a reference kicking around.

On the other hand, you have just discovered an information disclosure (host IPs) vulnerability in dropbox.

tptacek · on Sept 12, 2013

This seems unsafe; if I understand what this person has done, he'd essentially be coercing Dropbox's backend services to open arbitrary links on his behalf. That's a very dangerous capability to expose to adversaries.

milkshakes · on Sept 12, 2013

to be fair, it's possible that dropbox understands this and has taken steps to sandbox and isolate the process that does this fetching from the rest of their internal infrastructure. if this is done for the purposes of generating thumbnails/online previews, and the .doc includes external resources, what other choice do they have but to fetch it?

abstractbill · on Sept 12, 2013

The machine isn't the only thing at risk. Given this setup, it seems possible to use dropbox nodes to ddos an external target, just by uploading lots of documents, each containing lots of these links. It doesn't seem like they should be fetching external resources at all.

danielweber · on Sept 12, 2013

There are lots of services that generate traffic on your behalf. A very general rule is that you should have to send at least as many bytes as the service does, lest you become a DDOS multiplier.

I don't see a .doc file getting small enough to outsize a HTTP request inside of it, even if you used some funky compression, but I'm willing to hear otherwise.

One question would be if you could upload the document once and then somehow trigger a very tiny edit that causes them to rescan it.
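
That rule of thumb can be expressed as an amplification factor: the bytes the service sends on your behalf divided by the bytes you had to send. Below is a minimal sketch. The document size, link count and per-request size in the usage line are illustrative assumptions, not measurements.

```python
def amplification(bytes_in: int, bytes_out: int) -> float:
    """Bytes the service emits per byte the client sent; > 1 means a DDoS multiplier."""
    if bytes_in <= 0:
        raise ValueError("bytes_in must be positive")
    return bytes_out / bytes_in


def doc_amplification(doc_size: int, links: int, request_size: int) -> float:
    """Outbound request bytes triggered by one uploaded document, per uploaded byte."""
    return amplification(doc_size, links * request_size)
```

With these assumed numbers, `doc_amplification(20_000, 100, 400)` is 2.0. Whether a .doc can "outsize" its embedded requests therefore depends on how densely links can be packed. A cheap rescan trigger would multiply `bytes_out` without re-sending the document.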

abortz · on Sept 12, 2013

Hi everyone, this is Andrew from Dropbox.

We do use LibreOffice to render previews of Office documents for viewing in a browser, and have permitted external resource loading to make those previews as accurate as possible. While this could theoretically be used for DDoS, we haven’t seen any such behavior. However, just to be extra cautious we’ve temporarily disabled external resource loading while we explore alternatives.

danielweber · on Sept 13, 2013

As one part of your solution, I recommend restricting the machines that can make outbound requests to a certain pool, and then limit that pool's total bandwidth, throwing an alarm whenever the limit is hit.

It may be that you are big enough that even the limited bandwidth you need for normal operations is enough to take out smaller hosts, so you'd need to measure and monitor to see how well this works.
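
Here is a minimal token-bucket sketch of that egress cap. It assumes a single process meters all outbound bytes for the pool. The class name, rates and the `on_limit` alarm callback are hypothetical. Time is passed in explicitly so the behavior is deterministic.

```python
from typing import Callable


class EgressBudget:
    """Token bucket: refills at `rate` bytes/second, holds at most `burst` bytes."""

    def __init__(self, rate: float, burst: float, on_limit: Callable[[int], None]) -> None:
        self.rate = rate
        self.burst = burst
        self.tokens = burst
        self.last = 0.0
        self.on_limit = on_limit  # hypothetical alarm hook, e.g. page an operator

    def allow(self, nbytes: int, now: float) -> bool:
        """Return True if `nbytes` may be sent at time `now`; otherwise raise the alarm."""
        elapsed = max(0.0, now - self.last)
        self.tokens = min(self.burst, self.tokens + elapsed * self.rate)
        self.last = now
        if nbytes <= self.tokens:
            self.tokens -= nbytes
            return True
        self.on_limit(nbytes)
        return False
```

In practice `now` would come from `time.monotonic()`. The measure-and-monitor step then comes down to tuning `rate` and `burst` against normal preview traffic.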

helium · on Sept 13, 2013

Hi Andrew, thanks for the explanation.

Could Dropbox perhaps let me disable this feature? I almost never use the web interface so I wouldn't miss it and I prefer that my documents are not opened after being synched.

abstractbill · on Sept 12, 2013

One question would be if you could upload the document once and then somehow trigger a very tiny edit that causes them to rescan it.

That does seem likely - dropbox tries to only upload diffs, when a file gets changed: https://www.dropbox.com/help/8/en

milkshakes · on Sept 12, 2013

again, it's not inconceivable that they understand this as well, and have some sort of rate limiting system in place. do you have a problem with google docs converting your office files?

werid · on Sept 12, 2013

There was a case a while back where Google Docs helped a guy rack up a huge AWS bill.

http://www.behind-the-enemy-lines.com/2012/04/google-attack-...

veidr · on Sept 13, 2013

Wow that is an interesting blog post, with a happy ending to boot (Amazon refunded the $1000+ in bandwidth fees, due to the accidental nature of the usage).

MiguelHudnandez · on Sept 12, 2013

> what other choice do they have but to fetch it?

They could not fetch it and have a little blank bit in the thumbnail.

Chances are they're using a library they didn't develop and did not think of the possibility of external resources being loaded.

Edit: The most secure way I can think to handle preview generation is to have a virtual machine firewalled from the internet that previews a single document and is then reverted.

_3u10 · on Sept 12, 2013

Docker would probably be better, as you don't have the huge VM overhead and it naturally reverts to its original state.

MiguelHudnandez · on Sept 12, 2013

I agree -- It's the same concept but much more efficient.

I'd set up a Docker container to accept a single HTTP POST with the document and return the thumbnail. The container can then be shut down and a new instance spun up to wait for the next document.

It might be wasteful to spin up a new container for each document, but it's the only way to prevent some exploit in LibreOffice[1] from leaking information somehow. A leak could be as terrible as embedding an entire document in the next thumbnail, or as simple as returning the wrong thumbnail (e.g. from a previous request).

[1] LibreOffice was the user-agent that phoned home in the article.
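A minimal sketch of the one-container-per-document idea, written in Python around the Docker CLI. The image name `libreoffice-preview` is hypothetical (assumed to contain LibreOffice's `soffice`), and the sketch swaps the HTTP POST interface for a single read-only mounted file. `--rm` throws the container away after each run, and `--network none` means embedded URLs simply cannot be fetched.

```python
import subprocess
from pathlib import Path

# Hypothetical image name, assumed to have LibreOffice (soffice) installed.
IMAGE = "libreoffice-preview"


def preview_command(doc: Path, out_dir: Path, image: str = IMAGE) -> list[str]:
    """Build a `docker run` command that converts one document to PDF inside a
    throwaway container with no network access.

    Assumes neither path contains a colon, since `-v` uses ':' as a separator.
    """
    doc = doc.absolute()
    out_dir = out_dir.absolute()
    return [
        "docker", "run",
        "--rm",                  # delete the container as soon as it exits
        "--network", "none",     # no network: embedded URLs cannot be fetched
        "--read-only",           # root filesystem is read-only
        "--tmpfs", "/tmp",       # writable scratch space
        "-e", "HOME=/tmp",       # LibreOffice writes its user profile under HOME
        "--memory", "512m",      # cap memory (helps against decompression bombs)
        "-v", f"{doc}:/in/{doc.name}:ro",   # mount only this one file, read-only
        "-v", f"{out_dir}:/out",
        image,
        "soffice", "--headless", "--convert-to", "pdf",
        "--outdir", "/out", f"/in/{doc.name}",
    ]


def render_preview(doc: Path, out_dir: Path) -> Path:
    """Run the conversion and return the path where soffice writes the PDF."""
    subprocess.run(preview_command(doc, out_dir), check=True)
    return out_dir / (doc.stem + ".pdf")
```

A production version would also need a time limit, since a hung LibreOffice process would otherwise keep the container alive.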

Kudos · on Sept 12, 2013

It makes more sense to have it fetch them via a proxy.

_3u10 · on Sept 12, 2013

Fetching via a proxy really doesn't do much; all you lose is the originating IP of the machine, and the rest of the vulnerability still works.

If you're thinking of egress filtering everything except the proxy, you can just HTTP-tunnel right through it.

MiguelHudnandez · on Sept 12, 2013

Also there's a possibility of processing embedded links which point at your internal network.

"How did this HTTP GET go through to my 'firewalled' phpMyAdmin site?"

You have to treat all user input as if it's toxic.
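The "treat it as toxic" point can be sketched as a check that runs before any embedded URL is fetched. This Python sketch (function names are illustrative) resolves the host and refuses anything that is not a globally routable address, which rules out private ranges such as 10.0.0.0/8, loopback, and the link-local 169.254.169.254 address that EC2 uses for instance metadata. It is only a first line of defense: the check and the real fetch resolve DNS separately, so a DNS-rebinding attacker could still get through unless the fetch connects to the address that was vetted.

```python
import ipaddress
import socket
from urllib.parse import urlsplit


def resolve_addresses(host: str) -> list:
    """Return the IP addresses for host (an IP literal or a DNS name)."""
    try:
        return [ipaddress.ip_address(host)]
    except ValueError:
        pass
    infos = socket.getaddrinfo(host, None)
    # Strip any IPv6 zone index such as "%eth0" before parsing.
    return [ipaddress.ip_address(info[4][0].split("%")[0]) for info in infos]


def is_fetch_allowed(url: str) -> bool:
    """Allow only http(s) URLs whose host resolves exclusively to global addresses."""
    try:
        parts = urlsplit(url)
        host = parts.hostname
        if parts.scheme not in ("http", "https") or not host:
            return False
        addresses = resolve_addresses(host)
    except (ValueError, OSError):  # malformed URL or failed DNS lookup
        return False
    return bool(addresses) and all(addr.is_global for addr in addresses)
```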

_3u10 · on Sept 12, 2013

Brilliant! I wonder what kind of error messages LibreOffice embeds when it can't fetch a resource. If they're distinct, you could map out the internal network pretty quickly.

Also, docx files are zip files, which opens up the possibility of a zip bomb. I wonder if LibreOffice has any protection against zip bombs.

chc · on Sept 13, 2013

Don't you need recursive decompression to make a zip bomb go off?

_3u10 · on Sept 13, 2013

Kinda... for 42.zip, the first layer alone is 68 GB, while the final layer is 4.5 PB.

It wouldn't surprise me if one could engineer a docx bomb that consumes gigabytes of memory.
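A minimal sketch of a guard a converter could run before handing a .docx to LibreOffice, using Python's `zipfile`. Rather than trusting the sizes declared in the archive's headers (which the uploader controls), it decompresses each member in chunks and gives up as soon as a total byte budget is exceeded. The 100 MiB default budget is an arbitrary assumption.

```python
import io
import zipfile

DEFAULT_BUDGET = 100 * 1024 * 1024  # assumed limit: 100 MiB of total decompressed data
CHUNK = 64 * 1024


def is_within_budget(data: bytes, budget: int = DEFAULT_BUDGET) -> bool:
    """Stream-decompress every member of a zip archive (e.g. a .docx) and return
    False as soon as the running total of decompressed bytes exceeds `budget`.

    Only one layer is checked; nested archives (as in 42.zip) would need the
    same check applied recursively.
    """
    total = 0
    with zipfile.ZipFile(io.BytesIO(data)) as zf:
        for info in zf.infolist():
            with zf.open(info) as member:
                while chunk := member.read(CHUNK):
                    total += len(chunk)
                    if total > budget:
                        return False
    return True
```

Stopping mid-stream matters: the check never holds more than one chunk in memory, so a bomb is rejected after at most `budget` bytes of work.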

badman_ting · on Sept 12, 2013

That's what I was thinking too. The only catch is that this process runs on an AWS instance, so it would have to be on Dropbox's VPN or something similar to have that kind of access. Whether by luck or by design, I hope these boxes aren't connected to anything sensitive.

njharman · on Sept 12, 2013

>> what other choice do they have but to fetch it?

Firewall off unexpected outbound connections from the machines doing the processing.

UnoriginalGuy · on Sept 12, 2013

It is possible, but Dropbox doesn't exactly have a good security record.

rpledge · on Sept 12, 2013

Or worse: if, as other threads speculate, LibreOffice is being used to generate previews of the docs, then an exploit in LibreOffice could be used to gain access to Dropbox's backend.

_3u10 · on Sept 12, 2013

Yup, it's massively unsafe: find a bug in LibreOffice, write an exploit, gain control of the doc thumbnail servers, and read everyone's newly submitted docs.

wildmXranat · on Sept 12, 2013

This x 10. If opening doc files is an intended feature, making requests to these embedded URLs doesn't sound good at all.

snowwrestler · on Sept 12, 2013

I'm more concerned about the concept of a document that can issue a GET request just by being opened. It sounds exactly like a phishing payload.

glitch003 · on Sept 12, 2013

Why? Someone embeds an image in the doc or HTML file (http://2.bp.blogspot.com/-iarq5sjWDWc/TWRxt8nPegI/AAAAAAAAA_...), and when the document is opened, Word (or LibreOffice in this case) tries to pull down the image to display it. Nothing fancy.

badman_ting · on Sept 12, 2013

"Person puts some characters in the query string, and the web application reads it. Nothing fancy." I just described SQL injection.

fphhotchips · on Sept 12, 2013

SQL Injection isn't fancy. That's why it's such a bad vulnerability.

anonymouz · on Sept 12, 2013

It should at least ask before loading embedded remote resources, like e-mail clients do.
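What "ask first, like e-mail clients do" could look like for an HTML document: a minimal Python sketch using the standard library's `html.parser` to collect remote `src` URLs, so a viewer could hold them back until the user opts in. The tag list is an assumption kept short for brevity; real mail clients also deal with CSS `url(...)`, `srcset`, `<link href>`, and more.

```python
from html.parser import HTMLParser
from urllib.parse import urlsplit

# Tags whose src attribute triggers an automatic fetch when rendered (not exhaustive).
FETCHING_TAGS = {"img", "script", "iframe", "audio", "video", "source", "embed"}


def is_remote(url: str) -> bool:
    """True for http(s) URLs and protocol-relative URLs like //host/x.png."""
    parts = urlsplit(url)
    return parts.scheme in ("http", "https") or (not parts.scheme and bool(parts.netloc))


class RemoteResourceFinder(HTMLParser):
    """Collect URLs that a renderer would fetch automatically."""

    def __init__(self):
        super().__init__()
        self.remote = []

    def handle_starttag(self, tag, attrs):
        # Self-closing tags such as <img ... /> also end up here, because the
        # default handle_startendtag calls handle_starttag.
        if tag not in FETCHING_TAGS:
            return
        for name, value in attrs:
            if name == "src" and value and is_remote(value):
                self.remote.append(value)


def find_remote_resources(html: str) -> list[str]:
    finder = RemoteResourceFinder()
    finder.feed(html)
    finder.close()
    return finder.remote
```

A viewer would render with these URLs blocked and show an "load remote content?" prompt whenever the list is non-empty.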

Too · on Sept 14, 2013

Create a text file on your desktop, called hack.html, with the following content.

    <img src="https://news.ycombinator.com/y18.gif" />

Double-click it to open it in your favorite browser.

tlogan · on Sept 12, 2013

I wonder whether you know of any storage provider that doesn't do this. As far as I know, each of these storage providers offers previews (including embedded images), so they do need to open arbitrary links.

joe_the_user · on Sept 13, 2013

Parent: I would wager that they're opening it in order to generate a thumb or preview, or maybe for search indexing

Article: Uploaded Documents to Dropbox Personal Account with Private Folders

So Dropbox indexes and creates thumbnails for private documents? Is that because the NSA gives a bounty for friendly UIs, perhaps?

skeletonjelly · on Sept 13, 2013

What's wrong with thumbnails for private documents? Thumbnails are thumbnails.

altrego99 · on Sept 12, 2013

Why would they open the .doc and let it run whatever embedded macro does the job of calling home?