
Very likely, this was not done intentionally.

I think we can simply imagine a common scenario: some employee working for Company X, developing a compiler suite, and adding necessary optimizations for Company X's processors. Meanwhile, Company Y's processors don't get as much focus (perhaps due to the employee not knowing about Company Y's CPUIDs, supported optimizations for different models, etc.). Thus, Company Y's processors don't run as quickly with this particular library.

Why does this have to be malicious intent? Surely it's not surprising to you that Company X's software executes quicker on Company X's processors: I should hope that it does! The same would hold true if Company Y were to develop a compiler; unique features of their processors (and perhaps not Company X's) should be used to their fullest extent.



No, this was definitely intentional. Intel is doing extra work to gate features on the manufacturer ID when there are feature bits which exist specifically to signal support for those features (and these bits were defined by Intel themselves!).

If they had fixed the issue shortly after it was publicly disclosed it might have been unintentional, but this issue has been notorious for over a decade and they still refuse to remove the unnecessary checks. They know what they're doing.


The thing is: the bits to check for SSE, SSE2, ..., AVX, AVX2, AVX-512? They're in the same spot on Intel and AMD CPUs. So you don't need to switch based on manufacturer. The fact that they force a `GenuineIntel` check makes it seem malicious to many.
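
For illustration, here's a minimal sketch (my own, using GCC/Clang's cpuid.h helpers, and omitting the XGETBV/OS-support check you'd also want before using AVX) of the vendor-neutral way to do it, reading the same feature bits on Intel and AMD:

    #include <cpuid.h>   /* GCC/Clang: __get_cpuid, __get_cpuid_count, bit_* */
    #include <stdio.h>

    int main(void)
    {
        unsigned int eax, ebx, ecx, edx;

        /* Leaf 1: the SSE2 and AVX bits live in the same place on Intel and AMD. */
        if (__get_cpuid(1, &eax, &ebx, &ecx, &edx)) {
            printf("SSE2:    %d\n", (edx & bit_SSE2) != 0);
            printf("AVX:     %d\n", (ecx & bit_AVX) != 0);
        }

        /* Leaf 7, subleaf 0: AVX2 and the AVX-512 foundation bit. */
        if (__get_cpuid_count(7, 0, &eax, &ebx, &ecx, &edx)) {
            printf("AVX2:    %d\n", (ebx & bit_AVX2) != 0);
            printf("AVX512F: %d\n", (ebx & bit_AVX512F) != 0);
        }
        return 0;
    }

Nothing in there ever asks who made the chip.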


All browsers pretend to be MSIE (and all compilers pretend to be GCC). You'd think AMD would make it trivial to change the vendor ID string to GenuineIntel for "compatibility".


That's not how these CPUs work.

The CPUID instruction lets software ask the CPU whether an instruction set is supported. Code emitted by Intel's compiler only queries whether the instruction set exists if the CPU is from Intel, instead of just always checking the feature bits.

AMD can choose to implement (or not) any instruction set that Intel specifies, and Intel can choose to implement (or not) any instruction set AMD specifies; however, it would in 100% of cases be wrong to check who made the CPU instead of checking for the implemented instruction set. AMD implements MMX, SSE1-4, AVX, and AVX2. Any software compatible with these must work on AMD CPUs that also implement these instructions.
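
To make that concrete, here's a rough sketch of the two patterns (not Intel's actual dispatcher code, just an illustration of the shape of the check):

    #include <cpuid.h>
    #include <string.h>

    /* CPUID leaf 0 returns the vendor string across EBX, EDX, ECX
       ("GenuineIntel" on Intel, "AuthenticAMD" on AMD). */
    static int is_genuine_intel(void)
    {
        unsigned int eax, ebx, ecx, edx;
        char vendor[13] = {0};
        if (!__get_cpuid(0, &eax, &ebx, &ecx, &edx))
            return 0;
        memcpy(vendor + 0, &ebx, 4);
        memcpy(vendor + 4, &edx, 4);
        memcpy(vendor + 8, &ecx, 4);
        return strcmp(vendor, "GenuineIntel") == 0;
    }

    static int has_sse42(void)
    {
        unsigned int eax, ebx, ecx, edx;
        return __get_cpuid(1, &eax, &ebx, &ecx, &edx) && (ecx & bit_SSE4_2) != 0;
    }

    void pick_code_path(void)
    {
        if (is_genuine_intel() && has_sse42()) { /* the disputed pattern    */ }
        if (has_sse42())                       { /* the vendor-neutral one  */ }
    }

The second test is all that's needed; the extra vendor test in the first is the part people object to.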

If AMD ever chooses to sue Intel over this (likely as a Sherman Act violation, same as the 2005 case), a court would likely side with AMD due to the aforementioned previous case: Intel has an established history of violating the law to further its own business interests.


I’m with you generally, but having written some code targeting these instructions from a disinterested third-party perspective, there are big enough differences in the performance or even behavior of some instructions that they can sincerely drive you to inspect the particular CPU model and not just the cpuid bits offered.

Off the top of my head, SSSE3 has a very flexible instruction to permute the 16 bytes of one xmm register at byte granularity using each byte of another xmm register to control the permutation. On many chips this is extremely cheap (eg 1 cycle) and its flexibility suggests certain algorithms that completely tank performance on other machines, eg old mobile x86 chips where it runs in microcode and takes dozens or maybe even hundreds of cycles to retire. There the best solution is to use a sequence of instructions instead of that single permute instruction, often only two or three depending on what you’re up to. And you could certainly just use that replacement sequence everywhere, but if you want the best performance _everywhere_, you need to not only look for that SSSE3 bit but also somehow decide if that permute is fast so you can use it when it is.
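
To make that concrete with a sketch of my own (the instruction in question is pshufb, i.e. _mm_shuffle_epi8): reversing the 16 bytes of an xmm register is a single pshufb on hardware where it's cheap, but on a chip where it's microcoded you'd rather synthesize it from a few cheap SSE2 shuffles:

    #include <emmintrin.h>   /* SSE2 */
    #include <tmmintrin.h>   /* SSSE3: _mm_shuffle_epi8 (pshufb) */

    /* Fast path: one pshufb, typically a single cheap uop on desktop cores. */
    static inline __m128i reverse_bytes_ssse3(__m128i v)
    {
        const __m128i ctrl =
            _mm_setr_epi8(15, 14, 13, 12, 11, 10, 9, 8, 7, 6, 5, 4, 3, 2, 1, 0);
        return _mm_shuffle_epi8(v, ctrl);
    }

    /* Fallback: SSE2-only sequence for chips where pshufb is slow. */
    static inline __m128i reverse_bytes_sse2(__m128i v)
    {
        v = _mm_shuffle_epi32(v, _MM_SHUFFLE(0, 1, 2, 3));    /* reverse dwords  */
        v = _mm_shufflelo_epi16(v, _MM_SHUFFLE(2, 3, 0, 1));  /* swap low words  */
        v = _mm_shufflehi_epi16(v, _MM_SHUFFLE(2, 3, 0, 1));  /* swap high words */
        return _mm_or_si128(_mm_slli_epi16(v, 8),             /* swap bytes in   */
                            _mm_srli_epi16(v, 8));            /* each word       */
    }

Picking between the two is exactly the decision that the SSSE3 cpuid bit alone can't make for you.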

Much more seriously, Intel and AMD’s instructions sometimes behave differently, within specification. The approximate reciprocal and reciprocal square root instructions are specified loosely enough that they can deliver significantly different results, to the point where an algorithm tuned on Intel to function perfectly might have some intermediate value from one of these approximate instructions end up with a slightly different value on AMD, and before you know it you end up with a number slightly less than zero where you expect zero, a NaN, square root of a negative number, etc. And this sort of slight variation can easily lead to a user-visible bug, a crash, or even an exploitable bug, like a buffer under/overflow. Even exhaustively tested code can fail if it runs on a chip that’s not what you exhaustively tested on. Again, you might just decide to not use these loosely-specified instructions (which I entirely support) but if you’re shooting for the absolute maximum performance, you’ll find yourself tuning the constants of your algorithms up or down a few ulps depending on the particular CPU manufacturer or model.
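
Here's a small sketch of that trade-off (mine, plain SSE intrinsics): rcpps is only guaranteed to a loose relative error, so Intel and AMD legitimately return different bit patterns; one Newton-Raphson step narrows the spread, and only a real divide is bit-identical everywhere:

    #include <xmmintrin.h>   /* SSE */

    /* Loose ~12-bit approximation; the exact result varies by manufacturer. */
    static inline __m128 recip_approx(__m128 a)
    {
        return _mm_rcp_ps(a);
    }

    /* One Newton-Raphson refinement: x' = x * (2 - a*x).
       Much tighter, but still seeded by the manufacturer-dependent rcpps. */
    static inline __m128 recip_refined(__m128 a)
    {
        __m128 x = _mm_rcp_ps(a);
        return _mm_mul_ps(x, _mm_sub_ps(_mm_set1_ps(2.0f), _mm_mul_ps(a, x)));
    }

    /* Bit-identical on every x86 CPU, but slower. */
    static inline __m128 recip_exact(__m128 a)
    {
        return _mm_div_ps(_mm_set1_ps(1.0f), a);
    }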

I’ve even discovered problems when using the high-level C intrinsics that correspond to these instructions across CPUs from the same manufacturer (Intel). AVX512 provided new versions of these approximations with increased precision, the instruction variants with a “14” in their mnemonic. If using intrinsics, instruction selection is up to your compiler, and you might find compiling a piece of code targeting AVX2 picks the old low precision version, while the compiler helpfully picks the new increased-precision instructions when targeting AVX-512. This leads to the same sorts of problems described in the previous paragraph.
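
One way to defend against that (a sketch only; the USE_RCP14 macro is something you'd define yourself, not a standard flag) is to pin the choice in source instead of leaving it to the target flags:

    #include <immintrin.h>

    static inline __m256 recip8(__m256 a)
    {
    #if defined(USE_RCP14) && defined(__AVX512VL__)
        return _mm256_rcp14_ps(a);   /* vrcp14ps: relative error <= 2^-14      */
    #else
        return _mm256_rcp_ps(a);     /* vrcpps:   relative error <= 1.5*2^-12  */
    #endif
    }

That way an AVX2 build and an AVX-512 build agree on which approximation they use unless you explicitly opt in to the higher-precision one.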

I really wish you could just read cpuid, and for the most part you’re right that it’s the best practice, but for absolutely maximum performance from this sort of code, sometimes you need more information, both for speed and safety. I know this was long-winded, and again, I entirely understand your argument and almost totally agree, but it’s not 100%, more like 100-epsilon%, where that epsilon itself is sadly manufacturer-dependent.

(I have never worked for Intel or AMD. I have been both delighted and disappointed by chips from both of them.)


I don't think you read the article. Go read it first before you make your hypothesis. If it was as easy to fix as using an environment variable (which no longer works), then it was done intentionally.


https://news.ycombinator.com/newsguidelines.html

> Please don't comment on whether someone read an article.


OK


I don't think the fact that it can be enabled/disabled by an environment variable indicates malicious intent. It could be as simple as Intel not caring to test their compiler optimizations on competitors' CPUs. If I had to distribute two types of binaries (one optimized but possibly broken, vs. un-optimized but unlikely to break), I would default to distributing the un-optimized version. Slow is better than broken.

I understand some end users may not be able to re-compile the application for their machines, but I wouldn't say it's Intel's fault so much as that of the distributors of that particular application. For example, if AMD users want Solidworks to run faster on their systems, they should ask Dassault Systemes for AMD-optimized binaries, not the upstream compiler developers!

Anyways, for those compiling their own code, why would anyone expect an Intel compiler to produce equally optimized code for an AMD CPU? Just use gcc/clang or whatever AMD recommends.



