Hacker News new | past | comments | ask | show | jobs | submit login

I would not have though to use this formula. The sum of the integers grows as the square of n so if n is anything but small you will overflow and get an erroneous answer.

This interview question has everything bad imo: no practical uses, test knowledge of math formulas (which every math/CS major would have but none of the self learning folks). It’s very easy to stress and fail when given such a test, and even already knowing I would have tried something else because of the overflow.




Let’s not overthink this. Microsoft is using this test as a low-pass filter to get rid of candidates who would:

- not find any answer in a few minutes

- find a wrong answer and argue about it in an obnoxious way

Any reasonable answer would probably do.

You would probably get more brownie points for asking what you need to optimize for (compute, storage) than providing what seems to be the best solution (using the sum trick).


Exactly. It’s basically fizzbuzz. The fact that he thought about this in advance and had an answer ready to explain is also a green flag for such an early filter.


Yes, it's a question with a silly trick. Completely useless if one knows the answer, hard to spot if the person feigns struggle.

Although, n•(n+1)/2 formula is not necessary. One can start with an xor sum and find the duplicate by xor adding elements again. This is another silly trick.


That’s a very elegant solution. Hard looking problems that had simple, trick solutions were called Jewish Problems at the entrance of some universities.

This would let the examiner fail a student for subjective reasons instead of academic success. The examiner would then be able to say « I failed them because of their skills, not because of their religions, look how easy the solution is ».

Perhaps this is what we are reproducing as an industry with all our convoluted interview processes, and it may be a decorum to choose the candidates we want instead of using objective criteria.


The explanation of the name somehow is a lot better than what I assumed it was referring to. The phrasing on the second sentence is... unfortunate.


I’m referring to the easy looking, tricky to solve problems. I’ve heard them referred to by that term because they have been used at Moscow University to discriminate against Jewish students but I don’t know if they have other names.

There is some discussions and examples of such problems at https://arxiv.org/pdf/1110.1556.pdf and http://3038.org/press/shen.pdf.


Of course, just the phrasing made me think the "trick solution" was eugenics, not a university admission trick. Unfortunately, historically "Jewish question" or "Jewish problem" has referred to these concerns [0]. Of course, following your first link it seems like the native phrase for it is in no way better, but that seems predictable as it's stemming from a racist premise in the first place.

[0] https://en.wikipedia.org/wiki/Jewish_question


It's interesting that doing competitive math makes it really easy to spot these kinds of problems and guess at their "true" difficulty. Honestly, I'm kind of surprised that "this problem is short" or "the solution is short" would ever fly as a measure of how easy a problem is, but I assume that a system that is intended to encode racism doesn't need a whole lot of logic behind it…


I think you missed the point of the second sentence.

It's like saying the phrasing on the Chinese Exclusion Act is unfortunate. It's not the phrasing that's unfortunate, but the history of excluding Chinese. These questions were designed to allow screeners to discriminate, and were named after the group designed to be kept out.

edit: If you edit a comment in response to a response, it's polite to say [edited]. Otherwise, the conversation is a non-sequitur, which seems to be happening here a bit.


I didn't edit my comment, at least not from what I remember. May have been a typo fix in the first few seconds though, which I do a bit too often.

Not sure if my comment was clear, but I was referring to "hard problems with trick solutions were called Jewish Problems", which sounds like it's referring to the Final Solution.

Reminds me of a funny anecdote, I was staying with a friend of mine in Berlin and asked her what the second most common religion was in Germany. She casually said "used to be Jewish, but not sure now..." She was of course referring to the influx of refugees from Africa, but for a very brief moment it looked like her life flashed in front of her eyes.


Could also just use a hashtable, the keys are the array elements and the items are the occurrence of each element, then return the one with an occurrence of 2.


Or just a hashset and return when about to insert an existing element.


What is the closed form solution of the xor-sum though?


You already have to spend the time to go through the list. Just add up the xor as you iterate through the list.


s(n) =

• n if n mod 4 = 0,

• 1 if n mod 4 = 1,

• n + 1 if n mod 4 = 2,

• 0 if n mod 4 = 3.

Or alternatively:

• s(4n) = 4n

• s(4n + 1) = 1

• s(4n + 2) = 4n + 3

• s(4n + 3) = 0

Proof by induction:

• s(4n + 1) = s(4n) ⊕ (4n + 1) = 4n ⊕ (4n + 1) = 1

• s(4n + 2) = s(4n + 1) ⊕ (4n + 2) = 1 ⊕ (4n + 2) = 4n + 3

• s(4n + 3) = s(4n + 2) ⊕ (4n + 3) = (4n + 3) ⊕ (4n + 3) = 0

• s(4n + 4) = s(4n + 3) ⊕ (4n + 4) = 0 ⊕ (4n + 4) = 4n + 4


> test knowledge of math formulas (which every math/CS major would have but none of the self learning folks)

I mean they might. Its a very famous formula, with a famous story attached, which is often covered in high school level math.

Regardless its a stupid question.


> I would not have though to use this formula. The sum of the integers grows as the square of n so if n is anything but small you will overflow and get an erroneous answer.

As long as you can get wraparound semantics, the overflow is actually unproblematic. (n(n+1)/2 + k) mod 2^32 - (n(n+1)/2 mod 2^32) = k mod 2^32 = k.


Even with wraparound semantics, you can't compute n(n+1)/2 naively, because (n(n+1) mod N) / 2 ≠ n(n+1)/2 mod N.

So actually not knowing the formula is kinda advantageous, because computing the sum as 1 + 2 + … + n (with every addition mod N) gives you the right answer (mod N).


You could delay division by 2 until the end. I.e ((2 * S mod N - n*(n+1) mod N) mod N) / 2

Where S is the sum of the array achieved through iterative addition with every addition mod N and N > 2n.


Or you can assume N is less than 2^31, and clear out the top bit in the final answer (because that's the only place the wrong values can come from). Which is admittedly pretty hackish, but…

Edit: The simplest solution is probably just doing (n/2)*(n+1) (assuming n is even; move the division for n odd).


Indeed, somehow this went completely over my head.


True, but that doesn't matter at all given reasonable values of n. The number is in the range of n^2, but the algorithm is in the range O(1). Question is not that bad, my first thought was to use hash table to store what you see, then return when duplicate occurs. Don't like hash table fine, set an array of size n storing TRUE when you see the number. Notice it twice, then you've found the duplicate. No need for math, O(n) solution.

Honestly if you're stressing over a question like this, combined with thinking that integer overflow is even remotely something that you need to be concerned about in a question like this, you're definitely not passing any leetcode interview problems tossed around in today's interviewing culture.


Every solution is at least O(n).


Wrong, calculating n(n-1)/2 is O(1)


Finding the length of the array is O(n), though.


depends on how its referenced I suppose


Using 64 bit integers, we can store the square of any 32 bit integer. Not that small...


Also it was 2004 and 64 bit machines weren't common. If I remember correctly you had only 2 GB available under Windows XP and if you wanted more, 3 GB to be more precise, you had to boot with a special parameter.


32-bit machines generally 64-bit arithmetic of some sort; for example "long long" in C is mandated by the standard to be of a width greater than or equal to 64 bits.


Yeah, 64 bit integers were available under gcc on Linux, but the memory issue remained. The array had to fit in 2 GB, unless of course the problem stated otherwise, for example the numbers were read from a file.




Join us for AI Startup School this June 16-17 in San Francisco!

Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: