Anyway, if the researchers are not blinded, there are many possible sources of error.
Perhaps they run the first test in the morning and play the sound just before lunch, and then the second test in the afternoon is run by a different person who is more or less friendly to the rats, or the rats' stomachs are fuller or emptier by then.
After changing a program and running a benchmark, I sometimes run it again if the new program is not as fast as I expected. I have even given a second chance to a deterministic test, which is as useful as it sounds. It's possible that if the rat does not cooperate, the researchers hit the equivalent of Ctrl-F5 just to be sure.
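To make the benchmark habit concrete, here is a rough sketch of why "run it again only when the result disappoints me" is not neutral. Everything in it is an assumption for illustration: the run time of 10.0, the Gaussian noise, and the function names are all made up, and the old and new programs are given exactly the same true speed.

```python
import random

def measure(true_time, noise, rng):
    """One noisy timing of a program whose real run time is true_time."""
    return rng.gauss(true_time, noise)

def honest_speedup(rng, true_time=10.0, noise=1.0):
    """Measure old and new once each and report the difference."""
    old = measure(true_time, noise, rng)
    new = measure(true_time, noise, rng)
    return old - new

def second_chance_speedup(rng, true_time=10.0, noise=1.0):
    """Same, but rerun the new program once if it did not look faster."""
    old = measure(true_time, noise, rng)
    new = measure(true_time, noise, rng)
    if new >= old:
        # The "Ctrl-F5 just to be sure" step: only disappointing runs get a retry.
        new = measure(true_time, noise, rng)
    return old - new

def mean_speedup(strategy, trials=20000, seed=0):
    """Average reported speedup over many simulated benchmark sessions."""
    rng = random.Random(seed)
    return sum(strategy(rng) for _ in range(trials)) / trials

honest_mean = mean_speedup(honest_speedup)
biased_mean = mean_speedup(second_chance_speedup)
print(f"honest:        {honest_mean:+.3f}")
print(f"second chance: {biased_mean:+.3f}")
```

Both programs are identical here, so the honest average should land near zero, while the second-chance average should come out positive: the retry is only ever spent on results I did not like.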
It's hard to be 100% neutral, so one method is for the researchers not to know which rat is which; that helps ensure all rats get exactly the same test conditions.