You’re going to have to explain that—what is an attack vector for image manipulation? What is an attack vector for false positives?
> Assume we had this perfect hash knowledge.
It’s not a perfect hash, and nobody is saying it is. It’s a perceptual hash: it is specifically designed to map similar images to similar hashes, for the “right” notion of similar.
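For concreteness, here is a minimal sketch of that property using a toy average hash (this is not NeuralHash, just the simplest classic perceptual hash): small edits to an image barely move the hash bits, while unrelated images land far apart.

```python
# Toy perceptual hash (average hash), illustrating "similar images -> similar hashes".
# This is NOT Apple's NeuralHash; it is only meant to make the concept concrete.
from PIL import Image
import numpy as np

def average_hash(path: str, hash_size: int = 8) -> np.ndarray:
    # Downscale to a tiny grayscale thumbnail; mild perturbations of the
    # original image barely change these 64 pixel averages.
    img = Image.open(path).convert("L").resize((hash_size, hash_size))
    pixels = np.asarray(img, dtype=np.float32)
    # Each bit is "is this pixel brighter than the mean?" -> a 64-bit fingerprint.
    return (pixels > pixels.mean()).flatten()

def hamming_distance(h1: np.ndarray, h2: np.ndarray) -> int:
    # Visually similar images end up a few bits apart; unrelated ones around 32.
    return int(np.count_nonzero(h1 != h2))
```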
It’s an attack vector because, either directly or indirectly, people will get access to the hash function. That gives them a testbed to launch attacks against it. Once they find a working attack, they can make your harmless pictures hash as CP by applying a potentially perceptually invisible modification to them. Imagine every image on Reddit’s front page being classified as CP. The distribution of natural images is irrelevant as long as people can modify images arbitrarily.
The hashing isn’t really so relevant apart from pigeonhole arguments. It’s a machine learning problem of classifying CP versus not, and hashing is an implementation detail. Having read a few papers, the way I would attack this is to approximate any non-differentiable parts of the hashing with a smooth proxy function, then use an off-the-shelf gradient-based attack such as the Fast Gradient Sign Method (FGSM). The hashing guarantees that even at hash distance 0 you have a huge number of collisions, so that is blind spot 1. Blind spot 2 is that the CNN is not robust to mild perturbations, so you can “squeeze” inputs together in hash space by modifying them. You can likely attack both simultaneously with the approach above; a sketch follows.
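Here is a minimal sketch of that gradient-based collision attack, assuming we already have a differentiable surrogate of the perceptual hash (`surrogate_hash` below is a hypothetical `torch.nn.Module` that outputs real-valued hash logits before binarization). One FGSM step nudges a harmless image’s hash toward an arbitrary target hash.

```python
# Sketch of an FGSM-style collision attack against a smooth proxy of a
# perceptual hash. `surrogate_hash` is a hypothetical differentiable stand-in
# for the real (non-differentiable) hashing pipeline.
import torch

def fgsm_collision_step(image: torch.Tensor,
                        target_hash: torch.Tensor,
                        surrogate_hash: torch.nn.Module,
                        epsilon: float = 2.0 / 255) -> torch.Tensor:
    """One Fast Gradient Sign Method step pushing `image` toward `target_hash`."""
    image = image.clone().detach().requires_grad_(True)
    # Smooth proxy loss: distance between the (pre-binarization) hash of the
    # perturbed image and the hash we want to collide with.
    loss = torch.nn.functional.mse_loss(surrogate_hash(image), target_hash)
    loss.backward()
    # Step against the gradient sign, with a perceptually small step size.
    adv = image - epsilon * image.grad.sign()
    return adv.clamp(0.0, 1.0).detach()

# In practice you would iterate this step (a PGD-style attack), keeping the total
# perturbation below a visibility threshold, until the binarized surrogate hash
# matches the target and, ideally, the real hash collides as well.
```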