I tested it: communicating with a process through shared memory vs communicating with a thread via a queue has an overhead, something like 1M eps vs 10M eps. It might be due to the implementation.
The implementations of whatever queues or buffers were used for communication must have been different.
Once the pages are mapped, there is absolutely no difference in access speed between memory pages that are shared by multiple processes and memory pages that are private to a process.
If you use the same implementation of a buffer/message queue or whatever other data structure you want to use for communication, it does not matter whether it is located in private memory or in shared memory.
Similarly, there is no difference between threads that belong to the same process and threads that belong to different processes, except that the threads that belong to the same process share all their memory, not only a part of it.
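To illustrate the point about using the same implementation in both places, here is a minimal sketch of one queue class that runs unchanged over a private bytearray and over a shared memory segment. The names RingQueue, roundtrip and demo are made up for this example. It is single-threaded on purpose: Python has no atomics or memory barriers, so this layout is not safe for a concurrent producer and consumer, and it only shows that the data structure does not care where its bytes live.

```python
import struct
from multiprocessing import shared_memory

HEAD = struct.Struct("<Q")  # count of items popped so far, stored at offset 0
TAIL = struct.Struct("<Q")  # count of items pushed so far, stored at offset 8
SLOT = struct.Struct("<q")  # each slot holds one signed 64-bit integer
DATA_OFFSET = HEAD.size + TAIL.size


class RingQueue:
    """Fixed-capacity FIFO of int64 values laid out in any writable buffer.

    Hypothetical illustration only: not safe for concurrent use.
    """

    def __init__(self, buf, capacity):
        self.buf = buf
        self.capacity = capacity
        HEAD.pack_into(buf, 0, 0)
        TAIL.pack_into(buf, HEAD.size, 0)

    @staticmethod
    def nbytes(capacity):
        """Bytes needed for the two counters plus `capacity` slots."""
        return DATA_OFFSET + capacity * SLOT.size

    def push(self, value):
        (head,) = HEAD.unpack_from(self.buf, 0)
        (tail,) = TAIL.unpack_from(self.buf, HEAD.size)
        if tail - head == self.capacity:
            return False  # queue is full
        offset = DATA_OFFSET + (tail % self.capacity) * SLOT.size
        SLOT.pack_into(self.buf, offset, value)
        TAIL.pack_into(self.buf, HEAD.size, tail + 1)
        return True

    def pop(self):
        (head,) = HEAD.unpack_from(self.buf, 0)
        (tail,) = TAIL.unpack_from(self.buf, HEAD.size)
        if head == tail:
            return None  # queue is empty
        offset = DATA_OFFSET + (head % self.capacity) * SLOT.size
        (value,) = SLOT.unpack_from(self.buf, offset)
        HEAD.pack_into(self.buf, 0, head + 1)
        return value


def roundtrip(buf, values, capacity=8):
    """Push and pop every value through a queue living in `buf`."""
    queue = RingQueue(buf, capacity)
    out = []
    for value in values:
        assert queue.push(value)
        out.append(queue.pop())
    return out


def demo(capacity=8):
    values = list(range(20))
    # Private memory: an ordinary bytearray on this process's heap.
    private = bytearray(RingQueue.nbytes(capacity))
    from_private = roundtrip(private, values, capacity)
    # Shared memory: a segment that other processes could attach to by name.
    shm = shared_memory.SharedMemory(create=True, size=RingQueue.nbytes(capacity))
    try:
        from_shared = roundtrip(shm.buf, values, capacity)
    finally:
        shm.close()
        shm.unlink()
    return from_private, from_shared
```

Timing roundtrip over both buffers would be one way to check the claim on a given machine, with the caveat that in Python the interpreter overhead dominates, so a serious measurement would use the same approach in C or Rust.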
Nevertheless, on modern CPUs measuring IPC performance can be misleading: the thread scheduler of the OS can alter the benchmark results randomly, because the IPC performance may differ depending on the pair of CPU cores on which the threads happened to run during the benchmark.
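To see which cores share which caches on a given machine, a rough sketch like the following can help. It assumes Linux and reads the cache directories under /sys/devices/system/cpu; on other systems, or where those files are missing, it simply returns an empty dict. The function names are made up for this example.

```python
import glob
import os


def parse_cpu_list(text):
    """Parse a kernel CPU list such as '0-3,8,10-11' into a set of ints."""
    cpus = set()
    for part in text.strip().split(","):
        if not part:
            continue
        low, _, high = part.partition("-")
        cpus.update(range(int(low), int(high or low) + 1))
    return cpus


def cache_sharing(cpu, sysfs_root="/sys/devices/system/cpu"):
    """Map each cache of `cpu` (e.g. 'L2 Unified') to the set of CPUs sharing it.

    Assumes the Linux sysfs layout; returns {} when it is not available.
    """
    result = {}
    pattern = os.path.join(sysfs_root, f"cpu{cpu}", "cache", "index*")
    for index_dir in sorted(glob.glob(pattern)):
        try:
            with open(os.path.join(index_dir, "level")) as f:
                level = f.read().strip()
            with open(os.path.join(index_dir, "type")) as f:
                kind = f.read().strip()
            with open(os.path.join(index_dir, "shared_cpu_list")) as f:
                shared = parse_cpu_list(f.read())
        except (OSError, ValueError):
            continue  # skip entries that are unreadable or malformed
        result[f"L{level} {kind}"] = shared
    return result


if __name__ == "__main__":
    for name, cpus in cache_sharing(0).items():
        print(name, sorted(cpus))
```

Two cores that appear together in an L2 or L3 entry will usually exchange cache lines faster than two cores that only meet at a more distant level, which is why an unpinned benchmark can jump between quite different numbers from run to run.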
For reproducible benchmark results, regardless of whether threads from the same process or from different processes are tested, the threads must be pinned to specific cores, and these must be the same cores when you measure communication inside a process and between processes.
Otherwise the results can be quite different, depending on which cache memories are shared between the measured cores and on where the cores sit on the on-chip communication network or ring that connects them.
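As a sketch of the pinning step, assuming Linux: Python exposes the affinity syscalls as os.sched_setaffinity and os.sched_getaffinity, and with pid 0 the Linux call applies to the calling thread. The helper run_pinned is a made-up name. It pins, runs one function, and restores the previous affinity; where the calls are not available it just runs the function unpinned.

```python
import os


def run_pinned(core, fn):
    """Run fn() with the calling thread pinned to `core`, then restore affinity.

    Assumes Linux. If the affinity calls are missing, fn() runs unpinned.
    """
    if not hasattr(os, "sched_setaffinity"):
        return fn()
    previous = os.sched_getaffinity(0)  # remember the original CPU set
    os.sched_setaffinity(0, {core})     # pid 0 means the caller
    try:
        return fn()
    finally:
        os.sched_setaffinity(0, previous)


if __name__ == "__main__" and hasattr(os, "sched_getaffinity"):
    allowed = sorted(os.sched_getaffinity(0))
    # Report the affinity as seen from inside the pinned section.
    print(run_pinned(allowed[0], lambda: sorted(os.sched_getaffinity(0))))
```

For a fair comparison the producer and the consumer would each call this with the same two core numbers in both the thread-to-thread run and the process-to-process run.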