Almost the same as my SWAR version - which is what you're doing.
But aren't you reading off the end of the buffer in your memcpy(&w...)? Say with an empty input string whose start address is aligned to sizeof(size_t) bytes?
I just passed in the string length since the caller had that info, otherwise you'd scan the whole string again looking for the zero terminator.
> But aren't you reading off the end of the buffer in your memcpy(&w...)?
If we go by the absolute strictest interpretation of the C standard my above implementation is UB.
But in practice, if p is word-aligned and is at least valid for 1 byte, then you will not pagefault for reading a whole word. In fact, this is how GCC/musl implement strlen itself.
> Say with an empty input string whose start address is aligned to sizeof(size_t) bytes?
Then the start address is valid (it must contain the null byte), and aligned to a word boundary, in which case I assume it is ok to also read a whole word there.
But aren't you reading off the end of the buffer in your memcpy(&w...)? Say with an empty input string whose start address is aligned to sizeof(size_t) bytes?
I just passed in the string length since the caller had that info, otherwise you'd scan the whole string again looking for the zero terminator.