Summary: memchr is faster, but the obvious implement seems to beat the builtin versions.
There are two related C functions for finding the next character in a string - strchr
which assumes the string has a NUL character at the end, and memchr
which takes the string length as an argument. For strings where you have the size and a NUL terminator, which is fastest? Using gcc
6.2.0 64bit MSYS2 on Windows 10, searching for a single byte 10M bytes along a string, the times were (fastest to slowest):
- 11.05ms
memchr
implemented the obvious way. - 14.82ms
strchr
implemented the obvious way. - 14.96ms
memchr
provided by GCC. - 19.63ms
strchr
provided by GCC.
Trying on 3 different Windows computers, the results are all similar (but scaled).
Given the choice, you should prefer memchr
over strchr
.
Surprise result
The optimised implementations shipped with GCC are slower than the obvious C implementations taken from a wiki. I have absolutely no idea why. From what I can tell, the builtin versions are coded in assembly, operating on multiple bytes at a time, using SSE instructions. In contrast, the C variants operate on a single byte at a time, and aren't vectorised by the optimiser according to Godbolt. If anyone has an explanation I'd be keen to hear it.
Benchmark Code
To benchmark the variants I wrote a Haskell program using criterion
. The full code and build instructions are available in this gist. I compiled the C code with -O3
, using the gcc shipped with GHC 8.2.1. I've reproduced the Haskell code below, with some comments:
-- Import all the necessary pieces
import qualified Data.ByteString as BS
import qualified Data.ByteString.Unsafe as BS
import Criterion.Main
import Foreign
import Foreign.C.Types
import Data.Monoid
-- Make all the C imports
foreign import ccall unsafe "string.h memchr" memchr_std :: Ptr Word8 -> CInt -> CSize -> IO (Ptr Word8)
foreign import ccall unsafe "string.h strchr" strchr_std :: Ptr Word8 -> CInt -> IO (Ptr Word8)
foreign import ccall unsafe memchr_c :: Ptr Word8 -> CInt -> CSize -> IO (Ptr Word8)
foreign import ccall unsafe strchr_c :: Ptr Word8 -> CInt -> IO (Ptr Word8)
-- Method for ignoring the size when using strchr
ignoreSize f a b _ = f a b
-- Build a suitable string with an interesting character i bytes along
cstr i = BS.replicate i 32 <> BS.singleton 64 <> BS.replicate i 32 <> BS.singleton 0
-- The functions to benchmark
funs =
[("memchr_std", memchr_std)
,("strchr_std", ignoreSize strchr_std)
,("memchr_c", memchr_c)
,("strchr_c", ignoreSize strchr_c)]
-- The main function, using Criterion
main = defaultMain
[ seq bs $ bench (show i ++ " " ++ name) $ whnfIO $ test fun bs
| i <- [1,10,100,1000,10000,100000,1000000,10000000]
, let bs = cstr i
, (name, fun) <- funs]
-- The function under test and input string
{-# NOINLINE test #-}
test fun bs =
BS.unsafeUseAsCStringLen bs $ \(ptr,len) ->
fun (castPtr ptr) 64 (fromIntegral len)