Thank you!
I'm a bit disappointed; based on my tests under Windows, I had expected to come a bit closer to C performance.
I don't think this is related to mtprocs, which I only used because it gave me a simple way to try different thread scheduling strategies. Apart from that, the performance with mtprocs is exactly the same as with fpc's basic thread support.
A few days ago, Akira1346 managed to bring fpc to the top of the scores in the Binary Trees benchmark, using the PasMP multiprocessing library. I might try it, but I doubt it would help for fannkuch, which merely runs a few static threads for the entire program runtime.
So I guess the reason for the lower performance is ultimately fpc's code generation and optimization as such, which is probably a bit more conservative than that of the C compilers.