Why would the developers be so stupid as to make native array indexing slower than a protracted pointer construction? It is an old claim that pointer indexing is faster, and that may have been true long ago, but in my own benchmarks I have always found the native approach a bit faster... though we are talking about almost unmeasurable quantities, maybe +/-5%. As said above, the only exception is multidimensional arrays.
First of all, "using pointers" can refer to many different things. In my previous post, I referred to a pointer in a linked list. That is different from replacing the loop index by a pointer.
Replacing the loop index by a pointer (into the list) may or may not gain speed; if it does, the gain may be very small, and in future fpc versions it may even lose speed.
But speed differences between list-index and list-pointer access have nothing to do with "the developers making it so" or "being stupid". It would simply mean that they have not had time to implement certain optimizations.
Unoptimized access in the loop looks approximately like this:
// i starts at 0
- i := i + 1
- element from list := memory at "list_start_address + sizeof(element) * i"
// Note that some CPUs may optimize this multiplication, so it may not cost (measurable) time at all
Optimized access may look like this (depends on the CPU):
// p starts at list_start_address
- p := p + sizeof(element)
- element from list := memory at "p"
You can see the optimized version has less math to do in the loop.
I do not know how far fpc currently takes that optimization.
I also do not know how well modern CPUs (with prediction and all their other tricks) compensate for the extra math.
--------------------------------------------------------------
A completely different approach is a linked list.
A linked list can be maintained that is already filtered. This saves the time spent filtering.
The code
if Actors.State[i] = stat_used then
means that each object in the list (since the class data is not stored in the list's own memory, but is accessed through a pointer) needs to be fetched from memory.
If there are many objects, they are likely not in the CPU cache, and fetching them becomes an expensive operation.
As I said: the linked list is already filtered. There is no need to fetch any objects just to find out that they can be skipped. If the percentage of stat_used objects is small, this saves a lot of time.
And as a side effect, the linked-list pointer is part of the object itself, so there is no separate list memory that needs to be in the CPU cache.
In fact, if the object is cleverly laid out, each cache-line burst read by the CPU can fetch more of the data that the loop needs, and you gain speed there too.
And one more thing: using a linked list, there is no index (nor a pointer into the list), only a pointer to the current object.
That saves one variable, and therefore one CPU register, which can be used for something else.