
Author Topic: AVX and SSE support question  (Read 49102 times)

CuriousKit

  • Jr. Member
  • **
  • Posts: 75
Re: AVX and SSE support question
« Reply #180 on: February 09, 2018, 06:53:43 pm »
I developed a new feature for FPC that might help you with this endeavour: "vectorcall". It is still undergoing testing before it makes it into the 3.1.1 build.

https://bugs.freepascal.org/view.php?id=32781

Note that I also fixed the System V ABI to use the SSE registers properly, so the code that passes the result into the low half of XMM0 and XMM1 will have to be reworked a bit.
« Last Edit: February 09, 2018, 07:11:26 pm by CuriousKit »

dicepd

  • Full Member
  • ***
  • Posts: 163
« Reply #181 on: February 14, 2018, 10:30:04 pm »
OK, CuriousKit, I am impressed :)

I will update my trunk and see what happens if I ifdef a few vectorcalls into the code on Linux64, and also whether leaving the code untouched breaks anything.

Removing the movhlps xmm1, xmm0 can only be a good thing if the result is passed back in xmm0.
Lazarus 1.8rc5 Win64 / Linux gtk2 64 / FreeBSD qt4

dicepd

« Reply #182 on: February 15, 2018, 07:17:25 am »
Good news and bad news, I'm afraid, CuriousKit.

The good news is that our code base still works fine as is with trunk, which has your patches applied.

The bad news is that, at least on *nix64, vectorcall causes an internal error on the following bit of test code, which I was trying while evaluating the patch.

Code: Pascal
class operator TGLZVector4f.+(constref A, B: TGLZVector4f): TGLZVector4f; {$ifdef USE_VECTORCALL} vectorcall;{$else}register;{$endif} assembler; nostackframe;
asm
  {$ifndef USE_VECTORCALL}
  movaps  xmm0, [A]
  movaps  xmm1, [B]
  {$endif}
  addps   xmm0, xmm1
  {$ifndef USE_VECTORCALL}
  movhlps xmm1, xmm0
  {$endif}
end;

I did notice that none of the tests in your patch cover methods of the record; they only use the record as a parameter.

Note that, since I could not get it to compile, the code above is probably not going to work; I was just trying to find out where the new calling convention places the parameters in the registers.
« Last Edit: February 15, 2018, 07:28:16 am by dicepd »

dicepd

« Reply #183 on: February 15, 2018, 09:30:17 am »
More issues with this patch.

It seems that it is no longer passing Self in RDI on *nix64.

I can no longer see register allocation or parameter allocation in the .s assembler file, which hampers working out what is going on.

It seems to be a little more broken than first thought :-\

OK, it would seem that the RDI bug, which did not show itself the first time, is down to Self getting out of alignment.

I changed movaps [RDI], xmm0 to movups [RDI], xmm0 and all the tests worked again. I will dig some more into why this might be happening, though I am still hampered by the lack of info in the .s file.

A suggested test for the above:

Code: Pascal
MyXMM.Create(V1,V2)
begin
  Self := V1 + V2;
end;

Or something along those lines, depending on how you want to declare things.
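
To see whether an address such as Self really sits on a 16-byte boundary (movaps faults on a misaligned memory operand, movups does not), a test could check the pointer directly. This is only a minimal sketch: IsAligned16 and the buffer are hypothetical names made up for illustration, not part of the patch or of our code base.

```pascal
program AlignCheck;
{$mode objfpc}

// Hypothetical helper: True when P sits on a 16-byte boundary,
// which is what movaps requires of a memory operand.
function IsAligned16(P: Pointer): Boolean;
begin
  Result := (PtrUInt(P) and 15) = 0;
end;

var
  Buf: array[0..31] of Byte;
  Base: PtrUInt;
begin
  // Round the buffer address up to the next 16-byte boundary.
  Base := (PtrUInt(@Buf[0]) + 15) and not PtrUInt(15);
  WriteLn(IsAligned16(Pointer(Base)));      // aligned address
  WriteLn(IsAligned16(Pointer(Base + 4)));  // deliberately misaligned address
end.
```

Inside a record method the same check could be applied to @Self before any movaps is executed, assuming the method is written in Pascal rather than as a pure assembler routine.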
« Last Edit: February 15, 2018, 10:05:44 am by dicepd »

Thaddy

  • Hero Member
  • *****
  • Posts: 7130
« Reply #184 on: February 15, 2018, 09:42:32 am »
I can reproduce that. It seems you are not conservative enough with registers? But it is a great effort.
Note that, as it stands, it is also hard to port.
« Last Edit: February 15, 2018, 09:45:16 am by Thaddy »

dicepd

« Reply #185 on: February 15, 2018, 10:35:32 am »
@Thaddy

I presume that by hard to port you are thinking of ARM and NEON?
If I could get trunk to compile for aarch64 directly on my raspi3 (not cross-platform), I would be looking at NEON versions of what we are doing.
« Last Edit: February 15, 2018, 11:32:56 am by dicepd »

dicepd

« Reply #186 on: February 15, 2018, 11:41:30 am »
Quote
Note that I also fixed the System V ABI to use the SSE registers properly, so the code that passes the result into the low half of XMM0 and XMM1 will have to be reworked a bit.

OK, I have tried to force this behaviour without using vectorcall, with no luck. It would seem that the original calling convention still applies, even though you have made changes to the Unix ABI.

CuriousKit

« Reply #187 on: February 16, 2018, 11:55:50 am »
It will only pass parameters into the full XMM0 register etc if they are aligned to 16-byte boundaries.

What internal error are you getting? I'll see if I can track it down.  "vectorcall" should be ignored on *nix64.
« Last Edit: February 16, 2018, 12:00:02 pm by CuriousKit »

dicepd

« Reply #188 on: February 16, 2018, 12:42:08 pm »
Quote
What internal error are you getting? I'll see if I can track it down.  "vectorcall" should be ignored on *nix64.

Not a lot of help from the message itself, I am afraid. The popup says it is a scanner message, but I do not know how it determined that.

Code: Pascal
vectormath_vector4f_unix64_sse_imp.inc(4,1) Error: Compilation raised exception internally

Quote
It will only pass parameters into the full XMM0 register etc if they are aligned to 16-byte boundaries.

Alignment seems to be an issue; see the Self bug above. Do we now have to use align 16 after the record, as in the example?

« Last Edit: February 16, 2018, 01:13:27 pm by dicepd »

dicepd

« Reply #189 on: February 16, 2018, 01:38:47 pm »
OK CuriousKit,

I just ran all the unit tests under Linux (even commenting out the $ifdef win64 so that the whole suite ran) and everything passes.

CuriousKit

« Reply #190 on: February 16, 2018, 01:43:45 pm »
Interesting.  And yes, the records have to be aligned to 16-byte boundaries, either with "align 16" if that's been implemented, or using {$CODEALIGN RECORDMIN=16}{$PACKRECORDS C} (see the code examples).  This is a design choice that is also used in C++, because aligned moves such as MOVAPS can be faster than MOVUPS, particularly on older processors.
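
As a rough illustration of the directive route, a record declared under those two directives might look like the sketch below. TVec4 and the variable V are hypothetical names chosen for the example, and whether a given variable actually lands on a 16-byte boundary depends on the compiler version and target, so the program prints the address remainder for you to check rather than assuming it.

```pascal
program AlignSketch;
{$mode objfpc}
{$CODEALIGN RECORDMIN=16}  // request a minimum record alignment of 16 bytes
{$PACKRECORDS C}           // C-compatible record layout

type
  // Hypothetical 4 x Single vector type, 16 bytes in size.
  TVec4 = record
    X, Y, Z, W: Single;
  end;

var
  V: TVec4;
begin
  WriteLn('SizeOf(TVec4)  = ', SizeOf(TVec4));
  // A remainder of 0 means V can be accessed with movaps.
  WriteLn('Address mod 16 = ', PtrUInt(@V) mod 16);
end.
```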

An internal error or exception is automatically a bug even if you have the most garbled code in the universe.  I don't have a Linux 64 system to test the compiler unfortunately, although I wonder what the exception is (I also wonder if there's a way to update the error messages to actually say what the exception is).

dicepd

« Reply #191 on: February 16, 2018, 02:27:27 pm »
All the code was previously aligned, as we use the aps variants throughout. Somehow, with this patch, the alignment is being ignored on Unix.

Also, most of our routines take their parameters as constref so that there is no copy to the stack; such a copy does seem to be happening in the assembler generated for your unit tests.

Code: Pascal
.section .text.n_p$vectorcall_hva_test2_$$_plus$tm128$tm128$$tm128
        .balign 16,0x90
.globl  P$VECTORCALL_HVA_TEST2_$$_plus$TM128$TM128$$TM128
        .type   P$VECTORCALL_HVA_TEST2_$$_plus$TM128$TM128$$TM128,@function
P$VECTORCALL_HVA_TEST2_$$_plus$TM128$TM128$$TM128:
.Lc1:
# Temps allocated between rbp-72 and rbp-52
        # Register rbp allocated
.Ll1:
# [vectorcall_hva_test1.pas]
# [26] begin
        pushq   %rbp
.Lc3:
.Lc4:
        movq    %rsp,%rbp
.Lc5:
        leaq    -80(%rsp),%rsp
# Temp -72,16 allocated
# Temp -16,16 allocated
# Var X located at rbp-16, size=OS_128
# Temp -32,16 allocated
# Var Y located at rbp-32, size=OS_128
# Temp -48,16 allocated
# Var $result located at rbp-48, size=OS_128
# Temp -52,4 allocated
# Var I located at rbp-52, size=OS_S32
        # Register xmm0,xmm1 allocated
        movdqa  %xmm0,-16(%rbp)
        # Register xmm0 released
        movdqa  %xmm1,-32(%rbp)
        # Register xmm1 released
.Ll2:

CuriousKit

« Reply #192 on: February 16, 2018, 03:32:21 pm »
Hmmm, looks like I have a way to go before this addition is correct.  I noticed a stack realignment in one of my tests (and made a comment about it), but that happened under Windows, not *nix.  Sorry that it's not quite going to plan.  I'll try to get a computer rigged up to use Linux in the future so I can test these issues more thoroughly.

Can you submit a bug report with a reproducible example of incorrect functionality with alignment and the internal exception?
« Last Edit: February 16, 2018, 03:40:18 pm by CuriousKit »

CuriousKit

« Reply #193 on: February 16, 2018, 04:22:34 pm »
That disassembly you posted... is that on Linux?

dicepd

« Reply #194 on: February 16, 2018, 04:26:47 pm »
Yes, that is on Linux.

 
