Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Retroarch stops working when I try to run Chou Jikuu Yousai Macross - Ai Oboete Imasu ka from PS1 using PCSX from retroarch #868

Open
geysonazevedo27 opened this issue Jan 14, 2025 · 16 comments
Labels
Lightrec Issue specific or affecting lightrec (but not interpreter)

Comments

@geysonazevedo27
Copy link

PCSX-ReARMed Version

r24I c5d1f1d

Your device

PC

Operating System of your device

Windows

CPU architecture

x86-64 (64bit Intel, AMD, etc.)

Issue description

When I go to play Chou Jikuu Yousai Macross - Ai Oboete Imasu ka from PS1 using retroarch pcsx, retroarch stops working

Step-by-step reproduction and logs

This happens after I run the game, the game screen appears for a few seconds, then retroarch stops working, could you fix this problem please?

@notaz notaz added the Lightrec Issue specific or affecting lightrec (but not interpreter) label Jan 14, 2025
@notaz
Copy link
Collaborator

notaz commented Jan 14, 2025

@pcercuei another one for you, SLPS02005. This one is harder as it only crashes on Windows, in generated code it seems.

@pcercuei
Copy link

I don't have a Windows PC to try with.

But could you try adding this commit? pcercuei/lightrec@9572b6e

@notaz
Copy link
Collaborator

notaz commented Jan 14, 2025

Doesn't help, sadly.

You can use wine-staging for testing, RetroArch first complains about dx11 but on retry it falls back to gl automatically and works. Also you need several tries to reproduce the crash as sometimes it seems the game just works under wine.

@geysonazevedo27
Copy link
Author

@pcercuei Will it take long to fix this problem?

@pcercuei
Copy link

It will take long to debug it. Having a savestate that reliably triggers the bug would help.

@geysonazevedo27
Copy link
Author

Thank You @pcercuei

@notaz
Copy link
Collaborator

notaz commented Jan 16, 2025

From some more testing under wine it works if I run retroarch through the command line (wine retroarch.exe -L cores/pcsx_rearmed_libretro.dll game.chd), but consistently crashes if I run retroarch without any args and select the game from the default menu. On real windows it just crashes regardless. It crashes right after finishing booting the BIOS (or almost instantly in HLE mode) so I don't think a savestate would help here.

@pcercuei
Copy link

You commented out a bunch of MAP_FIXED_NOREPLACE in libpcsxcore/lightrec/mem.c, are you sure that's safe?

Could you restore the fprintf() in lightrec_plugin_init()?

@notaz
Copy link
Collaborator

notaz commented Jan 16, 2025

For this case it doesn't matter since windows has LIGHTREC_CUSTOM_MAP=0. In general if that mapping is taken lighrec init failed and the emu was unusable anyway. After my change it logs a message and tries to continue, and it becomes possible to use things like -fsanitize=address. I have no idea if it's safe but I tried random addresses there and things still seemed to work.

fprintf under wine, crashing case:

M=0x53a0000, P=0x55a0000, R=0x59c0000, H=0x5180000

working case:

M=0x4f40000, P=0x5140000, R=0x5150000, H=0x4b20000

I guess when loading through the menu RetroArch allocs more stuff and things move around in heap.

@pcercuei
Copy link

Could you try: pcercuei@35df0bc

@notaz
Copy link
Collaborator

notaz commented Jan 16, 2025

Same crash with the usual

wine: Unhandled page fault on write access to 00000000071BA8C0 at address 0000000005FF37A1 (thread 0134), starting debugger...
...
0x00000005ff37a1: movl %esi, -0x5780(%r11)

@pcercuei
Copy link

Does running it with the interpreter work fine? LIGHTREC_INTERPRETER=1 ./pcsx

@notaz
Copy link
Collaborator

notaz commented Jan 17, 2025

Yes.

Maybe you could look for the repeating pattern after the crash to find the affected block (wine's disassembler's output below):

0x00000005ff374a: jmp 0x5fc1a28            // end of prev block?
0x00000005ff374f: addb %cl, %al
0x00000005ff3751: addb %al, (%rax)
0x00000005ff3753: addb %al, (%rax)
0x00000005ff3755: addb %al, (%rax)
0x00000005ff3757: addb %bh, 0x90000(%rdi)
0x00000005ff375d: leaq -0x5ab7(%rdi), %r11
0x00000005ff3764: movl $0x53a0000, %r14d   // within psx ram
0x00000005ff376a: leaq (%r11, %r14), %r10
0x00000005ff376e: xorq %rsi, %rsi
0x00000005ff3771: movq %rsi, %r9
0x00000005ff3774: movb %r9b, (%r10)
0x00000005ff3777: andq $0xfffffffffffffffc, %r11
0x00000005ff377b: addq %r15, %r11          // where does r15 come from?
0x00000005ff377e: movl %esi, 0x338(%r11)   // r15 derived value ok
0x00000005ff3785: leaq (%rdi, %r14), %r10
0x00000005ff3789: movq %rsi, %r9
0x00000005ff378c: movb %r9b, -0x5ab8(%r10)
0x00000005ff3793: movq %rdi, %r11
0x00000005ff3796: shrq $0x22, %r11         // what's with this weird shifting?
0x00000005ff379a: shlq $0x22, %r11
0x00000005ff379e: addq %r15, %r11
0x00000005ff37a1: movl %esi, -0x5780(%r11) // <--- crash
0x00000005ff37a8: leaq (%rdi, %r14), %r10
0x00000005ff37ac: movl %esi, -0x5ad0(%r10)
0x00000005ff37b3: leaq (%rdi, %r15), %r11
0x00000005ff37b7: movl %esi, -0x5798(%r11)
0x00000005ff37be: leaq (%rdi, %r14), %r10
0x00000005ff37c2: movl %esi, -0x5ad4(%r10)
0x00000005ff37c9: leaq (%rdi, %r15), %r11
0x00000005ff37cd: movl %esi, -0x579c(%r11)
0x00000005ff37d4: leaq (%rdi, %r14), %r10
0x00000005ff37d8: movl %esi, -0x5ac4(%r10)
code_buffer=0000000005fc0000
M=0x53a0000, P=0x55a0000, R=0x59c0000, H=0x5180000
Register dump:
 rip:0000000005ff37a1 rsp:000000000174f7a0 rbp:000000000174f8e0 eflags:00010202 (  R- --  I   - - - )
 rax:0000000000c38100 rbx:000000000007a7dc rcx:000000007fffff80 rdx:00000000ffffffbf
 rsi:0000000000000000 rdi:0000000000090000  r8:000000007fffffc0  r9:0000000000000000 r10:0000000005430000
 r11:00000000071c0040 r12:0000000000000000 r13:00000000001fffff r14:00000000053a0000 r15:00000000071c0040

So r10 is within PSX RAM, but r11 is calculated from r15 which is some unknown heap stuff which unhappily ends up after some reserved area:

0000000006eb0000 00000000071bffff reserve private RW
00000000071c0000 000000000744ffff commit  private RW
0000000007450000 000000000765ffff commit  private RW

@pcercuei
Copy link

r15 is the last of the callee-saved registers on the Win32 x86_64 ABI, so it contains the pointer to the "lightrec_state" structure that is used basically everywhere. On Linux it is r12, as the ABI is different.

The 0x22 shift is what looks strange, could you print the "mask" variable as hex in rec_and_mask() in deps/lightrec/emitter.c?

@pcercuei
Copy link

pcercuei commented Jan 17, 2025

Ok, I think I see what's going on. It's related to the long type being 32 bits on Windows.

Could you try this patch?

diff --git a/deps/lightning/lib/jit_x86-cpu.c b/deps/lightning/lib/jit_x86-cpu.c
index 6957adf36f9..09b6515c93f 100644
--- a/deps/lightning/lib/jit_x86-cpu.c
+++ b/deps/lightning/lib/jit_x86-cpu.c
@@ -870,10 +870,17 @@ static void _patch_at(jit_state_t*, jit_word_t, jit_word_t);
 #      define ffsl(l)                  __builtin_ffsl(l)
 #    endif
 #  endif
+#  if __WORDSIZE == 64
+#    define __popcnt(x)                        __builtin_popcountll(x)
+#    define __ctz(x)                   __builtin_ctzll(x)
+#  else
+#    define __popcnt(x)                        __builtin_popcount(x)
+#    define __ctz(x)                   __builtin_ctz(x)
+#  endif
 #  define jit_cmov_p()                 jit_cpu.cmov
-#  define is_low_mask(im)              (((im) & 1) ? (__builtin_popcountl((im) + 1) <= 1) : 0)
-#  define is_high_mask(im)             ((im) ? (__builtin_popcountl((im) + (1 << __builtin_ctzl(im))) == 0) : 0)
-#  define unmasked_bits_count(im)      (__WORDSIZE - __builtin_popcountl(im))
+#  define is_low_mask(im)              (((im) & 1) ? (__popcnt((im) + 1) <= 1) : 0)
+#  define is_high_mask(im)             ((im) ? (__popcnt((im) + (1 << __ctz(im))) == 0) : 0)
+#  define unmasked_bits_count(im)      (__WORDSIZE - __popcnt(im))
 #endif
 
 #if CODE

@notaz
Copy link
Collaborator

notaz commented Jan 17, 2025

This works. Should also apply to ffsl() above, doesn't it? There might be even more such things...

Doesn't matter in practice, but supposedly leading 1 or 2 underscores in names are reserved for the compiler and system libraries according to some standard. Just a note.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Lightrec Issue specific or affecting lightrec (but not interpreter)
Projects
None yet
Development

No branches or pull requests

3 participants