RISC-V disassembly, part 4/4: LibDisassembly: Actually disassemble RISC-V #21540

Draft: wants to merge 8 commits into base: master

kleinesfilmroellchen
Collaborator

This is part 4, the final part of a long chain of PRs and commits that add RISC-V disassembly support. Today: exactly this. RISC-V disassembly support. It's quite a lot of code; I've tried to split it up by extension and to separate out the initial support commits so it's hopefully easier to review. There are full tests for everything added here, so the functionality is covered by tests.

A small look at what some of our libc looks like with disasm, grabbing a random function:

AK::StringUtils::invert_case(AK::StringView) (0x00000000000fb2ec-0x00000000000fb39a):
0x00000000000fb2ec  0d 71                 addi       sp, sp, -32
0x00000000000fb2ee  86 ee                 sd         ra, 0x158(sp)
0x00000000000fb2f0  a2 ea                 sd         fp, 0x150(sp)
0x00000000000fb2f2  a6 e6                 sd         s1, 0x148(sp)
0x00000000000fb2f4  ca e2                 sd         s2, 0x140(sp)
0x00000000000fb2f6  4e fe                 sd         s3, 0x138(sp)
0x00000000000fb2f8  52 fa                 sd         s4, 0x130(sp)
0x00000000000fb2fa  56 f6                 sd         s5, 0x128(sp)
0x00000000000fb2fc  80 12                 addi       fp, sp, 352
.Lpcrel_hi55 (0x00000000000fb2fe):
0x00000000000fb2fe  97 46 04 00           auipc      a3, 278528
0x00000000000fb302  03 ba 26 40           ld         s4, 0x402(a3)
0x00000000000fb306  83 36 0a 00           ld         a3, 0x000(s4)
0x00000000000fb30a  b2 84                 add        s1, zero, a2
0x00000000000fb30c  ae 89                 add        s3, zero, a1
0x00000000000fb30e  2a 89                 add        s2, zero, a0
0x00000000000fb310  23 30 d4 fc           sd         a3, -0x040(fp)
0x00000000000fb314  13 05 84 ea           addi       a0, fp, -344
0x00000000000fb318  b2 85                 add        a1, zero, a2
0x00000000000fb31a  ef 90 1f 8f           jal        ra, 0xf4c0a
0x00000000000fb31e  9d c8                 beq        s1, zero, 0xfb354 <.LBB38_7>
0x00000000000fb320  e5 4a                 addi       s5, zero, 25
0x00000000000fb322  05 a0                 jal        zero, 0xfb342 <.LBB38_4>
.LBB38_2 (0x00000000000fb324):
0x00000000000fb324  93 05 f5 fb           addi       a1, a0, -65
0x00000000000fb328  93 b5 a5 01           sltiu      a1, a1, 26
0x00000000000fb32c  96 05                 slli       a1, a1, 5
0x00000000000fb32e  2e 95                 add        a0, a0, a1
.LBB38_3 (0x00000000000fb330):
0x00000000000fb330  93 75 f5 0f           andi       a1, a0, 255
0x00000000000fb334  13 05 84 ea           addi       a0, fp, -344
0x00000000000fb338  ef 90 7f e8           jal        ra, 0xf51be
0x00000000000fb33c  fd 14                 addi       s1, s1, -1
0x00000000000fb33e  85 09                 addi       s3, s3, 1
0x00000000000fb340  91 c8                 beq        s1, zero, 0xfb354 <.LBB38_7>
.LBB38_4 (0x00000000000fb342):
0x00000000000fb342  a1 c4                 beq        s1, zero, 0xfb38a
0x00000000000fb344  03 c5 09 00           lbu        a0, 0x000(s3)
0x00000000000fb348  93 05 f5 f9           addi       a1, a0, -97
0x00000000000fb34c  e3 ec ba fc           bltu       s5, a1, 0xfb324
0x00000000000fb350  01 15                 addi       a0, a0, -32
0x00000000000fb352  f9 bf                 jal        zero, 0xfb330
.LBB38_7 (0x00000000000fb354):
0x00000000000fb354  93 05 84 ea           addi       a1, fp, -344
0x00000000000fb358  4a 85                 add        a0, zero, s2
0x00000000000fb35a  ef c0 ff fa           jal        ra, 0xf8308
0x00000000000fb35e  03 45 84 fb           lbu        a0, -0x048(fp)
0x00000000000fb362  09 e5                 bne        a0, zero, 0xfb36c <.LBB38_9>
0x00000000000fb364  03 35 04 eb           ld         a0, -0x150(fp)
0x00000000000fb368  ef f0 29 ba           jal        ra, 0x9a70a
.LBB38_9 (0x00000000000fb36c):
0x00000000000fb36c  03 35 0a 00           ld         a0, 0x000(s4)
0x00000000000fb370  83 35 04 fc           ld         a1, -0x040(fp)
0x00000000000fb374  63 11 b5 02           bne        a0, a1, 0xfb396 <.LBB38_12>
0x00000000000fb378  f6 60                 ld         ra, 0x158(sp)
0x00000000000fb37a  56 64                 ld         fp, 0x150(sp)
0x00000000000fb37c  b6 64                 ld         s1, 0x148(sp)
0x00000000000fb37e  16 69                 ld         s2, 0x140(sp)
0x00000000000fb380  f2 79                 ld         s3, 0x138(sp)
0x00000000000fb382  52 7a                 ld         s4, 0x130(sp)
0x00000000000fb384  b2 7a                 ld         s5, 0x128(sp)
0x00000000000fb386  35 61                 addi       sp, sp, -32
0x00000000000fb388  82 80                 jalr       zero, ra, 0xfb388
.LBB38_11 (0x00000000000fb38a):
.Lpcrel_hi56 (0x00000000000fb38a):
0x00000000000fb38a  17 a5 f6 ff           auipc      a0, -614400
0x00000000000fb38e  13 05 05 f5           addi       a0, a0, -176
0x00000000000fb392  ef 70 6d a5           jal        ra, 0xd25e8
.LBB38_12 (0x00000000000fb396):
0x00000000000fb396  ef 40 2b bd           jal        ra, 0xaf768

Depends on #21539.


LibDisassembly: Add RISC-V register definitions

The next series of commits will add full RV64GC decoding, but since this
is a lot of code, it is split up into semi-logical chunks which each
contain plenty of dead code initially.

This first commit adds the RISC-V register definitions and enums
necessary for RV64GC; that is, no vector registers.
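For readers unfamiliar with the register names used in the listing above, here is a minimal sketch of mapping the 32 integer register indices to their ABI names. The names `abi_register_names` and `abi_name` are hypothetical and not taken from LibDisassembly; x8 is printed as `fp` (rather than `s0`) only because that is what the listing above shows.

```cpp
#include <array>
#include <cassert>
#include <cstdint>
#include <string_view>

// Hypothetical table, not LibDisassembly's actual definitions.
// Index n holds the ABI name of integer register xn.
constexpr std::array<std::string_view, 32> abi_register_names = {
    "zero", "ra", "sp", "gp", "tp", "t0", "t1", "t2",
    "fp", "s1", "a0", "a1", "a2", "a3", "a4", "a5",
    "a6", "a7", "s2", "s3", "s4", "s5", "s6", "s7",
    "s8", "s9", "s10", "s11", "t3", "t4", "t5", "t6",
};

// Returns the ABI name for a 5-bit register index as found in an instruction.
inline std::string_view abi_name(uint8_t register_index)
{
    assert(register_index < abi_register_names.size());
    return abi_register_names[register_index];
}
```

Floating-point registers would need a second table of the same shape; vector registers are out of scope, as noted above.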

LibDisassembly: Add decoding for all RISC-V instruction formats

These structs just handle decoding the raw bit format, including details
like sign extending, but not interpreting these bits or translating them
to higher level constructs.
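As an illustration of what "decoding the raw bit format" means, here is a minimal sketch for the I-type format, including sign extension of the 12-bit immediate. `ITypeFields`, `sign_extend` and `decode_i_type` are hypothetical names for this example; the structs in this PR may look different.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical field container; only the raw fields, no interpretation.
struct ITypeFields {
    uint8_t opcode;    // bits 6:0
    uint8_t rd;        // bits 11:7
    uint8_t funct3;    // bits 14:12
    uint8_t rs1;       // bits 19:15
    int32_t immediate; // bits 31:20, sign-extended
};

// Interprets the low `bit_count` bits of `value` as a two's complement number.
inline int32_t sign_extend(uint32_t value, unsigned bit_count)
{
    assert(bit_count >= 1 && bit_count < 32);
    uint32_t const sign_bit = 1u << (bit_count - 1);
    value &= (sign_bit << 1) - 1;
    return static_cast<int32_t>(value ^ sign_bit) - static_cast<int32_t>(sign_bit);
}

// Splits a 32-bit instruction word into its I-type fields.
inline ITypeFields decode_i_type(uint32_t word)
{
    return ITypeFields {
        static_cast<uint8_t>(word & 0x7f),
        static_cast<uint8_t>((word >> 7) & 0x1f),
        static_cast<uint8_t>((word >> 12) & 0x7),
        static_cast<uint8_t>((word >> 15) & 0x1f),
        sign_extend(word >> 20, 12),
    };
}
```

For example, the bytes `93 05 f5 fb` from the listing above form the little-endian word `0xfbf50593`, which splits into opcode 0x13, rd 11 (a1), rs1 10 (a0) and immediate -65, matching `addi a1, a0, -65`.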

LibDisassembly: Add string fmting for RISC-V registers and rounding mode

LibDisassembly: Add RISC-V instruction abstractions

These will be used later on by concrete instruction implementations.

LibDisassembly: Add RISC-V 64 base ISA instruction decoding

Actually, this is RV64IMZifencei and RV32IMZifencei:

  • As per RISC-V, RV64I of course includes RV32I
  • The M extension is not separable in a good way since it just adds more
    arithmetic instructions that are represented in a common class with
    other I instructions.
  • Zifencei is just one instruction (FENCE.I) that doesn't need its own
    file.

LibDisassembly: Add RISC-V Zicsr extension decoding

Note that CSR pretty-printing is not yet implemented, since we aim to
establish shared CSR utilities in AK in the near future.

LibDisassembly: Add RISC-V FD extension decoding

Again, no point in splitting these two since the orthogonal instruction
set makes supporting both in common abstractions convenient.

LibDisassembly: Add RISC-V A extension decoding

LibDisassembly: Add common privileged instructions

This does not include all instructions for S- and M-mode from the
privileged spec, but these are the common ones that we will definitely
need (and use ourselves).

LibDisassembly: Use RISC-V decoding functions for 32-bit instructions

This was prettier to add in one go instead of many small sections :^)

LibDisassembly: Add RV64C extension decoding

This does not support RV32C or RV128C, since those use the same
instructions for different purposes. We decode compressed instructions
into the same internal representation as their full counterparts, since
all compressed instructions have such an exact counterpart by design.
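To illustrate the idea of one internal representation for both encodings, here is a minimal sketch covering only ADDI and C.ADDI. `AddImmediate`, `decode_addi` and `decode_c_addi` are hypothetical names for this example and are not the classes added in this PR.

```cpp
#include <cstdint>
#include <optional>

// Hypothetical shared representation of "rd = rs1 + immediate".
struct AddImmediate {
    uint8_t rd;
    uint8_t rs1;
    int32_t immediate;

    bool operator==(AddImmediate const& other) const
    {
        return rd == other.rd && rs1 == other.rs1 && immediate == other.immediate;
    }
};

// Interprets the low `bit_count` bits (1..31) of `value` as two's complement.
inline int32_t sign_extend(uint32_t value, unsigned bit_count)
{
    uint32_t const sign_bit = 1u << (bit_count - 1);
    value &= (sign_bit << 1) - 1;
    return static_cast<int32_t>(value ^ sign_bit) - static_cast<int32_t>(sign_bit);
}

// Full-size ADDI (I-type): opcode 0b0010011, funct3 0b000.
inline std::optional<AddImmediate> decode_addi(uint32_t word)
{
    if ((word & 0x7f) != 0x13 || ((word >> 12) & 0x7) != 0)
        return std::nullopt;
    return AddImmediate {
        static_cast<uint8_t>((word >> 7) & 0x1f),
        static_cast<uint8_t>((word >> 15) & 0x1f),
        sign_extend(word >> 20, 12),
    };
}

// C.ADDI (CI format): quadrant 0b01, funct3 0b000.
// rd and rs1 share one field; the 6-bit immediate is split into
// bit 12 (imm[5]) and bits 6:2 (imm[4:0]).
inline std::optional<AddImmediate> decode_c_addi(uint16_t halfword)
{
    if ((halfword & 0x3) != 0x1 || (halfword >> 13) != 0)
        return std::nullopt;
    uint8_t const reg = static_cast<uint8_t>((halfword >> 7) & 0x1f);
    uint32_t const raw_immediate = (((halfword >> 12) & 0x1u) << 5) | ((halfword >> 2) & 0x1fu);
    return AddImmediate { reg, reg, sign_extend(raw_immediate, 6) };
}
```

With this, the compressed bytes `fd 14` (halfword `0x14fd`) decode to the same value as the full-size word `0xfff48493`, namely `addi s1, s1, -1`. Code that prints or analyzes instructions then only has to handle one type.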

LibDisassembly: Support disassembling a RISC-V instruction stream

This small commit finally plugs the RISC-V decoding capabilities into
Disassembler, allowing RISC-V to be disassembled correctly.
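Walking an instruction stream with mixed 16-bit and 32-bit instructions hinges on the two lowest bits of the first byte: anything other than `0b11` is a compressed instruction. A minimal sketch, assuming only these two lengths occur (the longer encodings reserved by the spec are ignored); `InstructionSpan` and `split_instruction_stream` are hypothetical names and not part of Disassembler.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical result type: where an instruction starts and how long it is.
struct InstructionSpan {
    size_t offset;
    size_t length;
};

// Splits a little-endian RISC-V byte stream into 2- and 4-byte instructions.
// Stops at the first instruction that does not fit into the buffer.
inline std::vector<InstructionSpan> split_instruction_stream(std::vector<uint8_t> const& bytes)
{
    std::vector<InstructionSpan> spans;
    size_t offset = 0;
    while (offset + 2 <= bytes.size()) {
        // The first byte holds the lowest bits, since instructions are little-endian.
        size_t const length = (bytes[offset] & 0x3) == 0x3 ? 4 : 2;
        if (offset + length > bytes.size())
            break; // Truncated instruction at the end of the buffer.
        spans.push_back({ offset, length });
        offset += length;
    }
    return spans;
}
```

Fed the bytes of `.LBB38_2` from the listing above (`93 05 f5 fb 93 b5 a5 01 96 05 2e 95`), this yields two 4-byte instructions followed by two 2-byte ones.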

Tests: Add RISC-V disassembly tests

There are two tests here, one for full RV64G coverage and one with the
asinh implementation from our libc. There are no dedicated C tests (though
C is tested plenty in asinh) since I've already spent over two days
writing these tests.

Member

@spholz spholz left a comment

I didn't look at the F/D/C/A extensions, as I don't know much about them.
I only have some comments regarding the naming of things.
But very nice to see risc-v disassembly support in serenity!

@kleinesfilmroellchen kleinesfilmroellchen force-pushed the risc-v-disasm/04-yo-dis-assembler-so-risky-bruh branch 2 times, most recently from 697671f to dd1cd0f Compare October 22, 2023 12:42

stale bot commented Nov 12, 2023

This pull request has been automatically marked as stale because it has not had recent activity. It will be closed in 7 days if no further activity occurs. Thank you for your contributions!

@stale stale bot added the stale label Nov 12, 2023

stale bot commented Nov 23, 2023

This pull request has been closed because it has not had recent activity. Feel free to re-open if you wish to still contribute these changes. Thank you for your contributions!

@stale stale bot closed this Nov 23, 2023

stale bot commented Jan 13, 2024

This pull request has been closed because it has not had recent activity. Feel free to re-open if you wish to still contribute these changes. Thank you for your contributions!

@stale stale bot closed this Jan 13, 2024
@stale stale bot removed the stale label Jan 19, 2024
@kleinesfilmroellchen kleinesfilmroellchen force-pushed the risc-v-disasm/04-yo-dis-assembler-so-risky-bruh branch 5 times, most recently from c2023a0 to 6725b3f Compare January 20, 2024 16:39
@kleinesfilmroellchen kleinesfilmroellchen force-pushed the risc-v-disasm/04-yo-dis-assembler-so-risky-bruh branch 3 times, most recently from eb11cbb to 4737f32 Compare January 27, 2024 12:20

stale bot commented Feb 17, 2024

This pull request has been automatically marked as stale because it has not had recent activity. It will be closed in 7 days if no further activity occurs. Thank you for your contributions!

@stale stale bot added the stale label Feb 17, 2024

stale bot commented Feb 24, 2024

This pull request has been closed because it has not had recent activity. Feel free to re-open if you wish to still contribute these changes. Thank you for your contributions!

@stale stale bot closed this Feb 24, 2024
@stale stale bot removed the stale label Mar 17, 2025
@kleinesfilmroellchen kleinesfilmroellchen force-pushed the risc-v-disasm/04-yo-dis-assembler-so-risky-bruh branch from 4737f32 to 62e3196 Compare March 17, 2025 21:48
@kleinesfilmroellchen kleinesfilmroellchen force-pushed the risc-v-disasm/04-yo-dis-assembler-so-risky-bruh branch from 62e3196 to a6e18f5 Compare March 17, 2025 22:10
@kleinesfilmroellchen kleinesfilmroellchen force-pushed the risc-v-disasm/04-yo-dis-assembler-so-risky-bruh branch from a6e18f5 to 781b7c7 Compare March 23, 2025 20:47
@kleinesfilmroellchen kleinesfilmroellchen force-pushed the risc-v-disasm/04-yo-dis-assembler-so-risky-bruh branch from 781b7c7 to 515ed86 Compare March 25, 2025 20:02
@kleinesfilmroellchen kleinesfilmroellchen force-pushed the risc-v-disasm/04-yo-dis-assembler-so-risky-bruh branch from 515ed86 to 9a20742 Compare March 25, 2025 20:16
3 participants