@yuebanabn

Fixes #2602

Summary
This PR overhauls the VanadisRoCCInterface integration within the processor pipeline to support out-of-order (OoO) execution. Previously, RoCC instructions were handled sequentially: source operands were read immediately at the Issue stage and results were written directly to architectural registers. This bypassed register renaming, so execution had to be artificially serialized to avoid RAW/WAW/WAR hazards.

Technical Details

This PR implements the following changes to align RoCC instruction handling with standard Vanadis functional units:

  1. Register Renaming for RoCC (assignRegistersToInstruction):

    • Removed the special handling block that bypassed register allocation.
    • RoCC instructions now go through the standard output register allocation path, acquiring physical registers for rd. This resolves WAW and WAR hazards.
  2. Delayed Dispatch Mechanism (allocateFunctionalUnit):

    • Removed the immediate push to the RoCC interface during the Issue stage.
    • Introduced rocc_wait_queues_ to hold issued RoCC instructions that are waiting for source operands (acting as a reservation station).
  3. Operand Readiness Check (performExecute):

    • Added logic to the start of the Execute stage to scan rocc_wait_queues_.
    • Instructions are only dispatched (pushed) to the accelerator when:
      • The hardware RoCC queue is not full.
      • All source physical registers are ready (checking pendingIntWrites).
    • Source register values are now read from the register file at the moment of dispatch, ensuring correct data is sent.
  4. Correct Physical Register Write-back:

    • Updated the response handling logic in performExecute.
    • Results are now written to the allocated physical register index (retrieved from the instruction object) instead of the architectural register index.
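
The four steps above can be sketched as a small standalone model. This is only an illustration under stated assumptions, not the code in this PR: `RoccInstr`, `RoccCommand`, `PhysRegFile`, `tryDispatchHead`, and `writeBack` are hypothetical names, the pending-write counter stands in for the `pendingIntWrites` check, and the sketch only examines the head of the wait queue, whereas the PR describes scanning `rocc_wait_queues_`.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <deque>
#include <vector>

// Hypothetical stand-in for an issued RoCC instruction after renaming:
// it carries physical register indices, not architectural ones.
struct RoccInstr {
    std::uint16_t phys_rs1;
    std::uint16_t phys_rs2;
    std::uint16_t phys_rd;
};

// Hypothetical command handed to the accelerator at dispatch time.
struct RoccCommand {
    std::uint64_t rs1_value;
    std::uint64_t rs2_value;
    std::uint16_t phys_rd;
};

// Minimal physical register file with a per-register pending-write count.
struct PhysRegFile {
    std::vector<std::uint64_t> values;
    std::vector<int> pending_writes;  // > 0 means a producer is still in flight

    explicit PhysRegFile(std::size_t n) : values(n, 0), pending_writes(n, 0) {}

    bool ready(std::uint16_t r) const { return pending_writes[r] == 0; }
};

// Dispatch the head of the wait queue only if the hardware queue has room
// and both source registers are ready. Operand values are read here, at
// dispatch, rather than at issue. Returns true if a command was pushed.
bool tryDispatchHead(std::deque<RoccInstr>& wait_queue,
                     const PhysRegFile& regs,
                     std::deque<RoccCommand>& hw_queue,
                     std::size_t hw_capacity) {
    if (wait_queue.empty()) return false;
    if (hw_queue.size() >= hw_capacity) return false;

    const RoccInstr& head = wait_queue.front();
    if (!regs.ready(head.phys_rs1) || !regs.ready(head.phys_rs2)) return false;

    hw_queue.push_back({regs.values[head.phys_rs1],
                        regs.values[head.phys_rs2],
                        head.phys_rd});
    wait_queue.pop_front();
    return true;
}

// Write the accelerator's result to the allocated physical register and
// mark that register as no longer pending.
void writeBack(PhysRegFile& regs, const RoccCommand& cmd, std::uint64_t result) {
    assert(cmd.phys_rd < regs.values.size());
    regs.values[cmd.phys_rd] = result;
    if (regs.pending_writes[cmd.phys_rd] > 0) --regs.pending_writes[cmd.phys_rd];
}
```

In this model a RoCC instruction whose source is still pending simply stays in the wait queue, which is what lets younger independent instructions proceed in the meantime.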

Impact
These changes allow the Vanadis CPU to continue issuing independent instructions while RoCC instructions are waiting for operands or executing long-latency tasks. This significantly improves the accuracy of performance modeling for heterogeneous systems.

Testing

  • Verified with a custom CIM (Compute-in-Memory) RoCC component.
  • Confirmed that RAW dependencies are correctly respected (instruction waits for operands).
  • Confirmed that independent instructions can execute out-of-order relative to RoCC operations.

1. Implement Register Renaming for RoCC
2. Delayed Dispatch Mechanism
3. Operand Readiness Check in Execute Stage
4. Correct Write-back
@sst-autotester
Contributor

Status Flag 'Pre-Test Inspection' - - This Pull Request Requires Inspection... The code must be inspected by a member of the Team before Testing/Merging
NO INSPECTION HAS BEEN PERFORMED ON THIS PULL REQUEST! - This PR must be inspected by setting label 'AT: PRE-TEST INSPECTED'.

@bfeinberg

This may exacerbate a long-standing issue with the Vanadis-RoCC interface: the potential for speculative execution of RoCC instructions. If you look at the BOOM core documentation, it states that RoCC instructions should not issue until they become non-speculative (https://docs.boom-core.org/en/latest/sections/reorder-buffer.html#point-of-no-return-pnr, https://docs.boom-core.org/en/latest/sections/execution-stages.html#the-rocket-custom-co-processor-interface-rocc), which ensures there are no problems from instructions with side effects.

This has been something I've been meaning to fix for a while but it hasn't been a huge issue because as you point out the RoCC instructions pretty aggressively stall the pipeline. Could you incorporate that non-speculative dispatch into these changes?
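
As a rough illustration of what such a gate could look like (a sketch only, with every name hypothetical and not taken from Vanadis or BOOM): a RoCC instruction is allowed to dispatch once no older reorder-buffer entry can still cause a squash. The `may_squash` flag is an assumed simplification standing in for unresolved branches and possible exceptions.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>

// Hypothetical reorder-buffer entry; index 0 is the oldest instruction.
struct RobEntry {
    bool is_rocc;
    bool may_squash;  // assumed flag: unresolved branch or possible exception
};

// An entry is treated as non-speculative when no older entry can still
// trigger a pipeline squash.
bool isNonSpeculative(const std::deque<RobEntry>& rob, std::size_t index) {
    assert(index < rob.size());
    for (std::size_t i = 0; i < index; ++i) {
        if (rob[i].may_squash) return false;
    }
    return true;
}

// Gate for sending a RoCC instruction to the accelerator.
bool mayDispatchRocc(const std::deque<RobEntry>& rob, std::size_t index) {
    return index < rob.size()
        && rob[index].is_rocc
        && !rob[index].may_squash
        && isNonSpeculative(rob, index);
}
```

A stricter variant would only dispatch when the RoCC instruction is at the head of the reorder buffer, trading some overlap for simplicity.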

@yuebanabn
Author

Thanks for the excellent feedback and pointing out this critical issue. I agree completely.
I'll work on implementing the non-speculative dispatch from the commit stage to correctly handle instructions with side effects.
I'll start by creating a test case to reproduce the bug. I'll keep you updated on my progress.


4 participants