
Make IDs 32bits #542

Merged
vinistock merged 3 commits into main from 01-28-id_exploration
Jan 30, 2026

Conversation

vinistock commented Jan 29, 2026

Step towards #141

When thinking about using u32 IDs, I was getting really caught up by the fact that Rust's HashMap implementation still requires storing a u64 for the key digests. Of course that consumes more memory and ideally we can have a custom HashMap implementation that uses half the amount of bits.

However, we actually use IDs quite extensively outside of just HashMap keys. For example, many definitions store StringId, names store multiple NameIds and ancestors is a list of DeclarationId. We can actually have very substantial memory savings right now even without the 32 bit HashMap.
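To make the point about IDs stored outside of HashMap keys concrete, here is a minimal sketch of how the per-record footprint changes with the ID width. The `Record` struct and both helper functions are hypothetical and only stand in for types that hold several IDs; they are not Rubydex's actual definitions.

```rust
use std::mem::size_of;

/// Hypothetical record that stores three IDs inline, loosely modelled on
/// "definitions store StringId, names store multiple NameIds".
/// `repr(C)` keeps the layout predictable for this illustration.
#[repr(C)]
pub struct Record<Id> {
    pub name: Id,
    pub owner: Id,
    pub uri: Id,
}

/// Bytes used by one record for a given ID width.
pub fn bytes_per_record<Id>() -> usize {
    size_of::<Record<Id>>()
}

/// Bytes used by the elements of an ID list (for example, an ancestor chain)
/// of length `len`, ignoring the Vec header and spare capacity.
pub fn list_payload_bytes<Id>(len: usize) -> usize {
    len * size_of::<Id>()
}
```

Under these assumptions, `bytes_per_record::<u64>()` is 24 and `bytes_per_record::<u32>()` is 12, so every ID-heavy struct and list halves regardless of what the HashMap does internally.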

Implementation

This PR reduces the size of our IDs to u32 from i64. I recommend reviewing per commit:

  1. Reduce the ID size to u32

  2. Allow DefinitionId and ReferenceId to be tagged with their kind. This helps us reduce the chances of collision by encoding the kind in the lower bits of the ID. Essentially, we have 28 bits for the digest + 4 for the kind. Definitions and references are the things we have the most of in Core and the only ones I got collisions for when running in release mode, so I think this is enough for the time being.

    As an added benefit, this allows us to check the kind of definitions and references without having to retrieve them from the graph. That information is encoded directly in the ID, so we can just invoke kind. Note that the kind addition conflicts with Provide resolution diagnostics (inline) #502; we can let that PR go first.

  3. Adjust the C side to use u32
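The tagging scheme from step 2 (28 bits of digest plus 4 bits of kind in the low bits) could look roughly like the sketch below. The `Kind` variants, the `TaggedId` type, and the use of `DefaultHasher` are assumptions made for illustration; the PR description does not show the real hashing function or the real list of kinds.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Hypothetical kinds; the actual set is not listed in the PR description.
#[repr(u8)]
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Kind {
    Class = 0,
    Module = 1,
    Method = 2,
}

const KIND_BITS: u32 = 4;
const KIND_MASK: u32 = (1 << KIND_BITS) - 1; // 0b1111

/// A 32-bit ID: the upper 28 bits hold the digest, the lower 4 bits hold the kind.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
pub struct TaggedId(u32);

impl TaggedId {
    pub fn new(key: &str, kind: Kind) -> Self {
        let mut hasher = DefaultHasher::new();
        key.hash(&mut hasher);
        // Truncate the 64-bit hash to 32 bits, then clear the low 4 bits
        // so the kind tag can be stored there.
        let digest = hasher.finish() as u32;
        Self((digest & !KIND_MASK) | kind as u32)
    }

    /// Reads the kind straight from the ID, with no graph lookup.
    pub fn kind(self) -> Kind {
        match self.0 & KIND_MASK {
            0 => Kind::Class,
            1 => Kind::Module,
            2 => Kind::Method,
            other => unreachable!("unknown kind tag {}", other),
        }
    }

    /// The 28-bit digest portion of the ID.
    pub fn digest(self) -> u32 {
        self.0 >> KIND_BITS
    }
}
```

In this sketch, two entries that hash to the same 28-bit digest but have different kinds still get distinct IDs, which is how the tag lowers the chance of collision.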

Impact

Despite still using 64-bit HashMaps, this PR reduces our memory usage from ~3810 MB to ~2880 MB (roughly a 25% reduction).

I believe we can further reduce this with a custom HashMap implementation that stores u32 internally.

Why now?

I believe this is the right time to do this because we're starting adoption of Rubydex in our tools. It will be easier to first verify that we can indeed get away with u32 and lower memory usage, and then increase to u64 if necessary, than to go the other way around.

Also, we're almost getting to 4GB and a reduction is definitely welcome.

vinistock commented Jan 29, 2026

This stack of pull requests is managed by Graphite.

vinistock mentioned this pull request Jan 29, 2026
vinistock self-assigned this Jan 29, 2026
vinistock marked this pull request as ready for review January 29, 2026 21:33
vinistock requested a review from a team as a code owner January 29, 2026 21:33
#[derive(PartialEq, Eq, Debug, Clone, Copy)]
pub struct DeclarationMarker;
/// `DeclarationId` represents the ID of a fully qualified name. For example, `Foo::Bar` or `Foo#my_method`
pub type DeclarationId = Id<DeclarationMarker>;
Contributor

Should we also tag declarations?

Member Author

I haven't figured out a good way to do it. Consider our Ruby API:

graph["Foo"]

The Ruby API has no way of knowing what Foo is before looking it up, so we wouldn't be able to construct the right DeclarationId to search.

For now, I think it's okay because we have a lot fewer declarations than definitions.
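A rough sketch of the lookup problem described above, under assumed names: `Graph`, `untagged_id`, and `tagged_id` are hypothetical, and `DefaultHasher` stands in for whatever hash the project really uses. An untagged ID can be rebuilt from the name alone, which is all a `graph["Foo"]` call has; a tagged ID also needs the kind, which the caller does not know yet.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashMap;
use std::hash::{Hash, Hasher};

fn digest(name: &str) -> u32 {
    let mut hasher = DefaultHasher::new();
    name.hash(&mut hasher);
    hasher.finish() as u32 // truncate the 64-bit hash to 32 bits
}

/// Untagged: the ID depends only on the fully qualified name.
pub fn untagged_id(name: &str) -> u32 {
    digest(name)
}

/// Tagged: the ID also depends on a 4-bit kind tag, so the name alone is not enough.
pub fn tagged_id(name: &str, kind_tag: u32) -> u32 {
    (digest(name) & !0xF) | (kind_tag & 0xF)
}

/// Hypothetical graph keyed by untagged declaration IDs.
pub struct Graph {
    declarations: HashMap<u32, String>,
}

impl Graph {
    pub fn new() -> Self {
        Self { declarations: HashMap::new() }
    }

    pub fn add(&mut self, name: &str, description: &str) {
        self.declarations.insert(untagged_id(name), description.to_string());
    }

    /// Mirrors `graph["Foo"]`: only the name is available at lookup time.
    pub fn get(&self, name: &str) -> Option<&str> {
        self.declarations.get(&untagged_id(name)).map(String::as_str)
    }
}
```

With tagged keys, `get` would have to try every possible kind tag for the same name, which is the cost this reply is avoiding.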

}
}

#[repr(u8)]
Contributor

You should be able to fit this in 4 bits?

Member Author

Is there a way to do this? u8 is the smallest integer storage we can use, no?
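For context on this exchange: Rust has no integer type narrower than u8, so a standalone `#[repr(u8)]` enum occupies one byte. The 4-bit budget only applies once the tag is packed into a wider integer. A minimal sketch, with a hypothetical `RefKind` enum and helper names that are not from the PR:

```rust
use std::mem::size_of;

/// Hypothetical reference kinds, for illustration only.
#[repr(u8)]
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum RefKind {
    Constant = 0,
    Method = 1,
}

impl RefKind {
    /// Largest discriminant in use; update when adding variants.
    pub const MAX: u8 = RefKind::Method as u8;
}

// Compile-time check that every tag fits in 4 bits (values 0..=15).
const _: () = assert!(RefKind::MAX < 16);

/// Size of the enum when stored on its own.
pub fn kind_storage_bytes() -> usize {
    size_of::<RefKind>()
}

/// Packs a digest and a kind into one u32: the low 28 bits of `digest`
/// are kept and shifted up, and the kind takes the low 4 bits.
pub fn pack(digest: u32, kind: RefKind) -> u32 {
    (digest << 4) | kind as u32
}

/// Extracts the 4-bit kind tag from a packed ID.
pub fn unpack_tag(id: u32) -> u8 {
    (id & 0xF) as u8
}
```

Here the enum itself stays one byte wide, while the packed representation spends only 4 bits on it.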

offset::Offset,
};

#[repr(u8)]
Contributor

4 bits should be enough?

Base automatically changed from 01-28-reduce_ref_count_sizes to main January 30, 2026 16:05
vinistock force-pushed the 01-28-id_exploration branch from 9d72fcd to f486994 January 30, 2026 18:14
vinistock merged commit 6888476 into main Jan 30, 2026
27 checks passed
vinistock deleted the 01-28-id_exploration branch January 30, 2026 21:01