Split library into cudf-free core and optional cucascade-cudf target #150

Draft
mbrobbel wants to merge 1 commit into NVIDIA:main from mbrobbel:cucascade-cudf-target

@mbrobbel

Member

Summary

Closes #142. Supersedes #144.

cuCascade is meant to be a generic tiered GPU memory/data manager, but several pieces were tightly
coupled to libcudf, forcing every consumer to pull the whole cuDF stack. PR #144 addressed this
by deleting the cudf-coupled code. This PR takes a less destructive approach: it keeps the cudf
representations/converters but moves them into a separate library target, cucascade-cudf, that
links libcudf. Downstream consumers that need the cudf representations (e.g. Sirius) link
cucascade-cudf; everyone else links the cudf-free core cucascade and registers their own
converters.

The core cucascade target now depends only on RMM + CUDA (plus libnuma and kvikIO/cuFile for
the disk tier).

What changed

Core (cuCascade::cucascade) — now cudf-free

  • Extracted a generic cucascade::memory::column_metadata (opaque int32_t type_id) into
    include/cucascade/memory/column_metadata.hpp; the disk tier uses it instead of the
    cudf::type_id-typed version. The cudf converters translate cudf::type_id ↔ the generic tag.
  • record_writer_event() / get_writer_event() / rebind_stream() are now virtuals on
    idata_representation (no-op / nullptr defaults), so data_batch dispatches polymorphically
    instead of dynamic_cast-ing to the GPU representation.
  • The converter registry ships empty; register_builtin_converters() moved out of the core header.
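
To illustrate the second bullet, here is a minimal sketch of how no-op / nullptr virtual defaults let a batch call the hooks on any representation without a dynamic_cast. It is not the code from this PR: `fake_stream`, `fake_event`, `host_only_representation`, `gpu_like_representation`, `batch_sketch`, and all signatures are hypothetical stand-ins, since the real stream/event types and method signatures are not shown in this description.

```cpp
#include <memory>
#include <utility>

// Hypothetical stand-ins for the stream and event handles (assumptions).
struct fake_stream { int id; };
struct fake_event { int stream_id; };

// Sketch of the base interface: the three hooks default to no-op / nullptr,
// so representations with no GPU state do not need to override them.
class idata_representation {
 public:
  virtual ~idata_representation() = default;
  virtual void record_writer_event(fake_stream) {}
  virtual const fake_event* get_writer_event() const { return nullptr; }
  virtual void rebind_stream(fake_stream) {}
};

// A host-side representation simply inherits the defaults.
class host_only_representation : public idata_representation {};

// A GPU-like representation overrides the hooks it cares about.
class gpu_like_representation : public idata_representation {
 public:
  void record_writer_event(fake_stream s) override
  {
    event_     = fake_event{s.id};
    has_event_ = true;
  }
  const fake_event* get_writer_event() const override
  {
    return has_event_ ? &event_ : nullptr;
  }
  void rebind_stream(fake_stream s) override { stream_ = s; }

 private:
  fake_event event_{0};
  bool has_event_{false};
  fake_stream stream_{0};
};

// A data_batch-like holder: it only sees the base interface, so it never
// needs to know (or include headers for) the concrete GPU type.
class batch_sketch {
 public:
  explicit batch_sketch(std::unique_ptr<idata_representation> rep)
    : rep_(std::move(rep))
  {
  }

  // Records a writer event and reports whether the representation tracks one.
  bool mark_written(fake_stream s)
  {
    rep_->record_writer_event(s);
    return rep_->get_writer_event() != nullptr;
  }

 private:
  std::unique_ptr<idata_representation> rep_;
};
```

With this shape, `batch_sketch` compiles against the base header alone, which is the property the core target needs in order to stay cudf-free.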

New cuCascade::cucascade_cudf target (headers under include/cucascade/cudf/, sources under
src/cudf/)

  • Holds gpu_table_representation, host_data_representation / host_data_packed_representation,
    host_table / host_table_packed, the bandwidth profiler, and the built-in converters
    (register_builtin_converters(), declared in cucascade/cudf/builtin_converters.hpp).
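
The split between an empty core registry and a registration function that lives in the optional target could look roughly like the sketch below. Everything here is illustrative: `converter_registry_sketch`, `representation`, `gpu_rep_sketch`, `host_rep_sketch`, and `register_builtin_converters_sketch` are hypothetical names, and the real registry API is not shown in this description.

```cpp
#include <cstddef>
#include <functional>
#include <map>
#include <memory>
#include <typeindex>
#include <typeinfo>
#include <utility>

// --- "core" side (assumed shape): knows nothing about concrete types ---

struct representation {
  virtual ~representation() = default;
};

class converter_registry_sketch {
 public:
  using converter_fn =
    std::function<std::unique_ptr<representation>(const representation&)>;

  // Register a converter for the (From, To) pair.
  template <typename From, typename To>
  void add(converter_fn fn)
  {
    table_[key<From, To>()] = std::move(fn);
  }

  // True if a converter for (From, To) has been registered.
  template <typename From, typename To>
  bool has() const
  {
    return table_.count(key<From, To>()) != 0;
  }

  std::size_t size() const { return table_.size(); }

 private:
  using key_type = std::pair<std::type_index, std::type_index>;

  template <typename From, typename To>
  static key_type key()
  {
    return key_type(std::type_index(typeid(From)), std::type_index(typeid(To)));
  }

  std::map<key_type, converter_fn> table_;  // starts empty
};

// --- "cudf" side (assumed shape): owns the concrete types and converters ---

struct gpu_rep_sketch : representation {};
struct host_rep_sketch : representation {};

// Stand-in for register_builtin_converters(): only code that links the
// optional target can call it, so the core never references these types.
void register_builtin_converters_sketch(converter_registry_sketch& registry)
{
  registry.add<gpu_rep_sketch, host_rep_sketch>(
    [](const representation&) { return std::make_unique<host_rep_sketch>(); });
  registry.add<host_rep_sketch, gpu_rep_sketch>(
    [](const representation&) { return std::make_unique<gpu_rep_sketch>(); });
}
```

A core-only consumer would call `add` with its own representation types instead of the built-in registration function.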

Build

  • New CUCASCADE_BUILD_CUDF option (default ON) gates find_package(cudf) and the
    cucascade_cudf library. CUCASCADE_BUILD_CUDF=OFF produces a fully cudf-free build.
  • cudf is exposed as a CMake package component, so core consumers never pull cudf even from a
    full (cudf-ON) install:
    find_package(cuCascade REQUIRED)                  # cudf-free core
    find_package(cuCascade REQUIRED COMPONENTS cudf)  # also resolves cudf for cuCascade::cucascade_cudf
  • Tests split into cucascade_tests (core) and cucascade_cudf_tests (cudf); benchmarks link
    cucascade_cudf. Both cudf executables are gated on CUCASCADE_BUILD_CUDF.

Testing

Built and ran the full suite locally (NVIDIA, cuda-13-stable env):

  • cmake --preset release -DCUCASCADE_BUILD_TESTS=ON → core + cucascade_cudf + both test exes +
    benchmarks build; ctest: 100% passed (cucascade_tests, cucascade_cudf_tests,
    cucascade_topology_discovery_tests).
  • cmake ... -DCUCASCADE_BUILD_CUDF=OFF → core-only build succeeds; cudf tests/benchmarks skipped;
    core tests pass.
  • ldd confirms libcucascade.so has no libcudf, while libcucascade_cudf.so links it.
  • Installed both configurations and verified the package config via consumer smoke tests:
    core find_package(cuCascade) works without cudf (even against a cudf-ON install),
    COMPONENTS cudf resolves cudf and links cuCascade::cucascade_cudf, and
    REQUIRED COMPONENTS cudf fails cleanly against a cudf-free install.

Reviewer notes

  • Breaking include-path change (intentional): the cudf representation/host-table/profiler
    headers move from cucascade/data/ and cucascade/memory/ to cucascade/cudf/. No old-path
    forwarding shims are provided — consumers update their includes to <cucascade/cudf/...>. Happy
    to add forwarding shims if we'd rather not break the include paths at this stage.
  • disk_file_format.hpp stays in the core (include/cucascade/data/): it is cudf-free even
    though only the cudf converters use it today.

🤖 Generated with Claude Code

Decouple the core library from libcudf: cucascade now depends only on
RMM + CUDA (plus numa and kvikIO/cuFile for the disk tier), while the
cudf-backed representations, built-in converters, and bandwidth profiler
move to a separate cucascade-cudf target (headers under cucascade/cudf/).

- Generic memory::column_metadata (opaque int32_t type tag) used by the
  disk tier; cudf converters translate cudf::type_id <-> the tag.
- Virtualize record_writer_event/get_writer_event/rebind_stream on
  idata_representation so data_batch stays cudf-free.
- Core converter registry ships empty; register_builtin_converters() now
  lives in cucascade-cudf.
- CUCASCADE_BUILD_CUDF option (default ON) gates find_package(cudf) and
  the cudf target; cudf is exposed as a package component so core
  consumers never pull cudf.
- Split tests and benchmarks by target.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@mbrobbel added the feature request and breaking labels Jun 17, 2026
@mbrobbel marked this pull request as draft June 17, 2026 15:37
@copy-pr-bot

copy-pr-bot Bot commented Jun 17, 2026

Auto-sync is disabled for draft pull requests in this repository. Workflows must be run manually.

Contributors can view more details about this message here.

@mbrobbel requested a review from felipeblazing June 17, 2026 15:38
Labels

breaking, feature request

Development

Successfully merging this pull request may close these issues.

Remove libcudf dependency from cuCascade