Decouple core library from libcudf #144
Draft
mbrobbel wants to merge 2 commits into
Remove libcudf as a dependency of the cuCascade core library; RMM (librmm) becomes the only direct RAPIDS dependency. The cuDF-backed GPU/host representations and all built-in tier converters are removed and become the responsibility of an external domain layer that links libcudf and registers converters via register_converter().

Library / build:
- Drop find_package(cudf) and cudf::cudf from link libs; drop find_dependency(cudf) from the installed CMake config.
- pixi: swap libcudf -> librmm, rename cudf-* features to rmm-*.
- Delete gpu/cpu_data_representation, host_table(_packed), the ~1800-line built-in converters, register_builtin_converters(), the bandwidth profiler, and the cuDF-coupled benchmarks / tests / test utils.
- Extract a generic memory::column_metadata (opaque int32_t type tag) into include/cucascade/memory/column_metadata.hpp for the disk tier.
- Move record_writer_event()/get_writer_event() to the idata_representation base (no-op / nullptr defaults).
- disk_data_representation is now the only in-library concrete representation.

Docs / CI:
- Update README, docs/*, and CLAUDE.md to the decoupled architecture; remove docs/bandwidth-profiler.md.
- Disable the CI benchmark job (its converter benchmarks moved out with the cuDF code).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
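Moving `record_writer_event()` / `get_writer_event()` onto the base with no-op / `nullptr` defaults is what lets callers drop the `dynamic_cast` to a GPU type. The following is a minimal, self-contained sketch of that pattern, not cuCascade's actual header: the real hooks presumably deal in CUDA events, so a hypothetical `fake_event` / `event_handle` pair stands in here to keep the sketch buildable without the CUDA toolkit, and `disk_like_representation`, `gpu_like_representation`, and `query_writer_event` are likewise hypothetical names.

```cpp
#include <cassert>
#include <memory>

// Hypothetical stand-in for a CUDA event handle, so the sketch builds
// without the CUDA toolkit.
struct fake_event {
  int id;
};
using event_handle = const fake_event*;

// Base interface: the writer-event hooks are virtual with no-op / nullptr
// defaults, so representations that never write on a GPU stream need not
// override them.
class idata_representation {
 public:
  virtual ~idata_representation() = default;
  virtual void record_writer_event() {}                              // no-op default
  virtual event_handle get_writer_event() const { return nullptr; }  // nullptr default
};

// Keeps the defaults, as an in-library disk representation might.
class disk_like_representation : public idata_representation {};

// Hypothetical GPU-backed representation, as a domain layer might define it.
class gpu_like_representation : public idata_representation {
 public:
  void record_writer_event() override { recorded_ = true; }
  event_handle get_writer_event() const override {
    return recorded_ ? &event_ : nullptr;
  }

 private:
  fake_event event_{42};
  bool recorded_ = false;
};

// Callers dispatch through the base interface instead of dynamic_cast-ing
// to a concrete GPU type.
event_handle query_writer_event(const idata_representation& rep) {
  return rep.get_writer_event();
}
```

With this shape, a caller holding a `disk_like_representation` sees `nullptr`, while a GPU-backed one returns its recorded event, and the caller never names either concrete type.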
Auto-sync is disabled for draft pull requests in this repository. Workflows must be run manually.
Comment by the author (Member): /ok to test b24413f
Comment by the author (Member): /ok to test 79fb183
Summary
Closes #142.
cuCascade is meant to be a generic, tiered GPU memory/data manager, but several pieces were tightly
coupled to libcudf — forcing every consumer to depend on the whole cuDF stack. This PR removes that
coupling: the core library now depends only on RMM + CUDA (plus libnuma, and kvikIO/cuFile for disk),
and the cuDF-specific representations, converters, and bandwidth profiler move out to the domain layer
(downstream consumers that link libcudf).
cuCascade's abstractions (`idata_representation`, the converter registry, `data_batch`, repositories, disk I/O backends, and the memory subsystem) were already generic; this PR makes the dependency boundary match the architecture.
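Because the built-in converters leave the library, a domain layer now registers its own tier converters. One way to picture such a registry is a map keyed by (source type, target type). The sketch below is a hypothetical, self-contained model of that idea, not cuCascade's actual `register_converter<Source, Target>()` signature: `representation`, `converter_registry`, `convert`, `host_like`, and `disk_like` are all assumptions made for illustration.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <memory>
#include <stdexcept>
#include <typeindex>
#include <typeinfo>
#include <utility>

// Minimal polymorphic base standing in for idata_representation.
struct representation {
  virtual ~representation() = default;
};

// Toy representations, for illustration only.
struct host_like : representation {
  int rows = 0;
};
struct disk_like : representation {
  int rows = 0;
};

// Hypothetical registry: one converter per (source type, target type) pair.
class converter_registry {
 public:
  using erased_fn =
      std::function<std::unique_ptr<representation>(const representation&)>;

  // Store a typed converter behind a type-erased wrapper.
  template <typename Source, typename Target>
  void register_converter(std::function<std::unique_ptr<Target>(const Source&)> fn) {
    table_[{typeid(Source), typeid(Target)}] =
        [fn](const representation& src) -> std::unique_ptr<representation> {
      return fn(static_cast<const Source&>(src));
    };
  }

  // Look up by the dynamic type of src and the requested target type.
  template <typename Target>
  std::unique_ptr<Target> convert(const representation& src) const {
    auto it = table_.find({typeid(src), typeid(Target)});
    if (it == table_.end()) throw std::runtime_error("no converter registered");
    std::unique_ptr<representation> out = it->second(src);
    assert(out != nullptr && "converters must return a result");
    return std::unique_ptr<Target>(static_cast<Target*>(out.release()));
  }

 private:
  std::map<std::pair<std::type_index, std::type_index>, erased_fn> table_;
};
```

A domain layer would call `register_converter<host_like, disk_like>(...)` with a lambda once at startup and later call `convert<disk_like>(some_host_like)`. Keying on the exact dynamic type keeps lookup simple; this sketch does not walk inheritance hierarchies, so each concrete type needs its own registration.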
What changed
Public API / library
- `idata_representation` no longer exposes any cuDF type. The cuDF-backed concrete representations are gone from the library; `disk_data_representation` is now the only in-library concrete representation.
- `record_writer_event()` / `get_writer_event()` moved onto the `idata_representation` base as virtuals (no-op / `nullptr` defaults). `read_only_data_batch::get_writer_event()` now dispatches polymorphically instead of `dynamic_cast`-ing to the GPU type.
- Extracted a generic `memory::column_metadata` (opaque `int32_t` type tag: the numeric value of a consumer's column-type enum, which cuCascade never interprets) into `include/cucascade/memory/column_metadata.hpp`, used by the disk tier.
- `register_builtin_converters()` is removed; consumers register tier converters via `register_converter<Source, Target>()`.

Build
- Dropped `find_package(cudf)` and `cudf::cudf` from the link interface and removed `find_dependency(cudf)` from the installed CMake config, so consumers no longer transitively pull in cuDF.
- pixi: depend on `librmm` directly (previously transitive via libcudf); renamed the `cudf-*` features to `rmm-*`.

Removed (now provided by the domain layer)
- `gpu_data_representation`, `cpu_data_representation` (`host_data_representation`, `host_data_packed_representation`), `host_table.hpp`, `host_table_packed.hpp`.
- `representation_converter.cpp` and `register_builtin_converters()`.
- The bandwidth profiler (`bandwidth_profiler.{hpp,cpp}`).
- The cuDF-coupled tests, `test/utils/cudf_test_utils.*`, and the converter/disk benchmarks.

Docs / CI
- Updated the README, `docs/*`, and `CLAUDE.md` to the decoupled architecture; removed `docs/bandwidth-profiler.md`.
- Disabled the CI `benchmark` job (its converter benchmarks moved out with the cuDF code).

Acceptance criteria (#142)
- `find_package(cudf)` absent from `CMakeLists.txt`
- `cudf::cudf` absent from every link target
- No `#include <cudf/...>` in any cuCascade source/header (remaining cuDF mentions are explanatory comments)
- `cmake-release` + `build-release` succeed; the build graph has zero cuDF references
- … `test_disk_io_backend`)

Testing
Built and ran the full suite locally on an NVIDIA RTX PRO 6000 (sm_120):
Verified that the build references no cuDF via `find_package`, linking, or `#include`. (A fully cuDF-absent environment will confirm this at the package level once the lockfile is re-solved against the new `rmm-*` features.)
Reviewer notes
`column_metadata` kept (not removed): the issue lists `host_table.hpp`'s `column_metadata` among the types to move out, but `disk_data_representation` stays and needs it, so it is retained as a generic, cuDF-free struct rather than deleted.
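To make the "opaque `int32_t` type tag" concrete: the library stores a plain integer, and only the consumer knows which column type it stands for. The sketch below assumes a hypothetical struct shape (the `name` and `type_tag` fields), a hypothetical consumer enum `my_column_type` with arbitrary values, and hypothetical helpers `make_metadata` / `decode_type`; only the idea of an uninterpreted `int32_t` tag comes from this PR.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <utility>

namespace memory {
// Hypothetical shape of a cuDF-free column_metadata. Only the idea of an
// opaque int32_t type tag comes from the PR; the field names are assumptions.
struct column_metadata {
  std::string name;
  std::int32_t type_tag;  // stored and round-tripped, never interpreted here
};
}  // namespace memory

// The consumer (domain layer) owns the enum; these values are arbitrary.
enum class my_column_type : std::int32_t { int64 = 1, float64 = 2, utf8_string = 3 };

// Encode: the library only ever sees the enum's numeric value.
memory::column_metadata make_metadata(std::string name, my_column_type type) {
  return memory::column_metadata{std::move(name), static_cast<std::int32_t>(type)};
}

// Decode: only the consumer knows how to turn the tag back into a type.
my_column_type decode_type(const memory::column_metadata& meta) {
  return static_cast<my_column_type>(meta.type_tag);
}
```

Because the tag is a plain integer, the disk tier can keep and hand back per-column type information without linking any consumer's type system; a cuDF-based domain layer would presumably map its own column-type enum the same way.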
Per-column metadata lives in memory in `disk_table_allocation` (a file is only meaningful alongside its allocation). Corrected a stale `CLAUDE.md` claim of a `disk_file_header`.

🤖 This PR was prepared with the assistance of an AI agent (Claude Code).