Conversation

@rostan-t (Collaborator)

Category: Bug fix (non-breaking change which fixes an issue)

Description:

This PR enables arithmetic operations between tensors/batches and scalars. Previously, x + n worked if x was a CPU tensor and n a scalar, but not if x was a GPU tensor.

Arithmetic operations between a GPU tensor/batch and a Python list or tuple are also supported. Tensor types still need to be copied explicitly.
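
To illustrate, a minimal sketch of the usage this PR enables; the module alias ndd (nvidia.dali.experimental.dynamic) follows the tests added in this PR:

    import nvidia.dali.experimental.dynamic as ndd

    x = ndd.tensor([1, 2, 3], device="gpu")
    y = x + 5            # scalar broadcast on the GPU (previously raised)
    z = x * [2, 2, 2]    # Python lists/tuples are converted as well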

Additional information:

Affected modules and functionalities:

Dynamic mode tensors and batches.

Key points relevant for the review:

Do arithmetic operations work the way we intend them to?

Tests:

  • Existing tests apply
  • New tests added
    • Python tests
    • GTests
    • Benchmark
    • Other
  • N/A

Checklist

Documentation

  • Existing documentation applies
  • Documentation updated
    • Docstring
    • Doxygen
    • RST
    • Jupyter
    • Other
  • N/A

DALI team only

Requirements

  • Implements new requirements
  • Affects existing requirements
  • N/A

REQ IDs: N/A

JIRA TASK: DALI-4545

greptile-apps bot commented Dec 19, 2025

Greptile Summary

This PR enables arithmetic operations between GPU tensors/batches and scalars by consolidating the _arithm_op implementation from _tensor.py and _batch.py into a new shared _arithmetic.py module. The key improvements are:

  • Fixed GPU tensor + scalar operations: Previously only worked for CPU tensors
  • Improved device detection logic: Uses a two-pass approach to determine the target device before converting scalars, ensuring consistent device placement (sketched after this list)
  • Added type safety: Validates that non-implicitly-convertible types cannot be used with GPU tensors
  • Fixed backend bug: Corrected TensorListGPU.broadcast to use GPUBackend instead of CPUBackend
  • Comprehensive test coverage: Added tests for scalar operations with both tensors and batches, plus device compatibility validation
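
A minimal, self-contained sketch of the two-pass device detection mentioned above (illustrative only, not the PR's exact code; the helper names and the as_tensor parameter are hypothetical, and tensor-like arguments are assumed to expose a .device attribute):

    def target_device(args):
        # Pass 1: scan every argument first, so "5 + gpu_tensor" picks the GPU
        # even though the scalar precedes the tensor.
        return "gpu" if any(getattr(a, "device", None) == "gpu" for a in args) else "cpu"

    def convert_args(args, as_tensor):
        # Pass 2: convert plain scalars/lists to tensors on the chosen device,
        # then reject any remaining CPU/GPU mix.
        dev = target_device(args)
        new_args = [a if hasattr(a, "device") else as_tensor(a, device=dev) for a in args]
        if len({a.device for a in new_args}) > 1:
            raise ValueError("Cannot mix GPU and CPU inputs")
        return new_args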

Confidence Score: 5/5

  • This PR is safe to merge with minimal risk
  • The changes are well-structured with clear improvements over the previous implementation. The two-pass device detection algorithm is more robust than the old approach, the backend fix is obviously correct, and comprehensive tests cover the new functionality including edge cases like device mismatches.
  • No files require special attention

Important Files Changed

  • dali/python/backend_impl.cc: Fixed TensorListGPU.broadcast to use GPUBackend instead of CPUBackend
  • dali/python/nvidia/dali/experimental/dynamic/_arithmetic.py: New centralized arithmetic-operation handler with improved GPU/CPU device handling
  • dali/test/python/experimental_mode/test_arithm_ops.py: Added comprehensive tests for scalar operations and device-compatibility checking

Sequence Diagram

sequenceDiagram
    participant User
    participant Tensor/Batch
    participant _arithm_op
    participant as_tensor
    participant _arithmetic_generic_op

    User->>Tensor/Batch: gpu_tensor + scalar
    Tensor/Batch->>_arithm_op: __add__(gpu_tensor, scalar)
    
    Note over _arithm_op: Check all args for GPU tensors
    _arithm_op->>_arithm_op: gpu = any(arg.device == "gpu"<br/>for Tensor/Batch args)
    
    alt Scalar argument found
        _arithm_op->>_arithm_op: Check if implicitly convertible
        alt GPU mode & not convertible
            _arithm_op-->>User: ValueError: not implicitly copyable
        else Convertible
            _arithm_op->>as_tensor: as_tensor(scalar, device="gpu")
            as_tensor-->>_arithm_op: GPU tensor
        end
    end
    
    Note over _arithm_op: Verify no CPU/GPU mixing
    _arithm_op->>_arithm_op: Check all new_args devices match
    
    alt Device mismatch
        _arithm_op-->>User: ValueError: Cannot mix GPU and CPU
    else All same device
        _arithm_op->>_arithmetic_generic_op: Execute operation
        _arithmetic_generic_op-->>User: Result tensor
    end

@greptile-apps bot left a comment

Additional Comments (3)

  1. dali/python/nvidia/dali/experimental/dynamic/_arithmetic.py, line 36

    syntax: Typo: "implictly" should be "implicitly"

  2. dali/python/nvidia/dali/experimental/dynamic/_arithmetic.py, lines 30-46

    logic: Logic bug: when a scalar appears before a GPU tensor (e.g., 3 + gpu_tensor), the scalar gets converted to CPU before detecting the GPU tensor, causing the final check to fail with "Cannot mix GPU and CPU inputs."

    The algorithm needs two passes:

    1. First pass: scan all args to detect if any GPU tensor exists
    2. Second pass: convert scalars to appropriate device

    Example failure case:

    gpu_tensor = ndd.tensor([1, 2, 3], device="gpu")
    result = 5 + gpu_tensor  # Will raise ValueError
  3. dali/test/python/experimental_mode/test_arithm_ops.py, line 128

    style: Test only covers tensor + scalar but not scalar + tensor. Add reverse operation tests:

    # Also test scalar + tensor
    result_reversed = ndd.as_tensor(apply_bin_op(op, scalar, x))
    ref_reversed = apply_bin_op(op, scalar, tensor)
    if not np.allclose(result_reversed.cpu(), ref_reversed):
        msg = f"{scalar} {op} {tensor} = \n{result_reversed}\n!=\n{ref_reversed}"
        raise AssertionError(msg)

5 files reviewed, 3 comments


@mzient self-assigned this Dec 19, 2025

@greptile-apps bot left a comment

Additional Comments (1)

  1. dali/python/nvidia/dali/experimental/dynamic/_arithmetic.py, line 20

    syntax: using | operator in isinstance() requires Python 3.10+, but pyproject.toml targets Python 3.8+. Use tuple syntax instead for compatibility.
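
    For reference, a quick demonstration of the compatible spelling (int and float here are placeholder types, not necessarily the ones checked on line 20):

    x = 3
    assert isinstance(x, (int, float))   # tuple syntax works on Python 3.8+
    # isinstance(x, int | float)         # PEP 604 union: TypeError before 3.10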

5 files reviewed, 1 comment


