⚡️ Speed up method ReadInstruction.to_absolute by 14%#112

Open
codeflash-ai[bot] wants to merge 1 commit into main from codeflash/optimize-ReadInstruction.to_absolute-mlcdh7sw

codeflash-ai bot commented Feb 7, 2026

📄 14% (0.14x) speedup for ReadInstruction.to_absolute in src/datasets/arrow_reader.py

⏱️ Runtime: 1.11 milliseconds → 974 microseconds (best of 47 runs)
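
The headline percentage appears to be measured against the optimized runtime rather than the original one. This is inferred from the figures on this page, not from a documented formula, and the helper name `speedup` is hypothetical. A minimal sketch under that assumption:

```python
def speedup(original: float, optimized: float) -> float:
    """Relative speedup measured against the optimized runtime (assumed formula)."""
    return (original - optimized) / optimized


# Headline runtimes from this PR, both expressed in microseconds.
print(f"{speedup(1110, 974):.0%}")   # 14%

# One of the per-test timings listed further down (6.34μs -> 5.63μs).
print(f"{speedup(6.34, 5.63):.1%}")  # 12.6%
```

Both results agree with the percentages reported in this PR, which is why the formula is assumed here.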

📝 Explanation and details

The optimized code achieves a 14% runtime improvement through three targeted micro-optimizations in the _rel_to_abs_instr function, which is called for every relative instruction conversion:

Key Optimizations

1. Reduced Attribute Access Overhead
The original code accessed rel_instr.rounding and rel_instr.unit several times. The optimized version reads each attribute once into a local variable at the start:

rounding = rel_instr.rounding
unit = rel_instr.unit

This eliminates repeated attribute lookups, which in Python involve a dictionary lookup on the object's __dict__. Since this function may be called frequently (as shown by the large-scale test with 500 instructions), the saving compounds.
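
To illustrate the effect in isolation, here is a rough sketch that compares repeated attribute reads with a single read into a local. The `Instr` class and both functions are hypothetical stand-ins, not code from `arrow_reader.py`, and the timings will vary by machine and Python version.

```python
import timeit


class Instr:
    """Hypothetical stand-in for a relative instruction."""

    def __init__(self, unit, rounding):
        self.unit = unit
        self.rounding = rounding


def repeated_lookup(instr):
    # Every comparison performs a fresh attribute lookup on instr.
    hits = 0
    if instr.unit == "%":
        hits += 1
    if instr.unit == "%":
        hits += 1
    if instr.unit == "%":
        hits += 1
    return hits


def localized_lookup(instr):
    # One attribute lookup, then cheap local-variable reads.
    unit = instr.unit
    hits = 0
    if unit == "%":
        hits += 1
    if unit == "%":
        hits += 1
    if unit == "%":
        hits += 1
    return hits


instr = Instr("%", "closest")
t_repeated = timeit.timeit(lambda: repeated_lookup(instr), number=100_000)
t_local = timeit.timeit(lambda: localized_lookup(instr), number=100_000)
print(f"repeated: {t_repeated:.4f}s, localized: {t_local:.4f}s")
```

Both functions return the same value; only the number of attribute lookups differs, so any gap between the two timings can be attributed to that.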

2. Streamlined Boundary Validation
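The original code had two separate if statements checking percent boundaries, each repeating the unit comparison (the snippet refers to self, so this check appears to live on the instruction object itself rather than in _rel_to_abs_instr):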

if self.unit == "%" and self.from_ is not None and abs(self.from_) > 100:
    raise ValueError(...)
if self.unit == "%" and self.to is not None and abs(self.to) > 100:
    raise ValueError(...)

The optimized version groups these under a single if self.unit == "%": check, so the unit comparison runs once per validation instead of twice.
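
The grouping described here might look like the sketch below. The function names and the error message are hypothetical placeholders (the PR elides the real message as `...`), and the checks are written as free functions rather than methods so the example is self-contained.

```python
from typing import Optional


def validate_original(unit: Optional[str], from_: Optional[int], to: Optional[int]) -> None:
    # Two independent checks, each comparing unit against "%".
    if unit == "%" and from_ is not None and abs(from_) > 100:
        raise ValueError("percent boundary out of range")  # placeholder message
    if unit == "%" and to is not None and abs(to) > 100:
        raise ValueError("percent boundary out of range")  # placeholder message


def validate_grouped(unit: Optional[str], from_: Optional[int], to: Optional[int]) -> None:
    # The unit comparison happens once; both boundary checks sit under it.
    if unit == "%":
        if from_ is not None and abs(from_) > 100:
            raise ValueError("percent boundary out of range")  # placeholder message
        if to is not None and abs(to) > 100:
            raise ValueError("percent boundary out of range")  # placeholder message


def raises(fn, *args) -> bool:
    """Return True if fn(*args) raises ValueError."""
    try:
        fn(*args)
    except ValueError:
        return True
    return False
```

For any input, both variants either raise or pass together; for example, `raises(validate_grouped, "%", 150, None)` is `True`, while `raises(validate_grouped, "abs", 150, 500)` is `False`.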

3. Optimized Min/Max Operations
The original code used max(num_examples + from_, 0) and min(from_, num_examples) for clamping. The optimized version breaks these into explicit comparisons:

if from_ < 0:
    from_ = num_examples + from_
    if from_ < 0:
        from_ = 0

This avoids the function call overhead of max() and min(). It may also give more predictable branching, although in CPython the saved call overhead is likely the dominant effect.
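
A quick way to check that the explicit comparisons preserve behavior is to run both clamping styles over a range of inputs. The sketch below is an assumption about how the pieces fit together: the function names are hypothetical, and only the `max(...)`/`min(...)` expressions and the nested `if` come from the description above.

```python
def clamp_with_builtins(from_: int, num_examples: int) -> int:
    # Clamping style attributed to the original code.
    if from_ < 0:
        from_ = max(num_examples + from_, 0)
    return min(from_, num_examples)


def clamp_with_branches(from_: int, num_examples: int) -> int:
    # Explicit comparisons, as in the optimized version.
    if from_ < 0:
        from_ = num_examples + from_
        if from_ < 0:
            from_ = 0
    if from_ > num_examples:
        from_ = num_examples
    return from_


# Exhaustive comparison over a small range around a 100-example split.
mismatches = [
    i for i in range(-300, 301)
    if clamp_with_builtins(i, 100) != clamp_with_branches(i, 100)
]
print(mismatches)  # []
```

With a split of 100 examples, `-10` maps to `90`, `-500` clamps to `0`, and `150` clamps to `100` in both variants.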

Performance Impact by Test Category

  • Basic operations (single instruction conversion): 5-19% faster - most benefit from reduced attribute access
  • Negative index handling: 16-19% faster - streamlined clamping logic pays off
  • Large-scale tests (500 instructions): 16% faster - cumulative benefit of per-instruction savings
  • Percentage slicing: 4-15% faster - benefits from consolidated validation and localized unit variable

The optimization is particularly effective for workloads that:

  • Convert many instructions in batch (like 10-fold cross-validation scenarios)
  • Use absolute indices with negative values (common Python-style slicing)
  • Process datasets with various split configurations repeatedly

These changes maintain identical behavior and outputs while reducing the CPU cycles needed for each instruction conversion, making dataset loading operations more efficient.

Correctness verification report:

| Test | Status |
| --- | --- |
| ⚙️ Existing Unit Tests | 🔘 None Found |
| 🌀 Generated Regression Tests | 293 Passed |
| ⏪ Replay Tests | 🔘 None Found |
| 🔎 Concolic Coverage Tests | 🔘 None Found |
| 📊 Tests Coverage | 100.0% |

🌀 Generated Regression Tests
import math

import pytest  # used for our unit tests
from src.datasets.arrow_reader import ReadInstruction

# function to test
# The following block reproduces the relevant parts of src/datasets/arrow_reader.py
# (preserving the provided functions and the ReadInstruction implementation exactly).
# We add minimal helper classes required for the code to run in this test context,
# implemented to match the attributes and behavior expected by the functions above.

# Minimal helper domain classes required by the implementation.
# These are lightweight real classes (not mocks) that match the attributes the code expects.

class _RelativeInstruction:
    """
    Minimal representation of a relative instruction.
    The original implementation uses attributes: splitname, from_, to, unit, rounding.
    """
    def __init__(self, splitname, from_, to, unit, rounding):
        # Keep attribute names matching usage in _rel_to_abs_instr
        self.splitname = splitname
        self.from_ = from_
        self.to = to
        self.unit = unit
        self.rounding = rounding

def test_to_absolute_closest_percent_basic():
    # Basic test: closest rounding with percent unit should map percentages to nearest absolute indices.
    instr = ReadInstruction("train", rounding="closest", from_=0, to=33, unit="%")
    # Provide a split length where 33% is exactly 330
    name2len = {"train": 1000}
    codeflash_output = instr.to_absolute(name2len); abs_list = codeflash_output # 6.34μs -> 5.63μs (12.6% faster)
    instr_abs = abs_list[0]

def test_to_absolute_abs_basic_clamping():
    # Basic absolute slicing should return exact boundaries but be clamped to dataset length.
    instr = ReadInstruction("test", rounding=None, from_=10, to=200, unit="abs")
    name2len = {"test": 100}
    codeflash_output = instr.to_absolute(name2len); abs_list = codeflash_output # 4.39μs -> 3.71μs (18.2% faster)
    a = abs_list[0]

def test_to_absolute_pct1_dropremainder_small_split_raises():
    # Edge case: using pct1_dropremainder on a split with less than 100 elements must raise ValueError.
    instr = ReadInstruction("small", rounding="pct1_dropremainder", from_=0, to=50, unit="%")
    name2len = {"small": 50}  # less than 100 triggers the ValueError in _pct_to_abs_pct1
    with pytest.raises(ValueError) as excinfo:
        instr.to_absolute(name2len) # 3.76μs -> 3.64μs (3.24% faster)

def test_to_absolute_negative_indices_abs_and_percent():
    # Edge case: negative indices for 'abs' unit should count from the end and be clamped
    instr_abs = ReadInstruction("s", rounding=None, from_=-10, to=None, unit="abs")
    name2len = {"s": 100}
    codeflash_output = instr_abs.to_absolute(name2len); abs_list = codeflash_output # 4.89μs -> 4.12μs (18.6% faster)
    a = abs_list[0]

    # Negative percent with 'closest' rounding: -10% of 200 -> -20 -> num_examples + (-20) = 180
    instr_pct = ReadInstruction("s", rounding="closest", from_=-10, to=None, unit="%")
    name2len = {"s": 200}
    codeflash_output = instr_pct.to_absolute(name2len); abs_list2 = codeflash_output # 4.51μs -> 4.16μs (8.38% faster)
    b = abs_list2[0]

def test_pct1_dropremainder_behavior_and_truncation():
    # Edge behavior: pct1_dropremainder uses math.trunc on num_examples/100
    # Example: boundary 5%, num_examples 250 -> math.trunc(2.5) == 2 => result 5 * 2 = 10
    instr = ReadInstruction("a", rounding="pct1_dropremainder", from_=0, to=5, unit="%")
    name2len = {"a": 250}
    out = instr.to_absolute(name2len)[0] # 7.14μs -> 6.24μs (14.5% faster)

    # Another check: negative percent with pct1_dropremainder on large enough split:
    instr2 = ReadInstruction("a", rounding="pct1_dropremainder", from_=-5, to=None, unit="%")
    out2 = instr2.to_absolute(name2len)[0] # 3.16μs -> 2.61μs (21.0% faster)

def test_clamping_when_from_or_to_exceed_bounds():
    # If from_ > num_examples, it must be clamped to num_examples; same for to < 0 handled earlier.
    instr = ReadInstruction("b", rounding=None, from_=150, to=200, unit="abs")
    name2len = {"b": 100}
    result = instr.to_absolute(name2len)[0] # 4.52μs -> 3.99μs (13.2% faster)

def test_large_scale_many_instructions():
    # Large scale test: create a ReadInstruction instance that contains many relative instructions
    # without exceeding 1000 elements (we use 500).
    count = 500  # under the 1000-element guideline
    # Build many _RelativeInstruction entries, alternating units to also test different handling
    relative_list = []
    name2len = {}
    for i in range(count):
        split_name = f"split_{i}"
        # Assign a moderate length for each split to keep conversions simple and under memory limits
        name2len[split_name] = 100
        # Alternate between absolute and percent units to ensure the conversion handles both
        if i % 2 == 0:
            # absolute slice: from i to i+1 (clamped later)
            rel = _RelativeInstruction(split_name, from_=i, to=i + 1, unit="abs", rounding=None)
        else:
            # percent slice: use 10% to 20% with closest rounding
            rel = _RelativeInstruction(split_name, from_=10, to=20, unit="%", rounding="closest")
        relative_list.append(rel)

    # Create a ReadInstruction instance via the private initializer to inject many relative instructions.
    instr = ReadInstruction("split_0")  # initial instance (we'll replace its internals)
    instr._init(relative_list)  # use the class's provided private initializer

    # Execute to_absolute and ensure it returns as many absolute instructions as we supplied
    codeflash_output = instr.to_absolute(name2len); abs_instructions = codeflash_output # 611μs -> 527μs (16.0% faster)

    # Verify a couple of entries to ensure conversions are correct
    # First (even index 0) was absolute with from_=0, to=1 -> with split length 100 clamped to (0,1)
    first = abs_instructions[0]

    # Second (index 1) was 10%-20% of 100 -> closest => 10 and 20
    second = abs_instructions[1]
# codeflash_output is used to check that the output of the original code is the same as that of the optimized code.

import math

import pytest
from datasets.arrow_reader import (ReadInstruction, _AbsoluteInstruction,
                                   _RelativeInstruction)
from src.datasets.arrow_reader import ReadInstruction

class Test_ReadInstruction_to_absolute_Basic:
    """Basic test cases for ReadInstruction.to_absolute function."""

    def test_single_split_absolute_indices(self):
        """Test converting a single split with absolute indices to absolute instructions."""
        # Create a ReadInstruction with absolute indices
        instr = ReadInstruction('train', from_=10, to=20, unit='abs')
        name2len = {'train': 100}
        
        # Call to_absolute and verify we get correct AbsoluteInstruction
        codeflash_output = instr.to_absolute(name2len); result = codeflash_output # 4.34μs -> 3.72μs (16.7% faster)

    def test_single_split_percentage_closest_rounding(self):
        """Test converting a single split with percentage indices using closest rounding."""
        # Create a ReadInstruction with 50% using closest rounding
        instr = ReadInstruction('test', from_=0, to=50, unit='%', rounding='closest')
        name2len = {'test': 100}
        
        # Call to_absolute
        codeflash_output = instr.to_absolute(name2len); result = codeflash_output # 6.09μs -> 5.79μs (5.16% faster)

    def test_single_split_percentage_pct1_dropremainder_rounding(self):
        """Test converting a single split with percentage using pct1_dropremainder rounding."""
        # Create a ReadInstruction with 25% using pct1_dropremainder rounding
        instr = ReadInstruction('validation', from_=0, to=25, unit='%', rounding='pct1_dropremainder')
        name2len = {'validation': 400}
        
        # Call to_absolute
        codeflash_output = instr.to_absolute(name2len); result = codeflash_output # 6.12μs -> 5.38μs (13.8% faster)

    def test_none_boundaries_defaults_to_full_split(self):
        """Test that None boundaries default to full split range."""
        # Create a ReadInstruction with None from_ and to (should use full split)
        instr = ReadInstruction('train', unit='abs')
        name2len = {'train': 200}
        
        # Call to_absolute
        codeflash_output = instr.to_absolute(name2len); result = codeflash_output # 4.36μs -> 3.81μs (14.4% faster)

    def test_negative_indices_absolute(self):
        """Test converting negative absolute indices (Python-style slicing)."""
        # Create a ReadInstruction with negative to index
        instr = ReadInstruction('train', from_=0, to=-10, unit='abs')
        name2len = {'train': 100}
        
        # Call to_absolute
        codeflash_output = instr.to_absolute(name2len); result = codeflash_output # 4.63μs -> 3.88μs (19.4% faster)

    def test_concatenated_instructions_with_plus(self):
        """Test converting concatenated instructions created with + operator."""
        # Create two ReadInstructions and add them
        instr1 = ReadInstruction('train', from_=0, to=50, unit='abs')
        instr2 = ReadInstruction('test', from_=0, to=30, unit='abs')
        combined = instr1 + instr2
        
        name2len = {'train': 100, 'test': 50}
        
        # Call to_absolute
        codeflash_output = combined.to_absolute(name2len); result = codeflash_output # 5.70μs -> 5.03μs (13.3% faster)

    def test_only_from_boundary_absolute(self):
        """Test instruction with only from_ boundary specified."""
        # Create a ReadInstruction with only from_ specified
        instr = ReadInstruction('train', from_=50, unit='abs')
        name2len = {'train': 100}
        
        # Call to_absolute
        codeflash_output = instr.to_absolute(name2len); result = codeflash_output # 4.40μs -> 3.93μs (11.8% faster)

    def test_only_to_boundary_absolute(self):
        """Test instruction with only to boundary specified."""
        # Create a ReadInstruction with only to specified
        instr = ReadInstruction('train', to=75, unit='abs')
        name2len = {'train': 100}
        
        # Call to_absolute
        codeflash_output = instr.to_absolute(name2len); result = codeflash_output # 4.31μs -> 3.87μs (11.3% faster)

class Test_ReadInstruction_to_absolute_Edge:
    """Edge case tests for ReadInstruction.to_absolute function."""

    def test_zero_percentage(self):
        """Test instruction with 0% boundary."""
        # Create a ReadInstruction with 0% to boundary
        instr = ReadInstruction('train', from_=0, to=0, unit='%', rounding='closest')
        name2len = {'train': 100}
        
        # Call to_absolute
        codeflash_output = instr.to_absolute(name2len); result = codeflash_output # 6.02μs -> 5.76μs (4.57% faster)

    def test_one_hundred_percentage(self):
        """Test instruction with 100% boundary."""
        # Create a ReadInstruction with 100% to boundary
        instr = ReadInstruction('train', from_=0, to=100, unit='%', rounding='closest')
        name2len = {'train': 100}
        
        # Call to_absolute
        codeflash_output = instr.to_absolute(name2len); result = codeflash_output # 6.34μs -> 5.66μs (11.9% faster)

    def test_from_greater_than_to_absolute(self):
        """Test instruction where from_ is greater than to (edge case)."""
        # Create a ReadInstruction where from_ > to
        instr = ReadInstruction('train', from_=80, to=20, unit='abs')
        name2len = {'train': 100}
        
        # Call to_absolute - boundaries are clamped
        codeflash_output = instr.to_absolute(name2len); result = codeflash_output # 4.21μs -> 3.80μs (10.9% faster)

    def test_indices_exceed_split_size_absolute(self):
        """Test instruction where indices exceed the split size."""
        # Create a ReadInstruction with indices larger than split size
        instr = ReadInstruction('train', from_=50, to=200, unit='abs')
        name2len = {'train': 100}
        
        # Call to_absolute
        codeflash_output = instr.to_absolute(name2len); result = codeflash_output # 4.30μs -> 3.84μs (11.9% faster)

    def test_negative_from_index_absolute(self):
        """Test instruction with negative from_ index."""
        # Create a ReadInstruction with negative from_
        instr = ReadInstruction('train', from_=-30, to=100, unit='abs')
        name2len = {'train': 100}
        
        # Call to_absolute
        codeflash_output = instr.to_absolute(name2len); result = codeflash_output # 4.74μs -> 3.98μs (19.1% faster)

    def test_both_indices_negative_absolute(self):
        """Test instruction with both negative indices."""
        # Create a ReadInstruction with both from_ and to negative
        instr = ReadInstruction('train', from_=-50, to=-10, unit='abs')
        name2len = {'train': 100}
        
        # Call to_absolute
        codeflash_output = instr.to_absolute(name2len); result = codeflash_output # 4.78μs -> 4.09μs (16.7% faster)

    def test_unknown_split_name_raises_error(self):
        """Test that unknown split name raises ValueError."""
        # Create a ReadInstruction with non-existent split name
        instr = ReadInstruction('nonexistent', from_=0, to=50, unit='abs')
        name2len = {'train': 100, 'test': 50}
        
        # Call to_absolute and expect ValueError
        with pytest.raises(ValueError, match='Unknown split'):
            instr.to_absolute(name2len) # 6.67μs -> 6.48μs (2.85% faster)

    def test_pct1_dropremainder_with_small_split_raises_error(self):
        """Test that pct1_dropremainder with < 100 examples raises error."""
        # Create a ReadInstruction with pct1_dropremainder and small split
        instr = ReadInstruction('small', from_=0, to=50, unit='%', rounding='pct1_dropremainder')
        name2len = {'small': 50}
        
        # Call to_absolute and expect ValueError
        with pytest.raises(ValueError, match='Using "pct1_dropremainder" rounding on a split with less than 100'):
            instr.to_absolute(name2len) # 3.89μs -> 3.67μs (5.83% faster)

    def test_percentage_with_fractional_results_closest_rounding(self):
        """Test percentage that results in fractional value uses closest rounding."""
        # Create a ReadInstruction with percentage that gives fractional result
        instr = ReadInstruction('train', from_=0, to=33, unit='%', rounding='closest')
        name2len = {'train': 100}
        
        # Call to_absolute
        codeflash_output = instr.to_absolute(name2len); result = codeflash_output # 6.24μs -> 5.66μs (10.3% faster)

    def test_negative_percentage(self):
        """Test instruction with negative percentage boundary."""
        # Create a ReadInstruction with negative percentage
        instr = ReadInstruction('train', from_=-50, to=100, unit='%', rounding='closest')
        name2len = {'train': 100}
        
        # Call to_absolute
        codeflash_output = instr.to_absolute(name2len); result = codeflash_output # 6.65μs -> 6.12μs (8.64% faster)

    def test_very_large_negative_index_absolute(self):
        """Test instruction with very large negative index that clamps to 0."""
        # Create a ReadInstruction with very large negative index
        instr = ReadInstruction('train', from_=-500, to=50, unit='abs')
        name2len = {'train': 100}
        
        # Call to_absolute
        codeflash_output = instr.to_absolute(name2len); result = codeflash_output # 4.73μs -> 4.12μs (14.8% faster)

    def test_very_large_positive_index_absolute(self):
        """Test instruction with very large positive index that clamps to split size."""
        # Create a ReadInstruction with very large positive index
        instr = ReadInstruction('train', from_=50, to=500, unit='abs')
        name2len = {'train': 100}
        
        # Call to_absolute
        codeflash_output = instr.to_absolute(name2len); result = codeflash_output # 4.39μs -> 3.87μs (13.5% faster)

    def test_empty_name2len_dict_raises_error(self):
        """Test that empty name2len dict raises error."""
        # Create a ReadInstruction
        instr = ReadInstruction('train', from_=0, to=50, unit='abs')
        name2len = {}
        
        # Call to_absolute and expect ValueError
        with pytest.raises(ValueError, match='Unknown split'):
            instr.to_absolute(name2len) # 4.42μs -> 4.18μs (5.62% faster)

    def test_multiple_splits_in_name2len_one_requested(self):
        """Test that correct split is selected from multiple available splits."""
        # Create a ReadInstruction for one specific split
        instr = ReadInstruction('validation', from_=10, to=20, unit='abs')
        name2len = {'train': 1000, 'validation': 100, 'test': 50}
        
        # Call to_absolute
        codeflash_output = instr.to_absolute(name2len); result = codeflash_output # 4.48μs -> 3.75μs (19.4% faster)

    def test_percentage_rounding_closest_rounds_up(self):
        """Test that closest rounding rounds to nearest value."""
        # Create a ReadInstruction with percentage that rounds up
        instr = ReadInstruction('train', from_=0, to=33, unit='%', rounding='closest')
        name2len = {'train': 101}
        
        # Call to_absolute
        codeflash_output = instr.to_absolute(name2len); result = codeflash_output # 6.29μs -> 5.75μs (9.33% faster)

    def test_pct1_dropremainder_exactly_100_elements(self):
        """Test pct1_dropremainder with exactly 100 elements (boundary case)."""
        # Create a ReadInstruction with pct1_dropremainder and exactly 100 elements
        instr = ReadInstruction('train', from_=0, to=50, unit='%', rounding='pct1_dropremainder')
        name2len = {'train': 100}
        
        # Call to_absolute
        codeflash_output = instr.to_absolute(name2len); result = codeflash_output # 6.24μs -> 5.40μs (15.4% faster)

class Test_ReadInstruction_to_absolute_LargeScale:
    """Large scale test cases for ReadInstruction.to_absolute function."""

    def test_large_split_size_absolute_indices(self):
        """Test instruction with large split size and absolute indices."""
        # Create a ReadInstruction for a large split
        instr = ReadInstruction('train', from_=5000, to=10000, unit='abs')
        name2len = {'train': 1000000}
        
        # Call to_absolute
        codeflash_output = instr.to_absolute(name2len); result = codeflash_output # 4.38μs -> 3.76μs (16.4% faster)

    def test_large_split_size_percentage(self):
        """Test instruction with large split size and percentage boundaries."""
        # Create a ReadInstruction with percentage on large split
        instr = ReadInstruction('train', from_=25, to=75, unit='%', rounding='closest')
        name2len = {'train': 1000000}
        
        # Call to_absolute
        codeflash_output = instr.to_absolute(name2len); result = codeflash_output # 6.01μs -> 5.76μs (4.34% faster)

    def test_many_concatenated_instructions(self):
        """Test many instructions concatenated together."""
        # Create multiple ReadInstructions and concatenate them
        instr = ReadInstruction('split1', from_=0, to=10, unit='abs')
        for i in range(1, 50):
            split_name = f'split{i + 1}'
            instr = instr + ReadInstruction(split_name, from_=0, to=10, unit='abs')
        
        # Create name2len for all splits
        name2len = {f'split{i}': 100 for i in range(1, 51)}
        
        # Call to_absolute
        codeflash_output = instr.to_absolute(name2len); result = codeflash_output # 55.3μs -> 47.2μs (17.3% faster)
        # Each instruction should have from_=0 and to=10
        for i, abs_instr in enumerate(result):
            pass

    def test_large_percentage_split_many_times(self):
        """Test splitting a large dataset with various percentages."""
        # Test 10-fold cross validation pattern with 10 folds
        name2len = {'train': 100000}
        
        # Create instructions for 10 folds
        folds = []
        for k in range(0, 100, 10):
            instr = ReadInstruction('train', from_=k, to=k+10, unit='%', rounding='closest')
            codeflash_output = instr.to_absolute(name2len); result = codeflash_output # 26.7μs -> 24.6μs (8.54% faster)
            folds.append(result[0])
        # Each fold should be roughly 10% of the dataset
        for fold in folds:
            fold_size = fold.to - fold.from_

    def test_alternating_splits_union(self):
        """Test creating union of alternating splits."""
        # Create complementary train/test splits concatenated
        train_part1 = ReadInstruction('train', from_=0, to=50, unit='%', rounding='closest')
        test_part = ReadInstruction('test', from_=0, to=100, unit='%', rounding='closest')
        train_part2 = ReadInstruction('train', from_=50, to=100, unit='%', rounding='closest')
        
        combined = train_part1 + test_part + train_part2
        
        name2len = {'train': 100000, 'test': 50000}
        
        # Call to_absolute
        codeflash_output = combined.to_absolute(name2len); result = codeflash_output # 10.1μs -> 9.01μs (11.6% faster)

    def test_fine_grained_percentage_slicing(self):
        """Test fine-grained percentage slicing with small percentages."""
        # Create multiple small percentage slices
        instr = ReadInstruction('train', from_=0, to=1, unit='%', rounding='closest')
        for i in range(1, 100):
            from_pct = i
            to_pct = i + 1
            instr = instr + ReadInstruction('train', from_=from_pct, to=to_pct, unit='%', rounding='closest')
        
        name2len = {'train': 100000}
        
        # Call to_absolute
        codeflash_output = instr.to_absolute(name2len); result = codeflash_output # 153μs -> 137μs (12.2% faster)

    def test_very_small_split_absolute(self):
        """Test instruction on very small split with absolute indices."""
        # Create instruction for tiny split
        instr = ReadInstruction('tiny', from_=0, to=5, unit='abs')
        name2len = {'tiny': 10}
        
        # Call to_absolute
        codeflash_output = instr.to_absolute(name2len); result = codeflash_output # 4.29μs -> 3.81μs (12.7% faster)

    def test_single_element_slice(self):
        """Test instruction that selects a single element."""
        # Create instruction for single element
        instr = ReadInstruction('train', from_=50000, to=50001, unit='abs')
        name2len = {'train': 100000}
        
        # Call to_absolute
        codeflash_output = instr.to_absolute(name2len); result = codeflash_output # 4.28μs -> 3.72μs (15.0% faster)

    def test_dense_percentage_coverage(self):
        """Test that concatenated percentage slices cover entire dataset."""
        # Create 20 equal percentage slices that should cover 100%
        instr = ReadInstruction('train', from_=0, to=5, unit='%', rounding='closest')
        for i in range(1, 20):
            from_pct = i * 5
            to_pct = (i + 1) * 5
            instr = instr + ReadInstruction('train', from_=from_pct, to=to_pct, unit='%', rounding='closest')
        
        name2len = {'train': 100000}
        
        # Call to_absolute
        codeflash_output = instr.to_absolute(name2len); result = codeflash_output # 35.0μs -> 31.6μs (10.8% faster)

    def test_pct1_dropremainder_with_large_split(self):
        """Test pct1_dropremainder rounding with large split size."""
        # Create instruction with pct1_dropremainder on large split
        instr = ReadInstruction('train', from_=0, to=1, unit='%', rounding='pct1_dropremainder')
        name2len = {'train': 10000}
        
        # Call to_absolute
        codeflash_output = instr.to_absolute(name2len); result = codeflash_output # 5.87μs -> 5.34μs (10.0% faster)

    def test_concatenated_same_split_multiple_times(self):
        """Test concatenating same split multiple times."""
        # Create multiple non-overlapping slices of the same split
        instr = ReadInstruction('train', from_=0, to=10000, unit='abs')
        for i in range(1, 10):
            from_idx = i * 10000
            to_idx = (i + 1) * 10000
            instr = instr + ReadInstruction('train', from_=from_idx, to=to_idx, unit='abs')
        
        name2len = {'train': 100000}
        
        # Call to_absolute
        codeflash_output = instr.to_absolute(name2len); result = codeflash_output # 13.9μs -> 11.8μs (17.8% faster)
        # All should be from 'train' split
        for abs_instr in result:
            pass

    def test_result_is_list_of_absolute_instructions(self):
        """Test that to_absolute always returns a list of _AbsoluteInstruction objects."""
        # Create various instructions and verify return type
        instr1 = ReadInstruction('train', from_=0, to=50, unit='abs')
        instr2 = ReadInstruction('test', from_=0, to=100, unit='abs')
        combined = instr1 + instr2
        
        name2len = {'train': 100, 'test': 100}
        
        # Call to_absolute
        codeflash_output = combined.to_absolute(name2len); result = codeflash_output # 5.64μs -> 4.75μs (18.9% faster)
        # Verify all elements are _AbsoluteInstruction
        for item in result:
            pass

    def test_percentage_consistency_across_different_sizes(self):
        """Test that percentage boundaries scale correctly with split size."""
        # Test same percentage instruction with different split sizes
        instr = ReadInstruction('data', from_=10, to=90, unit='%', rounding='closest')
        
        test_sizes = [100, 1000, 10000, 100000]
        results = []
        
        for size in test_sizes:
            name2len = {'data': size}
            codeflash_output = instr.to_absolute(name2len); result = codeflash_output # 13.4μs -> 12.4μs (8.16% faster)
            results.append(result[0])
        
        # Verify percentages are consistent (10-90% of each size)
        for i, (size, result) in enumerate(zip(test_sizes, results)):
            expected_from = int(round(10 * size / 100.0))
            expected_to = int(round(90 * size / 100.0))
# codeflash_output is used to check that the output of the original code is the same as that of the optimized code.
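
The tests above exercise two rounding modes for percent boundaries. As a rough sketch based only on the comments in those tests (`math.trunc` on `num_examples / 100` for `pct1_dropremainder`, `int(round(...))` for `closest`), the two conversions might look like this. The function names and the exact error wording are assumptions, not the library's confirmed implementation.

```python
import math


def pct_to_abs_closest(boundary: int, num_examples: int) -> int:
    # Round the exact percentage position to the nearest index.
    return int(round(boundary * num_examples / 100.0))


def pct_to_abs_pct1(boundary: int, num_examples: int) -> int:
    # Every 1% bucket has the same truncated size; the remainder is dropped.
    if num_examples < 100:
        # Wording approximated from the message matched in the tests above.
        raise ValueError(
            'Using "pct1_dropremainder" rounding on a split with less than 100 elements'
        )
    return boundary * math.trunc(num_examples / 100.0)


print(pct_to_abs_closest(33, 1000))  # 330
print(pct_to_abs_closest(10, 250))   # 25
print(pct_to_abs_pct1(10, 250))      # 20
print(pct_to_abs_pct1(5, 250))       # 10
```

The 250-example case shows the difference: `closest` uses the full split size, while `pct1_dropremainder` treats 1% as 2 examples and ignores the remaining 50.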

To edit these changes, check out the branch codeflash/optimize-ReadInstruction.to_absolute-mlcdh7sw and push.

codeflash-ai bot requested a review from aseembits93 February 7, 2026 13:51
codeflash-ai bot added the ⚡️ codeflash (Optimization PR opened by Codeflash AI) and 🎯 Quality: High (Optimization Quality according to Codeflash) labels Feb 7, 2026