⚡️ Speed up method `JitDecoratorDetector.visit_ImportFrom` by 522% in PR #1335 (gpu-flag) #1346
Open
codeflash-ai[bot] wants to merge 5 commits into gpu-flag from codeflash/optimize-pr1335-2026-02-04T00.29.23
Conversation
Add a `gpu` parameter to instrument tests with torch.cuda.Event timing instead of time.perf_counter_ns() for measuring GPU kernel execution time. Falls back to CPU timing when CUDA is not available/initialized. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
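The commit above describes timing GPU kernels with `torch.cuda.Event` and falling back to `time.perf_counter_ns()` when CUDA is unavailable. A minimal sketch of that strategy is below; the helper name `time_block_ns` is hypothetical and not the actual instrumentation emitted by the PR.

```python
import time

try:
    import torch  # optional dependency; timing falls back to CPU without it
except ImportError:
    torch = None


def time_block_ns(fn):
    """Return (result, elapsed_ns); use CUDA events when a GPU is usable."""
    use_cuda = (
        torch is not None
        and torch.cuda.is_available()
        and torch.cuda.is_initialized()
    )
    if use_cuda:
        start = torch.cuda.Event(enable_timing=True)
        end = torch.cuda.Event(enable_timing=True)
        start.record()
        result = fn()
        end.record()
        torch.cuda.synchronize()  # wait for queued kernels to finish
        elapsed_ns = int(start.elapsed_time(end) * 1e6)  # ms -> ns
    else:
        t0 = time.perf_counter_ns()
        result = fn()
        elapsed_ns = time.perf_counter_ns() - t0
    return result, elapsed_ns
```

CUDA events are needed because GPU kernels run asynchronously: a CPU clock around the launch would measure dispatch time, not kernel execution time.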
Fix unused variables, single-item membership tests, unnecessary lambdas, and ternary expressions that can use `or` operator. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
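Hedged illustrations of two of the lint fixes named in this commit (not the exact code changed in the PR): a single-item membership test rewritten as direct equality, and a ternary expression rewritten with the `or` operator.

```python
# Single-item membership test -> direct equality:
def is_init_old(name):
    return name in ("__init__",)  # before

def is_init(name):
    return name == "__init__"     # after


# Ternary that can use the `or` operator:
def label_old(value):
    return value if value else "unknown"  # before

def label(value):
    return value or "unknown"             # after
```

Note the `or` rewrite is only equivalent when every falsy value (empty string, `None`, `0`) should map to the default, as it does here.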
The optimized code achieves a **521% speedup** (from 473μs to 76μs) by eliminating unnecessary AST traversal overhead. The key improvement comes from recognizing that `generic_visit(node)` was being called unconditionally at the end of `visit_ImportFrom`, accounting for **84.3% of the original runtime** (5.66ms out of 6.71ms total).

**Key Optimizations:**

1. **Conditional generic_visit**: `generic_visit` is called only if a `visit_alias` method exists on the instance, checked with `getattr(self, "visit_alias", None)`. Since `JitDecoratorDetector` doesn't define this method, the traversal is skipped entirely in the common case, eliminating the 84% overhead.
2. **Local variable caching**: two micro-optimizations reduce attribute-lookup overhead:
   - `module_name = node.module` (read once instead of twice)
   - `import_aliases = self.import_aliases` (avoids repeated `self.` lookups in the loop)

**Why This Works:**

The original code always traversed child nodes via `generic_visit`, but the children of `ImportFrom` nodes (the `alias` objects in `node.names`) were already being processed directly in the for loop. The redundant traversal had no semantic benefit for this class but consumed significant time walking the AST structure.

**Test Results:**

The optimization excels across all test cases:

- Simple imports: **283-302% faster** (single/multiple names)
- Large-scale test (500 imports): **614% faster** (439μs → 61.6μs)
- The only regression is the `module=None` edge case (9% slower), which is negligible in absolute terms (4.26μs vs 3.87μs) and rarely encountered

**Impact:**

This is an AST parsing utility likely used during static analysis or code-transformation pipelines. Since AST traversal often happens repeatedly (e.g., analyzing multiple files or running in CI/CD), the 5x speedup compounds significantly in real workloads. The optimization is particularly effective for codebases with many import statements.
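The conditional `generic_visit` and the local-variable caching can be sketched as below. Class and attribute names mirror the PR's description (`JitDecoratorDetector`, `import_aliases`); the exact module matched (`"numba"`) and the alias bookkeeping are assumptions for illustration, not the repository's actual logic.

```python
import ast


class JitDecoratorDetector(ast.NodeVisitor):
    """Records aliases of jit-style imports (illustrative sketch)."""

    def __init__(self):
        self.import_aliases = {}

    def visit_ImportFrom(self, node):
        module_name = node.module              # cache: read the attribute once
        import_aliases = self.import_aliases   # cache: avoid repeated self. lookups
        if module_name == "numba":             # assumed module of interest
            for alias in node.names:
                import_aliases[alias.asname or alias.name] = alias.name
        # Only pay for the child traversal if a visit_alias handler exists;
        # this class defines none, so generic_visit is skipped entirely.
        if getattr(self, "visit_alias", None):
            self.generic_visit(node)


tree = ast.parse("from numba import jit as j, njit")
detector = JitDecoratorDetector()
detector.visit(tree)
```

Skipping `generic_visit` is safe here because the only children of an `ImportFrom` node are the `alias` objects already handled in the loop.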
⚡️ This pull request contains optimizations for PR #1335
If you approve this dependent PR, these changes will be merged into the original PR branch
gpu-flag.

📄 522% (5.22x) speedup for `JitDecoratorDetector.visit_ImportFrom` in `codeflash/code_utils/line_profile_utils.py`
⏱️ Runtime: 473 microseconds → 76.0 microseconds (best of 247 runs)
✅ Correctness verification report:
🌀 Generated Regression Tests (collapsed)
🔎 Concolic Coverage Tests (collapsed)
To edit these changes, `git checkout codeflash/optimize-pr1335-2026-02-04T00.29.23` and push.