Speed up Cairo renderer by 2.2x on animation-heavy scenes#4695
Speed up Cairo renderer by 2.2x on animation-heavy scenes#4695HamdiBarkous wants to merge 8 commits intoManimCommunity:mainfrom
Conversation
…flat array indexing Replace Python generators and tuple unpacking with numpy-based subpath splitting and direct flat-array indexing for bezier point lookups. Same Cairo calls, same output, ~2-7x faster path building. - Replace gen_subpaths_from_points_2d generator with vectorized numpy boundary detection using np.arange + boolean masking - Replace gen_cubic_bezier_tuples_from_points generator with direct integer-range iteration over pre-flattened xy array - Eliminate per-curve numpy slice creation (*p[:2] splat) - Cache method references (ctx.curve_to → local) to avoid attribute lookup per call Benchmarks (1920x1080 @ 60fps): - set_path: 2-7x faster across scene types - Overall: up to 1.5x faster on shape/text-heavy scenes Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- camera.reset(): Replace set_pixel_array() → convert_pixel_array() → np.array() (copy) → slice assignment (second copy) with a single np.copyto() call. Removes one full-frame copy per frame. - set_frame_to_background(): Same optimization for static frame restore. - renderer.get_frame(): Replace np.array() with .copy() — avoids dtype inference overhead on an already-typed array. Benchmarks (1920x1080 @ 60fps): - camera_reset: 3-10x faster (e.g. 390ms → 120ms on AnimatedTransforms) - Overall: ~2x faster across scene types when combined with set_path opt Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Adds benchmarks/bench_lissajous.py — a heavy real-world animation workload with grid-of-circles updaters tracing Lissajous curves. Stresses the per-frame render path far more than static gallery scenes, making rendering optimizations visible end-to-end. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
for more information, see https://pre-commit.ci
self.background is typed as PixelArray | None (from __init__ param) but is guaranteed non-None after init_background() runs during construction. Add an assert to satisfy mypy. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
|
This looks quite promising -- but the claim of pixel-identical ouput appears to be wrong, given our pipelines. We can accept some variation in the tests, but we would need to understand (and verify individually) why the output has changed and that it still is practically correct. |
When a VMobject had exactly nppcc points (one cubic curve, e.g. Line), np.arange(nppcc, n_pts, nppcc) returned an empty array and the function exited before drawing. The original code handled this via split_indices of [0, n_pts], yielding one subpath of all 4 points. Handle the empty-boundary case explicitly as a single subpath. Verified pixel-identical vs main across 12 scenes (Line, Dot, Square, Circle, Arrow, Text, MathTex, Polyline, DashedLine, OpenPath, mixed, animated). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
for more information, see https://pre-commit.ci
|
@behackl Thanks for catching this. Fixed in the latest commit. Verified pixel-identical against main across 12 scenes (Line, Dot, Square, Circle, Arrow, Text, MathTex, Polyline, DashedLine, open polyline, mixed, animated). Can you please confirm? |
|
Looking much better, indeed -- I'll make sure to review this in more detail as soon as I can; very impressive! Our CI says that the documentation build and specifically the rendering of the Could you take a look at this? |
|
The docbuild issue happens because of this piece of code in the my_vmobject = VMobject(color=GREEN)
my_vmobject.points = [
np.array([-2, -1, 0]), # start of first curve
np.array([-3, 1, 0]),
np.array([0, 3, 0]),
np.array([1, 3, 0]), # end of first curve
np.array([1, 3, 0]), # start of second curve
np.array([0, 1, 0]),
np.array([4, 3, 0]),
np.array([4, -2, 0]), # end of second curve
]When doing this, You can fix this by changing the list above to a 2D NumPy array, but IMO a cleaner solution is to use the existing my_vmobject = VMobject(color=GREEN).set_points(
[
[-2, -1, 0], # start of first curve
[-3, 1, 0],
[0, 3, 0],
[1, 3, 0], # end of first curve
[1, 3, 0], # start of second curve
[0, 1, 0],
[4, 3, 0],
[4, -2, 0], # end of second curve
]
) |
The vectorized path builder uses numpy fancy indexing (e.g. points[boundary_indices - 1, :2]), which fails when vmobject.points is a plain Python list. The documented VMobjectDemo example sets points this way, which broke the docs build. np.asarray the points array once on entry; it's a no-op when the input is already an ndarray. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
|
The docs' VMobjectDemo sets vmobject.points to a Python list, which broke my numpy fancy indexing. Fixed by normalizing to ndarray on entry (no-op when it already is one, which is the default). |
Summary
Two small, behavior-preserving optimizations to the Cairo renderer hot
path, plus a real-world benchmark to measure them.
Optimizations
1. Vectorized path building in
set_cairo_context_pathReplace Python generators (
gen_subpaths_from_points_2d,gen_cubic_bezier_tuples_from_points) and per-curve tupleunpacking with numpy-based subpath splitting and direct flat-array
indexing. Same Cairo API calls, same output, just less Python
overhead per bezier segment.
2. Eliminate redundant numpy copies
Camera.reset()/set_frame_to_background(): replaceset_pixel_array() -> convert_pixel_array() -> np.array() (copy) -> slice assignment (second copy)with a singlenp.copyto().CairoRenderer.get_frame(): replacenp.array()with.copy()to skip dtype inference on an already-typed array.
Both changes are purely Python-side and leave rendering output
pixel-identical.
Benchmark
Adds
benchmarks/bench_lissajous.py, a heavy real-world workload(adapted from Abhijith Muthyala's Lissajous project) with a grid
of circles + updaters tracing Lissajous curves. Stresses the per-frame
render path far more than static gallery scenes.
Results
Measured with
bench_lissajous.pyon the same machine, 1920x1080 @ 60fps:A ~20-minute wall-clock reduction on a single scene render.
Test plan
python benchmarks/bench_lissajous.pyruns to completion