Improve DataFusion stability for small batch_size values #20919

@comphead

Describe the bug

I built DataFusion with the default batch_size set to 1 in config.rs:

        pub batch_size: usize, default = 1

Some unit and slt (sqllogictest) tests then fail. Some kinds of failures are expected:

  • explain plans changed
  • results sorted differently across rows or across arrays

But there are also cases where a test hits an internal assertion, returns incorrect results, or panics.

Some examples

thread 'tokio-runtime-worker' (3580060) panicked at /Users/ovoievodin/.cargo/registry/src/index.crates.io-1949cf8c6b5b557f/arrow-select-58.0.0/src/coalesce/primitive.rs:61:9:
assertion `left == right` failed
  left: 4
 right: 2
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
Completed 5 test files in 1 second
failure in group_by.slt for sql SELECT FIRST_VALUE(x)
FROM FOO;
caused by
External error: task 17 panicked with message "assertion `left == right` failed\n  left: 4\n right: 2"
Error: Execution("1 failures")
error: test failed, to rerun pass `-p datafusion-sqllogictest --test sqllogictests

To Reproduce

Set the default batch_size to 1 in config.rs as shown above, then run cargo test.

Expected behavior

At a minimum, DataFusion should not crash. I'm not sure how to fuzz test this, since the expected explain plans are tied to batch_size = 8192, but it would be nice to check for correctness issues and crashes with different batch sizes.
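
One possible shape for such a check, as a rough sketch rather than a proposal for the actual test suite: run the same query at several batch sizes, catch panics, and compare row-order-normalized results against a baseline so that pure ordering differences are not reported. Everything here is hypothetical and uses only the standard library: `run_query` is a toy grouped sum standing in for a real engine, `run_query_buggy` is a deliberately broken variant that assumes at least two rows per batch, and `check_batch_sizes` is an illustrative helper, not a DataFusion API.

```rust
use std::collections::HashMap;
use std::panic;

/// Signature of a (hypothetical) query under test: input rows plus a batch size.
type Query = fn(&[(u32, i64)], usize) -> Vec<(u32, i64)>;

/// Toy stand-in for an engine: sums values per key, consuming the input in
/// chunks of `batch_size` rows. Output order is unspecified (HashMap).
fn run_query(rows: &[(u32, i64)], batch_size: usize) -> Vec<(u32, i64)> {
    assert!(batch_size > 0, "batch_size must be positive");
    let mut sums: HashMap<u32, i64> = HashMap::new();
    for batch in rows.chunks(batch_size) {
        for &(key, value) in batch {
            *sums.entry(key).or_insert(0) += value;
        }
    }
    sums.into_iter().collect()
}

/// Deliberately buggy variant: assumes every batch holds at least two rows.
fn run_query_buggy(rows: &[(u32, i64)], batch_size: usize) -> Vec<(u32, i64)> {
    for batch in rows.chunks(batch_size) {
        let _second = batch[1]; // panics when a batch has a single row
    }
    run_query(rows, batch_size)
}

/// Sorts rows so that comparisons ignore output order.
fn normalize(mut rows: Vec<(u32, i64)>) -> Vec<(u32, i64)> {
    rows.sort();
    rows
}

/// Runs `query` at each batch size and reports the sizes that panicked or
/// disagreed with the result at the `baseline` batch size.
/// The baseline run itself is not guarded; a panic there aborts the check.
fn check_batch_sizes(
    query: Query,
    rows: &[(u32, i64)],
    baseline: usize,
    sizes: &[usize],
) -> Vec<(usize, String)> {
    let expected = normalize(query(rows, baseline));
    let mut failures = Vec::new();
    for &size in sizes {
        match panic::catch_unwind(|| query(rows, size)) {
            Ok(actual) => {
                if normalize(actual) != expected {
                    failures.push((size, "result mismatch".to_string()));
                }
            }
            Err(_) => failures.push((size, "panic".to_string())),
        }
    }
    failures
}

fn main() {
    let rows: [(u32, i64); 6] = [(1, 10), (2, 20), (1, 5), (3, 7), (2, 1), (3, 3)];
    let sizes: [usize; 4] = [1, 2, 3, 8192];
    let ok = check_batch_sizes(run_query, &rows, 8192, &sizes);
    let bad = check_batch_sizes(run_query_buggy, &rows, 8192, &sizes);
    println!("correct operator failures: {:?}", ok);
    println!("buggy operator failures: {:?}", bad);
}
```

Because the comparison is on normalized results rather than on explain plans, a harness of this shape would not depend on plans written for batch_size = 8192. It would, however, hide genuine ordering bugs in queries with an ORDER BY, so those would need an order-sensitive comparison instead.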

Labels: bug