Panic in DataFusion 54.0.0 when ordering Parquet scan by computed projection alias

### Describe the bug

## Title

Panic in DataFusion 54.0.0 when ordering Parquet scan by computed projection alias with statistics collection enabled

## Description

I hit a panic in DataFusion 54.0.0 when querying a Parquet dataset from S3 and ordering by a computed projection alias.

The query is valid SQL and runs successfully in DuckDB on the same Parquet file. In DataFusion CLI, it panics during
statistics handling.

## Environment

- DataFusion version: 54.0.0
- Interface: DataFusion CLI and Rust API
- Source: Parquet files on S3
- Dataset layout: Hive-style path segment, e.g.

s3://<redacted-bucket>/<redacted-prefix>/partition_col=some_value/

The query does not reference partition_col.

The physical Parquet schema contains an id column.

## Minimal SQL
```
CREATE EXTERNAL TABLE profile
STORED AS PARQUET
LOCATION 's3://<redacted-bucket>/<redacted-prefix>/partition_col=some_value/';

SELECT
  (((CAST(id AS BIGINT) % 1024) + 1024) % 1024) AS computed_bucket
FROM profile
ORDER BY computed_bucket, CAST(id AS BIGINT)
LIMIT 10;
```
## Observed panic

thread 'main' panicked at .../datafusion-datasource-54.0.0/src/statistics.rs:100:48:
index out of bounds: the len is 0 but the index is 0

The relevant source location appears to be:
```
// datafusion-datasource-54.0.0/src/statistics.rs
if i < s.column_statistics.len() {
    ...
} else {
    let partition_value = &pv[i - s.column_statistics.len()];
    ...
}
```
It looks like this path assumes that when a sort/statistics column index exceeds column_statistics.len(), the column must be
a partition column and therefore indexes into partition_values. In this case partition_values is empty, causing the panic.

## Additional observation

The physical id column does have Parquet min/max statistics. I checked with DuckDB metadata:
```
SELECT
  count(*) AS row_groups,
  count(*) FILTER (WHERE stats_min IS NULL OR stats_max IS NULL) AS missing_minmax,
  count(*) FILTER (WHERE stats_min IS NOT NULL AND stats_max IS NOT NULL) AS has_minmax
FROM parquet_metadata('s3://<redacted-bucket>/<redacted-prefix>/partition_col=some_value/*.parquet')
WHERE path_in_schema = 'id';
```
Result:

row_groups = 75
missing_minmax = 0
has_minmax = 75

So this does not appear to be caused by missing min/max statistics for id.

## DuckDB comparison

The same query shape works in DuckDB:
```
SELECT
  (((CAST(id AS BIGINT) % 1024) + 1024) % 1024) AS computed_bucket
FROM 's3://<redacted-bucket>/<redacted-prefix>/partition_col=some_value/<redacted-file>.parquet'
ORDER BY computed_bucket, CAST(id AS BIGINT)
LIMIT 10;
```
## Expected behavior

DataFusion should not panic.

If statistics cannot be derived for the computed expression computed_bucket, I would expect DataFusion to fall back to
unknown/absent column statistics and continue executing the query.


### To Reproduce

_No response_

### Expected behavior

_No response_

### Additional context

_No response_

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Panic in DataFusion 54.0.0 when ordering Parquet scan by computed projection alias #23219

Describe the bug

Title

Description

Environment

Minimal SQL

Observed panic

Additional observation

DuckDB comparison

Expected behavior

To Reproduce

Expected behavior

Additional context

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

Uh oh!

Panic in DataFusion 54.0.0 when ordering Parquet scan by computed projection alias #23219

Description

Describe the bug

Title

Description

Environment

Minimal SQL

Observed panic

Additional observation

DuckDB comparison

Expected behavior

To Reproduce

Expected behavior

Additional context

Metadata

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

Issue actions