feat: expose arrow schema on async avro reader #9534

Merged
alamb merged 4 commits into apache:main from mzabaluev:expose-arrow-schema-on-async-avro-reader on Mar 11, 2026

@mzabaluev
Contributor

Rationale for this change

Exposes the Arrow schema produced by the async Avro file reader, similarly to the schema method on the synchronous reader.

This lets an application prepare casts or other schema transformations without fetching the first record batch just to learn the produced Arrow schema. Since the async reader currently parses only Avro Object Container File (OCF) content, the schema does not change from batch to batch.
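
To illustrate the idea, here is a minimal, self-contained sketch of planning casts from a schema that is available before any batch is read. Everything in it is a hypothetical stand-in: `MockReader`, `Schema`, `Field`, `DataType`, `plan_casts` and `demo` are not arrow or arrow-avro types, and the real `AsyncAvroFileReader` construction is deliberately left out.

```rust
// Hypothetical stand-ins for illustration only; these are NOT arrow or arrow-avro types.
#[derive(Debug, Clone, PartialEq)]
enum DataType {
    Int32,
    Int64,
    Utf8,
}

#[derive(Debug, Clone, PartialEq)]
struct Field {
    name: String,
    data_type: DataType,
}

#[derive(Debug, Clone, PartialEq)]
struct Schema {
    fields: Vec<Field>,
}

/// Stand-in for a file reader whose schema is fixed once the header is parsed.
struct MockReader {
    schema: Schema,
    batches_fetched: usize,
}

impl MockReader {
    fn new(schema: Schema) -> Self {
        Self { schema, batches_fetched: 0 }
    }

    /// Mirrors the idea of the new accessor: no batch has to be read first.
    fn schema(&self) -> &Schema {
        &self.schema
    }
}

/// One planned cast: (column name, produced type, wanted type).
type Cast = (String, DataType, DataType);

/// Compare the produced schema with the wanted column types and list the casts needed.
fn plan_casts(produced: &Schema, wanted: &[(&str, DataType)]) -> Vec<Cast> {
    let mut casts = Vec::new();
    for field in &produced.fields {
        for (name, target) in wanted {
            if field.name == *name && field.data_type != *target {
                casts.push((field.name.clone(), field.data_type.clone(), target.clone()));
            }
        }
    }
    casts
}

/// Build a reader, plan casts from its schema, and report how many batches were fetched.
fn demo() -> (Vec<Cast>, usize) {
    let reader = MockReader::new(Schema {
        fields: vec![
            Field { name: "id".to_string(), data_type: DataType::Int32 },
            Field { name: "name".to_string(), data_type: DataType::Utf8 },
        ],
    });
    let wanted = [("id", DataType::Int64), ("name", DataType::Utf8)];
    let casts = plan_casts(reader.schema(), &wanted);
    (casts, reader.batches_fetched)
}
```

In this sketch the cast plan is computed while the fetched-batch counter is still zero, which is the property the schema accessor is meant to provide. With the real reader, the same planning step would presumably run against the Arrow schema returned by `schema`.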

What changes are included in this PR?

The schema method for AsyncAvroFileReader exposes the Arrow schema of record batches that are produced by the reader.

Are these changes tested?

Added tests verifying that the returned schema matches the expected schema.

Are there any user-facing changes?

Added a schema method to AsyncAvroFileReader.

Add a schema method to obtain the Arrow schema from the async Avro
reader.

Add metadata on fields of nested records and the list type,
so that the expected schema matches the one produced by the reader.

Add a test reading nested_records.avro to verify the schema exposed
by the reader.

@github-actions bot added the arrow (Changes to the arrow crate) and arrow-avro (arrow-avro crate) labels on Mar 10, 2026

Contributor

@alamb left a comment

Thanks @mzabaluev -- the code looks good; I just had some test comments

cc @jecsand838


#[tokio::test]
async fn test_arrow_schema_from_reader_no_reader_schema() {
// Use a very small header size hint to force multiple fetches
Contributor

these comments seem out of date

Contributor Author

Sorry, I should have cleaned these up. Done now.

let location = Path::from_filesystem_path(&file).unwrap();
let file_size = store.head(&location).await.unwrap().size;

let file_reader = AvroObjectReader::new(store, location);
Contributor

Could you also reduce some of the duplication in this test so that it is easier to understand what is actually being tested and what is different between the tests?

Contributor Author

I have added clarifying comments explaining the purpose and differences of code in each of the added cases. Hope this helps.


#[tokio::test]
async fn test_arrow_schema_from_reader_with_reader_schema() {
// Use a very small header size hint to force multiple fetches
Contributor

likewise this comment seems outdated

Remove copy-pasted comments that don't apply to the new tests.
In the test with the reader schema, update the test to use a projected
schema and verify that the reader schema is applied correctly.
Add comments explaining the expectations for each test case.

@mzabaluev-flarion force-pushed the expose-arrow-schema-on-async-avro-reader branch from 017c28a to 4b39bef on March 11, 2026 12:26

Contributor

@alamb left a comment

Thanks -- looks good to me

Contributor

@jecsand838 left a comment

LGTM!

@alamb
Contributor

alamb commented Mar 11, 2026

🚢 🇮🇹

@alamb merged commit 6931d88 into apache:main on Mar 11, 2026
23 checks passed