Add quote style and trimming to csv writer #20813
Open
xanderbailey wants to merge 7 commits into apache:main
xanderbailey
commented
Mar 8, 2026
LOCATION 'test_files/scratch/csv_files/quote_style_always.csv'
OPTIONS ('format.has_header' 'true', 'format.quote_style' 'Never');

# All values should have been quoted, but reading them back strips the quotes
Contributor
Author
It's hard to actually test these in SLT, since reading the values back strips the quotes, but I did my best...
xanderbailey
commented
Mar 8, 2026
pub quote_style: i32,
/// Whether to ignore leading whitespace in string values
#[prost(bytes = "vec", tag = "21")]
pub ignore_leading_whitespace: ::prost::alloc::vec::Vec<u8>,
Contributor
Author
Following the pattern used here for the other bools, which are encoded as Vec<u8>.
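For readers unfamiliar with the bytes-for-bool pattern, here is a minimal sketch of how such a field could round-trip. It assumes the option is an optional boolean and that an empty byte vector means "unset"; the helper names `encode_opt_bool` and `decode_opt_bool` are hypothetical and are not taken from this PR.

```rust
/// Hypothetical helper: encode an optional flag as bytes.
/// Assumption: an empty vector means "unset", a single byte means "set".
fn encode_opt_bool(value: Option<bool>) -> Vec<u8> {
    value.map_or_else(Vec::new, |b| vec![b as u8])
}

/// Hypothetical helper: decode the bytes back into an optional flag.
/// An empty slice becomes `None`; otherwise the first byte decides
/// (non-zero is `true`).
fn decode_opt_bool(bytes: &[u8]) -> Option<bool> {
    bytes.first().map(|b| *b != 0)
}
```

Under these assumptions the encoding keeps three states apart (unset, false, true), which a plain proto bool without explicit presence tracking would not do.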
xanderbailey
commented
Mar 8, 2026
Ok(CsvQuoteStyleProto::Always) => CsvQuoteStyle::Always,
Ok(CsvQuoteStyleProto::NonNumeric) => CsvQuoteStyle::NonNumeric,
Ok(CsvQuoteStyleProto::Never) => CsvQuoteStyle::Never,
_ => CsvQuoteStyle::Necessary,
Contributor
Author
We don't error on an unrecognised value for compression:
compression: match proto.compression {
    0 => CompressionTypeVariant::GZIP,
    1 => CompressionTypeVariant::BZIP2,
    2 => CompressionTypeVariant::XZ,
    3 => CompressionTypeVariant::ZSTD,
    _ => CompressionTypeVariant::UNCOMPRESSED,
},
So I made the same true here: an unrecognised quote style falls back to Necessary.
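To make the fallback behaviour concrete, here is a self-contained sketch of the same shape. The enum names `QuoteStyleProto` and `QuoteStyle`, the function `quote_style_from_proto`, and the numeric tags are all assumptions made for illustration; the real generated prost types and tag values may differ.

```rust
use std::convert::TryFrom;

/// Stand-in for the generated protobuf enum (tag values are assumed).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum QuoteStyleProto {
    Necessary = 0,
    Always = 1,
    NonNumeric = 2,
    Never = 3,
}

impl TryFrom<i32> for QuoteStyleProto {
    type Error = i32;

    fn try_from(raw: i32) -> Result<Self, Self::Error> {
        match raw {
            0 => Ok(Self::Necessary),
            1 => Ok(Self::Always),
            2 => Ok(Self::NonNumeric),
            3 => Ok(Self::Never),
            other => Err(other), // unknown tag
        }
    }
}

/// Stand-in for the in-memory option type.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum QuoteStyle {
    Always,
    Necessary,
    NonNumeric,
    Never,
}

/// Lenient conversion: unknown tags fall back to `Necessary`
/// instead of returning an error.
fn quote_style_from_proto(raw: i32) -> QuoteStyle {
    match QuoteStyleProto::try_from(raw) {
        Ok(QuoteStyleProto::Always) => QuoteStyle::Always,
        Ok(QuoteStyleProto::NonNumeric) => QuoteStyle::NonNumeric,
        Ok(QuoteStyleProto::Never) => QuoteStyle::Never,
        _ => QuoteStyle::Necessary,
    }
}
```

The trade-off is the same as for compression: a message written by a newer version with an unknown tag decodes without failing, at the cost of silently using the default.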
Which issue does this PR close?
Related arrow-rs PRs apache/arrow-rs#8960 and apache/arrow-rs#9004
Rationale for this change
The CSV writer was missing support for the quote_style, ignore_leading_whitespace, and ignore_trailing_whitespace options that are available on the underlying arrow WriterBuilder. This meant users couldn't control quoting behaviour or whitespace trimming when writing CSV files.
What changes are included in this PR?
Adds three new CSV writer options wired through the full stack:
- quote_style: controls when fields are quoted (Always, Necessary, NonNumeric, Never). Modelled as a protobuf enum (CsvQuoteStyle).
- ignore_leading_whitespace: trims leading whitespace from string values on write.
- ignore_trailing_whitespace: trims trailing whitespace from string values on write.
Are these changes tested?
Yes, sqllogictest coverage was added in csv_files.slt.
Are there any user-facing changes?
Three new format.* options are available in COPY TO and CREATE EXTERNAL TABLE for CSV:
- format.quote_style (string: Always, Necessary, NonNumeric, Never)
- format.ignore_leading_whitespace (boolean)
- format.ignore_trailing_whitespace (boolean)
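As a rough illustration of what the three options mean for a single field, the sketch below applies trimming and then a quoting decision. It is a simplified model under stated assumptions, not the arrow writer's implementation: the comma delimiter, the "parses as a number" test for NonNumeric, and the names `QuoteStyle` and `format_field` are all hypothetical.

```rust
/// Hypothetical mirror of the four quote styles named in this PR.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum QuoteStyle {
    Always,
    Necessary,
    NonNumeric,
    Never,
}

/// Format one string field, assuming ',' as delimiter and '"' as quote.
fn format_field(
    value: &str,
    style: QuoteStyle,
    ignore_leading_whitespace: bool,
    ignore_trailing_whitespace: bool,
) -> String {
    // Trimming happens before the quoting decision.
    let mut v = value;
    if ignore_leading_whitespace {
        v = v.trim_start();
    }
    if ignore_trailing_whitespace {
        v = v.trim_end();
    }

    // A field "needs" quotes if it contains a delimiter, quote, or newline.
    let needs_quotes = v.contains(',') || v.contains('"') || v.contains('\n') || v.contains('\r');

    let quote = match style {
        QuoteStyle::Always => true,
        QuoteStyle::Necessary => needs_quotes,
        // Assumption: "numeric" means the text parses as a float.
        QuoteStyle::NonNumeric => needs_quotes || v.parse::<f64>().is_err(),
        QuoteStyle::Never => false,
    };

    if quote {
        // Embedded quotes are escaped by doubling them.
        format!("\"{}\"", v.replace('"', "\"\""))
    } else {
        v.to_string()
    }
}
```

For example, under this model `format_field("  hi  ", QuoteStyle::Always, true, true)` yields the quoted field `"hi"`, while `QuoteStyle::Never` writes `a,b` unquoted even though the result is then ambiguous to a reader.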