Add quote style and trimming to CSV writer #20813

Open
xanderbailey wants to merge 7 commits into apache:main from xanderbailey:xb/csv_writer_options
@xanderbailey commented Mar 8, 2026

Which issue does this PR close?

Related arrow-rs PRs: apache/arrow-rs#8960 and apache/arrow-rs#9004.

Rationale for this change

The CSV writer was missing support for quote_style, ignore_leading_whitespace, and ignore_trailing_whitespace options that are available on the underlying arrow WriterBuilder. This meant users couldn't control quoting behaviour or whitespace trimming when writing CSV files.

What changes are included in this PR?

Adds three new CSV writer options wired through the full stack:

  • quote_style — controls when fields are quoted (Always, Necessary, NonNumeric, Never). Modelled as a protobuf enum (CsvQuoteStyle).
  • ignore_leading_whitespace — trims leading whitespace from string values on write.
  • ignore_trailing_whitespace — trims trailing whitespace from string values on write.
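To make the three options concrete, here is a minimal stdlib-only sketch of what each one means for a single field. It is not the arrow or DataFusion implementation: `QuoteStyle`, `quote_field`, and `trim_field` are hypothetical names, the delimiter is assumed to be `,` and the quote character `"`, and "numeric" is approximated as "parses as `f64`".

```rust
/// Hypothetical stand-in for the four quote styles listed above.
#[derive(Clone, Copy, Debug, PartialEq)]
pub enum QuoteStyle {
    Always,
    Necessary,
    NonNumeric,
    Never,
}

/// Render one field under `style`, assuming `,` as delimiter and `"` as quote.
pub fn quote_field(field: &str, style: QuoteStyle) -> String {
    // Characters that would break the record if left unquoted.
    let needs_quotes = field.contains(|c: char| matches!(c, ',' | '"' | '\n' | '\r'));
    // Simplification: anything that parses as f64 counts as numeric.
    let is_numeric = field.parse::<f64>().is_ok();

    let quote = match style {
        QuoteStyle::Always => true,
        QuoteStyle::Necessary => needs_quotes,
        QuoteStyle::NonNumeric => !is_numeric,
        // Never quotes, even when the result is ambiguous CSV.
        QuoteStyle::Never => false,
    };

    if quote {
        // Embedded quotes are escaped by doubling them.
        format!("\"{}\"", field.replace('"', "\"\""))
    } else {
        field.to_string()
    }
}

/// Trim leading and/or trailing whitespace from a string value before writing.
pub fn trim_field(field: &str, leading: bool, trailing: bool) -> &str {
    let field = if leading { field.trim_start() } else { field };
    if trailing { field.trim_end() } else { field }
}
```

With this sketch, `quote_field("a,b", QuoteStyle::Necessary)` wraps the value in quotes while `QuoteStyle::Never` writes it as-is, which is why `Never` can produce output that does not read back into the same columns.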

Are these changes tested?

Yes — sqllogictest coverage added in csv_files.slt

Are there any user-facing changes?

Three new format.* options available in COPY TO and CREATE EXTERNAL TABLE for CSV:

  • format.quote_style (string: Always, Necessary, NonNumeric, Never)
  • format.ignore_leading_whitespace (boolean)
  • format.ignore_trailing_whitespace (boolean)

github-actions bot added the sqllogictest (SQL Logic Tests (.slt)), common (Related to common crate), and proto (Related to proto crate) labels on Mar 8, 2026
LOCATION 'test_files/scratch/csv_files/quote_style_always.csv'
OPTIONS ('format.has_header' 'true', 'format.quote_style' 'Never');

# All values should have been quoted, but reading them back strips the quotes
Author's comment:

It's hard in SLT to actually test these but I did my best...

pub quote_style: i32,
/// Whether to ignore leading whitespace in string values
#[prost(bytes = "vec", tag = "21")]
pub ignore_leading_whitespace: ::prost::alloc::vec::Vec<u8>,
Author's comment:

Following the existing pattern here, where the other bool options are stored as Vec<u8>.
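For readers unfamiliar with that pattern, a rough sketch of how an optional bool can ride in a `bytes` field: an empty vector means "not set" and a single byte carries the value. The helper names `encode_flag` and `decode_flag` are hypothetical, and the exact conversion used in the proto crate is an assumption here, not something this thread shows.

```rust
/// Encode an optional flag for a protobuf `bytes` field.
/// An unset option becomes an empty vector; a set option becomes one byte.
pub fn encode_flag(flag: Option<bool>) -> Vec<u8> {
    flag.map_or_else(Vec::new, |b| vec![b as u8])
}

/// Decode the flag again: no bytes means unset, otherwise any non-zero
/// first byte is read as `true`.
pub fn decode_flag(bytes: &[u8]) -> Option<bool> {
    bytes.first().map(|b| *b != 0)
}
```

The point of the byte vector over a plain proto `bool` is that it keeps three states (unset, false, true), so an option the user never specified can still fall back to a default on the receiving side.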

Ok(CsvQuoteStyleProto::Always) => CsvQuoteStyle::Always,
Ok(CsvQuoteStyleProto::NonNumeric) => CsvQuoteStyle::NonNumeric,
Ok(CsvQuoteStyleProto::Never) => CsvQuoteStyle::Never,
_ => CsvQuoteStyle::Necessary,
Author's comment:

We don't error on an unrecognised value when decoding compression:

            compression: match proto.compression {
                0 => CompressionTypeVariant::GZIP,
                1 => CompressionTypeVariant::BZIP2,
                2 => CompressionTypeVariant::XZ,
                3 => CompressionTypeVariant::ZSTD,
                _ => CompressionTypeVariant::UNCOMPRESSED,
            },

So I made the same true here: an unrecognised quote style falls back to Necessary instead of returning an error.
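A self-contained sketch of that lenient decode, for anyone who wants to try the behaviour in isolation. The enum discriminants below are made up for illustration (the real `.proto` numbering is not shown in this thread), and `quote_style_from_proto` is a hypothetical helper name.

```rust
use std::convert::TryFrom;

/// In-memory quote style (stand-in for the real config enum).
#[derive(Clone, Copy, Debug, PartialEq)]
pub enum CsvQuoteStyle {
    Always,
    Necessary,
    NonNumeric,
    Never,
}

/// Stand-in for the generated protobuf enum.
/// ASSUMPTION: these numeric values are illustrative only.
#[derive(Clone, Copy, Debug, PartialEq)]
pub enum CsvQuoteStyleProto {
    Necessary = 0,
    Always = 1,
    NonNumeric = 2,
    Never = 3,
}

impl TryFrom<i32> for CsvQuoteStyleProto {
    type Error = i32;

    /// Unknown wire values are returned as the error payload.
    fn try_from(value: i32) -> Result<Self, Self::Error> {
        match value {
            0 => Ok(CsvQuoteStyleProto::Necessary),
            1 => Ok(CsvQuoteStyleProto::Always),
            2 => Ok(CsvQuoteStyleProto::NonNumeric),
            3 => Ok(CsvQuoteStyleProto::Never),
            other => Err(other),
        }
    }
}

/// Lenient decode: anything unrecognised falls back to `Necessary`
/// rather than surfacing an error, matching the compression handling.
pub fn quote_style_from_proto(raw: i32) -> CsvQuoteStyle {
    match CsvQuoteStyleProto::try_from(raw) {
        Ok(CsvQuoteStyleProto::Always) => CsvQuoteStyle::Always,
        Ok(CsvQuoteStyleProto::NonNumeric) => CsvQuoteStyle::NonNumeric,
        Ok(CsvQuoteStyleProto::Never) => CsvQuoteStyle::Never,
        _ => CsvQuoteStyle::Necessary,
    }
}
```

The trade-off is that a plan serialized by a newer version with an extra quote style would silently decode as `Necessary` on an older reader instead of failing loudly.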

Labels

common Related to common crate proto Related to proto crate sqllogictest SQL Logic Tests (.slt)

Development

Successfully merging this pull request may close these issues.

Add quote-style parameter for CSV options
