Skip to content

Conversation

@shangxinli
Copy link
Contributor

@shangxinli shangxinli commented Dec 30, 2025

Add complete API structure for Iceberg data file writers including:

Add FileWriterFactory to create data and delete writers, with
comprehensive improvements per review feedback.

File Organization:

  • Separate DataWriter to data_writer.h/.cc
  • Separate PositionDeleteWriter to position_delete_writer.h/.cc
  • Separate EqualityDeleteWriter to equality_delete_writer.h/.cc
  • Separate FileWriterFactory to file_writer_factory.h/.cc
  • Keep only FileWriter base interface in writer.h/.cc

Key Features:

  • Input validation for all factory methods (path, schema, spec)
  • Thread safety documentation (NOT thread-safe)
  • State management in stub implementations (is_closed tracking)
  • Support for Parquet and Avro formats
  • Pass-by-value + std::move for sink parameters

Implementation:

  • FileWriterFactory directly creates writers (true factory pattern)
  • Writers use friend pattern - only factory can construct them
  • Internal MakeXxxInternal functions handle cross-file construction
  • Stub implementations validate inputs before returning NotImplemented

@wgtmac
Copy link
Member

wgtmac commented Dec 31, 2025

I would expect separate PRs for DataWriter, PositionDeleteWriter and EqualityDeleteWriter because we want full-featured implementation to verify that the API design makes sense. This feature is not targeted in the 0.2.0 release so we don't need to rush. Does that make sense?

@shangxinli shangxinli force-pushed the data_file_writers_factory_new branch 6 times, most recently from 07c9ef9 to 078acb1 Compare December 31, 2025 17:08
Add FileWriterFactory to create data and delete writers, with
comprehensive improvements per review feedback.

File Organization:
- Separate DataWriter to data_writer.h/.cc
- Separate PositionDeleteWriter to position_delete_writer.h/.cc
- Separate EqualityDeleteWriter to equality_delete_writer.h/.cc
- Separate FileWriterFactory to file_writer_factory.h/.cc
- Keep only FileWriter base interface in writer.h/.cc

Key Features:
- Input validation for all factory methods (path, schema, spec)
- Thread safety documentation (NOT thread-safe)
- State management in stub implementations (is_closed tracking)
- Support for Parquet and Avro formats
- Pass-by-value + std::move for sink parameters

Implementation:
- FileWriterFactory directly creates writers (true factory pattern)
- Writers use friend pattern - only factory can construct them
- Internal MakeXxxInternal functions handle cross-file construction
- Stub implementations validate inputs before returning NotImplemented

Related to apache#441
@shangxinli shangxinli force-pushed the data_file_writers_factory_new branch from 078acb1 to e05193c Compare December 31, 2025 17:09
@shangxinli
Copy link
Contributor Author

I would expect separate PRs for DataWriter, PositionDeleteWriter and EqualityDeleteWriter because we want full-featured implementation to verify that the API design makes sense. This feature is not targeted in the 0.2.0 release so we don't need to rush. Does that make sense?

Sure. This PR provides the factory pattern and stub implementations for all three writer types. The actual implementations are planned for follow-up PRs as outlined in issue #441 .

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants