
Add OpenLineage extension for Kafka Connect #22050

Draft
rahul-madaan wants to merge 1 commit into apache:trunk from rahul-madaan:kafka-connect-openlineage-extension

Conversation

@rahul-madaan

Add a new connect/openlineage-extension module that emits OpenLineage lineage events from Kafka Connect connectors. This enables automatic data lineage tracking for Connect pipelines, following the same pattern used by the OpenLineage integrations for Apache Spark and Apache Flink.

The extension implements ConnectRestExtension (KIP-285) and runs inside the Connect worker JVM. A background monitor polls ConnectClusterState to detect connector lifecycle changes (start, pause, fail, delete) and emits OpenLineage RunEvents with input/output dataset information.
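The state-diffing step the background monitor performs on each poll can be sketched independently of the Connect API. The class and method names below are hypothetical, not the PR's actual code; they only illustrate mapping a (previous, current) connector-state transition to an OpenLineage event type as described above.

```java
import java.util.Optional;

// Hypothetical sketch: on each poll, compare the previously observed state of
// a connector with its current state and decide which OpenLineage eventType
// (if any) to emit. States follow Kafka Connect's status strings.
public class LifecycleDiff {

    public static Optional<String> eventFor(String previous, String current) {
        if (previous == null) {
            // Connector observed for the first time.
            return "RUNNING".equals(current) ? Optional.of("START") : Optional.empty();
        }
        if (current == null) {
            return Optional.of("COMPLETE");              // connector deleted
        }
        if (current.equals(previous)) {
            return Optional.empty();                     // no transition this poll
        }
        switch (current) {
            case "PAUSED":  return Optional.of("COMPLETE"); // paused
            case "FAILED":  return Optional.of("FAIL");     // entered FAILED state
            case "RUNNING": return Optional.of("START");    // e.g. resumed
            default:        return Optional.empty();
        }
    }

    public static void main(String[] args) {
        System.out.println(eventFor(null, "RUNNING").orElse("-"));      // START
        System.out.println(eventFor("RUNNING", "FAILED").orElse("-"));  // FAIL
        System.out.println(eventFor("RUNNING", null).orElse("-"));      // COMPLETE
    }
}
```

The actual extension would feed this diff from `ConnectClusterState` snapshots and attach the cached input/output datasets before emitting each RunEvent.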

Supported connectors:

  • JDBC Source/Sink (PostgreSQL, MySQL, SQL Server, Oracle, DB2, Redshift)
  • Debezium CDC (all variants)
  • S3, GCS, Azure Blob, HDFS Sink
  • MongoDB Source/Sink
  • Elasticsearch Sink
  • BigQuery, Snowflake, Redshift Sink
  • MirrorMaker 2
  • HTTP Sink
  • Generic fallback for unknown connectors

Events follow the OpenLineage spec (https://openlineage.io/docs/spec/naming/) and include:

  • processing_engine run facet (name: kafka-connect)
  • jobType job facet (processingType: STREAMING, integration: KAFKA_CONNECT)
  • errorMessage run facet on FAIL events
  • Input/output datasets on all event types (START, COMPLETE, FAIL)
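For illustration, a FAIL event carrying these facets might look like the following. All field values are invented; only the overall shape follows the OpenLineage spec, and the dataset namespaces follow its naming conventions.

```json
{
  "eventType": "FAIL",
  "eventTime": "2026-04-14T10:50:00Z",
  "run": {
    "runId": "f4d1c0de-0000-4000-8000-000000000000",
    "facets": {
      "processing_engine": { "name": "kafka-connect" },
      "errorMessage": { "message": "Connector task threw an uncaught exception" }
    }
  },
  "job": {
    "namespace": "kafka-connect",
    "name": "my-jdbc-sink",
    "facets": {
      "jobType": { "processingType": "STREAMING", "integration": "KAFKA_CONNECT" }
    }
  },
  "inputs":  [ { "namespace": "kafka://broker:9092", "name": "orders" } ],
  "outputs": [ { "namespace": "postgres://db-host:5432", "name": "public.orders" } ]
}
```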

Transport options: HTTP (with Bearer auth), File (NDJSON), Console (SLF4J). Configuration via OPENLINEAGE_CONFIG env var or openlineage.* worker properties.
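As a hedged sketch, the worker-properties route might look like this. `rest.extension.classes` is the standard KIP-285 worker property; the extension class name and the exact `openlineage.*` keys are hypothetical, not confirmed by the PR.

```properties
# Illustrative worker configuration; class name and openlineage.* keys
# are hypothetical, not taken from the PR.
rest.extension.classes=org.apache.kafka.connect.openlineage.OpenLineageRestExtension

# HTTP transport with Bearer auth (one of the three transports listed above)
openlineage.transport.type=http
openlineage.transport.url=https://lineage.example.com/api/v1/lineage
```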

@github-actions github-actions bot added the triage (PRs from the community), connect, and build (Gradle build or GitHub Actions) labels Apr 14, 2026
@rahul-madaan rahul-madaan marked this pull request as draft April 14, 2026 10:08
@rahul-madaan rahul-madaan marked this pull request as ready for review April 14, 2026 10:50
@mimaison
Member

Thanks for the PR. This introduces new public APIs, so this change requires a Kafka Improvement Proposal (KIP). See https://cwiki.apache.org/confluence/display/KAFKA/Kafka+Improvement+Proposals for the process.

@mimaison mimaison added the kip (Requires or implements a KIP) label Apr 14, 2026
@rahul-madaan rahul-madaan force-pushed the kafka-connect-openlineage-extension branch from 23498f1 to 802f4a0 on April 14, 2026 17:55
@rahul-madaan rahul-madaan marked this pull request as draft April 14, 2026 20:20
Add a new connect/openlineage-extension module that emits OpenLineage
lineage events from Kafka Connect connectors. This enables automatic
data lineage tracking for Connect pipelines, following the same pattern
used by the OpenLineage integrations for Apache Spark and Apache Flink.

The extension implements ConnectRestExtension (KIP-285) and runs inside
the Connect worker JVM. A background monitor polls ConnectClusterState
to detect connector lifecycle changes (start, pause, fail, delete) and
emits OpenLineage RunEvents with input/output dataset information.

Event lifecycle (matches Flink streaming pattern):
  START   — when a connector is first observed RUNNING
  RUNNING — periodic heartbeat (default every 5 minutes, configurable)
            confirming the connector is still active with current lineage
  COMPLETE — when a connector is paused or deleted
  FAIL    — when a connector enters FAILED state

Supported connectors (17 visitors):
- JDBC Source/Sink (PostgreSQL, MySQL, SQL Server, Oracle, DB2, Redshift)
- Debezium CDC (all variants)
- S3, GCS, Azure Blob, HDFS Sink
- MongoDB Source/Sink
- Elasticsearch Sink
- BigQuery, Snowflake, Redshift Sink
- MirrorMaker 2, HTTP Sink
- Generic fallback for unknown connectors

Events follow the OpenLineage spec (https://openlineage.io/docs/spec/naming/)
and include:
- processing_engine run facet (name, version, adapter version)
- jobType job facet (processingType: STREAMING, integration: KAFKA_CONNECT)
- errorMessage run facet on FAIL events
- Input/output datasets on all event types (START, RUNNING, COMPLETE, FAIL)
- Lineage cached per-connector for reliable COMPLETE events on deletion
- Lineage refreshed on RUNNING events to detect config changes

Dataset namespaces follow OL naming conventions:
  postgres://, mysql://, kafka://, s3://, gs://, wasbs://,
  hdfs://, mongodb://, elasticsearch://, bigquery, snowflake://,
  redshift://, cassandra://

Transport options: HTTP (with Bearer auth), File (NDJSON), Console (SLF4J).
Configuration via OPENLINEAGE_CONFIG env var or openlineage.* worker properties.

Signed-off-by: Rahul Madan <madan.rahul9@gmail.com>
@rahul-madaan rahul-madaan force-pushed the kafka-connect-openlineage-extension branch from 802f4a0 to b02a7d5 on April 14, 2026 20:33
