Summary
When building complex connectors with multi-level substream hierarchies, it is useful to write stream definitions that serve only as parent streams for other streams and are not exposed as top-level streams. This pattern is currently undocumented but is actively used in production connectors.
Problem
The current YAML Reference documentation explains that only entries in the top-level streams: array are exposed as runnable streams, but it doesn't explicitly document the pattern of:
- Defining a full stream definition in definitions that is NOT listed in streams:
- Using that definition solely as a parent_stream_config for another stream
- The naming convention some connectors use (e.g., __ prefix) to signal "internal helper"
This pattern is particularly useful for 3-level nested substream hierarchies where an intermediate stream is needed to provide partition keys but shouldn't be exposed to users.
Example Implementation: Jira Connector
The Jira connector uses this pattern extensively. Here are code permalinks:
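Internal/Private Stream Definitions (in definitions, NOT in streams:)
- __issue_property_keys_substream - Used as parent for issue_properties_stream
- __custom_issue_fields_substream - Used as parent for issue_custom_field_contexts
- __issue_custom_field_contexts_substream - Used as parent for issue_custom_field_options
- __boards_substream - Used as parent for board-related streams
- __story_points_issue_fields_substream - Used for story points configuration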
How It's Used (3-level hierarchy example)
The issue_properties_stream references the internal __issue_property_keys_substream as its parent:
    issue_properties_stream:
      # ...
      retriever:
        # ...
        partition_router:
          type: SubstreamPartitionRouter
          parent_stream_configs:
            - type: ParentStreamConfig
              stream: "#/definitions/__issue_property_keys_substream"  # <-- Internal stream reference
This creates a 3-level hierarchy:
- issues_stream (grandparent - exposed)
- __issue_property_keys_substream (parent - internal, NOT exposed)
- issue_properties_stream (child - exposed)
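To make the chain concrete, here is a minimal Python sketch that follows parent_stream_configs references upward from a child definition. It works on a plain dict and does not use the Airbyte CDK. The DEFINITIONS dict is a hypothetical, heavily trimmed stand-in for the Jira manifest, and the link from __issue_property_keys_substream to issues_stream is assumed from the hierarchy described above rather than copied from the connector. The helper names parent_refs and ancestry are illustrative only.

```python
from typing import Any

REF_PREFIX = "#/definitions/"

# Hypothetical, trimmed-down definitions; real manifests carry many more keys.
DEFINITIONS: dict[str, Any] = {
    "issues_stream": {},
    "__issue_property_keys_substream": {
        "retriever": {
            "partition_router": {
                "type": "SubstreamPartitionRouter",
                "parent_stream_configs": [
                    {"type": "ParentStreamConfig",
                     "stream": "#/definitions/issues_stream"},
                ],
            }
        }
    },
    "issue_properties_stream": {
        "retriever": {
            "partition_router": {
                "type": "SubstreamPartitionRouter",
                "parent_stream_configs": [
                    {"type": "ParentStreamConfig",
                     "stream": "#/definitions/__issue_property_keys_substream"},
                ],
            }
        }
    },
}


def parent_refs(stream_def: dict[str, Any]) -> list[str]:
    """Return the definition keys this stream names as parents.

    Assumes partition_router is a single mapping and that each parent is
    given as a "#/definitions/<key>" string, as in the snippet above.
    """
    router = stream_def.get("retriever", {}).get("partition_router", {})
    refs = []
    for config in router.get("parent_stream_configs", []):
        ref = config.get("stream")
        if isinstance(ref, str) and ref.startswith(REF_PREFIX):
            refs.append(ref[len(REF_PREFIX):])
    return refs


def ancestry(definitions: dict[str, Any], name: str) -> list[str]:
    """Walk from a child definition up to its root, following the first parent."""
    chain = [name]
    while True:
        parents = parent_refs(definitions[chain[-1]])
        if not parents:
            return chain
        if parents[0] in chain:
            raise ValueError(f"cycle detected at {parents[0]}")
        chain.append(parents[0])
```

Calling ancestry(DEFINITIONS, "issue_properties_stream") returns the child first and the grandparent last, so the internal definition appears in the middle of the chain.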
Top-level streams: Section
The streams section only lists the streams that should be exposed to users - the __-prefixed definitions are intentionally omitted.
Suggested Documentation
Add a section to the YAML Reference or a new "Advanced Patterns" page that documents:
- Pattern: Using stream definitions as internal parent streams
- Use case: Multi-level substream hierarchies where intermediate streams shouldn't be exposed
- Naming convention: The __ prefix convention (optional but recommended for clarity)
- Behavior: Streams not listed in streams: will not be exposed by source.streams(config) - attempting to sync them will silently no-op
- Testing implications: When writing mock server tests, always verify stream names against the streams: section to avoid testing non-existent streams
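The testing point could be enforced with a small guard like the following sketch. It assumes the manifest has already been loaded into a dict, that entries in streams: are either "#/definitions/<key>" strings or {"$ref": ...} mappings, and that stream definitions carry type: DeclarativeStream. The MANIFEST dict and the helper names (exposed_definitions, internal_definitions, require_exposed) are hypothetical and not part of the Airbyte CDK.

```python
from typing import Any

REF_PREFIX = "#/definitions/"

# Hypothetical, trimmed-down manifest used only to illustrate the check.
MANIFEST: dict[str, Any] = {
    "definitions": {
        "issues_stream": {"type": "DeclarativeStream"},
        "__issue_property_keys_substream": {"type": "DeclarativeStream"},
        "issue_properties_stream": {"type": "DeclarativeStream"},
    },
    "streams": [
        "#/definitions/issues_stream",
        {"$ref": "#/definitions/issue_properties_stream"},
    ],
}


def exposed_definitions(manifest: dict[str, Any]) -> set[str]:
    """Definition keys referenced from the top-level streams: list."""
    names = set()
    for entry in manifest.get("streams", []):
        ref = entry.get("$ref") if isinstance(entry, dict) else entry
        if isinstance(ref, str) and ref.startswith(REF_PREFIX):
            names.add(ref[len(REF_PREFIX):])
    return names


def internal_definitions(manifest: dict[str, Any]) -> set[str]:
    """Stream definitions that exist but are not listed in streams:."""
    declared = {
        key
        for key, value in manifest.get("definitions", {}).items()
        if isinstance(value, dict) and value.get("type") == "DeclarativeStream"
    }
    return declared - exposed_definitions(manifest)


def require_exposed(manifest: dict[str, Any], definition_key: str) -> None:
    """Fail loudly instead of letting a test target an internal-only definition."""
    if definition_key not in exposed_definitions(manifest):
        raise LookupError(
            f"{definition_key!r} is not listed in streams: and cannot be synced"
        )
```

A mock server test could call require_exposed before exercising a stream, which turns the silent no-op into an immediate failure. Note that this compares definition keys; a real test would likely also need to map each key to the stream's name field, which this sketch leaves out.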
Context
This issue was discovered while creating comprehensive mock server tests for the Jira connector (airbytehq/airbyte#70884). The pattern caused confusion when attempting to test issue_property_keys as a stream, only to discover it's an internal-only definition.
Requested by: Aaron ("AJ") Steers (@aaronsteers)
Related PR: airbytehq/airbyte#70884
Devin session: https://app.devin.ai/sessions/f152f435f9d146688e476611ff864c30