## Community report

Reported by a user in the Airbyte community Slack: https://airbytehq.slack.com/archives/C021JANJ6TY/p1776936626673869

The user has an Intercom connector with hourly syncs, `lookback_window = 1 day`, and `conversation_parts` in incremental append+deduped mode. They found that `statistics.count_conversation_parts` on conversations consistently exceeds the number of parts in their destination table. Manual verification against the Intercom API confirmed that the API returns the correct number of parts but Airbyte drops some during sync.
## Summary

When a substream uses `global_substream_cursor: true` + `is_client_side_incremental: true`, the configured `lookback_window` on the `DatetimeBasedCursor` is not applied to the `should_be_synced` client-side filter. Instead, the filter uses the `runtime_lookback_window` (the previous sync's duration in seconds), which is typically only a few minutes. As a result, parts whose `updated_at` is even slightly behind the global cursor are silently dropped.
## Affected code path

- `ConcurrentPerPartitionCursor._get_cursor()` creates a cursor using `self._lookback_window`, which is the runtime lookback (the previous sync's duration in seconds), loaded from state at `airbyte_cdk/sources/declarative/incremental/concurrent_partition_cursor.py` line 474:

  ```python
  self._lookback_window = int(stream_state.get("lookback_window", 0))
  ```

- This value is passed to `_create_cursor()` at lines 596-600:

  ```python
  if self._use_global_cursor:
      return self._create_cursor(
          self._global_cursor,
          self._lookback_window if self._global_cursor else 0,
      )
  ```

- In `model_to_component_factory.py` lines 1416-1426, the runtime lookback is subtracted from the cursor value:

  ```python
  stream_state_value = stream_state.get(cursor_field.cursor_field_key)
  if runtime_lookback_window and stream_state_value:
      new_stream_state = (
          connector_state_converter.parse_timestamp(stream_state_value)
          - runtime_lookback_window
      )
  ```

- This becomes `self.start` in `ConcurrentCursor.__init__()` at `airbyte_cdk/sources/streams/concurrent/cursor.py` line 225.

- `should_be_synced()` at lines 561-573 uses `self.start` as the lower bound:

  ```python
  return self.start <= record_cursor_value <= self._end_provider()
  ```

- The configured `lookback_window` (e.g., `P1D`) from the manifest's `DatetimeBasedCursor` is passed to `ConcurrentCursor` as `self._lookback_window` (line 226) but is only used in `stream_slices()` to generate API request time ranges. It is never referenced by `should_be_synced()`.
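The code path above can be condensed into a runnable sketch. `SketchCursor`, the timestamps, and the lookback values below are hypothetical illustrations, not CDK code; only the bound check mirrors the `should_be_synced()` comparison quoted above:

```python
from datetime import datetime, timedelta

class SketchCursor:
    """Hypothetical stand-in for ConcurrentCursor's lower-bound behavior."""

    def __init__(self, global_cursor, runtime_lookback, end_provider):
        # self.start is the global cursor minus only the runtime lookback,
        # as described in the code path above.
        self.start = global_cursor - runtime_lookback
        self._end_provider = end_provider

    def should_be_synced(self, record_cursor_value):
        # Mirrors: return self.start <= record_cursor_value <= self._end_provider()
        return self.start <= record_cursor_value <= self._end_provider()

now = datetime(2024, 4, 28, 16, 5, 0)
cursor = SketchCursor(
    global_cursor=datetime(2024, 4, 28, 16, 0, 0),  # max updated_at seen so far
    runtime_lookback=timedelta(seconds=120),         # previous sync took ~2 minutes
    end_provider=lambda: now,
)

# A part updated ~27 minutes before the global cursor fails the lower bound
# (start = 15:58:00), even though it has never reached the destination.
print(cursor.should_be_synced(datetime(2024, 4, 28, 15, 33, 10)))  # False
print(cursor.should_be_synced(datetime(2024, 4, 28, 15, 59, 0)))   # True
```

With only the runtime lookback, the lower bound sits a couple of minutes behind the global cursor, so any record whose `updated_at` trails the cursor by more than that window is filtered out.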
## Reproduction

- Connector: `source-intercom`
- Stream: `conversation_parts` (substream of `conversations`)
- Config: `lookback_window = 1`, hourly syncs, incremental append+deduped
- Manifest settings: `global_substream_cursor: true`, `is_client_side_incremental: true`

Observed behavior:
- The Intercom API returns 44 parts for a conversation
- The Airbyte destination table has 38 parts (6 missing)
- The 6 missing parts were created seconds to minutes before other parts on the same conversation that were successfully synced
- Example: on April 28, parts created at 15:33:10-16 were dropped, while parts created at 15:33:18 (2 seconds later) on the same conversation were synced. The `should_be_synced` cutoff landed between them.
Expected behavior:
- With a 1-day `lookback_window`, `self.start` should be `global_cursor - 1 day`, not `global_cursor - previous_sync_duration_seconds`
- All 44 parts should be synced
## Root cause

The `global_substream_cursor` tracks one cursor across all partitions. When that cursor advances (from processing parts on other conversations), parts of an individual conversation whose `updated_at` is slightly behind the global maximum are dropped by `should_be_synced`. The runtime lookback (a few minutes) is far too small to cover the spread of `updated_at` values across conversations in a workspace.
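A small sketch of why that spread matters (the conversation IDs and timestamps below are made up for illustration): the global cursor is driven by the most recently active conversation, so quieter conversations can sit many minutes behind it, well outside a runtime lookback of a few minutes.

```python
from datetime import datetime

# updated_at values of parts seen during one sync, grouped by conversation.
parts_by_conversation = {
    "conv_a": [datetime(2024, 4, 28, 15, 33, 10), datetime(2024, 4, 28, 15, 33, 18)],
    "conv_b": [datetime(2024, 4, 28, 15, 59, 55)],  # recently active conversation
}

# The global substream cursor converges on the max updated_at across ALL
# partitions, not per conversation.
global_cursor = max(ts for parts in parts_by_conversation.values() for ts in parts)
print(global_cursor)  # 2024-04-28 15:59:55 -- set by conv_b

# conv_a's parts now trail the global cursor by ~26 minutes; a runtime
# lookback of a few minutes cannot reach back far enough to keep them.
lag = global_cursor - min(parts_by_conversation["conv_a"])
print(lag)  # 0:26:45
```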
## Suggested fix

Apply the configured `lookback_window` (from the manifest's `DatetimeBasedCursor`) to the `should_be_synced` filter when `global_substream_cursor` is used, either instead of or in addition to the runtime lookback. The effective lookback should be `max(configured_lookback_window, runtime_lookback_window)`.
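A minimal sketch of the proposed combination, using hypothetical names for the two lookbacks (the real fix would live where the CDK builds `self.start`):

```python
from datetime import datetime, timedelta

def effective_lookback(configured: timedelta, runtime: timedelta) -> timedelta:
    # Take whichever lookback reaches further back, so the configured
    # window is never silently shrunk to the previous sync's duration.
    return max(configured, runtime)

configured = timedelta(days=1)    # lookback_window: P1D from the manifest
runtime = timedelta(seconds=300)  # previous sync took ~5 minutes

lookback = effective_lookback(configured, runtime)
print(lookback)  # 1 day, 0:00:00

# self.start would then be global_cursor - 1 day, covering the spread of
# updated_at values across conversations in the workspace.
global_cursor = datetime(2024, 4, 28, 16, 0, 0)
print(global_cursor - lookback)  # 2024-04-27 16:00:00
```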
Devin session — requested by @mark-grivnin