Skip to content

[FLINK-39951][python] Fix string reference equality bug in ArrayConstructor causing silent long value truncation#28473

Open
jubins wants to merge 1 commit into
apache:masterfrom
jubins:j-fix-FLINK-39951-array-constructor-string-comparison
Open

[FLINK-39951][python] Fix string reference equality bug in ArrayConstructor causing silent long value truncation#28473
jubins wants to merge 1 commit into
apache:masterfrom
jubins:j-fix-FLINK-39951-array-constructor-string-comparison

Conversation

@jubins

@jubins jubins commented Jun 17, 2026

Copy link
Copy Markdown
Contributor

What is the purpose of the change

Fixes FLINK-39951ArrayConstructor.construct() uses reference equality (==) to compare the pickle typecode string "l" instead of value equality (.equals()). Since args[0] is a deserialized String object at runtime and not an interned literal, the == comparison always returns false, making the long[] code path effectively dead code.

As a result, any Python array with typecode 'l' (64-bit longs) silently falls through to super.construct(), which treats the values as int[] and truncates them to 32 bits. Users passing long values larger than Integer.MAX_VALUE receive silently corrupted data with no error or warning.

The fix replaces args[0] == "l" with "l".equals(args[0]) on line 30 of ArrayConstructor.java.

Brief change log

  • Fixed ArrayConstructor.construct() to use "l".equals(args[0]) instead of args[0] == "l" for typecode comparison
  • Added ArrayConstructorTest with four test cases covering: large long values above Integer.MAX_VALUE, single value arrays, empty arrays, and delegation to the parent class for non-'l' typecodes

Verifying this change

This change is covered by new unit tests in ArrayConstructorTest:

  • testLongArrayPreservesValuesAboveIntMax() — the core regression test; uses new String("l") to guarantee a non-interned reference (proving the old == would have failed), then asserts that values like 3_000_000_000L and Long.MAX_VALUE are preserved correctly as long[]
  • testLongArrayWithSingleValue() — verifies a single large long value is handled correctly
  • testLongArrayWithEmptyList() — verifies empty input produces an empty long[]
  • testNonLongTypecodesDelegateToSuper() — confirms other typecodes (e.g. 'i') still fall through to the parent super.construct() correctly

Does this pull request potentially affect one of the following parts

  • Dependencies (does it add or upgrade a dependency): no
  • The public API, i.e., is any changed class annotated with @Public(Evolving): no
  • The serializers: no
  • The runtime per-record code paths (performance sensitive): no
  • Anything that affects deployment or recovery (JobManager, Checkpointing, Kubernetes/Yarn, ZooKeeper): no
  • The S3 file system connector: no

Documentation

  • Does this pull request introduce a new feature? no — it fixes a bug in existing functionality
  • If yes, how is the feature documented? not applicable

Was generative AI tooling used to co-author this PR?

  • Yes — Claude Code was used as a pair-programming assistant. All code was written, understood, and verified by the author.
    Generated-by: Claude Opus 4.8

@flinkbot

flinkbot commented Jun 17, 2026

Copy link
Copy Markdown
Collaborator

CI report:

Bot commands The @flinkbot bot supports the following commands:
  • @flinkbot run azure re-run the last Azure build

@spuru9 spuru9 left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@github-actions github-actions Bot added the community-reviewed PR has been reviewed by the community. label Jun 17, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

community-reviewed PR has been reviewed by the community.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants