KAFKA-20314: Update GraalVM to version 25 to fix native image segfault on startup#22059
KAFKA-20314: Update GraalVM to version 25 to fix native image segfault on startup#22059sstremler wants to merge 3 commits intoapache:trunkfrom
Conversation
utafrali
left a comment
There was a problem hiding this comment.
The core fix is correct and well-scoped: swapping the GraalVM base image tag from 21 to 25 directly addresses the getpwuid thread-safety segfault, and the Python test helper change cleanly adds local-archive support alongside the existing URL path. Two minor points: the validation ValueError gets swallowed by the broad except clause, and a one-line comment on the Dockerfile FROM line would document the minimum-version constraint for future maintainers.
| shutil.copy(kafka_archive, f"{temp_dir_path}/kafka.tgz") | ||
| else: | ||
| raise ValueError("Either kafka_url or kafka_archive must be provided") | ||
|
|
There was a problem hiding this comment.
The ValueError is raised inside the try block whose except: clause catches everything and re-raises as SystemError("Failed to run the tests"). That swallows the descriptive message, making it harder to diagnose when run_docker_tests is called programmatically without either argument.
Move the guard before the try:
def run_docker_tests(image, tag, kafka_url, image_type, kafka_archive=None):
if not kafka_url and not kafka_archive:
raise ValueError("Either kafka_url or kafka_archive must be provided")
temp_dir_path = tempfile.mkdtemp()
try:
...Note: the archive_group = parser.add_mutually_exclusive_group(required=True) in __main__ already prevents both from being None when called from the CLI, so this guard only matters for programmatic callers. Keeping it outside the try block makes the intent clearer.
| # limitations under the License. | ||
|
|
||
| FROM ghcr.io/graalvm/graalvm-community:21 AS build-native-image | ||
| FROM ghcr.io/graalvm/graalvm-community:25 AS build-native-image |
There was a problem hiding this comment.
Adding a short comment here would help future maintainers understand why this specific major version is required:
# GraalVM 25+ required: earlier versions call getpwuid (not thread-safe), causing
# intermittent segfaults (~2%) on startup. GraalVM 25 switches to getpwuid_r.
FROM ghcr.io/graalvm/graalvm-community:25 AS build-native-imageWithout this, the next person who sees version 25 here has no clue whether downgrading to 21 would be safe.
@utafrali Thank you for the review. I added the changes you suggested. |
Update GraalVM to version 25 to fix native image segfault on startup.
getpwuidis not thread-safe and caused intermittent segfaults (~2%) during startup of kafka-native. GraalVM 25 fixes this by callinggetpwuid_rinstead.Reviewers: TaiJuWu tjwu1217@gmail.com, PoAn Yang payang@apache.org, Ken Huang s7133700@gmail.com, Chia-Ping Tsai chia7712@gmail.com