
Move maxengine for restructuring #3203

Open

hengtaoguo wants to merge 1 commit into main from hengtaoguo-re

Conversation

@hengtaoguo
Collaborator

@hengtaoguo hengtaoguo commented Feb 20, 2026

Description

Restructure inference-related modules to align with the new maxtext directory. Moved four files and created two __init__.py files, as listed below:

  • src/maxtext/inference/maxengine/:
    • __init__.py
    • maxengine.py
    • maxengine_config.py
    • maxengine_server.py
  • src/maxtext/inference/mlperf/microbenchmarks/
    • __init__.py
    • benchmark_chunked_prefill.py
  • Update imports, shell scripts, comments, and docs throughout the repo.
  • Add shims for both decode and maxengine_server under MaxText, each emitting a deprecation warning.
  • Move folder maxtext/inference/maxengine_server/ to maxtext/inference/maxengine/maxengine_server_deployment/. The rename avoids a naming conflict between this folder and the module maxengine_server.py.

Tests

maxengine_server (logs):

# New API
python -m maxtext.inference.maxengine.maxengine_server maxtext/configs/base.yml model_name=qwen3-0.6b tokenizer_path=Qwen/Qwen3-0.6B load_parameters_path=gs://hengtaoguo-maxtext-logs/checkpoints/qwen3-0.6b/scanned/2026-02-16-11-35/0/items scan_layers=true max_prefill_predict_length=1024 max_target_length=2048 ici_fsdp_parallelism=1 ici_autoregressive_parallelism=1 ici_tensor_parallelism=-1 weight_dtype=bfloat16 attention=dot_product per_device_batch_size=10 quantize_kvcache=False tokenizer_type=huggingface hf_access_token=xxx

# Shim API
python -m MaxText.maxengine_server maxtext/configs/base.yml model_name=qwen3-0.6b tokenizer_path=Qwen/Qwen3-0.6B load_parameters_path=gs://hengtaoguo-maxtext-logs/checkpoints/qwen3-0.6b/scanned/2026-02-16-11-35/0/items scan_layers=true max_prefill_predict_length=1024 max_target_length=2048 ici_fsdp_parallelism=1 ici_autoregressive_parallelism=1 ici_tensor_parallelism=-1 weight_dtype=bfloat16 attention=dot_product per_device_batch_size=10 quantize_kvcache=False tokenizer_type=huggingface hf_access_token=xxx

decode:

# New API
python -m maxtext.inference.decode maxtext/configs/base.yml model_name=gemma3-4b tokenizer_path=google/gemma-3-4b-it tokenizer_type=huggingface load_parameters_path=gs://hengtaoguo-maxtext-logs/checkpoints/gemma3-4b/unscanned/001/0/items per_device_batch_size=1 run_name=ht_test max_prefill_predict_length=280 max_target_length=320 steps=1 async_checkpointing=false scan_layers=false use_multimodal=true prompt=\'Describe\ image\ \<start_of_image\>\' image_path=\'tests/assets/test_image.jpg\' attention=\'dot_product\' hf_access_token=xxx

# Shim API
python -m MaxText.decode maxtext/configs/base.yml model_name=gemma3-4b tokenizer_path=google/gemma-3-4b-it tokenizer_type=huggingface load_parameters_path=gs://hengtaoguo-maxtext-logs/checkpoints/gemma3-4b/unscanned/001/0/items per_device_batch_size=1 run_name=ht_test max_prefill_predict_length=280 max_target_length=320 steps=1 async_checkpointing=false scan_layers=false use_multimodal=true prompt=\'Describe\ image\ \<start_of_image\>\' image_path=\'tests/assets/test_image.jpg\' attention=\'dot_product\' hf_access_token=xxx

Checklist

Before submitting this PR, please make sure (put X in square brackets):

  • I have performed a self-review of my code. For an optional AI review, add the gemini-review label.
  • I have necessary comments in my code, particularly in hard-to-understand areas.
  • I have run end-to-end tests and provided workload links above if applicable.
  • I have made or will make corresponding changes to the doc if needed, including adding new documentation pages to the relevant Table of Contents (toctree directive) as explained in our documentation.

@codecov

codecov bot commented Feb 21, 2026

Codecov Report

❌ Patch coverage is 2.94118% with 33 lines in your changes missing coverage. Please review.

Files with missing lines          Patch %   Lines
src/MaxText/maxengine_server.py   0.00%     17 Missing ⚠️
src/MaxText/decode.py             0.00%     16 Missing ⚠️


Collaborator

@bvandermoon bvandermoon left a comment


Do we need shims to keep supporting the old decode and maxengine commands?

from MaxText.common_types import DECODING_ACTIVE_SEQUENCE_INDICATOR, MODEL_MODE_PREFILL
from maxtext.layers import quantizations
from MaxText.maxengine import MaxEngine
from maxtext.inference.maxengine import maxengine
Collaborator


Was this extra import intended?

Comment on lines -25 to 31
from MaxText import maxengine
from MaxText import pyconfig
from maxtext.common import profiler
from maxtext.common.gcloud_stub import jetstream, is_decoupled
from maxtext.inference.maxengine import maxengine
from maxtext.multimodal import processor as mm_processor
from maxtext.multimodal import utils as mm_utils
from maxtext.utils import max_utils
Collaborator


This isn't related to your PR, but can this be moved to inference/?



3 participants