
Twinkle: Training workbench to make your model glow

by ModelScope
English  |  中文 

English Documentation   |   中文文档  

✨ What is Twinkle?

Twinkle✨ is a lightweight client-server training framework built around modular, high-cohesion interfaces. Whether you run locally with torchrun or scale training across Ray clusters, Twinkle✨ removes infrastructure friction by encapsulating training logic behind standardized APIs. Beyond simple abstraction, Twinkle✨ also serves as a backend and gateway for serverless Training-as-a-Service (TaaS). Its interfaces are a superset of the Tinker APIs, so a Twinkle✨ training service can be accessed either with a Tinker client or with the native Twinkle✨ client, which offers additional functionality.

🧩 Decoupled Architecture: Standardized Interfaces, backward compatible with Tinker APIs.
🚀 Multiple Runtime Modes: torchrun / Ray / HTTP.
🔌 Versatile Backends: Transformers / Megatron.
👥 Multi-Tenancy Training Service: Train multiple LoRAs that share one base model deployment.

Note: Twinkle✨ is built by the team behind ms-swift, and we expect the two projects to evolve together; some fundamental components of Twinkle✨ will likely be reused in ms-swift.

Twinkle✨ WeChat Group

Installation

Install with pip:

pip install 'twinkle-kit'

Install from Source:

git clone https://github.com/modelscope/twinkle.git
cd twinkle
pip install -e .

Tutorials

Training Type | Model Framework | Cookbook Path
FSDP finetuning | transformers | Script
FSDP MoE finetuning | transformers | Script
EP FSDP MoE finetuning | transformers | Script
SP FSDP finetuning | transformers | Script
EP MoE finetuning | transformers | Script
PP/TP/CP finetuning | megatron | Script
PP/TP/CP MoE finetuning | megatron | Script
Tinker client finetuning | megatron | Script
Tinker client finetuning/sampling | transformers | Script
Twinkle client finetuning | megatron | Script
Twinkle client finetuning | transformers | Script

Changelog

  • 🎉2026-02-13 Initial version of Twinkle✨ released, including SFT/PT/RL support for text models and serverless training capabilities on ModelScope.

Training as a Service on ModelScope

We are rolling out a training service built atop Twinkle✨ on ModelScope; it is currently in Beta. You can sign up for free access by joining the Twinkle-Explorers organization and then train via the API endpoint base_url=https://www.modelscope.cn/twinkle. For more details, please refer to our documentation.
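
As a quick orientation (a full walkthrough appears under Sample Code below), connecting to the service with the Tinker-compatible client looks roughly like this; the sketch assumes your ModelScope token is exported as MODELSCOPE_TOKEN:

import os
from twinkle_client import init_tinker_compat_client

# Point the Tinker-compatible client at the ModelScope serverless endpoint
service_client = init_tinker_compat_client('https://www.modelscope.cn/twinkle',
                                           os.environ.get('MODELSCOPE_TOKEN'))
# Create a LoRA training client on the currently hosted base model
training_client = service_client.create_lora_training_client(
    base_model='Qwen/Qwen3-30B-A3B-Instruct-2507', rank=16)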

Supported Hardware

Hardware | Environment | Notes
Nvidia GPUs | ✅ | Support for BF16/Flash-Attn may be incomplete on earlier GPUs
Ascend NPU | ✅ | Some operators may not be supported
PPU | |
CPU | | Supports partial components such as dataset and dataloader

Supported Models

We will add support for more models as they are released. The following table lists the models currently supported by the Twinkle✨ framework.

Note

The serverless training service accessed via base_url=https://www.modelscope.cn/twinkle supports one training base model at a time; it is currently Qwen3-30B-A3B-Instruct-2507.

Model Type | Model ID on ModelScope | Requires | Megatron Support | HF Model ID
qwen3 series | Qwen/Qwen3-0.6B-Base~32B | transformers>=4.51 | | Qwen/Qwen3-0.6B-Base
qwen3_moe series | Qwen/Qwen3-30B-A3B-Base | transformers>=4.51 | | Qwen/Qwen3-30B-A3B-Base
 | Qwen/Qwen3-30B-A3B~235B | transformers>=4.51 | | Qwen/Qwen3-30B-A3B
qwen2 series | Qwen/Qwen2-0.5B-Instruct~72B | transformers>=4.37 | | Qwen/Qwen2-0.5B-Instruct
 | Qwen/Qwen2.5-0.5B-Instruct~72B | transformers>=4.37 | | Qwen/Qwen2.5-0.5B-Instruct
 | Qwen/Qwen2.5-0.5B~72B | transformers>=4.37 | | Qwen/Qwen2.5-0.5B
qwen2_moe series | Qwen/Qwen1.5-MoE-A2.7B-Chat | transformers>=4.40 | | Qwen/Qwen1.5-MoE-A2.7B-Chat
chatglm4 series | ZhipuAI/glm-4-9b-chat | transformers>=4.42 | | zai-org/glm-4-9b-chat
 | ZhipuAI/LongWriter-glm4-9b | transformers>=4.42 | | zai-org/LongWriter-glm4-9b
glm_edge series | ZhipuAI/glm-edge-1.5b-chat | transformers>=4.46 | | zai-org/glm-edge-1.5b-chat
 | ZhipuAI/glm-edge-4b-chat | transformers>=4.46 | | zai-org/glm-edge-4b-chat
internlm2 series | Shanghai_AI_Laboratory/internlm2-1_8b | transformers>=4.38 | | internlm/internlm2-1_8b
 | Shanghai_AI_Laboratory/internlm2-chat-7b | transformers>=4.38 | | internlm/internlm2-chat-7b
deepseek_v1 | deepseek-ai/deepseek-vl-7b-chat | transformers>=4.39.4 | | ——
 | deepseek-ai/DeepSeek-V2-Lite | transformers>=4.39.3 | | deepseek-ai/DeepSeek-V2-Lite
 | deepseek-ai/DeepSeek-V2.5 | transformers>=4.39.3 | | deepseek-ai/DeepSeek-V2.5
 | deepseek-ai/DeepSeek-R1 | transformers>=4.39.3 | | deepseek-ai/DeepSeek-R1
deepseek_r1_distill | deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B~32B | transformers>=4.37 | | deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B

For a more detailed model support list, see 👉 Quick Start.md

Sample Code

Train with Ray

from peft import LoraConfig
import twinkle
from twinkle import DeviceMesh, DeviceGroup
from twinkle.dataloader import DataLoader
from twinkle.dataset import Dataset, DatasetMeta
from twinkle.model import TransformersModel
from twinkle.preprocessor import SelfCognitionProcessor

device_group = [DeviceGroup(name='default', ranks=8, device_type='cuda')]
device_mesh = DeviceMesh.from_sizes(fsdp_size=4, dp_size=2)
# Use mode='local' instead when launching with torchrun
twinkle.initialize(mode='ray', groups=device_group, global_device_mesh=device_mesh)


def train():
    # To load a model from Hugging Face, use 'hf://...'
    base_model = 'ms://Qwen/Qwen2.5-7B-Instruct'
    # 1000 samples
    dataset = Dataset(dataset_meta=DatasetMeta('ms://swift/self-cognition', data_slice=range(1000)))
    # Set template to prepare encoding
    dataset.set_template('Template', model_id=base_model)
    # Preprocess the dataset to standard format
    dataset.map(SelfCognitionProcessor('twinkle LLM', 'ModelScope Community'))
    # Encode dataset
    dataset.encode()
    # Global batch size = 8 across the 8 GPUs, so 1 sample per GPU
    dataloader = DataLoader(dataset=dataset, batch_size=8, min_batch_size=8)
    # Use a TransformersModel
    model = TransformersModel(model_id=base_model, remote_group='default')

    lora_config = LoraConfig(
        r=8,
        lora_alpha=32,
        target_modules='all-linear'
    )

    # Add a LoRA adapter named `default` to the model
    # Comment this out to use full-parameter training
    model.add_adapter_to_model('default', lora_config, gradient_accumulation_steps=2)
    # Add an optimizer for the `default` LoRA
    model.set_optimizer(optimizer_cls='AdamW', lr=1e-4)
    # Add an LR scheduler for the `default` LoRA
    model.set_lr_scheduler(scheduler_cls='CosineWarmupScheduler', num_warmup_steps=5,
                           num_training_steps=len(dataloader))
    for step, batch in enumerate(dataloader):
        # Do forward and backward
        model.forward_backward(inputs=batch)
        # Step
        model.clip_grad_and_step()
        if step % 20 == 0:
            # Print metric
            metric = model.calculate_metric(is_training=True)
            print(f'Step {step} of {len(dataloader)}, metric: {metric}')
    model.save('last-checkpoint')


if __name__ == '__main__':
    train()
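
The example above initializes Twinkle✨ in Ray mode; per the comment next to twinkle.initialize, the same script can run under torchrun by switching the mode. A minimal sketch, where the script filename is a hypothetical placeholder:

# In the script: initialize for a local torchrun launch instead of a Ray cluster
twinkle.initialize(mode='local', groups=device_group, global_device_mesh=device_mesh)

# Launch command: 8 processes to match ranks=8 in the DeviceGroup above
# torchrun --nproc_per_node 8 train_lora_sft.py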

Using Tinker-Like API

import os
from tqdm import tqdm
from tinker import types
from twinkle_client import init_tinker_compat_client
from twinkle.dataloader import DataLoader
from twinkle.dataset import Dataset, DatasetMeta
from twinkle.preprocessor import SelfCognitionProcessor
from twinkle.server.tinker.common import input_feature_to_datum

base_model = 'ms://Qwen/Qwen3-30B-A3B-Instruct-2507'
base_url = 'https://www.modelscope.cn/twinkle'
api_key = os.environ.get('MODELSCOPE_TOKEN')

# Use twinkle dataset to load the data
dataset = Dataset(dataset_meta=DatasetMeta('ms://swift/self-cognition', data_slice=range(500)))
dataset.set_template('Template', model_id=base_model, max_length=256)
dataset.map(SelfCognitionProcessor('twinkle Model', 'twinkle Team'), load_from_cache_file=False)
dataset.encode(batched=True, load_from_cache_file=False)
dataloader = DataLoader(dataset=dataset, batch_size=8)

# Initialize tinker client
service_client = init_tinker_compat_client(base_url, api_key)
training_client = service_client.create_lora_training_client(base_model=base_model[len('ms://'):], rank=16)

# Training loop: use input_feature_to_datum to convert each input feature into the Tinker datum format
for epoch in range(3):
    for step, batch in tqdm(enumerate(dataloader)):
        input_datum = [input_feature_to_datum(input_feature) for input_feature in batch]

        fwdbwd_future = training_client.forward_backward(input_datum, "cross_entropy")
        optim_future = training_client.optim_step(types.AdamParams(learning_rate=1e-4))

        fwdbwd_result = fwdbwd_future.result()
        optim_result = optim_future.result()

    training_client.save_state(f"twinkle-lora-{epoch}").result()

Architecture Design

Twinkle✨ features a decoupled Client-Server architecture designed for maximum flexibility. The client-side provides two distinct integration paths:

  • Twinkle✨ Native: A conforming API that mirrors the server-side interface for seamless end-to-end integration.
  • Tinker Compatibility: Full support for the native Tinker API, enabling developers to leverage Twinkle✨’s backend using Tinker client.

With this dual-path design, existing Tinker users can access Twinkle✨'s training services simply by changing the Tinker base URL.

Multi-Tenancy

Twinkle✨ supports simultaneous multi-tenant training on a shared base model. Leveraging a LoRA Pool + Tenant Application architecture, Twinkle✨ lets many tenants train in parallel with complete isolation. From the model's perspective, each tenant's session is distinct and may use its own configuration, including data padding strategy, optimizer, and loss function, all running concurrently on the same base model.

Note: This feature is currently optimized for LoRA.

For example:

  • Tenant A: Loads a private dataset locally, LoRA rank=8, uses the base model for SFT
  • Tenant B: Loads an open-source dataset remotely from the Hub, LoRA rank=32, uses the base model for PT
  • Tenant C: Uses the base model for GRPO loss calculation, with a Sampler for sampling
  • Tenant D: Uses the base model for logps inference

These processes run concurrently on a single base model because the Model and Sampler are task-agnostic components within the Twinkle✨ ecosystem. Upon completion, checkpoints are automatically pushed to ModelScope or Hugging Face repositories (private by default). On the server side, Twinkle✨ provides a multi-tenant suite with automated cluster management and dynamic scaling, making it a foundation for building customizable, enterprise-grade training services.
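
For instance, Tenant A and Tenant B above could each open their own LoRA training client against the same hosted base model through the Tinker-compatible path. A minimal sketch reusing the client calls from the samples in this README; the per-tenant token variables are hypothetical placeholders:

import os
from twinkle_client import init_tinker_compat_client

base_url = 'https://www.modelscope.cn/twinkle'
# Hypothetical per-tenant tokens; each tenant supplies its own ModelScope token
api_key_a = os.environ.get('TENANT_A_MODELSCOPE_TOKEN')
api_key_b = os.environ.get('TENANT_B_MODELSCOPE_TOKEN')

# Each tenant connects with its own credentials...
client_a = init_tinker_compat_client(base_url, api_key_a)
client_b = init_tinker_compat_client(base_url, api_key_b)

# ...and trains its own LoRA (rank 8 vs. rank 32) on the shared base model
training_a = client_a.create_lora_training_client(
    base_model='Qwen/Qwen3-30B-A3B-Instruct-2507', rank=8)
training_b = client_b.create_lora_training_client(
    base_model='Qwen/Qwen3-30B-A3B-Instruct-2507', rank=32)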

As a modular framework, Twinkle✨ also supports temporarily exclusive remote training, e.g., full-parameter training.

🛠️ Twinkle✨ Modular Ecosystem

  • Dataset: Data loading and preprocessing
  • Template: Encoding and decoding
  • DataLoader: Data distribution and batching
  • Preprocessor: Data ETL
  • InputProcessor: Task-specific input processing
  • Model: Large models; supports multiple frameworks
  • Sampler: Sampling logic
  • Loss: Loss functions
  • Metric: Training metrics collection
  • Reward: Reward functions
  • Advantage: Advantage functions
  • CheckpointEngine: Weight synchronization
  • Patch: Patches for model fixes
  • Module: Components, e.g., Optimizer
  • Kernel: Operators
  • Server: Starts the backend cluster
  • Client: Client code
  • Infra: Isolates Ray and torchrun differences
  • Plugin: Use hub components
  • Hub: Interface with HF/MS libraries

Community Components

Component Type | Component Link | Component Function | Author
Patch | qwen3_moe_transformers4_patch | Fixes Qwen3 MoE model hang issue during FSDP2 training, effective for transformers==4.x | ModelScope Official

Contributions

Twinkle✨ is a collaborative initiative led by ModelScope in partnership with the open-source community, with key contributions from strategic partners including the China Merchants Bank Tech Team.

We are grateful to the open-source community, particularly the projects that inspired us, including Transformers, MS-SWIFT, veRL, Tinker, and many others.

We welcome open contributions via issues and pull requests.
