by ModelScope
Twinkle✨ is a lightweight, client-server training framework engineered
with modular, high-cohesion interfaces. Whether you are running locally
with torchrun or scaling training across Ray clusters,
Twinkle✨ eliminates infrastructure friction by encapsulating
training logic into standardized APIs. Beyond simple
abstraction, Twinkle✨ serves as a robust backend and gateway to enable serverless Training-as-a-Service (TaaS).
Its interfaces constitute a superset of the Tinker APIs,
so a Twinkle✨ training service can be accessed either via the Tinker client
or via the native Twinkle✨ client, which offers additional functionality.
🧩 Decoupled Architecture: Standardized Interfaces, backward compatible with Tinker APIs.
🚀 Multiple Runtime Modes: torchrun / Ray / HTTP.
🔌 Versatile Backends: Transformers / Megatron.
👥 Multi-Tenancy Training Service: Train multiple LoRAs that share one base model deployment.
Note: Twinkle✨ is built by the team behind ms-swift, and we expect the two projects to evolve together; some fundamental components of Twinkle✨ will likely be reused in ms-swift.
| Twinkle WeChat Group |
|---|
| (QR code) |
```shell
pip install 'twinkle-kit'
```

Or install from source:

```shell
git clone https://github.com/modelscope/twinkle.git
cd twinkle
pip install -e .
```

| Training Type | Model Framework | Cookbook Path |
|---|---|---|
| FSDP finetuning | transformers | Script |
| FSDP MoE finetuning | transformers | Script |
| EP FSDP MoE finetuning | transformers | Script |
| SP FSDP finetuning | transformers | Script |
| EP MoE finetuning | transformers | Script |
| PP/TP/CP finetuning | megatron | Script |
| PP/TP/CP MoE finetuning | megatron | Script |
| tinker client finetuning | megatron | Script |
| tinker client finetuning/sampling | transformers | Script |
| twinkle client finetuning | megatron | Script |
| twinkle client finetuning | transformers | Script |
- 🎉2026-02-13 Initial version of Twinkle✨ released, including SFT/PT/RL support for text models and serverless training capabilities on ModelScope.
We are rolling out a training service built atop Twinkle✨ on ModelScope. It is currently in Beta. You may
sign up for free access by joining the Twinkle-Explorers organization, and
train via the API endpoint `base_url=https://www.modelscope.cn/twinkle`. For more details, please refer to
our documentation.
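As a minimal sketch, connecting to the Beta endpoint uses the bundled Tinker-compatible client shown in the full example below; it assumes your ModelScope token is exported as `MODELSCOPE_TOKEN`:

```python
import os

from twinkle_client import init_tinker_compat_client

# Connect to the ModelScope-hosted Twinkle training service (Beta)
service_client = init_tinker_compat_client(
    'https://www.modelscope.cn/twinkle',
    os.environ['MODELSCOPE_TOKEN'],
)
```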
| Hardware Environment | Notes |
|---|---|
| Nvidia GPUs | ✅ BF16/Flash-Attn support may be incomplete on older GPUs |
| Ascend NPU | ✅ Some operators may not be supported |
| PPU | ✅ |
| CPU | Supports partial components, such as dataset and dataloader |
We will add support for more models as they are released. The following lists the models currently supported by the Twinkle✨ framework.
Note
For the serverless training service accessed via `base_url=https://www.modelscope.cn/twinkle`, only one
training base model is supported at a time; currently it is Qwen3-30B-A3B-Instruct-2507.
For a more detailed model support list 👉 Quick Start.md
The following example trains with the native Twinkle✨ API, either locally via torchrun or on a Ray cluster:

```python
from peft import LoraConfig

import twinkle
from twinkle import DeviceMesh, DeviceGroup
from twinkle.dataloader import DataLoader
from twinkle.dataset import Dataset, DatasetMeta
from twinkle.model import TransformersModel
from twinkle.preprocessor import SelfCognitionProcessor

device_group = [DeviceGroup(name='default', ranks=8, device_type='cuda')]
device_mesh = DeviceMesh.from_sizes(fsdp_size=4, dp_size=2)
# Use mode='local' for torchrun
twinkle.initialize(mode='ray', groups=device_group, global_device_mesh=device_mesh)


def train():
    # To load a model from Hugging Face, use 'hf://...'
    base_model = 'ms://Qwen/Qwen2.5-7B-Instruct'
    # Take 1000 samples
    dataset = Dataset(dataset_meta=DatasetMeta('ms://swift/self-cognition', data_slice=range(1000)))
    # Set the template to prepare encoding
    dataset.set_template('Template', model_id=base_model)
    # Preprocess the dataset into the standard format
    dataset.map(SelfCognitionProcessor('twinkle LLM', 'ModelScope Community'))
    # Encode the dataset
    dataset.encode()
    # Global batch size = 8 across 8 GPUs, so 1 sample per GPU
    dataloader = DataLoader(dataset=dataset, batch_size=8, min_batch_size=8)
    # Use a TransformersModel
    model = TransformersModel(model_id=base_model, remote_group='default')
    lora_config = LoraConfig(
        r=8,
        lora_alpha=32,
        target_modules='all-linear'
    )
    # Add a LoRA adapter named `default` to the model
    # Comment this out to use full-parameter training
    model.add_adapter_to_model('default', lora_config, gradient_accumulation_steps=2)
    # Add an optimizer for LoRA `default`
    model.set_optimizer(optimizer_cls='AdamW', lr=1e-4)
    # Add an LR scheduler for LoRA `default`
    model.set_lr_scheduler(scheduler_cls='CosineWarmupScheduler', num_warmup_steps=5,
                           num_training_steps=len(dataloader))
    for step, batch in enumerate(dataloader):
        # Forward and backward passes
        model.forward_backward(inputs=batch)
        # Clip gradients and step the optimizer
        model.clip_grad_and_step()
        if step % 20 == 0:
            # Print metrics
            metric = model.calculate_metric(is_training=True)
            print(f'Current is step {step} of {len(dataloader)}, metric: {metric}')
    model.save('last-checkpoint')


if __name__ == '__main__':
    train()
```

The same task can also be trained against a remote Twinkle✨ service through the Tinker-compatible client:

```python
import os

from tqdm import tqdm
from tinker import types
from twinkle_client import init_tinker_compat_client
from twinkle.dataloader import DataLoader
from twinkle.dataset import Dataset, DatasetMeta
from twinkle.preprocessor import SelfCognitionProcessor
from twinkle.server.tinker.common import input_feature_to_datum
base_model = 'ms://Qwen/Qwen3-30B-A3B-Instruct-2507'
base_url = 'https://www.modelscope.cn/twinkle'
api_key = os.environ.get('MODELSCOPE_TOKEN')
# Use twinkle dataset to load the data
dataset = Dataset(dataset_meta=DatasetMeta('ms://swift/self-cognition', data_slice=range(500)))
dataset.set_template('Template', model_id=base_model, max_length=256)
dataset.map(SelfCognitionProcessor('twinkle Model', 'twinkle Team'), load_from_cache_file=False)
dataset.encode(batched=True, load_from_cache_file=False)
dataloader = DataLoader(dataset=dataset, batch_size=8)
# Initialize tinker client
service_client = init_tinker_compat_client(base_url, api_key)
training_client = service_client.create_lora_training_client(base_model=base_model[len('ms://'):], rank=16)
# Training loop: use input_feature_to_datum to convert inputs to the datum format
for epoch in range(3):
    for step, batch in tqdm(enumerate(dataloader)):
        input_datum = [input_feature_to_datum(input_feature) for input_feature in batch]
        fwdbwd_future = training_client.forward_backward(input_datum, "cross_entropy")
        optim_future = training_client.optim_step(types.AdamParams(learning_rate=1e-4))
        fwdbwd_result = fwdbwd_future.result()
        optim_result = optim_future.result()
    # Save a checkpoint at the end of each epoch
    training_client.save_state(f"twinkle-lora-{epoch}").result()
```

Twinkle✨ features a decoupled Client-Server architecture designed for maximum flexibility. The client side provides two distinct integration paths:
- Twinkle✨ Native: A conforming API that mirrors the server-side interface for seamless end-to-end integration.
- Tinker Compatibility: Full support for the native Tinker API, enabling developers to leverage Twinkle✨'s backend through the Tinker client.
This dual-path design gives access to Twinkle✨'s training services from the Tinker API with nothing more than a change to the Tinker base URL.
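For example, an existing Tinker workflow can be re-pointed at a Twinkle✨ deployment; this is a minimal sketch, and the `base_url`/`api_key` constructor arguments on the Tinker SDK's `ServiceClient` are assumptions rather than a confirmed signature:

```python
import os

import tinker

# Re-point an unmodified Tinker workflow at a Twinkle deployment
# (base_url/api_key keyword arguments are assumptions about the Tinker SDK)
service_client = tinker.ServiceClient(
    base_url='https://www.modelscope.cn/twinkle',
    api_key=os.environ['MODELSCOPE_TOKEN'],
)
training_client = service_client.create_lora_training_client(
    base_model='Qwen/Qwen3-30B-A3B-Instruct-2507', rank=16)
```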
Twinkle✨ supports simultaneous multi-tenant training on a shared base model. Leveraging a LoRA Pool + Tenant Application architecture, Twinkle enables up to N tenants to train in parallel with complete isolation. This design offers unprecedented flexibility: from the model's perspective, each tenant's session is distinct, supporting heterogeneous configurations including unique data padding strategies, optimizers, and loss functions—all running concurrently on the same base model.
Note: This feature is currently optimized for LoRA.
For example:
- Tenant A: Loads a private dataset locally, LoRA rank=8, uses the base model for SFT
- Tenant B: Loads an open-source dataset from the Hub remotely, LoRA rank=32, uses the base model for PT
- Tenant C: Uses the base model for GRPO loss calculation, with a Sampler for sampling
- Tenant D: Uses the base model for logps inference
These processes are executed concurrently on a single base model because the Model and Sampler are integrated as task-agnostic components within the Twinkle✨ ecosystem. Upon completion, checkpoints are automatically pushed to ModelScope or HuggingFace repositories (private by default). On the server side, Twinkle✨ provides a robust multi-tenant suite featuring automated cluster management and dynamic scaling, making it the foundation for building customizable, enterprise-grade training services.
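To make the LoRA Pool idea concrete, the sketch below adapts the native quick-start API to two tenants sharing one base model. It assumes `twinkle.initialize(...)` has already been called as in the quick-start example, and the `adapter_name` keyword on `set_optimizer` is a hypothetical illustration, not a confirmed API:

```python
from peft import LoraConfig

from twinkle.model import TransformersModel

# One shared base-model deployment serving several isolated tenants
# (assumes twinkle.initialize(...) was called as in the quick-start example)
model = TransformersModel(model_id='ms://Qwen/Qwen2.5-7B-Instruct', remote_group='default')

# Tenant A: rank-8 LoRA; Tenant B: rank-32 LoRA -- separate adapters, same base weights
model.add_adapter_to_model('tenant_a', LoraConfig(r=8, lora_alpha=32, target_modules='all-linear'))
model.add_adapter_to_model('tenant_b', LoraConfig(r=32, lora_alpha=64, target_modules='all-linear'))

# Each tenant can configure its own optimizer; the `adapter_name` keyword used to
# select the adapter here is a hypothetical illustration, not a confirmed API
model.set_optimizer(optimizer_cls='AdamW', lr=1e-4, adapter_name='tenant_a')
model.set_optimizer(optimizer_cls='AdamW', lr=5e-5, adapter_name='tenant_b')
```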
As a modular framework, Twinkle✨ also supports temporary exclusive remote training, i.e., training in full-parameter mode.
Twinkle✨'s modular components include:

| | | | | |
|---|---|---|---|---|
| Dataset | Template | DataLoader | Preprocessor | InputProcessor |
| Model | Sampler | Loss | Metric | Reward |
| Advantage | CheckpointEngine | Patch | Module | Kernel |
| Server | Client | Infra | Plugin | Hub |
| Component Type | Component Link | Component Function | Author |
|---|---|---|---|
| Patch | qwen3_moe_transformers4_patch | Fixes Qwen3 MoE model hang issue during FSDP2 training, effective for transformers==4.x | ModelScope Official |
Twinkle✨ is a collaborative initiative led by ModelScope in partnership with the open-source community, with key contributions from strategic stakeholders including the China Merchants Bank Tech Team.
We are grateful to the open-source community, particularly the projects that inspired us, including Transformers, MS-SWIFT, veRL, Tinker, and many others.
We welcome contributions via issues and pull requests.



