
Twinkle: Training workbench to make your model glow

by ModelScope
English  |  中文 

English Documentation   |   中文文档  

✨ What is Twinkle?

Twinkle✨ is a lightweight client-server training framework built around modular, high-cohesion interfaces. Whether you run locally with torchrun or scale training across Ray clusters, Twinkle✨ removes infrastructure friction by encapsulating training logic behind standardized APIs. Beyond simple abstraction, Twinkle✨ also serves as a backend and gateway for serverless Training-as-a-Service (TaaS). Its interfaces are a superset of the Tinker APIs, so a Twinkle✨ training service can be accessed either with a Tinker client or with the native Twinkle✨ client, which offers additional functionality.

🧩 Decoupled Architecture: Standardized Interfaces, backward compatible with Tinker APIs.
🚀 Multiple Runtime Modes: torchrun / Ray / HTTP.
🔌 Versatile Backends: Transformers / Megatron.
👥 Multi-Tenancy Training Service: Train multiple LoRAs that share one base model deployment.

Note: Twinkle✨ is built by the team behind ms-swift, and we expect the two projects to evolve together; some fundamental components of Twinkle✨ will likely be reused in ms-swift.

Twinkle✨ WeChat Group

Installation

Install with pip:

pip install 'twinkle-kit'

Install from Source:

git clone https://github.com/modelscope/twinkle.git
cd twinkle
pip install -e .

Tutorials

Training Type | Model Framework | Cookbook Path
FSDP finetuning | transformers | Script
FSDP MoE finetuning | transformers | Script
EP FSDP MoE finetuning | transformers | Script
SP FSDP finetuning | transformers | Script
EP MoE finetuning | transformers | Script
PP/TP/CP finetuning | megatron | Script
PP/TP/CP MoE finetuning | megatron | Script
Tinker client finetuning | megatron | Script
Tinker client finetuning/sampling | transformers | Script
Twinkle client finetuning | megatron | Script
Twinkle client finetuning | transformers | Script

Changelog

  • 🎉2026-02-13 Initial version of Twinkle✨ released, including SFT/PT/RL support for text models and serverless training capabilities on ModelScope.

Training as a Service on ModelScope

We are rolling out a training service built atop Twinkle✨ on ModelScope; it is currently in Beta. You can sign up for free access by joining the Twinkle-Explorers organization and then train via the API endpoint base_url=https://www.modelscope.cn/twinkle. For more details, please refer to our documentation.
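
As a quick orientation (a full walkthrough appears under Sample Code below), connecting to the service with the Tinker-compatible client looks roughly like this; the sketch assumes your ModelScope token is exported as MODELSCOPE_TOKEN:

import os
from twinkle_client import init_tinker_compat_client

# Point the Tinker-compatible client at the ModelScope serverless endpoint
service_client = init_tinker_compat_client('https://www.modelscope.cn/twinkle',
                                           os.environ.get('MODELSCOPE_TOKEN'))
# Create a LoRA training client on the currently hosted base model
training_client = service_client.create_lora_training_client(
    base_model='Qwen/Qwen3-30B-A3B-Instruct-2507', rank=16)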

Supported Hardware

Hardware | Environment | Notes
Nvidia GPUs | ✅ | Support for BF16/Flash-Attn may be incomplete on earlier GPUs
Ascend NPU | ✅ | Some operators may not be supported
PPU | |
CPU | | Supports partial components such as dataset and dataloader

Supported Models

We will add support for more models as they are released. The following table lists the models currently supported by the Twinkle✨ framework.

Note

The serverless training service accessed via base_url=https://www.modelscope.cn/twinkle supports one training base model at a time; it is currently Qwen3-30B-A3B-Instruct-2507.

Model Type | Model ID on ModelScope | Requires | Megatron Support | HF Model ID
qwen3 series | Qwen/Qwen3-0.6B-Base~32B | transformers>=4.51 | | Qwen/Qwen3-0.6B-Base
qwen3_moe series | Qwen/Qwen3-30B-A3B-Base | transformers>=4.51 | | Qwen/Qwen3-30B-A3B-Base
 | Qwen/Qwen3-30B-A3B~235B | transformers>=4.51 | | Qwen/Qwen3-30B-A3B
qwen2 series | Qwen/Qwen2-0.5B-Instruct~72B | transformers>=4.37 | | Qwen/Qwen2-0.5B-Instruct
 | Qwen/Qwen2.5-0.5B-Instruct~72B | transformers>=4.37 | | Qwen/Qwen2.5-0.5B-Instruct
 | Qwen/Qwen2.5-0.5B~72B | transformers>=4.37 | | Qwen/Qwen2.5-0.5B
qwen2_moe series | Qwen/Qwen1.5-MoE-A2.7B-Chat | transformers>=4.40 | | Qwen/Qwen1.5-MoE-A2.7B-Chat
chatglm4 series | ZhipuAI/glm-4-9b-chat | transformers>=4.42 | | zai-org/glm-4-9b-chat
 | ZhipuAI/LongWriter-glm4-9b | transformers>=4.42 | | zai-org/LongWriter-glm4-9b
glm_edge series | ZhipuAI/glm-edge-1.5b-chat | transformers>=4.46 | | zai-org/glm-edge-1.5b-chat
 | ZhipuAI/glm-edge-4b-chat | transformers>=4.46 | | zai-org/glm-edge-4b-chat
internlm2 series | Shanghai_AI_Laboratory/internlm2-1_8b | transformers>=4.38 | | internlm/internlm2-1_8b
 | Shanghai_AI_Laboratory/internlm2-chat-7b | transformers>=4.38 | | internlm/internlm2-chat-7b
deepseek_v1 | deepseek-ai/deepseek-vl-7b-chat | transformers>=4.39.4 | | ——
 | deepseek-ai/DeepSeek-V2-Lite | transformers>=4.39.3 | | deepseek-ai/DeepSeek-V2-Lite
 | deepseek-ai/DeepSeek-V2.5 | transformers>=4.39.3 | | deepseek-ai/DeepSeek-V2.5
 | deepseek-ai/DeepSeek-R1 | transformers>=4.39.3 | | deepseek-ai/DeepSeek-R1
deepseek_r1_distill | deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B~32B | transformers>=4.37 | | deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B

For a more detailed model support list, see 👉 Quick Start.md

Sample Code

Train with Ray

from peft import LoraConfig
import twinkle
from twinkle import DeviceMesh, DeviceGroup
from twinkle.dataloader import DataLoader
from twinkle.dataset import Dataset, DatasetMeta
from twinkle.model import TransformersModel
from twinkle.preprocessor import SelfCognitionProcessor

device_group = [DeviceGroup(name='default', ranks=8, device_type='cuda')]
device_mesh = DeviceMesh.from_sizes(fsdp_size=4, dp_size=2)
# Use mode='local' instead when launching with torchrun
twinkle.initialize(mode='ray', groups=device_group, global_device_mesh=device_mesh)


def train():
    # To load a model from Hugging Face, use 'hf://...'
    base_model = 'ms://Qwen/Qwen2.5-7B-Instruct'
    # 1000 samples
    dataset = Dataset(dataset_meta=DatasetMeta('ms://swift/self-cognition', data_slice=range(1000)))
    # Set template to prepare encoding
    dataset.set_template('Template', model_id=base_model)
    # Preprocess the dataset to standard format
    dataset.map(SelfCognitionProcessor('twinkle LLM', 'ModelScope Community'))
    # Encode dataset
    dataset.encode()
    # Global batch size = 8 across the 8 GPUs, so 1 sample per GPU
    dataloader = DataLoader(dataset=dataset, batch_size=8, min_batch_size=8)
    # Use a TransformersModel
    model = TransformersModel(model_id=base_model, remote_group='default')

    lora_config = LoraConfig(
        r=8,
        lora_alpha=32,
        target_modules='all-linear'
    )

    # Add a LoRA adapter named `default` to the model
    # Comment this out to use full-parameter training
    model.add_adapter_to_model('default', lora_config, gradient_accumulation_steps=2)
    # Add an optimizer for the `default` LoRA
    model.set_optimizer(optimizer_cls='AdamW', lr=1e-4)
    # Add an LR scheduler for the `default` LoRA
    model.set_lr_scheduler(scheduler_cls='CosineWarmupScheduler', num_warmup_steps=5,
                           num_training_steps=len(dataloader))
    for step, batch in enumerate(dataloader):
        # Do forward and backward
        model.forward_backward(inputs=batch)
        # Step
        model.clip_grad_and_step()
        if step % 20 == 0:
            # Print metric
            metric = model.calculate_metric(is_training=True)
            print(f'Step {step} of {len(dataloader)}, metric: {metric}')
    model.save('last-checkpoint')


if __name__ == '__main__':
    train()
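
The example above initializes Twinkle✨ in Ray mode; per the comment next to twinkle.initialize, the same script can run under torchrun by switching the mode. A minimal sketch, where the script filename is a hypothetical placeholder:

# In the script: initialize for a local torchrun launch instead of a Ray cluster
twinkle.initialize(mode='local', groups=device_group, global_device_mesh=device_mesh)

# Launch command: 8 processes to match ranks=8 in the DeviceGroup above
# torchrun --nproc_per_node 8 train_lora_sft.py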

Using Tinker-Like API

import os
from tqdm import tqdm
from tinker import types
from twinkle_client import init_tinker_compat_client
from twinkle.dataloader import DataLoader
from twinkle.dataset import Dataset, DatasetMeta
from twinkle.preprocessor import SelfCognitionProcessor
from twinkle.server.tinker.common import input_feature_to_datum

base_model = 'ms://Qwen/Qwen3-30B-A3B-Instruct-2507'
base_url = 'https://www.modelscope.cn/twinkle'
api_key = os.environ.get('MODELSCOPE_TOKEN')

# Use twinkle dataset to load the data
dataset = Dataset(dataset_meta=DatasetMeta('ms://swift/self-cognition', data_slice=range(500)))
dataset.set_template('Template', model_id=base_model, max_length=256)
dataset.map(SelfCognitionProcessor('twinkle Model', 'twinkle Team'), load_from_cache_file=False)
dataset.encode(batched=True, load_from_cache_file=False)
dataloader = DataLoader(dataset=dataset, batch_size=8)

# Initialize tinker client
service_client = init_tinker_compat_client(base_url, api_key)
training_client = service_client.create_lora_training_client(base_model=base_model[len('ms://'):], rank=16)

# Training loop: use input_feature_to_datum to convert each input feature into the Tinker datum format
for epoch in range(3):
    for step, batch in tqdm(enumerate(dataloader)):
        input_datum = [input_feature_to_datum(input_feature) for input_feature in batch]

        fwdbwd_future = training_client.forward_backward(input_datum, "cross_entropy")
        optim_future = training_client.optim_step(types.AdamParams(learning_rate=1e-4))

        fwdbwd_result = fwdbwd_future.result()
        optim_result = optim_future.result()

    training_client.save_state(f"twinkle-lora-{epoch}").result()

Architecture Design

Twinkle✨ features a decoupled Client-Server architecture designed for maximum flexibility. The client-side provides two distinct integration paths:

  • Twinkle✨ Native: A conforming API that mirrors the server-side interface for seamless end-to-end integration.
  • Tinker Compatibility: Full support for the native Tinker API, enabling developers to leverage Twinkle✨’s backend using Tinker client.

With this dual-path design, existing Tinker users can access Twinkle✨'s training services simply by changing the Tinker base URL.

Multi-Tenancy

Twinkle✨ supports simultaneous multi-tenant training on a shared base model. Leveraging a LoRA Pool + Tenant Application architecture, Twinkle✨ lets many tenants train in parallel with complete isolation. From the model's perspective, each tenant's session is distinct and may use its own configuration, including data padding strategy, optimizer, and loss function, all running concurrently on the same base model.

Note: This feature is currently optimized for LoRA.

For example:

  • Tenant A: Loads a private dataset locally, LoRA rank=8, uses the base model for SFT
  • Tenant B: Loads an open-source dataset remotely from the Hub, LoRA rank=32, uses the base model for PT
  • Tenant C: Uses the base model for GRPO loss calculation, with a Sampler for sampling
  • Tenant D: Uses the base model for logps inference

These processes run concurrently on a single base model because the Model and Sampler are task-agnostic components within the Twinkle✨ ecosystem. Upon completion, checkpoints are automatically pushed to ModelScope or Hugging Face repositories (private by default). On the server side, Twinkle✨ provides a multi-tenant suite with automated cluster management and dynamic scaling, making it a foundation for building customizable, enterprise-grade training services.
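
For instance, Tenant A and Tenant B above could each open their own LoRA training client against the same hosted base model through the Tinker-compatible path. A minimal sketch reusing the client calls from the samples in this README; the per-tenant token variables are hypothetical placeholders:

import os
from twinkle_client import init_tinker_compat_client

base_url = 'https://www.modelscope.cn/twinkle'
# Hypothetical per-tenant tokens; each tenant supplies its own ModelScope token
api_key_a = os.environ.get('TENANT_A_MODELSCOPE_TOKEN')
api_key_b = os.environ.get('TENANT_B_MODELSCOPE_TOKEN')

# Each tenant connects with its own credentials...
client_a = init_tinker_compat_client(base_url, api_key_a)
client_b = init_tinker_compat_client(base_url, api_key_b)

# ...and trains its own LoRA (rank 8 vs. rank 32) on the shared base model
training_a = client_a.create_lora_training_client(
    base_model='Qwen/Qwen3-30B-A3B-Instruct-2507', rank=8)
training_b = client_b.create_lora_training_client(
    base_model='Qwen/Qwen3-30B-A3B-Instruct-2507', rank=32)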

As a modular framework, Twinkle✨ also supports temporarily exclusive remote training, e.g., full-parameter training.

🛠️ Twinkle✨ Modular Ecosystem

  • Dataset: Data loading and preprocessing
  • Template: Encoding and decoding
  • DataLoader: Data distribution and batching
  • Preprocessor: Data ETL
  • InputProcessor: Task-specific input processing
  • Model: Large models; supports multiple frameworks
  • Sampler: Sampling logic
  • Loss: Loss functions
  • Metric: Training metrics collection
  • Reward: Reward functions
  • Advantage: Advantage functions
  • CheckpointEngine: Weight synchronization
  • Patch: Patches for model fixes
  • Module: Components, e.g., Optimizer
  • Kernel: Operators
  • Server: Starts the backend cluster
  • Client: Client code
  • Infra: Isolates Ray and torchrun differences
  • Plugin: Use hub components
  • Hub: Interface with HF/MS libraries

Community Components

Component Type | Component Link | Component Function | Author
Patch | qwen3_moe_transformers4_patch | Fixes Qwen3 MoE model hang issue during FSDP2 training, effective for transformers==4.x | ModelScope Official

Contributions

Twinkle✨ is a collaborative initiative led by ModelScope in partnership with the open-source community, with key contributions from strategic partners including the China Merchants Bank Tech Team.

We are grateful to the open-source community, particularly the projects that inspired us, including Transformers, MS-SWIFT, veRL, Tinker, and many others.

We welcome open contributions via issues and pull requests.
