Open
37 commits
daf20d9
Merge remote-tracking branch 'upstream/develop' into develop
cloudforge1 Mar 6, 2026
6f1e63c
Merge remote-tracking branch 'upstream/develop' into develop
cloudforge1 Mar 6, 2026
4deb7a7
Merge remote-tracking branch 'upstream/develop' into develop
cloudforge1 Mar 9, 2026
676daf6
Merge remote-tracking branch 'upstream/develop' into develop
cloudforge1 Mar 9, 2026
9bcfdca
Merge remote-tracking branch 'upstream/develop' into develop
cloudforge1 Mar 10, 2026
2bfa878
Merge remote-tracking branch 'upstream/develop' into develop
cloudforge1 Mar 10, 2026
262c470
Merge remote-tracking branch 'upstream/develop' into develop
cloudforge1 Mar 11, 2026
171b4d3
Merge remote-tracking branch 'upstream/develop' into develop
cloudforge1 Mar 17, 2026
def0bd2
Merge remote-tracking branch 'upstream/develop' into develop
cloudforge1 Mar 19, 2026
4fad5dc
Merge remote-tracking branch 'upstream/develop' into develop
cloudforge1 Mar 20, 2026
3d739a6
Port ngram_match and hybrid_mtp_ngram kernels to CUDA
cloudforge1 Mar 20, 2026
477f749
Add correctness + latency test for GPU ngram kernels
cloudforge1 Mar 20, 2026
c349b12
Fix test data: step_idx semantics and ngram-matchable patterns
cloudforge1 Mar 20, 2026
217e587
fix: add CPU fallback path for ngram_match and hybrid_mtp_ngram ops
cloudforge1 Mar 21, 2026
08fe00a
fix(test): wrap imported ops with staticmethod to prevent self-binding
cloudforge1 Mar 21, 2026
305868d
fix(test): ensure max_model_len >= input_len to prevent broadcast err…
cloudforge1 Mar 21, 2026
1dfaed5
fix: keep input_ids_len on CPU in __init__, move to GPU in _run_impl
cloudforge1 Mar 22, 2026
b7f1f38
Extract shared ngram search into __device__ helper (ngram_match_commo…
cloudforge1 Mar 25, 2026
3f71877
refactor: parallel CUDA kernels for ngram_match (<<<bsz,256>>> search)
cloudforge1 Mar 30, 2026
838d6dc
fix: move __global__ kernel defs from .cuh to .cu files (fix linker m…
cloudforge1 Mar 30, 2026
f45e39b
fix: align mixed kernel signatures with host function tensors
cloudforge1 Mar 30, 2026
f0f623d
【Hackathon 9th No.49】Replace serial Phase 2 with CUB BlockScan parall…
cloudforge1 Apr 1, 2026
d37b581
fix: resolve Copilot/bot review comments on PR #7136
cloudforge1 Apr 1, 2026
8d7a4cb
test: add multi-scale latency benchmark (batch 32→1024)
cloudforge1 Apr 1, 2026
d4f09a8
cleanup: remove unused kernel params, dead struct, add benchmark env …
cloudforge1 Apr 1, 2026
2fab292
revert: remove benchmark env gate — let CI run benchmarks
cloudforge1 Apr 1, 2026
4a6d7d8
fix: address Copilot review — GPU mirror for input_ids_len, device fi…
cloudforge1 Apr 2, 2026
453f9bf
fix: correct stale comment in mixed gather (at-least-ori → 1-token)
cloudforge1 Apr 2, 2026
e769f5a
bench: add 5-group benchmark matching NKNaN methodology
cloudforge1 Apr 2, 2026
8ce4c53
fix: rename benchmark for CI discovery, bump to 10k iterations
cloudforge1 Apr 2, 2026
2ba6779
fix: correct stale filename in benchmark docstring
cloudforge1 Apr 2, 2026
c139634
fix: move PD_CHECK before Phase 1 launch (fail-fast)
cloudforge1 Apr 2, 2026
04346f8
bench: remove env-gate from benchmark groups, cut NUM_ITERS to 1000
cloudforge1 Apr 3, 2026
00a6d4c
fix: address Copilot review — conditional return, defensive guards, G…
cloudforge1 Apr 3, 2026
9bb642a
fix: clarify CAS comment, fix negative intermediate in CPU fallback
cloudforge1 Apr 3, 2026
d6f07ba
perf: A1 (1024 threads) + A2 (early-exit) + fix B1 UB in ngram_match
cloudforge1 Apr 3, 2026
9457c50
perf: template-specialize ngram search + cache scratch buffers + fix …
cloudforge1 Apr 4, 2026
389 changes: 330 additions & 59 deletions custom_ops/gpu_ops/speculate_decoding/draft_model/ngram_match_mixed.cu

227 changes: 0 additions & 227 deletions custom_ops/gpu_ops/speculate_decoding/ngram_match.cc

This file was deleted.
