Open
Conversation

PanZezhong1725 requested changes · Sep 24, 2025
@@ -0,0 +1,158 @@
import os
os.environ["CUDA_VISIBLE_DEVICES"] = "0,1,2,3,4,5,6,7"
Collaborator
The user should set this themselves when invoking the script, or it should be passed in as a script argument.
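A minimal sketch of the suggested change, assuming the script keeps configuring GPUs through `CUDA_VISIBLE_DEVICES` (the `--devices` flag name is hypothetical, not from this PR):

```python
import argparse
import os

def set_devices(argv=None):
    # Hypothetical --devices flag: pass GPU ids in instead of hard-coding them.
    parser = argparse.ArgumentParser()
    parser.add_argument("--devices", type=str, default="0",
                        help="comma-separated GPU ids, e.g. '0,1,2,3'")
    args = parser.parse_args(argv)
    # Must be set before any CUDA-using library (torch, etc.) is imported.
    os.environ["CUDA_VISIBLE_DEVICES"] = args.devices
    return args.devices

if __name__ == "__main__":
    set_devices()
```

This keeps the default usable on a single-GPU machine while letting multi-GPU users opt in explicitly.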
if __name__ == "__main__":
    test()
\ No newline at end of file
dataset = load_dataset("wikitext", "wikitext-2-raw-v1", split="test")
print("Loading dataset...")
local_file_paths = {
parser = argparse.ArgumentParser()
parser.add_argument("--model-path", type=str, required=True)
parser.add_argument("--port", type=int, default=8000)
parser.add_argument("--endpoint", type=str, default="/completions")
Collaborator
Perplexity evaluation uses the completion endpoint, not chat.
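The reason the completion endpoint is required: perplexity needs per-token log-probabilities of a caller-supplied text, which completion-style APIs can return in a `logprobs` field, while chat endpoints score only model-generated replies. Once those logprobs are in hand, the computation itself is a one-liner (a generic sketch, not this PR's script):

```python
import math

def perplexity(token_logprobs):
    """Perplexity from per-token log-probabilities, e.g. the values
    collected from the `logprobs` field of completion responses:
    ppl = exp(-mean(logprob))."""
    assert token_logprobs, "need at least one token"
    return math.exp(-sum(token_logprobs) / len(token_logprobs))

# A uniform model over 4 tokens assigns logprob ln(1/4) to every token,
# so its perplexity is (up to float rounding) exactly 4.
print(perplexity([math.log(0.25)] * 10))
```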
@@ -0,0 +1,589 @@
#!/bin/bash
std::shared_ptr<Tensor> view_as(const std::vector<size_t> &new_shape) const;
std::shared_ptr<Tensor> view_as(const std::vector<size_t> &new_shape, const std::vector<ptrdiff_t> &new_strides) const;
// template <typename T>
void Tensor::debug() const { this->debug(""); }
// template <typename T>
wooway777 requested changes · Sep 24, 2025

wooway777 (Collaborator) left a comment
Throughout the PR, including many files that carry no review markers, there is a large amount of commented-out code. Please check whether it actually needs to be kept.
If a block is a usage example, label it as such.
One or two lines of a commonly toggled change may be fine, but for large blocks of in-progress debugging or deprecated code, please keep only the version that is actually used.
// __C __export void
// dropKVCache(const struct JiugeModel *,
//             struct KVCache *);
Collaborator
Do we really need this much commented-out code?
const int32_t *block_tables,
const int32_t *slot_mapping,
const float *temperature, const uint32_t *topk, const float *topp,
const uint32_t is_prefill, const bool enable_paged_attn,
Collaborator
Shouldn't is_prefill and enable_paged_attn both be bool?
struct KVCache **kv_caches,
const int32_t *block_tables,
const int32_t *slot_mapping,
const uint32_t is_prefill, const bool enable_paged_attn,
import time
import sys
from random import randint, seed
# from nanovllm import LLM, SamplingParams
attention_bias=True, enable_paged_attn=args.enable_paged_attn, max_kvcache_tokens=max_kvcache_tokens)
sampling_params = SamplingParams(temperature=0.6, max_tokens=128)
# prompts = [
PR merging Paged Attention. All current conflicts have been resolved; merge only after verification.
Highlights: vllm-like Scheduler and Memory Manager; Paged Attention.
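The block-table idea behind the vLLM-style memory manager named above can be sketched as follows (a toy illustration only, not this PR's implementation; all names are hypothetical):

```python
class BlockTable:
    """Toy paged-KV-cache allocator: maps each sequence's logical token
    positions onto fixed-size physical blocks, vLLM-style, so sequences
    grow without pre-reserving a contiguous max-length cache."""

    def __init__(self, num_blocks, block_size):
        self.block_size = block_size
        self.free_blocks = list(range(num_blocks))
        self.tables = {}  # seq_id -> list of physical block ids

    def append_token(self, seq_id, pos):
        """Return the (block_id, offset) slot for token `pos` of `seq_id`,
        allocating a fresh physical block when the last one is full."""
        table = self.tables.setdefault(seq_id, [])
        if pos // self.block_size >= len(table):
            table.append(self.free_blocks.pop())  # claim a free block
        return table[pos // self.block_size], pos % self.block_size

    def free(self, seq_id):
        """Release all blocks of a finished sequence back to the pool."""
        self.free_blocks.extend(self.tables.pop(seq_id, []))

bt = BlockTable(num_blocks=8, block_size=4)
slots = [bt.append_token("seq0", i) for i in range(6)]  # spans 2 blocks
```

The real scheduler additionally tracks reference counts for shared prefixes and preempts sequences when the pool runs dry; this sketch shows only the logical-to-physical slot mapping that `block_tables` and `slot_mapping` in the C interface above encode.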