Open
Conversation

PanZezhong1725 requested changes · Sep 24, 2025
@@ -0,0 +1,158 @@
import os
os.environ["CUDA_VISIBLE_DEVICES"] = "0,1,2,3,4,5,6,7"
Collaborator
The user should set this themselves when invoking the script, or it should be passed in as a script argument.
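A minimal sketch of the suggested change, assuming the script keeps configuring GPUs through `CUDA_VISIBLE_DEVICES` (the `--devices` flag name is hypothetical, not from this PR):

```python
import argparse
import os

def set_devices(argv=None):
    # Hypothetical --devices flag: pass GPU ids in instead of hard-coding them.
    parser = argparse.ArgumentParser()
    parser.add_argument("--devices", type=str, default="0",
                        help="comma-separated GPU ids, e.g. '0,1,2,3'")
    args = parser.parse_args(argv)
    # Must be set before any CUDA-using library (torch, etc.) is imported.
    os.environ["CUDA_VISIBLE_DEVICES"] = args.devices
    return args.devices

if __name__ == "__main__":
    set_devices()
```

This keeps the default usable on a single-GPU machine while letting multi-GPU users opt in explicitly.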
if __name__ == "__main__":
    test()
\ No newline at end of file
dataset = load_dataset("wikitext", "wikitext-2-raw-v1", split="test")
print("Loading dataset...")
local_file_paths = {
parser = argparse.ArgumentParser()
parser.add_argument("--model-path", type=str, required=True)
parser.add_argument("--port", type=int, default=8000)
parser.add_argument("--endpoint", type=str, default="/completions")
Collaborator
Perplexity evaluation uses the completion endpoint, not chat.
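The reason the completion endpoint is required: perplexity needs per-token log-probabilities of a caller-supplied text, which completion-style APIs can return in a `logprobs` field, while chat endpoints score only model-generated replies. Once those logprobs are in hand, the computation itself is a one-liner (a generic sketch, not this PR's script):

```python
import math

def perplexity(token_logprobs):
    """Perplexity from per-token log-probabilities, e.g. the values
    collected from the `logprobs` field of completion responses:
    ppl = exp(-mean(logprob))."""
    assert token_logprobs, "need at least one token"
    return math.exp(-sum(token_logprobs) / len(token_logprobs))

# A uniform model over 4 tokens assigns logprob ln(1/4) to every token,
# so its perplexity is (up to float rounding) exactly 4.
print(perplexity([math.log(0.25)] * 10))
```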
@@ -0,0 +1,589 @@
#!/bin/bash
std::shared_ptr<Tensor> view_as(const std::vector<size_t> &new_shape) const;
std::shared_ptr<Tensor> view_as(const std::vector<size_t> &new_shape, const std::vector<ptrdiff_t> &new_strides) const;
// template <typename T>
void Tensor::debug() const { this->debug(""); }
// template <typename T>
wooway777 requested changes · Sep 24, 2025

wooway777 (Collaborator) left a comment
Throughout the PR, including many files that carry no review markers, there is a large amount of commented-out code. Please check whether it actually needs to be kept.
If a block is a usage example, label it as such.
One or two lines of a commonly toggled change may be fine, but for large blocks of in-progress debugging or deprecated code, please keep only the version that is actually used.
// __C __export void
// dropKVCache(const struct JiugeModel *,
//             struct KVCache *);
Collaborator
Do we really need this much commented-out code?
const int32_t *block_tables,
const int32_t *slot_mapping,
const float *temperature, const uint32_t *topk, const float *topp,
const uint32_t is_prefill, const bool enable_paged_attn,
Collaborator
Shouldn't is_prefill and enable_paged_attn both be bool?
struct KVCache **kv_caches,
const int32_t *block_tables,
const int32_t *slot_mapping,
const uint32_t is_prefill, const bool enable_paged_attn,
import time
import sys
from random import randint, seed
# from nanovllm import LLM, SamplingParams
attention_bias=True, enable_paged_attn=args.enable_paged_attn, max_kvcache_tokens=max_kvcache_tokens)
sampling_params = SamplingParams(temperature=0.6, max_tokens=128)
# prompts = [
PR merging Paged Attention. All current conflicts have been resolved; merge only after verification.
Highlights: vllm-like Scheduler and Memory Manager; Paged Attention.
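The block-table idea behind the vLLM-style memory manager named above can be sketched as follows (a toy illustration only, not this PR's implementation; all names are hypothetical):

```python
class BlockTable:
    """Toy paged-KV-cache allocator: maps each sequence's logical token
    positions onto fixed-size physical blocks, vLLM-style, so sequences
    grow without pre-reserving a contiguous max-length cache."""

    def __init__(self, num_blocks, block_size):
        self.block_size = block_size
        self.free_blocks = list(range(num_blocks))
        self.tables = {}  # seq_id -> list of physical block ids

    def append_token(self, seq_id, pos):
        """Return the (block_id, offset) slot for token `pos` of `seq_id`,
        allocating a fresh physical block when the last one is full."""
        table = self.tables.setdefault(seq_id, [])
        if pos // self.block_size >= len(table):
            table.append(self.free_blocks.pop())  # claim a free block
        return table[pos // self.block_size], pos % self.block_size

    def free(self, seq_id):
        """Release all blocks of a finished sequence back to the pool."""
        self.free_blocks.extend(self.tables.pop(seq_id, []))

bt = BlockTable(num_blocks=8, block_size=4)
slots = [bt.append_token("seq0", i) for i in range(6)]  # spans 2 blocks
```

The real scheduler additionally tracks reference counts for shared prefixes and preempts sequences when the pool runs dry; this sketch shows only the logical-to-physical slot mapping that `block_tables` and `slot_mapping` in the C interface above encode.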