Conversation
feat: v0 tab completion #33

In short, it'll … \cite{ … There are some outstanding issues …

Hi @wjiayis, thanks for the update. I’ve created a new issue to address the token expiration problem. Regarding the latency issue, do we have visibility into which part of the pipeline contributes most to the high latency? For example, a rough breakdown across:
I’ll take a look at this PR later this evening as well.

@Junyi-99 I haven't gotten to the latency breakdown yet, but I've settled everything else, and I'm gonna work on this next. Thanks for helping to review when convenient; I'll update my findings when I have them too!

@wjiayis Got it, thanks for the update. Looking forward to your findings. |

Root Cause
There's a ~20s latency in the inline-suggestion loop, and >99% of that latency comes from waiting for the LLM to start responding. This issue arises because I'm passing in a large (but realistic) bibliography (the bibliography of …).

Solution
I think it's reasonable to expect that a regular user's max latency tolerance is ~2s. I'll implement the following 3 solutions to achieve that.

Model Selection
…
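
To illustrate the kind of breakdown asked about above (and to compare candidate models), here is a minimal sketch that times how long a streamed completion takes to produce its first token versus finishing, using the OpenAI Python SDK. The model name and prompt are placeholders, not this PR's code.

```python
import time

from openai import OpenAI  # assumes the OpenAI Python SDK is installed and OPENAI_API_KEY is set

client = OpenAI()

def measure_latency(prompt: str, model: str = "gpt-4o-mini") -> tuple[float, float]:
    """Return (time_to_first_token, total_time) in seconds for one streamed completion."""
    start = time.perf_counter()
    first_token_at = None

    stream = client.chat.completions.create(
        model=model,  # placeholder model, not necessarily the one this PR settles on
        messages=[{"role": "user", "content": prompt}],
        stream=True,
    )
    for chunk in stream:
        if first_token_at is None and chunk.choices and chunk.choices[0].delta.content:
            first_token_at = time.perf_counter()
    total = time.perf_counter() - start

    return first_token_at - start, total

ttft, total = measure_latency("Suggest a citation key for: ...")
print(f"time to first token: {ttft:.2f}s of {total:.2f}s total ({ttft / total:.0%})")
```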

Prompt Caching
Since the bibliography remains generally constant and takes up the bulk of the prompt, I'll use OpenAI's prompt caching, which is advertised to reduce latency by up to 80% and input token costs by up to 90%.
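
A rough sketch of how the request could be structured to benefit from this: OpenAI's caching matches on prompt prefixes, so the stable bibliography goes first and the per-keystroke context last. The model name, file path, and prompt wording here are placeholders, not this PR's actual code.

```python
from pathlib import Path

from openai import OpenAI  # assumes the OpenAI Python SDK

client = OpenAI()

# Keep the large, stable bibliography at the *front* of the prompt so that
# successive requests share an identical prefix, which is what OpenAI's
# prompt caching keys on (for prompts of roughly 1024+ tokens).
STATIC_SYSTEM_PROMPT = (
    "Suggest completions for LaTeX \\cite{ keys using this bibliography:\n\n"
    + Path("references.bib").read_text()  # placeholder path
)

def suggest(editor_context: str) -> str:
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # placeholder model
        messages=[
            {"role": "system", "content": STATIC_SYSTEM_PROMPT},  # cacheable prefix
            {"role": "user", "content": editor_context},          # changes on every request
        ],
    )
    # The API reports how much of the prompt was served from cache, which makes
    # it easy to confirm that caching is actually kicking in.
    details = response.usage.prompt_tokens_details
    print("cached prompt tokens:", details.cached_tokens if details else 0)
    return response.choices[0].message.content or ""
```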

Prompt Refinement
I'll remove info-sparse fields (e.g. …).
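
For illustration, a small sketch of what stripping such fields from the .bib text could look like. The field list below is only an assumption (the example in the comment above is cut off), and a real implementation would probably use a proper BibTeX parser.

```python
# Fields assumed to add little signal for citation-key completion; the actual
# list used in this PR may differ.
INFO_SPARSE_FIELDS = {"abstract", "file", "keywords", "note", "url"}

def strip_info_sparse_fields(bibtex: str) -> str:
    """Drop selected fields, including multi-line, brace-delimited values."""
    lines = bibtex.splitlines()
    kept, i = [], 0
    while i < len(lines):
        line = lines[i]
        field = line.split("=", 1)[0].strip().lower()
        if "=" in line and field in INFO_SPARSE_FIELDS:
            # Skip continuation lines until this field's braces are balanced.
            depth = line.count("{") - line.count("}")
            while depth > 0 and i + 1 < len(lines):
                i += 1
                depth += lines[i].count("{") - lines[i].count("}")
        else:
            kept.append(line)
        i += 1
    return "\n".join(kept)

# Example: shrink the bibliography before it goes into the prompt.
# slim_bib = strip_info_sparse_fields(open("references.bib").read())
```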

cc: @Junyi-99