
feat: v0 tab completion #110

Draft
wjiayis wants to merge 22 commits into staging from feat/tab-completion

Conversation

@wjiayis
Member

@wjiayis wjiayis commented Feb 4, 2026

#33

In short, it'll:

  1. [Frontend] Recognize that the user is trying to add a citation (the trigger is \cite{); see the sketch after this list
  2. [Frontend] Temporarily suppress the default Overleaf dropdown suggestions
  3. [Frontend] Get the last sentence as context for the LLM
  4. [Backend] Fetch the bibliography from the .bib files stored in the project as raw text
  5. [Backend] Query a lightweight LLM (hardcoded to gpt-5-nano for now) to get at most 3 citation keys
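
A minimal TypeScript sketch of the frontend side (steps 1-3), assuming plain access to the text before the cursor; `requestCitationKeys` is a hypothetical backend call, not the plugin's actual API.

```typescript
// Sketch of steps 1-3. `requestCitationKeys` is a hypothetical backend call.
const CITE_TRIGGER = /\\cite\{[^}]*$/;

function lastSentence(text: string): string {
  // Use everything after the last sentence terminator as LLM context.
  const parts = text.split(/(?<=[.!?])\s+/);
  return parts[parts.length - 1] ?? "";
}

async function maybeSuggestCitations(
  textBeforeCursor: string,
  requestCitationKeys: (context: string) => Promise<string[]>,
): Promise<string[] | null> {
  // Step 1: only act inside an unclosed \cite{...}.
  if (!CITE_TRIGGER.test(textBeforeCursor)) return null;

  // Step 2 (suppressing Overleaf's default dropdown) is editor-specific and omitted here.

  // Step 3: take the last sentence before \cite{ as context.
  const context = lastSentence(textBeforeCursor.replace(CITE_TRIGGER, ""));

  // Steps 4-5 happen on the backend; expect at most 3 citation keys back.
  return requestCitationKeys(context);
}
```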

There are some outstanding issues:

  1. Latency is way too high (severe) -> I'll see what caching / design choices I can make to bring the latency down
  2. When the user is idle for a long time, the token expires and the user is logged out. They will not see citation suggestions until they log back in -> won't fix in this iteration

@wjiayis wjiayis mentioned this pull request Feb 4, 2026
@Junyi-99
Member

Junyi-99 commented Feb 5, 2026

Hi @wjiayis, thanks for the update. I’ve created a new issue to address the token expiration problem.

Regarding the latency issue, do we have visibility into which part of the pipeline contributes most to the high latency? For example, a rough breakdown across:

frontend → backend → LLM provider (reasoning + response)

I’ll take a look at this PR later this evening as well.

@Junyi-99 Junyi-99 linked an issue Feb 5, 2026 that may be closed by this pull request
@wjiayis
Member Author

wjiayis commented Feb 5, 2026

@Junyi-99 I haven't gotten to the latency breakdown yet, but I've settled everything else and I'm gonna work on it next. Thanks for helping to review when convenient; I'll update my findings when I have them too!

@Junyi-99
Member

Junyi-99 commented Feb 5, 2026

@wjiayis Got it, thanks for the update. Looking forward to your findings.

@wjiayis
Member Author

wjiayis commented Feb 5, 2026

Root Cause

There's a ~20s latency in the inline-suggestion loop, and >99% of it comes from waiting for the LLM to start responding. This issue arises because I'm passing in a large (but realistic) bibliography (the bibliography of PaperDebugger: A Plugin-Based Multi-Agent System for In-Editor Academic Writing, Review, and Editing itself), and gpt-5-nano takes a while to parse it.
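
For illustration, a minimal sketch (not the plugin's actual code) of how the LLM share of the latency can be measured with the Node openai SDK's streaming API; the model name and prompt contents are placeholders.

```typescript
import OpenAI from "openai";

const client = new OpenAI();

// Measure time-to-first-token vs. total time for one suggestion request.
// Model name and prompt contents are placeholders.
async function measureLlmLatency(bibliography: string, context: string) {
  const start = performance.now();
  let firstTokenMs: number | null = null;

  const stream = await client.chat.completions.create({
    model: "gpt-5-nano",
    stream: true,
    messages: [
      { role: "system", content: bibliography },
      { role: "user", content: `Suggest at most 3 citation keys for: ${context}` },
    ],
  });

  for await (const chunk of stream) {
    if (firstTokenMs === null && chunk.choices[0]?.delta?.content) {
      firstTokenMs = performance.now() - start; // ~20s here, i.e. the bottleneck
    }
  }

  const totalMs = performance.now() - start;
  console.log(`first token: ${firstTokenMs?.toFixed(0)} ms, total: ${totalMs.toFixed(0)} ms`);
}
```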

Solution

I think it's reasonable to expect that a regular user's max latency tolerance is ~2s. I'll implement the following 3 solutions to achieve that.

Model Selection

gpt-5-nano takes a long time to process the long bibliography. Just swapping it out for gpt-5.2 brings latency down to 2-4s, but gpt-5.2 is expensive to call. I'll improve latency and cost with the next 2 solutions.

Prompt Caching

Since the bibliography remains generally constant and takes up the bulk of the prompt, I'll use OpenAI's prompt caching, which is advertised to reduce latency by up to 80% and input token costs by up to 90%.

  1. Place the bibliography at the start of the prompt (prompt caching uses exact prefix matching)
  2. Run a "no-reply" LLM query at the start of each session and whenever the database reloads, and configure it to cache for 24h
  3. Each time \cite{ is triggered, the cached bibliography is reused -> lower latency (see the sketch after this list)
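
A minimal sketch of points 1 and 2, assuming the Node openai SDK: the bibliography sits at the start of the prompt as a stable prefix, and a cheap warm-up call primes OpenAI's prefix-based prompt cache. The 24h retention configuration mentioned in point 2 is not shown, and the model name is a placeholder.

```typescript
import OpenAI from "openai";

const client = new OpenAI();
const MODEL = "gpt-5.2"; // placeholder

// Point 1: keep the bibliography as a stable prompt prefix so every request
// shares the same cacheable prefix.
function buildMessages(bibliography: string, context: string) {
  return [
    { role: "system" as const, content: `Bibliography (BibTeX):\n${bibliography}` },
    { role: "user" as const, content: `Suggest at most 3 citation keys for: ${context}` },
  ];
}

// Point 2: a "no-reply" warm-up at session start / bibliography reload,
// so the prefix is already cached before the user first types \cite{.
async function warmUpCache(bibliography: string): Promise<void> {
  await client.chat.completions.create({
    model: MODEL,
    messages: buildMessages(bibliography, "Warm-up only. Reply with OK."),
  });
}

// Point 3: the real suggestion call reuses the identical prefix.
async function suggestCitationKeys(bibliography: string, context: string): Promise<string> {
  const response = await client.chat.completions.create({
    model: MODEL,
    messages: buildMessages(bibliography, context),
  });
  return response.choices[0]?.message?.content ?? "";
}
```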

Prompt Refinement

I'll remove info-sparse fields (e.g. doi, url, pages) and retain only info-rich fields (e.g. title, booktitle), to reduce the total size of the bibliography by (hopefully) at least 40%.
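
A small sketch of this pruning; the field list is an assumption, and a regex pass is approximate (values with commas or nested braces would need a real BibTeX parser).

```typescript
// Strip info-sparse BibTeX fields before sending the bibliography to the LLM.
// The field list is an assumption; a regex pass is approximate.
const SPARSE_FIELDS = ["doi", "url", "pages", "isbn", "issn", "month", "note"];

function pruneBibliography(bib: string): string {
  const pattern = new RegExp(
    `^\\s*(${SPARSE_FIELDS.join("|")})\\s*=\\s*[^\\n]*$`,
    "gim",
  );
  return bib.replace(pattern, "");
}
```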

cc: @Junyi-99

@kah-seng kah-seng mentioned this pull request Feb 5, 2026

Development

Successfully merging this pull request may close these issues.

[Feature Request] Tab-Completion
