script to continuously evaluate elser #2670
Draft
davidkyle wants to merge 1 commit into elastic:main from
Conversation
```python
parser.add_argument('--output_file', default='out.json')
parser.add_argument('--log_file', default='log.txt')
parser.add_argument('--num_threads_per_allocation', type=int, help='The number of inference threads used by LibTorch. Defaults to 1.')
parser.add_argument('--num_allocations', type=int, help='The number of allocations for parallel forwarding. Defaults to 1.')
```
We should probably include setting the cache size here. Does it default to 0 if --cacheMemorylimitBytes isn't passed?
> Does it default to 0 if --cacheMemorylimitBytes isn't passed?
Yes, the default cache value is 0:

```
--cacheMemorylimitBytes arg    Optional memory in bytes that the
                               inference cache can use - default is 0
                               which disables caching
```
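A minimal sketch of how the cache size could be exposed in the script's parser (the flag name `--cache_memory_limit_bytes` is hypothetical, chosen to mirror the process option `--cacheMemorylimitBytes`):

```python
# Hypothetical argument; the script would forward this value to
# pytorch_inference as --cacheMemorylimitBytes. The default of 0
# matches the process default and disables caching.
parser.add_argument('--cache_memory_limit_bytes', type=int, default=0,
                    help='Memory in bytes the inference cache can use. '
                         'Defaults to 0, which disables caching.')
```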
First download the elser model locally. Either
The script runs pytorch_inference, loads the model, then continuously runs inference on it. Logging goes to stdout, and the model output is written to a JSON file. Every 100 requests the script asks pytorch_inference how much memory it is using and writes that to the same JSON file.
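In outline, the loop looks roughly like this (a minimal sketch; `run_inference` and `get_memory_usage` are hypothetical stand-ins for the script's actual communication with the pytorch_inference process):

```python
import json
import time

def evaluate(run_inference, get_memory_usage, output_file, num_requests=10_000):
    """Sketch of the evaluation loop: continuously run inference and
    record the process's memory usage every 100 requests.

    `run_inference` and `get_memory_usage` are hypothetical callables,
    not part of this PR's actual code.
    """
    with open(output_file, 'a') as out:
        for i in range(num_requests):
            result = run_inference(i)                # one inference request
            out.write(json.dumps(result) + '\n')     # model output -> JSON file
            if i % 100 == 0:                         # every 100 requests ...
                mem = get_memory_usage()             # ... query process memory
                out.write(json.dumps({'mem': mem, 'time': time.time()}) + '\n')
```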
`grep mem out.json` will show that data. Run with `--num_threads_per_allocation` and `--num_allocations`; these are the parameters to tweak. Increasing either will make inference faster, and changes in memory should be seen sooner.
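As an alternative to grepping, a short sketch for summarising the memory samples (this assumes one JSON object per line in `out.json`, with the memory records carrying a `mem` field; the exact schema is an assumption):

```python
import json

# Collect the periodic memory samples written to the output file.
samples = []
with open('out.json') as f:
    for line in f:
        record = json.loads(line)
        if 'mem' in record:
            samples.append(record['mem'])

if samples:
    print(f'{len(samples)} samples, first {samples[0]}, peak {max(samples)}')
```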