AVX512 + Neon impl #520
base: main
@MarcinMordecki I've been meaning to finally integrate this. Is this branch fully up to date? Relatedly, where can I find your ARM integration?
Branch updated from d803723 to 331c070.
I've updated the PR; please see the new contents.
Working on this now. This should close both #269 and #115. For now I have rebased onto main and fixed the errors, but I still need to go through all the code and make sure we have sufficient test coverage. The good news is that the AVX512 intrinsics were stabilised since this work started, so we no longer need to rely on the nightly toolchain 🎉
* chore: enable and fix a number of lints

  Went through the list of optional clippy lints added since the project started and enabled the ones I considered worthwhile. This resulted in two major changes:
  - all #[allow] occurrences now have a reason field;
  - all assertions in non-test code have a custom message.

  This also bumps the MSRV to 1.89 in preparation for #520.
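The two conventions from this commit can be illustrated with a small sketch. The module `lanes` and the helper `lane_index` are hypothetical, not code from this PR. `clippy::allow_attributes_without_reason` and `clippy::missing_assert_message` are Clippy lints that enforce these two rules; whether the project enabled exactly these lints is an assumption. The `reason = "..."` field on lint attributes has been stable since Rust 1.81.

```rust
// Hypothetical module illustrating the two conventions. In a real crate the
// lint levels would more likely live in `[lints.clippy]` in Cargo.toml.
#[warn(clippy::allow_attributes_without_reason, clippy::missing_assert_message)]
mod lanes {
    /// Converts a bit index in a 64-bit mask to a lane index.
    // Every `allow` explains itself via `reason`.
    #[allow(
        clippy::cast_possible_truncation,
        reason = "the assertion below guarantees bit < 64, which fits in a u8"
    )]
    pub fn lane_index(bit: u32) -> u8 {
        // Non-test assertions carry a message describing the violated invariant.
        assert!(bit < 64, "bit index {bit} is out of range for a 64-bit mask");
        bit as u8
    }
}
```

With these lints at `warn`, `cargo clippy` would flag a bare `#[allow(...)]` or an `assert!` without a message inside the module.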
Additionally, the AVX512 features are now stable as of Rust 1.89, so I bumped the MSRV and switched back to a stable toolchain.
    pub(crate) phantom: PhantomData<&'a ()>,
}

// TODO FIXME: consider rewriting training and count_zeros etc. functions.
@MarcinMordecki What exactly does this item mean?
When I was writing the code, I was worried that the count_trailing function ("training" in the TODO is a typo) and the count_zeros functions (popcnt etc.) might not be well optimized on NEON. When I measured overall performance, I saw a speedup over the non-SIMD version, so I deemed this concern non-critical.
I don't remember the details right now (which exact instructions are emitted), but it might be worth double-checking whether there is room for improvement.
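One way to investigate the `count_trailing`/`count_zeros` concern starts from a difference in how the two instruction sets report comparison results. AVX512 compares return a bitmask directly. NEON has no movemask instruction, so a common workaround narrows the 16-byte comparison result into a 64-bit value with one nibble per byte (`vshrn_n_u16::<4>`). Trailing-zero and population counts on that value then have to be divided by 4. The sketch below is an assumption about how such helpers could look, not the code in this PR. `nibble_mask`, `count_matches` and `first_match` are hypothetical names, and a scalar fallback lets the sketch build on non-ARM targets.

```rust
/// Portable model of the NEON nibble mask: byte `i` of the 16-byte block
/// contributes 0xF at bits 4*i..4*i+3 when it equals `needle`.
fn nibble_mask_scalar(bytes: &[u8; 16], needle: u8) -> u64 {
    let mut mask = 0u64;
    for (i, &b) in bytes.iter().enumerate() {
        if b == needle {
            mask |= 0xF_u64 << (4 * i);
        }
    }
    mask
}

/// NEON version: compare, then shift each 16-bit lane right by 4 and narrow,
/// which packs two 0x00/0xFF bytes into one byte holding two nibbles.
#[cfg(target_arch = "aarch64")]
fn nibble_mask(bytes: &[u8; 16], needle: u8) -> u64 {
    use std::arch::aarch64::*;
    // SAFETY: NEON is part of the aarch64 baseline, and `bytes` is exactly
    // 16 readable bytes for the load.
    unsafe {
        let v = vld1q_u8(bytes.as_ptr());
        let eq = vceqq_u8(v, vdupq_n_u8(needle)); // 0xFF where equal, else 0x00
        let narrowed = vshrn_n_u16::<4>(vreinterpretq_u16_u8(eq)); // 16 bytes -> 8 bytes
        vget_lane_u64::<0>(vreinterpret_u64_u8(narrowed))
    }
}

/// Fallback so the sketch also builds and runs on non-ARM targets.
#[cfg(not(target_arch = "aarch64"))]
fn nibble_mask(bytes: &[u8; 16], needle: u8) -> u64 {
    nibble_mask_scalar(bytes, needle)
}

/// Number of matching bytes: each match sets exactly 4 bits.
fn count_matches(mask: u64) -> u32 {
    mask.count_ones() / 4
}

/// Index of the first matching byte: each byte occupies 4 bits of the mask.
fn first_match(mask: u64) -> Option<u32> {
    (mask != 0).then(|| mask.trailing_zeros() / 4)
}
```

On aarch64 cores without newer extensions such as FEAT_CSSC, `u64::trailing_zeros` typically compiles to `rbit` + `clz`, and `count_ones` to a `cnt`/`addv` sequence that goes through a vector register. Inspecting the generated assembly for these helpers is a reasonable way to confirm whether they are a bottleneck.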
Experimental AVX512 and Neon usage.
It's in quite a dirty state: as far as I remember, I simply forced the Neon path with a crude override for my experiments. Nevertheless, the functions should work, more or less.