@MarcinMordecki MarcinMordecki commented Jun 28, 2024

Experimental AVX512 and Neon usage.

It's in quite a dirty state. As far as I remember, I just did a brutal override to run Neon for my experiments. Nevertheless, the functions should work, more or less.

V0ldek (Member) commented Dec 15, 2025

@MarcinMordecki I wanted to look into finally integrating this lately. Is this a fully updated branch? Corollary: where can I find your ARM integration?

@MarcinMordecki MarcinMordecki force-pushed the main branch 3 times, most recently from d803723 to 331c070 Compare December 18, 2025 18:01
@MarcinMordecki MarcinMordecki changed the title from "[DRAFT] AVX512 implementation for classification/structural" to "AVX512 + Neon impl" Dec 18, 2025
@MarcinMordecki (Author)

> @MarcinMordecki I wanted to look into finally integrating this lately. Is this a fully updated branch? Corollary: where can I find your ARM integration?

I've updated the PR, please see the new contents.

@MarcinMordecki MarcinMordecki marked this pull request as ready for review December 18, 2025 18:08
V0ldek (Member) commented Feb 10, 2026

Working on this now. This should close both #269 and #115. For now I rebased onto main and fixed errors, but I will need to look through all the code and make sure we have sufficient test coverage.

Good news is that since the start of this work the AVX512 intrinsics got stabilised, so we no longer need to rely on the nightly toolchain 🎉

V0ldek added a commit that referenced this pull request Feb 11, 2026
* chore: enable and fix a number of lints

Went through the list of optional clippy lints added since the
project started and enabled some that I considered worthwhile.

This resulted in two major changes:
- all #[allow] occurrences now have a reason field;
- all assertions in non-test code have a custom message.

This additionally bumps the MSRV to 1.89 in preparation for #520
Additionally, the AVX512 feature is now stable on Rust 1.89.
Bumped the MSRV and reverted to a stable toolchain.
pub(crate) phantom: PhantomData<&'a ()>,
}

// TODO FIXME: consider rewriting training and count_zeros etc. functions.
Member

@MarcinMordecki What exactly does this item mean?

Author

When I was writing the code, I was afraid that the count_trailing (there's a typo in the TODO) and count_zeros functions (popcnt etc.) might not be well optimized on NEON. When testing performance as a whole, I saw a speedup over the no-SIMD version, so I deemed this concern not critical.

I don't remember the details right now (which exact instructions are used), but it might be worth double-checking if there's some room for improvement.
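
One way to double-check is to isolate the two bit operations in a small function and inspect the generated code for an AArch64 target. The sketch below is an assumption about the typical usage pattern, not code from this PR, and `set_bit_positions` is a hypothetical name. It uses the portable `u64::count_ones` and `u64::trailing_zeros` methods. On AArch64, compilers commonly lower the latter to `rbit` followed by `clz`, and the former to a NEON `cnt` plus a horizontal add, but that should be confirmed in the disassembly rather than taken for granted.

```rust
/// Returns the indices of the set bits in `mask`, lowest first.
/// This mirrors how a comparison mask is usually walked after a SIMD step.
fn set_bit_positions(mut mask: u64) -> Vec<u32> {
    // Population count gives the exact number of positions up front.
    let mut positions = Vec::with_capacity(mask.count_ones() as usize);
    while mask != 0 {
        // Index of the lowest set bit.
        positions.push(mask.trailing_zeros());
        // Clear the lowest set bit; `mask` is non-zero, so no underflow.
        mask &= mask - 1;
    }
    positions
}
```

Building this with `--target aarch64-unknown-linux-gnu` and reading the assembly for the loop shows which instructions are actually emitted for both operations.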

Development

Successfully merging this pull request may close these issues.

Support for AVX512.
Add support for NEON (128-bit wide SIMD for ARM) for 64-bit architectures