feat: Add global fast_math flag to CudaBuilder#331
feat: Add global fast_math flag to CudaBuilder#331koreaygj wants to merge 1 commit intoRust-GPU:mainfrom
Conversation
- Add fast_math field to CudaBuilder struct - Enables ftz, fast_sqrt, fast_div and fma_contraction(fmad) internally - Provides convenient parity with NVCC's --use_fast_math
nnethercote
left a comment
There was a problem hiding this comment.
The implementation is confused. How do the individual flags interact with the new flag? What happens if you specify fast_math and then also fast_sqrt(false)? Seems like either:
- CudaBuilder should handle all the fast_math stuff, and pass individual flags to NVCC, but not
--use_fast_math - CudaBuilder should just record fast_math and then pass
--use_fast_maththrough to NVCC
But currently it's doing a mixture of both.
Also, was this PR generated by AI? The description is very long, with subheadings and bullet points that are typical for AI.
| self.ftz = true; | ||
| self.fast_sqrt = true; | ||
| self.fast_div = true; | ||
| self.fma_contraction = true; |
There was a problem hiding this comment.
This is a redundant representation. You're storing fast_math and also the individual flags. It should do one or the other.
There was a problem hiding this comment.
Also, nothing happens if fast_math is false. Is that intended?
|
|
||
| if builder.fast_math { | ||
| llvm_args.push("--use_fast_math".to_string()); | ||
| } |
There was a problem hiding this comment.
Should this be plumbed through to NVCC, or should it be handled by CudaBuilder which then specifies the individual flags?
|
Thank you for your review. Do you think delegating to NVCC would be the better approach? I need your advices. |
|
How does this jive with https://simonbyrne.github.io/notes/fastmath/ ? |
Hi
I try to implemente a
fast_math()method to solve this issue. It provides a convenient way to enable fast math approximations globally, equivalent to NVCC's--use_fast_mathoption.Description
Implement a global
fast_mathflag for CudaBuilder as a convenience method equivalent to NVCC's--use_fast_mathoption.According to the NVCC official documentation, the
--use_fast_mathflag enables fast approximations for floating-point operations by internally settingftz,prec-sqrt, andprec-div.Implementation
I implemented a
fast_math()method that sets these three options internally:Usage
Users can now enable fast math globally with a single method call:
Instead of manually setting each flag:
Implementation Details
fast_math()method is a convenience wrapper that internally enablesftz,fast_sqrt, andfast_divinvoke_rustc()false)Relates to #262