

@niklebedenko
Contributor

CUDA C++ has #pragma unroll, which lets you force a loop to be unrolled. Rust has no equivalent, but LLVM decides whether to unroll a loop based on heuristics. Setting the LLVM option -unroll-threshold to a large number makes LLVM unroll loops more aggressively.

This gave my Rust code a 10x speed improvement for some kernels: once the loops are unrolled, array indices become compile-time constants, which removes the need for local memory, stack frames, and function calls.

N.B. there is the unroll crate, but it only supports unrolling loops with integer bounds. The LLVM approach also allows unrolling loops over iterators.
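
To illustrate the kind of loop this is about, here is a minimal sketch. The function name `weighted_sum` and the array size 8 are hypothetical and not taken from this PR. Both loops run over iterators rather than integer ranges, and the scratch array can only be kept out of local memory if every access to it ends up with a constant index, which is what full unrolling provides. Whether LLVM actually unrolls a given loop still depends on its heuristics and the threshold in effect.

```rust
/// Hypothetical device-style helper: multiplies two small fixed-size arrays
/// element-wise into a scratch buffer, then sums the buffer.
pub fn weighted_sum(input: &[f32; 8], weights: &[f32; 8]) -> f32 {
    // Small per-thread scratch array. If the loops below are fully unrolled,
    // every access has a constant index, so the optimizer can keep the
    // elements in registers instead of spilling the array to local memory.
    let mut scratch = [0.0f32; 8];

    // Loop over zipped iterators, not an integer range.
    for (slot, (x, w)) in scratch.iter_mut().zip(input.iter().zip(weights.iter())) {
        *slot = x * w;
    }

    // Second iterator loop that reads the scratch array back.
    let mut total = 0.0f32;
    for value in scratch.iter() {
        total += *value;
    }
    total
}
```

With plain rustc, LLVM options can be forwarded with `-C llvm-args`, for example `-C llvm-args=-unroll-threshold=<N>`; how the option is passed through in this project is not shown in this conversation.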

Contributor

@LegNeato LegNeato left a comment


Looks great, thanks! But I think there is one unrelated change?


[dependencies]
-glam = { version = ">=0.22", default-features = false, features = ["libm", "cuda", "bytemuck"] }
+glam = { version = ">=0.27", default-features = false, features = ["libm", "cuda", "bytemuck"] }
Contributor


Why was this bumped?

