My name is Arseny Kapoulkine and this is my blog where I write about computer graphics, optimization, programming languages and related topics. I’m the author of pugixml, meshoptimizer, volk and other projects.

31 May 2024 X is justifiably slow

I regularly hear or read statements like this: “X is slow but this is to be expected because it needs to do a lot of work”. It can be said about an application or a component in a larger system, and can refer to resources other than time. I often find these statements profoundly unhelpful, as they depend much more on the speaker’s intuition and understanding of the problem than on X itself.

20 April 2024 target_clones is a trap

In Luau, the modulo operator a % b is defined as a - floor(a / b) * b, a definition inherited from Lua 5.1. While it has some numeric issues, such as its behavior for b = inf, it’s decently fast to compute, so we have not explored alternatives yet.

That is, it would be decently fast to compute if floor was fast.
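As a rough illustration of the definition above, here is a minimal C++ sketch of the formula. The function name `luau_mod` is hypothetical and this is not Luau's actual implementation; it only transcribes a - floor(a / b) * b and assumes default IEEE floating-point semantics (no fast-math).

```cpp
#include <cmath>

// Hypothetical helper: a direct transcription of a - floor(a / b) * b.
// Not taken from the Luau source; for illustration only.
double luau_mod(double a, double b) {
    return a - std::floor(a / b) * b;
}

// For comparison: std::fmod truncates the quotient toward zero, so its
// result takes the sign of the dividend rather than the divisor.
double c_mod(double a, double b) {
    return std::fmod(a, b);
}
```

With this sketch, `luau_mod(-5.0, 3.0)` gives 1 while `c_mod(-5.0, 3.0)` gives -2. The b = inf issue also shows up: for a finite nonzero a, a / b is zero, and multiplying that zero by infinity produces NaN.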

09 April 2024 Meshlet triangle locality matters

When working with mesh shaders, the geometry needs to be split into meshlets: small geometry chunks where each meshlet has a set of vertices and triangle indices that refer to the vertices inside that meshlet. The mesh shader then has to transform all vertices and emit all transformed vertices and triangles through the shader API to the rasterizer. When viewed through the lens of a traditional vertex reuse cache, mesh shaders seemingly make the reuse explicit, so you would think that vertex/triangle locality within one meshlet doesn’t matter.

You would be wrong.
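To make the meshlet layout described above concrete, here is a minimal C++ sketch. The `Meshlet` struct, the `buildMeshlets` function, and the 64-vertex / 126-triangle limits are all assumptions for illustration; this is a naive greedy split in index buffer order, not meshoptimizer's algorithm or API.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical layout: each meshlet owns a list of global vertex indices
// plus triangles stored as 8-bit indices local to that list.
struct Meshlet {
    std::vector<uint32_t> vertices;  // global vertex indices used by this meshlet
    std::vector<uint8_t> triangles;  // 3 local indices per triangle
};

static bool contains(const Meshlet& m, uint32_t v) {
    for (uint32_t u : m.vertices)
        if (u == v) return true;
    return false;
}

// Returns the local index of global vertex v, appending it if missing.
static uint8_t localIndex(Meshlet& m, uint32_t v) {
    for (size_t i = 0; i < m.vertices.size(); ++i)
        if (m.vertices[i] == v) return static_cast<uint8_t>(i);
    m.vertices.push_back(v);
    return static_cast<uint8_t>(m.vertices.size() - 1);
}

// Naive greedy split: walk triangles in order and start a new meshlet when
// the current one would exceed either limit. The limits are illustrative
// and maxVertices must not exceed 256 for the 8-bit local indices to fit.
std::vector<Meshlet> buildMeshlets(const std::vector<uint32_t>& indices,
                                   size_t maxVertices = 64,
                                   size_t maxTriangles = 126) {
    std::vector<Meshlet> result;
    if (indices.size() < 3) return result;
    result.emplace_back();
    for (size_t i = 0; i + 2 < indices.size(); i += 3) {
        Meshlet* m = &result.back();
        size_t added = 0;  // conservative count of vertices this triangle adds
        for (size_t k = 0; k < 3; ++k)
            if (!contains(*m, indices[i + k])) ++added;
        if (m->vertices.size() + added > maxVertices ||
            m->triangles.size() / 3 >= maxTriangles) {
            result.emplace_back();
            m = &result.back();
        }
        for (size_t k = 0; k < 3; ++k)
            m->triangles.push_back(localIndex(*m, indices[i + k]));
    }
    return result;
}
```

This sketch leaves the order of triangles inside each meshlet exactly as it was in the input, which is the property the post's title is about.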

23 March 2024 Condvars and atomics do not mix

When using std::condition_variable, there’s an easy-to-remember rule: all variables accessed in the wait predicate must be changed under a mutex. However, this rule is easy to violate accidentally by throwing atomics into the mix.
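A minimal sketch of the rule, assuming a simple one-shot event; the `Event` type and `run_once` helper are hypothetical names used only for illustration. The flag read by the wait predicate is a plain `bool` that is only ever written while holding the mutex.

```cpp
#include <condition_variable>
#include <mutex>
#include <thread>

// Hypothetical one-shot event that follows the rule: `ready` is read by the
// wait predicate, so it is only modified while holding `mutex`.
struct Event {
    std::mutex mutex;
    std::condition_variable cv;
    bool ready = false;

    void signal() {
        {
            std::lock_guard<std::mutex> lock(mutex);
            ready = true;  // changed under the mutex
        }
        cv.notify_one();
    }

    void wait() {
        std::unique_lock<std::mutex> lock(mutex);
        cv.wait(lock, [this] { return ready; });
    }
};

// The tempting variant that breaks the rule: make `ready` a
// std::atomic<bool> and have signal() store to it and notify without taking
// the mutex. The store and notify can then land after the waiter has
// evaluated the predicate but before it blocks, and the wakeup is lost.

// Runs one waiter and one signaler; returns true once the waiter woke up.
bool run_once() {
    Event e;
    bool observed = false;
    std::thread waiter([&] { e.wait(); observed = true; });
    std::thread signaler([&] { e.signal(); });
    signaler.join();
    waiter.join();
    return observed;
}
```

Notifying after releasing the lock is a choice made in this sketch; notifying while still holding the lock is also correct.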

15 March 2024 LLM inference speed of light

In the process of working on calm, a minimal from-scratch fast CUDA implementation of transformer-based language model inference, a critical consideration was establishing the speed of light for the inference process, and measuring the progress relative to that speed of light. In this post we’ll cover this theoretical limit and its implications.