My name is Arseny Kapoulkine and this is my blog where I write about computer graphics, optimization, programming languages and related topics. I’m the author of pugixml, meshoptimizer, volk and other projects.
A data structure that comes up fairly often when working with graphs or graph-like structures is a jagged array, or array-of-arrays. It’s easy to build one out of standard containers, but that’s often a poor choice for performance; in this post we’ll talk about a simple representation and construction scheme that I’ve found useful across multiple projects and domains.
Crucially, we will focus on immutable structures - ones that you can build in one go from source data and then query continuously without ever changing them. This seems like a major constraint, but for many problems building the structure once is sufficient, and it makes significantly simpler and more efficient implementations possible.
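To make the idea concrete, here is a minimal sketch of one common representation - a flat item buffer plus an offsets array, built in two counting passes. The names and the unsigned int payload are just for illustration, not the code from the post:

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// Immutable jagged array: items of bucket b live in [offsets[b], offsets[b+1]).
struct JaggedArray
{
    std::vector<unsigned int> offsets; // size = bucket_count + 1
    std::vector<unsigned int> items;   // size = total item count

    const unsigned int* begin(size_t b) const { return items.data() + offsets[b]; }
    const unsigned int* end(size_t b) const { return items.data() + offsets[b + 1]; }
};

// Build from (bucket, item) pairs, e.g. edges of a graph keyed by source vertex.
JaggedArray buildJagged(size_t bucket_count, const std::vector<std::pair<unsigned int, unsigned int>>& pairs)
{
    JaggedArray result;
    result.offsets.assign(bucket_count + 1, 0);

    // Pass 1: count items per bucket (shifted by one so the prefix sum yields start offsets)
    for (const auto& p : pairs)
        result.offsets[p.first + 1]++;

    // Prefix sum turns counts into start offsets
    for (size_t i = 1; i <= bucket_count; ++i)
        result.offsets[i] += result.offsets[i - 1];

    // Pass 2: scatter items into place, advancing a per-bucket cursor
    result.items.resize(pairs.size());
    std::vector<unsigned int> cursor(result.offsets.begin(), result.offsets.end() - 1);
    for (const auto& p : pairs)
        result.items[cursor[p.first]++] = p.second;

    return result;
}
```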
Backface culling is something we take for granted when rendering triangle meshes on the GPU. In general, an average mesh is expected to have about 50% of its triangles facing away from the camera. Unless you forget to set appropriate render states in your favorite graphics API, the hardware will reject these triangles as early in the rasterization pipeline as possible. Thus, it would seem that backface culling is a solved problem. In this post, however, we’ll explore a few alternative strategies that may or may not improve rendering performance.
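For reference, here is what that render state looks like in Vulkan - a generic sketch rather than anything specific to this post, with names taken from the standard Vulkan headers:

```cpp
#include <vulkan/vulkan.h>

// Rasterization state with backface culling enabled, to be plugged into
// VkGraphicsPipelineCreateInfo::pRasterizationState.
VkPipelineRasterizationStateCreateInfo makeRasterState()
{
    VkPipelineRasterizationStateCreateInfo raster = {};
    raster.sType = VK_STRUCTURE_TYPE_PIPELINE_RASTERIZATION_STATE_CREATE_INFO;
    raster.polygonMode = VK_POLYGON_MODE_FILL;
    raster.cullMode = VK_CULL_MODE_BACK_BIT;            // reject triangles facing away from the camera
    raster.frontFace = VK_FRONT_FACE_COUNTER_CLOCKWISE; // must match the winding of the mesh data
    raster.lineWidth = 1.0f;
    return raster;
}
```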
When working with mesh shaders to draw meshes, you need to split your source geometry into individual units called meshlets. Each meshlet is processed by one mesh shader workgroup, and when compiling the mesh shader you need to specify the maximum number of triangles and vertices that a meshlet can contain.
These numbers are subject to hardware limits. On current drivers, AMD, Intel and NVidia expose limits of 256 triangles and 256 vertices through the EXT_mesh_shader Vulkan extension, but NVidia advertises a higher limit of 512 triangles & 256 vertices through NV_mesh_shader. These limits bound the values you’d pass when building your meshlets, e.g. when using meshoptimizer’s meshopt_buildMeshlets function - but what numbers should you actually use?
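As a concrete starting point, here is a sketch of building meshlets, assuming the meshopt_buildMeshlets signature from recent meshoptimizer releases; the 64 vertex / 96 triangle limits below are placeholder values - picking good ones is exactly what the rest of the post is about:

```cpp
#include <meshoptimizer.h>
#include <vector>

void buildMeshlets(const std::vector<unsigned int>& indices, const std::vector<float>& positions /* xyz per vertex */)
{
    const size_t max_vertices = 64;  // placeholder; must not exceed the extension limits
    const size_t max_triangles = 96; // placeholder; must be divisible by 4

    size_t max_meshlets = meshopt_buildMeshletsBound(indices.size(), max_vertices, max_triangles);

    std::vector<meshopt_Meshlet> meshlets(max_meshlets);
    std::vector<unsigned int> meshlet_vertices(max_meshlets * max_vertices);
    std::vector<unsigned char> meshlet_triangles(max_meshlets * max_triangles * 3);

    size_t meshlet_count = meshopt_buildMeshlets(
        meshlets.data(), meshlet_vertices.data(), meshlet_triangles.data(),
        indices.data(), indices.size(),
        positions.data(), positions.size() / 3, sizeof(float) * 3,
        max_vertices, max_triangles, /* cone_weight= */ 0.0f);

    meshlets.resize(meshlet_count);
    // Each meshlet references meshlet_vertices[vertex_offset .. vertex_offset + vertex_count)
    // and meshlet_triangles[triangle_offset .. triangle_offset + triangle_count * 3).
}
```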
When working with various forms of culling, it can be useful to project the object bounds to screen space. This is necessary to implement occlusion culling against a depth pyramid, and to reject objects or clusters that don’t contribute to any pixels. The same operation can also be used for level of detail selection, although there it’s typically faster to approximate the projected area on screen - here we’re interested in efficient conservative projected bounds. “Conservative” means that the resulting bounds must contain the original object. “Efficient” means that we’ll need to restrict ourselves to projecting 3D bounds that are known to contain the object - naturally, two common choices are a sphere and a box.
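As a baseline, here is a minimal sketch (not the method developed in the post) that projects the 8 corners of a box and takes the min/max in NDC; it assumes the box is entirely in front of the near plane, which is one of the problems a robust version has to deal with:

```cpp
#include <algorithm>
#include <cfloat>

struct vec4 { float x, y, z, w; };
struct mat4 { float m[4][4]; }; // row-major, transforms column vectors

vec4 mul(const mat4& m, const vec4& v)
{
    vec4 r;
    r.x = m.m[0][0]*v.x + m.m[0][1]*v.y + m.m[0][2]*v.z + m.m[0][3]*v.w;
    r.y = m.m[1][0]*v.x + m.m[1][1]*v.y + m.m[1][2]*v.z + m.m[1][3]*v.w;
    r.z = m.m[2][0]*v.x + m.m[2][1]*v.y + m.m[2][2]*v.z + m.m[2][3]*v.w;
    r.w = m.m[3][0]*v.x + m.m[3][1]*v.y + m.m[3][2]*v.z + m.m[3][3]*v.w;
    return r;
}

// Writes NDC bounds {minx, miny, maxx, maxy}; caller remaps to pixels/UVs as needed.
// Note: NDC conventions (y direction, depth range) depend on the graphics API.
bool projectBoxBounds(const float bmin[3], const float bmax[3], const mat4& viewproj, float out[4])
{
    float minx = FLT_MAX, miny = FLT_MAX, maxx = -FLT_MAX, maxy = -FLT_MAX;

    for (int i = 0; i < 8; ++i)
    {
        vec4 corner = { (i & 1) ? bmax[0] : bmin[0], (i & 2) ? bmax[1] : bmin[1], (i & 4) ? bmax[2] : bmin[2], 1.f };
        vec4 clip = mul(viewproj, corner);

        if (clip.w <= 0.f)
            return false; // corner behind the camera; the result would not be conservative

        minx = std::min(minx, clip.x / clip.w);
        miny = std::min(miny, clip.y / clip.w);
        maxx = std::max(maxx, clip.x / clip.w);
        maxy = std::max(maxy, clip.y / clip.w);
    }

    out[0] = minx; out[1] = miny; out[2] = maxx; out[3] = maxy;
    return true;
}
```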
When working on the vertex compressor for meshoptimizer in 2018, one of the goals was to produce a bitstream that can be decompressed using (short) SIMD vector operations. This led to a host of design decisions in terms of how the data is structured, and to some challenges when mapping the decoder flow onto various SIMD architectures. The most significant issue has to do with implementing an operation that often doesn’t have a straightforward implementation: byte expansion.
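To illustrate one flavor of the operation - a generic sketch, not the decoder’s actual code path - here is an expansion of 16 packed 4-bit values into 16 bytes using SSE2 intrinsics; the nibble ordering is an assumption made for the example:

```cpp
#include <emmintrin.h>

// Expand 8 input bytes (16 packed nibbles) into 16 output bytes, one per nibble.
// Output ordering assumed here: low nibble first, then high nibble, per input byte.
void expandNibblesToBytes(const unsigned char* src, unsigned char* dst)
{
    __m128i data = _mm_loadl_epi64(reinterpret_cast<const __m128i*>(src)); // 8 bytes = 16 nibbles

    __m128i mask = _mm_set1_epi8(0x0F);
    __m128i lo = _mm_and_si128(data, mask);                    // low nibble of each byte
    __m128i hi = _mm_and_si128(_mm_srli_epi16(data, 4), mask); // high nibble of each byte

    // Interleave so each input byte produces two consecutive output bytes
    __m128i expanded = _mm_unpacklo_epi8(lo, hi);

    _mm_storeu_si128(reinterpret_cast<__m128i*>(dst), expanded);
}
```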