My name is Arseny Kapoulkine and this is my blog where I write about computer graphics, optimization, programming languages and related topics. I’m the author of pugixml, meshoptimizer and other projects.

11 March 2019 Small, fast, web

When implementing vertex/index decoders in meshoptimizer, the main focus was on lean implementation and decompression performance.

When your streaming source is capable of delivering hundreds of megabytes per second, as is the case with SSD drives, and you want to accelerate loading by compressing the data further, you need to be decompressing at multiple hundreds of megabytes per second, ideally a gigabyte, to make sure a small number of CPU cores can keep up with IO. Keeping implementation lean meant it was easy to understand and optimize. To supplant the inevitable loss of compression ratio, the codecs were designed in such a way that their output can be compressed further using lossless general purpose compressors such as lz4 or zstd, thus offering an easy tradeoff between compression ratio and performance.

This set of implementation decisions unexpectedly resulted in algorithms that are a pretty good fit for delivering web content. The performance penalty often induced by running code in the browser is offset by the incredibly high baseline performance, and most web content has “free” gzip compression efficiently applied during the download process. This article will describe the evolution of decoder.js, a WebAssembly port of geometry decoders from meshoptimizer.

17 February 2019 Flavors of SIMD

During development of meshoptimizer a question that comes up relatively often is “should this algorithm use SIMD?”. The library is performance-oriented, but SIMD doesn’t always provide significant performance benefits - unfortunately, the use of SIMD can make the code less portable and less maintainable, so this tradeoff has to be resolved on a case by case basis. When performance is of utmost importance, such as vertex/index codecs, separate SIMD implementations for SSE and NEON instruction sets need to be developed and maintained. In other cases it’s helpful to understand how much SIMD can help to make the decision. Today we will go through the exercise of accelerating sloppy mesh simplifier, a new algorithm that was recently added to the library, using SSEn/AVXn instruction sets.

17 January 2019 Is C++ fast?

A library that I work on often these days, meshoptimizer, has changed over time to use fewer and fewer C++ library features, up until the current state where the code closely resembles C even though it uses some C++ features. There have been many reasons behind the changes - dropping C++11 requirement allowed me to make sure anybody can compile the library on any platform, removing std::vector substantially improved performance of unoptimized builds, removing algorithm includes sped up compilation. However, I’ve never quite taken the leap all the way to C with this codebase. Today we’ll explore the gamut of possible C++ implementations for one specific algorithm, mesh simplifier, henceforth known as simplifier.cpp, and see if going all the way to C is worthwhile.

30 December 2017 Voxel terrain: physics

In the last article we’ve discussed the particulars of voxel data definition and storage for voxel terrain we use at Roblox. From there on a lot of other systems read & write data from the storage and interpret it in different ways - the implementation for each system (rendering, networking, physics) is completely separate and not tied too much to decisions storage or other systems are making, so we can study them independently.

While logically speaking it would make sense to look at mesher next (which is how we call the component that is capable of taking a box of voxel data and producing triangle data representing the terrain surface with material attributes), since it is used by both physics and rendering systems, the algorithm is pretty involved and has quite a bit of “magic” so we will leave that for some other time and will instead look at physics today.

31 July 2017 Optimal grid rendering isn’t optimal

I have been working a lot on vertex cache optimization lately, exploring several algorithms from multiple axes - optimization performance, optimization efficiency, corner cases and the like. While doing so, I’ve implemented a program to verify that the algorithms actually produce results beneficial for real hardware - and today we will discuss one such algorithm, namely “Optimal Grid Rendering”.