About

Work

I’m available for advising (but not for employment or contract work); please reach out via e-mail.

Previously I’ve worked as a technical fellow at Roblox, as a rendering engineer at Sperasoft on FIFA 13 and FIFA Street, as a PS3 programmer at Saber Interactive, and as a lead engine developer at Creat Studios.

During my career I’ve helped ship many games on PS2/PS3/Xbox 360/PC. At Roblox I was also involved in supporting PC/Mac/iOS/Android/Xbox One/PS4. My resume contains more details, although I am not looking for recruiter outreach.

Among many other things at Roblox, I started Luau, a fast, small, safe, gradually typed embeddable scripting language derived from Lua. Luau is open-source, and while I no longer work on it full-time, I occasionally contribute patches and remain an expert on its internals.

Projects

I’m working on a wide variety of open-source projects, most of which are hosted on GitHub. Here’s a short selection:

meshoptimizer

meshoptimizer is a library that optimizes geometry to render faster on GPUs by reordering vertex/index data. Its algorithms optimize vertex reuse, vertex access locality and overdraw, resulting in fewer vertex/fragment shader invocations and fewer cache misses when loading vertex data. The library also provides simplification and compression algorithms, along with other mesh processing utilities. gltfpack, a tool that automatically optimizes glTF files, is developed and distributed alongside the library.
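To make “vertex reuse” concrete: GPUs keep recently transformed vertices in a cache, so the order of the index buffer affects how many vertex shader invocations a mesh needs. The sketch below is not meshoptimizer's API, since the library has its own analysis functions. It is a simplified model that assumes a FIFO post-transform cache of a given size. It counts cache misses and reports the average cache miss ratio (ACMR), which is the number of vertex shader invocations per triangle. The function names are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <deque>
#include <vector>

// Counts vertex shader invocations for an index buffer under a simplified model:
// a FIFO cache holding the last `cache_size` transformed vertices (an assumption;
// real GPU caches behave differently and vary between vendors).
size_t countCacheMisses(const std::vector<unsigned int>& indices, size_t cache_size)
{
    assert(cache_size > 0);
    std::deque<unsigned int> cache;
    size_t misses = 0;

    for (unsigned int index : indices)
    {
        // Hit: the vertex was transformed recently and its result is reused.
        if (std::find(cache.begin(), cache.end(), index) != cache.end())
            continue;

        // Miss: one more vertex shader invocation; evict the oldest entry if full.
        ++misses;
        cache.push_back(index);
        if (cache.size() > cache_size)
            cache.pop_front();
    }
    return misses;
}

// Average cache miss ratio (ACMR): vertex shader invocations per triangle.
double computeACMR(const std::vector<unsigned int>& indices, size_t cache_size)
{
    size_t triangles = indices.size() / 3;
    return triangles ? double(countCacheMisses(indices, cache_size)) / double(triangles) : 0.0;
}
```

Take a 3-entry cache and the triangles {0,1,2}, {3,4,5}, {2,1,6}. In that order they cost 9 invocations. If the third triangle is moved up next to the first, they cost only 7, because vertices 1 and 2 are still in the cache. Vertex cache optimization reorders triangles to exploit this effect, though it uses a far more careful model than this one.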

pugixml

pugixml is a lightweight C++ XML processing library with an extremely fast and memory-efficient DOM parser and XPath 1.0 support. It is used in a wide range of applications, including various embedded systems, video game engines, offline renderers, web backends and robotics/space software. A lot of effort goes into making sure pugixml has an easy-to-use API, has as few defects as possible and runs on all widespread platforms.

calm

calm is a fast, minimal, GPU-accelerated inference implementation for large language models (decoder-only transformers), written in C with no dependencies. calm supports CUDA and Metal. For token generation in the single-user, single-GPU use case, it is one of the fastest implementations on NVIDIA and Apple hardware. calm intentionally does not support request batching or fast prefill, which keeps its scope limited to generation. It is not meant to be a production inference application, but it aims to get as close as possible to the memory bandwidth limit on supported hardware and formats.
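A back-of-the-envelope bound shows why bandwidth is the target. For a dense model without batching, every generated token requires reading all weights from memory once. The numbers below are illustrative assumptions, not calm measurements: a model with N = 7 × 10^9 parameters, each stored in b = 2 bytes, a memory bandwidth of B = 1 TB/s, and KV cache traffic ignored.

```latex
\text{tokens/s} \;\le\; \frac{B}{N \cdot b}
  = \frac{10^{12}\ \text{B/s}}{7 \times 10^{9} \times 2\ \text{B}}
  \approx 71
```

Storing the weights in an 8-bit format halves b and doubles the bound to about 143 tokens/s. This is why the weight format matters so much for generation speed.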

volk

volk is a meta-loader for Vulkan. It allows you to write a Vulkan application without adding a dependency on the Vulkan loader, which can be important if you plan to support other rendering APIs in the same application. Additionally, it provides a way to load device entry points using vkGetDeviceProcAddr, which can reduce draw call dispatch overhead by bypassing the loader's dispatch thunks.

niagara

niagara is a Vulkan renderer written from scratch on stream (broadcast on YouTube). The project implements several modern Vulkan rendering techniques, such as GPU culling and scene submission, cone culling, automatic occlusion culling, and task/mesh shading. It is not meant to be a production renderer, but it aims to present production-ready techniques, even if the implementation cuts some corners for expediency. Past streams are available in a playlist.
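Cone culling rejects a whole cluster of triangles (a meshlet) when all of its triangles face away from the camera. The following is a minimal sketch that assumes each cluster stores a cone apex, a normalized cone axis and a precomputed cutoff. It mirrors the apex-based test that meshoptimizer's documentation describes for its cluster bounds. The `Vec3` and `Cone` types are hypothetical, and niagara's shader code is not reproduced here.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical minimal vector type for this sketch.
struct Vec3 { float x, y, z; };

static Vec3 sub(Vec3 a, Vec3 b) { return {a.x - b.x, a.y - b.y, a.z - b.z}; }
static float dot(Vec3 a, Vec3 b) { return a.x * b.x + a.y * b.y + a.z * b.z; }

// Hypothetical per-cluster normal cone: `axis` is the normalized central
// direction of the triangle normals; `cutoff` is precomputed from the cone's
// spread (how it is derived is outside the scope of this sketch).
struct Cone
{
    Vec3 apex;
    Vec3 axis;
    float cutoff;
};

// Returns true when every triangle in the cluster is back-facing for a camera
// at `camera`, i.e. dot(normalize(apex - camera), axis) >= cutoff.
// The normalization is folded into the comparison to avoid a division.
bool coneCull(const Cone& cone, Vec3 camera)
{
    Vec3 view = sub(cone.apex, camera);
    float len = std::sqrt(dot(view, view));
    assert(len > 0.f && "camera must not sit exactly on the cone apex");
    return dot(view, cone.axis) >= cone.cutoff * len;
}
```

For example, if a cluster's normals all point along +z, a camera on the -z side sees only back faces and can skip the cluster. A camera on the +z side cannot skip it.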

qgrep

qgrep is a fast grep that uses an incrementally updated index to perform regular-expression searches in large codebases. It uses RE2 and LZ4 along with a lot of custom optimizations to make sure queries are as fast as possible (see blog post), and a file system watcher to keep the index up to date. Additionally, it features a Vim plugin for a great search experience in the best text editor ;)

Past projects

In addition to current projects, I’ve worked on open-source projects that have been shelved for the foreseeable future and are archived on GitHub. Here’s a short selection:

codesize

codesize is a tool that shows the memory impact of your code using a hierarchical display adapted to work well in large C++ codebases. It works by parsing debug information from PDB/ELF/Mach-O files, and it includes a performant DWARF information reader based on GNU binutils. The tool lets the developer quickly find areas in the codebase where reducing code size would save memory, which is particularly important on memory-constrained platforms.
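At its core, a hierarchical display means splitting each symbol name into scopes and summing sizes at every level. The following is a minimal sketch that assumes symbols have already been read from debug information as (name, size) pairs. `aggregateByScope` is a hypothetical name, and this is not codesize's actual implementation. The naive split on `::` also breaks on names such as `std::vector<ns::T>::push_back`, which real C++ symbols require handling.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <utility>
#include <vector>

// For each symbol "a::b::c" of a given size, adds that size to "a", "a::b"
// and "a::b::c". Naive: "::" inside template arguments is not handled.
std::map<std::string, size_t> aggregateByScope(
    const std::vector<std::pair<std::string, size_t>>& symbols)
{
    std::map<std::string, size_t> totals;
    for (const auto& [name, size] : symbols)
    {
        assert(!name.empty());
        size_t pos = 0;
        // Credit every enclosing scope prefix.
        while ((pos = name.find("::", pos)) != std::string::npos)
        {
            totals[name.substr(0, pos)] += size;
            pos += 2;
        }
        // Credit the symbol itself.
        totals[name] += size;
    }
    return totals;
}
```

Because `std::map` sorts its keys, iterating the result visits each scope just before its children. That order is a convenient starting point for an indented tree view.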

phyx

PhyX is a 2D physics engine with SoA/SIMD/multicore optimizations. The project is based on the open-source physics engine SusliX. The original engine is scalar and single-core, and the main goal of this project was to explore various optimization techniques. Beyond traditional physics engine optimizations, the project features a fully SIMDified (SSE/AVX2) solver, which works by pre-sorting the constraints into non-overlapping groups to eliminate data dependencies. It also uses a “sloppy” island-less parallelization scheme that relies on the convergence of racy impulse updates. Other parts of the project are heavily optimized with carefully implemented algorithms and data structures.
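Pre-sorting constraints into non-overlapping groups means that no body appears twice in a group. Each constraint in a group can then occupy its own SIMD lane without two lanes writing to the same body. The following is a minimal greedy first-fit sketch that assumes each constraint connects two body indices and each group holds at most `width` constraints (the SIMD lane count). The `Constraint` type and `groupConstraints` are hypothetical and do not reproduce PhyX's actual code.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical two-body constraint (e.g. a contact or joint).
struct Constraint { size_t bodyA, bodyB; };

// Greedily assigns each constraint to the first group that still has a free lane
// and does not already reference either of its bodies. Returns, per group, the
// indices of the constraints it contains.
std::vector<std::vector<size_t>> groupConstraints(
    const std::vector<Constraint>& constraints, size_t bodyCount, size_t width)
{
    assert(width > 0);
    std::vector<std::vector<size_t>> groups;
    // usedBodies[g][b] is true when group g already references body b.
    std::vector<std::vector<bool>> usedBodies;

    for (size_t i = 0; i < constraints.size(); ++i)
    {
        const Constraint& c = constraints[i];
        assert(c.bodyA < bodyCount && c.bodyB < bodyCount);

        size_t g = 0;
        while (g < groups.size() &&
               (groups[g].size() >= width || usedBodies[g][c.bodyA] || usedBodies[g][c.bodyB]))
            ++g;

        // No existing group fits: open a new one.
        if (g == groups.size())
        {
            groups.emplace_back();
            usedBodies.emplace_back(bodyCount, false);
        }

        groups[g].push_back(i);
        usedBodies[g][c.bodyA] = true;
        usedBodies[g][c.bodyB] = true;
    }
    return groups;
}
```

Consider a chain of constraints 0-1, 1-2, 2-3 and 3-4 with 4 lanes. The sketch produces two groups, {0-1, 2-3} and {1-2, 3-4}, so each group can be solved in one SIMD pass without conflicting writes.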

fastprintf

fastprintf is an F# library that replaces the built-in formatting functions from the printf family with much faster alternatives. The implementation compiles the format specification and the argument types into a sequence of functions that do the necessary formatting and get JIT-compiled by the .NET runtime (without using Reflection.Emit), and it caches the result. The library was written in the days of F# 2/3 to eliminate significant performance bottlenecks in code that used the standard formatting facilities. Modern versions of F# (5+) include string interpolation as well as a faster standard printf-style formatter.
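The idea of compiling a format string once into a sequence of small formatting functions and then caching the result can be sketched in C++. This is only an analogy to the F# library, under several assumptions. Only `%d`, `%s` and `%%` are supported. Arguments are passed as a `std::variant`, so types are checked at run time by `std::get`, whereas the F# library specializes on the argument types. The cache is keyed only by the format string. All names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <stdexcept>
#include <string>
#include <unordered_map>
#include <variant>
#include <vector>

using Arg = std::variant<int, std::string>;
// One compiled step: appends its piece of output, consuming an argument if needed.
using Step = std::function<void(std::string& out, const std::vector<Arg>& args, size_t& next)>;

// Parses the format once into a list of steps (literal text or argument slots).
std::vector<Step> compileFormat(const std::string& fmt)
{
    std::vector<Step> steps;
    std::string literal;
    auto flush = [&] {
        if (!literal.empty())
            steps.push_back([text = literal](std::string& out, const std::vector<Arg>&, size_t&) { out += text; });
        literal.clear();
    };

    for (size_t i = 0; i < fmt.size(); ++i)
    {
        // Plain characters and a trailing lone '%' are copied verbatim.
        if (fmt[i] != '%' || i + 1 == fmt.size()) { literal += fmt[i]; continue; }
        char spec = fmt[++i];
        if (spec == '%') { literal += '%'; continue; }
        flush();
        if (spec == 'd')
            steps.push_back([](std::string& out, const std::vector<Arg>& args, size_t& next) {
                out += std::to_string(std::get<int>(args.at(next++)));
            });
        else if (spec == 's')
            steps.push_back([](std::string& out, const std::vector<Arg>& args, size_t& next) {
                out += std::get<std::string>(args.at(next++));
            });
        else
            throw std::invalid_argument("unsupported format specifier");
    }
    flush();
    return steps;
}

// Compiles each distinct format string once and reuses its steps afterwards.
// The static cache is not thread-safe; concurrent use would need a lock.
std::string formatCached(const std::string& fmt, const std::vector<Arg>& args)
{
    static std::unordered_map<std::string, std::vector<Step>> cache;
    auto it = cache.find(fmt);
    if (it == cache.end())
        it = cache.emplace(fmt, compileFormat(fmt)).first;

    std::string out;
    size_t next = 0;
    for (const Step& step : it->second)
        step(out, args, next);
    return out;
}
```

Repeated calls with the same format string skip parsing entirely and only run the precompiled steps, which is where the speedup over re-interpreting the format on every call comes from.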

Publications

Here are some talks and publications I’ve done over the years:

SIGGRAPH 2020, “Large Voxel Landscapes on Mobile”. Slides Video
Game Developers Conference 2020, “Rendering Roblox: Vulkan Optimisations on Imagination PowerVR GPUs”. Slides Video
Game Developers Conference 2020, “Optimizing Roblox: Vulkan Best Practices for Mobile Developers”. Slides Video
Reboot Develop Red 2019, “Getting Faster and Leaner on Mobile: Optimizing Roblox with Vulkan”. Slides Video
GPU Zen 2: Advanced Rendering Techniques (2019), Chapter “Writing an efficient Vulkan renderer”. Read online or buy the book
Game Developers Conference 2018, “Vulkan on Android: Gotchas and best practices”. Slides Video
The Performance of Open Source Applications (2013), Chapter 4: Parsing XML at the Speed of Light. Read online or buy the book (all royalties are donated to Amnesty International)
Russian Game Developers Conference 2010, “Job scheduler: as simple as possible”. Slides
Russian Game Developers Conference 2009, “SPU Render”. Slides
Russian Game Developers Conference 2008, “Baking graphics resources for next-generation platforms”. Slides in Russian

I also have a blog with technical posts on various subjects (you are reading it!).

Biography

Arseny Kapoulkine has worked on game technology for more than 15 years. Having worked on rendering, physics simulation, language runtimes, multithreading and many other areas, he is still discovering exciting problems in computing that require low-level thinking. After helping ship many titles on PS3 including several FIFA games, he worked on the Roblox platform for 11 years, helping young game developers achieve their dreams.

Contacts

You can reach me by e-mail at arseny.kapoulkine@gmail.com, on Twitter/X at @zeuxcg, or on Mastodon at zeux@mastodon.gamedev.place

License

Unless specified otherwise, code snippets on this blog are distributed under the terms of the MIT License.
