Implementing Direct3D for fun and profit

I can’t believe I’m writing this, it’s been what, 2 months? During that time a lot of things happened – I’ve been to the conference and gave an hour-long talk about our SPU rendering stuff (which was more or less well received), I’ve almost completed an occlusion subsystem (rasterization-based), which is giving good results; and the financial crisis has finally hit the company I work at – some projects are freezed due to the lack of funding, and some people are fired. It’s kind of sad walking through half-empty offices… Anyway, I know I promised to write often but as I am actively developing my pet engine at home and there is a lot of stuff to work on at my day job, so time is a scarce resource for me. My blog/todo.txt file is already 20 entries long, where some things are too small to deserve a post, and others demand a lengthy series. I’ll try to select something interesting from time to time and blog about it. As for todays topic,

Every object in core Direct3D (I’ll be talking about 9 today, but the same thing should apply to 10 and 11) is an interface. This means that the details of actual implementation is hidden from us, but this also means that we can implement those interfaces. Why could we want to do that?

Reverse engineering

If you work in game industry/computer graphics, or, well, any other IT-related field, I suppose, then you should be constantly gaining new knowledge; otherwise your qualification as a specialist will decrease very fast. There are lots of ways to learn, and one of the best is to learn from others experience. Unfortunately, while there is a lot of information on the technology of some titles, most are not described at all. Also sometimes the descriptions are inaccurate – after all, devil is in the details. So what you can do is take an existing title and reverse-engineer it – that is, gain information about implementation details from the outside. Disclaimer: Of course, this information is provided only for educational value. Reverse engineering can violate the laws of your country and/or the EULA of the product. Don’t use it if it does.

In PC / Direct3D world there are two primary tools than can allow such introspection – NVidia PerfHUD and Microsoft PIX. There is also a beta of Intel GPA (which is, by the way, quite promising, if lacking polish), but it is more or less like PIX. Using PIX does not require modifications of the host program, however PIX does not work for some titles (it might crash), is slow (especially for titles with complex scenes, lots of draw calls, etc.) and is not very convenient to use as a reverse engineering tool for other reasons.

PerfHUD is more useful in some areas, but you need to create Direct3D device with a special adapter and in REF mode in order for PerfHUD to work. While some games already have this kind of support in released version (notable examples include The Elder Scrolls 4: Oblivion and S.T.A.L.K.E.R. - Shadows of Chernobyl), others are more careful (I hope if you’re reading this blog you have a build configuration such as Master or Retail, which sets appropriate defines so you can compile development-only stuff, such as asset reloading, profiling or NVPerfHUD support) out of the executable). But still if you manage to intercept the call to Direct3DCreate9 (which can be done for example by creating a DLL, calling it d3d9.dll and putting it near the game executable), you can return a proxy IDirect3D9 object, that forwards all calls to the actual object, except that it modifies the adapter/device type that are passed to CreateDevice. In fact, such proxy objects are used by both PIX and GPA, though the injection technique is more complex.

There are even some programs that simplify the following for you, allowing you to run any title in PerfHUD-compatible mode.

Multithreaded rendering

In fact, this is already described in a Gamefest 2008 presentation “Practical Parallel Rendering with DirectX 9 and 10, Windows PC Command Buffer Recording” (you can get slides and example code here). Basically, since neither Direct3D9 nor Direct3D10 support proper multithreading (creating device as multithreaded means that all device calls will be synchronized with one per-device critical section), you can emulate it via a special proxy device, which records all rendering calls in a buffer, and then uses the buffer to replay the command stream via real device. This saves processing time for other rendering work you do alongside API calls by allowing it to work in multiple threads, and is a good stub for deferred context functionality that’s available on other platforms (including Direct3D11 and all console platforms). I use this technique in my pet engine mainly for the purpose of portability – I can render different parts of the scene into different contexts simultaneously, and then “kick” the deferred context via the main one. On PS3 the “kick” part is very lightweight, so the savings are huge; on Windows during the “kick” part deferred context replays the command stream, so it can be quite heavy, but it’s faster than doing everything in one thread, and the code works the same way. When I start supporting Direct3D11, the same code will work concurrently, provided a good driver/runtime support of course.

Note that I don’t use Emergent library as is – I consider it too heavyweight and obscure for my purposes. They try to support all Direct3D calls, while I use only a handful – I don’t use FFP, I don’t create resources via this device, etc. My implementation is simple and straightforward, and is only 23 Kb in size (11 of which are reused in another component – see below). If anybody wants to use it I can provide the code to you to save you an hour of work – just drop a comment.

Currently my implementation has a fixed size command buffer, so if you exceed it, you’re doomed. There are several more or less obvious ways to fix this, but I hope that by the time I get to it I’ll already have D3D11 in place.

Asset pipeline

My asset pipeline is more or less the same for all asset types – there is a source for the asset (Maya/Max scene, texture, sound file, etc.), which is converted via some set of actions to a platform-specific binary that can be loaded by the engine. In this way the complexity of dealing with different resource formats, complex structures, data non suitable for runtime, etc. is moved from engine to tools, which is great since it reduces the amount of runtime code, making it more robust and easier to maintain. The data is saved to a custom format which is optimized for loading time (target endianness, platform-specific data layout/format for graphics resources, compression). I think I’ll blog about some interesting aspects/choices in the future as time permits (for example, about my experience of using build systems, such as SCons and Jam, for data builds), but for now I’ll focus on a tool that builds textures.

This tool loads the texture file, generates mipmap levels for the texture if necessary (if it was not a DDS with mip chain, and if target texture requires mipmap levels), compresses it to DXTn if necessary (again, that depends on source format and building settings), and makes some other actions, both platform-specific and platform-independent. In order for it to work, I need an image library that can load image formats I care about, including DDS with DXTn contents (so that I don’t need to unpack/repack it every time, and so that artists can tweak DXT compression settings in Photoshop plugin – in my experience there is rarely a visible difference, but if they give me a texture and I compress it to DXT and there are some artifacts, I’m to blame – and if they use Photoshop, it’s not my scope :)). As it turns out, D3DX is a good enough image loading library, at least it works for me (although in retrospect I probably should’ve used DevIL, and perhaps I will switch to it in the future).

Anyway, to load a texture via D3DX, you need a Direct3D device. As it turns out, while you can create a working REF device in under 10 lines of code (using desktop window and hardcoded settings), you can’t create any device, including NULLREF, if your PC does not have a monitor attached. This problem appeared once I got my pipeline working via IncrediBuild, and sometimes on some machines texture building failed. Since I did not want to modify my code too much, I ended implementing another proxy device, which is suitable for loading a texture with D3DX functions. This time it was slightly harder, because I needed implementations for some functions of IDirect3DDevice9, IDirect3DTexture9 and IDirect3DSurface9, but again the resulting code is quite small and simple – 6 Kb (plus the 11 Kb dummy device I mentioned earlier), and I can load any 2D texture. Of course I’ll need to add some code to load cubemaps and even more code to load volume textures, but for now it’s fine the way it is.

So these are some examples of situations where implementing Direct3D interfaces might prove useful. The next post is going to either be about multithreading, or about some asset pipeline-related stuff, I guess I’ll decide once I get to writing it.

UPDATE 25 OCT 2010: Here is the example code:

dummydevice.h - this is just an example of a dummy device implementation; it implements all device methods with stubs that can’t be called without a debugging break. This is useful for other partial implementations.

deferreddevice.h - this is the implementation of the device that buffers various rendering calls and then allows to execute them on some other device. Note that it lives in a fixed size memory buffer, which can be easily changed, and that it implements only a subset of rendering-related functions (i.e. no FFP).

texturedevice.h - this is the implementation of the device that works with D3DXCreateTextureFromFile for 2D textures and cubemaps (3D texture support is missing but can be added in the same way).