How to Resolve an MSAA DepthBuffer

If you want to implement MSAA (multisampled antialiasing), you need to render into multisampled render targets. When you want to read an antialiased render target as a shader resource, you first need to resolve it. Resolving means copying it into a non-multisampled texture while averaging the subsamples (in D3D11 this is done by calling ResolveSubresource on the device context). You will quickly find out that it doesn’t work that way for a depth buffer.
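
For a regular color target, the resolve is a single call on the immediate context. Something along these lines (the texture pointers and the format are placeholders for your own resources):

[code language="cpp"]// Resolve a multisampled color target into a single-sampled texture.
// msaaColorTex and resolvedColorTex are placeholder ID3D11Texture2D pointers;
// the destination must be non-multisampled and use a compatible format.
context->ResolveSubresource(
    resolvedColorTex, 0,          // destination resource, subresource index
    msaaColorTex, 0,              // source resource, subresource index
    DXGI_FORMAT_R8G8B8A8_UNORM);  // format used for the resolve[/code]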

When you create a texture with D3D11_BIND_DEPTHSTENCIL and later try to resolve it, the D3D11 debug layer throws an error telling you that you can’t do that. You must do the resolve by hand in a shader.
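
To be able to sample the multisampled depth buffer in the resolve shader at all, the texture also needs D3D11_BIND_SHADER_RESOURCE and a typeless format, so the depth view and the shader view can interpret it differently. A sketch of that setup, assuming an R32 depth format and 4x MSAA (the variable names are placeholders):

[code language="cpp"]// Multisampled depth texture usable both as depth-stencil target and shader resource.
D3D11_TEXTURE2D_DESC desc = {};
desc.Width = screenWidth;
desc.Height = screenHeight;
desc.MipLevels = 1;
desc.ArraySize = 1;
desc.Format = DXGI_FORMAT_R32_TYPELESS;   // typeless so both views can interpret it
desc.SampleDesc.Count = 4;                // MSAA sample count
desc.Usage = D3D11_USAGE_DEFAULT;
desc.BindFlags = D3D11_BIND_DEPTH_STENCIL | D3D11_BIND_SHADER_RESOURCE;
device->CreateTexture2D(&desc, nullptr, &msaaDepthTex);

D3D11_DEPTH_STENCIL_VIEW_DESC dsvDesc = {};
dsvDesc.Format = DXGI_FORMAT_D32_FLOAT;   // depth view of the typeless resource
dsvDesc.ViewDimension = D3D11_DSV_DIMENSION_TEXTURE2DMS;
device->CreateDepthStencilView(msaaDepthTex, &dsvDesc, &msaaDSV);

D3D11_SHADER_RESOURCE_VIEW_DESC srvDesc = {};
srvDesc.Format = DXGI_FORMAT_R32_FLOAT;   // what Texture2DMS<float> sees in the shader
srvDesc.ViewDimension = D3D11_SRV_DIMENSION_TEXTURE2DMS;
device->CreateShaderResourceView(msaaDepthTex, &srvDesc, &msaaDepthSRV);[/code]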

I chose a compute shader for the job because there is less state setup involved. While reading the depth buffer, I take the minimum of the samples, which gives the sample closest to the camera. I think that is what most applications want, but you could also take the 0th sample or the maximum, depending on what the computation needs.

[code language="cpp"]Texture2DMS<float> input : register(t0);

RWTexture2D<float> output : register(u0);

[numthreads(16, 16, 1)]

void main(uint3 dispatchThreadId : SV_DispatchThreadID)

{

uint2 dim;

uint sampleCount;

input.GetDimensions(dim.x, dim.y, sampleCount);

if (dispatchThreadId.x > dim.x || dispatchThreadId.y > dim.y)

{

return;

}

float result = 1;

for (uint i = 0; i < sampleCount; ++i)

{

result = min(result, input.Load(dispatchThreadId.xy, i).r);

}

output[dispatchThreadId.xy] = result;

}[/code]

I call this compute shader like this:

[code language="cpp"]Dispatch(ceil(screenWidth/16.0f), ceil(screenHeigh/16.0f), 1)[/code]

That’s the simplest shader I could write: it just loops over all the samples and takes the minimum.

When dispatching the compute shader with parameters like this, the dispatch thread ID gives us a direct pixel coordinate. Because the resolution might not be divisible by the thread group size, we should make sure to discard the out-of-bounds texture accesses.

It could also be done with a pixel shader, but I wanted to avoid the state setup that comes with it. With a pixel shader we would need to bind rasterizer, depth-stencil and blend states, and even input layouts, vertex buffers or primitive topologies unless we abuse the immediate constant buffer. I want to avoid state setup whenever possible because it increases CPU overhead, and we can do better here.
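
For comparison, a pixel shader version of the same resolve would look roughly like this. One common trick is to generate the fullscreen triangle from SV_VertexID, so at least the vertex buffer and input layout can be skipped; the render target here is assumed to be a single-sampled R32_FLOAT texture:

[code language="cpp"]Texture2DMS<float> input : register(t0);

// Fullscreen triangle generated without a vertex buffer or input layout.
float4 mainVS(uint vertexID : SV_VertexID) : SV_Position
{
    float2 uv = float2((vertexID << 1) & 2, vertexID & 2);
    return float4(uv * float2(2, -2) + float2(-1, 1), 0, 1);
}

// The pixel position maps directly to the texel to resolve.
float mainPS(float4 pos : SV_Position) : SV_Target
{
    uint2 dim;
    uint sampleCount;
    input.GetDimensions(dim.x, dim.y, sampleCount);

    float result = 1;
    for (uint i = 0; i < sampleCount; ++i)
    {
        result = min(result, input.Load(uint2(pos.xy), i));
    }
    return result;
}[/code]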

However, I’ve heard that running a compute shader in the middle of a rasterization pipeline can incur additional pipeline overhead. I’ve yet to witness it (comment if you can prove it).

If I wanted to do a custom resolve for another type of texture, I would keep the shader as it is and only swap the min operation for another one, for example an average or a max.
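
For example, an averaging resolve, which mimics what ResolveSubresource does for color targets, only changes the loop in the compute shader above:

[code language="cpp"]// Average all subsamples instead of taking the minimum.
float result = 0;
for (uint i = 0; i < sampleCount; ++i)
{
    result += input.Load(dispatchThreadId.xy, i);
}
output[dispatchThreadId.xy] = result / sampleCount;[/code]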

That is all, I wanted to keep this fairly short.

7 responses to “How to Resolve an MSAA DepthBuffer”

  1. Hi – I found your blog on twitter and there’s some great stuff! Correct me if I’m wrong, but in the compute shader, you can skip the dimensions check. Out-of-bounds reads will return zero (https://msdn.microsoft.com/en-us/library/windows/desktop/bb509694(v=vs.85).aspx) and similarly I believe out-of-bounds writes are ignored.

    1. Hi, you are right, the bounds check is not necessary. Thanks for the feedback!

  2. Do you resolve the normal buffer as well to use it for post-processing?

    1. Yes, but that can be resolved in DX11 automatically by calling ID3D11DeviceContext::ResolveSubresource. This averages the subsamples and copies the result into a single sample resource.

      1. The downside is that the resolved normals are no longer unit length (normalized in the 2-norm), so multiple post-process stages must all do a separate normalization when fetching the texture data.

  3. I do something like this in a regular pixel shader, and the values read from the depth texture seem to be clamped between 0.0 and 1.0. Did you observe something like this in the compute shader as well? I’m not sure whether the values should be clamped to this range by the projection matrix, but if they are, why use a float format? Something like a (non-existent) D32_UNORM would make better use of the 32 bits.

    1. Reading from a compute shader vs. a pixel shader should make no difference. Clip space depth in D3D is in the [0,1] range, so you would usually get values in that range if depth clipping is enabled. I think if you disable depth clipping, you could end up with values outside that range.
      As for the D32_UNORM format, to be honest I am not sure why it is not available, but I would guess it’s not necessary, because D32_FLOAT already has very good precision in the [0,1] range. If you are having problems with depth buffer precision, I suggest the “reversed Z buffer” technique, which has no downsides but can greatly increase effective precision. With a FLOAT depth buffer you also have more opportunity to use a custom depth buffer solution, such as a logarithmic depth buffer.
      If you still need UNORM and more precision than 16_UNORM, there is the D24_UNORM_S8_UINT format. Despite the name, it will most likely be allocated as 32 bits for depth plus 8 bits for stencil (at least on AMD), but this is implementation defined. With this format you should get 24-bit UNORM precision.
      Is this helpful? 🙂
