If you want to implement MSAA (multisampled antialiasing), you need to render into multisampled render targets. When you want to read an antialiased render target as a shader resource, you first need to resolve it. Resolving means copying it to a non-multisampled texture while averaging the subsamples (in D3D11 this is done by calling ResolveSubresource on the device context). You quickly find out that it doesn't work that way for a depth buffer.
When you specify D3D11_BIND_DEPTH_STENCIL when creating a texture and later try to resolve it, the D3D11 debug layer throws an error telling you that you can't do that. You must do the resolve by hand in a shader.
I chose a compute shader for the job, because there is less state setup involved. While reading the depth buffer, I do a min operation on the samples to get the one closest to the camera (with a standard depth range, where smaller means closer). I think most applications want to do this, but you could also take the 0th sample or the maximum, depending on what your computation needs.
[code language="cpp"]
Texture2DMS<float> input : register(t0);
RWTexture2D<float> output : register(u0);

[numthreads(16, 16, 1)]
void main(uint3 dispatchThreadId : SV_DispatchThreadID)
{
	uint2 dim;
	uint sampleCount;
	input.GetDimensions(dim.x, dim.y, sampleCount);

	// Discard threads outside the texture (the resolution may not be divisible by 16)
	if (dispatchThreadId.x >= dim.x || dispatchThreadId.y >= dim.y)
	{
		return;
	}

	float result = 1; // depth is cleared to 1.0 (far plane)
	for (uint i = 0; i < sampleCount; ++i)
	{
		result = min(result, input.Load(dispatchThreadId.xy, i).r);
	}
	output[dispatchThreadId.xy] = result;
}
[/code]
I call this compute shader like this:
[code language="cpp"]Dispatch(ceil(screenWidth / 16.0f), ceil(screenHeight / 16.0f), 1)[/code]
That's the simplest shader I could write: it just loops over all the samples and takes their minimum.
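The same per-pixel logic can be sketched in plain C++ (the sample array here is hypothetical; on the GPU these values come from Texture2DMS::Load):

[code language="cpp"]
#include <algorithm>
#include <cassert>
#include <vector>

// CPU sketch of the per-pixel resolve: take the minimum (closest-to-camera)
// depth across all MSAA samples of one pixel.
float ResolveDepthMin(const std::vector<float>& samples)
{
	float result = 1.0f; // depth is cleared to 1.0 (far plane)
	for (float s : samples)
		result = std::min(result, s);
	return result;
}

int main()
{
	assert(ResolveDepthMin({0.9f, 0.4f, 0.7f, 1.0f}) == 0.4f);
	assert(ResolveDepthMin({}) == 1.0f); // no samples: stays at the far plane
	return 0;
}
[/code]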
When dispatching the compute shader with parameters like this, the dispatchThreadID gives us a direct pixel coordinate. Because the resolution may not be divisible by the thread group size, we should make sure to discard the out-of-bounds texture accesses.
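The group count math above is just an integer ceil-divide; a minimal sketch, assuming the 16x16 thread group size used in the shader (the GroupCount helper is my own name, not a D3D11 API):

[code language="cpp"]
#include <cassert>

// Number of 16x16 thread groups needed to cover one screen dimension.
// Equivalent to ceil(size / 16.0f), but stays in integer arithmetic.
unsigned int GroupCount(unsigned int size, unsigned int groupSize = 16)
{
	return (size + groupSize - 1) / groupSize;
}

int main()
{
	assert(GroupCount(1920) == 120); // exactly divisible
	assert(GroupCount(1080) == 68);  // 67.5 rounds up; the extra threads return early
	assert(GroupCount(1) == 1);
	return 0;
}
[/code]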
It could also be done with a pixel shader, but I wanted to avoid the state setup that requires. In the pixel shader path, we would need to bind rasterizer, depth-stencil, and blend states, and even input layouts, vertex buffers, or primitive topologies unless we abuse the immediate constant buffer. I want to avoid state setup whenever possible because it increases CPU overhead, and we can do better here.
However, I've heard that dispatching a compute shader in the middle of a rasterization pipeline can incur additional pipeline overhead, though I've yet to witness it myself (comment if you can prove it).
If I'd like to do a custom resolve for another type of texture, I would keep the shader as it is and only change the min operation to another one, for example an average, or max, etc.
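To illustrate that only the combine operation changes, here is a hedged C++ sketch parameterized on that operation (the Resolve helper and its lambdas are illustrative, not part of any API; an average resolve would instead accumulate and divide by the sample count):

[code language="cpp"]
#include <cassert>
#include <functional>
#include <vector>

// Resolve parameterized on the combine op, mirroring the HLSL shader:
// only the line inside the sample loop differs between variants.
float Resolve(const std::vector<float>& samples, float init,
              const std::function<float(float, float)>& op)
{
	float result = init;
	for (float s : samples)
		result = op(result, s);
	return result;
}

int main()
{
	std::vector<float> s = {0.25f, 0.75f, 0.5f, 0.5f};
	// min resolve (closest sample), as in the depth shader above:
	assert(Resolve(s, 1.0f, [](float a, float b){ return a < b ? a : b; }) == 0.25f);
	// max resolve, e.g. if the depth buffer uses a reversed-Z convention:
	assert(Resolve(s, 0.0f, [](float a, float b){ return a > b ? a : b; }) == 0.75f);
	return 0;
}
[/code]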
That is all; I wanted to keep this fairly short.