Wednesday, March 31, 2010

OIT - Order Independent Transparency

recently i'm checking dx11 features and one of the nice thing he have is read/write/structured buffers. (unlike shader model 4 that have read only buffers)
read/write buffer (or RWBuffer) is just as it sounds, a buffer that we can use to read from or/and write into.
structured buffers (or SB) is a buffer that contains elements (c style structs), for example:
struct MyStruct
float4 color;
float4 normal;

then, we can use it like this:
StructuredBuffer mySB;
mySB[0].Normal = something
New Resource Types
RWSB have another nice feature, it have internal counter that we can increment/decrement its value.
those buffers can be used to create very nice effects and really fast (if used wisely).
this time i'm using it to solve some old problem in computer graphics:
order independent transparency (or OIT), if you don't know what i'm talking about then i suggest you should google on it...
in dxsdk you can find a sample called OIT, this sample uses compute shader to solve this problem, the sample show you how you should NOT use compute shader! and just show you that its possible using it... this sample runs so slow that you better do a software solution for it ;)
one thing you should remember is: if you write your pixel shader good enough, you probably get better performance than using compute shader (with some exception off course)

anyway, the question i want to answer is: how can we draw transparent surfaces without worrying about the order?
there are few solution to this but every one of them have some drawback:
1. depth peeling - very heavy, even if using dual version.
reverse depth peeling is more friendly as it peel layers from back to front so we can blend immediately with the backbuffer and only use one buffer for it.
2. stencil routed k buffer - say goodbye to your stencil and allow you up to 16 fragments to be blended, for some of you its enough for the some its not an option.
3. dx11 method, based on the idea presented on gdc2010 by ati.
i will focus on 3 as its what i implemented and it doesn't suffer from the issues the other methods is.
the idea can be simplify into 2 main steps:
1. build - in this step we build linked list for every fragment that contains all the fragments that "fall" on this fragment screen position, i use RWBuffer to store the linked list head for that pixel, and another 2 RWBuffers to store pixel attributes (color, depth and next "pointer" to the next fragment in the list), those attributes buffers need to be big enough to contain all the fragments that placed at the same fragment position.
in this step i'm also using the structured buffer internal counter to get unique "id" for each fragment and use it to write/read the fragments attributes.

2. resolve - this is where we do the magic, here i just traverse the linked list stored in step 1, and sort the fragments, then i manually blend the fragments.
note that you need to sort the fragments in place! that means your linked list should be accessed a lot or... you can simply fill a bunch of registered, and sort them instead.
tip: use the [earlydepthstencil] attribute to skip pixels that aren't needed to be processed.

here is a screenshots that show it in action using 32 registers (max of 32 fragments per screen fragment)

31 transparent surfaces

debug mode that shows the fragments count in each of the fragments linked list
black means 0 fragments, white means 32 fragments (num_fragments/max_fragments)

yes i know, this 32 surfaces can be easily sorted and rendered correctly without the GPU, but here i'm not sorting and not worrying about the order! i'm just rendered them as one batch, maybe next time i will show complex geometry... ;)

Monday, March 15, 2010

Real-Time Destruction/Fractures

in the last two weekends i'v been working on real time destruction/fractures on 3d models, i always amazed by havok destruction demonstration and even more by red fraction guerrilla.
unlike havok physics/animation, havok destruction isn't free, but i really wanted something like that in my engine.
after few days of thinking and googling, i got few options on how to implement it:
1) get into some really nasty physics using fem model, build tetrahedrons that fit to the model and use it to know which part do i need to break.
2) voxelize the model and use the voxels to know which part do i need to break.
3) clip the model by a plane.
every option need to take care two things:
1) the render model which is the model that you render and contain valid tnb vectors, mapping, position etc.
2) the physics model which you use to do collision with.

the option i implement is 3, because it is very fast to implement, doesn't need to preprocess the mesh like option 1 and does not restrict you with same fracture size like option 2.
also it doesn't take extra memory to store data per model, everything is computed in real time, so if you are not breaking the model it is just like any other model in the scene.
if you break it, it will take the memory needed to store fractures data and that's it.
as you all know (or don't), nothing comes for free, more memory means less cpu work and less memory means more cpu work.
this is always the dilemma when implementing an algorithm, but in this case its not because you can't really preprocess the fractures because you don't really know where the player is going to shot the model.
some games generating fractures in advance and when you shot on the model, the closest fractures is detached from the parent model, in my opinion its doesn't do the work like the real thing and again, memory could be huge if you split the model into small fractures.
another thing to take into account is how many time do you have to implement it, if you don't really have a limit i think you can find some cool idea...
anyway, option 3 can be simplified in few steps:
1. determine the model that you hit, call it M (your physics engine should do the job)
2. split M by a plane into two parts, negative and positive, call them N and P
3. close the holes of N, P using triangulation algorithm (ear cutting or something similar), if you assume convex shapes you can simply build a fan to close those holes very fast.
4. apply uv mapping for the faces that closed N,P holes.
5. build physic shapes for N and P.
thats it, the catch is to do those steps fast!
here is a video to show what i'v done so far, note that in the video step 4 is disabled and you see the same color (all vertices map into 0,0) that's because i need to add option to give faces from step 3 different material so it will look good and not using #1 texture i used in the video