Oren Game Engine: 2010

Thursday, December 2, 2010

Deferred Iterative Parallax Mapping

hi
the title sounds like a "bomb" but it's nothing too fancy...
i just added parallax mapping using an iterative scheme; this method runs a few iterations of parallax mapping in order to get more accuracy.
for those who aren't familiar with parallax mapping, here is a short description:
1. you need two texture maps: a height map and a normal map
2. start from the current texcoord used to sample your diffuse map (for example) and offset it by the height value (scaled and biased) of the current pixel in the direction of your view direction (in texture/tangent space)
3. sample your diffuse map with the new texcoord.
iterative parallax mapping does just that, but for X iterations; every iteration tries to move one step closer to a more accurate solution.
don't get me wrong, you are not going to get a better solution than ray traced methods (it depends on your viewing angle, at some angles you can't even tell the difference), but this method costs almost nothing, especially if you can store your height map values in the normal map alpha channel.
also, the number of iterations can be changed depending on surface properties and such.
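a minimal cpu sketch of the steps above, in python. it assumes one common variant of the iteration (each pass re-samples the height at the current guess and re-offsets the original texcoord); the post doesn't say which variant the engine uses. `sample_height`, `scale` and `bias` are made-up names and values for illustration.

```python
def iterative_parallax(uv, view_ts, sample_height, scale=0.04, bias=-0.02, iterations=4):
    """Offset a texcoord with a few parallax mapping iterations.

    uv            -- (u, v) texcoord of the current pixel
    view_ts       -- normalized view direction in tangent space (x, y, z)
    sample_height -- callable (u, v) -> height in [0, 1] (hypothetical sampler)
    """
    u, v = uv
    for _ in range(iterations):
        # scaled and biased height at the current guess
        h = sample_height(u, v) * scale + bias
        # re-offset the original texcoord along the view direction
        u = uv[0] + h * view_ts[0]
        v = uv[1] + h * view_ts[1]
    return (u, v)


# usage: a height field that rises along u, viewed at an angle
ramp = lambda u, v: min(max(u, 0.0), 1.0)
print(iterative_parallax((0.5, 0.5), (0.6, 0.0, 0.8), ramp))
```

with `iterations=1` this is plain parallax mapping; the returned texcoord is the one you would use to sample the diffuse and normal maps.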
here are screenshots that show it in action using the deferred method:

without parallax

with parallax (4 iterations)

Wednesday, October 27, 2010

OIT - Order Independent Transparency #3

recently i needed to do OIT on DX9 hardware and, as you know (from my previous posts), i don't have the same access i have in DX11 (i can't create a per pixel linked list and do the sorting afterwards), so what i can do is simple depth peeling...
a few words about depth peeling: depth peeling works by peeling away geometry one depth layer at a time, so to create OIT we can peel geometry starting from the nearest depth (to the camera eye) and going to the farthest depth. each step we peel one layer of pixels and store it in a separate RT, then we composite those RTs in back to front order to get the correct transparency order.
so how do we implement that?
well, from what we can see we need a color RT to store the result and a depth buffer to compare pixel depths against; we also need a "second" depth buffer so we can reject pixels that were already peeled/visited.
1. color RT - supported by DX9 and up
2. first depth buffer - also supported but...
3. second depth buffer - not supported but...
so we have issues with 2 and 3. to implement the per pixel depth comparison we need the ability to pass the depth buffer into the pixel shader so we can do the compare ourselves and reject already peeled/visited pixels, BUT DX9 doesn't even let us bind the depth buffer as a texture (even if i'm not writing to it - it can be done in DX10, but we need DX9), so we need a way to emulate a depth buffer that can be bound as a texture and still act as a depth buffer.
to do this we can use floating point RTs. for the OIT we will use 2 floating point RTs: the first one is used for the first pass, the second for the second pass, then the first again and so on; these are our ping/pong depth RTs.
the reason we need this kind of ping/pong RTs is to peel the correct layer every pass.
for example:
to peel the first layer we need an empty depth buffer, then we render and store the result in our color RT
to peel the second layer, we need to use the depth buffer from the first pass, so we can ignore the pixels handled in the first layer
to peel the third layer, we need to use the depth buffer from the second pass, so we can ignore the pixels handled in the second layer (and first layer)
and so on... you can see that we don't need more than 2 depth buffers to manage all those passes.
ok, so now we can peel layers with our depth buffer RTs, but how can we do OIT for more than 8 layers? (we can only bind 8/16 textures). well, here comes the reverse depth peeling (RDP) technique: instead of peeling in front to back order, peel in back to front order, so every time you peel one layer you can immediately composite it onto the back buffer, so only one color buffer is needed.
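to make the pass structure concrete, here is a small python sketch of reverse depth peeling for a single pixel. it is only a cpu model under assumptions: the fragment list stands in for the geometry that gets re-rendered every pass, `prev_depth` stands in for the ping/pong depth RT read by the shader, and plain alpha blending is assumed for the composite. all names are hypothetical.

```python
def reverse_depth_peel(fragments, background, max_layers=16):
    """Composite one pixel's transparent fragments back to front.

    fragments  -- list of (depth, (r, g, b), alpha) in arbitrary order
    background -- (r, g, b) already in the back buffer
    """
    color = background
    prev_depth = float("inf")   # "empty" depth RT: nothing peeled yet
    for _ in range(max_layers):
        # one pass: keep the farthest fragment that is still nearer than
        # the layer peeled in the previous pass
        candidates = [f for f in fragments if f[0] < prev_depth]
        if not candidates:
            break
        depth, rgb, alpha = max(candidates, key=lambda f: f[0])
        # blend the peeled layer over what is already in the back buffer
        color = tuple(alpha * c + (1.0 - alpha) * d for c, d in zip(rgb, color))
        prev_depth = depth      # this pass's depth feeds the next pass
    return color


# usage: submission order doesn't change the result
red = (0.2, (1.0, 0.0, 0.0), 0.5)
blue = (0.8, (0.0, 0.0, 1.0), 0.5)
print(reverse_depth_peel([red, blue], (0.0, 0.0, 0.0)))
print(reverse_depth_peel([blue, red], (0.0, 0.0, 0.0)))
```

note that fragments with exactly the same depth collapse into one layer in this sketch, since only one of them survives the strict depth test.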
total memory needed could be one of these:
1. 3 RTs: two 32 bit floating point and one RGBA
2. 2 RTs: one 32 bit floating point RT shared by both depth buffers, and one RGBA
NOTE: in deferred rendering you can get rid of the RGBA buffer.
so as you can see, total memory is very low.
after all the talking, here comes a video that shows RDP in action:

simple planes show the idea; this model doesn't need per pixel sorting but it shows that the algorithm works well.

Teapot model - needs per pixel sorting

NOTE: the technique is implemented in RenderMonkey; the video shows a very simple mesh containing 8 planes so it will be easy to see the effect.
each peeled layer is rendered with a small alpha value, so you can see how the layers blend when you look through all of them (you get the white color)
some performance info: RM window 1272x856 ATI 5850 - 270+ FPS
that's it for now, cya...

Sunday, October 10, 2010

Post Anti-Aliasing

hi
recently i've added a few post effects and i thought i needed some AA solution to remove those jaggy edges.
traditional AA solutions have their quality, no question, but they have some drawbacks:
1. deferred or semi deferred renderers need special care and will need DX10+.
2. it needs more memory
3. it eats gpu power
i guess there are more but this is enough for me to consider some other solution, not the same quality but better than none.
so what i did is pretty simple:
1. run edge detection
2. run post AA, which samples 4 pixels and uses the value from 1 to know if it's an edge or not; if it's an edge i output the filtered pixel, if not i output the unfiltered pixel.
the heart of the algorithm is 1: what i did is take a few neighboring pixels and check for a big change in their depths/normals (for normals i check angles, for depth i check the gradients from the center pixel)
again, very simple but i think it does the job, it's very fast and it's generic for xbox, ps3, pc...
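the two passes can be sketched on the cpu like this (python, grayscale image as nested lists). the thresholds, the 4 tap cross pattern and the plain average are assumptions for illustration; the post doesn't give the exact kernel or values.

```python
TAPS = ((1, 0), (-1, 0), (0, 1), (0, -1))

def clamp(v, lo, hi):
    return min(max(v, lo), hi)

def is_edge(depth, normal, x, y, depth_eps=0.05, cos_eps=0.8):
    """Pass 1: flag (x, y) if depth or normal changes sharply around it."""
    h, w = len(depth), len(depth[0])
    for dx, dy in TAPS:
        nx, ny = clamp(x + dx, 0, w - 1), clamp(y + dy, 0, h - 1)
        # depth gradient measured from the center pixel
        if abs(depth[ny][nx] - depth[y][x]) > depth_eps:
            return True
        # angle between the normals, compared through their dot product
        dot = sum(a * b for a, b in zip(normal[ny][nx], normal[y][x]))
        if dot < cos_eps:
            return True
    return False

def post_aa(color, depth, normal):
    """Pass 2: average 4 neighbors on edge pixels, pass the rest through."""
    h, w = len(color), len(color[0])
    out = [row[:] for row in color]
    for y in range(h):
        for x in range(w):
            if is_edge(depth, normal, x, y):
                taps = [color[clamp(y + dy, 0, h - 1)][clamp(x + dx, 0, w - 1)]
                        for dx, dy in TAPS]
                out[y][x] = sum(taps) / 4.0
    return out


# usage: a vertical depth discontinuity between columns 1 and 2
depth = [[0.1, 0.1, 0.9, 0.9], [0.1, 0.1, 0.9, 0.9]]
normal = [[(0.0, 0.0, 1.0)] * 4 for _ in range(2)]
color = [[0.0, 0.0, 1.0, 1.0], [0.0, 0.0, 1.0, 1.0]]
print(post_aa(color, depth, normal)[0])
```

only the two columns next to the discontinuity get filtered; the other pixels come out untouched.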
here are a few screenshots that show the process

Without Post AA

Edge Detection

With Post AA

Friday, September 10, 2010

Deferred Decals

hi
this time i improved my decals system.
one of the things that makes decals look real is the way they interact with the environment lighting and how they are clipped against the world geometry...
the old decals system had good clipping but it was rendered using forward rendering, so it didn't interact with environment lights/shadows, which looks very strange when you put decals in a dark room with a small candle light.
another thing is decals on dynamic entities, which need fast clipping against those entities.
to fix the first problem, i replaced the forward renderer with a deferred one, which renders the decals into two render targets (using MRT support), one RT for diffuse color and the other for normals; then in the rendering pass i blend the decal diffuse/normals with the surface diffuse/normals to get the final image.
to fix the second one, i changed the way i create collision data: each object in the scene that needs collision detection keeps a handle to a collision tree, which is used to get the needed data when doing clipping, collisions etc...
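the diffuse/normal blend from the first fix could look like the python sketch below for a single texel. the post doesn't say which blend function is used, so this assumes a simple lerp by the decal alpha followed by renormalizing the normal; the function and parameter names are hypothetical.

```python
import math

def blend_decal(surf_diffuse, surf_normal, decal_diffuse, decal_normal, decal_alpha):
    """Blend one decal texel into one surface texel (assumed alpha lerp)."""
    def lerp(a, b):
        return tuple(x + (y - x) * decal_alpha for x, y in zip(a, b))

    diffuse = lerp(surf_diffuse, decal_diffuse)
    n = lerp(surf_normal, decal_normal)
    # guard against a zero length vector before normalizing
    length = math.sqrt(sum(c * c for c in n)) or 1.0
    return diffuse, tuple(c / length for c in n)


# usage: a dark decal with a tilted normal, half covering a white wall
print(blend_decal((1.0, 1.0, 1.0), (0.0, 0.0, 1.0),
                  (0.0, 0.0, 0.0), (1.0, 0.0, 0.0), 0.5))
```

because the blended normal feeds the normal lighting pass, the decal picks up the same lights and shadows as the surface under it.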
that's it for now, i think the results are pretty good...
here are screenshots that show it in action:

wall rendered without decal on it
the small picture in the top left shows diffuse term of lighting

wall rendered with decal on it
notice the bump map generated from the decal normals

physical entity rendered without decal on it

physical entity rendered with decal on it
note that when i "shoot" the decal at the box, i apply an impulse to the box so it moves a little

Thursday, August 5, 2010

Editor Updates

hi
in the last weeks i worked hard to make the editor useful, meaning that i can actually build a full level with physics, lights, logic and every game play feature i could and would want to have.
to support future features that i might want but can't really think of right now, i added something i call the "generic entity system". this system basically allows me to define any entity i want by specifying entity params, which also define the ui elements needed to edit those params. i also support serialization, which loads and saves those params automatically.
params can be scalars, vectors, colors, files, strings, multiple selection values etc...
with this system, i defined the main entities that allow me to build a full working level:
1. light entity - defines lights in the scene such as: point/spot/sun/dir light
2. player entity - defines the player start position when the level is up
3. physics constraints entity - defines physics constraints such as spherical/hinge/fixed etc.
4. fx entity - defines a special effect using my effect system
5. model entity - defines a geometric model to be placed in the scene
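a param driven entity definition like the one described above can be sketched in a few lines of python. this is not the engine's code: the class names, the widget names and the json format are all made up for illustration; the point is only that one param list drives the ui, the defaults and the serialization.

```python
import copy
import json
from dataclasses import dataclass

# hypothetical mapping from param kind to the ui element that edits it
WIDGETS = {"scalar": "slider", "vector": "vector_edit", "color": "color_picker",
           "file": "file_dialog", "string": "text_box", "choice": "combo_box"}

@dataclass
class Param:
    name: str
    kind: str             # one of the WIDGETS keys
    default: object
    choices: tuple = ()   # only used by "choice" params

    def widget(self):
        return WIDGETS[self.kind]

@dataclass
class EntityDef:
    class_name: str
    params: list

    def create(self):
        """New entity instance filled with the default param values."""
        entity = {"classname": self.class_name}
        for p in self.params:
            entity[p.name] = copy.deepcopy(p.default)
        return entity

    def save(self, entity):
        return json.dumps(entity, sort_keys=True)

    def load(self, text):
        """Start from the defaults and take only known params from the file."""
        entity = self.create()
        for key, value in json.loads(text).items():
            if key in entity:
                entity[key] = value
        return entity


light_def = EntityDef("light", [
    Param("type", "choice", "point", ("point", "spot", "sun", "dir")),
    Param("color", "color", [1.0, 1.0, 1.0]),
    Param("radius", "scalar", 10.0),
])
light = light_def.create()
light["radius"] = 25.0
print(light_def.save(light))
```

adding a new entity type is then just another `EntityDef`; no new ui or load/save code is needed.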

the last time i wrote about the editor i said that i added boolean operations which help me build brushes that i could build the level from. after long thinking i decided that it won't be good in the long term, as brushes are good for old school bsp and stuff like that.
the thing is, my engine doesn't make any assumption about the scene being static; this means that everything can move and the engine will handle it with no problem.
as for the visibility issue, i want to replace my portals with hierarchical z buffer occlusion culling (but this will be in another post...)

when i finish working on the scene, i save it as a .map file. this format is very similar to the old quake map format because it fits my generic entity system: all the entities are written into it with all the needed params. it's also very easy to parse (i used to load q4 maps), so it fits like a glove ;p
anyway, here is a screenshot that shows my progress:

the left toolbar is my action bar, from which i create all entities and manipulate scale/rotation/position.
the right window is the entity property grid, which allows me to modify the selected entity's params.
the bottom window is the log window, where i write all the actions and info for the user.
that's it for now, cya...

Tuesday, June 8, 2010

Multiple API Render

hi
recently i didn't post a lot because the engine was down; not everything worked because i was replacing the current render system with a new one.
my engine uses the dx9 api for rendering and when dx11 came out i wanted to integrate it. to do that i had two options:
1. replace all dx9 related code with dx11
2. create a generic render system to support any render api (dx9, dx10, dx11, opengl, whatever)
after doing some thinking i came to the conclusion that option 2 would be the best.
option 1: good if i want dx11 only (win vista+) and to support only dx11 features. if in the future ms decides to change the api and add a few shader stages, it will be a mess to change and add those.
option 2: takes some time to do, but will support anything you need now and will need in the future (it's all about how you design your api)

now i will describe a few things i think are important to know when implementing option 2 with dx9 and dx11 support (in case you need to do it for some project).

the biggest challenge was to replace my shader system. i mainly used the dx effect framework because it seemed like the best choice and is very easy to use.
one of the features i use is techniques, to apply different shaders. for example: when you generate shadow maps for a fence you want to reject pixels using an alpha value sampled from a texture, and when you render opaque surfaces you don't, so you need 2 techniques, one for each.
features that the new shader system must support are:
1. support all the shader stages we have now and new ones that will be added in the future (if any). it must also support the technique feature so i won't need to change a lot of code that already assumes techniques.
2. keep track of constants, sampler states, textures etc., so it won't set the same constant again and again when it is already set, or set a texture when it is already bound.
3. ability to compile shaders for different apis (for example: dx9 shaders - shader model 3, and dx11 shaders - shader model 5, or opengl glsl shaders)
4. easy to use - setting shader data must be easy and understandable from the names of the functions.

supporting 1: done using a file for each stage (.vs, .ps, .gs etc)
supporting 2: done using a few variables that store the last state (dirty value vars)
supporting 3: done using a script that defines every technique and its stage shaders. this script supports define macros so i can select which stage shaders and other defines to load and set depending on the api macro, for example:
#if DX11
code block
#endif
the code block will be parsed only if the current renderer is dx11.
note that this script also solved the techniques issue because it contains the techniques that define which shaders to bind when the technique is set.
when i create a shader from this script the shader is actually a multi shader, as it contains a few shaders and not one (each technique contains a shader per stage), so this script is my multi shader script definition.
supporting 4: well, this is the easiest i think...
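the "last state" tracking from supporting 2 can be sketched as a thin cache in front of the render api. this python version is only an illustration: `RecordingDevice` is a stand-in for the real api, and the method names are hypothetical, not the engine's.

```python
_UNSET = object()   # sentinel: "nothing was ever set for this key"

class RecordingDevice:
    """Stand-in for the real render api; it only records the calls it gets."""
    def __init__(self):
        self.calls = []

    def set_constant(self, name, value):
        self.calls.append(("constant", name, value))

    def set_texture(self, slot, texture):
        self.calls.append(("texture", slot, texture))

class ShaderStateCache:
    """Forwards a value to the device only when it differs from the last one."""
    def __init__(self, device):
        self.device = device
        self.constants = {}
        self.textures = {}

    def set_constant(self, name, value):
        if self.constants.get(name, _UNSET) == value:
            return                       # already set, skip the api call
        self.constants[name] = value
        self.device.set_constant(name, value)

    def set_texture(self, slot, texture):
        if self.textures.get(slot, _UNSET) is texture:
            return                       # already bound, skip the api call
        self.textures[slot] = texture
        self.device.set_texture(slot, texture)


# usage: four requests, only two of them reach the device
device = RecordingDevice()
cache = ShaderStateCache(device)
brick = object()
cache.set_constant("tint", (1.0, 0.0, 0.0))
cache.set_constant("tint", (1.0, 0.0, 0.0))
cache.set_texture(0, brick)
cache.set_texture(0, brick)
print(len(device.calls))
```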

the next challenge was state management.
from dx10 on, all render states were replaced with state objects: rasterizer, depth-stencil, blend and sampler state objects, so you can't call SetRenderState anymore.
to solve this issue, we can do one of two things:
1. keep the dx9 style render state calls (SetRenderState functions) as the interface and build the dx10+ state objects behind them
2. use state objects as the interface and emulate them on dx9 with render states.
1. in some cases you want option 1; in fact i implemented this at work, where all the developers were used to working in the SetRenderState style.
to do this, you can keep the state objects in a hash and create a state object on the fly when it's not in the hash.
2. this is what i did in my engine. 99% of the states are known in advance so there is no need to create state objects on the fly; for the other 1%, when it happens, it must be because a new object/mesh was loaded, so its material was loaded too and all its state objects were created (on the fly).
each state is tracked and managed so that a state is set only once (when needed)
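option 1 (SetRenderState style calls on top of state objects) can be sketched like this in python. the cache key is simply the sorted set of pending state values; `create_state_object` is a hypothetical callback standing in for the api call that builds a real state object.

```python
class StateObjectCache:
    """Collects SetRenderState style values and maps them to state objects."""

    def __init__(self, create_state_object):
        self._create = create_state_object   # callable: dict of states -> object
        self._pending = {}                   # values set since startup
        self._objects = {}                   # hash: state key -> state object

    def set_render_state(self, state, value):
        self._pending[state] = value

    def resolve(self):
        """Call before a draw: returns the object matching the pending states."""
        key = tuple(sorted(self._pending.items()))
        if key not in self._objects:
            # not in the hash yet, create it on the fly
            self._objects[key] = self._create(dict(self._pending))
        return self._objects[key]


# usage: the same combination of states is created only once
created = []
cache = StateObjectCache(lambda states: created.append(states) or len(created))
cache.set_render_state("ZENABLE", True)
cache.set_render_state("CULLMODE", "CCW")
first = cache.resolve()
cache.set_render_state("CULLMODE", "NONE")
second = cache.resolve()
cache.set_render_state("CULLMODE", "CCW")
print(first, second, cache.resolve(), len(created))
```

a real version would probably keep one such cache per state object type (rasterizer, depth-stencil, blend, sampler) so a change in one group doesn't create objects for the others.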

another thing that took a lot of time was converting the dx9 shaders into shaders that work on both dx9 and dx11, using a file for each stage.
instead of duplicating the files and rewriting them using dx11 functions, i added a render system define (DX9 for directx9, DX11 for directx11 etc) which is added automatically to each shader i compile, and i use it to know which function to use, for example:
#if DX9 == 1
col = tex2D(diffuseSampler, UV);
#else
col = diffuseTexture.Sample(diffuseSampler, UV);
#endif
when i need to create a specific shader stage for DX11 only (for example: using tessellation), i write the shaders (add domain, hull shader files) and add their filenames to the multi shader script (the same can be done for optimizing shaders to use the new dx11 functions, for example: blur and pcf can be optimized using the gather function)
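to show what the api define does to a source file, here is a tiny python filter that keeps only the lines for the active render api. it is a toy under assumptions: it handles only flat `#if NAME`, `#if NAME == 1`, `#else` and `#endif` lines (no nesting, no expressions); the real work is done by the shader compiler's preprocessor or the script parser.

```python
def select_api_blocks(text, api):
    """Return the lines that survive when only the `api` macro is defined."""
    out = []
    keep = True
    for line in text.splitlines():
        tokens = line.split()
        if tokens[:1] == ["#if"]:
            # "#if DX9" and "#if DX9 == 1" both test the macro named in tokens[1]
            keep = len(tokens) > 1 and tokens[1] == api
        elif tokens[:1] == ["#else"]:
            keep = not keep
        elif tokens[:1] == ["#endif"]:
            keep = True
        elif keep:
            out.append(line)
    return out


# usage: the sampling snippet from above, resolved for each renderer
shader = """#if DX9 == 1
col = tex2D(diffuseSampler, UV);
#else
col = diffuseTexture.Sample(diffuseSampler, UV);
#endif"""
print(select_api_blocks(shader, "DX9"))
print(select_api_blocks(shader, "DX11"))
```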

for now, the render system supports only the dx9 and dx11 render paths; opengl can be added by implementing the renderer functions.
note that i kept the dx9 renderer path so i know everything was implemented right in the dx11 renderer; this is mainly to validate every function i wrote in the dx11 renderer.
everything rendered with the dx11 renderer (using only dx9 features) must look the same as with the dx9 renderer.

that's it for now, i hope you will find this info useful. if you have any questions on topics i didn't cover or forgot to cover, feel free to ask...
cya

Monday, May 10, 2010

OIT - Order Independent Transparency #2

hi
recently i've been very busy converting my engine into a cross renderer one supporting both dx 9 and 11, and maybe gl in the future, so until i finish it and have some nice features to show, here is a complex & large model (50k) using OIT from the last post.
note that this scene isn't complex compared to the simple 31 planes in terms of solving the OIT problem (because the fragment counts in the linked lists are low => less pixel work).
for those who are interested in fps: 1024 x 768 with ati 5850
1. planes scene: avg 180
2. this scene: avg 230

N dot L

Output of OIT

Fragments Count

Wednesday, March 31, 2010

OIT - Order Independent Transparency

hi
recently i've been checking dx11 features and one of the nice things it has is read/write/structured buffers (unlike shader model 4, which has read only buffers).
a read/write buffer (or RWBuffer) is just what it sounds like: a buffer that we can read from and/or write into.
a structured buffer (or SB) is a buffer that contains elements (c style structs), for example:
struct MyStruct
{
float4 color;
float4 normal;
};

then, we can use it like this (writing needs the read/write version, RWStructuredBuffer):
RWStructuredBuffer<MyStruct> mySB;
mySB[0].normal = something;
New Resource Types
an RWSB has another nice feature: it has an internal counter whose value we can increment/decrement.
those buffers can be used to create very nice effects, and really fast ones (if used wisely).
this time i'm using them to solve an old problem in computer graphics:
order independent transparency (or OIT). if you don't know what i'm talking about then i suggest you google it...
in the dx sdk you can find a sample called OIT. this sample uses a compute shader to solve this problem; the sample shows you how you should NOT use a compute shader! it just shows you that it's possible... this sample runs so slow that you are better off doing a software solution ;)
one thing you should remember is: if you write your pixel shader well enough, you will probably get better performance than with a compute shader (with some exceptions of course)

anyway, the question i want to answer is: how can we draw transparent surfaces without worrying about the order?
there are a few solutions to this but every one of them has some drawback:
1. depth peeling - very heavy, even if using the dual version.
reverse depth peeling is more friendly as it peels layers from back to front, so we can blend immediately with the backbuffer and only use one buffer for it.
2. stencil routed k buffer - say goodbye to your stencil, and it allows only up to 16 fragments to be blended; for some of you it's enough, for others it's not an option.
3. the dx11 method, based on the idea presented at gdc 2010 by ati.
i will focus on 3 as it's what i implemented and it doesn't suffer from the issues the other methods have.
the idea can be simplified into 2 main steps:
1. build - in this step we build a linked list for every screen pixel that contains all the fragments that "fall" on that pixel's screen position. i use an RWBuffer to store the linked list head for each pixel, and another 2 RWBuffers to store the fragment attributes (color, depth and the "pointer" to the next fragment in the list); those attribute buffers need to be big enough to contain all the fragments that land on the same screen position.
in this step i'm also using the structured buffer internal counter to get a unique "id" for each fragment and use it to write/read the fragment attributes.

2. resolve - this is where we do the magic: here i just traverse the linked list built in step 1 and sort the fragments, then i manually blend the fragments.
note that you need to sort the fragments in place! that means your linked list would be accessed a lot, or... you can simply fill a bunch of registers and sort them instead.
tip: use the [earlydepthstencil] attribute to skip pixels that don't need to be processed.
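the build and resolve steps can be modeled on the cpu with plain arrays. the python sketch below is single threaded, so the counter increment and the head swap are ordinary assignments; on the gpu both have to be atomic operations. the buffer layout and names are assumptions for illustration, and plain alpha blending is assumed for the resolve.

```python
class LinkedListOIT:
    END = -1   # marks "no fragment" / end of list

    def __init__(self, width, height, max_fragments):
        self.width = width
        self.head = [self.END] * (width * height)   # list head per pixel
        self.color_depth = [None] * max_fragments   # attribute buffer: (rgba, depth)
        self.next = [self.END] * max_fragments      # attribute buffer: next "pointer"
        self.counter = 0                            # stands in for the buffer counter

    def add_fragment(self, x, y, rgba, depth):
        """Build step: called once per transparent fragment, in any order."""
        if self.counter >= len(self.next):
            return False                            # out of storage, fragment dropped
        index = self.counter                        # unique id for this fragment
        self.counter += 1
        pixel = y * self.width + x
        self.color_depth[index] = (rgba, depth)
        self.next[index] = self.head[pixel]         # link to the previous head
        self.head[pixel] = index                    # this fragment is the new head
        return True

    def resolve(self, x, y, background, max_sorted=32):
        """Resolve step: gather, sort far to near, blend over the background."""
        local = []                                  # plays the role of the registers
        node = self.head[y * self.width + x]
        while node != self.END and len(local) < max_sorted:
            local.append(self.color_depth[node])
            node = self.next[node]
        local.sort(key=lambda f: f[1], reverse=True)
        color = background
        for (r, g, b, a), _ in local:
            color = tuple(a * s + (1.0 - a) * d for s, d in zip((r, g, b), color))
        return color


# usage: two fragments on one pixel, submitted nearest first
oit = LinkedListOIT(2, 2, max_fragments=8)
oit.add_fragment(0, 0, (1.0, 0.0, 0.0, 0.5), depth=0.2)
oit.add_fragment(0, 0, (0.0, 0.0, 1.0, 0.5), depth=0.8)
print(oit.resolve(0, 0, (0.0, 0.0, 0.0)))
```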

here are screenshots that show it in action using 32 registers (max of 32 fragments per screen pixel)

31 transparent surfaces

debug mode that shows the fragment count in each pixel's linked list
black means 0 fragments, white means 32 fragments (num_fragments/max_fragments)

yes i know, these surfaces can be easily sorted and rendered correctly without the GPU, but here i'm not sorting and not worrying about the order! i just render them as one batch. maybe next time i will show complex geometry... ;)

Monday, March 15, 2010

Real-Time Destruction/Fractures

hi
over the last two weekends i've been working on real time destruction/fractures of 3d models. i'm always amazed by the havok destruction demonstrations and even more by red faction guerrilla.
unlike havok physics/animation, havok destruction isn't free, but i really wanted something like that in my engine.
after a few days of thinking and googling, i came up with a few options on how to implement it:
1) get into some really nasty physics using a fem model: build tetrahedrons that fit the model and use them to know which part i need to break.
2) voxelize the model and use the voxels to know which part i need to break.
3) clip the model by a plane.
every option needs to take care of two things:
1) the render model, which is the model that you render and which contains valid tnb vectors, mapping, positions etc.
2) the physics model which you use to do collision with.

the option i implemented is 3, because it is very fast to implement, doesn't need to preprocess the mesh like option 1 and does not restrict you to a fixed fracture size like option 2.
also, it doesn't take extra memory to store data per model; everything is computed in real time, so if you are not breaking the model it is just like any other model in the scene.
if you break it, it will take the memory needed to store the fracture data and that's it.
as you all know (or don't), nothing comes for free, more memory means less cpu work and less memory means more cpu work.
this is always the dilemma when implementing an algorithm, but in this case it's not, because you can't really preprocess the fractures: you don't really know where the player is going to shoot the model.
some games generate fractures in advance and when you shoot the model, the closest fractures are detached from the parent model. in my opinion it doesn't do the job like the real thing and again, memory could be huge if you split the model into small fractures.
another thing to take into account is how much time you have to implement it; if you don't really have a limit i think you can find some cool ideas...
anyway, option 3 can be broken down into a few steps:
1. determine the model that you hit, call it M (your physics engine should do the job)
2. split M by a plane into two parts, negative and positive, call them N and P
3. close the holes of N and P using a triangulation algorithm (ear clipping or something similar); if you assume convex shapes you can simply build a fan to close those holes very fast.
4. apply uv mapping to the faces that closed the N and P holes.
5. build physics shapes for N and P.
that's it, the catch is to do those steps fast!
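the core of step 2 is splitting each face by the plane. here is a python sketch for one convex polygon; it is an illustration under assumptions (plane given as dot(n, p) + d = 0, points within `eps` of the plane go to both sides), not the engine's code. the intersection points it produces are also the vertices you would later fan together to close the hole in step 3.

```python
def split_polygon(points, plane_normal, plane_d, eps=1e-6):
    """Split a convex polygon by the plane dot(n, p) + d = 0.

    Returns (negative, positive) vertex lists in the original winding order.
    """
    def dist(p):
        return sum(n * c for n, c in zip(plane_normal, p)) + plane_d

    negative, positive = [], []
    for i, a in enumerate(points):
        b = points[(i + 1) % len(points)]
        da, db = dist(a), dist(b)
        if da <= eps:
            negative.append(a)
        if da >= -eps:
            positive.append(a)
        # the edge a->b crosses the plane: add the intersection to both sides
        if (da < -eps and db > eps) or (da > eps and db < -eps):
            t = da / (da - db)
            hit = tuple(x + t * (y - x) for x, y in zip(a, b))
            negative.append(hit)
            positive.append(hit)
    return negative, positive


# usage: cut a 2x2 square in half with the plane x = 1
square = [(0.0, 0.0, 0.0), (2.0, 0.0, 0.0), (2.0, 2.0, 0.0), (0.0, 2.0, 0.0)]
n_part, p_part = split_polygon(square, (1.0, 0.0, 0.0), -1.0)
print(n_part)
print(p_part)
```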
here is a video to show what i've done so far. note that in the video step 4 is disabled and you see the same color (all vertices map to 0,0); that's because i need to add an option to give the faces from step 3 a different material, so they will look good and not use the #1 texture i used in the video

Wednesday, February 24, 2010

DX11 Hardware Tessellation

hi
recently i got a new job and i got a new radeon 5850 which supports dx11, so i started to check out those nasty new dx11 features, and let me tell you: hw tessellation is COOOOL.
a little info: dx11 tessellation works with 3 new stages:
1) the hull shader (aka HS) - gets [1..32] input control points of the patch and is responsible for outputting [1..32] new control points; it also outputs constants which control how much to subdivide each patch.
2) the tessellator (fixed-function) - gets the output of the hull shader as input and generates the new points; you have no control over this stage.
3) the domain shader (aka DS) - generates the surface geometry from the transformed control points from the hull shader and the UV coordinates. the DS runs once for each point generated in step 2; the input of the DS is the bary coords of the point on the patch, so you can easily interpolate vectors other than position and uv.
the HS supports 4 different partitioning methods, output topology and a few more things (check out the dx sdk if you are interested in the tiny details)
so why do we need it if we already have the geometry shader? well, the geometry shader is programmable and generic in a way that you can do whatever you want in it, such as point sprites, fur, object silhouettes and more, but it's not free and not optimized for things like tessellation. in dx11 you have a dedicated unit for doing tessellation so the speed can't even be compared with the gs.
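what the domain shader does with the bary coords can be shown with a few lines of python. this sketch assumes a triangle patch with 3 control points and plain linear interpolation (no displacement); the function name is hypothetical.

```python
def domain_interpolate(control_points, bary):
    """Interpolate a per control point vector at barycentric coords (u, v, w)."""
    size = len(control_points[0])
    return tuple(sum(w * p[i] for w, p in zip(bary, control_points))
                 for i in range(size))


# usage: the same call works for positions, normals, uvs...
positions = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0)]
print(domain_interpolate(positions, (0.25, 0.25, 0.5)))
```

the tessellator decides how many (u, v, w) points exist; the domain shader only evaluates the surface at each of them.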
here is a video that shows dx11 tessellation with adaptive tessellation:

note that the geometry passed in is a simple 4 vertex quad.

Saturday, January 23, 2010

Atmospheric scattering

hi
this time i'm showing atmospheric scattering using o'neil's atmospheric scattering from gpu gems 2.
to implement it correctly i started with a cpu based solution which computes all the nasty integrals on the cpu and stores them in a texture; i used this cpu solution so i could debug and fix issues with the integral computations.
after everything worked on the cpu i converted it to a gpu version and use that as the default.
this texture is then mapped onto a sky dome where the camera eye is at the origin.
clouds are procedurally generated using a few noise textures; they also change their color based on the sun direction.
for now i'm not taking sun lighting or shadowing into account; this is a simple solution so the sky won't look empty.
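as an example of the kind of integral the cpu version has to evaluate, here is a python sketch of the optical depth along a ray through an atmosphere whose density falls off exponentially with height. it is only one building block of the scattering equations, integrated with a simple midpoint rule; the parameter names and the sample count are assumptions, not values from the engine.

```python
import math

def optical_depth(start, direction, length, planet_radius, scale_height, samples=16):
    """Integrate exp(-height / scale_height) along a ray (midpoint rule).

    start and direction are 3d tuples with the planet center at the origin;
    direction must be normalized.
    """
    step = length / samples
    total = 0.0
    for i in range(samples):
        t = (i + 0.5) * step
        p = [s + d * t for s, d in zip(start, direction)]
        height = math.sqrt(sum(c * c for c in p)) - planet_radius
        total += math.exp(-height / scale_height) * step
    return total


# usage: looking straight up from the ground through a thin atmosphere
print(optical_depth((0.0, 10.0, 0.0), (0.0, 1.0, 0.0), 0.25, 10.0, 0.25))
```

for a vertical ray the exact answer is scale_height * (1 - exp(-length / scale_height)), which makes a handy check when debugging the numerical version.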
here is a short video to show it in action: