Welcome to part three of the DirectX native C++ tutorials. In this part we’re going to look at drawing a set of 3D triangles. These triangles will make up a cube and the camera will be positioned inside of it (like you’re standing in a square room). While the last tutorial technically had a 3D triangle – after all, we specified z coordinates for every vertex – the scene was not truly 3D as far as DirectX is concerned. Try changing the z coordinate for any of those vertices and you’ll see what I mean – the triangle still doesn’t look like it’s in perspective.
Requirements
The tutorial sample code was built and tested with Visual Studio Express 2008. However, using the code “as-is” in Visual Studio 2005 should work too.
You’ll need the DirectX 9 SDK for compiling the sample code. Use this link for downloading: click here.
Tutorial Source and Project Files
Download the project files, binaries, and source with this link:
Getting Started
DirectX needs three matrices in order to render a true 3D scene – a world transformation matrix, a camera view matrix, and a projection matrix. If you’re unfamiliar with mathematical matrices, read the next section for a brief introduction. Those already seasoned in this subject can just skip it.
Introduction to Matrices
Matrices are basically mathematical tables used in linear algebra. They look like a grid and contain numbers. An example is shown below:
This is known as a 4-by-4 matrix since it has four columns and four rows. It doesn’t necessarily have to be a square – an m-by-n matrix is quite common. However, the world, view, and projection matrices are all squares (4×4).
Just like regular numbers, matrices can be multiplied together or added to one another. A matrix like the one above is known as an identity matrix because of this special property: some matrix (let’s call it “M”) multiplied by an identity results in the same M matrix.
For this tutorial I’m going to show you how to multiply a m-by-m matrix by a m-by-1 matrix. If you’d like to know about the other rules for matrix addition and multiplication, search for “matrix mathematics” on Wikipedia. For this tutorial, however, it would only be helpful to know the procedure behind multiplying a square matrix by a one column matrix. Here’s how it would be done given a 2-by-2 and 2-by-1 matrix:
A 3-by-3 matrix multiplied by a 3-by-1 matrix would look like this:
For the sake of discussion, let’s call the bigger matrix “A” and the one column matrix “B.” The values in each row of A are multiplied by the values in B and the results are summed making a matrix with the same dimensions as B.
So why is this useful? Well, for one, we can represent a lot of common formulas in a table format as opposed to writing them out long-hand. For example, the regular way of expressing the 2D point rotation formula is this:
Now if we put the sine and cosine coefficients into a 2-by-2 matrix and put the x and y coordinates into a 2-by-1 matrix, we can represent the above formula like this instead:
Try multiplying these two matrices as described above and you’ll find that it evaluates to exactly the same as the long-hand version. In other words, we can rotate any 2D point just by multiplying it with this 2-by-2 matrix! The same holds true for 3D points – we multiply them against a 3-by-3 matrix to get the newly rotated point (however, the 3-by-3 rotation matrix is much more complicated than its 2D counterpart as you’ll see shortly).
The reason that matrices are used in many 3D graphics applications, and what makes them unique, is the fact that fairly complex mathematical calculations can be simplified with just one or two matrix computations. For example, if you wanted to rotate a line segment around the z-axis in 3D space, you could do it via a brute-force method – apply the same long-hand trigonometric rotation equations to each point:
Note that the z component stays the same. To take the resulting vector and then rotate it around the y-axis, you’d use the same equations but apply them to the x and z components of each point:
Note that this time the y component stays the same and that x’ (the result of rotation #1) is used to calculate the new position instead of the original x. To do a third rotation, around the x-axis, the same equations would apply except Zrot and Yrot would be used as inputs (Xrot would stay the same).
Now here’s the drawback to doing it brute-force versus using a matrix: we’re always rotating around one of the three coordinate system axes. What if we want to rotate a point around some arbitrary vector instead of the x, y, or z axis? Using the long-hand equations to do that gets extremely complicated. However, there is one kind of matrix you can construct that, if multiplied by a 3-by-1 matrix representing a point, will result in a new 3-by-1 matrix containing values for the rotated point!
In case you’re wondering, this special type of matrix is called an angle/axis matrix and it looks like this:
It may look scary at first, but you really only need to plug in two things: the components for vector V (which is the vector you want to rotate the point around) and Θ (the angle you want the point rotated). Note, however, that vector V must be normalized (it’s length must equal 1) for this matrix to work.
There’s another special property to this matrix – after plugging in your angle/vector combination and calculating the number values for each element, those numbers actually represent vectors themselves – each column corresponds to a vector (so three total). These three vectors have one special property – they’re always perpendicular to each other. That’s right, no matter what angle or V vector you plug in, you’ll always end of with three vectors that are aligned just like the x, y, z coordinate system axes. But there’s more! If you were to rotate the coordinate system axes around vector V by your angle, the new coordinate system axis vectors would match the vectors contained in each column of this matrix! Can you see now what this matrix is really doing?
It’s called an angle/axis matrix because you’re actually calculating how the coordinate system axes would be oriented if you rotated them by a given angle around a certain vector. Considering these properties, you could rewrite the above matrix like this:
The X vector, column one, holds the new x-axis after rotation. Column two, vector Y, holds the new y-axis after rotation. And as you can probably guess, column three, vector Z, holds the new z-axis after rotation.
Rotation is not the only type of operation that can be done with matrices – you can also translate a point (move it to a new position). A typical translation matrix looks like this:
The Tx, Ty, and Tz components in the fourth column determine how much to move along the x, y, and z axes respectively. Since this is a 4-by-4 matrix, you need to change the dimensions of the point matrix into 4-by-1. You do this by just inserting 1 into the fourth row. In other words, the entire operation would look like this:
Now you’re about to see the second beauty of matrices and why they’re really used in 3D graphics: if you want to rotate and translate a point, you can actually combine the rotation matrix with the translation matrix! The new matrix becomes:
With just this one matrix you can specify not only how much to rotate a point, but also how to translate that point as well! A matrix which combines rotation and translation is called a “transformation” matrix. As you can probably guess, this type of matrix moves a 3D world around the player.
Now that you’ve had an introduction to matrices and how a transformation matrix works, we can move on to the rest of the tutorial.
About the Code
As usual we’re going to build off of the example program from the last tutorial. Details about creating a window, setting up DirectX, and drawing a 2D triangle will not be covered here. If you need to know about those things, then I suggest reading the previous tutorial before moving on. Also, the example program for this segment uses DirectInput to capture keystrokes from the user. I won’t be covering details on DirectInput, but those parts of the code are clearly marked.
Also, unlike some of the previous posts, I won’t include a full listing – the code has simply grown too large. However, I will go over all the pertinent parts so you can learn the most from this tutorial. Anything that was changed or modified will be shown.
Going from 2D to 3D
The code in the last tutorial drew a simple 2D triangle on the screen. Even though we specified a z-coordinate for each vertex of that triangle, it was still not really 3D. Changing the z-coordinate for any of the vertices still didn’t draw the triangle in perspective. Why is this?
The vertex type used was D3DFVF_XYZRHW – the type used for pre-transformed vertices. DirectX expects these vertex types to have already gone through a 3D transformation pipeline. Once through the pipeline, z-coordinates become irrelevant – only screen coordinates (x and y) are used.
In order to render a true 3D scene, we need to add and/or change two things:
-
Feed the following matrices to DirectX so it knows how to transform the vertices from world space to the screen: world transformation matrix, view matrix, and projection matrix.
- Change the vertex types to D3DFVF_XYZ – these vertices are untransformed and relative to world space.
Adding the Transformation Matrices
Before getting into the code, let’s briefly go over the purpose of each matrix. As the last section mentioned, DirectX needs the following three matrices:
- The world transformation matrix.
- The view transformation matrix.
- The projection transformation matrix.
The world transformation matrix tells DirectX how to rotate, translate, and possibly scale 3D model coordinates into world space. In a typical 3D video game, polygonal models are the objects which tend to be reused in different parts of the world (weapons, bonus items, enemy players, monsters, etc.). Their vertices are defined relative to their own local coordinate system. The world transformation matrix converts these local coordinates into absolute positions in the 3D world (hence the name).
The view matrix (sometimes called the camera matrix, or camera-view matrix) tells DirectX how to transform world coordinates into camera coordinates (basically, where are you and what are you looking at). The world coordinates become relative to the camera axes after this matrix is applied.
If you have some experience in 3D graphics programming, don’t confuse the world matrix with the view matrix. Many tutorials and books about graphics sometimes refer to the world matrix and view matrix as one of the same. This is due to a certain optimization that can be done where you combine the world matrix and the view matrix into one master transformation matrix resulting in just one matrix update per frame instead of two.
The projection matrix tells DirectX about your 2D viewport into the 3D world. It holds the following information about the screen and camera: field of view, aspect ratio of the screen, how far the camera can see (far clipping plane), and how near the camera can see (near clipping plane).
The process of creating a world, view, and projection matrix isn’t difficult – if you use the Direct3D Extensions Utility library. Among other things, this library contains a few useful functions that return fully populated matrices given certain parameters. For example, provide an angle for the D3DXMatrixRotationY() function, and it will return a world transformation matrix that does rotation around the y-axis. If you ever need to calculate these matrices yourself, without the library’s help, you can refer to DirectX’s SDK documentation – it contains the layout and formulas for each matrix.
The order in which you feed these matrices to DirectX is irrelevant – although it internally applies them to the scene in the same order (world -> view -> projection). Since the order doesn’t matter, we set the projection matrix at initialization time and then forget about it. This matrix would only need to change if the screen’s aspect ratio, field of view, or clipping planes were altered.
We add the following code in order to create and set the projection matrix:
D3DXMATRIXA16 ProjectionMatrix;
D3DXMatrixPerspectiveFovLH(&ProjectionMatrix, PI/4, 1.0f, 1.0f, 500.0f);
g_pDirect3D_Device->SetTransform(D3DTS_PROJECTION, &ProjectionMatrix);
The call to D3DXMatrixPerspectiveFovLH() creates a projection matrix given the following values:
- Field of view (in radians).
- Aspect ratio.
- Z-value of the near clip plane.
- Z-value of the far clip plane.
These values go in parameters two through five, respectively. The first parameter holds a pointer to the matrix object which will receive the result.
The last line calls SetTransform(). This function is used to feed DirectX all the different types of matrices. The first parameter distinguishes which matrix type you want to set. D3DTS_PROJECTION indicates a projection matrix is contained in the second parameter.
Next we create the world transformation matrix. This is also set at initialization time. Why? Our example program has no 3D polygonal models and therefore doesn’t need to use it. As such, we simply send DirectX an identity matrix so it doesn’t affect any of the math in the 3D pipeline. Here’s what that code looks like:
D3DXMATRIXA16 WorldTransformMatrix;
D3DXMatrixIdentity(&WorldTransformMatrix);
g_pDirect3D_Device->SetTransform(D3DTS_WORLD, &WorldTransformMatrix);
We initialize the world transform matrix to an identity with the D3DXMatrixIdentity() function and then call SetTransform() just as we did with the projection matrix. The first parameter, D3DTS_WORLD, tells DirectX to use this matrix as the world transform. One thing to note: instead of calling D3DXMatrixIdentity(), we could have easily set the matrix manually through the object’s constructor:
D3DXMATRIXA16 WorldTransformMatrix(1, 0, 0, 0,
0, 1, 0, 0,
0, 0, 1, 0,
0, 0, 0, 1);
I used the function call instead for clarity, but both methods are equivalent.
Now for the view/camera matrix. This one we must set on every frame since the direction of the camera can change at any time. If you take a look at the code, you’ll notice a function named CalcMatrices() – this is where the matrix is being populated. That code looks like:
void CalcMatrices(void)
{
D3DXMATRIXA16 ViewMatrix;
// set the view matrix
D3DXVECTOR3 EyePoint(g_Camera.Location.x,
g_Camera.Location.y,
g_Camera.Location.z);
D3DXVECTOR3 LookAt(g_Camera.Location.x+cos(g_Camera.Rotation),
g_Camera.Location.y,
g_Camera.Location.z+sin(g_Camera.Rotation));
D3DXVECTOR3 UpVector(0.0f, 1.0f, 0.0f);
D3DXMatrixLookAtLH(&ViewMatrix, &EyePoint, &LookAt, &UpVector);
g_pDirect3D_Device->SetTransform(D3DTS_VIEW, &ViewMatrix);
}
In this function we must give DirectX two points and something called an “up” vector. The first point is the camera’s position – its location in 3D space. The second is any point along the camera’s direct line-of-sight – in other words, any point that, if you were to look through the camera, would be centered in the camera’s field of view. DirectX simply uses these two points to calculate the camera’s view vector – what direction the camera is pointing. Since the look-at point can be any point along the camera’s line-of-sight, I just use the sine and cosine functions to calculate some point directly in front of the camera. The y-coordinate doesn’t change because, in this example program, I’ve tried to keep things simple and not allowed the camera to look up or down (only side-to-side). The “up” vector defines which direction points directly up from the camera’s point of view.
You may be wondering why the up-vector is needed if we already have two points describing the camera’s direction. Here’s why: suppose that the camera is looking straight ahead, directly down the z-axis. Now turn the camera up-side down (in other words, rotate it 180 degrees around the z-axis). Did the camera’s viewing direction change? Nope. What if the camera was turned side-ways (or rotated 90 degrees around the z-axis)? Even then, the viewing direction doesn’t change (it’s still looking down the z-axis). So using just two points gives enough information to know where the camera is pointing, but it doesn’t describe the “roll” of the camera relative to itself. In this tutorial I haven’t allowed the camera to roll over, so the up-vector stays at (0, 1, 0) – in other words, the camera can’t look up or down.
Once we’ve created the camera point, look-at point, and up-vector, we pass all of them to D3DXMatrixLookAtLH() – a function that calculates a view/camera matrix and puts it into the first parameter, ViewMatrix.
Finally we call SetTransform() to feed DirectX our newly calculated matrix. The first parameter, D3DTS_VIEW, tells DirectX to use this matrix as the view/camera matrix. The second parameter is a pointer to the matrix itself.
The Rendering Loop
Now that all the matrices have been set, we’re ready to tackle the main rendering loop. Just as in the last tutorial, we begin by setting up the vertex format structure:
struct D3DVERTEX {float x, y, z; DWORD color;} vertices[NUM_VERTICES];
This time we don’t need the “rhw” component. We’re feeding DirectX non-transformed vertices so therefore a “w” component doesn’t apply. DirectX just needs the (x, y, z) components and the color of each vertex.
We then fill in the “vertices” array. Here’s an example of the first point:
vertices[0].x = -64.0f*3;
vertices[0].y = -64.0f;
vertices[0].z = 0;
vertices[0].color = FRONT_WALL_COLOR;
<…more vertices here…>
Once we’ve filled in all the vertices, we must feed them to DirectX. The code which feeds DirectX is exactly the same as before (see last tutorial) with one exception – the vertex format is different.
LPDIRECT3DVERTEXBUFFER9 pVertexObject = NULL;
void *pVertexBuffer = NULL;
if(FAILED(
g_pDirect3D_Device->CreateVertexBuffer(NUM_VERTICES*sizeof(D3DVERTEX), 0,
D3DFVF_XYZ|D3DFVF_DIFFUSE, D3DPOOL_DEFAULT, &pVertexObject, NULL)))
return;
if(FAILED(pVertexObject->Lock(0, NUM_VERTICES*sizeof(D3DVERTEX), &pVertexBuffer, 0)))
return;
memcpy(pVertexBuffer, vertices, NUM_VERTICES*sizeof(D3DVERTEX));
pVertexObject->Unlock();
The vertex type, D3DFVF_XYZRHW, gets replaced with D3DFVF_XYZ because we’re feeding in (x, y, z) components instead of (x, y, z, 1/w). I’m not going to go into the inner-workings on this code because it was already covered in the last tutorial.
Now we’re ready to render the scene. Again, this code looks much like the last tutorial except the vertex format has changed, plus, we’re now calculating the transformation matrices before rendering.
g_pDirect3D_Device->Clear(0, NULL, D3DCLEAR_TARGET|D3DCLEAR_ZBUFFER, D3DCOLOR_XRGB(0, 0, 0), 1.0f, 0); // clear frame
if(SUCCEEDED(g_pDirect3D_Device->BeginScene()))
{
CalcMatrices();
g_pDirect3D_Device->SetStreamSource(0, pVertexObject, 0,
sizeof(D3DVERTEX));
g_pDirect3D_Device->SetFVF(D3DFVF_XYZ|D3DFVF_DIFFUSE);
g_pDirect3D_Device->DrawPrimitive(D3DPT_TRIANGLELIST, 0, NUM_VERTICES/3);
g_pDirect3D_Device->EndScene();
}
g_pDirect3D_Device->Present(NULL, NULL, NULL, NULL);
pVertexObject->Release();
This time around, when we call SetFVF(), we supply the (x, y, z) vertex format instead of (x, y, z, 1/w) using the D3DFVF_XYZ constant as opposed to D3DFVF_XYZRHW. Also, we call CalcMatrices() before drawing any primitives so DirectX knows which transformations to apply to the scene. The rest of the code behaves exactly the same as the last tutorial, so I’m not going to cover it again.
Handling Keyboard Input
The tutorial program uses DirectX/DirectInput to capture keyboard actions, but I’m not going to cover how DirectInput works here. Instead, I’m going to show how the program reacts to different key presses.
The function HandleKeys() is called on every frame and is responsible for updating the global camera position tracking variables depending on the keyboard state.
void HandleKeys(void)
{
float RotationStep = PI/175.0f;
float WalkStep = 3.0f;
//——————————————-
// adjust the camera position and orientation
//——————————————-
if(dx_keyboard_state[DIK_UP]&0×80) // moving forward
{
g_Camera.Location.x += cos(g_Camera.Rotation)*WalkStep;
g_Camera.Location.z += sin(g_Camera.Rotation)*WalkStep;
}
if(dx_keyboard_state[DIK_DOWN]&0×80) // moving backward
{
g_Camera.Location.x -= cos(g_Camera.Rotation)*WalkStep;
g_Camera.Location.z -= sin(g_Camera.Rotation)*WalkStep;
}
if(dx_keyboard_state[DIK_LEFT]&0×80) // look left
{
g_Camera.Rotation += RotationStep;
if(g_Camera.Rotation > PI*2)
g_Camera.Rotation = g_Camera.Rotation-PI*2;
}
if(dx_keyboard_state[DIK_RIGHT]&0×80) // look right
{
g_Camera.Rotation -= RotationStep;
if(g_Camera.Rotation < 0)
g_Camera.Rotation = PI*2+g_Camera.Rotation;
}
if(dx_keyboard_state[DIK_W]&0×80) // strafe left
{
float SideStepAngle = g_Camera.Rotation+(PI/2.0f);
if(SideStepAngle > PI*2) // handle wrap-around
SideStepAngle = SideStepAngle-PI*2;
g_Camera.Location.x += cos(SideStepAngle)*WalkStep;
g_Camera.Location.z += sin(SideStepAngle)*WalkStep;
}
if(dx_keyboard_state[DIK_E]&0×80) // strafe right
{
float SideStepAngle = g_Camera.Rotation-(PI/2.0f);
if(SideStepAngle < 0) // handle wrap-around
SideStepAngle = PI*2+SideStepAngle;
g_Camera.Location.x += cos(SideStepAngle)*WalkStep;
g_Camera.Location.z += sin(SideStepAngle)*WalkStep;
}
}
Walking forward and backward is a simple matter of updating the camera’s (x, z) position. We don’t update the y-coordinate because there’s no way to “jump” or float in the air for this example. If we were using a polar coordinate system (where point positions are based on angle and ray length), moving the camera forward or backward would be easy – just increase or decrease the ray length. But since we’re in the rectangular coordinate system (where point positions are determined by x, y, and z), we must convert this increase or decrease of ray length into the (x, z) equivalent. We do this with the sine/cosine functions and then add the result to the camera’s last position in order to get the new position (or subtract from the camera’s last position if we’re moving backwards). I’m not going to get into the basics of simple trigonometry, but if you want a detailed explanation of how these trajectory formulas work, email me through the Contact page.
Strafing, or side-stepping, is done just like moving forward or backwards except the angle used in the calculation is 90 degrees plus or minus the camera’s real angle. If you’re moving left the angle is plus 90, and if you’re moving backwards, the angle is minus 90.
For looking left or right we just add or subtract from the camera’s current angle to get the new angle. However, we must check to make sure the angle hasn’t “overflowed” or gone below zero. After all, a circle has a maximum of 360 degrees – so rotating by 370 degrees is really the same as rotating by just 10 degrees. Same goes for the other side – rotating by negative 10 gets you to the same point as +350.
All this updating of the camera’s global position/orientation object is eventually used by the CalcMatrices() function in order to create a new view matrix on every frame. In other words, DirectX always has the most current camera position and renders the room based on the camera position controlled by the keyboard.
Screenshot
Here’s what the output looks like:
Conclusion
Wow, this post ended up being _way_ longer than expected! Anyway, if you have any questions about any of the material covered, please post a comment or send me an email through my Contact page.
Thanks for reading!
-Greg Dolley