GPU 2014 — Homework #1

GPU Programming for Video Games

Summer 2014

Homework #1: “Roll Your Own” 3-D Rendering

Due: Thursday, June 5, 23:59:59 (via T-square)


Late policy: The homework will be graded out of 100 points. We will
accept late submissions up to Friday, June 6 at 23:59:59; however,
for every day that it is overdue,
we will subtract 30 points from the total.

We understand that sometimes multiple assignments hit at once, or other
life events intervene, and hence you have to make some tough choices. We’d
rather let you turn something in
late, with some points off, than have a “no late assignments
accepted at all”
policy, since the former encourages you to still do the assignment
and learn something from it, while the latter just grinds
down your soul. The
somewhat late penalty is not
intended to be harsh – it’s intended to
encourage you to get things in relatively on time (or just punt if you have
to and not leave it hanging over you all
semester) so that you can move on to
assignments for your other classes.



Read these instructions completely and carefully before beginning your
work.


Using a high-level scripting language of your choice,
write a program that
implements the
geometry transformations and lighting calculations discussed
in Sessions 3 through 5
to render
an image of a scene consisting of a single 3-D object.
For this assignment, you shouldn’t
worry too much about “modularity,” “reuse,” “extensibility,” “good taste,”
etc.,
and you shouldn’t worry at all about speed.
This is a “quick and dirty”
assignment that is primarily intended to
make you review the 3-D graphics material we
covered and make sure
you understand it. 3-D APIs like Direct3D, OpenGL, and XNA,
and game engines like Unity and Unreal,
handle most of this “behind the
scenes,” but we want
to make sure you understand what is going on behind the scenes. Also, you
wind up coding much
of this “behind the scenes” work explicitly when you write vertex shaders
in languages such as
HLSL/Cg; hence, there is value in first testing your understanding of
these basic computer
graphics concepts using
a simple language like MATLAB or Python
before we add the additional complexities of
shader languages on
top of it.


Your lighting model should include ambient and emissive components, as well
as diffuse and specular components arising from a single non-directional point
light source.
You do not need to apply any decay-with-distance type
of effects or spotlight effects as described on pp. 16-17
of the Session 5 lecture slides.
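
For concreteness, here is one possible shape for the per-facet color sum, written as a MATLAB sketch; every variable name here is a placeholder of my own, and the precise formulas should follow the session slides:

% Sketch only -- all names are placeholders, not requirements.
% N = unit facet normal, L = unit vector from facet center to the light,
% V = unit vector from facet center to the camera, all 1-by-3, world space.
R = 2*dot(N,L)*N - L;                            % Phong-style reflection vector
diff_term = max(dot(N,L),0) * (light_rgb .* mat_diffuse_rgb);
spec_term = max(dot(R,V),0)^shininess * (light_rgb .* mat_specular_rgb);
facet_rgb = mat_emissive_rgb + ambient_rgb .* mat_ambient_rgb ...
            + diff_term + spec_term;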


At the top of your program, you should set variables that determine:

  • The world-space XYZ position and RGB color of the light source. (You
    will use this for both specular and diffuse lighting effects. An artist
    might want to use different RGB values for the specular and diffuse effects
    to get some special effect, but here we’ll go with what’s physically possible.)

  • The RGB color of the ambient light.
  • The world-space XYZ position
    of the camera and the XYZ point the camera is looking
    at.

  • The world-space position and orientation of the object. There are numerous
    ways to represent object orientation; we will represent it as rotations
    around the
    x, y, and z axes (in that order), with the amount of rotation expressed
    in degrees. Remember to do the rotations first, then the translation; these
    operations can all be combined into a single matrix through matrix
    multiplication. (FYI, other common orientation representations
    include pitch, roll, and yaw; rotation around a specified axis;
    and the closely related idea of quaternions.) I don’t care if your
    rotations follow a left-handed or right-handed rule; use whatever you like.

  • The “field of view” and the “near” and “far” distances
    of the perspective projection viewing frustum.
    You may assume an aspect ratio of one.


When we run your code, we should be able to change the
variables at the top
to render
different scenes. The variables should be given easily understandable names.
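
For example, the parameter block might look something like this in MATLAB (a sketch; the names and values are mine, not prescribed):

% Hypothetical parameter block -- rename and re-value as you see fit.
light_pos     = [10 10 -10];     % world-space XYZ of the point light
light_rgb     = [1 1 1];         % used for both diffuse and specular
ambient_rgb   = [0.2 0.2 0.2];   % RGB color of the ambient light
camera_pos    = [0 2 -20];       % world-space XYZ of the camera
camera_target = [0 0 0];         % world-space XYZ point the camera looks at
obj_pos       = [0 0 0];         % world-space position of the object
obj_rot_deg   = [0 45 0];        % rotations about x, y, z (in that order), degrees
fov_deg       = 60;              % field of view; aspect ratio assumed to be 1
z_near        = 1;               % near distance of the frustum
z_far         = 100;             % far distance of the frustum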


The first time we ran this course,
the students were required
to find
their own 3-D model and figure out how to read it in. This turned out to be
pretty challenging. This year, we are going to let you have the benefit of
using some of the models they converted to a “raw triangle” format:
shuttle
and cessna.
Pick one that you like.
(Students in previous years have reported that there are about 3 or 4
large triangles in the back that have a winding order inconsistent with
the rest of the model, so they
erroneously disappeared when they implemented culling.
They’re by the tail on the top of the plane. I’m not asking you to implement
backface culling in this year’s version of the assignment, but I thought
I should mention that in case you get adventurous.)
To give credit where it is
due, I have added the names of the students who converted the models to
raw triangle format in the
filename.
The files consist of rows of
9 numbers, which are just the x,y,z coordinates of the three vertices of
the triangles.
You may use one of these models for your assignment, or
if you are feeling ambitious, you may find and use a model not given here
if you can figure out how to read it in.
(This won’t be worth more points, but if you’re a Halo fan, for instance,
and find a model of the Master Chief – go for it! It could be fun.)
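
In MATLAB, reading such a raw triangle file can be a one-liner plus some reshaping; here is a sketch (the filename is hypothetical):

% Each row of the file is [x1 y1 z1 x2 y2 z2 x3 y3 z3] for one triangle.
raw = load('shuttle.raw', '-ascii');                    % n_tri-by-9
n_tri = size(raw, 1);
% Stack all the vertices as rows, appending w = 1 for homogeneous coordinates;
% rows 3k-2, 3k-1, 3k of verts are the three vertices of triangle k.
verts = [reshape(raw', 3, 3*n_tri)' ones(3*n_tri, 1)];  % (3*n_tri)-by-4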


We will generally use the Direct3D/XNA convention of representing spatial
coordinates as row vectors
(vs. OpenGL, which uses column vectors).


Your program will
need to transform each of the vertices of the model
by first applying
the “world” transformation to get it at the appropriate position and
orientation in world coordinates, then applying the “view” transformation to
get it into eyespace coordinates, and then applying the “projection”
transformation to get it into normalized coordinates. Your program
will then divide the
x, y, and z coordinates by the w coordinate to implement the perspective
effect.
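
Strung together in row-vector style, the chain for a single vertex looks roughly like this (a sketch with my own variable names):

% v is a 1-by-4 homogeneous row vector [x y z 1].
v_world = v * world_matrix;           % object space -> world space
v_view  = v_world * view_matrix;      % world space  -> eye space
v_clip  = v_view * proj_matrix;       % eye space    -> homogeneous clip coords
v_ndc   = v_clip(1:3) / v_clip(4);    % perspective divide by w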


In this assignment, you can pre-multiply the view and
projection matrices if you want to save computation time.
(You can’t premultiply the
world transformation matrix too, since you’ll need that intermediate
result to do the lighting calculations, which we will do in the world
space for this assignment.)
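
In MATLAB that premultiplication is just one extra line (a sketch, using the names from above):

view_proj = view_matrix * proj_matrix;   % fold view and projection together
v_clip    = v_world * view_proj;         % you still need v_world for lighting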


Note that since you will be representing coordinates with row vectors, you
could store all the vertex coordinates for the object in a single
array with one row per vertex and four columns. Then you can multiply that
big matrix by a 4×4 geometry transformation matrix to transform all of the
vertices at once.
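
In MATLAB, that batch transformation is a single multiply; a sketch:

% verts is n-by-4 (one homogeneous row vector per vertex):
verts_world = verts * world_matrix;      % transforms every vertex at once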


You may choose to use a left-handed or right-handed coordinate system;
please
describe your choice in a comment at the top of your program.

You should use the View transformation matrices
given in D3DXMatrixLookAtRH or
D3DXMatrixLookAtLH (use (0,1,0) for the “Up” vector), and the
perspective transformation matrices
given in D3DXMatrixPerspectiveFovLH or
D3DXMatrixPerspectiveFovRH. Note that we’re just borrowing the equations from the
Microsoft documentation; you should write the code to create these various
matrices yourself.
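
As a sketch, here is what transcribing the left-handed pair from the Microsoft documentation into MATLAB might look like, written for row vectors (the right-handed versions differ in a few signs; the variable names are mine):

% D3DXMatrixLookAtLH, row-vector convention, with Up = (0,1,0):
zaxis = camera_target - camera_pos;  zaxis = zaxis / norm(zaxis);
xaxis = cross([0 1 0], zaxis);       xaxis = xaxis / norm(xaxis);
yaxis = cross(zaxis, xaxis);
view_matrix = [ xaxis(1), yaxis(1), zaxis(1), 0; ...
                xaxis(2), yaxis(2), zaxis(2), 0; ...
                xaxis(3), yaxis(3), zaxis(3), 0; ...
               -dot(xaxis,camera_pos), -dot(yaxis,camera_pos), -dot(zaxis,camera_pos), 1 ];

% D3DXMatrixPerspectiveFovLH with aspect ratio 1:
y_scale = 1 / tan((fov_deg*pi/180)/2);   % cot(fovY/2)
proj_matrix = [ y_scale, 0, 0, 0; ...
                0, y_scale, 0, 0; ...
                0, 0, z_far/(z_far - z_near), 1; ...
                0, 0, -z_near*z_far/(z_far - z_near), 0 ];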


In the interest of simplicity, you
should feel free to use the same emissive color for all the facets, the
same diffuse material color for all the facets, and the same
specular material
color for
all the facets, etc.
– if you do this, you should set these variables
(emissive color RGB, ambient material RGB,
diffuse material RGB, and specular material RGB)
at the beginning of your program.

If you feel like doing something more sophisticated, where
different facets have different properties,
you are welcome to do so, but it is not required for full credit.


For this assignment, use a “flat shading” lighting
model.
For your lighting calculations, have your program
compute its
own normal for each flat-faced triangle based on the vertex
information for that
triangle (instead of using artist-supplied normals for each vertex, as
described in class). For issues such as computing the eye and light vector
needed for diffuse and specular light calculations, use the center point of
the facet (the average position of the three vertices). In general,
lighting calculations
can be done in whatever coordinate space you want (object, world, or view/eye),
as long as you are consistent. Here, we will do lighting calculations in
world coordinates
, i.e.
do the lighting calculations after you’ve transformed
the object to world coordinates, but before you’ve transformed them to
view coordinates. (Many 3-D engines actually do the lighting in view space,
so they can multiply world and view transformation matrices to gain some
efficiency. But that involves transforming the lighting positions and
pointing
vectors as well, and I don’t want to get into transforming direction vectors
at this point.)
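
A sketch of the per-facet geometry in MATLAB (names are mine; p1, p2, p3 are the triangle’s three world-space xyz row vectors with w dropped):

N = cross(p2 - p1, p3 - p1);   N = N / norm(N);   % flat-face normal
center = (p1 + p2 + p3) / 3;                      % average of the vertices
L = light_pos  - center;       L = L / norm(L);   % unit vector to the light
V = camera_pos - center;       V = V / norm(V);   % unit vector to the eye
% N, L, and V then feed the diffuse/specular sums sketched earlier.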


Once you get things
into “normalized coordinates,”
you only need to worry
about “clipping in z,” i.e. have your program delete all
facets whose z-values all fall outside the viewing frustum in
the z-dimension.
(If only some of the vertices
fall outside the z-dimension, go ahead and
render it.) If you clip in z after applying the
projective transformation matrix,
this is relatively easy since the z values get mapped to a range from 0 to 1.
If you don’t premultiply the view and the projection matrices, you can
clip in view space by comparing z to your specified near and far planes.
We’ll let the scripting
language’s native triangle drawing features worry
about clipping in x and y.
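
For instance, if the post-divide z values are collected into an n_tri-by-3 matrix, the clip can be a couple of vectorized MATLAB lines (a sketch; tri_z is my own name):

% tri_z holds the post-perspective-divide z' value of each triangle's
% three vertices; keep a facet if at least one vertex lands in [0,1].
keep  = any(tri_z >= 0 & tri_z <= 1, 2);
tri_z = tri_z(keep, :);    % apply the same 'keep' mask to x, y, and colors too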


Instead of using a z-buffer to handle the fact that some facets will
obscure other
facets,
use “z-sorting,” which is also called the painter’s algorithm.
Z-sorting was popular when memory was
expensive; for instance,
the PlayStation 1
used z-sorting. Real-time
implementations typically use some sophisticated data structures to
do the sorting; here, you can
just use the “sort” command built into whatever scripting language
you use. After you’ve done the perspective division operation,
compute the average of the z-values of the vertices of each triangle,
and sort the facets in order of
these z-value averages. Then, render the facets in order of farthest
to closest.
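
In MATLAB, the whole painter’s algorithm collapses to a few lines; here is a sketch using hypothetical per-triangle arrays:

avg_z = mean(tri_z, 2);                 % average post-divide z' per facet
[~, order] = sort(avg_z, 'descend');    % farthest facets first
for k = order'                          % draw back to front
    patch(tri_x(k,:), tri_y(k,:), facet_rgb(k,:));
end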




Choice of implementation language:
You should choose a
scripting language that has built-in matrix and vector operations
(preferably with built-in dot product and cross product operations), as well
as a mechanism to draw
filled 2-D triangles on the
screen – we will let the language handle the
rasterization process for you.
The language you choose may have built-in
3-D graphics features, but you should not use them for this
assignment!!!


We recommend using MATLAB; it has all the operations you need
“out of the box,” including
dot and cross products; you can compute many dot and cross products at
once with a single
line of code. It should be available on
most campus lab machines, such as the library and CoC and
ECE computing labs. (You also may be able to get some use out of
Octave or
FreeMat.)
MATLAB’s vectorization features let you write compact,
expressive code.
MATLAB is now used in the intro CS class for
engineers, and is also extensively used
throughout the ECE curriculum, particularly in ECE2026.
CS and CM students are less likely to have been exposed to it;
however, an advanced CS or CM undergraduate, who has
had exposure to many different kinds of programming
languages, will have little difficulty picking it up.
In any case, if you are a CS or CM major, you will find
MATLAB to be a worthy weapon to add to your arsenal,
as it lets you try out a variety of numerical
algorithms with a minimal amount of fuss. Here
is an example session at a MATLAB prompt that illustrates
various features. ECE students will find this familiar; CS and CM students
should be able to quickly
get a “feel” for the language.

>> % MATLAB comments start with a % sign
>> % type 'help command' into MATLAB to get help on a particular command
>> % 'ones(rows,columns)' generates a rows-by-columns matrix of 1s
>> % * by itself is matrix multiplication, but .* will do elementwise multiplication
>> % a semicolon at the end of a command suppresses output
>> a = ones(3,1) * (9:-2:1)
a =
     9     7     5     3     1
     9     7     5     3     1
     9     7     5     3     1
>> b = (11:-2:7)' * ones(1,5)
b =
    11    11    11    11    11
     9     9     9     9     9
     7     7     7     7     7
>> c = a + b
c =
    20    18    16    14    12
    18    16    14    12    10
    16    14    12    10     8
>> d = a * b
??? Error using ==> mtimes
Inner matrix dimensions must agree.
>> d = a .* b
d =
    99    77    55    33    11
    81    63    45    27     9
    63    49    35    21     7
>> % compute columnwise cross product
>> cross(a,b)
ans = 
-18   -14   -10    -6    -2
 36    28    20    12     4
-18   -14   -10    -6    -2
>> % compute columnwise dot product
>> dot(a,b)
ans =
   243   189   135    81    27
>> 1 / (c + 3)
??? Error using ==> mrdivide
Matrix dimensions must agree.
>> 1 ./ (c + 3)
ans =
    0.0435    0.0476    0.0526    0.0588    0.0667
    0.0476    0.0526    0.0588    0.0667    0.0769
    0.0526    0.0588    0.0667    0.0769    0.0909
>> dude = [1 2 3; 5 6 7; 11 12 29]
dude =
     1     2     3
     5     6     7
    11    12    29
>> inv(dude)
ans =
   -1.4062    0.3437    0.0625
    1.0625    0.0625   -0.1250
    0.0937   -0.1562    0.0625
>> dude(:,2) = [99 100 101]'
dude =
     1    99     3
     5   100     7
    11   101    29
>> dude(1:2,:)
ans =
     1    99     3
     5   100     7
>> % most importantly for this assignment, MATLAB will also draw triangles for you!
>> % the image below was created via these commands:
>> axis([-10 10 -10 10])
>> axis square
>> % the first argument to patch consists of x coordinates, the second consists of y
>> % coordinates, and the third consists of an RGB triple
>> patch([3 4 6],[-4 -3 -6],[1 0 0])
>> patch([1 5 9],[10 13 14],[0 1 0])
>> patch([-3 -6 -9],[1 2 5],[0 0 1])
>> patch([-1 -3 -5],[-4 -6 -7],[0.25 0.5 0.3])


There are two versions of the “patch” command in MATLAB. One is for
drawing 3-D triangles using MATLAB’s 3-D graphics capabilities. This isn’t
what you want here. You want to use the “patch” that draws 2-D triangles, since
the point of the assignment is to understand how 3-D objects get turned
into 2-D graphics presented on a 2-D screen.


Here are some MATLAB tutorials
(I nicked these links from our old ECE2025 recommendations):


You can tell MATLAB to not draw edges on the patches via
set(0,'DefaultPatchEdgeColor','none') – thanks to Michael Cook (a student
from a previous year) for the tip.


If you don’t want to use MATLAB, you might try Scilab, R, or perhaps
something like Python or Ruby
with one of their numeric/scientific/graphical extensions; Mathematica
or Maple might also be usable. You can even use Scheme or Lisp, if you
can find one that will draw triangles.
(If you really insist,
you can use a compiled language like
Java, Processing, C#, or C++,
if you can find an appropriate matrix-manipulation and 2-D graphics library and
are
willing to lose the
interactivity of an interpreted language. However, you probably
will find
that the assignment
will take much longer than
necessary if you take that route. That said, I have
seen some students produce some reasonably compact solutions to this
assignment using Processing; it provides a minimum-fuss way of getting the
needed graphics functionality out of Java.)


The main reason we are asking you to use a flat shading model instead
of Gouraud shading is
that MATLAB, as far as we can tell, will only do Gouraud shading
in a “colormap” sort of mode
instead of a full RGB sort of mode.


Homogeneous coordinates in computer graphics are usually represented
as row vectors,
with operations conducted by doing row * matrix
type operations. However, some of the “vectorized”
commands in MATLAB, such as cross and dot,
work better with coordinates stored along the columns; hence, you may find
it useful
to use some transposition operations (indicated using a single quote) to flip
between row and column representations as needed. Your mileage may vary.
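
For example (edge1 and edge2 here are hypothetical n-by-3 arrays of row vectors):

normals = cross(edge1', edge2')';   % transpose in, compute columnwise, transpose out
% Equivalently, cross(edge1, edge2, 2) names the working dimension directly.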


Philosophy:
The instructions to this assignment are
deliberately a little bit vague – you should feel free to experiment a
bit and come
up with your own choices of parameters and implementation techniques.
For instance, how exactly
should you parameterize orientations? It’s up to you!
Here, you’re not
stuck with whatever choices an API designer made.


Deliverables:
Package everything needed to run your script (3D data file, program, etc.),
as well as three
example scenes
(in any common
image format you’d like) created with your program with different
parameters to demonstrate its capability, and upload them
to T-square as a zip file or gzipped tar file.
Include “HW1” and as much as possible of your full name
in the filename, e.g., HW1_Aaron_Lanterman.zip.
(The upload procedure should
be reasonably self explanatory once you log in to T-square.)
Be sure to finish
sufficiently in advance of the deadline that you will be able to work around
any troubles T-square gives you to successfully submit before the deadline.
If you have trouble getting T-square to work, please e-mail your
compressed file to lanterma@ece.gatech.edu, with “GPU HW #1” and your
full name in the subject line; please only use this e-mail submission as a
last resort if T-square isn’t working.


The midnight due date is intended to discourage people from pulling
all-nighters, which are not healthy.


Ground rules: You are welcome to discuss high-level implementation
issues with your fellow students, but you should avoid actually looking
at one another’s code as a whole,
and under no circumstances should you be
copying any portion of another student’s code.
However, asking another student to focus
on a few lines of your code and discuss why you are getting a particular
kind of error is reasonable. Basically, these “ground rules” are
intended to prevent
a student from “freeloading” off another student, even accidentally, since
they won’t get the full yummy nutritional educational goodness out of the
assignment if they do.


Assorted notes:

  • Don’t get the ideas of “spotlight” and “specular” confused. They give
    similar kinds of effects but are quite different things.

  • Sometimes you can run into “dynamic range issues,” in which color
    values higher than some fixed upper limit will “clip” to that limit. You
    can manually back your light RGB values down until this isn’t a problem,
    or you may want to re-normalize all your color values after you compute them
    (i.e. find the max color value, divide all your colors by that, and
    then multiply them all by that upper limit). Or you could do some sort of
    renormalization compromise, where you normalize to something slightly
    bigger than
    the language’s natural clip value and let just a few facets clip.
    Usually, colors are specified as floating
    point values between 0 and 1 (whether it be light colors or the
    emissive colors) – so when you multiply them you get something
    less than 1, which helps things to not get too crazy.
    (I suppose the physics would indicate that the material
    values should be less than 1
    if they represented a fraction of light reflected.)
    When rasterizing triangles,
    0 to 1 color values usually need to be scaled to some integer range
    according to whatever the “native” depth of the frame buffer is. (As a side
    note, some older graphics cards had weird specialized floating
    point formats designed to represent floats in [0,1] and [-1,1] in
    some sort of optimal fashion in a limited number of bits, but
    nowadays it’s pretty much your usual IEEE floating point formats.)

  • You may want to
    first get a sense of the size of the model you’re using. In
    MATLAB, I’d use min() and max() (obviously use whatever equivalent in
    whatever language you’re using) to find the most extreme vertices in
    the various dimensions – that should give you a sense of where to put
    the front-back clipping planes if you move it to some location (see the
    sketch at the end of these notes).

  • In most API conventions, the Z_near and Z_far planes are
    positive numbers in “worldspace/viewspace length units,” even if the
    coordinate system is right-handed (meaning that Z becomes more negative
    as objects are moved away from the camera). In the past, I’ve seen a few
    cases where someone tried to set Z_near to a negative number (which puts
    the near plane behind your head) and Z_far to a positive number.
    That doesn’t make sense and will cause things to freak out.

  • I didn’t put anything in the assignment that requires
    you to be able to scale the object, so you don’t have to. It’s easy to
    put in if you feel like it, though (remember to do it before the translation).

  • If your 3-D model is taking ages to load in,
    you might want to pre-load it – i.e. put in a flag
    that checks to see whether the variable you’re
    loading the model into is already filled, and if it is, doesn’t bother
    to load it again (see the sketch at the end of these notes). That’s a
    trick I use a lot. In MATLAB, I use the “clear”
    command to clear a variable and force a reload if I need to.

  • How should you choose the field of view? It depends on how far out
    you put the object – further out, smaller field
    of view, closer in, bigger field of view, to be able to show the whole
    object. Most FPS games use a FOV of like 70 to 90 degrees; some
    let you adjust
    it. Humans have a FOV closer to 180, although our peripheral vision is
    shoddy – it mostly detects motion. So when you’re playing a FPS, you’re
    essentially playing with tunnel vision.

  • Notice that we’re not worrying about the “viewport transformation” (not
    to be confused with the view transformation). After the projection matrix
    is applied and you do the “perspective divide” – i.e. the divide by w
    (of course that’s assuming you are using a perspective projection
    matrix – an orthographic projection matrix wouldn’t do the divide),
    your x and y coordinates should range from -1 to 1. In your program,
    you may
    have some outside of that, but we’ll rely on the capability of the 2-D
    graphics routines in your package to clip edges appropriately.
    The “viewport transform” is the final transform that maps this -1 to 1
    coordinate system into actual pixel coordinates for the screen. Usually the
    upper left corner is (0,0) in screen pixel coordinates, and the lower right
    is something like (1023,767). In clip coordinates, y-up is positive, so you
    usually need to do a negation somewhere there. Anyway, once you figure out
    what you’re mapping to where, it’s pretty easy to come up with the mapping
    you would need; if the display is happening in a particular subset of the
    screen, i.e. a window you’ve created, you would need an additional offset.
    But nowadays you rarely see any of this, as this final viewport transform
    is almost always handled by the GPU according to just a few screen size
    settings in your host API. In MATLAB, you can draw your triangles and then use
    axis([-1 1 -1 1]) and that will crop the image to those limits.
    If you’re using some other language
    you might have to do something a bit more complicated. If you’d like to learn
    more about viewport transformations, see
    here.

  • Don’t forget to normalize the vectors used in lighting calculation! (This
    is a common error.)

  • A lot of folks get confused about all the different coordinate spaces,
    and do the calculations in the wrong space, or more often, have
    problems when they erroneously mix two different spaces in one calculation.

  • In the past, I’ve seen some severely confused students try to element-wise
    multiply (x,y,z,w) spatial coordinates with colors (r,g,b,a), yielding
    (x*r,y*g,z*b,w*a). Please don’t do that. How would it make sense to multiply
    the x coordinate by the amount of red??? It’s so nonsensical it makes my
    brain twitch.

  • The
    matrices we are borrowing from the
    DirectX webpages assume a row-vector system,
    where you multiply vectors by matrices like this:


    new_vector = old_vector * transformation_matrix (row-style-transformation)


    OpenGL assumes a column-vector system, where you multiply vectors like this:


    new_vector = transformation_matrix * old_vector (column-style-transformation)


    In the past, I’ve seen a few instances of people using the DirectX
    transformation matrices in the second style.
    If you want to use a column transformation style,
    you’d need to use the *transpose* of the matrices
    given in the DirectX documentation.

  • Just to re-emphasize, you should only use the 2-D drawing capabilities of
    your chosen language. Each year I see people working on their programs and
    they show me a *3-D* MATLAB plot with three axes (x, y, and z) shown, and
    the student could spin the model around using the mouse. IF YOU HAVE
    SOMETHING LIKE THAT, YOU HAVE DRASTICALLY MISSED THE POINT OF THE
    ASSIGNMENT. How many dimensions does your laptop screen have? Two
    dimensions, yes? If you’re using the 3-D plotting
    capabilities of MATLAB to draw your object,
    how do you think your laptop is
    turning those 3-D coordinates into things to plot on your 2-D screen????? HW
    #1 is about programming that pipeline yourself so you understand how it works.
    Your HW #1 is all about rendering 3-D objects on a *2-D screen* by doing the
    operations that map 3-D object into 2-D.
    You should only be using 2-D drawing commands that draw in a 2-D window.


    After the perspective projection matrix multiply
    operation, you have homogeneous
    coordinates for your vertices:


    [x,y,z,w]


    To finish the perspective projection, you divide by the fourth coordinate:


    [x’,y’,z’,1] = [x/w, y/w, z/w, w/w].


    At that point, the primary thing you might use z’ – or whatever
    you call that third coordinate – for
    is the z-sorting, so triangles
    that are further away from the camera
    get drawn first, and things closer to the camera get drawn later.


    Your plot commands should only be using x’,y’ in drawing 2-D triangles
    on the 2-D plane.


    Yeah, I know I’m repeating myself a lot. But this issue comes up every year.
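
As promised above, here are quick MATLAB sketches of two of the tricks mentioned in these notes (the variable and file names are hypothetical):

% How big is the model, and where does it sit?
disp(min(verts(:,1:3)));   % most negative x, y, z over all vertices
disp(max(verts(:,1:3)));   % most positive x, y, z over all vertices

% Skip the slow re-load if the model is already in the workspace;
% 'clear raw' at the prompt forces a fresh load on the next run.
if ~exist('raw', 'var')
    raw = load('shuttle.raw', '-ascii');
end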