Sunteți pe pagina 1din 7

knight666 Just another WordPress weblog

About this entry Tutorial: A practical example of Data-Oriented Design


You’re currently reading “Tutorial: A practical For this tutorial, I’m going to assume a couple of things.
example of Data-Oriented Design,” an entry on
knight666 You are familiar with C++
You have at least a basic knowledge of OpenGL 2.0
Published: 6.26.11 / 3pm
You know what object-oriented programming is
Category: Tutorials
Bonus:
Tags: C++, data-oriented, dataoriented, dod,
object, objectoriented, oriented You are familiar with game programming
You are familiar with low-level optimizations
You know what a cache is

Ready? Alright, here we go!

What is Data-Oriented Design?

Data-oriented design (DOD) is when you look at your game’s state as data transforming into other data.
In object-oriented programming (OOP) you view things as objects, each holding their own state and
interacting with the world. A small example.

Suppose this is our game world. We have four balls bouncing around a room. When the balls hit a wall
they go into a different direction.

In OOP, we say each ball has a position, a direction and a speed. Each ball gets an Update function that
updates the position and a Render function that renders the ball’s sprite at their current position. This is
what the data looks like:

We have a Ball class with some members and we have a BallManager class that has a list of Ball instances.
To update the balls, the BallManager calls each ball’s Update and Render function.

void Ball::Update(float a_DT)


{
m_Pos += m_Dir * m_Speed * a_DT;
}
void BallManager::Update(float a_DT)
{
for (unsigned int i = 0; i < m_BallTotal; i++)
{
m_Balls[i]->Update(a_DT);
}
}

So what’s the problem? Works fine right? Each ball has its own properties and you update those
properties.

The main driving force behind data-oriented design is that you never tend to work with single objects, you
work with groups of objects. So what would that look like?

converted by Web2PDFConvert.com
Now, there is no Ball class. There is only the BallManager. And instead of a list of Ball instances, it has a list
of ball properties. Now its Update function looks like this:

void BallManager::Update(float a_DT)


{
for (unsigned int i = 0; i < m_BallData.filled; i++)
{
m_BallData.pos[i] += m_BallData.dir[i] * m_BallData.speed[i] * a_DT;
}
}

The advantages are not obvious. It just looks like I moved some stuff around. How is that going to help?

A good way of trying to wrap your brain around data-oriented programming is to think of it as a factory:
data comes in, an operation is performed and data comes out.

The Problem with Object-Oriented Programming

You might not realize there is a problem. But it’s been slowly creeping up on the industry. Computers,
consoles and even phones nowadays have multiple cores. But our programs are traditionally written for
one-cored computers. Yes, we can do multithreading. Yes, we have job systems. But our data is not
inherently multithreadable. Each object is treated as its own entity, which makes it difficult to stream.

The inherent problem is that while our processors have been steadily increasing in speed (at about 60%
per year), memory hasn’t kept up, increasing speed by only 10% per year. The gap is widening, and it is a
problem. Keeping your data in cache (very fast but small memory near the processor) is becoming
increasingly important. If you can keep all your data as near to the processor as possible, you can
squeeze some significant performance from it.

As a game programmer in training, I learned about the difference between an Array of Structures (AoS)
and a Structure of Arrays (SoA). I never paid much attention to it, because it seemed like such a hassle.
But that was because I wasn’t forced to think exclusively in AoS. When I did an internship intake for a
large gaming company, they asked me to use data-oriented design. I found that there wasn’t much
available on the subject. A handful of talks and a couple of blogposts. But no practical examples and no
source code.

So I decided to write my own.

The Test Program

The program we’re going to discuss is a voxel renderer. Voxels are essentially three-dimensional pixels. You
can build and shape worlds with them, but they are traditionally very expensive to render. That is because
graphics cards aren’t optimized for square blocks, they’re optimized for triangles. On top of that, we’re
going to throw a raytracer on top of it. Each voxel casts a ray from a target position to itself. If the ray
hits a block other than itself it is not visible and culled.

There are two versions, one using object-oriented programming (OOP) and one using data-oriented
design (DOD). I have set some restrictions for myself for this example:

No macro-optimizations – The OOP and the DOD version both use the same algorithm.
No micro-optimizations – The DOD version does not use SIMD functions, even though it totally could.
No multi-threading – Mostly because I didn’t want to multithread the OOP version.

Controls:

Left and Right – Rotate camera


Spacebar – Switch between OOP and DOD
G – Generate new voxels
Q – Decrease amount of voxels
W – Increase amount of voxels
R – Turn raycasting on or off
D – Turn debug lines on or off

Format:
The example was built with Microsoft Visual Studio 2008. A VS2008 solution is provided. It should be
forward-compatible with VS2010. The code will not compile out of the box on Macintosh or Linux machines,
because of Microsoft-specific features (Windows window, Microsoft extension for font rendering).

Download:
VoxelExample-1.0.zip (1.5 MB)

This example is provided without any license. That means that anyone can download, modify, redistribute
and sell it or any part of it. This example is provided “as-is”, without any guarantee of bug fixes or future
support.

The Basics

We’ll need a window with an OpenGL context. This is handled by Framework.h and Framework.cpp. It’s
not that interesting overall so I won’t focus too much on it. Suffice to say it creates a window, handles to
the game code and runs their Update and Render functions. It also does some global key handling.

Common.h contains includes and global defines. Included with this example is part of my Toolbox:

converted by Web2PDFConvert.com
TBSettings.h, TBMath.h and TBVec3.h. You are free to use these headers in your own projects, if you
wish. There are no license requirements attached to them.

The two interesting classes are OOPVoxels and DODVoxels. They are both children of the Game class.

We create voxels with a very simple algorithm. It starts at position (0, 0, 0) in the world and for every
voxel, it takes a step forward, backwards, left, right, up or down. The color is a random color between (0,
0, 0) and (1, 1, 1, 1). The result is a colorful, but rough, “snake” of voxels.

The Object-Oriented Approach

I’ll skip the part where I generate the voxels. It’s not really relevant for this example. If you want to look
at it you can find it in OOPVoxels::GenerateVoxels.

We will need some kind of container for voxel data. We are going to store the voxels in a VBO buffer.
Every voxel consists of 6 quads, so each voxel will need to add its sides to the vertex buffer. The size of
the voxels doesn’t change, so we store position offsets for each of the coordinates in the game class.

// offsets
float halfsize = m_VoxelSize / 2.f;
m_VoxelOffset[0] = Vec3(-halfsize, -halfsize, halfsize);
m_VoxelOffset[1] = Vec3( halfsize, -halfsize, halfsize);
m_VoxelOffset[2] = Vec3(-halfsize, halfsize, halfsize);
m_VoxelOffset[3] = Vec3( halfsize, halfsize, halfsize);
m_VoxelOffset[4] = Vec3(-halfsize, -halfsize, -halfsize);
m_VoxelOffset[5] = Vec3( halfsize, -halfsize, -halfsize);
m_VoxelOffset[6] = Vec3(-halfsize, halfsize, -halfsize);
m_VoxelOffset[7] = Vec3( halfsize, halfsize, -halfsize);

Next, we initialize our data and create a VBO to hold our voxels as a streaming quad and color buffer.

// voxel buffer
glGenBuffers(1, &m_VoxelVBOVertex); CGLE();
glBindBuffer(GL_ARRAY_BUFFER, m_VoxelVBOVertex); CGLE();
glBufferData(GL_ARRAY_BUFFER, m_VoxelDataSize * sizeof(Vec3), m_VoxelVertices, GL_STREAM_DRAW); CGLE();
glGenBuffers(1, &m_VoxelVBOColor); CGLE();
glBindBuffer(GL_ARRAY_BUFFER, m_VoxelVBOColor); CGLE();
glBufferData(GL_ARRAY_BUFFER, m_VoxelDataSize * sizeof(Vec3), m_VoxelColors, GL_STREAM_DRAW); CGLE();
glBindBuffer(GL_ARRAY_BUFFER, 0); CGLE();

What our voxel needs to store for now is its position, a color and whether it is clipped.

class Voxel
{
Voxel();
~Voxel();
void SetClipped(bool a_State) { m_Clipped = a_State; }
bool IsClipped();
void SetColor(Vec3& a_Color) { m_Color = a_Color; }
Vec3& GetPosition() { return m_Pos; }
void SetPosition(Vec3& a_Position);
private:
Vec3 m_Pos;
Vec3 m_Color;
bool m_Clipped;
};

In the PostTick function, we loop over every voxel and set its clipping state to false.

for (unsigned int i = 0; i < Framework::s_VoxelCurrentTotal; i++)


{
m_Voxels[i].SetClipped(false);
}

And now for the slightly harder part. In the Render function, we want to lock the vertex and color buffer
and add voxels that aren’t clipped to it. So we add an AddToRenderList function to the Voxel class. It
takes a destination vertex buffer, a destination color buffer and the offsets for each vertex of the voxel.

Every voxel needs to check its clipping state before it can add itself to the buffer.

// write visible voxels to buffers

glBindBuffer(GL_ARRAY_BUFFER, m_VoxelVBOVertex);
Vec3* write_vertices = (Vec3*)glMapBuffer(GL_ARRAY_BUFFER, GL_READ_WRITE);
glBindBuffer(GL_ARRAY_BUFFER, m_VoxelVBOColor);
Vec3* write_color = (Vec3*)glMapBuffer(GL_ARRAY_BUFFER, GL_READ_WRITE);

unsigned int total = 0;


for (unsigned int i = 0; i < Framework::s_VoxelCurrentTotal; i++)
{
if (!m_Voxels[i].IsClipped())
{
m_Voxels[i].AddToRenderList(write_vertices, write_color, m_VoxelOffset);
write_vertices += 24;
write_color += 24;

total++;
}
}

converted by Web2PDFConvert.com
glUnmapBuffer(GL_ARRAY_BUFFER); CGLE();
glBindBuffer(GL_ARRAY_BUFFER, m_VoxelVBOVertex);
glUnmapBuffer(GL_ARRAY_BUFFER); CGLE();

glBindBuffer(GL_ARRAY_BUFFER, 0);

The Voxel::AddToRenderList function is very long, but most of it is copied. It writes its position plus an
offset to the buffer and copies the same color to all six quads.

void OOPVoxels::Voxel::AddToRenderList(Vec3* a_Vertex, Vec3* a_Color, Vec3* a_Offset)


{
Vec3* dst_pos = a_Vertex;
Vec3* dst_color = a_Color;
Vec3* offset[] = {
// Front
&a_Offset[0], &a_Offset[1], &a_Offset[3], &a_Offset[2],
// more offsets here
};
Vec3** read_offset = offset;

for (unsigned int j = 0; j < 24; j += 4)


{
dst_pos[0] = m_Pos + *read_offset[0];
dst_pos[1] = m_Pos + *read_offset[1];
dst_pos[2] = m_Pos + *read_offset[2];
dst_pos[3] = m_Pos + *read_offset[3];
dst_color[0] = m_Color;
dst_color[1] = m_Color;
dst_color[2] = m_Color;
dst_color[3] = m_Color;
dst_color += 4;
dst_pos += 4;
read_offset += 4;
}
}

We use the unsigned int total to figure out how many voxels are actually in the quad buffer. Then we
render.

// draw buffers

glEnableClientState(GL_VERTEX_ARRAY); CGLE();
glEnableClientState(GL_COLOR_ARRAY); CGLE();
glBindBuffer(GL_ARRAY_BUFFER, m_VoxelVBOVertex); glVertexPointer(3, GL_FLOAT, 0, 0); CGLE();
glBindBuffer(GL_ARRAY_BUFFER, m_VoxelVBOColor); glColorPointer(3, GL_FLOAT, 0, 0); CGLE();

glDrawArrays(GL_QUADS, 0, total * 4 * 6); CGLE();


glBindBuffer(GL_ARRAY_BUFFER, 0); CGLE();

glDisableClientState(GL_COLOR_ARRAY); CGLE();
glDisableClientState(GL_VERTEX_ARRAY); CGLE();

Alright, voxels on the screen!

Now, let’s focus on the raytracing portion. When raytracing is turned on, every voxel generates a ray
that starts at the target position and points towards the voxel. This ray then stores the voxel instance
that is closest to the target position. So, each ray checks each voxel to see which is the closest. Luckily
one of my teachers had a very fast ray-AABB collision check.

bool OOPVoxels::Ray::CollidesWith(Voxel* a_Other)


{
float t0, t1;
float tmin = 0.f;
float tmax = 10000.f;

t0 = (a_Other->GetMin().x - m_Origin.x) * m_Direction.x;


t1 = (a_Other->GetMax().x - m_Origin.x) * m_Direction.x;
if (t0 > t1)
{
tmin = (t1 > tmin) ? t1 : tmin;
tmax = (t0 < tmax) ? t0 : tmax;
}
else
{
tmin = (t0 > tmin) ? t0 : tmin;
tmax = (t1 < tmax) ? t1 : tmax;
}

// same for y and z axes

converted by Web2PDFConvert.com
if (tmin <= tmax && tmin <= m_TimeMin)
{
m_TimeMin = tmin;
m_Hit = a_Other;
return true;
}

return false;
}

But still, every ray needs to check every voxel. This results in horrible performance!

The obvious optimization is to update the algorithm so we don’t need to check so many voxels. But this is
supposed to be an example in data-oriented design.

The Data-Oriented Approach

Instead of thinking in terms of voxels, what is the minimum amount of data we need per voxel? Well,
they’re all the same size. So all we need is a position and a color. So what we have now is a container for
all our voxels.

struct VoxelData
{
float* x;
float* y;
float* z;

float* r;
float* g;
float* b;
};

What next? Well, our voxel data needs to be transformed into triangles on the screen. But what is the
data that the VBO’s need to put stuff on the screen? It’s a position and a color. So we need a container for
the vertex data.

struct VertexData
{
Vec3* vertex;
Vec3* color;
};

As the final step, we need to transform voxel data into vertex data. We’re going to use a function for
that acts like a factory. Here is the diagram:

The internal structure of the function is much the same as the OOP version. However, I have split the
saving of position and color data to improve caching. You can find it in DODVoxels::GenerateFaces.

So, what needs to change when we want to add raycasting? Obviously we’ll need a structure to hold our
ray data.

struct RayCastData
{
float* ox;
float* oy;
float* oz;
float* dx;
float* dy;
float* dz;
};

And we’ll need a function that takes voxel data and generates ray data.

void GenerateRayCast(
RayCastData* a_OutRays, unsigned int& a_OutRayCount,
VoxelData* a_InVoxels, unsigned int a_VoxelCount,
Vec3 a_InTarget
);

Finally, we’ll need a function that converts ray data into vertex data.

void SolveRayCast(
VoxelData* a_OutVoxels, unsigned int& a_OutVoxelCount,
RayCastData* a_InRays, unsigned int a_InRayCount,
VoxelData* a_InVoxels, unsigned int a_InVoxelCount
);

This is, in my opinion, the biggest advantage of DOD over OOP. It’s clear where your data is and in what
way it needs to be converted. The DODVoxels::SolveRayCast function is the biggest change from the
OOP version. It has become massive. But the algorithm is the same.

Now, putting it all together:

void DODVoxels::PostTick()

converted by Web2PDFConvert.com
{
if (Framework::s_RayCast)
{
Vec3 target(
Math::CosDeg(Framework::s_TargetRotation) * 500.f,
10.f,
Math::SinDeg(Framework::s_TargetRotation) * 500.f
);

GenerateRayCast(
m_VoxelRayCast, m_VoxelRayCastTotal,
m_VoxelData, Framework::s_VoxelCurrentTotal,
target
);

SolveRayCast(
m_VoxelRenderData, m_VoxelRenderTotal,
m_VoxelRayCast, m_VoxelRayCastTotal,
m_VoxelData, Framework::s_VoxelCurrentTotal
);
m_VoxelRenderDataUsed = m_VoxelRenderData;
}
else
{
m_VoxelRenderDataUsed = m_VoxelData;
m_VoxelRenderTotal = Framework::s_VoxelCurrentTotal;
}
}

But, what if we wanted to multi-thread? For instance, what if we wanted to split up the solving of raycasts
into multiple jobs? With DOD it becomes extremely simple. We extend the SolveRayCast function to not
only take a count of data it needs to chew, but also an offset into the array. Because we don’t read and
write to the same array, we can split it up without race conditions.

A crazy cool benefit, wouldn’t you say?

Results

I made this example with an idea: I’m going to take an algorithm that abuses the CPU’s cache and apply
data-oriented design to it to make it faster. The results are… disappointing. Here is the graph for just
dumping voxels on screen:

Mind the gap between 5000 and 10000 voxels. As you can see, the DOD version is consistently faster
than the OOP version. Even at 90,000 voxels it is 1 fps faster.

But then we get the graph for the version with raycasting:

What happened? The OOP version is consistently faster than the DOD version! To be honest, I haven’t a
clue. The DOD version is aligned better in memory and with improved caching comes better performance.
But I don’t get any.

Conclusion

I really like data-oriented design. Forcing you to think in terms of data instead of objects means you’re
taking multithreading into account from the ground up. It can, however, be extremely tricky to think in
terms of data. But I’m positive that will change with experience.

converted by Web2PDFConvert.com
The reason I wrote this tutorial was because there wasn’t any proper literature on the subject. I hope my
endeavors have saved you some time.

1 Comment
Jump to comment form | comments rss [?] | trackback uri [?]

Kiaran Ritchie Thanks for the article. This topic needs more concrete examples and less ‘head in the clouds’
6.27.11 / 4am discussion.

I could be wrong, but I think I know why your DOD is not performing better than the OO
version.

In your OO voxel class, the data is allocated on the stack (no pointers / new/delete). I
believe this will make it contiguous in memory.

But in your DOD example, you appear to be making structs that contain pointers to floats. I
haven’t looked at the code, but I’m guessing this means you are allocating on the heap which
make be fragmented. Also, you are probably dereferencing those pointers all over the place
which is hardly cache friendly.

Why not declare your structs to contain plain old floats, then create a big array VoxelData
voxels[MAXSIZE] on the stack. Or even split the VoxelData structure into multiple arrays to
process the position/color/etc separately.

I may be wrong here, this is just my initial thoughts. Thanks for the article, please keep toying
with it and find the performance barrier!!

Have your say


XHT ML: You can use these tags: <a href="" title=""> <abbr title=""> <acrony m title="">
<b> <blockquote cite=""> <cite> <code> <del datetime=""> <em> <i> <q cite="">
<strike> <strong>

Name (required)

Email (required)

Website

Add your comment

About Recently Categories

This is an example of a WordPress page, you 07.10 TinyImageLoader 1.7.0 Minecraft (1)
could edit this to put information about yourself
06.26 Tutorial: A practical example of Data- Rants (2)
or your site so readers know where you are Oriented Design
coming from. You can create as many pages like Reviews (1)
04.07 Capture the Glowstone in Minecraft
this one or sub-pages as you like and manage all Tutorials (1)
04.04 Installing YoghurtGum
of your content inside of WordPress. Uncategorized (2)
03.31 YoghurtGum: Two Years of Work
YoghurtGum (5)
06.11 Why Picasa sucks at tagging
04.30 YoghurtGum 0.2
04.16 On CMake and priority lists
01.24 Now with less crappy code!
01.23 YoghurtGum 1.0 beta

Powered by Hemingway flavored Wordpress. Entries RSS Comments RSS

converted by Web2PDFConvert.com

S-ar putea să vă placă și