Graphics Processing Unit

GRAPHICS
PROCESSING UNIT
PRESENTED BY
LEKSHMI P A
ROLL NO:19
08/30/08 1
Presentation Overview
Definition
Comparison with CPU
Architecture
GPU-CPU Interaction
GPU Memory
Why GPU?
• To provide separate, dedicated graphics resources, including a graphics processor and memory.
• To relieve some of the burden on the main system resources, namely the Central Processing Unit, main memory, and the system bus, which would otherwise become saturated with graphical operations and I/O requests.
Enter the GPU
What is a GPU?
• A Graphics Processing Unit, or GPU (also occasionally called a Visual Processing Unit, or VPU), is a dedicated processor that is efficient at manipulating and displaying computer graphics.
• Like the CPU (Central Processing Unit), it is a single-chip processor.
HOWEVER,
The abstract goal of a GPU is to enable a representation of a 3D world that is as realistic as possible. GPUs are therefore designed to provide additional computational power customized specifically for these 3D tasks.
GPU vs CPU
• A GPU is tailored for highly parallel operation, while a CPU is tailored for executing programs serially.
• For this reason, GPUs have many parallel execution units, while CPUs have few.
• GPUs have significantly faster and more advanced memory interfaces, as they need to move far more data than CPUs.
• GPUs are often described as having much deeper pipelines (several thousand stages vs 10-20 for CPUs).
BRIEF HISTORY
• First-Generation GPUs
– Up to 1998; Nvidia’s TNT2, ATi’s Rage, and 3dfx’s Voodoo3; DX6 feature set.
• Second-Generation GPUs
– 1999-2000; Nvidia’s GeForce256 and GeForce2, ATi’s Radeon7500, and S3’s Savage3D; T&L; OpenGL and DX7; configurable.
• Third-Generation GPUs
– 2001; GeForce3/4Ti, Radeon8500, MS’s Xbox; OpenGL ARB, DX7/8; vertex programmability + ASM.
• Fourth-Generation GPUs
– 2002 onwards; GeForce FX family, Radeon 9700; OpenGL + extensions, DX9; vertex/pixel programmability + HLSL; 0.13μ process, 125M T/C, 200M T/S.
• Fifth-Generation GPUs
– GeForce 8X; DirectX 10.
GPU Architecture
How many processing units?
– Lots.
How many ALUs?
– Hundreds.
Do you need a cache?
– Sort of.
What kind of memory?
– Very fast.
The difference…
[Figure: side-by-side comparison, without GPU vs. with GPU]
The GPU pipeline
• The GPU receives geometry information from the CPU as an input and provides a picture as an output.
• Let’s see how that happens…
host interface -> vertex processing -> triangle setup -> pixel processing -> memory interface
Details………..
Host Interface
• The host interface is the communication bridge between the CPU and the GPU.
• It receives commands from the CPU and also pulls geometry information from system memory.
• It outputs a stream of vertices in object space with all their associated information (texture coordinates, per-vertex color, etc.).
Vertex Processing
• The vertex processing stage receives vertices from the host interface in object space and outputs them in screen space.
• This may be a simple linear transformation, or a complex operation involving morphing effects.
• No new vertices are created in this stage, and no vertices are discarded (input/output has a 1:1 mapping).
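As a rough illustration of the simple linear case, the sketch below maps object-space vertices to screen-space pixel coordinates. The function name `vertex_stage`, the uniform scale/offset transform, and the viewport convention (coordinates in [-1, 1], y pointing up) are assumptions made for this example, not part of the original slides.

```python
def vertex_stage(vertices, scale, offset, width, height):
    """Map object-space (x, y, z) vertices to screen-space coordinates.

    Hypothetical helper: one output vertex per input vertex (1:1 mapping).
    """
    out = []
    for x, y, z in vertices:
        # Simple linear transform: uniform scale plus translation (assumed).
        wx = x * scale + offset[0]
        wy = y * scale + offset[1]
        wz = z * scale + offset[2]
        # Viewport mapping: [-1, 1] range to pixels, flipping y so that
        # screen row 0 is at the top.
        sx = (wx + 1.0) * 0.5 * width
        sy = (1.0 - wy) * 0.5 * height
        out.append((sx, sy, wz))
    return out


triangle = [(-1, -1, 0), (1, -1, 0), (0, 1, 0)]
screen = vertex_stage(triangle, 1.0, (0, 0, 0), 640, 480)
```

Note that the output list always has the same length as the input list, which is the 1:1 mapping described above.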

Triangle setup
• In this stage geometry information becomes raster information (screen-space geometry is the input, pixels are the output).
• Prior to rasterization, triangles that are back-facing or located outside the viewing frustum are rejected.
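One common way to express the rejection test is through the signed area of the screen-space triangle. The sketch below is a simplified illustration: it assumes that positive signed area means front-facing, and it approximates the frustum test with a screen bounding-box check. The names `signed_area` and `is_rejected` are hypothetical.

```python
def signed_area(a, b, c):
    """Signed area of a 2D triangle; the sign encodes the winding order."""
    return 0.5 * ((b[0] - a[0]) * (c[1] - a[1]) - (c[0] - a[0]) * (b[1] - a[1]))


def is_rejected(tri, width, height):
    """Return True if the triangle would be dropped before rasterization."""
    a, b, c = tri
    # Assumption: positive area = front-facing. Zero area is degenerate.
    if signed_area(a, b, c) <= 0:
        return True
    xs = [a[0], b[0], c[0]]
    ys = [a[1], b[1], c[1]]
    # Simplified frustum test: the triangle lies entirely off screen.
    return max(xs) < 0 or min(xs) > width or max(ys) < 0 or min(ys) > height


front = ((0, 0), (4, 0), (0, 4))
back = ((0, 0), (0, 4), (4, 0))          # same triangle, opposite winding
offscreen = ((20, 20), (24, 20), (20, 24))
```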

Triangle Setup (cont.)
• A pixel is generated if and only if its center is inside the triangle.
• Every generated pixel has its attributes computed as the perspective-correct interpolation of the attributes of the three vertices that make up the triangle.
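The two rules above can be sketched with edge functions and barycentric weights. This is a minimal sketch under stated assumptions: each vertex is a hypothetical tuple `(x, y, w, attr)` where `w` is the vertex depth used for perspective correction, pixel centers sit at half-integer coordinates, and centers lying exactly on an edge are counted as inside (real hardware applies a tie-breaking fill rule instead).

```python
def edge(a, b, p):
    """Edge function: twice the signed area of triangle (a, b, p)."""
    return (b[0] - a[0]) * (p[1] - a[1]) - (b[1] - a[1]) * (p[0] - a[0])


def rasterize(v0, v1, v2, width, height):
    """Return (px, py, attr) for every pixel whose center is covered."""
    area = edge(v0, v1, v2)
    fragments = []
    if area == 0:
        return fragments  # degenerate triangle covers nothing
    for py in range(height):
        for px in range(width):
            p = (px + 0.5, py + 0.5)  # pixel center
            # Barycentric weights; dividing by area handles either winding.
            b0 = edge(v1, v2, p) / area
            b1 = edge(v2, v0, p) / area
            b2 = edge(v0, v1, p) / area
            if b0 < 0 or b1 < 0 or b2 < 0:
                continue  # center is outside the triangle
            # Perspective-correct: interpolate attr/w and 1/w, then divide.
            inv_w = b0 / v0[2] + b1 / v1[2] + b2 / v2[2]
            attr = (b0 * v0[3] / v0[2] + b1 * v1[3] / v1[2]
                    + b2 * v2[3] / v2[2]) / inv_w
            fragments.append((px, py, attr))
    return fragments


frags = rasterize((0, 0, 1.0, 0.0), (4, 0, 1.0, 1.0), (0, 4, 1.0, 0.0), 4, 4)
```

When all three `w` values are equal, the result reduces to plain linear interpolation; the division by the interpolated `1/w` only changes the outcome when the vertices lie at different depths.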
Pixel Processing
• Each pixel provided by triangle setup is fed into pixel processing as a set of attributes, which are used to compute the final color for this pixel.
• The computations taking place here include texture mapping and math operations.
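A minimal sketch of this stage, assuming the interpolated attributes are texture coordinates `u`, `v` and a scalar `light` factor (all hypothetical names), with nearest-neighbour texture lookup standing in for real texture filtering:

```python
def shade_pixel(attrs, texture):
    """Compute a final RGB color from interpolated attributes.

    attrs: dict with 'u', 'v' and 'light', each in [0, 1] (assumed layout).
    texture: 2D list of (r, g, b) tuples with components in 0..255.
    """
    tex_h = len(texture)
    tex_w = len(texture[0])
    # Texture mapping: nearest-neighbour lookup, clamped to the last texel.
    tx = min(int(attrs["u"] * tex_w), tex_w - 1)
    ty = min(int(attrs["v"] * tex_h), tex_h - 1)
    r, g, b = texture[ty][tx]
    # Math operation: modulate the texel by the lighting factor.
    return tuple(min(255, int(c * attrs["light"])) for c in (r, g, b))


checker = [[(255, 255, 255), (0, 0, 0)],
           [(0, 0, 0), (255, 255, 255)]]
color = shade_pixel({"u": 0.1, "v": 0.1, "light": 0.5}, checker)
```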

Memory Interface
• Pixel colors provided by the previous stage are written to the framebuffer.
• This stage used to be the biggest bottleneck, before pixel processing took over.
• Before the final write occurs, some pixels are rejected by the z-buffer. On modern GPUs, z is compressed to reduce framebuffer bandwidth (but not size).
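The z-buffer rejection can be sketched as a per-pixel depth comparison. This illustration assumes that smaller z means closer to the viewer and ignores z compression entirely; `write_fragments` is a hypothetical name.

```python
def write_fragments(fragments, width, height):
    """Write (x, y, z, color) fragments to a framebuffer with a depth test."""
    depth = [[float("inf")] * width for _ in range(height)]
    color = [[None] * width for _ in range(height)]
    rejected = 0
    for x, y, z, c in fragments:
        if z < depth[y][x]:
            # Fragment is closer than what is stored: keep it.
            depth[y][x] = z
            color[y][x] = c
        else:
            # Fragment is hidden behind an earlier one: reject it.
            rejected += 1
    return color, rejected


frame, rejected = write_fragments(
    [(0, 0, 0.5, "red"), (0, 0, 0.9, "blue"), (0, 0, 0.2, "green")], 2, 2)
```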

Programmability in the GPU pipeline
• In current state-of-the-art GPUs, vertex and pixel processing are programmable.
• The programmer can write programs that are executed for every vertex as well as for every pixel.
• This allows fully customizable geometry and shading effects that go well beyond the generic look and feel of older 3D applications.
GPU Pipelined Architecture (simplified view)
[Diagram: CPU -> Vertex Setup -> Vertex Shader -> Rasterizer -> Pixel Shader -> Frame buffer, with a Texture Storage + Filtering unit attached to the GPU. Vertices flow through the stages before the rasterizer, pixels through the stages after it.]
One unit can limit the speed of the pipeline…
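The bottleneck remark can be made concrete: in a pipeline, sustained throughput is bounded by the slowest stage. The stage rates below are made-up illustrative numbers, not measurements, and `pipeline_throughput` is a hypothetical helper.

```python
def pipeline_throughput(stage_rates):
    """Return (bottleneck stage, throughput) for a pipeline.

    stage_rates maps a stage name to the number of items it can
    process per unit of time.
    """
    slowest = min(stage_rates, key=stage_rates.get)
    return slowest, stage_rates[slowest]


# Illustrative rates only (items per millisecond).
rates = {
    "vertex setup": 900,
    "vertex shader": 600,
    "rasterizer": 1200,
    "pixel shader": 250,
    "frame buffer": 800,
}
bottleneck, throughput = pipeline_throughput(rates)
```

Speeding up any stage other than the bottleneck leaves the overall throughput unchanged.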
CPU/GPU interaction
• The CPU and GPU inside the PC work in parallel with each other.
• There are two “threads” going on, one for the CPU and one for the GPU, which communicate through a command buffer.
[Diagram: the command buffer. The CPU writes commands at one end, the GPU reads commands from the other, and the pending GPU commands sit in between.]
CPU/GPU interaction (cont)
• If this command buffer is drained empty, we are CPU limited, and the GPU will spin around waiting for new input. All the GPU power in the universe isn’t going to make your application faster!
• If the command buffer fills up, the CPU will spin around waiting for the GPU to consume it, and we are effectively GPU limited.
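Both cases can be reproduced with a toy tick-based simulation of a bounded command buffer. This is a sketch under simplifying assumptions: every command costs a fixed number of ticks to build (`cpu_cost`) and to execute (`gpu_cost`), and all names are hypothetical.

```python
from collections import deque


def simulate(cpu_cost, gpu_cost, n_commands, capacity):
    """Return (gpu_idle_ticks, cpu_stall_ticks) for a bounded command buffer."""
    buf = deque()
    produced = consumed = 0
    cpu_left = cpu_cost  # ticks of work left on the command being built
    gpu_left = 0         # ticks of work left on the command being executed
    gpu_idle = cpu_stall = 0
    while consumed < n_commands:
        # CPU side: build the next command, then try to append it.
        if produced < n_commands:
            if cpu_left > 0:
                cpu_left -= 1
            if cpu_left == 0:
                if len(buf) < capacity:
                    buf.append(produced)
                    produced += 1
                    cpu_left = cpu_cost
                else:
                    cpu_stall += 1  # buffer full: CPU waits for the GPU
        # GPU side: fetch a command if free, then execute one tick of work.
        if gpu_left == 0 and buf:
            buf.popleft()
            gpu_left = gpu_cost
        if gpu_left > 0:
            gpu_left -= 1
            if gpu_left == 0:
                consumed += 1
        else:
            gpu_idle += 1  # buffer empty: GPU waits for the CPU
    return gpu_idle, cpu_stall


cpu_limited = simulate(cpu_cost=4, gpu_cost=1, n_commands=10, capacity=4)
gpu_limited = simulate(cpu_cost=1, gpu_cost=4, n_commands=10, capacity=4)
```

In the first run the GPU accumulates idle ticks while the CPU never stalls; in the second run the roles are reversed.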
Synchronization issues
• In the figure below, the CPU must not overwrite the data in the “yellow” block until the GPU is done with the “black” command, which references that data.
[Diagram: the command buffer, with the GPU read position, the CPU write position, and a separate data block referenced by one of the pending commands.]
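One common way to enforce this ordering is a fence: each submitted command gets an increasing number, and the CPU touches a data block only after the GPU reports having completed the command that references it. The toy model below is an assumption-laden sketch (the `Gpu` class and `safe_overwrite` are hypothetical, and the "wait" is simulated by running the GPU inline), not a description of any real driver API.

```python
class Gpu:
    """Toy GPU that executes commands referencing CPU-owned data blocks."""

    def __init__(self):
        self.commands = []   # pending (fence value, data block) pairs
        self.completed = 0   # fence value of the last finished command
        self.next_fence = 1
        self.results = []

    def submit(self, block):
        """Queue a command that references block; return its fence value."""
        fence = self.next_fence
        self.next_fence += 1
        self.commands.append((fence, block))
        return fence

    def execute_one(self):
        """Run the oldest pending command and advance the completed fence."""
        fence, block = self.commands.pop(0)
        self.results.append(sum(block))  # stand-in for reading the data
        self.completed = fence


def safe_overwrite(gpu, block, fence, new_values):
    """Overwrite block only after the command tagged with fence has finished."""
    while gpu.completed < fence:
        gpu.execute_one()  # a real CPU would wait here instead
    block[:] = new_values


gpu = Gpu()
block = [1, 2, 3]
fence = gpu.submit(block)
safe_overwrite(gpu, block, fence, [9, 9, 9])
```

The command sees the original contents of the block, and the overwrite happens only afterwards.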

Inlining data
• One way to avoid these problems is to inline all data into the command buffer and avoid references to separate data.
• However, this is also bad for performance, since we may need to copy seve… instead of merely passing around a pointer.
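The trade-off can be illustrated by comparing a command that carries its own copy of the data with one that only holds a reference. The command layout (a dict with `op` and `data` keys) and both function names are assumptions for this sketch.

```python
def make_inline_command(data):
    """Inline: the command carries its own copy, so copying costs O(n)."""
    return {"op": "draw", "data": list(data)}


def make_reference_command(data):
    """Reference: the command stores only a pointer to the caller's data."""
    return {"op": "draw", "data": data}


vertices = [1, 2, 3]
inline_cmd = make_inline_command(vertices)
ref_cmd = make_reference_command(vertices)

# The CPU now overwrites its data before the GPU has run either command.
vertices[0] = 99
```

The inlined command still holds the original values, while the referencing command now sees the modified data, which is exactly the hazard that inlining avoids at the price of the copy.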
GPU readbacks
• The output of a GPU is a rendered image on the screen. What will happen if the CPU tries to read it?
• The GPU must be synchronized with the CPU, i.e. it must drain its entire command buffer, and the CPU must wait while this happens.
GPU readbacks (cont)
• We lose all parallelism, since first the CPU waits for the GPU, then the GPU waits for the CPU (because the command buffer has been drained).
• Both CPU and GPU performance take a nosedive.
• Bottom line: the image the GPU produces is for your eyes, not for the CPU (treat the CPU -> GPU highway as a one-way street).
About GPU memory…..
Memory Hierarchy
CPU and GPU Memory Hierarchy
[Diagram levels, top to bottom:]
Disk
CPU Main Memory
GPU Video Memory
CPU Caches, GPU Caches
CPU Registers, GPU Constant Registers, GPU Temporary Registers
Where is GPU Data Stored?
– Vertex buffer
– Frame buffer
– Texture
[Diagram: Vertex Buffer -> Vertex Processor -> Rasterizer -> Fragment Processor -> Frame Buffer(s), with Texture as a further data store.]
CPU memory vs GPU memory

            CPU               GPU
Registers   Read/write        Read/write
Local Mem   Read/write stack  None
Global Mem  Read/write heap   Read-only during computation; write-only at end (to pre-computed address)
Disk        Read/write disk   None
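The global-memory row describes a gather-only model: a program may read any input location during computation but writes exactly one result, at an address fixed in advance. The sketch below imitates that restriction on the CPU; `run_kernel` and `blur` are hypothetical names chosen for this example.

```python
def run_kernel(kernel, inputs, n_outputs):
    """Run kernel once per output element under the gather-only model."""
    read_only = tuple(inputs)     # global memory: read-only during computation
    output = [None] * n_outputs
    for i in range(n_outputs):
        # The only write goes to address i, which is known before the
        # kernel runs (the pre-computed address).
        output[i] = kernel(read_only, i)
    return output


def blur(src, i):
    """Average an element with its neighbours, clamping at the borders."""
    left = src[max(i - 1, 0)]
    right = src[min(i + 1, len(src) - 1)]
    return (left + src[i] + right) / 3


smoothed = run_kernel(blur, [0, 3, 6], 3)
```

Because each invocation writes only to its own slot, the loop iterations are independent of one another, which is what allows a GPU to run them in parallel.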
It looks like…..
Some applications…..
• Computer-generated holography using a graphics processing unit
• Improving the performance of CAD tools
• Computer graphics in games
New…..
NVIDIA's new graphics processing unit, the GeForce 8X ULTRA, is said to represent the very latest in visual effects technologies.
THANK
YOU
