
GRAPHICS PROCESSING UNIT

PRESENTED BY
LEKSHMI P A
ROLL NO: 19

08/30/08 1
Presentation Overview
Definition
Comparison with CPU
Architecture
GPU-CPU Interaction
GPU Memory

Why GPU?
• To provide separate, dedicated graphics resources, including a graphics processor and memory.
• To relieve some of the burden on the main system resources, namely the Central Processing Unit, Main Memory, and the System Bus, which would otherwise get saturated with graphical operations and I/O requests.

Here comes the GPU

What is a GPU?
• A Graphics Processing Unit, or GPU (also occasionally called a Visual Processing Unit, or VPU), is a dedicated processor that is efficient at manipulating and displaying computer graphics.
• Like the CPU (Central Processing Unit), it is a single-chip processor.

HOWEVER,

The abstract goal of a GPU is to enable a representation of a 3D world that is as realistic as possible. GPUs are therefore designed to provide additional computational power customized specifically for these 3D tasks.

GPU vs CPU
• A GPU is tailored for highly parallel operation, while a CPU executes programs serially.
• For this reason, GPUs have many parallel execution units, while CPUs have few.
• GPUs have significantly faster and more advanced memory interfaces, as they need to move much more data than CPUs.
• GPUs have much deeper pipelines (several thousand stages vs 10-20 for CPUs).

BRIEF HISTORY
• First-Generation GPUs
– Up to 1998; Nvidia’s TNT2, ATi’s Rage, and 3dfx’s Voodoo3; DX6 feature set.

• Second-Generation GPUs
– 1999-2000; Nvidia’s GeForce256 and GeForce2, ATi’s Radeon 7500, and S3’s Savage3D; T&L; OpenGL and DX7; configurable.

• Third-Generation GPUs
– 2001; GeForce3/4Ti, Radeon 8500, MS’s Xbox; OpenGL ARB, DX7/8; vertex programmability + ASM.

• Fourth-Generation GPUs
– 2002 onwards; GeForce FX family, Radeon 9700; OpenGL + extensions, DX9; vertex/pixel programmability + HLSL; 0.13μ process, 125M T/C, 200M T/S.

• Fifth-Generation GPUs
– GeForce 8X; DirectX 10.

GPU Architecture

How many processing units?
– Lots.
How many ALUs?
– Hundreds.
Do you need a cache?
– Sort of.
What kind of memory?
– Very fast.

The difference…

[Figure: side-by-side images, without GPU and with GPU]


The GPU pipeline

• The GPU receives geometry information from the CPU as input and provides a picture as output.
• Let’s see how that happens…

host interface → vertex processing → triangle setup → pixel processing → memory interface

Details………..

Host Interface
• The host interface is the communication bridge between the CPU and the GPU.
• It receives commands from the CPU and also pulls geometry information from system memory.
• It outputs a stream of vertices in object space with all their associated information (texture coordinates, per-vertex color, etc.).

Vertex Processing
• The vertex processing stage receives vertices from the host interface in object space and outputs them in screen space.
• This may be a simple linear transformation or a complex operation involving morphing effects.
• No new vertices are created in this stage, and no vertices are discarded (input and output have a 1:1 mapping).
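As a rough illustration of the 1:1 mapping described above, the following Python sketch moves vertices from object space to screen space. The function names, the row-major 4x4 matrix layout, and the 640x480 viewport are assumptions made for this example and do not come from the slides; a real GPU performs this step in hardware or in a vertex program.

```python
def transform_vertex(m, v):
    """Multiply a 4x4 row-major matrix by a homogeneous vertex (x, y, z, w)."""
    return tuple(sum(m[r][c] * v[c] for c in range(4)) for r in range(4))


def to_screen(clip, width, height):
    """Perspective divide, then map the [-1, 1] range to pixel coordinates."""
    x, y, z, w = clip
    ndc_x, ndc_y = x / w, y / w
    # Screen y grows downward, so the y axis is flipped (an assumed convention).
    return ((ndc_x + 1) * 0.5 * width, (1 - ndc_y) * 0.5 * height, z / w)


def process_vertices(vertices, matrix, width, height):
    """One output vertex per input vertex: nothing is created or discarded."""
    return [to_screen(transform_vertex(matrix, v), width, height) for v in vertices]


# Hypothetical input: an identity transform and one triangle.
IDENTITY = [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1]]
triangle = [(-1.0, -1.0, 0.0, 1.0), (1.0, -1.0, 0.0, 1.0), (0.0, 1.0, 0.0, 1.0)]
screen = process_vertices(triangle, IDENTITY, 640, 480)
print(screen)
```

Swapping IDENTITY for a combined model, view, and projection matrix would give the "simple linear transformation" case; morphing effects would replace transform_vertex with something more elaborate.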

Triangle setup
• In this stage, geometry information becomes raster information (screen-space geometry is the input, pixels are the output).
• Prior to rasterization, triangles that are backfacing or located outside the viewing frustum are rejected.
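The rejection step can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions: counter-clockwise winding is taken to be front-facing in a y-up coordinate system, and the frustum test is reduced to a 2D bounding-box check against a hypothetical viewport, which ignores the near and far planes.

```python
def signed_area(a, b, c):
    """Signed area of triangle (a, b, c); positive when counter-clockwise."""
    return 0.5 * ((b[0] - a[0]) * (c[1] - a[1]) - (c[0] - a[0]) * (b[1] - a[1]))


def is_backfacing(a, b, c):
    """Assumed convention: clockwise (or degenerate) triangles face away."""
    return signed_area(a, b, c) <= 0


def outside_viewport(tri, width, height):
    """True when the triangle's bounding box misses the viewport entirely."""
    xs = [p[0] for p in tri]
    ys = [p[1] for p in tri]
    return max(xs) < 0 or min(xs) >= width or max(ys) < 0 or min(ys) >= height


def should_reject(tri, width, height):
    return is_backfacing(*tri) or outside_viewport(tri, width, height)


front = [(0, 0), (4, 0), (0, 4)]        # counter-clockwise: kept
back = [(0, 0), (0, 4), (4, 0)]         # same triangle, reversed winding
offscreen = [(-9, 0), (-5, 0), (-9, 4)]  # entirely left of the viewport
print(should_reject(front, 8, 8), should_reject(back, 8, 8), should_reject(offscreen, 8, 8))
```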

Triangle Setup (cont.)
• A pixel is generated if and only if its center is inside the triangle.
• Every generated pixel has its attributes computed as the perspective-correct interpolation of the attributes of the three vertices that make up the triangle.
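Both rules can be sketched with barycentric weights. In the Python example below, each vertex is a hypothetical tuple (x, y, w, attr), where w is the clip-space w and attr is a single scalar attribute; these names and the layout are assumptions for illustration. A pixel whose center lies exactly on an edge is counted as inside here, whereas real hardware applies a tie-breaking fill rule.

```python
def edge(a, b, p):
    """Twice the signed area of (a, b, p); its sign tells which side p is on."""
    return (b[0] - a[0]) * (p[1] - a[1]) - (b[1] - a[1]) * (p[0] - a[0])


def rasterize(v0, v1, v2, width, height):
    """Return (px, py, attr) for every pixel whose center is inside the triangle."""
    area = edge(v0, v1, v2)
    if area == 0:
        return []  # degenerate triangle covers no pixels
    pixels = []
    for py in range(height):
        for px in range(width):
            center = (px + 0.5, py + 0.5)
            # Barycentric weights; all non-negative means the center is inside.
            b0 = edge(v1, v2, center) / area
            b1 = edge(v2, v0, center) / area
            b2 = edge(v0, v1, center) / area
            if b0 < 0 or b1 < 0 or b2 < 0:
                continue
            # Perspective-correct: interpolate attr/w and 1/w, then divide.
            inv_w = b0 / v0[2] + b1 / v1[2] + b2 / v2[2]
            attr = (b0 * v0[3] / v0[2] + b1 * v1[3] / v1[2] + b2 * v2[3] / v2[2]) / inv_w
            pixels.append((px, py, attr))
    return pixels


# Hypothetical triangle on a 4x4 grid; attr is 0 at v0 and v2, 1 at v1.
pixels = rasterize((0, 0, 1, 0.0), (4, 0, 1, 1.0), (0, 4, 1, 0.0), 4, 4)
print(len(pixels), pixels[0])
```

With all w values equal to 1, as in this usage, the result reduces to plain linear interpolation; unequal w values are where the division by the interpolated 1/w matters.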

Pixel Processing
• Each pixel provided by triangle setup is fed into pixel processing as a set of attributes, which are used to compute the final color of that pixel.
• The computations taking place here include texture mapping and math operations.
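A minimal sketch of such a per-pixel computation follows, assuming a texture stored as a list of rows of RGB tuples, texture coordinates u and v in the range [0, 1], and a scalar "light" attribute. All of these names are hypothetical; the slides do not specify an attribute layout or a filtering mode.

```python
def sample_nearest(texture, u, v):
    """Nearest-texel lookup; u and v are assumed to lie in [0, 1]."""
    height = len(texture)
    width = len(texture[0])
    x = min(int(u * width), width - 1)
    y = min(int(v * height), height - 1)
    return texture[y][x]


def shade_pixel(attrs, texture):
    """Texture mapping followed by a math operation (scaling by light)."""
    r, g, b = sample_nearest(texture, attrs["u"], attrs["v"])
    k = max(0.0, min(1.0, attrs["light"]))  # clamp intensity to [0, 1]
    return (round(r * k), round(g * k), round(b * k))


# Hypothetical 2x2 checker texture and one pixel's interpolated attributes.
checker = [[(200, 100, 50), (0, 0, 0)],
           [(0, 0, 0), (200, 100, 50)]]
color = shade_pixel({"u": 0.25, "v": 0.25, "light": 0.5}, checker)
print(color)
```

On a programmable GPU, a function like shade_pixel is what the programmer supplies, and the hardware runs it once for every generated pixel.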

Memory Interface
• Pixel colors provided by the previous stage are written to the framebuffer.
• This used to be the biggest bottleneck, before pixel processing took over.
• Before the final write occurs, some pixels are rejected by the z-buffer. On modern GPUs, z is compressed to reduce framebuffer bandwidth (but not size).
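The z-buffer rejection can be sketched as a depth comparison before each framebuffer write. The example assumes that smaller z means closer to the viewer and that the depth buffer is cleared to a far value of 1.0; both are common conventions but are not stated in the slides. Z compression is not modeled.

```python
def make_buffers(width, height, far=1.0):
    """Create a color buffer cleared to black and a depth buffer cleared to far."""
    color = [[(0, 0, 0)] * width for _ in range(height)]
    depth = [[far] * width for _ in range(height)]
    return color, depth


def write_pixel(color, depth, x, y, z, rgb):
    """Write rgb only if the new pixel is closer than what is already stored."""
    if z >= depth[y][x]:
        return False  # rejected by the z-buffer
    depth[y][x] = z
    color[y][x] = rgb
    return True


color, depth = make_buffers(2, 2)
first = write_pixel(color, depth, 0, 0, 0.3, (255, 0, 0))   # near red pixel
second = write_pixel(color, depth, 0, 0, 0.8, (0, 0, 255))  # farther blue pixel
print(first, second, color[0][0])
```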

Programmability in GPU pipeline
• In current state-of-the-art GPUs, vertex and pixel processing are programmable.
• The programmer can write programs that are executed for every vertex as well as for every pixel.
• This allows fully customizable geometry and shading effects that go well beyond the generic look and feel of older 3D applications.

GPU Pipelined Architecture (simplified view)
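[Figure: the CPU sends a bit stream (…110010100100…) to the GPU, whose stages are Vertex Setup → Vertex Shader → Rasterizer → Pixel Shader → Frame buffer, with a Texture Storage + Filtering unit alongside. Data labels: Vertices, Pixels.]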


One unit can limit the speed of the pipeline…
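A back-of-the-envelope way to see this: in a pipeline, sustained throughput is the minimum of the per-stage throughputs. The Python sketch below uses made-up rates (items per second) purely for illustration; none of the numbers come from the slides or from real hardware.

```python
def pipeline_throughput(stage_rates):
    """A pipeline can go no faster than its slowest stage."""
    return min(stage_rates.values())


def bottleneck(stage_rates):
    """Name of the stage that limits the whole pipeline."""
    return min(stage_rates, key=stage_rates.get)


# Hypothetical rates, in millions of items per second.
rates = {
    "vertex setup": 300,
    "vertex shader": 250,
    "rasterizer": 400,
    "pixel shader": 120,
    "frame buffer": 200,
}
print(bottleneck(rates), pipeline_throughput(rates))
```

Speeding up any stage other than the bottleneck leaves the overall rate unchanged, which is why profiling usually starts by identifying the limiting unit.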

CPU/GPU interaction
• The CPU and GPU inside the PC work in parallel with each other.
• There are two “threads” going on, one for the CPU and one for the GPU, which communicate through a command buffer.

[Figure: the command buffer as a queue. The CPU writes commands at one end, the GPU reads commands from the other, and pending GPU commands sit in between.]

CPU/GPU interaction (cont.)
• If this command buffer is drained empty, we are CPU limited and the GPU will spin, waiting for new input. All the GPU power in the universe isn’t going to make your application faster!
• If the command buffer fills up, the CPU will spin, waiting for the GPU to consume it, and we are effectively GPU limited.
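The two regimes can be reproduced with a small tick-based simulation. This is a toy model under stated assumptions: each command costs a fixed number of ticks on each side, the buffer has a fixed capacity, and the names simulate, cpu_ticks_per_cmd, and gpu_ticks_per_cmd are invented for this sketch.

```python
from collections import deque


def simulate(cpu_ticks_per_cmd, gpu_ticks_per_cmd, capacity, total_ticks):
    """Return (ticks the CPU spent stalled, ticks the GPU spent idle)."""
    buffer = deque()
    cpu_progress = 0
    gpu_progress = 0
    gpu_busy = False
    cpu_stalled = 0
    gpu_idle = 0
    for _ in range(total_ticks):
        # CPU side: build a command, then try to append it to the buffer.
        if cpu_progress < cpu_ticks_per_cmd:
            cpu_progress += 1
        if cpu_progress == cpu_ticks_per_cmd:
            if len(buffer) < capacity:
                buffer.append("draw")
                cpu_progress = 0
            else:
                cpu_stalled += 1  # buffer full: GPU limited
        # GPU side: fetch the next command if free, then work on it.
        if not gpu_busy and buffer:
            buffer.popleft()
            gpu_busy = True
            gpu_progress = 0
        if gpu_busy:
            gpu_progress += 1
            if gpu_progress == gpu_ticks_per_cmd:
                gpu_busy = False
        else:
            gpu_idle += 1  # buffer empty: CPU limited
    return cpu_stalled, gpu_idle


cpu_limited = simulate(cpu_ticks_per_cmd=4, gpu_ticks_per_cmd=1, capacity=8, total_ticks=100)
gpu_limited = simulate(cpu_ticks_per_cmd=1, gpu_ticks_per_cmd=4, capacity=8, total_ticks=100)
print("CPU limited:", cpu_limited)
print("GPU limited:", gpu_limited)
```

In the first run the GPU idles while the CPU never stalls; in the second the roles are reversed once the buffer has filled.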
Synchronization issues
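• In the figure below, the CPU must not overwrite the data in the “yellow” block until the GPU is done with the “black” command, which references that data.

[Figure: the command buffer, with the GPU reading commands at one end and the CPU writing commands at the other; the “black” command references a separate “yellow” data block.]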
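One common way to express this constraint is a fence: a counter the GPU advances as it finishes commands, which the CPU checks before touching referenced data. The ToyGpu class below is a hypothetical single-threaded model written for this illustration and does not correspond to any real graphics API.

```python
class ToyGpu:
    """Toy model: commands hold references to CPU-owned data, not copies."""

    def __init__(self):
        self.queue = []
        self.submitted = 0
        self.completed = 0

    def submit(self, data_ref):
        """Queue a command that references data_ref; return its fence value."""
        self.submitted += 1
        self.queue.append((self.submitted, data_ref))
        return self.submitted

    def execute_one(self):
        """Run the oldest pending command; the data is read only now."""
        fence, data_ref = self.queue.pop(0)
        result = sum(data_ref)
        self.completed = fence
        return result

    def is_done(self, fence):
        """True once the command with this fence value has finished."""
        return self.completed >= fence


gpu = ToyGpu()
vertices = [1, 2, 3]
fence = gpu.submit(vertices)
safe_before = gpu.is_done(fence)   # False: overwriting now would corrupt the command
result = gpu.execute_one()
safe_after = gpu.is_done(fence)    # True: the CPU may reuse the block
if safe_after:
    vertices[:] = [7, 8, 9]
print(safe_before, result, safe_after)
```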


Inlining data
• One way to avoid these problems is to inline all data into the command buffer and avoid references to separate data.

[Figure: the command buffer, with the GPU reading commands at one end and the CPU writing commands at the other.]

• However, this is also bad for performance, since we may need to copy several […] instead of merely passing around a pointer.

GPU readbacks
• The output of a GPU is a rendered image on the screen. What happens if the CPU tries to read it?

[Figure: the command buffer, with pending GPU commands between the end where the CPU writes and the end where the GPU reads.]

• The GPU must be synchronized with the CPU, i.e. it must drain its entire command buffer, and the CPU must wait while this happens.

GPU readbacks (cont.)
• We lose all parallelism, since first the CPU waits for the GPU, then the GPU waits for the CPU (because the command buffer has been drained).
• Both CPU and GPU performance take a nosedive.
• Bottom line: the image the GPU produces is for your eyes, not for the CPU (treat the CPU -> GPU highway as a one-way street).

About GPU memory…

Memory Hierarchy
[Figure: CPU and GPU memory hierarchy. Levels shown: Disk; CPU Main Memory; GPU Video Memory; CPU Caches; GPU Caches; CPU Registers; GPU Constant Registers; GPU Temporary Registers.]

Where is GPU Data Stored?
– Vertex buffer
– Frame buffer
– Texture

[Figure: Vertex Buffer → Vertex Processor → Rasterizer → Fragment Processor → Frame Buffer(s), with Texture shown as a separate store.]

CPU memory vs GPU memory

             CPU                GPU
Registers    Read/write         Read/write
Local Mem    Read/write stack   None
Global Mem   Read/write heap    Read-only during computation; write-only at end (to pre-computed address)
Disk         Read/write disk    None
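The Global Mem row has a practical consequence: a GPU program of this era can read any input location but can write only to its own, pre-determined output location. The Python sketch below imitates that model; run_kernel and blur are names invented for this example, and the tuple stands in for read-only memory.

```python
def run_kernel(kernel, inputs):
    """Call kernel once per output element; each call writes only its own slot."""
    frozen = tuple(inputs)           # read-only during the computation
    output = [None] * len(frozen)
    for i in range(len(frozen)):
        output[i] = kernel(i, frozen)  # write happens once, at a fixed address
    return output


def blur(i, src):
    """Reads neighbors freely but returns a single value for slot i."""
    left = src[max(i - 1, 0)]
    right = src[min(i + 1, len(src) - 1)]
    return (left + src[i] + right) / 3


blurred = run_kernel(blur, [3, 6, 9])
print(blurred)
```

Because no invocation can write into another invocation's slot, the iterations are independent and could run in any order or all at once, which is what makes the model suit parallel hardware.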

It looks like…..

Some applications…..

• Computer-generated holography using a graphics processing unit
• Improving the performance of CAD tools
• Computer graphics in games

New…..

• NVIDIA's new graphics processing unit, the GeForce 8X ULTRA, is said to represent the very latest in visual effects technologies.

THANK YOU
