
Authors:

BEVILACQUA Hugo
EL FATAYRI Oussama
FAUXPOINT Guillaume
SAHIN Serdar

4th year AE-SE 4


PROMO 48

Mentored Project

Game Console: Graphics Part


Bibliographic Report
2013/2014
Mentors: DI MERCURIO Sebastien, DRAGOMIRESCU Daniela

Project Summary
The purpose of this project is to design a NES/SNES-style graphics card on an FPGA board.
This study can be summarised in two main points:
- learning the techniques, architectures and algorithms needed to design this type of card, by studying what has been done in the past and what is being done today;
- choosing the best technique, taking the hardware limitations into account, in order to get the best performance/quality trade-off.

Table of contents
Chapter 1  Review of the state of the art in 2D graphics acceleration
  1.1  A brief history of video hardware: the increasing need of accelerators
  1.2  From video display controllers (VDC) to video display processors (VDP)
  1.3  Study and comparison of some 2D graphics accelerator architectures
  1.4  Graphics Accelerators Limitations
Chapter 2  Major algorithms required by accelerators
  2.1  Introduction
  2.2  Processor and memory interfaces
  2.3  Raster Graphics Management
  2.4  Primitive Generation Algorithms
  2.5  Image Processing Algorithms
    2.5.1  Elementary geometric transformations
    2.5.2  Homogeneous coordinates and composition
    2.5.3  Anti-aliasing
Chapter 3  Our upcoming design objectives
  3.1  Specifications of our hardware
  3.2  Algorithm and architecture that we will retain
Bibliography

Introduction
We wish to develop a portable gaming console built on two boards. One is a microcontroller board that will host the operating system; the other is a graphics card synthesised on an FPGA. This GPU should match the display performance of the 8/16-bit gaming systems of the late 1980s like the NES/SNES.
Therefore, we will start our study by looking into the graphics systems used back then: their architectures, their components, their memory management, etc.
Once this step is completed, we will analyse the common algorithms used in graphics processing, their use in GPUs, and the advantages of newer operations compared to the techniques used until the 90s.
To conclude, we will decide what we can do based on the hardware and time we will have, and choose the most effective solutions.

Chapter 1
Review of the state of the art in 2D graphics
acceleration
1.1 A brief history of video hardware: the increasing need of
accelerators
A graphics accelerator is a system that executes graphical calculations more efficiently than a generic microprocessor.
The seventies are the starting point of these concepts, with the birth of the video game console market, a market in true need of complex graphics processing, and also with the more common use of personal computers (Apple, Commodore, and IBM at first). Indeed, computers were becoming more accessible to the general public, and in order to satisfy ergonomic criteria they had to possess graphical human-machine interfaces. These interfaces evolved and became more complex each year, and the need to execute graphical operations faster and more efficiently became significant very quickly.
According to [1], [2], video hardware devices used to be simple video interfacing circuits, built with discrete components to create the required logic, but they quickly ended up becoming separate integrated circuits. Actually, from the seventies to the nineties, there have been three major kinds of video generators: discrete custom architectures, programmable logic devices and integrated custom designs. The devices within the first two categories are more flexible, but the latter integrated circuits, also called video display controllers (VDC), are cheaper and can do more complex calculations. During the early nineties, Texas Instruments (TI) commercialised the very first processor with integrated graphical instructions (basically the equivalent of what we would call a graphics card today).

Figure 1: TI-99/4A home computer

A historian [3] explains that Nvidia and ATI were not the graphics card leaders they are today; they didn't even exist until the late eighties. Therefore, when a PC or gaming console manufacturer needed graphical power, they either developed it themselves (most often as custom derivatives of popular ICs on the market) or used devices from a market mostly dominated by Texas Instruments, Yamaha and IBM. This is the reason why graphical acceleration units back then were less generic and sometimes built specifically for certain machines (NES, Atari).
This gives us a rough classification of modern graphics card predecessors.

TI-99/4A - The Texas Instruments home computer. Because it's not just for ABCs. It's also for PhDs, 1983. [Online]. Available: http://oldcomputers.net/oldads/old-computer-ads.html [Accessed 29/01/2014].

1.2 From video display controllers (VDC) to video display processors (VDP)
As mentioned in the preceding section, VDCs are integrated circuits (mostly LSI or VLSI ASICs), and they are used to generate the video signals that will be fed to the screen. So we can find some hardware acceleration in any VDC, at least for generating signals by reading image data from the memory. As opposed to video display processors (VDP, also called GPU), a VDC's main objective is supposed to be exclusively signal generation, but some of them include graphics processing capabilities.
Actually, the related Wikipedia article [4] separates video display controllers into four categories:
- Video shift registers: a shift register which gets video data with help from the CPU and outputs a video bitstream with the associated clocks, usually at low resolution. We can take the example of the RCA CDP1861C [5], a CMOS device designed to help the RCA 1802 microprocessor generate video signals using the processor's DMA and an external RAM.

Figure 2 : Typical CDP1802 microprocessor system using the CDP1862C

RCA CMOS LSI Products. Typical CDP1802 microprocessor system using the CDP1862C. [Online] Available: http://pdf.datasheetarchive.com/indexerfiles/Scans-054/DSAIH000100740.pdf [Accessed 29/01/2014]

- Cathode Ray Tube Controllers (CRTC): a CRTC shares independent access to the RAM with the CPU, reads video data and generates video signals (sometimes with an accelerated text support mode). Motorola's MC6845 [6] is a good example to illustrate a CRTC's role: it generates both the video memory addresses and the video synchronisation clocks, and basically frees the microprocessor from bothering with their timings. No DMA is required, and the memory is multiplexed between this chip and the CPU. It can also provide specific screen densities for alphanumeric formats and acceleration for cursor display.

Figure 3 : Typical MC6845 CRTC Application

- Video interface controllers: an evolved CRTC, usually supporting text generation, sprites and video RAM dedicated to a colour palette. We can cite the MC6847 [7], used in many 8-bit home computers, and also the VIC-II used in the Commodore 64 [8]. Both devices have multiple graphics and alphanumeric resolution modes and internal colour sets (9 colours for the Motorola chip and 16 for the VIC-II). The VIC-II can also handle up to 8 sprites (it has a 16kB dedicated memory for screen, characters, and sprites).
- Coprocessors: a coprocessor can do specific calculations, reads and writes to the RAM, has its own image buffer and can manipulate it with character generation, sprite management etc.; the main CPU simply gives the instructions. It comes very close to being considered a VDP. Notable 8-bit or 16-bit gaming platforms like the SNES, Atari 5200, and Sega Megadrive used video coprocessors for graphics acceleration. Some platforms used custom architectures like the Picture Processing Unit (PPU) in the NES/SNES [11] or ANTIC in the Atari 5200 [9]; others used designs based on existing common chips like the TMS9918 [10].
G. Singer explains that during the late eighties [3], the need for acceleration increased even further with home computer GUIs, and Video Display Processors (VDP), also known as Graphics Processing Units (GPU), started replacing VDCs. They usually come with independent processing units and go as far as doing 3D graphics rendering. Texas Instruments' TMS34010 was the first programmable GPU on the market. Its successor, the TMS34020, could even be connected to a floating-point coprocessor to render 3D graphics very quickly. This chip was commonly used on Windows, Amiga and Commodore home computers of that time.
We will focus on coprocessor VDCs and on VDPs, but the video signal and memory management of simpler chips could also be useful for our project.

Motorola Semiconductors. Typical CRT Controller Application. [Online]. Available: http://pdf1.alldatasheet.fr/datasheet-pdf/view/4159/MOTOROLA/MC6845.html [Accessed 29/01/2014].

1.3 Study and comparison of some 2D graphics accelerator architectures
In order to be able to choose an accelerator architecture, we will study some existing architectures, find their common points, and point out their particular advantages and drawbacks. We will start by studying simple architectures, mostly made by other students or hobbyists, and then look into increasingly complex, professional architectures.

Architecture of the 2011/2012 project group at INSA:

Figure 4 : Architecture implemented on FPGA by the previous project group

A group of three students worked on a project [12] quite similar to ours, in which they developed a graphics coprocessor to be used with a VGA-interfaced screen.
In this architecture, implemented on an FPGA, the device generates the video display signals for the VGA format and has a dedicated video RAM containing two scrollable display layers.
The image manipulation instructions come from the CPU through a serial SPI bus. The instruction decoder can control the image buffer to manage transparency between layers, manage scrolling, blit sprites, and draw lines or rectangles.
A VGA controller generates the video display signals towards a DAC and sends the required pixel coordinates to the image buffer, which provides the RGB data. The image buffer retrieves this data by accessing the video RAM through a RAM controller.

Q. PERRET, S. THIEBAUT and K. LE SAYEC, Graphic coprocessor on FPGA, INSA Toulouse, Project Report, June 2012.

A custom architecture described in an IJESA article:


The authors of [13], K. S. AY and A. DOGAN, built an architecture that relies both on reprogrammable hardware (FPGA) and on software. They view the 2D display process as a three-step procedure of modelling, rendering and monitoring.

Figure 5: Hardware/software full architecture of the graphics system

Their hardware architecture uses pre-existing IP cores for the VGA/LCD interface controller and the RAM controller, and it uses a PowerPC microprocessor as the CPU of the system. A PLB bus is used to establish communication between these IP cores. There are two custom IP cores in this architecture: a Bresenham IP core which uses the Bresenham line drawing algorithm, and a BitBLT IP core for fast data transfers and image blit operations, alpha blending, scan filling etc.
The hardware architecture is completed with a software API that can draw polygons, write characters etc. by generating vector graphics, which are then rasterised by the IP core drivers and displayed.
The use of the PLB bus gives uniform IP core inputs and outputs, usually with instruction FIFO buffers, which allows each block to interpret very different kinds of commands without needing additional I/O interfaces.
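For reference, the Bresenham algorithm that this IP core implements rasterises a line using only integer additions and comparisons, which is what makes it attractive for hardware. A minimal software sketch of the idea (our own illustration, not the article's actual IP core):

```python
def bresenham_line(x0, y0, x1, y1):
    """Integer-only line rasterisation (Bresenham).

    Returns the list of pixel coordinates between the two endpoints,
    handling all octants via the sign variables sx and sy.
    """
    points = []
    dx = abs(x1 - x0)
    dy = -abs(y1 - y0)
    sx = 1 if x0 < x1 else -1
    sy = 1 if y0 < y1 else -1
    err = dx + dy  # decision variable: which axis to step next
    while True:
        points.append((x0, y0))
        if x0 == x1 and y0 == y1:
            break
        e2 = 2 * err
        if e2 >= dy:
            err += dy
            x0 += sx
        if e2 <= dx:
            err += dx
            y0 += sy
    return points
```

The hardware version maps naturally onto an adder, a comparator and two coordinate registers, one pixel per clock cycle.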

Nintendo Entertainment System's Picture Processing Unit (PPU):


It is hard to get official documentations for a chip like the Nintendo Entertainment Systems PPU;
it is a chip made by Ricoh specifically for NES and its original documentation is confidential, only
a vague idea of its architecture is public with Nintendos PPU patent [14]. However, many
passionate fans or hackers wrote their own documentations for this chip and completed each other
(basing themselves on leaked original Japanese datasheets of Nintendo), especially we have P.
Diskins technical documentation [11], NESdevs open source wiki [15], and we will take reference
of these documents because they are accurate enough to emulate the complete NES system.

K. S. AY, A. DOGAN, Hardware/Software Co-Design of a 2D Graphics System on FPGA, Dept. of


Electrical and Electronics Engineering, Anadolu University, Eskisehir, Turkey, March 2013. [Online]
Available: http://airccse.org/journal/ijesa/papers/3113ijesa02.pdf [Accessed Jan. 31 2014]

Figure 6: The architecture described in Nintendo's patent, upon which PPU was designed

The PPU is a VDC with 16kB of video RAM and 256 bytes of sprite RAM (SPR-RAM), with DMA access for fast data transfer from the CPU memory to the video memory. This chip can generate the required video signal for the display on its own, and it accelerates features like scrolling, pixel colouring and multi-layer display with motion pictures (sprites).
The particular thing about this device is the way it organises its 16kB video memory to compress the video data and accelerate the scrolling feature; according to [16] there are three main zones in the memory:
- Pattern Tables: this zone contains 8x8 pixel tile data, where each tile can contain up to 4 different colours and is defined by a string of 16 bytes (128 bits), 2 bits being used to define each pixel. This pattern table is further separated into two zones:
  o Sprite Memory: this zone contains tiles that are to be used as sprites,
  o B/G Tiles: zone for tiles used for the background.
- Name & Attribute Tables: this zone is used to create the background by combining B/G tiles and colour palettes.
  o Name Tables: there are four name tables on the PPU (but only two of them are usable). A name table is basically a matrix of references to the tiles in the pattern tables, where one byte represents an 8x8 pixel tile; each name table has 30x32 tiles, therefore making a total of 256x240 pixels (the NES full-screen resolution). Depending on which two of these zones are used, either horizontal or vertical scrolling is possible by continuously updating the unseen parts of the name tables.
  o Attribute Tables: each is associated with a particular name table and makes the association between tiles and the colour palettes.
- Colour Palettes: the NES has a 52-colour system palette, but this zone in the memory holds two configurable 16-colour palettes. One of these palettes is for background tiles, the other one is for sprites.

T. HIROO UEDA and I. HIROMITSU YAGI, T.V. Game System Having Reduced Memory Needs, U.S. Patent 4,824,106, 25 Apr. 1989. [Online] Available: http://www.freepatentsonline.com/4824106.pdf [Accessed Jan. 31 2014]

Figure 7: Scrolling in the PPU
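The 16-byte tile encoding described above can be illustrated in code: the first 8 bytes form a bitplane holding bit 0 of every pixel, and the last 8 bytes hold bit 1. A hedged sketch of the decoding, based on the fan documentation [11], [15] (our own illustration, not Nintendo's code):

```python
def decode_tile(tile_bytes):
    """Decode a 16-byte NES pattern-table tile into an 8x8 grid of
    2-bit colour indices (0-3).

    Byte y holds bit 0 of each pixel in row y, byte y+8 holds bit 1;
    bit 7 of each byte is the leftmost pixel of the row.
    """
    assert len(tile_bytes) == 16
    rows = []
    for y in range(8):
        lo, hi = tile_bytes[y], tile_bytes[y + 8]
        row = [((lo >> (7 - x)) & 1) | (((hi >> (7 - x)) & 1) << 1)
               for x in range(8)]
        rows.append(row)
    return rows
```

The resulting index selects one of the 4 entries of the tile's colour palette, which is how 64 pixels fit in only 16 bytes.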

ANTIC:
The Alphanumeric Television Interface Controller is a custom VDC chip used in the Atari 5200 gaming console. ANTIC cannot generate video signals; it is used together with the CTIA/GTIA chip, which manages colours, the frame buffer and the video display. We managed to find some information about its architecture in its leaked datasheet [9] and the Wikipedia article [17].
ANTIC's hardware capabilities mostly cover the acceleration of the playfield (background), players and missiles (sprites) and text display, as well as operations like scrolling. It can also generate interrupts at specific scanning positions, or on collision detection with other pixels. It has fourteen different display modes, combining some of these features.

TMS9918:
The functionalities offered by this Texas Instruments chip are quite similar to Atari's ANTIC; it supports four different display modes that offer either character or bitmap display, with limitations on the number of different colours.
This chip [10] can have up to 16kB of video RAM; it has a 16-colour palette (but one of the colours is used for transparency) and supports up to 32 monochrome 8x8 or 16x16 pixel sprites (with a limit of four per horizontal line). It supports alphanumeric text display acceleration thanks to its 8x8 pixel pattern generator.

P. DISKIN, Nintendo Entertainment System Documentation, Aug. 2004. [Online] Available: http://nesdev.com/NESDoc.pdf [Accessed Jan. 31 2014]

Figure 8: TMS9928 used in a circuit

It interfaces with the CPU through an 8-bit data bus, three control signals and an interrupt; this allows the CPU to read from and write to the VRAM and the internal registers. The VRAM can be organised into sprite memory, pattern name tables and colour tables using the VDC's internal registers.
This chip was used in many gaming consoles (especially by Sega for the Master System and Mega Drive) with small modifications (additional registers and display modes).

TMS34010:
This is a full-featured programmable microprocessor [18, 19] with integrated support for graphics operations. It only provides 2D acceleration; however, it was readily used for rendering 3D animations through an appropriate API, as can be seen in [20].

Figure 9: TMS34010 architecture

Texas Instruments TMS9918, Wikipedia, the free encyclopedia, 26 Jan. 2014. [Online]. Available: http://en.wikipedia.org/wiki/List_of_home_computers_by_video_hardware [Accessed 29/01/2014].
Texas Instruments, Graphics System Processors, TMS34010 datasheet, Houston, Jun. 1986 (Revised June 1991). [Online] Available: http://www.ti.com/lit/ds/symlink/tms34010.pdf [Accessed Jan. 31 2014]

From an architectural point of view, it has an embedded 32-bit processor that can execute graphics instructions as well as general-purpose instructions. A 32-bit data bus connects the different entities.

Figure 10: TMS demonstration

The main graphics instructions supported by this chip are PIXBLT/FILL for manipulating 2D pixel arrays (copy, fill etc.), the LINE instruction to execute the Bresenham line drawing algorithm, the DRAV instruction that allows the implementation of more complex primitive generation, and PIXT for transferring pixels inside the memory. It can also accelerate features like text display, line clipping, image clipping, and collision detection (with windows).

1.4 Graphics Accelerators Limitations


Whether it is for gaming applications, 3D modelling software or video editing, it is very important to get the maximum graphics performance. Of course the performance of a graphics card can be maximised, but it will face a lot of limitations due to the hardware being used. In this section we will develop the three main graphics performance limits.

Graphics Memory Bandwidth:


As the author explains in [21], the graphics memory bandwidth stands for the speed with which the graphics accelerator can store its output in the memory. Companies like ATI have been trying to create RAM technology performing better than SDRAM and RDRAM. Their newest creation was the VRAM, which was a total failure because it did not perform better than the other RAM types and was very expensive.
Other companies had other strategies in mind. The Chromatic company created the MPACT 2 architecture with the idea of dividing the RAM over multiple buses. In this way, any memory request can be served more rapidly than in previous architectures. The only downside of this architecture is its high cost.

Texas Instruments, TMS34010 Promo Video, 2007 (publication). [Online]. Available: http://www.youtube.com/watch?v=730tmDmzeDE [Accessed Jan. 31 2014]

The second strategy is to divide the RAM into large sequential banks. It was created by the company S3 under the name 3DFX in order to reach the same performance as Chromatic's MPACT 2.
So if we want a memory bandwidth exceeding the RAM limitations, we will have to use one of these two strategies, which is what graphics accelerator companies are using nowadays. If we check the board of a portable PC for example, we can see that the RAM is divided into multiple chips, which makes the board design even more complex.

Host-accelerator communication:

According to [21], in order for the host to communicate with the graphics accelerator, a command queue is created to stack the operations in order. The graphics accelerator then chooses the operation to execute by its position in the queue (FIFO).
The main problem arises if the queue is really small. In that case the host will be stalled, waiting for a free slot in the queue before it can send its operation to the accelerator. Nowadays, queues usually have over 512 entries, which is plenty compared to earlier graphics accelerators with 64-entry queues.
When we are dealing with multiple-access arbitration, the best solution is to add to our hardware an operation counter that is incremented every time an operation is completed. By comparing the value of this counter to the size of the queue, we can know whether there is any operation waiting in the queue.
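The operation-counter scheme described above can be modelled in a few lines. This is our own illustrative sketch (the class and method names are hypothetical, not taken from [21]):

```python
class CommandQueue:
    """Illustrative model of a host-to-accelerator command FIFO with
    a completion counter, as described in the text."""

    def __init__(self, size):
        self.size = size
        self.submitted = 0  # operations pushed by the host
        self.completed = 0  # operations finished by the accelerator

    def pending(self):
        # comparing the two counters tells us how full the queue is
        return self.submitted - self.completed

    def try_submit(self):
        if self.pending() >= self.size:
            return False  # queue full: the host would stall here
        self.submitted += 1
        return True

    def complete_one(self):
        # called when the accelerator finishes one queued operation
        assert self.pending() > 0
        self.completed += 1
```

With a large enough queue, `try_submit` almost never returns False, which is exactly why modern accelerators moved from 64-entry to 512-entry queues.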

Monitor Refresh:
As we said earlier, the biggest problem with graphics accelerators is their memory bandwidth. And since a simple update of a monitor consumes a certain amount of this bandwidth, we have to find a way to make this process cheaper and to avoid visible fading between cycles. In order to do so, the author in [21] confirms that a lot of applications perform double buffering. To explain double buffering, we are going to show the following Petri net, in which the read transitions read from the numbered buffer and the write transitions write into it:

Figure 11: Double Buffering Petri Net

File: Double Buffering Petri Net.png, Wikipedia, the free encyclopedia. [Online] Available: http://en.wikipedia.org/wiki/Multiple_buffering [Accessed Jan. 31 2014]

As this Petri net shows, double buffering is a simple task to comprehend. At first, the only operation available is writing into the first buffer. When we do so, two operations become available: writing into the second buffer or reading from the first buffer. If we read from the first buffer, writing into it becomes available once again. If we write into the second buffer, reading from it becomes available. And the cycle continues like this.
This process is used, as we said earlier, to avoid visible fading between cycles, or more precisely flickering. So the question is how exactly double buffering helps with this problem. The answer is by using one buffer as the one displayed on the monitor, while the second one is drawn in the background. On the next cycle, the two buffers change roles, as interpreted in [22].
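In code, the role swap described above boils down to exchanging two buffer references at each refresh cycle. A minimal sketch (our own illustration of the technique):

```python
class DoubleBuffer:
    """Minimal double-buffering sketch: draw into the back buffer
    while the front buffer is scanned out, then swap each cycle."""

    def __init__(self, width, height):
        self.front = [[0] * width for _ in range(height)]  # displayed
        self.back = [[0] * width for _ in range(height)]   # drawn into

    def draw_pixel(self, x, y, colour):
        # drawing never touches the buffer currently being displayed,
        # so the viewer never sees a half-drawn frame (no flickering)
        self.back[y][x] = colour

    def swap(self):
        # at the next refresh cycle the two buffers change roles
        self.front, self.back = self.back, self.front
```

On real hardware the swap is just a register update of the scan-out base address, so it costs almost no memory bandwidth.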


Chapter 2
Major algorithms required by accelerators
2.1 Introduction
In this chapter, we will describe different methods and algorithms to use in order to draw a graphic
pixel on the monitor. These algorithms should be able to handle any kind of geometric figures,
starting from the simplest (lines, circles) to more complicated like polygons and ellipses. We will
focus only on 2D function.
In the following sections, we will explain four main types of algorithms which could help us complete
our project. These algorithms are for interfacing, rendering, primitive generation and image
processing.

2.2 Processor and memory interfaces


In this project, a lot of different entities should be able to communicate with each other. Our objective is to design and organise the different interfaces between all these elements while trying to optimise the speed at which information is transferred. We have to find algorithms to establish interactions between the CPU, the GPU, the RAM, the SD card, the DMA (Direct Memory Access), the VGA (Video Graphics Array) output etc., as shown in [23].
Instead of connecting the DisplayTech DT035TFT LCD directly to the MCBSTM32F400 MCU board, we will connect it to the 2D GPU board (Nexys 3) via a parallel bus. In the reference manual of this screen, it is specified that the driver is a Novatek NT39016D with a display capacity of 16.7 million colours.
The Nexys 3 itself will be directly interfaced with the microcontroller. This will help us increase the speed of the data transfers. The LCD screen and the graphics card will be connected via a specific parallel bus, and an analogue VGA port will be used for debugging. The document [24] shows that the protocol must handle 5 signals: red, green, blue, horizontal sync and vertical sync.
In order to establish a connection between the CPU and the GPU, different interfaces are possible:
- OpenGL (Open Graphics Library): we can see in [25] that OpenGL, developed by SGI (Silicon Graphics), is an API created for the development of 2D and 3D graphical applications. It contains different image processing methods for this type of application.
- CUDA: NVIDIA created this architecture in order to process data in parallel with other data being processed by the CPU. We can see more details in [26].
These interfaces will inspire the design of our own interface to connect these two entities; we will have to use much simpler algorithms for our project. We will also have to choose between parallel and serial interfaces.


There are a lot of examples of bus types. For example, there is the SPI (Serial Peripheral Interface) bus by Motorola: the master queries the slave at regular intervals. In this communication channel, the transported data can circulate in both directions (full duplex). The microcontroller has 3 SPIs, with SPI1 reaching a speed of 42 Mbit/s. It can be controlled by the DMA.
We also have the I2C bus by Philips, which will be a little slow for our project; as a matter of fact, this bus has a maximum speed of 400 kbit/s.
As for parallel buses, they are a lot faster but more difficult to work with. In this area we have the ISA (Industry Standard Architecture) bus, or the PCI (Peripheral Component Interconnect) bus to connect other cards to the main one.
We will have to compare the maximum speed of all these buses in order to find the one that suits us best.
The DMA can work in different modes, as explained in [27]:
- Burst mode: a transfer without interruption once the DMA takes control over the bus.
- Single-cycle mode: a transfer limited in size that requires a new bus request in order to continue sending data.
The SD card will be connected to the microcontroller via a bus dedicated to this type of card: the SDIO (SD Input/Output) bus.
The datasheets of the LCD and the microcontroller can be found in [28] and [29] respectively.

2.3 Raster Graphics Management


In this section, we will talk about the rendering procedure, which is the step where all of our different layers and sprites are brought together to prepare the final display. This operation is carried out in what we call a frame buffer. How can we display a player on a given background while making sure that the background does not suppress the player sprite? How should priorities be managed: who has to be displayed first? How should we organise the video memory? How can we overlay two images while managing transparency or other criteria?
two images by managing transparency or other criteria?
During our research, we found in [30] some algorithms that are interesting for our project: blending and clipping.
Blending:
The term blending is used for algorithms that bring different layers of images and sprites together in order to render the final display. More precisely, these algorithms allow us to directly modify a bitmap image using another bitmap image, a bitmap being an image format.
Different layers build the background of the display: anything that isn't in interaction with the player, and that changes quite slowly over time (scrolling).
Sprites, on the other hand, are anything that needs to be quickly animated and to interact with the environment. For instance, when you move your video game character it should not teleport from one point to another; it should move smoothly, maybe even with some animation. This kind of constraint is satisfied by quickly swapping sprites showing decomposed images of the movement, in order to give the illusion of smooth motion.

To be able to satisfy all these requirements, we found in [13] two major algorithms whose feasibility in our system we will study and discuss: bit blit (BITBLT) and alpha blending.

Bit Blit
Let's start with the BITBLT. We found in [30] that it allows us to overlay an image over another one which is already being displayed, at a given coordinate. It is indeed a simple data transfer from one point of the memory to another, but there are different ways to combine the source and the destination, depending on the desired effect.
Before getting into details, here is a quick reminder of how 2D pixels are stored in the memory: in the RGB colour system, three bytes are used to store the colour-related information. Each byte gives the intensity of red, green or blue for a pixel, giving us a large palette of possible colours.
Let's consider an example where we have to display Mario on a grey background. In order to carry out this operation, we require two bitmaps in the memory: an image of Mario on a black background and a mask for this image.
This mask is another image with the same dimensions as the one with Mario, but with the property of having black pixels where Mario is located and white pixels everywhere else.

Figure 12: The sprite of Mario (from Nintendo)

We start by applying a logical AND operation between the pixel values of our mask and the destination in the video memory. A white pixel has the code (255, 255, 255), so it does not affect the destination pixels at all, but the black pixels in the mask have the code (0, 0, 0), which replaces the destination pixels with black pixels forming the shape of Mario.

Figure 13: First step of blit

All that is needed now is to take the original image of Mario and carry out a logical OR operation at the same destination address; Mario will appear at the destination without its original background colour.
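The two steps above (AND with the mask, then OR with the sprite) can be sketched as follows, assuming pixels stored as (R, G, B) tuples. This is our own illustration of the technique, not code from [30]:

```python
def masked_blit(dest, sprite, mask, ox, oy):
    """Two-step bit blit: AND the mask into the destination (carving
    a black hole shaped like the sprite), then OR the sprite in.

    dest, sprite and mask are 2D lists of (r, g, b) tuples; the
    sprite's own background must be black, and the mask is black
    where the sprite is and white elsewhere.
    """
    for y, (sprite_row, mask_row) in enumerate(zip(sprite, mask)):
        for x, (s, m) in enumerate(zip(sprite_row, mask_row)):
            d = dest[oy + y][ox + x]
            # step 1: AND with the mask (white keeps, black clears)
            d = tuple(dc & mc for dc, mc in zip(d, m))
            # step 2: OR with the sprite (black leaves the hole as-is)
            dest[oy + y][ox + x] = tuple(dc | sc for dc, sc in zip(d, s))
```

Because both steps are plain bitwise operations on memory words, this maps directly onto the raster-op hardware of chips like the TMS34010.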

"It's me, Mario! Here we go!"

Figure 14: The final step of bit blit

We have only mentioned a few logical operations, but many others are available for rendering different effects on the result.

Figure 15: Different bit blit operations

Texas Instruments, TMS34010 User's Guide, page 161, 1986. [Online]. Available: http://www.transputer.net/mtw/rg750/doc/tms34010/t34010ug.pdf [Accessed 29/01/2014].

Alpha blending
Let us move on to the second algorithm of this sub-section: alpha blending. According to S. S. Pedersen [31], this algorithm allows us to get the same results as a BITBLT, but it can also manage

transparency of the visible pixels. This transparency, also called, the alpha channel is coded in a
byte while coding a pixel in the memory (so it requires 4 bytes per pixel now). A pixel is completely
visible of this values is at 100%, and it is completely transparent if is at 0%.
The main objective of this algorithm is not to get a more or less transparent image, but to be
able to manage interactions between two overlaid images using this parameter. We will indeed see
that the alpha channel allows us to have quite a number of effects.
We will call "source" the image we want to copy over another image, which will similarly be
called the "destination".
During the process of bringing two images together, each resulting pixel is not affected in the
same way. Let us take a pixel from the source image at the destination position (x, y): if the
pixel that used to occupy this location was completely transparent, the result is a simple copy of
the source pixel; we say this kind of result is in the "Source" zone (the opposite case is the
"Destination" zone). But if the pixel at that location was completely visible, then the result is
a mix of both: this is the "Both" zone. The last kind of zone occurs when two completely
transparent pixels are brought together: the "Neither" zone.

Figure 16: Different regions of Alpha-Blending (diagram from http://ssp.impulsetrain.com/porterduff/diagram.png)

The main interest of this technique lies in the huge number of results we can get in the "both"
zone; we can do a lot more than simple overlays. We could also choose to display only source-zone
or destination-zone pixels in order to get even more possible outputs.

Here is a summary of all these effects:

Figure 17: Summary of different effects of Alpha Blending (table from http://ssp.impulsetrain.com/porterduff/colordodgetable.png)

This table works in the following way: the tag at the left is the name of the blending algorithm
that will be applied to the "both" zone: for instance, for "zero" we choose to make all pixels of
the "both" zone fully transparent, while for "source" we put all pixels of the source in that zone.
The tags above the table are associated with the zones we decide to keep for the final image. For
the "neither" tag, we only keep "both"-zone pixels, all other pixels being set to full
transparency. For the "source" tag, only pixels from the "source" and "both" zones are kept;
similarly, for the "destination" tag only pixels from the "destination" and "both" zones remain.
And for the "both" tag all pixels remain.
The logical operations that appear on some images are the operations used between those images to
get the desired effect.
This gives more interesting effects than BITBLT, but requires a little more calculation and memory.
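As a concrete sketch of the arithmetic behind one entry of the table, here is the classic "source over destination" rule from Porter/Duff compositing [31], assuming non-premultiplied (r, g, b, a) pixels with components normalised to [0, 1]:

```python
def source_over(src, dst):
    """Porter-Duff "source over destination": each pixel is (r, g, b, a).
    The "both" zone is weighted by the source alpha; the destination shows
    through only where the source is transparent."""
    sr, sg, sb, sa = src
    dr, dg, db, da = dst
    out_a = sa + da * (1 - sa)
    if out_a == 0:                       # "neither" zone: fully transparent
        return (0.0, 0.0, 0.0, 0.0)
    blend = lambda s, d: (s * sa + d * da * (1 - sa)) / out_a
    return (blend(sr, dr), blend(sg, dg), blend(sb, db), out_a)
```

For example, a half-transparent red pixel over an opaque blue one yields an equal mix of red and blue with full opacity; swapping the blend function gives the other table entries.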

Clipping
This technique removes part of an image in order to display only a smaller portion of the
original. This saves resources, since only what is needed is drawn.
There are multiple algorithms that carry out different kinds of clipping; we will study some of
them in detail.

1) The Cohen-Sutherland algorithm


The Department of Computer Science at the University of Helsinki [32] explains that this algorithm
is used to check quickly whether a line that we would like to draw fits inside a rectangular
window; if it does not, the algorithm clips the line to fit it inside the window. For this
algorithm to work, we only need the endpoints of the line and the corners of the selected window.
The figure below shows the different zones, each associated with a 4-bit error code. In the middle
is the zone to which we would like to restrict the line (error code = 0b0000); for the other
zones, the error codes are chosen so that zones in the same row or column share a common bit set
to 1.

Figure 18: Error Code Representation


Let's assume we have a rectangular zone defined by (xmin, ymin) and (xmax, ymax), and that we
have two points P1(x1, y1) and P2(x2, y2), for drawing the line P1P2.

The algorithm quickly tests the coordinates of P1 and P2 and returns the associated error code
(see above) for each of them. Then we have the following possibilities:
- the logical OR of the error codes returns 0000: the line is inside the window, no clipping is
needed,
- the logical AND of the error codes returns a non-zero value: the line is completely outside the
window and can be rejected,
- otherwise at least one of the extremities of the line is outside the window; the intersection of
the line with a window edge has to be calculated, and then the process is repeated. For instance,
if P1 returns the error code 0010, it is below the window and the original line coordinates must
be used to calculate its intersection point with y = ymin.
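The test loop described above might be sketched as follows; note that the outcode bit assignment here is our own choice (the figure's exact codes are not reproduced), and the helper names are hypothetical:

```python
LEFT, RIGHT, BOTTOM, TOP = 1, 2, 4, 8   # hypothetical 4-bit outcode layout

def outcode(x, y, xmin, ymin, xmax, ymax):
    """4-bit error code of a point relative to the clipping window."""
    code = 0
    if x < xmin:
        code |= LEFT
    elif x > xmax:
        code |= RIGHT
    if y < ymin:
        code |= BOTTOM
    elif y > ymax:
        code |= TOP
    return code

def clip_line(p1, p2, xmin, ymin, xmax, ymax):
    """Clip the segment p1-p2 to the window; return None if fully outside."""
    (x1, y1), (x2, y2) = p1, p2
    while True:
        c1 = outcode(x1, y1, xmin, ymin, xmax, ymax)
        c2 = outcode(x2, y2, xmin, ymin, xmax, ymax)
        if (c1 | c2) == 0:            # OR is zero: trivially accept
            return (x1, y1), (x2, y2)
        if (c1 & c2) != 0:            # AND is non-zero: trivially reject
            return None
        c = c1 if c1 else c2          # pick an endpoint outside the window
        if c & TOP:
            x, y = x1 + (x2 - x1) * (ymax - y1) / (y2 - y1), ymax
        elif c & BOTTOM:
            x, y = x1 + (x2 - x1) * (ymin - y1) / (y2 - y1), ymin
        elif c & RIGHT:
            x, y = xmax, y1 + (y2 - y1) * (xmax - x1) / (x2 - x1)
        else:                         # LEFT
            x, y = xmin, y1 + (y2 - y1) * (xmin - x1) / (x2 - x1)
        if c == c1:
            x1, y1 = x, y
        else:
            x2, y2 = x, y
```

Each pass of the loop either accepts, rejects, or replaces one outside endpoint by its intersection with a window edge, exactly as in the three cases above.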

2) Sutherland-Hodgman's polygon-clipping algorithm


H. Yangk [33] and C. Ericson [34] explain that this algorithm clips an image following the outline
of a polygon. It requires using the corners (vertices) of the clipping polygon as a parameter
named vertex_clip, and the vertices of the image to be clipped as another argument named
vertex_input. (The order of these corners is the order in which the figures will be drawn; in
other words, if we connect these points in this order, we should end up drawing the original
figure.)
The result of this clipping will be another vertex array, called vertex_output, which can be
bigger than the original inputs.
The algorithm's basis is to test the relative position of each pair of consecutive corners, two by
two, against every edge of the clipping zone, in order to find which corners lie inside the zone
and which ones need to be replaced.
When an edge of the clipping zone is selected, for each pair there are four possible outcomes:
Let P and S be two consecutive vertices of the image to be clipped,
- Case 1: P and S are inside the zone; S is added to vertex_output (we only add S to avoid
duplicates, P having already ended up in the array on a previous step if it belongs there).
- Case 2: P is inside the zone but S is outside; the intersection of the segment [PS] with the
clipping edge is added to the output array.
- Case 3: P and S are both outside the zone; no action is taken.
- Case 4: P is outside the zone while S is inside; the intersection of the segment [PS] with the
clipping edge, followed by S, is added to the output array.

On the next iteration S becomes P, and S is the next vertex in vertex_input. Once this operation
has been done for all consecutive pairs in vertex_input, the algorithm restarts with this
vertex_output becoming the input.
Let's give an example to clarify this algorithm: we wish to clip an image, drawn in gray with
vertices A, B, C, D and E, with a red clipping polygon whose first edge is labelled no. 1.
Let's take edge no. 1 and start testing with the vertices A and B (the first ones in
vertex_input): we extend this edge infinitely and check for the four possible outcomes.
A is outside this zone and B is inside, therefore this is case no. 4 and points I1 and B are added
to vertex_output. We restart with vertices B and C; they are both inside the zone, therefore C is
added to the output. This is also the case for vertices C and D, and for D and E. For E and A we
find ourselves in case no. 2 and we add I2 to the output list.

This output list becomes the new input and we move on to the next edge, until we have executed the
algorithm on each one of them. The final output array holds the vertices of the clipped image.
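The procedure can be sketched as a small routine; here inside() assumes a convex, counter-clockwise clipping polygon, and all helper names are ours:

```python
def inside(p, a, b):
    """True if point p lies on the left of the directed clip edge a->b
    (assumes the clip polygon is convex and given counter-clockwise)."""
    return (b[0] - a[0]) * (p[1] - a[1]) - (b[1] - a[1]) * (p[0] - a[0]) >= 0

def intersect(p, s, a, b):
    """Intersection of the line through P and S with the line through a, b."""
    x1, y1 = p; x2, y2 = s; x3, y3 = a; x4, y4 = b
    den = (x1 - x2) * (y3 - y4) - (y1 - y2) * (x3 - x4)
    t = ((x1 - x3) * (y3 - y4) - (y1 - y3) * (x3 - x4)) / den
    return (x1 + t * (x2 - x1), y1 + t * (y2 - y1))

def clip_polygon(vertex_input, vertex_clip):
    """Sutherland-Hodgman: clip vertex_input against each edge in turn."""
    out = list(vertex_input)
    for i in range(len(vertex_clip)):
        a, b = vertex_clip[i], vertex_clip[(i + 1) % len(vertex_clip)]
        inp, out = out, []
        if not inp:
            break
        p = inp[-1]                      # previous vertex (wraps around)
        for s in inp:
            if inside(s, a, b):
                if not inside(p, a, b):          # case 4: entering the zone
                    out.append(intersect(p, s, a, b))
                out.append(s)                    # cases 1 and 4: keep S
            elif inside(p, a, b):                # case 2: leaving the zone
                out.append(intersect(p, s, a, b))
            p = s                                # case 3 adds nothing
    return out
```

Clipping one axis-aligned square against an overlapping one returns their intersection square, as expected.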


2.4 Primitive Generation Algorithms


In this section we will describe some algorithms for drawing geometric figures and writing text.
We will start with the easiest ones, like the algorithms for lines and circles, and then describe
more complicated ones, like form filling and text writing.
Bresenham's Line Algorithm:
To draw an image, computers use tiny dots called pixels. So by filling in black pixels on a white
board, an image can be displayed on the monitor. To draw a line, we need to know which pixels to
fill on the board. Reference [36] explains how Bresenham's line algorithm finds the pixels to
highlight in order to draw the line.
Let's look at the following grid as an example:

Figure 19: Pixel Grid (from J. Scott, Drawing Lines with Pixels, March 2012 [37])

The grid represents the computer monitor on which the pixels are drawn to display an image. So if
we want to draw a line between pixel A and pixel B, we should follow Bresenham's line algorithm.
Before starting the algorithm, the author of [37] shows that three values must be calculated
(A, B and P):
- The first thing to do is to find the distance between A and B along the horizontal axis (the X
variation) and along the vertical axis (the Y variation). In this case, the Y variation is 5 and
the X variation is 13.
- Then we find the values of A and B using the variations found earlier: A is equal to two times
the Y variation, and B is equal to A minus two times the X variation. In this case
A = 2*5 = 10 and B = 10 - 2*13 = -16.
- Finally, P is equal to A minus the X variation, which in this case gives P = 10 - 13 = -3.

Then all we do is follow the algorithm, starting from the first pixel (A). If P is negative, we
draw a pixel on the next column, same line, and we add A to P. If P is positive, we draw a pixel
on the next column, one line higher, and we add B to P. We continue this process until we reach
the last pixel (B).
By following this simple algorithm, we draw the following line between A and B:

Figure 20: Pixels forming a line (from [37])

We cannot apply this algorithm as it stands if the line we are trying to draw has a slope greater
than 1 or less than 0. In order to do so, we should add a couple of rules to the algorithm, as
explained in [37]:
- If the slope of the line is less than 0 and P is greater than or equal to 0, we draw the next
pixel on the following column below the previous pixel.
- In the first example, we did the calculation for A, B and P when the X variation is greater
than the Y variation. If this is not the case, we should use X where we would have used Y, and
vice versa, to calculate A, B and P.

Midpoint Circle Algorithm:


As for line drawing, circles have their own algorithm, which is somewhat similar to Bresenham's
line algorithm: the Midpoint Circle Algorithm, explained in [37].
Take for example the following grid:

Figure 21: Two pixels of a circle forming the radius (from [37])

To draw a circle with C as its centre and R as its radius, we start by calculating three values,
as detailed in [37]:
- X, which is equal to the radius; in this case X = 7.
- E, which is equal to -X; in this case E = -7.
- Y, which is set to 0.

Then we start following the Midpoint Circle Algorithm. As long as the value of Y is less than X,
we repeat the following steps:
- We draw the pixel with coordinates (X + X coordinate of C, Y + Y coordinate of C). In this case
we start by drawing a pixel in the box with coordinates (7+9, 0+9) = (16, 9).
- In the second step, we add 2 times Y plus 1 to E.
- Then we increase Y by 1.
- When E becomes greater than or equal to 0, we subtract 2 times X minus 1 from E, and then we
subtract 1 from X.

We said earlier that we follow this algorithm while Y is less than X. So when Y becomes greater
than X, one eighth of the circle will be complete, as shown in the figure below.

Figure 22: 1/8 of a circle in pixels (from "A fast circle algorithm for ZX Spectrum", Cereijo's blog)

The next step will help us draw half of the circle. In [37], the author explains that to do so,
all we have to do is reflect the pixels we already have along a vertical axis passing through C,
then reflect the pixels along a horizontal axis passing through C. We then have half of the
circle; more precisely, relative to C we have the pixels (x, y), (-x, y), (x, -y) and (-x, -y) of
the circle shown in figure 23.

Figure 23: Circle divided in 8 (from Kumar Satyam's blog, http://www.asksatyam.com/2011/01/midpoint-circle-algorithm.html)

To draw the whole circle, the author of [37] confirms that the final step consists of reflecting
the pixels along two diagonal axes, with slopes of 1 and -1, both passing through C. And now we
have our circle.
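Combining the X/E/Y octant walk with the reflections just described gives the following sketch (we loop while Y <= X so that the diagonal pixel is also drawn; function name and structure are ours):

```python
def circle_pixels(cx, cy, r):
    """Midpoint circle algorithm in the X/E/Y formulation of the text.
    Generates one octant and mirrors each pixel into all eight octants."""
    x, e, y = r, -r, 0
    pts = set()
    while y <= x:
        # mirror the octant pixel (x, y) about C into the 8 octants
        for px, py in ((x, y), (-x, y), (x, -y), (-x, -y),
                       (y, x), (-y, x), (y, -x), (-y, -x)):
            pts.add((cx + px, cy + py))
        e += 2 * y + 1
        y += 1
        if e >= 0:
            e -= 2 * x - 1
            x -= 1
    return pts
```

With C = (9, 9) and R = 7 as in the example, the first pixel drawn is (16, 9), and the mirrored copies land on the other three axis crossings (2, 9), (9, 16) and (9, 2).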


Form Filling:
Form-filling algorithms help us color an area limited by a number of lines. The main form-filling
algorithm is the Scan-Line Fill Algorithm. The authors of [38] detail this algorithm, and it is
actually not that complicated.

Figure 24: Scan line intersections with a polygon (from [38])

- The first step is to find the minimum and maximum points of the polygon along the vertical axis
(ymin and ymax).
- The next step is to sweep the scan line, a horizontal line that starts from the bottom of the
polygon (ymin) and then moves up to ymax, its last position.
- At each position, we take the x coordinate of each intersection of the scan line with an edge of
the polygon and sort them by increasing x.
- Finally, we fill between pairs of intersections, without using the same intersection twice.

However, there are some special situations described by the authors of [38] that require some
modification of the algorithm. Two of these situations are shown in the drawings below:

Figure 25: Situation One (from [38])

Figure 26: Situation Two (from [38])


In these two situations the scan line intersects the shared endpoint of two edges. In the figure
on the left, we have 3 intersection points on the scan line. If we apply the previous algorithm,
we fill between p0 and p1, but since we cannot use the same intersection twice, we will not fill
between p1 and p2. That is why a rule was added to the algorithm: if we intersect the shared
endpoint of two edges, we use this intersection twice, so we fill between p0 and p1, then between
p1 and p2.
In the figure on the right, we have a similar situation where the scan line intersects the
endpoint of two edges. But this time, if we apply the rule we just added, we will fill between p0
and p1, then between p1 and p2, which is wrong considering that the segment between p1 and p2 is
outside the polygon. This is why the rule was refined, so that both types of situation can be
handled:
- If the endpoint of an edge is a local minimum, we count the intersection twice.
- If not, we do not count the intersection twice, as explained in [38] and [39].

We should note that any intersection between the scan line and a horizontal edge should not be
taken into consideration.
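The steps and endpoint rules above can be sketched with a small scan-line filler. This version uses a half-open edge rule (an edge spans ymin <= y < ymax), which is one common way to implement the vertex-counting rules and to skip horizontal edges; names and layout are ours:

```python
from math import ceil, floor

def scanline_fill(vertices):
    """Even-odd scan-line fill of a polygon given as a vertex list.
    Horizontal edges are ignored; the half-open rule handles shared
    edge endpoints without double-counting."""
    ys = [y for _, y in vertices]
    filled = []
    n = len(vertices)
    for y in range(min(ys), max(ys) + 1):
        xs = []
        for i in range(n):
            (x0, y0), (x1, y1) = vertices[i], vertices[(i + 1) % n]
            if y0 == y1:                       # horizontal edge: skip
                continue
            if min(y0, y1) <= y < max(y0, y1):
                # x coordinate where this edge crosses the scan line
                xs.append(x0 + (y - y0) * (x1 - x0) / (y1 - y0))
        xs.sort()
        for left, right in zip(xs[::2], xs[1::2]):
            for x in range(ceil(left), floor(right) + 1):
                filled.append((x, y))
    return filled
```

A 5x4 rectangle yields three fully filled rows (the top row is excluded by the half-open rule, a standard convention that keeps adjacent polygons from overlapping).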
Text Generation:
There are different algorithms for generating text on a monitor. The simplest consists of storing
the characters in a RAM and the fonts in a ROM, as explained in [40]. This is called character
generation: the generator takes the ASCII code of a character from memory and produces the matrix
of pixels corresponding to that character. Of course, depending on the font, this matrix can be
too complex for a given unit to display, depending on the memory bandwidth. In the figure below
you can find an example of a character generator table.

Figure 27: Character generation table (from [40])


This is the most common technique used in computers nowadays. There is a second strategy for
writing text, explained in [13]. In this strategy, every character is represented by a polygon
with its own shape and position. By changing the parameters that describe the character, we can
draw any kind of letter, and by changing its position we can place characters next to each other
to form words. This method is not widely used because it takes a lot of space in memory.
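A toy character generator along the ROM-font lines above can be sketched as follows; the 5x7 glyph bitmap for 'A' is invented for illustration, not taken from a real font ROM:

```python
# Hypothetical font ROM: one entry per character, one integer per glyph row,
# each bit of the integer being one pixel of that row (bit 4 = left column).
FONT_ROM = {
    'A': [0b01110, 0b10001, 0b10001, 0b11111, 0b10001, 0b10001, 0b10001],
}

def render_char(ch):
    """Turn a character's ROM bitmap into a 5x7 matrix of '#'/' ' pixels,
    like one cell of the character generation table in Figure 27."""
    return ["".join('#' if row & (1 << (4 - col)) else ' '
                    for col in range(5))
            for row in FONT_ROM[ch]]
```

Printing the returned rows one per line draws the letter; a hardware character generator does the same lookup per scan line, indexed by the ASCII code and the row counter.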

2.5 Image Processing Algorithms


2.5.1

Elementary geometric transformations

Accelerating 2D geometric transformations can be very useful for rendering real-time animation in
a video game, without needing to store the different frames of the animation and blit them onto
the screen.

Figure 28: Basic two-dimensional transformations (from [36], page 204)

We can see in [36] and in [41] that these transformations can be easily expressed mathematically
using matrix sums and multiplications. Let P(x, y) be a point on which we will apply a
transformation and P'(x', y') its image after the operation. Here is a list of the most common 2D
transformations with the associated matrix operation:

Translation:
If we wish to make a translation by a vector T = (tx, ty), we have x' = x + tx and y' = y + ty,
therefore P' = P + T.

Scaling and Symmetries:
For scaling a point from the origin, by a factor of sx on the x axis and a factor of sy on the
y axis, we have P' = S.P, where

    S = | sx  0  |
        | 0   sy |

We can have some special cases of scaling that result in symmetries for certain values of sx
and sy:
- sx = -1, sy = 1 gives us a reflection about the y axis,
- sx = 1, sy = -1 results in a reflection about the x axis,
- sx = -1, sy = -1 carries out a symmetry about the origin.

Rotation:
A rotation around the origin by an angle θ will yield:

    x' = x.cos(θ) - y.sin(θ)
    y' = x.sin(θ) + y.cos(θ)

so P' = R.P with

    R = | cos(θ)  -sin(θ) |
        | sin(θ)   cos(θ) |

Sine and cosine values can be problematic to accelerate; the optimal method for speed of execution
is to keep a table of the important values of these functions in memory.
Another interesting method from [36] is to use parametric functions to generate cosine and sine
values: (1 - t²)/(1 + t²) for cosine and 2t/(1 + t²) for sine, with t = tan(θ/2). This would save
memory, but there would still be some need to calculate t.

Shearing (Shear Transformations)

This transformation allows creating fake 3D by changing the slant of objects, but it distorts the
images in the process. It appears from [42] that shearing can also be used to perform accelerated
rotations.
There are two possibilities in 2D: changing the slant along the x axis or along the y axis. We
have the following transformation matrices:

    Hx = | 1  hx |        Hy = | 1   0 |
         | 0  1  |             | hy  1 |

2.5.2

Homogeneous coordinates and composition

We would like to be able to combine these transformations to do more complex changes on the image.
Almost all of the transformations can be done with matrix multiplications, except for the
translation. In order to have the same mathematical operation for all transformations, we need to
increase the matrix dimension and introduce the concept of homogeneous coordinates, as seen
in [43].
We make a change from (x, y) coordinates to (X, Y, W), where x = X/W and y = Y/W. This coordinate
system is usually normalised with W = 1, and it allows writing each of the basic transformations,
including translation, as a matrix product.
Thanks to this tool, if we wish to apply three successive transformations M1, M2 and M3 to a
point P, we have P' = M3.(M2.(M1.P)) = (M3.M2.M1).P. This allows us to replace three successive
transformations with a single transformation of matrix M = M3.M2.M1, gaining a very important
amount of time on image processing.
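The composition identity can be checked with a small sketch using 3x3 homogeneous matrices (the function names are ours):

```python
import math

def mat_mul(a, b):
    """Multiply two 3x3 homogeneous 2D transformation matrices."""
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]

def translate(tx, ty):
    # translation becomes a matrix product in homogeneous coordinates
    return [[1, 0, tx], [0, 1, ty], [0, 0, 1]]

def rotate(theta):
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s, 0], [s, c, 0], [0, 0, 1]]

def apply(m, x, y):
    """Apply a homogeneous transform to the point (X, Y, W) = (x, y, 1)."""
    X, Y, W = (m[i][0] * x + m[i][1] * y + m[i][2] for i in range(3))
    return X / W, Y / W

# Compose once: rotate 90° about the origin, then translate by (5, 0).
M = mat_mul(translate(5, 0), rotate(math.pi / 2))
```

Applying the single precomposed matrix M to a point gives the same result as applying the rotation and then the translation, which is exactly the time saving described above.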

2.5.3

Anti-aliasing

Aliasing is a very common problem that comes up in every signal processing domain (audio, control,
image). It is caused by high-frequency content (equivalent to fine detail in images) being sampled
at too low a frequency (a lower resolution), resulting in low-quality or even misleading
information (a consequence of Shannon's sampling theorem).

Figure 29: Three major consequences of aliasing (from [44])

Anti-aliasing is used to remove these effects and smooth contours to give better results during
rendering. As can be seen in [44, 45], there are two major anti-aliasing techniques:
1) Increasing resolution
Increasing the resolution would increase the sampling frequency and naturally get rid of aliasing
problems, but this solution is unusable for us because of our fixed constraint on display
resolution.
2) Filtering
This method can be used when a high-resolution image is being downsampled to our resolution, or it
can be used intrinsically during primitive generation.
If a high-resolution image, for instance at 2N x 2N, is being downsampled to N x N, then taking
2x2 pixel areas and averaging the pixel colors in each area gives the color of the corresponding
pixel in the resulting image.
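The 2x2 averaging step can be sketched as follows, assuming a square grayscale image of even size stored as a 2D list (the function name is ours):

```python
def downsample_2x(img):
    """Box-filter a 2N x 2N grayscale image down to N x N by averaging
    each 2x2 block of pixels (integer average, as a hardware unit might)."""
    n = len(img) // 2
    return [[(img[2 * i][2 * j] + img[2 * i][2 * j + 1] +
              img[2 * i + 1][2 * j] + img[2 * i + 1][2 * j + 1]) // 4
             for j in range(n)] for i in range(n)]
```

Uniform 2x2 blocks keep their value, while blocks straddling an edge get an intermediate gray, which is precisely the smoothing effect sought here.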

Figure 30: Improvements brought by prefiltering on characters

A prefiltering method can be used during image generation by generating primitives in shades of
gray instead of black & white. For instance, Xiaolin Wu's line generation algorithm [46] is an
adaptation of Bresenham's line drawing algorithm with anti-aliasing, where the generated pixels
are not either black or white, but in shades of gray depending on their position relative to the
true line. Even though this algorithm is currently the fastest among anti-aliased line generation
algorithms, it is still slower than Bresenham's algorithm.


Figure 31: Anti-aliased line generation

Anti-aliased lines lead to anti-aliased polygons and to even smoother text.
Filtering can also be used while overlaying multiple layers to smooth edges; in this case an
averaging filter can be used in conjunction with alpha blending to give better results.


Chapter 3
Our upcoming design objectives
3.1 Specifications of our hardware
We are going to develop in this section the specification of the board that we are going to use
throughout this project: the Nexys 3 board [47].
The Nexys 3 is an off-the-shelf board, ready to be used directly after manufacturing. It is based
on the Xilinx Spartan-6 LX16 FPGA, which is far more powerful than the FPGA of its predecessor,
the Nexys 2, in terms of capacity, performance and resources.
For our project, we are going to focus on the features necessary for the development of the
graphics accelerator.
1) Memory:
The Nexys 3 board is equipped with 3 external memories. The first one is a 16 Mbyte RAM with a
16-bit bus. It has two main operating modes: one is asynchronous, with 70 ns read and write
cycles; the other is synchronous, with an 80 MHz bus.
The second memory is a 16 Mbyte parallel PCM (Phase Change Memory) device. It operates with 150 ns
read cycle times and contains a 64-byte write buffer operating with 50 ns cycle times. The
transfer time from the write buffer to the flash array is around 120 µs. It is important to state
that this memory is nonvolatile, which means that it retains its contents even when the power is
off. The third memory is a 16 Mbyte serial PCM device, with a bus speed of approximately 50 MHz.
Even before starting to use these memories, the PCM devices have only 97% of their capacity
available to the programmer. In fact, 3% of the memory is used by a built-in configuration file
that helps the FPGA determine which device to use when reading a configuration.
2) VGA port
The Video Graphics Array (VGA) port of this board is an HD-DB15 connector. The standard horizontal
(HS: pin 13) and vertical (VS: pin 14) synchronization signals and the 8-bit color port are
carried by 10 signals (eight levels, i.e. three bits, for each of the red and green colors, and
four levels, i.e. two bits, for blue).
By creating a circuit capable of managing the VGA port signals, 256 colors can be displayed for
one pattern.
3) SPI bus
The 16 Mbyte serial PCM device uses a quad-mode Serial Peripheral Interface (SPI) bus. The
advantage of this bus is that information can travel both ways on the communication channel at the
same time (full duplex).

The Nexys 3 board is an intermediary between the microcontroller that we are going to use and the
320x240, 16-bit LCD screen. The microcontroller will only issue high-level commands.
4) LCD Display
We are going to use the DT035TFT screen [28], a TFT display with a 240x320 RGB resolution.

3.2 Algorithms and architecture that we will retain


In conclusion, we defined some priorities for what we have to do first.
First of all, we saw that it is very important to choose the best architecture for our project. In
order to do so, we have to choose the right interface controllers; these are in fact the most
important elements in our system, since they allow us to execute the functions afterwards and
verify their performance.
These algorithms (functions) will be implemented in order of complexity: we will start with the
easiest ones and finish with the complex ones. That is why we will start with the rendering
algorithms, then the primitive generation algorithms, and finally we will implement the geometric
transformations.


Bibliography
[1] "Digital history: Time line". [Online]. Available: http://www.old-computers.com/history/timeline.asp [Accessed 29/01/2014].
[2] W. Forster, The Encyclopedia of Game Machines - Consoles, handheld & home computers 1972-2005. Gameplan, 2005. ISBN 3-00-015359-4.
[3] G. Singer, "The History of the Modern Graphics Processor", March 2013. [Online]. Available: http://www.techspot.com/article/650-history-of-the-gpu/ [Accessed 29/01/2014].
[4] "List of home computers by video hardware", Wikipedia, 26 Jan. 2014. [Online]. Available: http://en.wikipedia.org/wiki/List_of_home_computers_by_video_hardware [Accessed 29/01/2014].
[5] RCA CMOS LSI Products, CMOS Color Generator Controller, CDP1862C datasheet. [Online]. Available: http://pdf.datasheetarchive.com/indexerfiles/Scans-054/DSAIH000100740.pdf [Accessed 29/01/2014].
[6] Motorola Semiconductors, Typical CRT Controller Application, MC6845 datasheet. [Online]. Available: http://pdf1.alldatasheet.fr/datasheet-pdf/view/4159/MOTOROLA/MC6845.html [Accessed 29/01/2014].
[7] Motorola Semiconductors, MC6847 Video Display Generator (VDG), MC6847 datasheet. [Online]. Available: https://instruct1.cit.cornell.edu/courses/ee476/ideas/mc6847.pdf [Accessed 29/01/2014].
[8] IEEE Spectrum, "Design case history: the Commodore 64", March 1985. [Online]. Available: http://spectrum.ieee.org/ns/pdfs/commodore64_mar1985.pdf [Accessed 29/01/2014].
[9] Atari Incorporated, ANTIC (NTSC), ANTIC datasheet, Rev. D, Oct. 1982. [Online]. Available: http://www.retromicro.com/files/atari/8bit/antic.pdf [Accessed 31/01/2014].
[10] Texas Instruments, 9900 Video Display Processors, TMS9918A datasheet, Houston, Nov. 1982. [Online]. Available: http://www.cs.columbia.edu/~sedwards/papers/TMS9918.pdf [Accessed 31/01/2014].
[11] P. Diskin, Nintendo Entertainment System Documentation, Aug. 2004. [Online]. Available: http://nesdev.com/NESDoc.pdf [Accessed 31/01/2014].
[12] Q. Perret, S. Thiebaut and K. Le Sayec, Graphic coprocessor on FPGA, INSA Toulouse, Project Report, June 2012.
[13] K. S. Ay and A. Dogan, "Hardware/Software Co-Design of a 2D Graphics System on FPGA", Dept. of Electrical and Electronics Engineering, Anadolu University, Eskisehir, Turkey, March 2013. [Online]. Available: http://airccse.org/journal/ijesa/papers/3113ijesa02.pdf [Accessed 31/01/2014].
[14] T. Hiroo Ueda and I. Hiromitsu Yagi, "T.V. Game System Having Reduced Memory Needs", U.S. Patent 4,824,106, 25 Apr. 1989. [Online]. Available: http://www.freepatentsonline.com/4824106.pdf [Accessed 31/01/2014].
[15] "PPU", NESdev Wiki, May 2013 (last update). [Online]. Available: http://wiki.nesdev.com/w/index.php/PPU [Accessed 31/01/2014].
[16] Dr. Floppy, "The NES Picture Processing Unit (PPU)", May 2011. [Online]. Available: http://badderhacksnet.ipage.com/badderhacks/index.php?view=article&id=270:the-nes-pictureprocessing-unit-ppu [Accessed 31/01/2014].
[17] "ANTIC", Wikipedia, Jan. 2014 (last update). [Online]. Available: http://en.wikipedia.org/wiki/ANTIC [Accessed 31/01/2014].
[18] Texas Instruments, Graphics System Processors, TMS34010 datasheet, Houston, Jun. 1986 (revised June 1991). [Online]. Available: http://www.ti.com/lit/ds/symlink/tms34010.pdf [Accessed 31/01/2014].
[19] Texas Instruments, Graphics Processors, TMS34020 datasheet, Houston, Mar. 1990 (revised Nov. 1993). [Online]. Available: http://www.ti.com/lit/ds/symlink/tms34020a.pdf [Accessed 31/01/2014].
[20] Texas Instruments, TMS-3410 Promo Video, 2007 (publication). [Online]. Available: http://www.youtube.com/watch?v=730tmDmzeDE [Accessed 31/01/2014].
[21] P. Hsieh, "Graphics Accelerators - What are they?", March 2001. [Online]. Available: http://www.azillionmonkeys.com/qed/accelerator.html [Accessed 14/01/2014].
[22] "Multiple buffering", Wikipedia, 19 Jan. 2014. [Online]. Available: http://en.wikipedia.org/wiki/Multiple_buffering [Accessed 14/01/2014].
[23] "Carte graphique" (Graphics card), CommentCaMarche. [Online]. Available: http://www.commentcamarche.net/contents/731-carte-graphique [Accessed 31/01/2014].
[24] E. Brunvand, instructor home page, 2010. [Online]. Available: http://www.eng.utah.edu/~cs3710/ [Accessed 31/01/2014].
[25] Silicon Graphics, "OpenGL", Silicon Graphics International Corp., 2009-2014. [Online]. Available: http://www.sgi.com [Accessed 31/01/2014].
[26] NVIDIA, "CUDA", NVIDIA Corporation, 2014. [Online]. Available: http://www.nvidia.fr [Accessed 31/01/2014].
[27] S. Ball, "Introduction to direct memory access", 14 Oct. 2003. [Online]. Available: http://www.embedded.com/electronics-blogs/beginner-s-corner/4024879/Introduction-to-direct-memory-access [Accessed 31/01/2014].
[28] Displaytech Ltd., LCD Module Product Specification, DT035TFT 3.5'' TFT Display Module (320RGBx240DOTS), 10 June 2011.
[29] STMicroelectronics, STM32405xx, STM32407xx, 15 September 2011 (revised 4 June 2013).
[30] "Bitblit Users Guide", 2012. [Online]. Available: http://processors.wiki.ti.com/index.php/Bitblit_Users_Guide#Using_Bitblit [Accessed 29/01/2014].
[31] S. S. Pedersen, "Porter/Duff Compositing and Blend Modes", 17 March 2013. [Online]. Available: http://ssp.impulsetrain.com/2013-03-17_Porter_Duff_Compositing_and_Blend_Modes.html [Accessed 29/01/2014].
[32] Department of Computer Science at the University of Helsinki, "Cohen-Sutherland Line Clipping". [Online]. Available: http://www.cs.helsinki.fi/group/goa/viewing/leikkaus/lineClip.html [Accessed 29/01/2014].
[33] H. Yangk, "The Polygon Clipping Algorithm, Polygon Clipping Background Theory". [Online]. Available: http://www.cs.rit.edu/~icss571/clipTrans/PolyClipBack.html [Accessed 29/01/2014].
[34] C. Ericson, Real-Time Collision Detection (The Morgan Kaufmann Series in Interactive 3-D Technology), CRC Press, 2004.
[36] D. Salomon, The Computer Graphics Manual. Springer, 2011.
[37] J. Scott, Drawing Lines with Pixels, March 2012.
[38] K. L. Ma, ECS175: Introduction to Computer Graphics, UC Davis. [Online]. Available: http://www.cs.ucdavis.edu/~ma/ECS175_S00/Notes/0411_b.pdf
[39] Avelraj, Lecture filling algorithms.
[40] M. Waite, Computer Graphics Primer. Indianapolis, Ind.: H. W. Sams, 1979.
[41] N. Brown, "Computer Graphics Lecture 2: Transformations", The University of Edinburgh, School of Informatics. [Online]. Available: http://www.inf.ed.ac.uk/teaching/courses/cg/lectures/lect2cg2x6.pdf [Accessed 31/01/2014].
[42] A. W. Paeth, "A fast algorithm for general raster rotation", Department of Computer Science, University of Waterloo. [Online]. Available: http://www.cipprs.org/papers/VI/VI1986/pp077-081-Paeth-1986.pdf [Accessed 31/01/2014].
[43] Dr. K. Inkpen, "Geometrical Transformations", Simon Fraser University. [Online]. Available: http://www.cs.sfu.ca/CourseCentral/361/inkpen/Notes/361_lecture9.pdf [Accessed 31/01/2014].
[44] Dr. K. Inkpen, "Aliasing", Simon Fraser University. [Online]. Available: http://www.cs.sfu.ca/CourseCentral/361/inkpen/Notes/361_lecture8.pdf [Accessed 31/01/2014].
[45] S. Jamin, "EECS 487: Interactive Computer Graphics", University of Michigan. [Online]. Available: http://web.eecs.umich.edu/~sugih/courses/eecs487/lectures/08-Anti-Aliasing+Compositing.pdf [Accessed 31/01/2014].
[46] X. Wu, "An Efficient Antialiasing Technique", in Proceedings of the 18th Annual Conference on Computer Graphics and Interactive Techniques, New York, NY, USA, 1991, pp. 143-152. [Online]. Available: http://www-users.mat.uni.torun.pl/~gruby/teaching/lgim/1_wu.pdf [Accessed 31/01/2014].
[47] Digilent, Nexys3 Board Reference Manual, Xilinx Spartan-6 FPGA (XC6LX16-CS324), April 3, 2013. [Online]. Available: http://www.digilentinc.com/Data/Products/NEXYS3/Nexys3_rm.pdf [Accessed 31/01/2014].
