
Acknowledgement

We would like to thank our beloved parents for their endless moral and financial support and for encouraging us, without which we would not be what we are today. At the outset, we sincerely thank Mr. Director for his kind cooperation and encouragement towards the successful completion of the project work and for providing the necessary facilities.

We are most obliged and grateful to our Principal, the H.O.D. of the ECE Dept, and our internal guide, Associate Professor, ECE Dept, for giving us guidance in completing this project successfully.

We are grateful to our Project Guide, Hyderabad, for their sagacious guidance, scholarly advice and the inspiration offered in an amiable and pleasant manner, which helped us complete this project successfully. Last but not the least, we are thankful to our friends and well-wishers.

Design and Implementation of A Lottery-based Bandwidth Guaranteed and Low Latency Arbiter for On-Chip Bus

Abstract
In this paper, we propose a two-level Lottery-based bus arbitration algorithm, called the RB_Lottery arbitration algorithm, where R stands for real-time and B stands for the binary group logic used for priority selection. The proposed bus arbitration solves the impartiality and starvation problems of the previous Lottery method and reduces the average latency of bus requests for real-time applications. Software simulation results show that the proposed RB_Lottery algorithm provides better bandwidth guarantees and a lower average bus-request latency than Lottery arbitration.

The bus arbiter decides which master is granted bus access when multiple masters issue bus requests at the same time in a system-on-chip. Previous bus arbitration algorithms, namely the static fixed priority algorithm, the time division multiplexing (TDM)/Round-robin algorithm and the Lottery ticket bus algorithm, have drawbacks: bus starvation, low system performance caused by bus distribution latency during arbitration, and large latency for masters that are assigned a low share of the tickets. Recently, the real-time issue has been considered for bus arbitration [5]. In our paper, we propose a two-level static priority bus arbitration algorithm, the RB_Lottery algorithm, which is based on the Lottery bus arbitration method. The proposed RB_Lottery bus arbitration can handle the real-time requirements of all masters, solve the traditional bus distribution problem, guarantee the bandwidth requirement of each master, and reduce the bus arbitration latency. In the first level of bus arbitration, we use static priority real-time counters to satisfy the real-time requirements; this level is named the static priority real-time handler. In the second level of bus arbitration, we adopt a Lottery-based algorithm with binary group partition to avoid starvation and reduce bus latency.
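The ticket mechanism that the Lottery method relies on can be illustrated with a short Python sketch. This is only a minimal model of plain ticket-based lottery arbitration, not the RB_Lottery design itself; the master names, ticket counts and the function name `lottery_grant` are assumptions made for illustration.

```python
import random

def lottery_grant(tickets, requests, rng):
    """Pick one requesting master; win probability is proportional to tickets.

    tickets  -- dict: master name -> positive ticket count (hypothetical values)
    requests -- set of masters that currently request the bus
    rng      -- random.Random instance, passed in so runs are reproducible
    """
    pending = sorted(m for m in requests if tickets.get(m, 0) > 0)
    if not pending:
        return None
    total = sum(tickets[m] for m in pending)
    draw = rng.randrange(total)  # winning ticket number in [0, total)
    for master in pending:
        if draw < tickets[master]:
            return master
        draw -= tickets[master]
    return None  # not reached when all ticket counts are positive

# Four hypothetical masters holding tickets in the ratio 1:2:3:4.
tickets = {"M0": 1, "M1": 2, "M2": 3, "M3": 4}
rng = random.Random(0)
wins = {m: 0 for m in tickets}
for _ in range(10000):
    wins[lottery_grant(tickets, set(tickets), rng)] += 1
```

With every master requesting on every cycle, the grant counts in `wins` approach the ticket ratio, which is how a lottery arbiter shares bandwidth. A master holding few tickets can still lose many draws in a row, which corresponds to the large-latency drawback described above.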


INTRODUCTION
Xilinx introduced field-programmable gate arrays, or FPGAs, in 1985. Figure 1 is a conceptual model of an FPGA.

FPGAs are constructed of three basic elements: logic blocks, I/O cells, and interconnection resources. A useful analogy for an FPGA is the layout of a city. The logic blocks correspond to city blocks occupied by different businesses. These businesses receive products from various suppliers within the city, just as logic blocks receive data from other logic blocks within the FPGA, and process those products for consumption by other firms or end users, just as logic block outputs are sent to other blocks and ultimately to the device using the FPGA. FPGAs and our mythical city both rely on interconnections between blocks (wire segments for FPGAs; streets and telephone connections for the city) that can be flexibly designed to meet changing needs, with routers in both cases and stoplights in one. The final elements in the model are the mechanisms for interaction with the outside world: I/O cells are to the FPGA as airports, freeways, and long-distance telephone lines are to the city. The rest of this report explores implementations of this basic three-element model in greater detail.

Configurable Logic Blocks: The heart of the FPGA lies in the configurable logic blocks (CLBs). CLBs appear in rows and columns within all FPGAs and implement the logic functions desired by the programmer. Most CLBs accomplish this with a lookup table [2]. Lookup tables (LUTs) are digital memory arrays that hold the truth table of any logic function that can be implemented with the number of logic inputs available to the CLB. The output of the CLB is then the logical result of the function recorded in the lookup table. To program the CLBs, truth tables must be loaded into the LUTs of each CLB. An example of the CLB architecture, for the Xilinx XC5200 chip, is described later in this report [2].
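The idea that a LUT is simply a small memory holding a truth table can be modelled in software. The following Python sketch is illustrative only; the helper names `make_lut` and `lut_read` and the two example functions are assumptions, not part of any vendor tool.

```python
def make_lut(func, n_inputs):
    """Precompute the truth table of func for every n_inputs-bit combination.

    This models "loading the truth table into the LUT" at configuration time.
    """
    table = []
    for address in range(2 ** n_inputs):
        bits = [(address >> i) & 1 for i in range(n_inputs)]  # bits[0] is input 0
        table.append(func(*bits) & 1)
    return table

def lut_read(table, *inputs):
    """Evaluate the stored function: the inputs form the memory address."""
    address = sum(bit << i for i, bit in enumerate(inputs))
    return table[address]

# Two hypothetical configurations: a 4-input XOR and a 3-input majority vote.
xor4 = make_lut(lambda a, b, c, d: a ^ b ^ c ^ d, 4)
majority3 = make_lut(lambda a, b, c: (a & b) | (a & c) | (b & c), 3)
```

For example, `lut_read(xor4, 1, 0, 1, 1)` looks up the entry addressed by those four inputs. Changing the implemented function only requires storing a different table; the lookup hardware stays the same.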

I/O Blocks: I/O blocks provide for interaction with the outside world. An I/O pin can be used for input or output [4]. I/O blocks can contain logic functionality, although high logic utilization decreases pin placement flexibility, as I/O blocks utilized in logic cannot be reassigned mid-design [5].

Interconnection (Routing) Architecture: The routing architecture usually covers 60-90% of the FPGA chip area [2] and fittingly requires the longest description of the three basic FPGA elements. The routing architecture of FPGAs is constructed of wires segmented into various lengths that intersect each other at routing switches [3]. The most popular programmable switch element (PSE) technology for implementing these routing switches, static RAM, is briefly discussed in the next section. Two types of routing architecture are common: row-based routing, where only horizontal channels are used to connect CLBs, and symmetrical routing, where both vertical and horizontal channels are used, as in Figure 1. Direct connection wires link neighboring CLBs across routing channels. Connections to distant blocks are implemented through programmable switch matrices (PSMs) [4], which contain a set of PSEs that switch perpendicular wires. The wires routed through a PSM are either single lines, which must pass through one PSE for each CLB bypassed, or double lines, which pass two CLBs for every switch. Long lines skip switching altogether. The implementation of complex routing techniques is described later for the Xilinx XC5200.
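The difference between single, double and long lines can be made concrete by counting the switch elements a signal passes. The Python sketch below is a simplified model based only on the description above; the function name and the rounding up for odd distances on double lines are assumptions.

```python
import math

def switches_traversed(clbs_bypassed, line_type):
    """Estimate how many PSEs a signal passes on its way past some CLBs.

    single -- one PSE for each CLB bypassed
    double -- one PSE for every two CLBs bypassed (rounded up: an assumption)
    long   -- no switching at all
    """
    if line_type == "single":
        return clbs_bypassed
    if line_type == "double":
        return math.ceil(clbs_bypassed / 2)
    if line_type == "long":
        return 0
    raise ValueError("unknown line type: " + line_type)

# Number of switches needed to bypass six CLBs on each kind of line.
for kind in ("single", "double", "long"):
    print(kind, switches_traversed(6, kind))
```

Since each switch adds resistance and capacitance, fewer switches generally mean less delay, which is why longer line types are preferred for distant connections.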

Static RAM Programmable Switch: The most common programmable element used for FPGA implementation is static RAM. FPGAs use permanent memory, usually PROM, to store the logic configuration of the chip. Upon power-up, each RAM cell gets a value based upon the PROM configuration. When the cell is high, the transistor conducts and current flows; when it is low, the transistor is cut off and no current flows.

Configurable Logic Blocks: The diagram shows one of the four identical logic cells that constitute each CLB. The segment labeled F contains a lookup table for four inputs (F4-F1). The trapezoidal objects are 2:1 multiplexers. The chip enable (CE), clock (CK) and clear (CLR) signals travel to this cell and all others in the architecture via global long lines. Each cell can be cleared individually, or all can be cleared at once. Each logic cell can implement either a D flip-flop or a latch. When the clock transitions high, the D flip-flop (FD) passes the output of the programmed logic operation to the output (Q). From DI to DO, a feed-through path that does not change the logic of the input can be implemented. This is used in routing applications discussed later. To implement a five-input function, two lookup tables (F) are used, each fed with the same four logic lines. The fifth input toggles the 2:1 mux between the lookup tables, adding a fifth bit to the logic function. There are four lookup tables in each CLB, so four independent four-input logic functions or two independent five-input logic functions can be implemented in each block.
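The way two four-input lookup tables and a 2:1 multiplexer combine into a five-input function can be sketched in Python. This is a software model for illustration, not a description of the actual XC5200 circuitry; all function names and the parity example are assumptions.

```python
def build_lut4(func):
    """Store the 16-entry truth table of a four-input function."""
    return [func(*[(addr >> i) & 1 for i in range(4)]) & 1 for addr in range(16)]

def read_lut4(table, f1, f2, f3, f4):
    """Look up the stored output; the four inputs form the address."""
    return table[f1 | (f2 << 1) | (f3 << 2) | (f4 << 3)]

def five_input_function(func5):
    """Split a five-input function across two 4-input LUTs and a 2:1 mux."""
    # One LUT holds the function with the fifth input fixed at 0, the other at 1.
    lut_low = build_lut4(lambda a, b, c, d: func5(a, b, c, d, 0))
    lut_high = build_lut4(lambda a, b, c, d: func5(a, b, c, d, 1))

    def evaluate(f1, f2, f3, f4, f5):
        low = read_lut4(lut_low, f1, f2, f3, f4)    # both LUTs see the same
        high = read_lut4(lut_high, f1, f2, f3, f4)  # four logic lines
        return high if f5 else low                  # fifth input drives the mux
    return evaluate

# Hypothetical example: five-input parity.
parity5 = five_input_function(lambda a, b, c, d, e: a ^ b ^ c ^ d ^ e)
```

Because each five-input function consumes two of the four lookup tables, a CLB modelled this way holds either four four-input functions or two five-input functions, matching the count given above.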

I/O Blocks: The I/O blocks of the XC5200 are completely decoupled from the internal logic of the CLBs [5]. The I/O blocks are attached to the internal logic through interconnect cells that form a ring around the chip. This extra routing layer provides connections to nearby CLBs, as well as to far-away CLBs through long lines. The XC5200 can be connected to TTL or CMOS logic.

Interconnection (Routing) Architecture: This chip has six levels of routing hierarchy: single-length lines (1), double-length lines (2), direct connects (3), long lines/global lines (4), local interconnection matrices (5), and logic cell feed-through paths (6). The global routing matrix (GRM) contains the switch matrix architecture discussed earlier in this report. The GRM routes logic signals over the single, double and long lines, then communicates with the CLB via a 24-line interface to the LIM [4]. These matrices connect far-away sections of the chip and link all CLBs to a global command structure. The remaining routing architecture of the XC5200 is contained within the Versa-Block units. These units comprise the CLBs as well as local interconnection matrices. The local interconnection matrix (LIM) handles connections to neighboring CLBs through direct connect lines that bypass the GRMs. The LIM also handles logic cell feed-through paths, which do not perform any calculations but merely re-power a signal that has faded while passing through the chip. This splitting of the routing resources between local and global areas simplifies router design, decreases the chip space necessary for routing, and decreases the use of routing switches, which add resistance and capacitance to circuits.

A field-programmable gate array (FPGA) is an integrated circuit designed to be configured by a customer or a designer after manufacturing, hence "field-programmable". The FPGA configuration is generally specified using a hardware description language (HDL), similar to that used for an application-specific integrated circuit (ASIC) (circuit diagrams were previously used to specify the configuration, as they were for ASICs, but this is increasingly rare). Contemporary FPGAs have large resources of logic gates and RAM blocks to implement complex digital computations. As FPGA designs employ very fast I/Os and bidirectional data buses, it becomes a challenge to verify correct timing of valid data within setup time and hold time. Floor planning enables resource allocation within the FPGA to meet these timing constraints. FPGAs can be used to implement any logical function that an ASIC could perform. The ability to update the functionality after shipping, partial re-configuration of a portion of the design, and the low non-recurring engineering costs relative to an ASIC design (notwithstanding the generally higher unit cost) offer advantages for many applications.

FPGAs contain programmable logic components called "logic blocks", and a hierarchy of reconfigurable interconnects that allow the blocks to be "wired together" somewhat like many (changeable) logic gates that can be inter-wired in (many) different configurations. Logic blocks can be configured to perform complex combinational functions, or merely simple logic gates like AND and XOR. In most FPGAs, the logic blocks also include memory elements, which may be simple flip-flops or more complete blocks of memory.

Some FPGAs have analog features in addition to digital functions. The most common analog feature is programmable slew rate and drive strength on each output pin, allowing the engineer to set slow rates on lightly loaded pins that would otherwise ring unacceptably, and to set stronger, faster rates on heavily loaded pins on high-speed channels that would otherwise run too slowly. Another relatively common analog feature is differential comparators on input pins designed to be connected to differential signaling channels. A few "mixed signal FPGAs" have integrated peripheral analog-to-digital converters (ADCs) and digital-to-analog converters (DACs) with analog signal conditioning blocks, allowing them to operate as a system-on-a-chip. Such devices blur the line between an FPGA, which carries digital ones and zeros on its internal programmable interconnect fabric, and a field-programmable analog array (FPAA), which carries analog values on its internal programmable interconnect fabric.

HISTORY

The FPGA industry sprouted from programmable read-only memory (PROM) and programmable logic devices (PLDs). PROMs and PLDs both had the option of being programmed in batches in a factory or in the field (field-programmable); however, programmable logic was hard-wired between logic gates. In the late 1980s the Naval Surface Warfare Department funded an experiment proposed by Steve Casselman to develop a computer that would implement 600,000 reprogrammable gates. Casselman was successful, and a patent related to the system was issued in 1992. Some of the industry's foundational concepts and technologies for programmable logic arrays, gates, and logic blocks are founded in patents awarded to David W. Page and LuVerne R. Peterson in 1985. Xilinx co-founders Ross Freeman and Bernard Vonderschmitt invented the first commercially viable field-programmable gate array in 1985: the XC2064. The XC2064 had programmable gates and programmable interconnects between gates, the beginnings of a new technology and market. The XC2064 boasted a mere 64 configurable logic blocks (CLBs), with two 3-input lookup tables (LUTs). More than 20 years later, Freeman was entered into the National Inventors Hall of Fame for his invention. Xilinx continued unchallenged and grew quickly from 1985 to the mid-1990s, when competitors sprouted up, eroding significant market share. By 1993, Actel was serving about 18 percent of the market. The 1990s were an explosive period for FPGAs, both in sophistication and in volume of production. In the early 1990s, FPGAs were primarily used in telecommunications and networking. By the end of the decade, FPGAs had found their way into consumer, automotive, and industrial applications.

Modern Developments

A recent trend has been to take the coarse-grained architectural approach a step further by combining the logic blocks and interconnects of traditional FPGAs with embedded microprocessors and related peripherals to form a complete "system on a programmable chip". This work mirrors the architecture by Ron Perlof and Hana Potash of Burroughs Advanced Systems Group, which combined a reconfigurable CPU architecture on a single chip called the SB24. That work was done in 1982. Examples of such hybrid technologies can be found in the Xilinx Zynq-7000 All Programmable SoC, which includes a 1.0 GHz dual-core ARM Cortex-A9 MPCore processor embedded within the FPGA's logic fabric, or in the Altera Arria V FPGA, which includes an 800 MHz dual-core ARM Cortex-A9 MPCore. The Atmel FPSLIC is another such device, which uses an AVR processor in combination with Atmel's programmable logic architecture. The Actel SmartFusion devices incorporate an ARM Cortex-M3 hard processor core (with up to 512 kB of flash and 64 kB of RAM) and analog peripherals such as a multi-channel ADC and DACs into their flash-based FPGA fabric. In 2010, Xilinx Inc introduced the first All Programmable System on a Chip, branded Zynq-7000, which fused features of an ARM high-end microcontroller (hard-core implementations of a 32-bit processor, memory, and I/O) with an FPGA fabric to make FPGAs easier for embedded designers to use. By incorporating the ARM processor-based platform into a 28 nm FPGA family, the extensible processing platform enables system architects and embedded software developers to apply a combination of serial and parallel processing to address the challenges they face in designing today's embedded systems, which must meet ever-growing demands to perform highly complex functions. By allowing them to design in a familiar ARM environment, embedded designers benefit from multiple advantages, including decreased time-to-market, significantly reduced power, and reduced BOM (bill of materials) cost. These are among many advantages of an All Programmable FPGA platform compared to more traditional design cycles associated with ASICs. An alternate approach to using hard-macro processors is to make use of soft processor cores that are implemented within the FPGA logic. Nios II, MicroBlaze and Mico32 are examples of popular soft-core processors.

As previously mentioned, many modern FPGAs have the ability to be reprogrammed at "run time," and this is leading to the idea of reconfigurable computing or reconfigurable systems: CPUs that reconfigure themselves to suit the task at hand. Additionally, new, non-FPGA architectures are beginning to emerge. Software-configurable microprocessors such as the Stretch S5000 adopt a hybrid approach by providing an array of processor cores and FPGA-like programmable cores on the same chip.

Gates
1987: 9,000 gates, Xilinx
1992: 600,000, Naval Surface Warfare Department
Early 2000s: Millions

Market size
1985: First commercial FPGA: Xilinx XC2064
1987: $14 million
~1993: >$385 million
2005: $1.9 billion
2010 estimates: $2.75 billion

FPGA design starts
2005: 80,000
2008: 90,000

FPGA comparisons

Historically, FPGAs have been slower, less energy efficient and generally achieved less functionality than their fixed ASIC counterparts. An older study showed that designs implemented on FPGAs need on average 40 times as much area, draw 12 times as much dynamic power, and are three times slower than the corresponding ASIC implementations; however, the times are changing. Today's FPGAs, such as the Xilinx Virtex-7 or the Altera Stratix V, rival ASIC and ASSP solutions by providing significantly reduced power, increased speed, lower BOM cost, minimal implementation real estate, and maximum on-the-fly configurability. Where previously a design may have included 6 to 10 ASICs, today the same design can be achieved using only one FPGA.

A Xilinx Zynq-7000 All Programmable System on a Chip.

Advantages of FPGAs include the ability to re-program in the field to fix bugs, and may include a shorter time to market and lower non-recurring engineering costs. Vendors can also take a middle road by developing their hardware on ordinary FPGAs, but manufacturing their final version so that it can no longer be modified after the design has been committed. Xilinx claims that several market and technology dynamics are changing the ASIC/FPGA paradigm:
Integrated circuit costs are rising aggressively
ASIC complexity has lengthened development time
R&D resources and headcount are decreasing
Revenue losses for slow time-to-market are increasing
Financial constraints in a poor economy are driving low-cost technologies
These trends make FPGAs a better alternative than ASICs for a larger number of higher-volume applications than they have historically been used for, to which the company attributes the growing number of FPGA design starts.

Some FPGAs have the capability of partial re-configuration that lets one portion of the device be re-programmed while other portions continue running.

Complex Programmable Logic Devices (CPLD)

The primary differences between CPLDs (Complex Programmable Logic Devices) and FPGAs are architectural. A CPLD has a somewhat restrictive structure consisting of one or more programmable sum-of-products logic arrays feeding a relatively small number of clocked registers. The result is less flexibility, with the advantage of more predictable timing delays and a higher logic-to-interconnect ratio. FPGA architectures, on the other hand, are dominated by interconnect. This makes them far more flexible (in terms of the range of designs that are practical to implement within them) but also far more complex to design for. In practice, the distinction between FPGAs and CPLDs is often one of size, as FPGAs are usually much larger in terms of resources than CPLDs. Typically only FPGAs contain more complex embedded functions such as adders, multipliers, memory, and SerDes. Another common distinction is that CPLDs contain embedded flash to store their configuration, while FPGAs usually, but not always, require an external flash memory.

Security considerations

With respect to security, FPGAs have both advantages and disadvantages compared to ASICs or secure microprocessors. FPGAs' flexibility makes malicious modifications during fabrication a lower risk. Previously, for many FPGAs, the design bitstream was exposed while the FPGA loaded it from external memory (typically on every power-on). All major FPGA vendors now offer a spectrum of security solutions to designers, such as bitstream encryption and authentication. For example, Altera and Xilinx offer AES (up to 256 bit) encryption for bitstreams stored in an external flash memory. FPGAs that store their configuration internally in nonvolatile flash memory, such as Microsemi's ProASIC 3 or Lattice's XP2 programmable devices, do not expose the bitstream and do not need encryption. In addition, flash memory for LUTs provides SEU protection for space applications.

Applications

Applications of FPGAs include digital signal processing, software-defined radio, ASIC prototyping, medical imaging, computer vision, speech recognition, cryptography, bioinformatics, computer hardware emulation, radio astronomy, metal detection and a growing range of other areas. FPGAs originally began as competitors to CPLDs and competed in a similar space, that of glue logic for PCBs. As their size, capabilities, and speed increased, they began to take over larger and larger functions, to the point where some are now marketed as full systems on chips (SoC). Particularly with the introduction of dedicated multipliers into FPGA architectures in the late 1990s, applications that had traditionally been the sole reserve of DSPs began to incorporate FPGAs instead. Traditionally, FPGAs have been reserved for specific vertical applications where the volume of production is small. For these low-volume applications, the premium that companies pay in hardware costs per unit for a programmable chip is more affordable than the development resources spent on creating an ASIC. Today, new cost and performance dynamics have broadened the range of viable applications.

Common FPGA Applications:
Aerospace and Defense: Avionics/DO-254, MILCOM, Missiles & Munitions, Secure Solutions, Space
ASIC Prototyping
Audio: Connectivity Solutions, Portable Electronics, Radio, Digital Signal Processing (DSP)
Automotive: High Resolution Video, Image Processing, Vehicle Networking and Connectivity, Automotive Infotainment
Broadcast: Real-Time Video Engine, Edge QAM, Encoders, Displays, Switches and Routers
Consumer Electronics: Digital Displays, Digital Cameras, Multi-function Printers, Portable Electronics, Set-top Boxes
Data Center: Servers, Security, Routers, Switches, Gateways, Load Balancing
High Performance Computing: Servers, Super Computers, SIGINT Systems, High-end RADARS, High-end Beam Forming Systems, Data Mining Systems
Industrial: Industrial Imaging, Industrial Networking, Motor Control
Medical: Ultrasound, CT Scanner, MRI, X-ray, PET, Surgical Systems
Security: Industrial Imaging, Secure Solutions, Image Processing
Video & Image Processing: High Resolution Video, Video Over IP Gateway, Digital Displays, Industrial Imaging
Wired Communications: Optical Transport Networks, Network Processing, Connectivity Interfaces
Wireless Communications: Baseband, Connectivity Interfaces, Mobile Backhaul, Radio

Architecture

The most common FPGA architecture consists of an array of logic blocks (called Configurable Logic Block, CLB, or Logic Array Block, LAB, depending on vendor), I/O pads, and routing channels. Generally, all the routing channels have the same width (number of wires). Multiple I/O pads may fit into the height of one row or the width of one column in the array. An application circuit must be mapped into an FPGA with adequate resources. While the number of CLBs/LABs and I/Os required is easily determined from the design, the number of routing tracks needed may vary considerably even among designs with the same amount of logic. For example, a crossbar switch requires much more routing than a systolic array with the same gate count. Since unused routing tracks increase the cost (and decrease the performance) of the part without providing any benefit, FPGA manufacturers try to provide just enough tracks so that most designs that fit in terms of lookup tables (LUTs) and I/Os can be routed. This is determined by estimates such as those derived from Rent's rule or by experiments with existing designs. In general, a logic block (CLB or LAB) consists of a few logical cells (called ALM, LE, Slice, etc.). A typical cell consists of a 4-input LUT, a full adder (FA) and a D-type flip-flop, as shown below. In this figure the LUT is split into two 3-input LUTs. In normal mode these are combined into a 4-input LUT through the left mux. In arithmetic mode, their outputs are fed to the FA. The selection of mode is programmed into the middle multiplexer. In the figure example, the output can be either synchronous or asynchronous, depending on the programming of the mux to the right. In practice, all or part of the FA is put as functions into the LUTs in order to save space.

Simplified example illustration of a logic cell

ALMs and Slices usually contain 2 or 4 structures similar to the example figure, with some shared signals. CLBs/LABs typically contain a few ALMs/LEs/Slices. In recent years, manufacturers have started moving to 6-input LUTs in their high-performance parts, claiming increased performance.

Since clock signals (and often other high-fan-out signals) are normally routed via special-purpose dedicated routing networks in commercial FPGAs, they are managed separately from other signals. For this example architecture, the locations of the FPGA logic block pins are shown below.

Logic Block Pin Locations

Each input is accessible from one side of the logic block, while the output pin can connect to routing wires in both the channel to the right and the channel below the logic block. Each logic block output pin can connect to any of the wiring segments in the channels adjacent to it. Similarly, an I/O pad can connect to any one of the wiring segments in the channel adjacent to it. For example, an I/O pad at the top of the chip can connect to any of the W wires (where W is the channel width) in the horizontal channel immediately below it. Generally, the FPGA routing is unsegmented. That is, each wiring segment spans only one logic block before it terminates in a switch box. By turning on some of the programmable switches within a switch box, longer paths can be constructed. For higher-speed interconnect, some FPGA architectures use longer routing lines that span multiple logic blocks.

Whenever a vertical and a horizontal channel intersect, there is a switch box. In this architecture, when a wire enters a switch box, there are three programmable switches that allow it to connect to three other wires in adjacent channel segments. The pattern, or topology, of switches used in this architecture is the planar or domain-based switch box topology. In this switch box topology, a wire in track number one connects only to wires in track number one in adjacent channel segments, wires in track number 2 connect only to other wires in track number 2 and so on. The figure below illustrates the connections in a switch box.

Switch box topology
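The planar (domain-based) connection pattern described above can be written out as a small Python model. It is an illustrative sketch only; the side names, the function name and the channel width of 4 are assumptions.

```python
SIDES = ("left", "top", "right", "bottom")

def planar_switch_box(channel_width):
    """Map every incoming wire to the wires it can be switched onto.

    In the planar topology a wire on track t of one side can connect only to
    track t on each of the other three sides, giving three switches per wire.
    """
    connections = {}
    for side in SIDES:
        for track in range(channel_width):
            connections[(side, track)] = [
                (other, track) for other in SIDES if other != side
            ]
    return connections

box = planar_switch_box(4)  # hypothetical channel width W = 4

# Each switch joins two wires, so it appears twice in the mapping.
total_switches = sum(len(targets) for targets in box.values()) // 2
```

In this model a switch box with channel width W contains 6W programmable switches, and a signal that enters on a given track stays on that track number as it moves from box to box.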

Modern FPGA families expand upon the above capabilities to include higher-level functionality fixed into the silicon. Having these common functions embedded into the silicon reduces the area required and gives those functions increased speed compared to building them from primitives. Examples include multipliers, generic DSP blocks, embedded processors, high-speed I/O logic and embedded memories. FPGAs are also widely used for systems validation, including pre-silicon validation, post-silicon validation, and firmware development. This allows chip companies to validate their design before the chip is produced in the factory, reducing the time-to-market. To shrink the size and power consumption of FPGAs, vendors such as Tabula and Xilinx have introduced new 3D or stacked architectures. Following the introduction of its 28 nm 7-series FPGAs, Xilinx revealed that several of the highest-density parts in those FPGA product lines will be constructed using multiple dies in one package, employing technology developed for 3D construction and stacked-die assemblies. The technology stacks several (three or four) active FPGA dice side-by-side on a silicon interposer, a single piece of silicon that carries passive interconnect.

FPGA design and programming

To define the behavior of the FPGA, the user provides a hardware description language (HDL) or a schematic design. The HDL form is better suited to large structures because it is possible to specify them numerically rather than having to draw every piece by hand. However, schematic entry can allow for easier visualization of a design. Then, using an electronic design automation tool, a technology-mapped netlist is generated. The netlist can then be fitted to the actual FPGA architecture using a process called place-and-route, usually performed by the FPGA company's proprietary place-and-route software. The user validates the map, place and route results via timing analysis, simulation, and other verification methodologies. Once the design and validation process is complete, the binary file generated (also using the FPGA company's proprietary software) is used to (re)configure the FPGA. This file is transferred to the FPGA/CPLD via a serial interface (JTAG) or to an external memory device such as an EEPROM. The most common HDLs are VHDL and Verilog, although in an attempt to reduce the complexity of designing in HDLs, which have been compared to assembly languages, there are moves to raise the abstraction level through the introduction of alternative languages. National Instruments' LabVIEW graphical programming language (sometimes referred to as "G") has an FPGA add-in module available to target and program FPGA hardware. To simplify the design of complex systems in FPGAs, there exist libraries of predefined complex functions and circuits that have been tested and optimized to speed up the design process. These predefined circuits are commonly called IP cores, and are available from FPGA vendors and third-party IP suppliers (rarely free, and typically released under proprietary licenses). Other predefined circuits are available from developer communities such as OpenCores (typically released under free and open source licenses such as the GPL, BSD or similar license), and other sources.

In a typical design flow, an FPGA application developer will simulate the design at multiple stages throughout the design process. Initially the RTL description in VHDL or Verilog is simulated by creating test benches to simulate the system and observe results. Then, after the synthesis engine has mapped the design to a netlist, the netlist is translated to a gate level description where simulation is repeated to confirm the synthesis proceeded without errors. Finally the design is laid out in the FPGA at which point propagation delays can be added and the simulation run again with these values back-annotated onto the netlist.

Basic process technology types

SRAM - Based on static memory technology. In-system programmable and re-programmable. Requires external boot devices. CMOS. Currently in use.
Antifuse - One-time programmable. CMOS.
PROM - Programmable Read-Only Memory technology. One-time programmable because of plastic packaging. Obsolete.
EPROM - Erasable Programmable Read-Only Memory technology. One-time programmable but, with a window, can be erased with ultraviolet (UV) light. CMOS. Obsolete.
EEPROM - Electrically Erasable Programmable Read-Only Memory technology. Can be erased, even in plastic packages. Some but not all EEPROM devices can be in-system programmed. CMOS.
Flash - Flash-erase EPROM technology. Can be erased, even in plastic packages. Some but not all flash devices can be in-system programmed. Usually, a flash cell is smaller than an equivalent EEPROM cell and is therefore less expensive to manufacture. CMOS.
Fuse - One-time programmable. Bipolar. Obsolete.

Major manufacturers
Xilinx and Altera are the current FPGA market leaders and long-time industry rivals. Together, they control over 80 percent of the market. Both Xilinx and Altera provide free Windows and Linux design software that supports limited sets of devices. Other competitors include Lattice Semiconductor (SRAM based with integrated configuration Flash, instant-on, low power, live reconfiguration), Actel (now Microsemi; antifuse, flash-based, mixed-signal), SiliconBlue Technologies (extremely low power SRAM-based FPGAs with optional integrated nonvolatile configuration memory; acquired by Lattice in 2011), Achronix (SRAM based, 1.5 GHz fabric speed), and QuickLogic (handheld focused CSSP, no general purpose FPGAs).

Design and Implementation of A Lottery-based Bandwidth Guaranteed and Low Latency Arbiter for On-Chip Bus
1. Introduction
The bus arbiter decides which master is granted access to the bus when multiple masters issue bus requests at the same time in a system-on-chip. The previous bus arbitration algorithms, namely the static fixed priority algorithm, the time division multiplexing (TDM)/Round-robin algorithm, and the Lottery ticket bus algorithm, have some drawbacks: bus starvation, low system performance because of bus distribution latency during arbitration, and large latency for masters that are assigned a low share of the tickets. Recently, the real-time issue has been considered for bus arbitration [5]. In this paper, we propose a two-level static priority bus arbitration algorithm, the RB_Lottery algorithm, which is based on the Lottery bus arbitration method. The proposed RB_Lottery bus arbitration can handle the real-time requirements of all masters, solve the traditional bus distribution problem, guarantee the bandwidth requirement of each master, and reduce the bus arbitration latency. In the first level of arbitration, we use static priority real-time counters to satisfy the real-time requirements; this level is named the static priority real-time handler. In the second level, we adopt a Lottery-based algorithm with binary group partition to avoid starvation and reduce bus latency.

2. Previous Bus Arbitration Schemes for SOC Bus Communications


2.1 Static Fixed Priority Algorithm
The static fixed priority algorithm assigns a unique priority value to each master, and the arbiter periodically checks the requests of all masters. When several masters issue requests simultaneously, the master with the highest priority is granted access to the bus. The advantages of this scheme are quick arbitration and a simple architecture. However, static priority based arbitration allocates communication bandwidth to each master according to its priority, so a low priority master suffers bandwidth starvation if there is heavy high priority traffic on the bus.
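The starvation effect described above can be illustrated with a small behavioural model. This is only a sketch in Python, not the hardware design: the function names and the convention that index 0 is the highest priority master are assumptions made for the example.

```python
def fixed_priority_grant(requests):
    """Return the index of the granted master, or None if nobody requests.

    requests[i] is True when master i has a pending request.
    Index 0 is assumed to be the highest priority (illustrative convention).
    """
    for master, pending in enumerate(requests):
        if pending:
            return master
    return None


def simulate(cycles, request_pattern):
    """Count grants per master when the same request pattern repeats every cycle."""
    grants = [0] * len(request_pattern)
    for _ in range(cycles):
        winner = fixed_priority_grant(request_pattern)
        if winner is not None:
            grants[winner] += 1
    return grants
```

If master 0 requests in every cycle, `simulate` gives every grant to master 0 and none to a lower priority master that is also requesting, which is the bandwidth starvation the text refers to.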

2.2 Two-level TDM / Round-robin Algorithm
The time division multiplexed (TDM) scheme divides the scheduling execution time on the bus into time slots, and then allocates the time slots to the masters. Each time slot can span several physical transactions on the bus. The arbitration can provide elastic bandwidth assignments, because a master that has reserved more than one slot can be granted access to the bus multiple times. The 1st level of arbitration uses a timing wheel, where each slot is statically reserved for a unique master. If the master that owns the current time slot does not issue a request, the slot is wasted. To repair this defect, the 2nd level of arbitration, which uses the Round-robin algorithm, reallocates the unused slots to other requesting masters.
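A minimal behavioural sketch of one arbitration step of this two-level scheme is given below. The function name, the representation of the timing wheel as a list of master indices, and the round-robin pointer handling are assumptions chosen for illustration; the source does not specify them.

```python
def tdm_round_robin(wheel, requests, slot, rr_pointer):
    """Perform one arbitration step.

    wheel      -- list of master indices, one entry per time slot (1st level)
    requests   -- requests[i] is True when master i has a pending request
    slot       -- index of the current time slot
    rr_pointer -- master at which the round-robin search starts (2nd level)

    Returns (granted master or None, next round-robin pointer).
    """
    n = len(requests)
    owner = wheel[slot % len(wheel)]
    if requests[owner]:
        # 1st level: the slot owner is requesting, so it uses its own slot.
        return owner, rr_pointer
    # 2nd level: the slot would be wasted, so offer it round-robin.
    for offset in range(n):
        candidate = (rr_pointer + offset) % n
        if requests[candidate]:
            return candidate, (candidate + 1) % n
    return None, rr_pointer
```

Reserving two wheel entries for the same master, for example `wheel = [0, 0, 1, 2]`, models the elastic bandwidth assignment mentioned in the text.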

2.3 Lottery Bus Algorithm


In the Lottery bus arbitration algorithm, the arbiter acts like a lottery manager that decides which lucky participant wins the prize. Each master is statically assigned a number of lottery tickets, and the lottery manager gathers the bus access requests from all of the masters. The lottery manager then generates a pseudo random number, which corresponds to one ticket number, so a master that owns more tickets is more likely to be granted. The ticket number in the lottery arbitration algorithm is equal to the weight of each master. Because Lottery arbitration is a probability-based distribution, it can avoid bus starvation. Lottery arbitration also gives good control over the communication bandwidth allocated to each master, but a master that owns fewer tickets has a higher average latency than the other masters. In Figure 1, let the bus masters be C1, C2, ..., Cn, and let the number of tickets held by each master be t1, t2, ..., tn. At any bus cycle, let the pending requests be represented by a set of Boolean variables ri for i = 1, 2, ..., n, where ri = 1 if the master Ci has a pending request, and ri = 0 otherwise. In Lottery arbitration, the granted master is chosen in a randomized way, i.e. the probability of granting master Ci is $P(C_i) = \frac{r_i t_i}{\sum_{j=1}^{n} r_j t_j}$.
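The randomized grant can be sketched in software as follows. This is a behavioural model under stated assumptions, not the arbiter hardware: the function names are hypothetical, masters are indexed from 0, and Python's `random` module stands in for the pseudo random number generator. Each requesting master wins with a probability proportional to its ticket count among the requesting masters.

```python
import random


def lottery_probabilities(tickets, requests):
    """Grant probability of each master: its tickets over the tickets of all requesters."""
    total = sum(t for t, r in zip(tickets, requests) if r)
    if total == 0:
        return [0.0] * len(tickets)
    return [(t / total) if r else 0.0 for t, r in zip(tickets, requests)]


def lottery_grant(tickets, requests, rng=random):
    """Draw one ticket among the requesting masters and return the winner (or None)."""
    total = sum(t for t, r in zip(tickets, requests) if r)
    if total == 0:
        return None
    draw = rng.randrange(total)  # winning ticket number in [0, total)
    partial = 0
    for master, (t, r) in enumerate(zip(tickets, requests)):
        if r:
            partial += t  # running partial sum of requesting tickets
            if draw < partial:
                return master
    return None
```

With tickets 1, 2, 3, 4 and only the second master idle, the probabilities are 1/8, 0, 3/8 and 4/8, which shows why a master with few tickets waits longer on average.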

Figure 1: Lottery bus arbiter for four bus masters

3. Proposed Bus Arbitration Scheme
3.1 Scope of the Arbitration


Since the previous arbitration algorithms cannot handle strict real-time requirements, we propose a two-level arbitration algorithm, which is called the RB_Lottery bus arbitration. The proposed arbiter architecture is shown in Figure 2. In the first level, the static priority real-time handler handles the real-time requirements. In the second level, the binary group partition with the Lottery-based scheme guarantees the bandwidth that each master needs and reduces the distribution latency during bus arbitration. Note that once the static priority real-time handler in the first level generates a valid grant output (grant=1) for one bus master, the output of the second level arbitration is disabled. Conversely, the grant output of the second level arbitration is valid only when the first level arbitration does not output a valid grant. The proposed RB_Lottery scheme is described in detail in the following sections.

3.2 Proposed Arbitration Algorithm


The static priority real-time handler sets a priority real-time counter for the real-time requirement of each master. Initially, suitable initial values are loaded into the real-time counters of all bus masters. When a master issues a request, the corresponding real-time counter is decreased by 1 every cycle until the master is granted. One of two conditions then occurs. On the one hand, when the counter value is decreased to zero, the first level arbitration generates a valid grant, and the value in the real-time counter is reset. On the other hand, when the counter value has not yet reached zero and the second level arbitration generates a valid grant for the same master, the corresponding real-time counter is also reset. If several real-time counters are decreased to zero simultaneously, the master that owns the highest priority is granted.
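The counter behaviour can be modelled with a short Python sketch. The class and method names are hypothetical, index 0 is assumed to be the highest priority master, and a master that loses a simultaneous expiry is assumed to keep its counter at zero so that it wins in a following cycle; the source does not state what happens to the loser.

```python
class RealTimeHandler:
    """Behavioural model of the first-level static priority real-time handler."""

    def __init__(self, initial_counts):
        self.initial = list(initial_counts)   # reload values, one per master
        self.counters = list(initial_counts)  # current real-time counters

    def step(self, requests):
        """Advance one bus cycle; return the first-level grant index or None."""
        for i, pending in enumerate(requests):
            if pending and self.counters[i] > 0:
                self.counters[i] -= 1
        expired = [i for i, pending in enumerate(requests)
                   if pending and self.counters[i] == 0]
        if not expired:
            return None
        winner = expired[0]  # lowest index = highest priority (assumption)
        self.counters[winner] = self.initial[winner]
        return winner

    def notify_second_level_grant(self, master):
        """Reset the counter when the second level granted this master first."""
        self.counters[master] = self.initial[master]
```

A return value of `None` from `step` corresponds to the case in which the first level produces no valid grant and the second-level output is allowed through.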

Figure 2: Two-level arbitration scheme for the proposed RB_Lottery arbiter

In the binary group logic, each master is given an identification number for its priority request, and the masters are grouped in pairs, each pair forming a binary set. If the higher priority master of a set issues a request signal, this signal is delivered to the Lottery-based block. The binary group logic architecture is shown in Figure 3. In Figure 3, the input net Lo priority request must be connected to the request signal of the lower priority master in a binary set, and the input net Hi priority request must be connected to the request signal of the higher priority master in the same binary set.
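One possible reading of the binary group logic is sketched below in Python. The function name and return format are hypothetical, and the behaviour when only the lower priority master requests (its request is passed through) is an assumption, since the text only describes the case in which the higher priority master requests.

```python
def binary_group(hi_request, lo_request):
    """Model one binary set of two masters.

    Returns (group_request, selected):
      group_request -- 1 if the set forwards a request to the Lottery-based block
      selected      -- "hi", "lo" or None, the master that the request belongs to
    """
    if hi_request:
        # The higher priority master always wins inside its own set.
        return 1, "hi"
    if lo_request:
        # Assumption: the lower priority request passes when the set is otherwise idle.
        return 1, "lo"
    return 0, None
```

Under this reading the Lottery-based block only has to arbitrate between sets rather than between individual masters, which halves the number of ticket holders.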

Figure 3: Binary group logic circuit for priority selection

3.3 Proposed Arbiter Architecture


To describe the architecture of the proposed RB_Lottery arbiter clearly, we discuss the 4-master case shown in Figure 4. Let the request signals of Master 1 (M1), Master 2 (M2), Master 3 (M3) and Master 4 (M4) be r1, r2, r3 and r4, respectively. The priority order is M1 > M2 > M3 > M4, where M1 owns the highest priority. In the Lottery-based part, t1=1 and t2=2. If a master wants to access or communicate data over the bus, the corresponding request signal is set high (i.e., 1); otherwise, it is set low (i.e., 0). In condition 1, suppose that r1=r3=1, r2=r4=0, and the real-time counters of M1 and M3 decrease to zero simultaneously. Since the priority of M1 is higher than that of M3, M1 obtains the bus grant, and the grant signal gnt[1] is set to 1 in the static priority real-time handler. At the same time, the grant output of the binary group with Lottery-based block is disabled. In condition 2, suppose that r1=r3=1, r2=r4=0, but the real-time counters of M1 and M3 have not decreased to zero. Then the rb1 signal at the output of the binary group logic, MUX 1, is set to 1, and the rb2 signal at the output of the binary group logic, MUX 2, is also set to 1. Thus, the grant output of the Lottery-based scheme is active. Table 1 shows the truth table of the grant decoder in the RB_Lottery arbiter for the 4-master case. The generated random number is compared in parallel with two partial sums, and the outputs of the comparators are b0 and b1. A comparator outputs 1 if the value of the random number is less than the partial sum at its other input.
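The parallel comparison against partial sums can be sketched in Python as follows. This is an illustration under assumptions: the function names are hypothetical, groups are indexed from 0, the partial sums are taken over the tickets of the requesting groups only, and the decoder picks the first comparator that outputs 1. The actual decoder is defined by Table 1, which is not reproduced here.

```python
def lottery_comparators(random_number, rb, tickets):
    """Compare the random number with each partial sum of requesting tickets.

    rb[k] is the request output of binary group k (0 or 1) and tickets[k]
    its ticket count. Returns the comparator outputs [b0, b1, ...].
    """
    partial = 0
    outputs = []
    for req, t in zip(rb, tickets):
        partial += req * t  # a non-requesting group contributes no tickets
        outputs.append(1 if random_number < partial else 0)
    return outputs


def decode_group(outputs):
    """Assumed decoder: the first comparator that outputs 1 selects the group."""
    for group, bit in enumerate(outputs):
        if bit:
            return group
    return None
```

With t1=1, t2=2 and both groups requesting, the random number ranges over 0, 1, 2; the value 0 selects the first group and the values 1 and 2 select the second, so the second group receives two thirds of the grants.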

Figure 4: The architecture of the proposed RB_Lottery bus arbiter for the 4-master case

AMBA AXI4 architecture


AMBA AXI4 [3] supports data transfers of up to 256 beats and unaligned data transfers using byte strobes. In the AMBA AXI4 system, 16 masters and 16 slaves are interfaced, and each master and slave has its own 4-bit ID tag. The AMBA AXI4 system consists of masters, slaves and the bus (arbiters and decoders). The protocol uses five channels, namely the write address channel, write data channel, write response channel, read address channel, and read data channel. The AXI4 protocol supports the following mechanisms:
Unaligned data transfers and updated write response requirements.
Variable-length bursts, from 1 to 256 data transfers per burst for incrementing (INCR) bursts and from 1 to 16 for the other burst types.
Bursts with a transfer size of 8, 16, 32, 64, 128, 256, 512 or 1024 bits.
Updated AWCACHE and ARCACHE signalling details.
Each transaction is burst-based and has address and control information on the address channel that describes the nature of the data to be transferred. The data is transferred between master and slave using a write data channel to the slave or a read data channel to the master. Table gives the information of signals used in the complete design of the protocol.

The write operation starts when the master sends an address and control information on the write address channel as shown in fig. 1. The master then sends each item of write data over the write data channel. The master keeps the VALID signal LOW until the write data is available. When the master sends the last data item, the WLAST signal goes HIGH. When the slave has accepted all the data items, it drives a write response signal BRESP[1:0] back to the master to indicate that the write transaction is complete. This signal indicates the status of the write transaction. The allowable responses are OKAY, EXOKAY, SLVERR, and DECERR.

After the read address appears on the address bus, the data transfer occurs on the read data channel as shown in fig. The slave keeps the VALID signal LOW until the read data is available. For the final data transfer of the burst, the slave asserts the RLAST signal to show that the last data item is being transferred. The RRESP[1:0] signal indicates the status of the read transfer. The allowable responses are OKAY, EXOKAY, SLVERR, and DECERR.
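As an illustration of the response signalling and the last-beat marker, the following Python sketch decodes a two-bit response and builds the sequence of write beats for one burst. The helper names are hypothetical; the two-bit encodings follow the AXI specification (OKAY = 0b00, EXOKAY = 0b01, SLVERR = 0b10, DECERR = 0b11).

```python
from enum import IntEnum


class AxiResp(IntEnum):
    """Two-bit response encodings carried on BRESP[1:0] and RRESP[1:0]."""
    OKAY = 0b00
    EXOKAY = 0b01
    SLVERR = 0b10
    DECERR = 0b11


def decode_response(bits):
    """Return the response name for a BRESP or RRESP value."""
    return AxiResp(bits & 0b11).name


def write_burst_beats(data_items):
    """Return (WDATA, WLAST) pairs; WLAST is 1 only on the final beat."""
    last = len(data_items) - 1
    return [(item, 1 if i == last else 0) for i, item in enumerate(data_items)]
```

A read burst can be modelled the same way, with RLAST marking the final beat in place of WLAST.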

The work carried out in this project is the achievement of communication between one master and one slave. The AMBA AXI4 slave is designed with an operating frequency of 100 MHz, which gives a clock cycle of 10 ns. An interconnect is needed to access the slave, hence the interconnect signals are also studied. Master block functions are assumed to be available and the slave characteristics are studied. The AMBA AXI4 system components are:
1) Master
2) AMBA AXI4 Interconnect
2.1) Arbiters
2.2) Decoders
3) Slave
The master is connected to the interconnect using a slave interface and the slave is connected to the interconnect using a master interface as shown in fig. That is, the AXI4 master connects to the AXI4 slave interface port of the interconnect and the AXI4 slave connects to the AXI4 master interface port of the interconnect. The parallel capability of this interconnect enables master M1 to access one slave at the same time as master M0 is accessing the other.

AMBA AXI4 slave Read/Write block Diagram.

SOFTWARE TOOLS Verilog HDL


Verilog is used to model the design in this project. It is a hardware description language (HDL) used to model electronic systems. It is sometimes called Verilog HDL, and it supports the design, verification, and implementation of analog, digital, and mixed-signal circuits at various levels of abstraction. Verilog was originally developed and owned by Gateway Design Automation in 1984. After this, Cadence Design Systems purchased Gateway and continued selling Verilog-XL as a Verilog HDL simulator with PLI support in 1990. In 1995 Cadence released the specs for Verilog HDL and they were accepted as the IEEE 1364 standard, which included the PLI 1.0 (TF/ACC) routines as a standard for all Verilog simulators. In 1993 the PLI 2.0 (VPI) routines were released as a standard by OVI, and in 1999 the IEEE was to vote on updating the 1364 standard to include PLI 2.0. In 2001 the IEEE accepted the updated Verilog standard, commonly known as Verilog 2001, and today there are a dozen simulators that simulate Verilog HDL.

Verilog was created as a language for industry rather than academia. It has a very C-like programming style that closely represents hardware. VHDL supports 9-value logic, whereas Verilog supports 7 strengths on 3 values. Compared with Verilog, VHDL offers more programming constructs, whereas Verilog is closer to hardware. Verilog HDL is a general purpose hardware description language that is easy to learn and easy to use. Its syntax is similar to that of the C programming language, so designers with C programming experience will find it easy to learn and will be comfortable with its syntax. In addition, Verilog allows different levels of abstraction to be mixed in the same model. Thus, a designer can define a hardware model in terms of switches, gates, RTL, or behavioral code. Also, a designer needs to learn only one language for stimulus and hierarchical design. On top of that, most popular logic synthesis tools support Verilog HDL. This makes it the language of choice for many ASIC companies. More importantly, all fabrication vendors provide Verilog HDL libraries for post logic synthesis simulation. Thus, designing a chip in Verilog HDL allows the widest choice of vendors.

3.3 INTRODUCTION TO XILINX ISE 12.1 EDA TOOL:


ISE (Integrated Software Environment) continues to be the design tool of choice for FPGA designers based on independent media surveys. Xilinx once again earned top ranking in this year in all FPGA EDA categories and scored higher than any other FPGA vendor in user satisfaction.

Xilinx ISE Overview


The Integrated Software Environment (ISE) is the Xilinx design software suite that allows you to take your design from design entry through Xilinx device programming. The ISE Project Navigator manages and processes your design through the following steps in the ISE design flow. A simplified version of the design flow is given in the following diagram.

Figure 3.5: FPGA Design Flow

Design Entry
There are different techniques for design entry: schematic based, hardware description language (HDL) based, or a combination of both. The choice of method depends on the design and the designer. If the designer wants to work closely with the hardware, schematic entry is the better choice. When the design is complex, or the designer thinks about the design in an algorithmic way, HDL is the better choice. Language-based entry is faster, but the resulting design may lag in performance and density. HDLs represent a level of abstraction that can isolate designers from the details of the hardware implementation. Schematic-based entry gives designers much more visibility into the hardware, so it is the better choice for those who are hardware oriented. Another, rarely used, method is state machine entry. It is the better choice for designers who think of the design as a series of states, but the tools for state machine entry are limited. In this documentation we deal with HDL-based design entry.

Synthesis
Synthesis is the process that translates VHDL or Verilog code into a device netlist format, i.e. a complete circuit of logical elements (gates, flip-flops, etc.) for the design. If the design contains more than one sub-design (for example, to implement a processor we need a CPU as one design element, a RAM as another, and so on), the synthesis process generates a netlist for each design element. The synthesis process checks the code syntax and analyzes the hierarchy of the design, which ensures that the design is optimized for the architecture the designer has selected. The resulting netlist(s) are saved to an NGC (Native Generic Circuit) file (for Xilinx Synthesis Technology (XST)).

Implementation
This process consists of a sequence of three steps:
1. Translate
2. Map
3. Place and Route

The Translate process combines all the input netlists and constraints into a logic design file. This information is saved as an NGD (Native Generic Database) file. This can be done using the NGDBuild program. Here, defining constraints means assigning the ports in the design to the physical elements (e.g. pins, switches, buttons) of the targeted device and specifying the timing requirements of the design. This information is stored in a file named UCF (User Constraints File). Tools used to create or modify the UCF include PACE and the Constraint Editor.

The Map process divides the whole circuit of logical elements into sub-blocks such that they can be fitted into the FPGA logic blocks. That is, the Map process fits the logic defined by the NGD file into the targeted FPGA elements (Configurable Logic Blocks (CLBs), Input Output Blocks (IOBs)) and generates an NCD (Native Circuit Description) file, which physically represents the design mapped to the components of the FPGA. The MAP program is used for this purpose.

Place and Route
The PAR program is used for this process. The place and route process places the sub-blocks from the Map process into logic blocks according to the constraints and connects the logic blocks. For example, if a sub-block is placed in a logic block that is very near to an I/O pin, it may save time but may affect some other constraint. The place and route process therefore takes the trade-off between all the constraints into account.

The PAR tool takes the mapped NCD file as input and produces a completely routed NCD file as output. The output NCD file contains the routing information.

Device Programming
Now the design must be loaded on the FPGA. But the design must be converted to a format so that the FPGA can accept it. BITGEN program deals with the conversion. The routed NCD file is then given to the BITGEN program to generate a bit stream (a .BIT file) which can be used to configure the target FPGA device. This can be done using a cable. Selection of cable depends on the design.
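The hand-off between the tools named in this section can be summarised with a small Python sketch. It only records the stage names, programs and file types mentioned above (XST, NGDBuild, MAP, PAR, BITGEN and the NGC, NGD, NCD and BIT files); it does not invoke any Xilinx tool, and the data layout and function name are assumptions made for illustration.

```python
# (stage, program, output file type) as described in the text.
FLOW = [
    ("Synthesis", "XST", "NGC"),
    ("Translate", "NGDBuild", "NGD"),   # also reads the UCF constraints
    ("Map", "MAP", "NCD"),
    ("Place and Route", "PAR", "NCD"),  # routed NCD
    ("Device Programming", "BITGEN", "BIT"),
]


def describe_flow(flow=FLOW):
    """Return one line per stage showing what each program reads and writes."""
    lines = []
    previous = "HDL source"
    for stage, tool, output in flow:
        lines.append(f"{stage}: {tool} reads {previous} and writes {output}")
        previous = output
    return lines
```

Printing the result of `describe_flow()` gives a five-line summary that can be checked against the design flow diagram.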

Design Verification
Verification can be done at different stages of the process steps.

Behavioral Simulation (RTL Simulation)
This is the first of the simulation steps encountered in the design flow. It is performed before the synthesis process to verify the RTL (behavioral) code and to confirm that the design is functioning as intended. Behavioral simulation can be performed on either VHDL or Verilog designs. In this process, signals and variables are observed, procedures and functions are traced, and breakpoints are set. This is a very fast simulation, so it allows the designer to change the HDL code within a short time if the required functionality is not met. Since the design is not yet synthesized to gate level, timing and resource usage properties are still unknown.

Functional Simulation (Post Translate Simulation)
Functional simulation gives information about the logic operation of the circuit. The designer can verify the functionality of the design using this process after the Translate process. If the functionality is not as expected, the designer has to make changes in the code and follow the design flow steps again.

Static Timing Analysis
This can be done after the MAP or PAR processes. The post-MAP timing report lists the signal path delays of the design derived from the design logic. The post Place and Route timing report incorporates timing delay information to provide a comprehensive timing summary of the design.

FPGA AND ISE DEVELOPMENT SOFTWARE BASICS


The Spartan-3 EDK Board provides a powerful, self-contained development platform for designs targeting the new Spartan-3 FPGA from Xilinx. It features a 200K gate Spartan-3, onboard I/O devices, and 1MB fast asynchronous SRAM, making it the perfect platform to experiment with any new design, from a simple logic circuit to an embedded processor core. The board also contains a Platform Flash JTAG-programmable ROM, so designs can easily be made nonvolatile.

Xilinx Spartan3 FPGA:


200,000-gate Xilinx Spartan 3 FPGA in a 144-TQFP (XC3S200-4TQG144C)
4,320 logic cell equivalents
Twelve 18K-bit block RAMs (216K bits)
Twelve 18x18 hardware multipliers
Four Digital Clock Managers (DCMs)
Up to 97 user-defined I/O signals

External Peripherals Modules
2x16 LCD with contrast adjustment
2 Nos. of common anode seven-segment displays
8 Nos. of general purpose point LEDs
8 Nos. of toggle switches (digital inputs)
4 Nos. of push buttons
PS/2 keyboard or mouse interface

Communication protocols
Full Duplex UART (EIA RS232)

Other Features:
VGA Interface Connector
On-board 4 MB Platform Flash Memory (PROM)
8 MB On Board SRAM
JTAG Interface Connector for parallel programming Spartan3 FPGA
50 MHz crystal oscillator clock source

SPARTAN3 (EDK) Board Components placement top view

Block Diagram

Figure 2. Xilinx Spartan3 Advanced Development Board Block Diagram

On-board Peripherals
The Spartan3 FPGA Lab Kit comes with many interfacing options:
2 Nos. of seven-segment displays
8 Nos. of toggle switches (digital inputs)
4 Nos. of push buttons (digital inputs)
8 Nos. of point LEDs (digital outputs)
2x16 character LCD
UART for serial port communication through a PC
PS/2 keyboard interface
3-bit VGA interface

GETTING STARTED
Software Requirements
To use this tutorial, you must install the following software: ISE 12.1. For more information about installing Xilinx software, see the ISE Release Notes and Installation Guide at: http://www.xilinx.com/support/software_manuals.htm.

Hardware Requirements
To use this tutorial, you must have the following hardware: Spartan-3 Startup Kit, containing the Spartan-3 Startup Kit Demo Board.

STARTING THE ISE SOFTWARE


To start ISE, double-click the desktop icon, or start ISE from the Start menu by selecting: Start > All Programs > Xilinx ISE 12.1 > Project Navigator.
Note: Your start-up path is set during the installation process and may differ from the one above.

ACCESSING HELP
At any time during the tutorial, you can access online help for additional information about the ISE software and related tools. To open Help, do either of the following: Press F1 to view Help for the specific tool or function that you have selected or highlighted. Launch the ISE Help Contents from the Help menu. It contains information about creating and maintaining your complete design flow in ISE.

Figure 1: ISE Help Topics

Create a New Project
Create a new ISE project which will target the FPGA device on the Spartan-3 Startup Kit demo board.

To create a new project:


1. Select File > New Project... The New Project Wizard appears.
2. Type tutorial in the Project Name field.
3. Enter or browse to a location (directory path) for the new project. A tutorial subdirectory is created automatically.
4. Verify that HDL is selected from the Top-Level Source Type list.
5. Click Next to move to the device properties page.
6. Fill in the properties in the table as shown below:
Product Category: All
Family: Spartan3
Device: XC3S200
Package: TQ144
Speed Grade: -4
Top-Level Source Type: HDL
Synthesis Tool: XST (VHDL/Verilog)
Simulator: ISE Simulator (VHDL/Verilog)
Preferred Language: Verilog (or VHDL)

Verify that Enable Enhanced Design Summary is selected. Leave the default values in the remaining fields. When the table is complete, your project properties will look like the following:

Figure 2: Project Device Properties

Creating a Verilog Source
Create the top-level Verilog source file for the project as follows:
1. Click New Source in the New Project dialog box.
2. Select Verilog Module as the source type in the New Source dialog box.
3. Type in the file name counter.
4. Verify that the Add to Project checkbox is selected.
5. Click Next.
6. Declare the ports for the design by filling in the port information.
The source file containing the row DWT module displays in the Workspace, and the counter displays in the Sources tab, as shown below:

Checking the Syntax of the New Counter Module
When the source files are complete, check the syntax of the design to find errors and typos.
1. Verify that Implementation is selected from the drop-down list in the Sources window.
2. Select the row DWT design source in the Sources window to display the related processes in the Processes window.
3. Click the + next to the Synthesize-XST process to expand the process group.
4. Double-click the Check Syntax process.
Note: You must correct any errors found in your source files. You can check for errors in the Console tab of the Transcript window. If you continue without valid syntax, you will not be able to simulate or synthesize your design.
5. Close the HDL file.

BEHAVIORAL SIMULATION
ISIM SETUP
ISim is automatically installed and set up with the ISE Design Suite 12 installer on supported operating systems. To see a list of operating systems supported by ISim, please see the ISE Design Suite 12: Installation, Licensing, and Release Notes available from the Xilinx website.

Getting Started
The following sections outline the requirements for performing behavioral simulation in this tutorial.

Required Files
The behavioral simulation flow requires design files, a test bench file, and Xilinx simulation libraries.

Design Files (VHDL, Verilog, or Schematic)


This chapter assumes that you have completed the design entry tutorial in either Chapter 2, HDL-Based Design, or Chapter 3, Schematic-Based Design. After you have completed one of these chapters, your design includes the required design files and is ready for simulation.

Test Bench File


To simulate the design, a test bench file is required to provide stimulus to the design. VHDL and Verilog test bench files are available with the tutorial files. You may also create your own test bench file.

Simulation Libraries
Xilinx simulation libraries are required when a Xilinx primitive or IP core is instantiated in the design. The design in this tutorial requires the use of simulation libraries because it contains instantiations of a digital clock manager (DCM) and a CORE Generator software component. For information on simulation libraries and how to compile them, see the next section, Xilinx Simulation Libraries.

XILINX SIMULATION LIBRARIES


To simulate designs that contain instantiated Xilinx primitives, CORE Generator software components, and other Xilinx IP cores you must use the Xilinx simulation libraries. These libraries contain models for each component. These models reflect the functions of each component, and provide the simulator with the information required to perform simulation. For a detailed description of each library, see the Synthesis and Simulation Design Guide. This guide is available from the ISE Software Manuals collection, automatically installed with your ISE software. To open the Software Manuals collection, select Help > Software Manuals. The Software Manuals collection is also available from the Xilinx website.

Adding an HDL Test Bench


To add an HDL test bench to your design project, you can either add a test bench file provided with this tutorial, or create your own test bench file and add it to your project.

Adding the Tutorial Test Bench File
This section demonstrates how to add an existing test bench file to the project. A VHDL test bench and Verilog test fixture are provided with this tutorial.
Note: To create your own test bench file in the ISE software, select Project > New Source, and select either VHDL Test Bench or Verilog Test Fixture in the New Source Wizard. An empty stimulus file is added to your project. You must define the test bench in a text editor.

Verilog Simulation
To add the tutorial Verilog test fixture to the project, do the following:
1. In Project Navigator, select Project > Add Source.
2. Select the file tb_rowDWT.v.
3. Click Open.
4. Ensure that Simulation is selected for the file association type.
5. Click OK.

Behavioral Simulation Using ISim


Follow this section of the tutorial if you have skipped the previous section, Behavioral Simulation Using ModelSim. Now that you have a test bench in your project, you can perform behavioral simulation on the design using ISim. The ISE software has full integration with ISim. The ISE software enables ISim to create the work directory, compile the source files, load the design, and perform simulation based on simulation properties. To select ISim as your project simulator, do the following:
1. In the Hierarchy pane of the Project Navigator Design panel, right-click the device line (xc3s700A-4fg484), and select Design Properties.
2. In the Design Properties dialog box, set the Simulator field to ISim (VHDL/Verilog).

Locating the Simulation Processes
The simulation processes in the ISE software enable you to run simulation on the design using ISim. To locate the ISim processes, do the following:
1. In the View pane of the Project Navigator Design panel, select Simulation, and select Behavioral from the drop-down list.
2. In the Hierarchy pane, select the test bench file (tb_rowDWT).
3. In the Processes pane, expand ISim Simulator to view the process hierarchy.

Performing Simulation
After the process properties have been set, you are ready to run ISim to simulate the design. To start the behavioral simulation, double-click Simulate Behavioral Model. ISim creates the work directory, compiles the source files, loads the design, and performs simulation for the time specified.

Adding Signals
To view signals during the simulation, you must add them to the Waveform window. The ISE software automatically adds all the top-level ports to the Waveform window. Additional signals are displayed in the Instances and Processes panel. The following procedure explains how to add additional signals in the design hierarchy. For the purpose of this tutorial, add the DCM signals to the waveform. To add additional signals in the design hierarchy, do the following:
1. In the Instances and Processes panel, expand tb_rowDWT, and expand UUT. The following figure shows the contents of the Instances and Processes panel for the Verilog flow.

Drag all the selected signals to the waveform. Alternatively, right click on a selected signal and select Add To Waveform.

Notice that the waveforms have not been drawn for the newly added signals. This is because ISim did not record the data for these signals. By default, ISim records data only for the signals that are added to the waveform window while the simulation is running. Therefore, when new signals are added to the waveform window, you must rerun the simulation for the desired amount of time.

Rerunning Simulation

To rerun the simulation in ISim, do the following:
1. Click the Restart Simulation icon.
   Figure : ISim Restart Simulation Icon
2. At the ISim command prompt in the Console, enter run 2000 ns and press Enter.

The simulation runs for 2000 ns. The waveforms for the counter block are now visible in the Waveform window.
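The console steps above can also be collected into a small command file so that the same run is easy to repeat. The following is a minimal sketch, not part of the ISE flow itself: the helper name `build_isim_script` and its parameters are assumptions of this example, and the only simulator commands it emits are ones that appear in this tutorial (`restart`, `run`, and `quit -f`). How such a file is passed to ISim should be checked against the ISim documentation for the installed release.

```python
def build_isim_script(run_time_ns, restart=True, quit_when_done=True):
    """Return the text of an ISim console command sequence.

    run_time_ns    -- simulation length in nanoseconds (must be positive)
    restart        -- emit 'restart' first so simulation time goes back to 0
    quit_when_done -- emit 'quit -f' last to close ISim without a prompt
    """
    if run_time_ns <= 0:
        raise ValueError("run_time_ns must be a positive number of nanoseconds")

    commands = []
    if restart:
        commands.append("restart")
    commands.append(f"run {run_time_ns} ns")
    if quit_when_done:
        commands.append("quit -f")
    # One command per line, with a trailing newline.
    return "\n".join(commands) + "\n"


if __name__ == "__main__":
    # Same run length as the tutorial step above (2000 ns).
    print(build_isim_script(2000), end="")
```

Keeping the run length as a parameter makes it simple to rerun the design for a longer time after new signals have been added to the waveform.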

Running a Simulation in ISim

Simulation, the process of verifying the logic and timing of a design, can be run from ISim using functions in the interface or at the command line.

To Run a Simulation From the ISim GUI

The following GUI menu commands can be used to run simulation.
Simulation > Restart - Stops simulation and sets simulation time back to 0. Use the Run All, Run For or Step command to run the simulation over again without reloading the design. See restart Tcl command.
Simulation > Run All - Runs simulation until all events are executed. You can also use the run Tcl command with the all option.
Simulation > Run - Runs simulation for 100 ns or for the amount of time specified in the toolbar. Time and time unit are entered in the Value box in the toolbar. You can also use the run Tcl command with a length and unit specified.
Simulation > Step - Runs simulation for one executable HDL instruction at a time. See Stepping a Simulation. See also step Tcl command.

In addition, you can run simulation until a specific point in your HDL source code is reached. To do so, use breakpoints and the Run All command. See Source Level Debugging Overview.

Note: The current simulation time is displayed on the status bar in the lower right corner.

Pausing a Simulation

While running a simulation for any length of time, you can pause a simulation using the Break command, which leaves the simulation session open.

To close the session of ISim, see Closing ISim.

To Pause a Running Simulation

You can pause a running simulation using the Break command in any of the following ways:
Select Simulation > Break.
Click the Break toolbar button.
Enter Ctrl+C (at the command line only).

The simulator stops at the next executable HDL line. The line at which the simulation stopped is displayed in the text editor.

Note: This behavior applies to designs that have not been compiled with the -nodebug switch.

The simulation can be resumed at any time by using the Run All, Run, or Step commands. See Running a Simulation in ISim for details.

Closing ISim

You can terminate a simulation and close the ISim session.

To Close ISim

Use any of the following:
Select File > Exit.
Enter the quit -f command in the Console panel at the prompt. This prevents an "Are you sure?" dialog box from opening.
Click the X at the top-right corner of the main window.

The simulation terminates and the session of ISim closes.

Changing the Radix

To Set the Default Radix

The default radix controls the bus radix displayed in the wave configuration, Objects panel, and the Console panel. To change the default radix from the default binary:
1. Select Edit > Preferences.
2. In the Preferences dialog box, click ISim Simulator in the left pane.
3. Select a radix from the Default Radix field drop-down list.
4. Click Apply, and click OK.

To Change an Individual Radix

You can change the radix of an individual signal (HDL object) in the Objects panel as follows:
1. Right-click a bus in the Objects panel.
2. Select Radix, and the desired format from the submenu: Binary, Hexadecimal, Unsigned Decimal, Signed Decimal, Octal, or ASCII.

vcd Command

The vcd command generates simulation results in VCD format. This command enables you to dump specified instances to a VCD file, to name the VCD file, to start and stop the dump process, and other functions. See also Writing Activity Data of the Design.

Note: This command is case sensitive.

Syntax

    vcd (option)

Examples

The following commands write the VCD simulation values of the module UUT to a VCD file after running simulation for 1000 ns.

Specify which file to write:
    vcd dumpfile adder.vcd
Specify which module net activities to write:
    vcd dumpvars -m /UUT
Run simulation for the given time:
    run 1000 ns
Dump the activity data to the VCD file:
    vcd dumpflush
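Once a VCD file has been written, it can be inspected outside the simulator, since VCD is a plain-text format defined in the Verilog standard. The following is a rough sketch of a reader that counts how often each dumped signal changes value after the initial dump. The function name `count_value_changes` and the sample text are assumptions made for this illustration; the sample is hand-written and is not output from this design, and the reader ignores details such as nested scopes with duplicate signal names.

```python
from collections import Counter


def count_value_changes(vcd_text):
    """Count value changes per signal name in VCD text.

    Initial values listed inside the $dumpvars block are not counted.
    """
    tokens = iter(vcd_text.split())
    names = {}            # identifier code -> signal name
    changes = Counter()
    in_dumpvars = False

    def record(ident):
        if not in_dumpvars:
            changes[names.get(ident, ident)] += 1

    for tok in tokens:
        if tok == "$var":
            # $var <type> <width> <id> <name> [<range>] $end
            fields = []
            for t in tokens:
                if t == "$end":
                    break
                fields.append(t)
            names[fields[2]] = fields[3]
        elif tok == "$dumpvars":
            in_dumpvars = True
        elif tok == "$end":
            in_dumpvars = False
        elif tok.startswith("$"):
            # Any other section ($scope, $timescale, ...): skip to its $end.
            for t in tokens:
                if t == "$end":
                    break
        elif tok.startswith("#"):
            continue      # timestamp, not needed for counting
        elif tok[0] in "bBrR":
            record(next(tokens))   # vector/real: value, then identifier
        elif tok[0] in "01xXzZ":
            record(tok[1:])        # scalar: value fused with identifier
    return changes


SAMPLE_VCD = '''
$timescale 1ps $end
$scope module UUT $end
$var wire 1 ! clk $end
$var wire 4 " grant [3:0] $end
$upscope $end
$enddefinitions $end
$dumpvars
0!
b0000 "
$end
#5000
1!
#10000
0!
b0001 "
#15000
1!
'''

if __name__ == "__main__":
    for name, count in sorted(count_value_changes(SAMPLE_VCD).items()):
        print(name, count)
```

A per-signal change count of this kind is a quick way to confirm that the signals of interest were actually recorded and are toggling during the run.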

Synthesis Report of Lottery-based Bandwidth Guaranteed and Low Latency Arbiter for On-Chip Bus
Release 12.1 - xst M.53d (nt64)
Copyright (c) 1995-2010 Xilinx, Inc. All rights reserved.
--> Parameter TMPDIR set to xst/projnav.tmp

Total REAL time to Xst completion: 0.00 secs
Total CPU time to Xst completion: 0.22 secs

--> Parameter xsthdpdir set to xst

Total REAL time to Xst completion: 0.00 secs
Total CPU time to Xst completion: 0.23 secs

--> Reading design: msarbitery_top.prj

TABLE OF CONTENTS
  1) Synthesis Options Summary
  2) HDL Parsing
  3) HDL Elaboration
  4) HDL Synthesis
       4.1) HDL Synthesis Report
  5) Advanced HDL Synthesis
       5.1) Advanced HDL Synthesis Report
  6) Low Level Synthesis
  7) Partition Report
  8) Design Summary
       8.1) Primitive and Black Box Usage
       8.2) Device utilization summary
       8.3) Partition Resource Summary
       8.4) Timing Report
            8.4.1) Clock Information
            8.4.2) Asynchronous Control Signals Information
            8.4.3) Timing Summary
            8.4.4) Timing Details

=========================================================================
*                      Synthesis Options Summary                        *
=========================================================================

---- Source Parameters
Input File Name                    : "msarbitery_top.prj"
Input Format                       : mixed
Ignore Synthesis Constraint File   : NO

---- Target Parameters
Output File Name                   : "msarbitery_top"
Output Format                      : NGC
Target Device                      : xc6slx100t-3-csg484

---- Source Options
Top Module Name                    : msarbitery_top
Automatic FSM Extraction           : YES
FSM Encoding Algorithm             : Auto
Safe Implementation                : No
FSM Style                          : lut
RAM Extraction                     : Yes
RAM Style                          : Auto
ROM Extraction                     : Yes
Shift Register Extraction          : YES
ROM Style                          : Auto
Resource Sharing                   : YES
Asynchronous To Synchronous        : NO
Shift Register Minimum Size        : 2
Use DSP Block                      : auto
Automatic Register Balancing       : No

---- Target Options
LUT Combining                      : auto
Reduce Control Sets                : auto
Add IO Buffers                     : YES
Global Maximum Fanout              : 100000
Add Generic Clock Buffer(BUFG)     : 16
Register Duplication               : YES
Optimize Instantiated Primitives   : NO
Use Clock Enable                   : Auto
Use Synchronous Set                : Auto
Use Synchronous Reset              : Auto
Pack IO Registers into IOBs        : auto
Equivalent register Removal        : YES

---- General Options
Optimization Goal                  : Speed
Optimization Effort                : 1
Power Reduction                    : NO
Library Search Order               : msarbitery_top.lso
Keep Hierarchy                     : NO
Netlist Hierarchy                  : as_optimized
RTL Output                         : Yes
Global Optimization                : AllClockNets
Read Cores                         : YES
Write Timing Constraints           : NO
Cross Clock Analysis               : NO
Hierarchy Separator                : /
Bus Delimiter                      : <>
Case Specifier                     : maintain
Slice Utilization Ratio            : 100
BRAM Utilization Ratio             : 100
DSP48 Utilization Ratio            : 100
Auto BRAM Packing                  : NO
Slice Utilization Ratio Delta      : 5

=========================================================================

=========================================================================
*                          HDL Parsing                                  *
=========================================================================
Parsing Verilog file "D:\ARBITERY\random.v" into library work
Parsing module <random>.
Parsing Verilog file "D:\ARBITERY\IntMemAddrGen4.v" into library work
Parsing verilog file "D:\ARBITERY\/Axi.v" included at line 30.
Parsing module <IntMemAddrGen4>.
Parsing Verilog file "D:\ARBITERY\IntMemAddrGen3.v" into library work
Parsing verilog file "D:\ARBITERY\/Axi.v" included at line 30.
Parsing module <IntMemAddrGen3>.
Parsing Verilog file "D:\ARBITERY\IntMemAddrGen2.v" into library work
Parsing verilog file "D:\ARBITERY\/Axi.v" included at line 30.
Parsing module <IntMemAddrGen2>.
Parsing Verilog file "D:\ARBITERY\IntMemAddrGen1.v" into library work
Parsing verilog file "D:\ARBITERY\/Axi.v" included at line 30.
Parsing module <IntMemAddrGen1>.
Parsing Verilog file "D:\ARBITERY\IntMemUnpackAddr4.v" into library work
Parsing verilog file "D:\ARBITERY\/Axi.v" included at line 30.
Parsing module <IntMemUnpackAddr4>.
Parsing Verilog file "D:\ARBITERY\IntMemUnpackAddr3.v" into library work
Parsing verilog file "D:\ARBITERY\/Axi.v" included at line 30.
Parsing module <IntMemUnpackAddr3>.
Parsing Verilog file "D:\ARBITERY\IntMemUnpackAddr2.v" into library work
Parsing verilog file "D:\ARBITERY\/Axi.v" included at line 31.
Parsing module <IntMemUnpackAddr2>.
Parsing Verilog file "D:\ARBITERY\IntMemUnpackAddr1.v" into library work
Parsing verilog file "D:\ARBITERY\/Axi.v" included at line 30.
Parsing module <IntMemUnpackAddr1>.
Parsing Verilog file "D:\ARBITERY\comparator.v" into library work
Parsing module <comparator>.
Parsing Verilog file "D:\ARBITERY\arbiter.v" into library work
Parsing module <arbiter>.
Parsing Verilog file "D:\ARBITERY\realtimehandler.v" into library work
Parsing module <realtimehandler>.
Parsing Verilog file "D:\ARBITERY\lotterytop.v" into library work
Parsing module <lotterytop>.
Parsing Verilog file "D:\ARBITERY\IntMemAxi4.v" into library work
Parsing module <IntMemAxi4>.
Parsing verilog file "D:\ARBITERY\/Axi.v" included at line 165.
Parsing Verilog file "D:\ARBITERY\IntMemAxi3.v" into library work
Parsing module <IntMemAxi3>.
Parsing verilog file "D:\ARBITERY\/Axi.v" included at line 170.
Parsing Verilog file "D:\ARBITERY\IntMemAxi2.v" into library work
Parsing module <IntMemAxi2>.
Parsing verilog file "D:\ARBITERY\/Axi.v" included at line 169.
Parsing Verilog file "D:\ARBITERY\IntMemAxi1.v" into library work
Parsing module <IntMemAxi1>.
Parsing verilog file "D:\ARBITERY\/Axi.v" included at line 172.
Parsing Verilog file "D:\ARBITERY\arbiterfsm.v" into library work
Parsing module <arbiterfsm>.
Parsing Verilog file "D:\ARBITERY\AXISlave.v" into library work
Parsing module <AXISlave>.
Parsing Verilog file "D:\ARBITERY\Aximaster4.v" into library work
Parsing module <Aximaster4>.
Parsing Verilog file "D:\ARBITERY\Aximaster3.v" into library work

Parsing module <Aximaster3>. Parsing Verilog file "D:\ARBITERY\Aximaster2.v" into library work Parsing module <Aximaster2>. Parsing Verilog file "D:\ARBITERY\Aximaster1.v" into library work Parsing module <Aximaster1>. Parsing Verilog file "D:\ARBITERY\arbitery_top.v" into library work Parsing module <arbitery_top>. Parsing Verilog file "D:\ARBITERY\msarbitery_top.v" into library work Parsing module <msarbitery_top>. ========================================================================= * HDL Elaboration * ========================================================================= Elaborating module <msarbitery_top>. Elaborating module <Aximaster1>. Elaborating module <IntMemAxi1>. Elaborating module <IntMemUnpackAddr1>. Elaborating module <IntMemAddrGen1>. ========================================================================= * HDL Synthesis * ========================================================================= Synthesizing Unit <msarbitery_top>. Related source file is "d:/arbitery/msarbitery_top.v". INFO:Xst:3010 - "d:/arbitery/msarbitery_top.v" line 170: Output port <BID1> of the instance <m1> is unconnected or connected to loadless signal. INFO:Xst:3010 - "d:/arbitery/msarbitery_top.v" line 170: Output port <RID1> of the instance <m1> is unconnected or connected to loadless signal. INFO:Xst:3010 - "d:/arbitery/msarbitery_top.v" line 170: Output port <WREADY1> of the instance <m1> is unconnected or connected to loadless signal. INFO:Xst:3010 - "d:/arbitery/msarbitery_top.v" line 170: Output port <RRESP1> of the instance <m1> is unconnected or connected to loadless signal. INFO:Xst:3010 - "d:/arbitery/msarbitery_top.v" line 175: Output port <BID2> of the instance <m2> is unconnected or connected to loadless signal. INFO:Xst:3010 - "d:/arbitery/msarbitery_top.v" line 175: Output port <RID2> of the instance <m2> is unconnected or connected to loadless signal. 
INFO:Xst:3010 - "d:/arbitery/msarbitery_top.v" line 175: Output port <AWREADY2> of the instance <m2> is unconnected or connected to loadless signal. INFO:Xst:3010 - "d:/arbitery/msarbitery_top.v" line 175: Output port <WREADY2> of the instance <m2> is unconnected or connected to loadless signal. INFO:Xst:3010 - "d:/arbitery/msarbitery_top.v" line 175: Output port <BVALID2> of the instance <m2> is unconnected or connected to loadless signal. INFO:Xst:3010 - "d:/arbitery/msarbitery_top.v" line 175: Output port <ARREADY2> of the instance <m2> is unconnected or connected to loadless signal. INFO:Xst:3010 - "d:/arbitery/msarbitery_top.v" line 175: Output port <RRESP2> of the instance <m2> is unconnected or connected to loadless signal. INFO:Xst:3010 - "d:/arbitery/msarbitery_top.v" line 175: Output port <RLAST2> of the instance <m2> is unconnected or connected to loadless signal. INFO:Xst:3010 - "d:/arbitery/msarbitery_top.v" line 175: Output port <RVALID2> of the instance <m2> is unconnected or connected to loadless signal. INFO:Xst:3010 - "d:/arbitery/msarbitery_top.v" line 180: Output port <BID3> of the instance <m3> is unconnected or connected to loadless signal.

INFO:Xst:3010 - "d:/arbitery/msarbitery_top.v" line 180: Output port <RID3> of the instance <m3> is unconnected or connected to loadless signal. INFO:Xst:3010 - "d:/arbitery/msarbitery_top.v" line 180: Output port <AWREADY3> of the instance <m3> is unconnected or connected to loadless signal. INFO:Xst:3010 - "d:/arbitery/msarbitery_top.v" line 180: Output port <WREADY3> of the instance <m3> is unconnected or connected to loadless signal. INFO:Xst:3010 - "d:/arbitery/msarbitery_top.v" line 180: Output port <BVALID3> of the instance <m3> is unconnected or connected to loadless signal. INFO:Xst:3010 - "d:/arbitery/msarbitery_top.v" line 180: Output port <ARREADY3> of the instance <m3> is unconnected or connected to loadless signal. INFO:Xst:3010 - "d:/arbitery/msarbitery_top.v" line 180: Output port <RRESP3> of the instance <m3> is unconnected or connected to loadless signal. INFO:Xst:3010 - "d:/arbitery/msarbitery_top.v" line 180: Output port <RLAST3> of the instance <m3> is unconnected or connected to loadless signal. INFO:Xst:3010 - "d:/arbitery/msarbitery_top.v" line 180: Output port <RVALID3> of the instance <m3> is unconnected or connected to loadless signal. INFO:Xst:3010 - "d:/arbitery/msarbitery_top.v" line 185: Output port <BID4> of the instance <m4> is unconnected or connected to loadless signal. INFO:Xst:3010 - "d:/arbitery/msarbitery_top.v" line 185: Output port <RID4> of the instance <m4> is unconnected or connected to loadless signal. INFO:Xst:3010 - "d:/arbitery/msarbitery_top.v" line 185: Output port <AWREADY4> of the instance <m4> is unconnected or connected to loadless signal. INFO:Xst:3010 - "d:/arbitery/msarbitery_top.v" line 185: Output port <WREADY4> of the instance <m4> is unconnected or connected to loadless signal. INFO:Xst:3010 - "d:/arbitery/msarbitery_top.v" line 185: Output port <BVALID4> of the instance <m4> is unconnected or connected to loadless signal. 
INFO:Xst:3010 - "d:/arbitery/msarbitery_top.v" line 185: Output port <ARREADY4> of the instance <m4> is unconnected or connected to loadless signal. INFO:Xst:3010 - "d:/arbitery/msarbitery_top.v" line 185: Output port <RRESP4> of the instance <m4> is unconnected or connected to loadless signal. INFO:Xst:3010 - "d:/arbitery/msarbitery_top.v" line 185: Output port <RLAST4> of the instance <m4> is unconnected or connected to loadless signal. INFO:Xst:3010 - "d:/arbitery/msarbitery_top.v" line 185: Output port <RVALID4> of the instance <m4> is unconnected or connected to loadless signal. INFO:Xst:3010 - "d:/arbitery/msarbitery_top.v" line 195: Output port <RDATA> of the instance <Slave> is unconnected or connected to loadless signal. Summary: no macro. Unit <msarbitery_top> synthesized. Synthesizing Unit <Aximaster1>. Related source file is "d:/arbitery/aximaster1.v". INFO:Xst:3010 - "d:/arbitery/aximaster1.v" line 158: Output port <MEMWEn1> of the instance <uIntMemAxi1> is unconnected or connected to loadless signal. INFO:Xst:3010 - "d:/arbitery/aximaster1.v" line 158: Output port <MEMADDR1> of the instance <uIntMemAxi1> is unconnected or connected to loadless signal. INFO:Xst:3010 - "d:/arbitery/aximaster1.v" line 158: Output port <MEMCEn1> of the instance <uIntMemAxi1> is unconnected or connected to loadless signal.

Summary: no macro. Unit <Aximaster1> synthesized. Synthesizing Unit <IntMemAxi1>. Related source file is "d:/arbitery/intmemaxi1.v". DATA_WIDTH1 = 64 ID_WIDTH1 = 7 NUM_RD_WS1 = 1'b0 IS_ROM1 = 1'b0 STRB_WIDTH1 = 8 STRB_MAX1 = 7 ID_MAX1 = 6 Found 1-bit register for signal <WsCount1>. Found 1-bit register for signal <iWREADY1>. Found 1-bit register for signal <iBVALID1>. Found 1-bit register for signal <iRVALID1>. Found 1-bit register for signal <RvalidEnReg1>. Found 1-bit register for signal <iRLAST1>. Found 1-bit register for signal <WriteSel1>. Found 7-bit register for signal <BID1>. Found 7-bit register for signal <AridReg1>. Found 7-bit register for signal <iRID1>. Found 2-bit subtractor for signal <GND_3_o_GND_3_o_sub_2_OUT>. Found 1-bit 4-to-1 multiplexer for signal <WriteSelNxt1> created at line 275. Summary: inferred 1 Adder/Subtractor(s). inferred 27 D-type flip-flop(s). inferred 7 Multiplexer(s). Unit <IntMemAxi1> synthesized. Synthesizing Unit <IntMemUnpackAddr1>. Related source file is "d:/arbitery/intmemunpackaddr1.v". Found 1-bit register for signal <iAddrLast1>. Found 1-bit register for signal <iAddrValid1>. Found 2-bit register for signal <BurstReg1>. Found 4-bit register for signal <LenReg1>. Found 4-bit register for signal <AddrCount1>. Found 32-bit register for signal <iAddrOut1>. Found 3-bit register for signal <SizeReg1>. Found 4-bit subtractor for signal <AddrCount1[3]_GND_4_o_sub_10_OUT>. Summary: inferred 1 Adder/Subtractor(s). inferred 47 D-type flip-flop(s). inferred 15 Multiplexer(s). Unit <IntMemUnpackAddr1> synthesized. Synthesizing Unit <IntMemAddrGen1>. Related source file is "d:/arbitery/intmemaddrgen1.v". Found 12-bit adder for signal <IncrAddr1> created at line 95. Found 12-bit 4-to-1 multiplexer for signal <OffsetAddr1> created at line 63. Found 12-bit 4-to-1 multiplexer for signal <CalcAddr1> created at line 67. Summary: inferred 1 Adder/Subtractor(s). inferred 4 Multiplexer(s). Unit <IntMemAddrGen1> synthesized. 
Synthesizing Unit <Aximaster2>.

Related source file is "d:/arbitery/aximaster2.v". INFO:Xst:3010 - "d:/arbitery/aximaster2.v" line 172: Output port <MEMWEn2> of the instance <uIntMemAxi2> is unconnected or connected to loadless signal. INFO:Xst:3010 - "d:/arbitery/aximaster2.v" line 172: Output port <MEMADDR2> of the instance <uIntMemAxi2> is unconnected or connected to loadless signal. INFO:Xst:3010 - "d:/arbitery/aximaster2.v" line 172: Output port <MEMCEn2> of the instance <uIntMemAxi2> is unconnected or connected to loadless signal. Summary: no macro. Unit <Aximaster2> synthesized. Synthesizing Unit <IntMemAxi2>. Related source file is "d:/arbitery/intmemaxi2.v". DATA_WIDTH2 = 64 ID_WIDTH2 = 7 NUM_RD_WS2 = 1'b0 IS_ROM2 = 1'b0 STRB_WIDTH2 = 8 STRB_MAX2 = 7 ID_MAX2 = 6 Found 1-bit register for signal <WsCount2>. Found 1-bit register for signal <iWREADY2>. Found 1-bit register for signal <BvalidNxt2>. Found 1-bit register for signal <iRVALID2>. Found 1-bit register for signal <RvalidEnReg2>. Found 1-bit register for signal <iRLAST2>. Found 1-bit register for signal <WriteSel2>. Found 7-bit register for signal <BID2>. Found 7-bit register for signal <AridReg2>. Found 7-bit register for signal <iRID2>. Found 2-bit subtractor for signal <GND_8_o_GND_8_o_sub_2_OUT>. Found 1-bit 4-to-1 multiplexer for signal <WriteSelNxt2> created at line 272. Summary: inferred 1 Adder/Subtractor(s). inferred 21 D-type flip-flop(s). inferred 5 Multiplexer(s). Unit <IntMemAxi2> synthesized. Synthesizing Unit <IntMemUnpackAddr2>. Related source file is "d:/arbitery/intmemunpackaddr2.v". Found 1-bit register for signal <iAddrLast2>. Found 1-bit register for signal <iAddrValid2>. Found 2-bit register for signal <BurstReg2>. Found 4-bit register for signal <LenReg2>. Found 4-bit register for signal <AddrCount2>. Found 32-bit register for signal <iAddrOut2>. Found 3-bit register for signal <SizeReg2>. Found 4-bit subtractor for signal <AddrCount2[3]_GND_9_o_sub_10_OUT>. Summary: inferred 1 Adder/Subtractor(s). 
inferred 47 D-type flip-flop(s). inferred 15 Multiplexer(s). Unit <IntMemUnpackAddr2> synthesized. Synthesizing Unit <IntMemAddrGen2>. Related source file is "d:/arbitery/intmemaddrgen2.v". Found 12-bit adder for signal <IncrAddr2> created at line 95. Found 12-bit 4-to-1 multiplexer for signal <OffsetAddr2> created at line 63.

Found 12-bit 4-to-1 multiplexer for signal <CalcAddr2> created at line 67. Summary: inferred 1 Adder/Subtractor(s). inferred 4 Multiplexer(s). Unit <IntMemAddrGen2> synthesized. Synthesizing Unit <Aximaster3>. Related source file is "d:/arbitery/aximaster3.v". INFO:Xst:3010 - "d:/arbitery/aximaster3.v" line 176: Output port <MEMWEn3> of the instance <uIntMemAxi3> is unconnected or connected to loadless signal. INFO:Xst:3010 - "d:/arbitery/aximaster3.v" line 176: Output port <MEMADDR3> of the instance <uIntMemAxi3> is unconnected or connected to loadless signal. INFO:Xst:3010 - "d:/arbitery/aximaster3.v" line 176: Output port <MEMCEn3> of the instance <uIntMemAxi3> is unconnected or connected to loadless signal. Summary: no macro. Unit <Aximaster3> synthesized. Synthesizing Unit <IntMemAxi3>. Related source file is "d:/arbitery/intmemaxi3.v". DATA_WIDTH3 = 64 ID_WIDTH3 = 7 NUM_RD_WS3 = 1'b0 IS_ROM3 = 1'b0 STRB_WIDTH3 = 8 STRB_MAX3 = 7 ID_MAX3 = 6 Found 1-bit register for signal <WsCount3>. Found 1-bit register for signal <iWREADY3>. Found 1-bit register for signal <BvalidNxt3>. Found 1-bit register for signal <iRVALID3>. Found 1-bit register for signal <RvalidEnReg3>. Found 1-bit register for signal <iRLAST3>. Found 1-bit register for signal <WriteSel3>. Found 7-bit register for signal <BID3>. Found 7-bit register for signal <AridReg3>. Found 7-bit register for signal <iRID3>. Found 2-bit subtractor for signal <GND_13_o_GND_13_o_sub_2_OUT>. Found 1-bit 4-to-1 multiplexer for signal <WriteSelNxt3> created at line 271. Summary: inferred 1 Adder/Subtractor(s). inferred 21 D-type flip-flop(s). inferred 5 Multiplexer(s). Unit <IntMemAxi3> synthesized. Synthesizing Unit <IntMemUnpackAddr3>. Related source file is "d:/arbitery/intmemunpackaddr3.v". Found 1-bit register for signal <iAddrLast3>. Found 1-bit register for signal <iAddrValid3>. Found 2-bit register for signal <BurstReg3>. Found 4-bit register for signal <LenReg3>. Found 4-bit register for signal <AddrCount3>. 
Found 32-bit register for signal <iAddrOut3>. Found 3-bit register for signal <SizeReg3>. Found 4-bit subtractor for signal <AddrCount3[3]_GND_14_o_sub_10_OUT>. Summary: inferred 1 Adder/Subtractor(s). inferred 47 D-type flip-flop(s).

inferred 15 Multiplexer(s). Unit <IntMemUnpackAddr3> synthesized. Synthesizing Unit <IntMemAddrGen3>. Related source file is "d:/arbitery/intmemaddrgen3.v". Found 12-bit adder for signal <IncrAddr3> created at line 95. Found 12-bit 4-to-1 multiplexer for signal <OffsetAddr3> created at line 63. Found 12-bit 4-to-1 multiplexer for signal <CalcAddr3> created at line 67. Summary: inferred 1 Adder/Subtractor(s). inferred 4 Multiplexer(s). Unit <IntMemAddrGen3> synthesized. Synthesizing Unit <Aximaster4>. Related source file is "d:/arbitery/aximaster4.v". INFO:Xst:3010 - "d:/arbitery/aximaster4.v" line 160: Output port <MEMWEn4> of the instance <uIntMemAxi4> is unconnected or connected to loadless signal. INFO:Xst:3010 - "d:/arbitery/aximaster4.v" line 160: Output port <MEMADDR4> of the instance <uIntMemAxi4> is unconnected or connected to loadless signal. INFO:Xst:3010 - "d:/arbitery/aximaster4.v" line 160: Output port <MEMCEn4> of the instance <uIntMemAxi4> is unconnected or connected to loadless signal. Summary: no macro. Unit <Aximaster4> synthesized. Synthesizing Unit <IntMemAxi4>. Related source file is "d:/arbitery/intmemaxi4.v". DATA_WIDTH4 = 64 ID_WIDTH4 = 7 NUM_RD_WS4 = 1'b0 IS_ROM4 = 1'b0 STRB_WIDTH4 = 8 STRB_MAX4 = 7 ID_MAX4 = 6 Found 1-bit register for signal <iWREADY4>. Found 1-bit register for signal <BvalidNxt4>. Found 1-bit register for signal <iRVALID4>. Found 1-bit register for signal <RvalidEnReg4>. Found 1-bit register for signal <iRLAST4>. Found 1-bit register for signal <WriteSel4>. Found 32-bit register for signal <WsCount4>. Found 7-bit register for signal <BID4>. Found 7-bit register for signal <AridReg4>. Found 7-bit register for signal <iRID4>. Found 32-bit subtractor for signal <WsCount4[31]_GND_18_o_sub_2_OUT>. Found 1-bit 4-to-1 multiplexer for signal <WriteSelNxt4> created at line 268. Summary: inferred 1 Adder/Subtractor(s). inferred 52 D-type flip-flop(s). inferred 5 Multiplexer(s). Unit <IntMemAxi4> synthesized. 
Synthesizing Unit <IntMemUnpackAddr4>. Related source file is "d:/arbitery/intmemunpackaddr4.v". Found 1-bit register for signal <iAddrLast4>. Found 1-bit register for signal <iAddrValid4>. Found 2-bit register for signal <BurstReg4>. Found 4-bit register for signal <LenReg4>.

    Found 4-bit register for signal <AddrCount4>.
    Found 32-bit register for signal <iAddrOut4>.
    Found 3-bit register for signal <SizeReg4>.
    Found 4-bit subtractor for signal <AddrCount4[3]_GND_19_o_sub_10_OUT>.
    Summary:
        inferred 1 Adder/Subtractor(s).
        inferred 47 D-type flip-flop(s).
        inferred 15 Multiplexer(s).
Unit <IntMemUnpackAddr4> synthesized.

Synthesizing Unit <IntMemAddrGen4>.
    Related source file is "d:/arbitery/intmemaddrgen4.v".
    Found 12-bit adder for signal <IncrAddr4> created at line 95.
    Found 12-bit 4-to-1 multiplexer for signal <OffsetAddr4> created at line 63.
    Found 12-bit 4-to-1 multiplexer for signal <CalcAddr4> created at line 67.
    Summary:
        inferred 1 Adder/Subtractor(s).
        inferred 4 Multiplexer(s).
Unit <IntMemAddrGen4> synthesized.

Synthesizing Unit <arbitery_top>.
    Related source file is "d:/arbitery/arbitery_top.v".
    Found 4-bit register for signal <grant1>.
    Summary:
        inferred 4 D-type flip-flop(s).
Unit <arbitery_top> synthesized.

Synthesizing Unit <arbiterfsm>.
    Related source file is "d:/arbitery/arbiterfsm.v".
        s0 = 2'b01
        s1 = 2'b10
        s2 = 2'b11
    Summary: no macro.
Unit <arbiterfsm> synthesized.

Synthesizing Unit <lotterytop>.
    Related source file is "d:/arbitery/lotterytop.v".
    Summary: no macro.
Unit <lotterytop> synthesized.

Synthesizing Unit <arbiter>.
    Related source file is "d:/arbitery/arbiter.v".
INFO:Xst:3010 - "d:/arbitery/arbiter.v" line 11: Output port <rand> of the instance <randomgen> is unconnected or connected to loadless signal.
    Found 5-bit register for signal <sum2>.
    Found 5-bit register for signal <sum3>.
    Found 5-bit register for signal <sum4>.
    Found 5-bit register for signal <sum1>.
    Found 5-bit adder for signal <n0045> created at line 24.
    Found 5-bit adder for signal <n0048> created at line 25.
    Found 5-bit adder for signal <BUS_0001_GND_25_o_add_7_OUT> created at line 26.
    Found 1x3-bit multiplier for signal <r1_t1[2]_MuLt_1_OUT> created at line 23.
    Found 1x3-bit multiplier for signal <r2_t2[2]_MuLt_2_OUT> created at line 24.
    Found 1x3-bit multiplier for signal <r3_t3[2]_MuLt_4_OUT> created at line 25.
    Found 1x3-bit multiplier for signal <r4_t4[2]_MuLt_6_OUT> created at line 26.
    Summary:
        inferred 4 Multiplier(s).
        inferred 3 Adder/Subtractor(s).
        inferred 20 D-type flip-flop(s).
Unit <arbiter> synthesized.

Synthesizing Unit <random>.
    Related source file is "d:/arbitery/random.v".
    Found 5-bit register for signal <rand>.
    Summary:
        inferred 5 D-type flip-flop(s).
Unit <random> synthesized.

Synthesizing Unit <comparator>.
    Related source file is "d:/arbitery/comparator.v".
    Found 4-bit register for signal <grant>.
    Found 5-bit comparator greater for signal <rand[4]_sum1[4]_LessThan_2_o> created at line 9
    Found 5-bit comparator greater for signal <sum1[4]_rand[4]_LessThan_3_o> created at line 11
    Found 5-bit comparator greater for signal <rand[4]_sum2[4]_LessThan_4_o> created at line 11
    Found 5-bit comparator greater for signal <sum2[4]_rand[4]_LessThan_5_o> created at line 13
    Found 5-bit comparator greater for signal <rand[4]_sum3[4]_LessThan_6_o> created at line 13
    Summary:
        inferred 4 D-type flip-flop(s).
        inferred 5 Comparator(s).
        inferred 1 Multiplexer(s).
Unit <comparator> synthesized.

Synthesizing Unit <realtimehandler>.
    Related source file is "d:/arbitery/realtimehandler.v".
    Found 4-bit register for signal <count2>.
    Found 4-bit register for signal <count3>.
    Found 4-bit register for signal <count4>.
    Found 4-bit register for signal <grant>.
    Found 1-bit register for signal <dissable>.
    Found 4-bit register for signal <count1>.
    Found 4-bit subtractor for signal <GND_29_o_GND_29_o_sub_27_OUT<3:0>>.
    Found 4-bit subtractor for signal <GND_29_o_GND_29_o_sub_28_OUT<3:0>>.
    Found 4-bit subtractor for signal <GND_29_o_GND_29_o_sub_29_OUT<3:0>>.
    Found 4-bit subtractor for signal <GND_29_o_GND_29_o_sub_30_OUT<3:0>>.
    Summary:
        inferred 4 Adder/Subtractor(s).
        inferred 21 D-type flip-flop(s).
        inferred 3 Multiplexer(s).
Unit <realtimehandler> synthesized.

Synthesizing Unit <AXISlave>.
    Related source file is "d:/arbitery/axislave.v".
        ID_MAX = 6
        RUSER_MAX = 31
        BUSER_MAX = 31
        DATA_MAX = 7
        STRB_MAX = 7
        MEM_SIZE = 63
        S_WAIT_REQ = 4'b0001
        S_PRE_DATA = 4'b1100
        S_SEND_DATA = 4'b0010
        S_RECV_DATA = 4'b0100
        S_SEND_ACK = 4'b1000
    Found 64x64-bit single-port RAM <Mram_mem> for signal <mem>.
    Found 1-bit register for signal <RVALID>.
    Found 1-bit register for signal <WREADY>.
    Found 1-bit register for signal <BVALID>.
    Found 1-bit register for signal <RLAST>.
    Found 1-bit register for signal <ARREADY>.
    Found 1-bit register for signal <AWREADY>.
    Found 4-bit register for signal <len>.
    Found 32-bit register for signal <org_addr>.
    Found 4-bit register for signal <state>.
    Found finite state machine <FSM_0> for signal <state>.
    -----------------------------------------------------------------------
    | States             | 5                                              |
    | Transitions        | 12                                             |
    | Inputs             | 8                                              |
    | Outputs            | 5                                              |
    | Clock              | ACLK (rising_edge)                             |
    | Reset              | ARESETn (negative)                             |
    | Reset type         | asynchronous                                   |
    | Reset State        | 0001                                           |
    | Encoding           | auto                                           |
    | Implementation     | LUT                                            |
    -----------------------------------------------------------------------
    Found 4-bit subtractor for signal <len[3]_GND_30_o_sub_29_OUT>.
    Found 32-bit adder for signal <org_addr[31]_GND_30_o_add_27_OUT> created at line 235.
    Found 32-bit shifter logical right for signal <ARADDR[31]_ARSIZE[2]_shift_right_19_OUT> created at line 207
    Found 32-bit shifter logical left for signal <ARADDR[31]_ARSIZE[2]_shift_left_20_OUT> created at line 207
    Found 32-bit shifter logical right for signal <AWADDR[31]_AWSIZE[2]_shift_right_21_OUT> created at line 212
    Found 32-bit shifter logical left for signal <AWADDR[31]_AWSIZE[2]_shift_left_22_OUT> created at line 212
    Found 32-bit shifter logical left for signal <n0149> created at line 235
    Summary:
        inferred 1 RAM(s).
        inferred 2 Adder/Subtractor(s).
        inferred 42 D-type flip-flop(s).
        inferred 25 Multiplexer(s).
        inferred 5 Combinational logic shifter(s).
        inferred 1 Finite State Machine(s).
Unit <AXISlave> synthesized.
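The macros reported for the arbiter, random, and comparator units (1x3-bit request-by-ticket multipliers, 5-bit running sums sum1 to sum4, a 5-bit rand register, and comparators between rand and the sums) are consistent with the usual lottery selection scheme: accumulate the tickets of the requesting masters and grant the master whose partial sum first exceeds a random draw. The following is a behavioral sketch of that scheme only, not the synthesized Verilog. The ticket values, the reduction of the draw modulo the ticket total, and the 5-bit LFSR used as the random source are all assumptions of this example.

```python
from collections import Counter


def lfsr5_step(state):
    """Advance an assumed 5-bit Fibonacci LFSR by one step (nonzero seed)."""
    feedback = ((state >> 4) ^ (state >> 2)) & 1
    return ((state << 1) | feedback) & 0x1F


def lottery_grant(requests, tickets, rand):
    """Return the index of the granted master, or None if nobody requests.

    requests -- sequence of 0/1 request flags, one per master
    tickets  -- sequence of ticket counts, one per master (hypothetical values)
    rand     -- non-negative random number for this arbitration round
    """
    partial_sums = []
    running = 0
    for req, ticket in zip(requests, tickets):
        running += ticket if req else 0   # r_i * t_i, as in the multipliers
        partial_sums.append(running)      # sum1, sum2, ... running totals
    if running == 0:
        return None
    draw = rand % running                 # assumption: draw limited to total
    for master, bound in enumerate(partial_sums):
        if draw < bound:                  # comparator: draw < sum_i
            return master
    return None                           # not reached when running > 0


def simulate(requests, tickets, cycles, seed=1):
    """Count grants per master over a number of arbitration rounds."""
    wins = Counter()
    state = seed
    for _ in range(cycles):
        winner = lottery_grant(requests, tickets, state)
        if winner is not None:
            wins[winner] += 1
        state = lfsr5_step(state)
    return wins


if __name__ == "__main__":
    # Four masters, all requesting, with hypothetical tickets 1:2:3:4.
    wins = simulate([1, 1, 1, 1], [1, 2, 3, 4], cycles=31)
    for master in range(4):
        print(f"master {master + 1}: {wins[master]} grants")
```

Because a master that is not requesting contributes zero tickets, its partial sum equals that of the previous master and it can never win a round, which is how the scheme restricts the lottery to the masters that are actually contending for the bus.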
========================================================================= HDL Synthesis Report Macro Statistics # RAMs : 1 64x64-bit single-port RAM : 1 # Multipliers : 4 3x1-bit multiplier : 4 # Adders/Subtractors : 29 12-bit adder : 8 2-bit subtractor : 3

32-bit adder : 1 32-bit subtractor : 1 4-bit subtractor : 13 5-bit adder : 3 # Registers : 113 1-bit register : 49 2-bit register : 8 3-bit register : 8 32-bit register : 10 4-bit register : 24 5-bit register : 5 7-bit register : 9 # Comparators : 5 5-bit comparator greater : 5 # Multiplexers : 203 1-bit 2-to-1 multiplexer : 130 1-bit 4-to-1 multiplexer : 4 12-bit 2-to-1 multiplexer : 16 12-bit 4-to-1 multiplexer : 16 32-bit 2-to-1 multiplexer : 17 4-bit 2-to-1 multiplexer : 20 # Logic shifters : 5 32-bit shifter logical left : 3 32-bit shifter logical right : 2 # FSMs : 1 ========================================================================= ========================================================================= * Advanced HDL Synthesis * ========================================================================= Synthesizing (advanced) Unit <AXISlave>. INFO:Xst:3031 - HDL ADVISOR - The RAM <Mram_mem> will be implemented on LUTs either because you have described an asynchronous read or because of currently unsupported block RAM features. If you have described an asynchronous read, making it synchronous would allow you to take advantage of available block RAM resources, for optimized device usage and improved timings. Please refer to your documentation for coding guidelines. ----------------------------------------------------------------------| ram_type | Distributed | | ----------------------------------------------------------------------| Port A | | aspect ratio | 64-word x 64-bit | | | clkA | connected to signal <ACLK> | rise | | weA | connected to internal node | high | | addrA | connected to signal <org_addr<8:3>> | | | diA | connected to signal <(addr[5]_mask[63]_and_7_OUT<63:8>,_n0212)> | | | doA | connected to signal <RDATA> | | ----------------------------------------------------------------------Unit <AXISlave> synthesized (advanced). Synthesizing (advanced) Unit <arbiter>. 
    Multiplier <Mmult_r4_t4[2]_MuLt_6_OUT> in block <arbiter> and adder/subtractor <Madd_BUS_0001_GND_25_o_add_7_OUT> in block <arbiter> are combined into a MAC<Maddsub_r4_t4[2]_MuLt_6_OUT>.
    The following registers are also absorbed by the MAC: <sum4> in block <arbiter>.
    Multiplier <Mmult_r3_t3[2]_MuLt_4_OUT> in block <arbiter> and adder/subtractor <Madd_n0048> in block <arbiter> are combined into a MAC<Maddsub_r3_t3[2]_MuLt_4_OUT>.
    Multiplier <Mmult_r2_t2[2]_MuLt_2_OUT> in block <arbiter> and adder/subtractor <Madd_n0045> in block <arbiter> are combined into a MAC<Maddsub_r2_t2[2]_MuLt_2_OUT>.
Unit <arbiter> synthesized (advanced).

Synthesizing (advanced) Unit <realtimehandler>.
The following registers are absorbed into counter <count2>: 1 register on signal <count2>.
The following registers are absorbed into counter <count3>: 1 register on signal <count3>.
The following registers are absorbed into counter <count1>: 1 register on signal <count1>.
The following registers are absorbed into counter <count4>: 1 register on signal <count4>.
Unit <realtimehandler> synthesized (advanced).

=========================================================================
Advanced HDL Synthesis Report

Macro Statistics
# RAMs                                        : 1
 64x64-bit single-port distributed RAM        : 1
# MACs                                        : 3
 3x1-to-5-bit MAC                             : 3
# Multipliers                                 : 1
 3x1-bit multiplier                           : 1
# Adders/Subtractors                          : 22
 1-bit subtractor                             : 3
 12-bit adder                                 : 8
 32-bit adder                                 : 1
 32-bit subtractor                            : 1
 4-bit subtractor                             : 9
# Counters                                    : 4
 4-bit down counter                           : 4
# Registers                                   : 571
 Flip-Flops                                   : 571
# Comparators                                 : 5
 5-bit comparator greater                     : 5
# Multiplexers                                : 474
 1-bit 2-to-1 multiplexer                     : 418
 1-bit 4-to-1 multiplexer                     : 4
 12-bit 2-to-1 multiplexer                    : 16
 12-bit 4-to-1 multiplexer                    : 16
 32-bit 2-to-1 multiplexer                    : 9
 4-bit 2-to-1 multiplexer                     : 11
# Logic shifters                              : 5
 32-bit shifter logical left                  : 3
 32-bit shifter logical right                 : 2
# FSMs                                        : 1
=========================================================================

=========================================================================
*                         Low Level Synthesis                           *
=========================================================================

Analyzing FSM <MFsm> for best encoding.
Optimizing FSM <Slave/state> on signal <state[1:3]> with gray encoding.
-------------------
 State | Encoding
-------------------
 0001  | 000
 1100  | 001
 0010  | 011
 0100  | 010
 1000  | 110
-------------------

Optimizing unit <msarbitery_top> ...
Optimizing unit <IntMemAxi1> ...
Optimizing unit <IntMemUnpackAddr1> ...
Optimizing unit <IntMemAddrGen1> ...
Optimizing unit <AXISlave> ...
Optimizing unit <IntMemAxi2> ...
Optimizing unit <IntMemUnpackAddr2> ...
Optimizing unit <IntMemAddrGen2> ...
Optimizing unit <IntMemAxi3> ...
Optimizing unit <IntMemUnpackAddr3> ...
Optimizing unit <IntMemAddrGen3> ...
Optimizing unit <IntMemAxi4> ...
Optimizing unit <IntMemUnpackAddr4> ...
Optimizing unit <IntMemAddrGen4> ...
Optimizing unit <arbitery_top> ...
Optimizing unit <arbiter> ...
Optimizing unit <realtimehandler> ...

Final Macro Processing ...

=========================================================================
Final Register Report

Macro Statistics
# Registers                            : 43
 Flip-Flops                            : 43
=========================================================================

=========================================================================
*                           Partition Report                            *
=========================================================================

Partition Implementation Status
-------------------------------
  No Partitions were found in this design.
-------------------------------

=========================================================================
*                            Design Summary                             *
=========================================================================

Top Level Output File Name         : msarbitery_top.ngc

Primitive and Black Box Usage:
------------------------------
# BELS                             : 54
#      GND                         : 1
#      INV                         : 5
#      LUT2                        : 11
#      LUT3                        : 7
#      LUT4                        : 14
#      LUT5                        : 5
#      LUT6                        : 10
#      VCC                         : 1
# FlipFlops/Latches                : 43
#      FD                          : 7
#      FDC                         : 14
#      FDE                         : 16
#      FDP                         : 1
#      FDR                         : 4
#      FDS                         : 1
# Clock Buffers                    : 1
#      BUFG                        : 1
# IO Buffers                       : 86
#      IBUF                        : 14
#      OBUF                        : 72

Device utilization summary:
---------------------------
Selected Device : 6slx100tcsg484-3

Slice Logic Utilization:
 Number of Slice Registers:              43  out of  126576     0%
 Number of Slice LUTs:                   52  out of   63288     0%
    Number used as Logic:                52  out of   63288     0%

Slice Logic Distribution:
 Number of LUT Flip Flop pairs used:     53
   Number with an unused Flip Flop:      10  out of      53    18%
   Number with an unused LUT:             1  out of      53     1%
   Number of fully used LUT-FF pairs:    42  out of      53    79%
   Number of unique control sets:         8

IO Utilization:
 Number of IOs:                         490
 Number of bonded IOBs:                  86  out of     296    29%

Specific Feature Utilization:
 Number of BUFG/BUFGCTRLs:                1  out of      16     6%

---------------------------
Partition Resource Summary:
---------------------------
  No Partitions were found in this design.
---------------------------

=========================================================================
Timing Report

NOTE: THESE TIMING NUMBERS ARE ONLY A SYNTHESIS ESTIMATE. FOR ACCURATE TIMING INFORMATION PLEASE REFER TO THE TRACE REPORT GENERATED AFTER PLACE-and-ROUTE.

Clock Information:
------------------
-------------------------------------------------------------------+-------------------------+-------+
Clock Signal                                                       | Clock buffer(FF name)   | Load  |
-------------------------------------------------------------------+-------------------------+-------+
ACLK                                                               | IBUF+BUFG               | 40    |
a1/l2/a1/clk_dissable_AND_197_o(a1/l2/a1/clk_dissable_AND_197_o1:O)| NONE(*)(a1/l2/a1/sum1_2)| 3     |
-------------------------------------------------------------------+-------------------------+-------+
(*) This 1 clock signal(s) are generated by combinatorial logic, and XST is not able to identify which are the primary clock signals. Please use the CLOCK_SIGNAL constraint to specify the clock signal(s) generated by combinatorial logic.
INFO:Xst:2169 - HDL ADVISOR - Some clock signals were not automatically buffered by XST with BUFG/BUFR resources. Please use the buffer_type constraint in order to insert these buffers to the clock signals to help prevent skew problems.

Asynchronous Control Signals Information:
----------------------------------------
No asynchronous control signals found in this design

Timing Summary:
---------------
Speed Grade: -3

   Minimum period: 2.997ns (Maximum Frequency: 333.695MHz)
   Minimum input arrival time before clock: 3.570ns
   Maximum output required time after clock: 3.856ns
   Maximum combinational path delay: No path found

Timing Details:
---------------
All values displayed in nanoseconds (ns)

=========================================================================
Timing constraint: Default period analysis for Clock 'ACLK'
  Clock period: 2.997ns (frequency: 333.695MHz)
  Total number of paths / destination ports: 169 / 38
-------------------------------------------------------------------------
Delay:               2.997ns (Levels of Logic = 2)
  Source:            m1/uIntMemAxi1/WsCount1 (FF)
  Destination:       m1/uIntMemAxi1/uArIntMemUnpackAddr1/AddrCount1_3 (FF)
  Source Clock:      ACLK rising
  Destination Clock: ACLK rising

  Data Path: m1/uIntMemAxi1/WsCount1 to m1/uIntMemAxi1/uArIntMemUnpackAddr1/AddrCount1_3
                                Gate     Net
    Cell:in->out      fanout   Delay   Delay  Logical Name (Net Name)
    ----------------------------------------  ------------
     FDC:C->Q              5   0.525   1.145  m1/uIntMemAxi1/WsCount1 (m1/uIntMemAxi1/WsCount1)
     LUT5:I0->O            6   0.254   0.745  m1/uIntMemAxi1/WriteSel1_iRVALID1_AND_48_o11 (m1/uIntMemAxi1/MemArReady1)
     LUT5:I4->O            1   0.254   0.000  m1/uIntMemAxi1/uArIntMemUnpackAddr1/Mmux_AddrCountNxt111 (m1/uIntMemAxi1/uArIntMemUnpackAddr1/AddrCountNxt1<0>)
     FDC:D                     0.074          m1/uIntMemAxi1/uArIntMemUnpackAddr1/AddrCount1_0
    ----------------------------------------
    Total                      2.997ns (1.107ns logic, 1.890ns route)
                                       (36.9% logic, 63.1% route)

=========================================================================
Timing constraint: Default OFFSET IN BEFORE for Clock 'ACLK'
  Total number of paths / destination ports: 78 / 39
-------------------------------------------------------------------------
Offset:              3.570ns (Levels of Logic = 2)
  Source:            ARESETn (PAD)
  Destination:       m1/uIntMemAxi1/WsCount1 (FF)
  Destination Clock: ACLK rising

  Data Path: ARESETn to m1/uIntMemAxi1/WsCount1
                                Gate     Net
    Cell:in->out      fanout   Delay   Delay  Logical Name (Net Name)
    ----------------------------------------  ------------
     IBUF:I->O             1   1.228   0.579  ARESETn_IBUF (ARESETn_IBUF)
     INV:I->O             18   0.255   1.049  m1/uIntMemAxi1/ARESETn_inv1_INV_0 (Slave/ARESETn_inv)
     FDC:CLR                   0.459          Slave/WREADY
    ----------------------------------------
    Total                      3.570ns (1.942ns logic, 1.628ns route)
                                       (54.4% logic, 45.6% route)

=========================================================================
Timing constraint: Default OFFSET IN BEFORE for Clock 'a1/l2/a1/clk_dissable_AND_197_o'
  Total number of paths / destination ports: 9 / 6
-------------------------------------------------------------------------
Offset:              3.570ns (Levels of Logic = 2)
  Source:            ARESETn (PAD)
  Destination:       a1/l2/a1/sum1_2 (FF)
  Destination Clock: a1/l2/a1/clk_dissable_AND_197_o rising

  Data Path: ARESETn to a1/l2/a1/sum1_2
                                Gate     Net
    Cell:in->out      fanout   Delay   Delay  Logical Name (Net Name)
    ----------------------------------------  ------------
     IBUF:I->O             1   1.228   0.579  ARESETn_IBUF (ARESETn_IBUF)
     INV:I->O             18   0.255   1.049  m1/uIntMemAxi1/ARESETn_inv1_INV_0 (Slave/ARESETn_inv)
     FDR:R                     0.459          a1/l2/a1/sum1_0
    ----------------------------------------
    Total                      3.570ns (1.942ns logic, 1.628ns route)
                                       (54.4% logic, 45.6% route)

=========================================================================
Timing constraint: Default OFFSET OUT AFTER for Clock 'ACLK'
  Total number of paths / destination ports: 3 / 3
-------------------------------------------------------------------------
Offset:              3.856ns (Levels of Logic = 1)
  Source:            Slave/WREADY (FF)
  Destination:       WREADY (PAD)
  Source Clock:      ACLK rising

  Data Path: Slave/WREADY to WREADY
                                Gate     Net
    Cell:in->out      fanout   Delay   Delay  Logical Name (Net Name)
    ----------------------------------------  ------------
     FDC:C->Q              2   0.525   0.616  Slave/WREADY (Slave/WREADY)
     OBUF:I->O                 2.715          WREADY_OBUF (WREADY)
    ----------------------------------------
    Total                      3.856ns (3.240ns logic, 0.616ns route)
                                       (84.0% logic, 16.0% route)

=========================================================================

Total REAL time to Xst completion: 33.00 secs
Total CPU time to Xst completion: 33.03 secs

-->

Total memory usage is 304816 kilobytes

Number of errors   :    0 (   0 filtered)
Number of warnings :  777 (   0 filtered)
Number of infos    :   47 (   0 filtered)
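The headline figures in the synthesis report can be cross-checked with a few lines of arithmetic. The sketch below is only an illustration: the helper names (`period_to_mhz`, `share`, `floor_pct`, `gray`) are hypothetical and not part of any Xilinx tool, and the numbers are copied by hand from the report. It recomputes the clock frequency from the minimum period, the logic/route split of the critical path, the utilization percentages (which appear to be truncated rather than rounded), and the Gray codes assigned to the five FSM states.

```python
# Cross-check of figures copied from the XST report.
# All helper names are illustrative assumptions, not a tool API.

def period_to_mhz(period_ns: float) -> float:
    """Convert a clock period in nanoseconds to a frequency in MHz."""
    return 1000.0 / period_ns

def share(part: float, total: float) -> float:
    """Return part/total as a percentage rounded to one decimal place."""
    return round(100.0 * part / total, 1)

def floor_pct(used: int, available: int) -> int:
    """Return used/available as a whole percentage, truncated."""
    return (100 * used) // available

def gray(n: int, width: int = 3) -> str:
    """Return the reflected binary Gray code of n as a bit string."""
    return format(n ^ (n >> 1), f"0{width}b")

# Minimum period 2.997 ns. The report prints 333.695 MHz, presumably
# computed from the unrounded period, so only the leading digits agree.
fmax_mhz = period_to_mhz(2.997)

# Critical path on ACLK: gate delays and net delays from the data path table.
logic_ns = 0.525 + 0.254 + 0.254 + 0.074
route_ns = 1.145 + 0.745 + 0.000
total_ns = logic_ns + route_ns
logic_pct = share(logic_ns, total_ns)
route_pct = share(route_ns, total_ns)

# Utilization rows: (used, available) pairs from the device summary.
utilization = {
    "unused FF in LUT-FF pairs": floor_pct(10, 53),
    "fully used LUT-FF pairs": floor_pct(42, 53),
    "bonded IOBs": floor_pct(86, 296),
    "BUFG/BUFGCTRLs": floor_pct(1, 16),
}

# Gray codes for the five FSM states, in the order the report lists them.
fsm_codes = [gray(i) for i in range(5)]

if __name__ == "__main__":
    print(f"fmax ~ {fmax_mhz:.3f} MHz")
    print(f"critical path {total_ns:.3f} ns: {logic_pct}% logic, {route_pct}% route")
    print(utilization)
    print(fsm_codes)
```

The routing share of the critical path is the larger part of the delay, and it is only a pre-layout estimate, so the post place-and-route frequency may differ from the value shown here.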

SIMULATION RESULTS

Arbiter Simulation Result

LOTTERY ARBITER SIMULATION RESULT

MASTER SIMULATION RESULT

RTL Schematics For A Lottery-based Bandwidth Guaranteed and Low Latency Arbiter for On-Chip Bus


CONCLUSION
This paper proposed a two-level Lottery-based bus arbitration algorithm, the RB_Lottery arbitration algorithm. It addresses the fairness, starvation and real-time problems of the earlier Lottery method and reduces the average latency of bus requests. In hardware verification, the proposed arbiter achieves a higher operating frequency than the Lottery arbiter. Although it costs more chip area and power than the Lottery arbiter, the software simulation results show that the RB_Lottery algorithm gives better bandwidth guarantees and a lower average bus-request latency than Lottery arbitration.
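The two-level idea summarised in this conclusion can be illustrated with a small behavioural model. This is a minimal sketch under stated assumptions, not the RTL described in this report: the class name `TwoLevelLotteryArbiter`, the per-master `deadlines` parameter, and the choice of up-counting wait counters (the synthesized design uses 4-bit down counters) are all hypothetical. The first level grants, by static priority, any requesting master that has waited as long as its deadline; otherwise the second level holds a ticket-weighted lottery among the requesting masters.

```python
import random
from typing import Dict, List, Optional


class TwoLevelLotteryArbiter:
    """Behavioural sketch of a two-level lottery arbiter (illustrative only)."""

    def __init__(self, tickets: Dict[int, int], deadlines: Dict[int, int],
                 seed: Optional[int] = None) -> None:
        self.tickets = dict(tickets)        # master id -> number of lottery tickets
        self.deadlines = dict(deadlines)    # master id -> cycles a request may wait (assumed)
        self.waiting = {m: 0 for m in tickets}  # cycles each pending request has waited
        self.rng = random.Random(seed)

    def arbitrate(self, requests: List[int]) -> Optional[int]:
        """Return the master granted this cycle, or None if nobody requests."""
        if not requests:
            return None

        # Level 1: real-time handler. Among masters whose wait has reached
        # its deadline, the lowest id wins (static priority).
        urgent = [m for m in sorted(requests)
                  if self.waiting[m] >= self.deadlines[m]]
        if urgent:
            winner = urgent[0]
        else:
            # Level 2: lottery. The chance of winning is proportional to
            # the ticket count of each requesting master.
            weights = [self.tickets[m] for m in requests]
            winner = self.rng.choices(requests, weights=weights, k=1)[0]

        # Losers that are still requesting wait one more cycle; the winner
        # and idle masters restart from zero.
        for m in self.waiting:
            if m == winner or m not in requests:
                self.waiting[m] = 0
            else:
                self.waiting[m] += 1
        return winner


if __name__ == "__main__":
    arbiter = TwoLevelLotteryArbiter(
        tickets={0: 1, 1: 2, 2: 3, 3: 4},
        deadlines={0: 8, 1: 8, 2: 8, 3: 8},
        seed=0,
    )
    grants = [arbiter.arbitrate([0, 1, 2, 3]) for _ in range(20)]
    print(grants)
```

In this model a master with very few tickets can lose the lottery repeatedly, but it is never delayed beyond its assumed deadline, which is the starvation and real-time behaviour the conclusion attributes to the two-level scheme.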

