1) Differentiate between: a) Harvard and Princeton architecture, b) Microcontrollers and DSPs.

a. Harvard and Princeton architecture


Harvard | Princeton
Data memory and program memory occupy distinct memory spaces. | Data and program words share the same memory space.
Requires two connections to memory, which results in improved performance. | Results in a simple hardware connection to memory, since only one connection is required.
More memory wires. | Fewer memory wires.
Program and data memory can be accessed simultaneously. | Program and data memory are accessed separately.

b. Microcontrollers and DSPs

Microcontroller | DSP
It is a special-purpose controller. | It is a processor that is highly optimized for processing large amounts of data.
It integrates ADCs, timers and serial communication devices on the same IC. | It integrates ADCs, DACs, PWM, timers, counters and direct memory access controllers on the same IC.
It provides specialized instructions for common embedded system control operations, such as bit manipulation. | It provides instructions that are central to DSP, such as filtering and transformation.

2) What are the characteristics of an embedded system?


Nagaraj N.K. 1MS05EC048

Embedded systems have several common characteristics:


1) Single-functioned: An embedded system usually executes only one program, repeatedly. For example, a pager is always a pager. In contrast, a desktop system executes a variety of programs, like spreadsheets, word processors, and video games, with new programs added frequently.

2) Tightly constrained: All computing systems have constraints on design metrics, but those on embedded systems can be especially tight. A design metric is a measure of an implementation's features, such as cost, size, performance, and power. Embedded systems often must cost just a few dollars, must be sized to fit on a single chip, must perform fast enough to process data in real time, and must consume minimum power to extend battery life or prevent the necessity of a cooling fan.

3) Reactive and real-time: Many embedded systems must continually react to changes in the system's environment, and must compute certain results in real time without delay. For example, a car's cruise controller continually monitors and reacts to speed and brake sensors. It must compute acceleration or deceleration amounts repeatedly within a limited time; a delayed computation could result in a failure to maintain control of the car. In contrast, a desktop system typically focuses on computations, with relatively infrequent (from the computer's perspective) reactions to input devices. In addition, a delay in those computations, while perhaps inconvenient to the computer user, typically does not result in a system failure.

3) Compare GPP, SPP and ASIP along with their block diagrams and any two differences.
General purpose Processor(GPP)

[Figure: block diagram of a general-purpose processor. Controller: control logic and state register, IR, PC. Datapath: register file, general ALU. Connected to data memory and to program memory holding assembly code for: total = 0; for i = 1 to ...]

A GPP is a programmable device used in a variety of applications; it is also known as a microprocessor.

Features
-Program memory
-General datapath with large register file and general ALU

User benefits
-Low time-to-market and NRE costs
-High flexibility

The Pentium is the best known, but there are hundreds of others.

Single Purpose processor(SPP):

[Figure: block diagram of a single-purpose processor. Controller: control logic, state register. Datapath: index, total, adder (+). Connected to data memory.]

An SPP is a digital circuit (hardware) designed to execute exactly one program, for example a JPEG codec. It is also known as a coprocessor, accelerator or peripheral.

Features
-Contains only the components needed to execute a single program
-No program memory

Benefits
-Fast
-Low power
-Small size

Application Specific Instruction Set Processor (ASIP):

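[Figure: block diagram of an application-specific instruction-set processor. Controller: control logic and state register, IR, PC. Datapath: register file, ALU. Connected to data memory and to program memory holding assembly code for: total = 0; for i = 1 to ...]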

An ASIP is a programmable processor optimized for a particular class of applications having common characteristics, such as embedded control, digital signal processing or telecommunications. Examples are microcontrollers and digital signal processors. It is a compromise between general-purpose and single-purpose processors.

Features
-Program memory
-Optimized datapath
-Special functional units

Benefits
-Some flexibility, good performance, size and power


4) Explain the various metrics that need to be optimized while designing an embedded system?
A design metric is a measurable feature of a system's implementation. Commonly used metrics include:

a) NRE cost (non-recurring engineering cost): The one-time monetary cost of designing the system. Once the system is designed, any number of units can be manufactured without incurring any additional design cost; hence the term non-recurring.

b) Unit cost: The monetary cost of manufacturing each copy of the system, excluding NRE cost.

c) Size: The physical space required by the system, often measured in bytes for software, and gates or transistors for hardware.

d) Performance: The execution time of the system.

e) Power: The amount of power consumed by the system, which may determine the lifetime of a battery, or the cooling requirements of the IC, since more power means more heat.

f) Flexibility: The ability to change the functionality of the system without incurring heavy NRE cost. Software is typically considered very flexible.

g) Time to prototype: The time needed to build a working version of the system, which may be bigger or more expensive than the final system implementation, but can be used to verify the system's usefulness and correctness and to refine the system's functionality.

h) Time to market: The time required to develop a system to the point that it can be released and sold to customers. The main contributors are design time, manufacturing time, and testing time.

i) Maintainability: The ability to modify the system after its initial release, especially by designers who did not originally design the system.

j) Correctness: Our confidence that we have implemented the system's functionality correctly. We can check the functionality throughout the process of designing the system, and we can insert test circuitry to check that manufacturing was correct.

k) Safety: The probability that the system will not cause harm.

5) What is a market window? Why is it important for products to reach the market early? Justify.

The time-to-market constraint has become especially demanding in recent years. Introducing an embedded system to the marketplace early can make a big difference in the system's profitability, since market windows (the periods during which a product would have its highest sales) are becoming quite short, often measured in months. Missing this window can mean a significant loss in sales. In some cases, each day that a product is delayed from introduction to the market can translate to a one million dollar loss. The average time-to-market constraint has been reported as having shrunk to only 8 months.


6) Assume an 8-bit encoding of an input voltage ranging from -5 V to +5 V. Find the encoding for 1.2 V, then trace it using the successive approximation approach. Find the resolution of the conversion. Extend the ratio and resolution equations to any voltage in the range Vmin to Vmax.
a. Expected output:

d / ((2^n)-1) = (e-Vmin) / (Vmax-Vmin)

d / 255 = (1.2-(-5)) / (5-(-5)) = 0.62

d = 158.1, which is encoded as 158 = 9Eh = 10011110

b. Resolution = (Vmax-Vmin) / ((2^n)-1)

=> 10/255 = 0.0392 V

c. Successive approximation approach

Step | Comparison of e with the midpoint | d (8-bit encoding)
1 | 1.2 > (5-5)/2 = 0 | 10000000
2 | 1.2 < (5+0)/2 = 2.5 | 10000000
3 | 1.2 < (2.5+0)/2 = 1.25 | 10000000
4 | 1.2 > (1.25+0)/2 = 0.625 | 10010000
5 | 1.2 > (1.25+0.625)/2 = 0.9375 | 10011000
6 | 1.2 > (1.25+0.9375)/2 = 1.09375 | 10011100
7 | 1.2 > (1.25+1.09375)/2 = 1.171875 | 10011110
8 | 1.2 < (1.25+1.171875)/2 = 1.2109375 | 10011110
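
The trace above can be reproduced with a short Python sketch. This is only an illustration of the successive approximation idea (a binary search over the voltage range, one bit per step); the function name `sar_encode` and its parameters are assumptions made for this example, not part of the original answer.

```python
def sar_encode(e, vmin, vmax, n_bits=8):
    """Encode voltage e by halving the range [vmin, vmax] once per bit.

    Returns the integer code and a list of (midpoint, bit) pairs,
    one pair per comparison step.
    """
    low, high = vmin, vmax
    code = 0
    trace = []
    for _ in range(n_bits):
        mid = (low + high) / 2          # midpoint of the remaining range
        bit = 1 if e >= mid else 0      # 1 if the input lies in the upper half
        code = (code << 1) | bit        # append the bit to the encoding
        if bit:
            low = mid                   # keep the upper half
        else:
            high = mid                  # keep the lower half
        trace.append((mid, bit))
    return code, trace


code, trace = sar_encode(1.2, -5.0, 5.0)
for step, (mid, bit) in enumerate(trace, start=1):
    print(f"step {step}: midpoint {mid:+.7f} -> bit {bit}")
print(format(code, "08b"))  # 10011110
```

The same function applies to any range Vmin to Vmax, since only the two endpoints and the bit count enter the search.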

7) Explain the features of flash memory, SRAM and OTP ROM


Flash memory: Flash memory is an extension of EEPROM that was developed in the late 1980s. While also using the floating-gate principle of EEPROM, flash memory is designed such that large blocks of memory can be erased all at once, rather than just one word at a time as in traditional EEPROM. A block is typically several thousand bytes large. This fast erase ability can vastly improve the performance of embedded systems where large data items must be stored in non-volatile memory, such as digital cameras, cell phones and TV set-top boxes. It can also speed manufacturing throughput, since programming the complete contents of flash may be faster than programming a similar-sized EEPROM. Like EEPROM, each block in a flash memory can typically be erased and reprogrammed ten thousand times.

SRAM: Static RAM uses a memory cell consisting of a flip-flop to store a bit. Each bit thus requires about six transistors. This RAM type is called static because it will hold its data as long as power is supplied, in contrast to dynamic RAM. Static RAM is typically used for high-performance parts of a system.

OTP ROM: The most basic PROM uses a fuse for each programmable connection. To program a PROM device, the user provides a file that indicates the desired ROM contents. A piece of equipment called a ROM programmer then configures each programmable connection according to the file. Note that here the programmer is a piece of equipment, not a person who writes software. The ROM programmer blows fuses by passing a large current wherever a connection should not exist. However, once a fuse is blown, the connection can never be re-established. For this reason, the basic PROM is often referred to as one-time programmable ROM, or OTP ROM. OTP ROMs have the lowest write ability of all PROMs, since they can be written only once and they require a programmer device. They have high storage permanence, hence they are used in final products. OTP ROMs are also cheaper, which makes them more attractive in final products.

8) The instruction set is:

Mov rn, direct
Mov direct, rn
Mov @rn, rm
Mov rn, #immed
Add rn, rm
Sub rn, rm
Jz rn, relative

Add one instruction to the instruction set shown that would reduce the size of the summing assembly program by one instruction. Show the reduced program.

The reduced program is as follows:

      Mov r0, #0
      Mov r1, #10
      Mov r2, #01
      Mov r3, #0
Loc1: Jz r1, Next
      Add r0, r1
      Sub r1, r2
      Jz r3, Loc1
Next: // next instruction
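
As a rough check of the summing program, the Python sketch below interprets the register-only subset of the instruction set (the memory-access forms of Mov are left out). It runs the eight-instruction listing and, for comparison, a variant that uses an unconditional jump. The `jmp` opcode is a hypothetical added instruction assumed here purely for illustration; the tuple encoding, the `run` function and the label tables are likewise assumptions of this sketch.

```python
def run(program, labels, max_steps=10_000):
    """Interpret a list of (opcode, operand_a, operand_b) tuples.

    Returns the final register values. Labels map names to instruction indices.
    """
    regs = {f"r{i}": 0 for i in range(4)}
    pc = 0
    steps = 0
    while pc < len(program) and steps < max_steps:
        op, a, b = program[pc]
        pc += 1
        steps += 1
        if op == "mov_imm":        # Mov rn, #immed
            regs[a] = b
        elif op == "add":          # Add rn, rm
            regs[a] += regs[b]
        elif op == "sub":          # Sub rn, rm
            regs[a] -= regs[b]
        elif op == "jz":           # Jz rn, relative
            if regs[a] == 0:
                pc = labels[b]
        elif op == "jmp":          # hypothetical unconditional jump
            pc = labels[a]
        else:
            raise ValueError(f"unknown opcode: {op}")
    return regs


# The listing as given: r3 holds a constant 0 so that "Jz r3" always jumps.
listed = [
    ("mov_imm", "r0", 0),
    ("mov_imm", "r1", 10),
    ("mov_imm", "r2", 1),
    ("mov_imm", "r3", 0),
    ("jz", "r1", "next"),      # Loc1
    ("add", "r0", "r1"),
    ("sub", "r1", "r2"),
    ("jz", "r3", "loc1"),
]
listed_labels = {"loc1": 4, "next": 8}

# Variant with the assumed jmp: the constant-zero register is no longer needed.
with_jmp = [
    ("mov_imm", "r0", 0),
    ("mov_imm", "r1", 10),
    ("mov_imm", "r2", 1),
    ("jz", "r1", "next"),      # Loc1
    ("add", "r0", "r1"),
    ("sub", "r1", "r2"),
    ("jmp", "loc1", None),
]
jmp_labels = {"loc1": 3, "next": 7}

print(run(listed, listed_labels)["r0"], len(listed))    # 55 8
print(run(with_jmp, jmp_labels)["r0"], len(with_jmp))   # 55 7
```

Both versions leave the sum 10 + 9 + ... + 1 in r0; the variant is one instruction shorter because it drops the Mov that sets up the constant zero.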

9) Explain the terms:


Dhrystone benchmark: One attempt to provide a means for a fairer comparison is the Dhrystone benchmark. A benchmark is a program intended to be run on different processors to compare their performance. The Dhrystone benchmark was developed in 1984 by Reinhold Weicker specifically as a performance benchmark; it performs no useful work. It focuses on exercising a processor's integer arithmetic and string-handling capabilities. It is written in C and is in the public domain. Since most processors can execute it in milliseconds, it is typically executed thousands of times, and thus a processor is said to be able to execute so many Dhrystones per second.

Linker: A linker allows a programmer to create a program in separately assembled or compiled files. It combines the machine instructions of each into a single program, perhaps incorporating instructions from standard library routines. A linker designed for embedded processors will also try to eliminate binary code associated with uncalled procedures and functions, as well as memory allocated to unused variables, in order to reduce the overall program footprint.

Moore's law: A trend related to ICs: IC transistor capacity has doubled roughly every 18 months for the past several decades. This trend is shown in the figure below. It was predicted back in 1965 by Intel co-founder Gordon Moore, who predicted that semiconductor transistor density would double every 18 to 24 months. This trend is therefore known as Moore's law. Moore recently predicted about another decade before such growth slows down. The trend is mainly caused by improvements in IC manufacturing that result in smaller parts, such as transistor parts and wires, on the surface of the IC. The minimum part size, commonly known as feature size, for a CMOS IC in 2002 is about 130 nanometers.

10) In a successive approximation ADC, calculate the correct encoding of 5 V given an analog input voltage range from 0 to +15 V and an 8-bit digital encoding. Also determine the resolution of this ADC.
a. Expected output:

d / ((2^n)-1) = (e-Vmin) / (Vmax-Vmin)

d = 85 or 01010101

b. Resolution = (Vmax-Vmin) / ((2^n)-1)

=> 15/255
=> 0.0588 V

c. Successive approximation approach

Step | Comparison of e with the midpoint | d (8-bit encoding)
1 | 5 < (15+0)/2 = 7.5 | 00000000
2 | 5 > (7.5+0)/2 = 3.75 | 01000000
3 | 5 < (7.5+3.75)/2 = 5.625 | 01000000
4 | 5 > (5.625+3.75)/2 = 4.6875 | 01010000
5 | 5 < (5.625+4.6875)/2 = 5.15625 | 01010000
6 | 5 > (5.15625+4.6875)/2 = 4.921875 | 01010100
7 | 5 < (5.15625+4.921875)/2 = 5.0390625 | 01010100
8 | 5 > (5.0390625+4.921875)/2 = 4.98046875 | 01010101
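
The ratio and resolution equations used in parts a and b can be evaluated directly. The following Python sketch is a minimal illustration; the function names `adc_ratio_encode` and `adc_resolution` are assumptions introduced for this example.

```python
def adc_resolution(vmin, vmax, n_bits):
    """Resolution = (Vmax - Vmin) / (2^n - 1), in volts per step."""
    return (vmax - vmin) / (2 ** n_bits - 1)


def adc_ratio_encode(e, vmin, vmax, n_bits):
    """Solve d / (2^n - 1) = (e - Vmin) / (Vmax - Vmin) for d (unrounded)."""
    return (e - vmin) * (2 ** n_bits - 1) / (vmax - vmin)


d = adc_ratio_encode(5.0, 0.0, 15.0, 8)
resolution = adc_resolution(0.0, 15.0, 8)
print(d, format(round(d), "08b"))   # 85.0 01010101
print(round(resolution, 4))         # 0.0588
```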

11) Determine the range and resolution of a 16-bit timer which operates at a clock frequency of 10 MHz and generates an overflow signal when it reaches FFFF. Calculate the terminal count value for measuring a 3 ms time interval. What is the minimum division needed in a prescaler for measuring 100 ms?


Resolution = 1/10 MHz = 100 ns.

Maximum count duration (range) = 2^16 * 100 ns = 6.5536 ms.

Terminal count = (3 ms / 100 ns) - 1 = 30000 - 1 = 29999.

The minimum prescaler required to measure a 100 ms duration:

100 ms / 6.5536 ms = 15.258. Therefore the original clock must be divided by 16.
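
The timer arithmetic above can be checked with a small Python sketch. It works in integer nanoseconds to avoid rounding, and it assumes that the prescaler divides the clock by a power of two; that restriction, along with the function names, is an assumption of this example rather than something stated in the question.

```python
def timer_parameters(bits, clock_hz):
    """Return (resolution_ns, range_ns) for an up-counter of the given width."""
    resolution_ns = 1_000_000_000 // clock_hz   # one clock period in ns
    range_ns = (2 ** bits) * resolution_ns      # time until overflow
    return resolution_ns, range_ns


def terminal_count(interval_ns, resolution_ns):
    """Count value reached after interval_ns, counting from 0."""
    return interval_ns // resolution_ns - 1


def min_prescaler(interval_ns, range_ns):
    """Smallest power-of-two clock division that stretches the range to cover interval_ns."""
    needed = -(-interval_ns // range_ns)        # ceiling division
    division = 1
    while division < needed:
        division *= 2
    return division


resolution_ns, range_ns = timer_parameters(16, 10_000_000)
print(resolution_ns)                               # 100
print(range_ns)                                    # 6553600 (6.5536 ms)
print(terminal_count(3_000_000, resolution_ns))    # 29999
print(min_prescaler(100_000_000, range_ns))        # 16
```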

12) With diagram explain the direct mapping technique for cache.
Cache mapping is the method for assigning main memory addresses to the far smaller number of available cache addresses, and for determining whether a particular main memory address's contents are in the cache.

Direct mapping: In this technique, the main memory address is divided into two fields, the index and the tag. The index represents the cache address, and thus the number of index bits is determined by the cache size, i.e., index size = log2(cache size). Note that many different main memory addresses will map to the same cache address. When we store a main memory address's contents in the cache, we also store the tag. To determine if a desired main memory address is in the cache, we go to the cache address indicated by the index, and we then compare the tag there with the desired tag.

Direct-mapped caches are easy to implement, but may result in numerous misses if two or more words with the same index are accessed frequently, since each will bump the other out of the cache. Fully-associative caches, on the other hand, are fast, but the comparison logic is expensive to implement. Set-associative caches can reduce misses compared to direct-mapped caches, without requiring nearly as much comparison logic as fully-associative caches.

Caches are usually designed to treat collections of a small number of adjacent main memory addresses as one indivisible block, typically consisting of about 8 addresses.
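
The index and tag lookup described above can be sketched in Python as follows. This is a simplified model that assumes one word per cache line (no multi-address blocks) and a cache size that is a power of two; the class name `DirectMappedCache` and its methods are hypothetical names chosen for this illustration.

```python
class DirectMappedCache:
    """Direct-mapped cache model: one tag per line, one word per line."""

    def __init__(self, num_lines):
        self.num_lines = num_lines
        self.tags = [None] * num_lines   # None marks an empty line

    def access(self, address):
        """Return True on a hit, False on a miss (and fill the line)."""
        index = address % self.num_lines    # low-order bits select the line
        tag = address // self.num_lines     # remaining high-order bits
        hit = self.tags[index] == tag
        if not hit:
            self.tags[index] = tag          # bump out whatever was there
        return hit


cache = DirectMappedCache(8)
# Addresses 0 and 8 share index 0, so they keep evicting each other.
print([cache.access(a) for a in (0, 8, 0, 8)])   # [False, False, False, False]

cache = DirectMappedCache(8)
# Addresses 0 and 1 use different lines, so repeats hit.
print([cache.access(a) for a in (0, 1, 0, 1)])   # [False, False, True, True]
```

The first access pattern shows the conflict-miss behaviour that set-associative caches are meant to reduce.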


13) Explain how UART is used for communication highlighting the advantages of UART.
A UART (Universal Asynchronous Receiver/Transmitter) receives serial data and stores it as parallel data (usually one byte), and takes parallel data and transmits it as serial data. Such serial communication is beneficial when we need to communicate bytes of data between devices separated by long distances, or when we simply have few available I/O pins. The principles of serial communication appear in a later chapter. For our purpose in this section, we must be aware that we must set the transmission and reception rate, called the baud rate, which indicates the frequency at which the signal changes. Common rates include 2400, 4800, 9600, and 19.2k. We must also be aware that an extra bit, called parity, may be added to each data word to detect transmission errors: the parity bit is set high or low to indicate whether the word has an even or odd number of 1 bits.

Internally, a simple UART may possess a baud-rate configuration register, and two independently operating processors, one for receiving and the other for transmitting. The transmitter may possess a register, often called a transmit buffer, that holds data to be sent. This register is a shift register, so the data can be transmitted one bit at a time by shifting at the appropriate rate. Likewise, the receiver receives data into a shift register, and then this data can be read in parallel. Note that in order to shift at the appropriate rate, based on the configuration register, a UART requires a timer.

To use a UART, we must configure its baud rate by writing to the configuration register, and then we must write data to the transmit register and/or read data from the receive register. Unfortunately, configuring the baud rate is usually not as simple as writing the desired rate (e.g., 4800) to a register. For example, to configure the UART of an 8051, we must use the following equation:

Baudrate = (2^smod / 32) * oscfreq / (12 * (256 - TH1))

smod corresponds to 2 bits in a special-function register, oscfreq is the frequency of the oscillator, and TH1 is an 8-bit rate register of a built-in timer.

Note that we could use a general-purpose processor to implement a UART completely in software. If we used a dedicated general-purpose processor, the implementation would be inefficient in terms of size. We could alternatively integrate the transmit and receive functionality with our main program. This would require creating a routine to send data serially over an I/O port, making use of a timer to control the rate. It would also require using an interrupt service routine to capture serial data coming from another I/O port whenever such data begins arriving. However, as with the timer functionality, adding send and receive functionality can detract from time for other computations.

For a timer implemented in software, knowing the number of cycles that each instruction requires, we could write a loop that executes the desired number of instructions; when this loop completes, we know that the desired time has passed. This implementation of a timer on a dedicated general-purpose processor is obviously quite inefficient in terms of size. One could alternatively incorporate the timer functionality into a main program, but the timer functionality then occupies much of the program's run time, leaving little time for other computations. Thus, the benefit of assigning timer functionality to a special-purpose processor becomes evident.
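
To make the baud-rate configuration concrete, the Python sketch below evaluates the commonly quoted 8051 relation Baudrate = (2^smod / 32) * oscfreq / (12 * (256 - TH1)) and inverts it to find TH1. The 11.0592 MHz oscillator is an assumed example value, smod is treated here as an exponent of 0 or 1, and the function names are hypothetical; treat this as a sketch rather than a datasheet-accurate routine.

```python
def baud_rate(smod, osc_freq_hz, th1):
    """Baud rate produced by a given TH1 reload value."""
    return (2 ** smod / 32) * osc_freq_hz / (12 * (256 - th1))


def th1_for_baud(smod, osc_freq_hz, target_baud):
    """TH1 reload value whose baud rate is closest to the target."""
    divisor = (2 ** smod) * osc_freq_hz / (32 * 12 * target_baud)
    return 256 - round(divisor)


OSC_FREQ_HZ = 11_059_200   # assumed crystal frequency for illustration

th1 = th1_for_baud(0, OSC_FREQ_HZ, 9600)
print(th1, hex(th1))                       # 253 0xfd
print(baud_rate(0, OSC_FREQ_HZ, th1))      # 9600.0
```

With a different oscillator frequency the divisor is generally not an integer, so the achieved rate only approximates the target; comparing `baud_rate(...)` against the target shows the error.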


14) Explain the various events that take place when a processor executes an instruction. Explain how pipelining improves the execution speed.
A microprocessor's execution of instructions consists of several basic stages:

1. Fetch instruction: the task of reading the next instruction from memory into the instruction register.
2. Decode instruction: the task of determining what operation the instruction in the instruction register represents (e.g., add, move, etc.).
3. Fetch operands: the task of moving the instruction's operand data into appropriate registers.
4. Execute operation: the task of feeding the appropriate registers through the ALU and back into an appropriate register.
5. Store results: the task of writing a register into memory.

If each stage takes one clock cycle, then we see that a single instruction may take several cycles to complete.

Pipelining is a common way to increase the instruction throughput of a microprocessor. We make a simple analogy of two people approaching the chore of washing and drying 8 dishes. In one approach, the first person washes all 8 dishes, and then the second person dries all 8 dishes. Assuming 1 minute per dish per person, this approach requires 16 minutes. The approach is clearly inefficient, since at any time only one person is working and the other is idle. A better approach is for the second person to begin drying each dish as soon as it has been washed, which takes only 9 minutes: 1 minute to wash the first dish, followed by 8 minutes of drying while the remaining dishes are washed in parallel. A pipelined processor overlaps the stages of successive instructions in the same way.

15) Differentiate between the following: a) Single-purpose and general-purpose processors, b) Full-custom and PLD technologies.
Single-purpose and general-purpose processors

Single-purpose processor | General-purpose processor
Is a digital system intended to solve a specific computation task. | Is intended to solve a wide variety of computation tasks.
Examples of hardware: JPEG codec (Joint Photographic Experts Group), GCD custom SPP, timers, counters. | Example of hardware: microcontroller.
User does only processor design; no software, hence no program memory. | User writes only software; no processor design.

Design metric differences:

Design metric | GPP | SPP
Performance | Slow | Fast
Size | Large | Small
Power consumption | More | Less
Time-to-market | Less | More
NRE cost | Less | More
Flexibility | More | Less

Full custom IC and PLD technologies

Full-custom IC

All layers are optimized for an embedded system's particular digital implementation:
-Placing transistors
-Sizing transistors
-Routing wires

Benefits
-Excellent performance, small size, low power

Drawbacks
-High NRE cost (e.g., $300,000), long time-to-market

PLD (programmable logic device)

All layers already exist:
-Designers can purchase an IC
-Connections on the IC are either created or destroyed to implement the desired functionality
-FPGAs are very popular

Benefits
-Low NRE costs, almost instant IC availability

Drawbacks
-Bigger, expensive (perhaps $30 per unit), power hungry, slower

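16) Derive the equation for percentage revenue loss for any market rise angle. A product was delayed by 4 weeks in releasing to market. The peak revenue for the product for on-time entry to market would occur after 20 weeks, for a market rise angle of 45 degrees. Determine the revenue loss.
To derive: % revenue loss = D(3W-D)/(2W^2) * 100%

To compute the revenue loss from delayed entry, use the simplified revenue model.

[Figure: simplified revenue model, revenue versus time. The on-time triangle rises (market rise) from on-time entry to peak revenue at W and falls (market fall) to zero at 2W. The delayed triangle starts at delayed entry D and reaches a lower peak revenue from delayed entry.]

2W = product lifetime
W = time at which revenue peaks for on-time entry
D = delay in entering the market
theta = market rise angle

The revenue in each case is the area of the corresponding triangle:

A(on time) = 1/2 * 2W * W tan(theta) = W^2 tan(theta)
A(delayed) = 1/2 * (2W-D) * (W-D) tan(theta)

% revenue loss = ((A(on time) - A(delayed)) / A(on time)) * 100%
= {W^2 tan(theta) - 1/2 (2W-D)(W-D) tan(theta)} / (W^2 tan(theta)) * 100%
= {2W^2 - (2W^2 - 3WD + D^2)} / (2W^2) * 100%
= (3WD - D^2) / (2W^2) * 100%

% revenue loss = D(3W-D)/(2W^2) * 100%

The tan(theta) factor cancels, so the result holds for any market rise angle.

Given data: D = 4 weeks, W = 20 weeks (2W = 40 weeks)

% revenue loss = [4(3*20-4) / (2*20^2)] * 100% = 28%

Thus a delay of just 4 weeks results in a revenue loss of 28%.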
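
The final formula is easy to evaluate for other delays. The short Python sketch below is a minimal illustration; the function name `percent_revenue_loss` and its parameter names are assumptions made for this example.

```python
def percent_revenue_loss(delay, peak_time):
    """Percentage revenue loss D(3W - D) / (2 W^2) * 100 for delay D and peak time W."""
    if not 0 <= delay <= peak_time:
        raise ValueError("the model assumes 0 <= D <= W")
    return 100.0 * delay * (3 * peak_time - delay) / (2 * peak_time ** 2)


print(percent_revenue_loss(4, 20))    # 28.0
print(percent_revenue_loss(10, 20))   # 62.5
```

The second call shows how quickly the loss grows: a delay of half the rise time already costs well over half of the revenue under this model.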

17) Describe pipelining technique. What is the speed factor?


Pipelining: It is a common way to increase the instruction throughput of a microprocessor. By using a separate unit for each stage, we can pipeline instruction execution. After the instruction fetch unit fetches the first instruction, the decode unit decodes it while the instruction fetch unit simultaneously fetches the next instruction. Modern pipelined processors have branch predictors built in. The idea of pipelining is as shown below.

For pipelining to work well, instruction execution must be decomposable into stages of roughly equal length, and instructions should each require the same number of cycles.

Speedup factor: It is a common method of comparing the performance of two systems. The speedup of system A over system B is determined simply as:

Speedup of A over B = performance of A / performance of B

Suppose the speedup of camera A over camera B is 2. Then we can also say that A is 2 times faster than B, and B is 2 times slower than A.

18) Explain pipelining. If 6000 instructions are to be executed using a 4 stage pipelined processor at a clock frequency of 12 MHz, determine the speedup of the pipelined processor when compared to non-pipelined processor.
Pipelining is a common way to increase the instruction throughput of a microprocessor. Let's say we split instruction execution into four stages, namely fetch, decode, execute and store. In pipelining, after the instruction fetch unit fetches the first instruction, the decode unit decodes it while the instruction fetch unit simultaneously fetches the next instruction.

For pipelining to work well, instruction execution must be decomposable into stages of roughly equal length, and each instruction must require the same number of cycles.

Branching poses a problem for pipelining, as we don't know the next instruction until the execution stage of the branch instruction is complete. A few solutions are as follows:

-When there is a branch in the pipeline, stop fetching, wait for the branch instruction to complete its execution, and then fetch the correct instruction.
-Take a guess as to which way the branch might go and continue pipelining. If the guess is right, carry on; otherwise, ignore all the instructions fetched after the branch instruction, thus incurring a penalty. Modern pipelined microprocessors have very sophisticated branch predictors built in.

Given: 4-stage pipeline, 6000 instructions, 12 MHz clock.

Pipelined execution time: 6000 + 4 - 1 = 6003 cycles are required to execute 6000 instructions. 6003 * (1/12 MHz) = 500.25 µs.

Non-pipelined execution time: 6000 * 4 = 24000 cycles are required to execute 6000 instructions. 24000 * (1/12 MHz) = 2 ms.

The pipelined processor is (2 ms / 500.25 µs) = 3.998, or approximately 4, times faster.
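
The cycle counts above follow from two simple expressions, which the Python sketch below evaluates. It assumes an ideal pipeline with one cycle per stage and no stalls or branch penalties; the function names are hypothetical and introduced only for this illustration.

```python
def pipelined_cycles(instructions, stages):
    """Cycles for an ideal pipeline: fill the pipe once, then one result per cycle."""
    return instructions + stages - 1


def non_pipelined_cycles(instructions, stages):
    """Cycles when each instruction passes through all stages before the next starts."""
    return instructions * stages


CLOCK_HZ = 12_000_000
n, k = 6000, 4

t_pipe_us = pipelined_cycles(n, k) / CLOCK_HZ * 1e6
t_plain_us = non_pipelined_cycles(n, k) / CLOCK_HZ * 1e6
speedup = non_pipelined_cycles(n, k) / pipelined_cycles(n, k)

print(pipelined_cycles(n, k), non_pipelined_cycles(n, k))   # 6003 24000
print(round(t_pipe_us, 2), round(t_plain_us, 2))            # 500.25 2000.0
print(round(speedup, 3))                                    # 3.998
```

As the number of instructions grows, the speedup approaches the number of stages, which is why the result here is so close to 4.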

19) Define the following:


Assembler: Assemblers translate assembly instructions to binary machine instructions. In addition to just replacing opcode and operand mnemonics by binary equivalents, an assembler may also translate symbolic labels into actual addresses. For example, a programmer may add a symbolic label END to an instruction A, and may reference END in a branch instruction. The assembler determines the actual binary address of A, and replaces references to END by this address. The mapping of assembly instructions to machine instructions is one-to-one.

Linker: A linker allows a programmer to create a program in separately assembled or compiled files. It combines the machine instructions of each into a single program, perhaps incorporating instructions from standard library routines. A linker designed for embedded processors will also try to eliminate binary code associated with uncalled procedures and functions, as well as memory allocated to unused variables, in order to reduce the overall program footprint.

Debugger: Debuggers help programmers evaluate and correct their programs. They run on the development processor and support stepwise program execution, executing one instruction and then stopping, proceeding to the next instruction when instructed by the user. They permit execution up to user-specified breakpoints, which are instructions that when encountered cause the program to stop executing. Whenever the program stops, the user can examine values of various memory and register locations. A source-level debugger enables step-by-step execution in the source program language, whether assembly language or a structured language. A good debugging capability is crucial, as today's programs can be quite complex and hard to write correctly. Because these debuggers are programs that run on the development processor but execute code designed for the target processor, they are also known as instruction-set simulators (ISS) or virtual machines.

Emulator: Emulators support debugging of the program while it executes on the target processor. An emulator typically consists of a debugger coupled with a board connected to the desktop processor via a cable. The board consists of the target processor plus some support circuitry (often another processor). The board may have another cable with a device having the same pin configuration as the target processor, allowing one to plug this device into a real embedded system. Such an in-circuit emulator enables one to control and monitor the program's execution in the actual embedded system circuit. In-circuit emulators are available for nearly any processor intended for embedded use, though they can be quite expensive if they are to run at real speeds.

20) The analog input voltage ranges from -5 V to +5 V for an 8-bit ADC. Determine the resolution and the digital output in binary when the input is -2 V, using the formula. Also trace using the successive approximation approach for verification. Write it in a tabular column.

a. Expected output:

d / ((2^n)-1) = (e-Vmin) / (Vmax-Vmin)

d = 76.5, so d = 76 or 77, i.e., 01001100 or 01001101

b. Resolution = (Vmax-Vmin) / ((2^n)-1)

=> (5-(-5))/255
=> 10/255 = 0.0392 V

c. Successive approximation approach

Step | Comparison of e with the midpoint | d (8-bit encoding)
1 | -2 < (5-5)/2 = 0 | 00000000
2 | -2 > (0-5)/2 = -2.5 | 01000000
3 | -2 < (0-2.5)/2 = -1.25 | 01000000
4 | -2 < (-1.25-2.5)/2 = -1.875 | 01000000
5 | -2 > (-1.875-2.5)/2 = -2.1875 | 01001000
6 | -2 > (-1.875-2.1875)/2 = -2.03125 | 01001100
7 | -2 < (-1.875-2.03125)/2 = -1.953125 | 01001100
8 | -2 < (-1.953125-2.03125)/2 = -1.9921875 | 01001100
