
Pipelined Instruction Processing

In

Computer Organization and Architecture

Vishal Patyal
Reg. No = 3050060122

97806-66742 (M)
(0190) 5230-650 (Home)
patyal.vishal@gmail.com

Term Paper for CSE-211


Computer Arithmetic

Fifth Term 2008

ABSTRACT

This term paper is a minor project for Computer System Organization and Architecture in which I studied performance metrics for parallel systems, a topic that will be very helpful for future work in the IT industry. One frequently needs to compare the performance of two or more parallel computers, but how should this be done? This report discusses some of these issues so that the problems they can cause may be avoided in the future. We review the many performance metrics that have been proposed for parallel systems (i.e., program-architecture combinations), including the many variants of speedup, efficiency, and isoefficiency, and give reasons why none of these metrics should be used independent of the run time of the parallel system.

Keywords: Principle, Problems and Solution, Advantages & Disadvantages, Examples, Implementation, Development, Conclusion, References

Acknowledgements:

First of all, I would like to express my sincere gratitude to the Almighty for encouraging me to complete this term paper. The following are some important people in my life who gave me strength and valuable suggestions to complete the task.

First, my parents and friends, whose love and affection give me the strength and encouragement to face the challenges of life.

Second, my teacher Bijoy Chatterjee, whose inspiration, motivation, and spiritual guidance always keep me on the right path and keep my faith in God Almighty, without whose blessings nothing is possible.

Finally, I thank Lovely Professional University, which gave me this great opportunity to write the term paper.

DATED: VISHAL PATYAL

3050060122
CONTENTS:

An introduction to performance metrics

Performance metrics for parallel systems
    1. Execution time
    2. Total parallel overhead

Performance metrics for parallel systems: Speedup

Performance metrics for parallel systems: Superlinear speedups

Performance metrics for parallel systems: Efficiency

File systems for parallel I/O

Evaluating performance metrics for parallel systems

An introduction to performance metrics
1. The term performance

Performance can be considered a sub-characteristic of dependability, although the classical literature does not include it in that definition. Users perceive the performance of a system as bad when, for example, response times are high or requests are denied due to overload.

Some of the most commonly used performance metrics are response time, throughput, and utilization.

Response time is defined as the time interval between a user's request for a service and the response of the system. This metric is sometimes called responsiveness. As responses often cannot be delivered instantaneously, there are two possible interpretations of this definition: the response time is either the interval between the user issuing a request and the system starting to respond, or the interval until the system finishes responding.

Performance metrics are measures of an organization's activities and performance. Performance metrics should support a range of stakeholder needs, from customers and shareholders to employees. While traditionally many metrics are finance-based, focusing inwardly on the performance of the organization, metrics may also focus on performance against customer requirements and value.

Developing performance metrics usually follows a process of:

1. Establishing critical processes/customer requirements,


2. Developing measures,
3. Establishing targets against which the results can be scored.

A criticism of performance metrics is that when the value of information is computed using
mathematical methods, it shows that even performance metrics professionals choose measures
that have little value. This is referred to as the "measurement inversion". For example, metrics
seem to emphasize what organizations find immediately measurable — even if those are low
value — and tend to ignore high value measurements simply because they seem harder to
measure (whether they are or not).
Performance Metrics for Parallel Systems
It is important to study the performance of parallel programs with a view to determining the best
algorithm, evaluating hardware platforms, and examining the benefits from parallelism. A
number of metrics have been used based on the desired outcome of performance analysis.

1. Execution time

The serial runtime of a program is the time elapsed between the beginning and the end of its
execution on a sequential computer. The parallel runtime is the time that elapses from the
moment a parallel computation starts to the moment the last processing element finishes
execution. We denote the serial runtime by TS and the parallel runtime by TP.

2. Total parallel overhead

The overheads incurred by a parallel program are encapsulated into a single expression referred
to as the overhead function. We define overhead function or total overhead of a parallel system
as the total time collectively spent by all the processing elements over and above that required by
the fastest known sequential algorithm for solving the same problem on a single processing
element. We denote the overhead function of a parallel system by the symbol To.

The total time spent in solving a problem summed over all processing elements is pTP. TS units of this time are spent performing useful work, and the remainder is overhead. Therefore, the overhead function (To) is given by To = pTP - TS.
1. Performance metrics for parallel system: Execution time

•Serial runtime of a program is the time elapsed between the


beginning and the end of its execution on a sequential computer.

•The parallel runtime is the time that elapses from the moment the
first processor starts to the moment the last processor finishes
execution.

•We denote the serial runtime by Ts

and the parallel runtime by Tp

2. Performance metrics for parallel systems: Total parallel overhead

• Let Tall be the total time collectively spent by all the processing elements.

• TS is the serial time.

• Observe that Tall - TS is then the total time spent by all processors combined on non-useful work. This is called the total overhead.

• The total time collectively spent by all the processing elements is Tall = p TP (p is the number of processors).

• The overhead function (To) is therefore given by To = p TP - TS.
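The overhead definition above can be sketched in a few lines of Python; the timing values below are hypothetical, chosen purely for illustration.

```python
def total_overhead(p, t_parallel, t_serial):
    """Total overhead To = p*Tp - Ts: time spent by all processing
    elements beyond the useful work of the best serial algorithm."""
    return p * t_parallel - t_serial

# Hypothetical timings: 4 processors, parallel runtime 30 s,
# best known serial runtime 100 s.
overhead = total_overhead(p=4, t_parallel=30.0, t_serial=100.0)
print(overhead)  # 4*30 - 100 = 20.0 s of collective non-useful work
```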

4. Performance metrics for parallel systems: Speedup

• Speedup (S) is the ratio of the time taken to solve a problem on a single processor to the time required to solve the same problem on a parallel computer with p identical processing elements: S = TS / TP.

Performance metrics: Speedup example

• Consider the problem of adding n numbers by using n processing elements.

• If n is a power of two, we can perform this operation in log n steps by propagating partial sums up a logical binary tree of processors.

Example: computing the global sum of 16 partial sums using 16 processing elements. Σ_i^j denotes the sum of the numbers with consecutive labels from i to j.

Performance metrics: Example (continued)

• If an addition takes constant time, say, tc and communication

of a single word takes time ts + tw, we have the parallel time

TP = Θ (log n)

• We know that TS = Θ (n)


• Speedup S is given by S = Θ (n / log n)
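The log n-step tree reduction above can be simulated sequentially; the sketch below (names are mine, not from the text) counts parallel time steps rather than measuring real time.

```python
def tree_sum(values):
    """Simulate summing n numbers on n processing elements:
    at each step, pairs of partial sums 'stride' apart combine,
    so a power-of-two n finishes in log2(n) steps."""
    n = len(values)
    assert n > 0 and n & (n - 1) == 0, "n must be a power of two"
    sums = list(values)
    stride, steps = 1, 0
    while stride < n:
        # Every combination in this pass happens concurrently on real
        # hardware, so the whole pass counts as a single time step.
        for i in range(0, n, 2 * stride):
            sums[i] += sums[i + stride]
        stride *= 2
        steps += 1
    return sums[0], steps

total, steps = tree_sum(list(range(16)))
print(total, steps)  # 120 in log2(16) = 4 steps
```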

Performance metrics: Speedup

• For a given problem, there might be many serial algorithms available. These algorithms
may have different asymptotic runtimes and may be parallelizable to different degrees.
• For the purpose of computing speedup, we always consider the best sequential program
as the baseline.
• Consider the problem of parallel bubble sort.
• The serial time for bubble sort is 150 seconds.
• The parallel time for odd-even sort (efficient parallelization of bubble sort) is 40 seconds.
• The speedup would appear to be 150/40 = 3.75.
• But is this really a fair assessment of the system?
• What if serial quick sort only took 30 seconds? In this case, the speedup is 30/40 = 0.75.
This is a more realistic assessment of the system.
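The bubble-sort discussion boils down to choosing the right baseline; a minimal sketch using the timings quoted above:

```python
def speedup(serial_time, parallel_time):
    """Speedup S = Ts / Tp, with Ts the best known serial time."""
    return serial_time / parallel_time

# Timings from the example above (seconds).
naive = speedup(150, 40)  # vs. serial bubble sort: 3.75
fair = speedup(30, 40)    # vs. serial quicksort, the best baseline: 0.75
print(naive, fair)
```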
5. Performance metrics for parallel systems: Superlinear speedups

Resource-based superlinearity: the higher aggregate cache/memory bandwidth can result in better cache-hit ratios, and therefore superlinearity.

Example: a processor with 64 KB of cache yields an 80% hit ratio. If two processors are used, then, since the problem size per processor is smaller, the hit ratio goes up to 90%. Of the remaining 10% of accesses, 8% come from local memory and 2% from remote memory.

If DRAM access time is 100 ns, cache access time is 2 ns, and remote memory access time is 400 ns, this corresponds to a speedup of 2.43!

6. Performance metrics for parallel systems: Efficiency

Efficiency is a measure of the fraction of time for which a processing element is usefully employed. It is defined as the ratio of speedup to the number of processing elements: E = S / p.
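Efficiency ties speedup to processor count; a small sketch applying the standard definition E = S / p to the earlier n-number addition example:

```python
import math

def efficiency(speedup, p):
    """Efficiency E = S / p: the fraction of time each processing
    element spends doing useful work."""
    return speedup / p

# For adding n numbers on p = n processors, S = n / log2(n) (ignoring
# constant factors), so efficiency falls off as 1 / log2(n).
n = 16
s = n / math.log2(n)
print(efficiency(s, n))  # 0.25, i.e. 1 / log2(16)
```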

File systems for parallel I/O

The section opens with a survey of the "The Vesta parallel file system," (Peter F. Corbett and
Dror G. Feitelson, 1996). IBM's AIX Parallel I/O File System, offered on the SP2, is based on
Vesta. While many previous parallel file systems hide how a file is distributed (striped) across
nodes, Vesta provides user interface-level access to a file's parallel structure. Vesta also aims to
facilitate file access via different types of decomposition of a file's data (i.e., data might be
accessed by row, and also by column). A Vesta file has a 2-dimensional structure, called a cell.
A cell is defined as "a container where data can be deposited, or alternatively, as virtual I/O
nodes that are then mapped to available physical I/O nodes." Thus, one file dimension is the cell
dimension, and the other is the data within the cell. As a file is written, a parameter specifies how
many cells it can occupy. Given that the number of cells determines the amount of parallelism a file uses, that feature is especially useful in the presence of small files, since distributing a small file over many nodes results in potentially poor access to that file. The article describes many
other Vesta features, and offers an evaluation of the file system's performance using a parallel
sorting application as an example.

"The Zebra striped network file system," (John H. Hartman and John K. Ousterhout, 1995),
describes an extension to log structured file systems. In such a system, "each client forms its new
data for all files into a sequential log that it stripes across the storage servers." That batching of
writes from each node results in much improved performance, especially in the presence of many
small writes. Zebra also takes advantage of RAID-style redundancy techniques: It can recover
from a single server failure without loss of information (but not from multiple concurrent server
failures). File block creations, updates, and deletes are communicated between nodes in the form
of deltas. The paper gives an overview of log-structured file systems, and then outlines Zebra's
components (including a "stripe cleaner" to reclaim unused space). It concludes with a
performance evaluation focusing on small-file writes.

Many parallel file systems hide the striping and other data distribution strategies from client
applications in order to simplify the interface application programmers must be aware of.
However, that facade also makes it difficult to experiment with, and explore, distributed file
system characteristics in general. "PPFS: A High-Performance Portable Parallel File System"
(James V. Huber, Jr., Christopher L. Elford, Daniel A. Reed, Andrew A. Chien, and David S.
Blumenthal), which is an updated version of a 1995 original paper, defines a parallel file system
that lends itself to experimentation. It is a rich user-level library implemented on top of an
operating system's native file system, effectively interposing itself between that OS and an
application program. The article outlines PPFS's design philosophy, and provides two benchmark
application results, one from a read-intensive gene sequencing application, and the other based
on a write-intensive electron scattering code from plasma physics.

"The Global File system" (Steven R. Soltis, Thomas M. Ruwart, Grant M. Erickson, Kenneth W.
Preslan, and Matthew T. O'Keefe, 1996) is a cluster-based distributed file system. Instead of
focusing on serving a large number of clients, GFS aims to deliver high performance to a
relatively small number of systems. Its target applications are those requiring high bandwidth
and large amounts of storage space, such as multimedia information systems. GFS takes
advantage of device locks to ensure data consistency, and outlines a file access pattern that is
somewhat similar to how processes access shared memory in an SMP. In GFS, clients obtain a
lock when they read or write data on shared network storage, and those locks correspond to
device-level locks on the storage system; in the prototype implementation, those locks are
implemented with the SCSI Dlock command. Because that design allows GFS to atomically
modify data, GFS clients can remain unaware of each other's presence. To provide a unified
logical storage space, GFS collects storage devices into a network storage pool; subpools divide
network storage pools based on device type. GFS enables clients to export file systems to
machines not directly connected to a storage pool: a GFS client can act as an NFS server, and
can even serve HTTP clients, enabling access to the storage pool via the Web.

"Serverless network file systems" (1996, Thomas E. Anderson, Michael D. Dahlin, Jeanna M.
Neefe Matthews, David A. Patterson, Drew S. Roselli, and Randolph Y. Wang) describes xFS,
the file system for the network of workstations (NOW) system. NOW defines a cooperation
mechanism for independent workstations. For storage, that implies that there is no central server,
and that file system services are provided collectively by the workstations: Any workstation on
the network can store, cache, and access any block of data. Since any machine on the network
can fail as well, xFS also provides for the redundancy of data. xFS incorporates previous
research and extends it to enable a serverless mode of operation. The paper discusses RAID, log-
structured storage, Zebra, different multiprocessor cache consistency mechanisms, and then
focuses on how that research is adapted to xFS's unique needs. For instance, xFS distributes file
system management services among the workstations: It attempts to assign a file used by a client
to a manager located on that client machine. The paper describes a simulation study showing that
co-locating a file's management with that file's client improves locality, and thereby significantly
reduces network-bound client requests. Data distribution in xFS is achieved by a software
implementation of RAID, utilizing a Zebra-like log-based striping.
Evaluating performance metrics for parallel systems:

In conducting any evaluation, we need to identify a set of performance metrics that we would
like to measure and the techniques and tools that will be used to gather these metrics.

Metrics which capture the “available” computer power are often not a true indicator of the
performance actually “delivered” by a parallel system. Metrics for parallel systems performance
evaluation should quantify this gap between available and delivered compute power since
understanding application and architectural bottlenecks is crucial for application restructuring and
architectural enhancements. Many performance metrics such as speed up, scaled speed up and
isoefficiency, have been proposed to quantify the match between the application and architecture
in a parallel system. While these metrics are useful for tracking overall performance trends, they
provide little additional information about where performance is lost. Some of these metrics
attempt to identify the cause of the problem when the parallel system does not scale as expected.
However, it is essential to find the individual application and architectural artifacts that lead to
these bottlenecks and quantify their relative contribution towards limiting the overall scalability
of the system. Traditional metrics do not help further in this regard.

Parallel system overheads may be broadly classified into a purely algorithmic component, a component arising from the interaction of the application with the system software, and a component arising from the interaction of the application with the hardware.

Evaluation Techniques


Experimentation, analytical modeling and simulation are three well-known techniques for
evaluating parallel systems. Experimentation involves implementing the application on the actual
hardware and measuring its performance. Analytical models abstract hardware and application
details in a parallel system and capture complex system features by simple mathematical
formulae. These formulae are usually parameterized by a limited number of degrees of freedom
so that the analysis is kept tractable. Simulation is a valuable technique which exploits computer
resources to model and imitate the behavior of a real system in a controlled manner. Each
technique has its own limitations. The amount of statistics that can be gleaned by
experimentation (to quantify the overhead functions) is limited by the
monitoring/instrumentation support provided by the underlying system. Additional
instrumentation can sometimes disturb the evaluation. Analytical models are often criticized for
the unrealism and simplifying assumptions made in expressing the complex interaction between
the application and the architecture. Simulating a realistic computer system demands considerable
resources in terms of space and time.
