
© Attribution Non-Commercial (BY-NC)


Validity of the single processor approach to achieving large scale computing capabilities¹

Gene M. Amdahl
IBM, Sunnyvale, California

1 INTRODUCTION

For over a decade prophets have voiced the contention that the organization of a single computer has reached its limits and that truly significant advances can be made only by interconnection of a multiplicity of computers in such a manner as to permit cooperative solution. Variously the proper direction has been pointed out as general purpose computers with a generalized interconnection of memories, or as specialized computers with geometrically related memory interconnections and controlled by one or more instruction streams.

Demonstration is made of the continued validity of the single processor approach and of the weaknesses of the multiple processor approach in terms of application to real problems and their attendant irregularities.

The arguments presented are based on statistical characteristics of computation on computers over the last decade and upon the operational requirements within problems of physical interest. An additional reference will be one of the most thorough analyses of relative computer capabilities currently published: "Changes in Computer Performance," Datamation, September 1966, Professor Kenneth F. Knight, Stanford School of Business Administration.

The first characteristic of interest is the fraction of the computational load which is associated with data management housekeeping. This fraction has been very nearly constant for about ten years, and accounts for 40% of the executed instructions in production runs. In an entirely dedicated special purpose environment this might be reduced by a factor of two, but it is highly improbable that it could be reduced by a factor of three. The nature of this overhead appears to be sequential so that it is unlikely to be amenable to parallel processing techniques. Overhead alone would then place an upper limit on throughput of five to seven times the sequential processing rate, even if the housekeeping were done in a separate processor. The non-housekeeping part of the problem could exploit at most a processor of performance three to four times the performance of the housekeeping processor. A fairly obvious conclusion which can be drawn at this point is that the effort expended on achieving high parallel processing rates is wasted unless it is accompanied by achievements in sequential processing rates of very nearly the same magnitude.

¹ This paper was retyped into its present form by Guihai Chen, who hopes you will enjoy reading this historical paper.

Data management housekeeping is not the only problem to plague oversimplified approaches to high speed computation. The physical problems which are of practical interest tend to have rather significant complications. Examples of these complications are as follows: boundaries are likely to be irregular; interiors are inhomogeneous; computations required may be dependent on the states of the variables at each point; propagation rates of different physical effects may be quite different; the rate of convergence, or convergence at all, may be strongly dependent on sweeping through the array along different axes on succeeding passes; etc.

The effect of each of these complications is very severe on any computer organization based on geometrically related processors in a paralleled processing system. Even the existence of regular rectangular boundaries has the interesting property that for spatial dimension of N there are 3^N different point geometries to be dealt with in a nearest neighbor computation. If the second nearest neighbor were also involved, there would be 5^N different point geometries to contend with. An irregular boundary compounds this problem, as does an inhomogeneous interior. Computations which are dependent on the states of variables would require the processing at each point to consume approximately the same computational time as the sum of computations of all physical effects within a large region. Differences or changes in propagation rates may affect the mesh point relationships.
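
The 3^N and 5^N counts of point geometries quoted above can be reproduced by brute force. The Python sketch below is only an illustration under one assumed reading of "point geometry": a mesh point is classified, axis by axis, by which of its neighbors within a given distance still lie inside a regular rectangular mesh. The function name `count_point_geometries` and its parameters are hypothetical and do not come from the paper.

```python
from itertools import product


def count_point_geometries(dims, side, radius):
    """Count distinct neighbor-availability patterns on a side**dims mesh.

    Assumption: a "point geometry" is the pattern of which axis-aligned
    neighbors (up to `radius` steps away) exist inside the mesh.
    """
    offsets = [d for d in range(-radius, radius + 1) if d != 0]
    patterns = set()
    for point in product(range(side), repeat=dims):
        # For every axis, record which neighbor offsets stay in bounds.
        pattern = tuple(
            tuple(0 <= coord + d < side for d in offsets)
            for coord in point
        )
        patterns.add(pattern)
    return len(patterns)


if __name__ == "__main__":
    for n in (1, 2, 3):
        nearest = count_point_geometries(n, side=6, radius=1)
        second = count_point_geometries(n, side=6, radius=2)
        print(f"N={n}: nearest neighbor {nearest}, second nearest {second}")
```

Under this reading each axis contributes three cases for nearest neighbors (low boundary, interior, high boundary) and five cases once second nearest neighbors are involved, so the counts multiply to 3^N and 5^N as long as the mesh is large enough for every case to occur.
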
Ideally the computation of the action of the neighboring points upon the point under consideration involves their values at a previous time proportional to the mesh spacing and inversely proportional to the propagation rate. Since the time step is normally kept constant, a faster propagation rate for some effects would imply interactions with more distant points. Finally, the fairly common practice of sweeping through the mesh along different axes on succeeding passes poses problems of data management which affect all processors; however, it affects geometrically related processors more severely by requiring transposing all points in storage in addition to the revised input-output scheduling. A realistic assessment of the effect of these irregularities on a simplified and regularized abstraction of the problem yields a degradation in the vicinity of one-half to one order of magnitude.

To sum up the effects of data management housekeeping and of problem irregularities, the author has compared three different machine organizations involving approximately equal amounts of hardware. Machine A has thirty-two arithmetic execution units controlled by a single instruction stream. Machine B has pipelined arithmetic execution units with up to three overlapped operations on vectors of eight elements. Machine C has the same pipelined execution units, but initiation of individual operations at the same rate as Machine B permitted vector element operations. The performance of these three machines is plotted in Figure 1 as a function of the fraction of the number of instructions which permit parallelism. The probable region of operation is centered around a point corresponding to 25% data management overhead and 10% of the problem operations forced to be sequential.

Figure 1

The historic performance versus cost of computers has been explored very thoroughly by Professor Knight. The carefully analyzed data he presents reflects not just execution times for arithmetic operations and cost of minimum of recommended configurations. He includes memory capacity effects, input-output overlap experienced, and special functional capabilities. The best statistical fit obtained corresponds to a performance proportional to the square of cost at any technological level. This result very effectively supports the often invoked "Grosch's Law". Utilizing this analysis, one can argue that if twice the amount of hardware were exploited in a single system, one could expect to obtain four times the performance. The only difficulty is involved in knowing how to exploit this additional hardware. At any point in time it is difficult to foresee how the previous bottlenecks in a sequential computer will be effectively overcome. If it were easy they would not have been left as bottlenecks. It is true by historical example that the successive obstacles have been hurdled, so it is appropriate to quote the Rev. Adam Clayton Powell: "Keep the faith, baby!"

If alternatively one decided to improve the performance by putting two processors side by side with shared memory, one would find approximately 2.2 times as much hardware. The additional two tenths in hardware accomplishes the crossbar switching for the sharing. The resulting performance achieved would be about 1.8. The latter figure is derived from the assumption of each processor utilizing half of the memories about half of the time. The resulting memory conflicts in the shared system would extend the execution of one of two operations by one quarter of the execution time. The net result is a price performance degradation to 0.8 rather than an improvement to 2.0 for the single larger processor.

Comparative analysis with an associative processor is far less easy and obvious. Under certain conditions of regular formats there is a fairly direct approach. Consider an associative processor designed for pattern recognition in which decisions within individual elements are forwarded to some set of other elements. In the associative processor design the receiving elements would have a set of source addresses which recognize by associative techniques whether or not it was to receive the decision of the currently declaring element. To make a corresponding special purpose non-associative processor one would consider a receiving element and its source addresses as an instruction, with binary decision maintained in registers. Considering the use of the film memory, an associative cycle would be longer than a non-destructive read cycle. In such a real analogy the special purpose non-associative processor can be expected to take about one-fourth as many memory cycles as the associative version and only about one-sixth of the time. These figures were computed on the full recognition task with somewhat differing ratios in each phase. No blanket claim is intended here, but rather that each requirement should be investigated from both approaches.
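
The price-performance comparison between the single larger processor and the two-processor shared-memory system can be checked with a few lines of arithmetic. The Python sketch below is illustrative only. The helper names are hypothetical, the baseline machine is normalized to hardware 1.0 and performance 1.0, and the way the 1.8 figure is rebuilt (one of every two operations taking 1.25 times as long) is one assumed reading of the text, not a derivation given in the paper.

```python
def grosch_performance(hardware):
    """Relative performance under Grosch's Law: proportional to cost squared."""
    return hardware ** 2


def price_performance(performance, hardware):
    """Performance delivered per unit of hardware (baseline machine = 1.0)."""
    return performance / hardware


# Option 1: a single larger processor built from twice the hardware.
single_hw = 2.0
single_perf = grosch_performance(single_hw)

# Option 2: two baseline processors with shared memory.
# The extra 0.2 is the crossbar switching hardware quoted in the text.
dual_hw = 2.2
# Assumed reading: one of every two operations is extended by one quarter,
# so the mean operation time per processor is (1.0 + 1.25) / 2.
mean_op_time = (1.0 + 1.25) / 2
dual_perf = 2 / mean_op_time

if __name__ == "__main__":
    print(f"single: performance {single_perf:.1f}, "
          f"price-performance {price_performance(single_perf, single_hw):.1f}")
    print(f"dual:   performance {dual_perf:.1f}, "
          f"price-performance {price_performance(dual_perf, dual_hw):.1f}")
```

With these assumptions the single processor comes out at performance 4.0 and price-performance 2.0, while the shared-memory pair comes out at roughly 1.8 and 0.8, matching the figures in the text.
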

The very famous Amdahl's Law, presented in the following formula, is derived from this paper. However, Amdahl gave only a verbal description, which later authors paraphrased as follows:

$$\text{Speedup} = \frac{1}{r_s + \dfrac{r_p}{n}}$$

where $r_s + r_p = 1$, $r_s$ represents the ratio of the sequential portion in one program, $r_p$ the ratio of the parallel portion, and $n$ the number of processors.
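
As a minimal sketch of how the formula behaves, the Python functions below evaluate the speedup for a given sequential ratio r_s and processor count n, and the limit 1 / r_s that the speedup approaches as n grows. The function names are hypothetical. The sample values apply the formula to the housekeeping fractions discussed in the paper (40% reduced by a factor of two or three), which is an interpretation added here rather than a calculation shown by Amdahl.

```python
def amdahl_speedup(r_s, n):
    """Speedup = 1 / (r_s + r_p / n), with r_p = 1 - r_s."""
    if not 0.0 < r_s <= 1.0:
        raise ValueError("r_s must be in the interval (0, 1]")
    if n < 1:
        raise ValueError("n must be at least 1")
    r_p = 1.0 - r_s
    return 1.0 / (r_s + r_p / n)


def speedup_limit(r_s):
    """Upper bound on speedup as the number of processors grows without bound."""
    if not 0.0 < r_s <= 1.0:
        raise ValueError("r_s must be in the interval (0, 1]")
    return 1.0 / r_s


if __name__ == "__main__":
    # Housekeeping of 40% reduced by a factor of two, then by a factor of three.
    for r_s in (0.4 / 2, 0.4 / 3):
        print(f"r_s = {r_s:.3f}: limit {speedup_limit(r_s):.1f}, "
              f"with 32 processors {amdahl_speedup(r_s, 32):.2f}")
```

A sequential fraction of 20% caps the speedup at 5, and a fraction of about 13.3% caps it at 7.5, which is consistent with the paper's upper limit of five to seven times the sequential processing rate.
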

Only a small part of this paper, namely the fourth paragraph, contributes to Amdahl's Law. The paper also discusses some other important problems. For example, Amdahl foresaw many negative factors plaguing the parallel computation of irregular problems, such as:
1) boundaries are likely to be irregular;
2) interiors are inhomogeneous;
3) computations required may be dependent on the states of the variables at each point;
4) propagation rates of different physical effects may be quite different;
5) the rate of convergence, or convergence at all, may be strongly dependent on sweeping through the array along different axes on succeeding passes, etc.
