UP CLOSE
The compiler at the heart of open source is heading for a new release. GCC
fan and sometime contributor Biagio Lucini talks to leading developers for
an exclusive preview.
Nothing we do with open source would be possible without the compiler collection GCC. It may be mastered only by an inner circle of C++ gurus but it affects us all. It's GCC that allows your distributor to build the system you're running right now, and every improvement to it results in shorter execution times and smaller binaries. GCC is where the magic takes place, and that's why we're paying close attention to the major forthcoming release of GCC 4.0, a benchmark for the project. Mailing lists talk of faster optimisation, improved security and cool hacks. Given GCC's chequered history, we were keen to find out if these new additions are the result of harmonious exploration or acrimonious forks.

That history began in 1984, when Richard Stallman wrote the first chunk of GCC, the C front-end. In the same year the GNU project officially began, and it's no surprise that GCC is at the heart of it: it's hard to imagine how you could provide freely modifiable software without providing a way to convert the modifications into executable code.

Three years later, in 1987, Stallman decided to expand the front-end into a fully-blown compiler, beginning GCC's journey to the version 4.0 awaiting. Architectural limitations of this first release series were overcome in 1992 with the publication of version 2.0, which also added support for C++. GCC was beginning to be adopted as the official compiler on several software platforms (including Linux), and its 2.7 manifestation received special praise.

Fork ahead
Through the nineties, GCC development remained in the firm hands of the Free Software Foundation (FSF), which was more focused on stabilising than on improving the compiler. As a consequence, third-party patches were aimed at simplifying the building process on some architectures or adding functionalities were very often
WHAT IS SSA?
A framework for better optimisation. It will improve your life!

When writing code, it's common to reuse names of dummy variables. Take, for instance, the code snippet:

    a = 3;
    b = f(a);
    a = 4;

The a that appears at line 3 has nothing to do with the a at lines 1 and 2. What Static Single Assignment does is give a different name to logically independent variables, so each newly assigned variable must have a new name. In the SSA representation, the same code becomes:

    a1 = 3;
    b1 = f(a1);
    a2 = 4;

The scopes of the variables are now clearly exposed. This representation offers a powerful tool for analysing dependencies among different portions of a program, which is the starting point for effective optimisations.
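To make the renaming concrete, here is a minimal sketch in Python (our choice purely for illustration; it has nothing to do with how GCC implements the idea). The function name to_ssa and the (target, expression) statement format are hypothetical, and the sketch handles only straight-line code. Real SSA construction also has to merge values arriving from different branches and loops, which this toy ignores.

```python
import re

def to_ssa(statements):
    """Rename variables so that every name is assigned exactly once.

    Each statement is a (target, expression) pair such as ("b", "f(a)").
    Assumption: straight-line code only, with no branches or loops.
    """
    version = {}   # variable name -> latest version number assigned so far
    result = []
    for target, expr in statements:
        def rename_use(match):
            # A use refers to the most recent version of the variable.
            # Names never assigned here (such as the function f) are kept.
            name = match.group(0)
            return f"{name}{version[name]}" if name in version else name

        new_expr = re.sub(r"[A-Za-z_]\w*", rename_use, expr)
        # Every assignment creates a fresh version of its target.
        version[target] = version.get(target, 0) + 1
        result.append((f"{target}{version[target]}", new_expr))
    return result

snippet = [("a", "3"), ("b", "f(a)"), ("a", "4")]
for target, expr in to_ssa(snippet):
    print(f"{target} = {expr};")
```

Fed the three-line snippet from the box, the sketch gives the first a the name a1 and the second a2, so it is immediately visible that b1 depends only on a1.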
[Figure: The tree-SSA framework (taken from http://gcc.gnu.org/projects/tree-ssa/). GCC writers believe it will play a vital role in optimisation advances in future releases. Surviving labels: C trees, Java trees, genericise.]
[...] Fortran 77, Ada and Java) has been vastly improved. As a result of the language-independent infrastructure being revised, the generated code is generally faster than the corresponding 2.x executables, and support for more architectures has been added (there are few platforms to which GCC 3.x has not been ported).

Having learned its lesson with EGCS, GCC now welcomes new ideas and the transformed open nature of the development process is a large factor in GCC's success and swift development. CVS access is restricted to a few trusted developers and, as GCC is still the property of the FSF, all contributors need to sign a copyright transfer form to donate their code to the project. But there's plenty of room for developers who want to experiment with new constructs within the framework of GCC.

Everyone can contribute patches by sending them to gcc-patches@gcc.gnu.org. These will be peer reviewed, and if they're considered correct, adherent to GCC coding conventions and useful to the community, they will be checked into the main tree.

Patches that require heavy modification of the architecture undergo a stricter review process. First, the main code is forked. Then an unofficial distribution maintained in the form of a CVS branch of the main repository is started, to be periodically synchronised with mainline. Being experimental software, the criteria for code that's checked into a branch are less strict than those for mainline additions. If and when the branch proves to do useful work without destabilising the compiler, it will be merged with mainline. Otherwise it will have been just an interesting exercise. Many of GCC's major projects began life in one of these branches.

The projects are overseen by the steering committee, a group of leading developers who decide what direction GCC should follow. It includes developers from different companies and institutions (such as David Edelsohn, a K42 researcher at IBM, Jeff Law of Red Hat and Gerald Pfeifer, who works on Itanium at SUSE), with the aim of balancing different or even opposing needs within the user community.

Before a new version is released, its source code undergoes three different stages. In Stage One the project is under heavy development and major modifications can be accepted. In Stage Two only stabilisation of the approved features can be performed. Any major revision will go to a branch, which will be the basis for the version following the next one. At Stage Three the known bugs are fixed. The final check consists of analysing the results obtained by running the compiler on the provided test suite: there must be no regression with respect to the previous version before the compiler can be tagged with the release number.

The person responsible for this process is the release manager. Since version 3.0, the release manager for GCC has been Mark Mitchell (see Q&A, right).

High hopes
GCC is now at version 3.4.3, expected to be the last release in the successful 3.x series before the coming of version 4.0, which is at Stage Three in its development at the time of writing (and the chances are that it will be out by the time you read this).

The big jump in the release number reflects a major development: the adoption of a new optimisation framework that makes use of the Static Single Assignment (SSA) transformations. Once the framework matures, it will provide faster and better generated code and be the basis for further optimisation. The initial SSA implementation is largely
just a framework for the future, but the next few releases of GCC will include optimisations (tweaks, basically) based on this initial release.

To understand why the new optimisation framework will make such a difference, we have to take a step backward and talk about compilers in general. A compiler is a software program that transforms a text file written according to well-defined lexical and syntactical rules specified by the programming language into machine executable code. The compilation process comprises a parsing part, in which the source is validated; an optimisation part, in which the code is restructured for improving its performance; and a generation part, in which the executable is built. Technically, we refer to them respectively as the front-end, the middle-end and the back-end.

The three components do not have to be kept distinct, but if they aren't, to support x languages on y different architectures one would need to write x times y different compilers. As readers of Paul Hudson's LXF series on compilers will know, the clever way to reduce the work is to make sure that the middle-end is logically separated from the front-end and the back-end. If the middle-end also makes use of a representation of the source code that is not language-specific, front-ends of different languages can share it.

In the same way, it's possible to interface several back-ends to the same middle-end. For a compiler that follows this structure, to support x languages on y architectures you would need x + y separate projects emitting or accepting code according to the rules dictated by the middle-end. Fig 2 represents the structure of such a compiler.

[Fig 2: An ideal compiler that supports four languages on two different architectures. Languages 1 to 4 (front-ends) feed a single intermediate language (middle-end), which feeds Architectures 1 and 2 (back-ends).]

In principle, old versions of GCC have followed that structure, with the front-end emitting abstract syntax trees (ASTs) and the intermediate language being Register Transfer Language (RTL). Unfortunately for fans of smooth compiling, the ASTs generated by each front-end differ, and the RTL representation is not well suited for high-level optimisations. Each front-end has to know about optimisations, which apart from causing duplication of efforts means the quality of the generated code is dependent on the language and optimisation processes in the particular front-end.

What's the answer? The new tree-SSA framework, which will offer a language-independent infrastructure for optimisations, sitting as it does between the front-ends and the RTL (see What Is SSA? box, left).
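The x times y versus x + y argument can be sketched in a few lines of Python. Everything here is a made-up toy for illustration: the two "languages" (infix and prefix sums), the two "architectures" (a stack machine and a register machine) and every function name are our assumptions, not anything taken from GCC. The point is only that the optimisation in the middle is written once and shared.

```python
def parse_infix(source):
    """Front-end for a made-up infix language, e.g. '1 + 2 + 3'."""
    return [int(token) for token in source.split("+")]

def parse_prefix(source):
    """Front-end for a made-up prefix language, e.g. '(+ 1 2 3)'."""
    return [int(token) for token in source.strip("()").split()[1:]]

def fold_constants(operands):
    """Middle-end: one optimisation, written once for every language."""
    return [sum(operands)]

def emit_stack(operands):
    """Back-end for an imaginary stack machine."""
    return [f"PUSH {value}" for value in operands] + ["ADD"] * (len(operands) - 1)

def emit_register(operands):
    """Back-end for an imaginary register machine."""
    return [f"MOV r0, {operands[0]}"] + [f"ADD r0, {value}" for value in operands[1:]]

FRONT_ENDS = {"infix": parse_infix, "prefix": parse_prefix}
BACK_ENDS = {"stack": emit_stack, "register": emit_register}

def compile_source(source, language, architecture):
    """Any front-end can be paired with any back-end via the shared form."""
    intermediate = FRONT_ENDS[language](source)   # language-specific
    intermediate = fold_constants(intermediate)   # shared by all
    return BACK_ENDS[architecture](intermediate)  # architecture-specific

def components_needed(languages, architectures):
    """Return (monolithic compilers, separate front-end/back-end projects)."""
    return languages * architectures, languages + architectures

print(compile_source("1 + 2 + 3", "infix", "stack"))
print(compile_source("(+ 1 2 3)", "prefix", "register"))
print(components_needed(4, 2))
```

For the four languages and two architectures of Fig 2, the last line reports eight monolithic compilers against six front-end and back-end projects, and the gap widens with every language or architecture added.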
LXF: How have you been involved in GCC's development?
MM: I've enjoyed working on compilers and programming languages for a long time: in fact, my elementary school computer teacher was a wonderful woman who was very interested in programming languages. So I think I was doomed to like compilers from about age five!
My biggest role is release manager. I decide when it's time to officially release a new version of the compiler. I also help steer what changes go into the compiler at which points in the development cycle and I try to facilitate high-level technical conversations about the desirability of particular changes.
Historically, I've done a lot of development of the G++ compiler. I still do some of that, but now I'm working more on other things, including managing CodeSourcery's rapid growth. I can get a lot more done by helping others than by trying to do it all myself!

LXF: What are the goals of GCC?
MM: It depends a lot on who you ask. One of the challenges is that the goals of the various stakeholders are not uniform. Some people want to see releases very frequently so that improvements are always available to people. The distribution vendors want to see releases that contain the features their customers need on a schedule that works for them. Some people want maximum backwards compatibility with older versions of the compiler. Some people want strict conformance with language standards. It's a pretty diverse set of goals, and sometimes the goals are incompatible.

LXF: How is GCC developed?
MM: GCC is developed by a pretty large team. Most of the major contributors are now being paid for their efforts, which is somewhat different from five or ten years ago. But there's still a tremendous amount of volunteer effort as well. I don't want to name particular organisations because I'll probably leave somebody out, and I don't want to be accused of promoting particular interests. In general, the major contributors are software development businesses (like CodeSourcery), GNU/Linux distribution vendors, operating system vendors and hardware vendors.
The development model has come out of years of evolution. It's a balance between freeform development and a strictly top-down model. The GCC Steering Committee sets some high-level policies, but most technical decisions are being made by the individual maintainers. There's a lot of back-and-forth between the developers to work out how best to solve problems. We use peer review to check each other's work and decide on designs.

LXF: What can the end user expect from GCC 4.0?
MM: [...] contain the tree-SSA infrastructure. There are some programs that run a lot faster with GCC 4.0.
I think that GCC 4.1 will demonstrate even more of an across-the-board win. Frankly, replacing most all of the optimisers in GCC with brand-new technology, and having it (a) work, and (b) not generate worse code is a huge achievement!
GCC 4.0 also contains a Fortran 95 front-end. It's not as polished as C or C++ at this point, but it's coming along very nicely. The C++ front-end is substantially faster when compiling without optimisation. As always, there is support for more chip variants, newer versions of operating systems, and tons of bugfixes.

LXF: Has the availability of the Intel compilers had any impact on the development goals of GCC?
MM: I believe that competition is great for GCC. People say a lot of things, positive and negative, about the Intel compilers. I'm not going to do that; I've not examined them closely enough to say for sure. I'm confident
LXF: How did you get involved in GCC?
DN: I am originally from Argentina and came to Canada in 1993 to do a PhD in Computer Science at the University of Alberta. I started getting involved with compilers and ended up developing techniques for analysing and optimising concurrent programs.
In 1999 I came into contact with Cygnus and started working for the GCC team. Until then I only knew about GCC by name; I had played with it a little bit during my research, but not to any serious extent. After graduation, I relocated to Toronto and kept working on GCC (now as part of Red Hat, since [Cygnus was] acquired in late 1999).

LXF: What does it mean in practical terms to be the maintainer of a branch of GCC?
DN: The work isn't much different to what you do on mainline. Perhaps the major liability is merging changes from mainline into the branch. It's a delicate balance you have to strike: if you merge too often, you are bound to make the branch too unstable, particularly if mainline is in Stage 1, ie open to major changes. If you let too much time pass between merges, you may spend quite a few hours fixing merge problems, particularly if the branch is too active, like tree-SSA used to be.
Branches are not much different to mainline in terms of contributions either. First and foremost, you have to make sure that everyone contributing to the branch has all their FSF copyright paperwork in order.
As far as stability goes, branches also operate in stages. Initially, you allow just about any change that is reasonable, and as you are getting ready to merge into mainline you start clamping down. The tree-SSA branch was pretty flexible initially, but in the months prior to the final merge, I would not allow any patch that broke bootstraps on the 5 or 6 architectures I was testing. Even if the patch was not at fault, we would remove it and ask the author to figure it out.

LXF: Can you explain what tree-SSA is?
DN: Basically, it is an overhaul of GCC's optimisation infrastructure. With it, we can now implement optimisations like vectorisation and software pipelining that were difficult or impossible to implement on RTL. It also separates the front-ends from the back- and middle-ends so that adding new languages to GCC won't be nearly impossible anymore. Before, every front-end had intimate ties with the back-end and the internal interfaces were slim or non-existent.
As with any other internal infrastructure overhaul, these major changes typically mean little to the user. But in this case, the two major visible changes will be the inclusion of Fortran 95 and mudflap [a technology for checking run-time errors]. The new optimisations will probably help some users. For instance, the new scalarisation capabilities are likely to help C++ code with lots of short-lived small objects that were demoted to memory too early in previous versions of GCC. Also, the autovectorisation passes may come in handy for some codes.
I don't expect GCC 4.0 to do the job across the board, but the new architecture will certainly help us improve and maintain it a lot better than before.

LXF: How do you see the future of tree-SSA and of GCC in general?
DN: [...] We are also starting to add intermodule optimisations: optimisations that can work across function calls and even file boundaries. Explicit concurrency in the form of OpenMP [a shared-memory API] or something along those lines is also likely in the mid- to long term. Dynamic languages like Java will also benefit from the new architecture. People will be able to implement analyses like escape analysis and devirtualisation.

LXF: Do you plan to work on other innovative projects for GCC?
[...] version will be no worse than the experimental one.

The same applies to gfortran, which at the moment runs at about half the speed of the Intel Fortran Compiler version 8.1 in our self-developed Fortran 90 benchmark suite (we could not compare directly with GCC 3.4, since Fortran 90/95 support is a new feature of GCC 4.0).

With all this in mind, we tested the performance of the code generated by GCC 4.0 CVS with the SciMark2 benchmark suite (http://math.nist.gov/scimark2), designed for gauging the speed of floating-point operations, and did the same with GCC 3.4.3 and the Intel C Compiler release version 8.1. For the GNU compilers we used the optimisation flags

    gcc -O3 -funroll-loops -D__NO_MATH_INLINES -ffast-math -march=opteron -mfpmath=sse,387 -ftree-vectorize -onestep -fomit-frame-pointer -finline-functions -static

except for the -ftree-vectorize option, which is specific to tree-SSA (other tree-SSA optimisation options are automatically activated by the -O3 switch). For ICC we used:

    -O3 -tpp7 -xW -ipo -align -Zp16 -static

Without the static option, which would have hidden the features we were interested in. The compilation time on 4.0 was on average about 10% slower than on 3.4.3, and the size of the executable was about 2% larger. The generated code was then executed on a dual AMD Opteron 244 processor machine with 4GB of RAM. Measured performances are plotted in Fig 3 (for details about the various tests, refer to the home page of the benchmarks).

GCC 4.0 outperforms its predecessor in most tests, often by a wide margin. Even more excitingly, GCC 4.0 now runs neck and neck with the Intel compiler, and outperforms it by a significant margin in at least two tests. Still, at the moment a tedious optimisation bug (a wrong move of floating-point variables through integer registers) affects the performance of GCC 4.0. As this bug will be fixed before the official release, expect the official version to perform much better than in our tests. We don't expect you to have a dual Opteron on your desks, so we repeated the tests on a Pentium IV 1.7 GHz with 768 MB of RAM, which threw up roughly the same results.

The tests confirmed our hopes that GCC 4.0 will be a great release. But the GCC developers have no time to bask in the glory, since they are already working on new features and additions. GCC still lags behind commercial competitors in the high-performance computing market, and we expect this gap to be filled pretty soon. The GOMP project (http://gcc.gnu.org/projects/gomp), aimed at providing support for the powerful OpenMP parallel instruction extensions, is an initial step in that direction.

ACKNOWLEDGEMENTS
Thanks to Vladimir Marakov, Paolo Bonzini, Uros Bizjak and especially Richard Guenther for discussing optimisation flags in GCC 4.0.

PAUL BROOK: FORTRAN VISIONARY
Together with Steven Bosscher, Paul Brook made it his mission to have a Fortran 95 front-end as a part of the official GCC distribution. We asked Paul where the project's at today.

LXF: How long have you been working on GCC?
PB: I've been involved with GCC since I left university in 2002, and have been working for CodeSourcery on GCC for just over a year. I'm joint maintainer of the GCC ARM back-end and Fortran front-end, and spend most of my time working on these.

LXF: Why do you believe that GCC must support Fortran 95?
PB: Fortran is still quite widely used for computationally-intensive numerical simulations, particularly in academic institutions. It is quite common for new code to be written in Fortran 95, then combined with legacy Fortran 77 libraries.
Support for Fortran 95 is essential if GCC is to remain a viable alternative in this area. GCC's free availability and portability to a large number of hardware and OS platforms make it particularly attractive for a user wanting to develop an application on a local workstation, then migrate it to a high-performance cluster.

LXF: How did you get the idea of adding F95 support to GCC?
PB: My final year project at university involved modifying a fluid simulation code written in Fortran 95. I was frustrated by the lack of a free Fortran 95 compiler, which meant I was restricted to working on a few university machines.
After finishing university I joined the g95 project. At that time g95 could parse most Fortran 95 source, but had no real code generation capabilities. Like most recent university graduates I had quite a bit of spare time, so wrote the code to glue g95 and GCC together.

LXF: Why did you decide to fork g95?
PB: The original g95 author likes to keep very tight control of the project, ensuring that all code meets his personal standards and ways of doing things. We felt that it was important to have a more open development environment, and to work more closely with the rest of the GCC community. Our initial goal was to integrate gfortran into the main GCC CVS repository, making it part of official GCC releases.

LXF: Is there any cooperation among the two Fortran implementations of GCC? For instance, are you exchanging code for the libraries?
PB: No, not much. In practice the two projects have diverged sufficiently that most changes do not transfer easily. There has also been some difficulty obtaining up-to-date versions of the g95 source code.

LXF: What needs to be done to consider the implementation complete?
PB: Gfortran should still be considered beta quality. Most Fortran 95 language features have been implemented, and some large applications (eg the SPEC CPU2000 benchmarks) can be successfully compiled. However, there are still many bugs, and many of the language extensions supported by g77 aren't yet implemented.
I'll consider gfortran done when the few remaining corners of Fortran 95, and most of the extensions supported by g77, are working. GCC 4.0 will be the first GCC release to include gfortran. We expect that by then gfortran will be usable for many purposes, though it may not be suitable as a production compiler or as a direct replacement for g77.

LXF: Do you have any idea of how gfortran compares in terms of performance with commercial implementations such as Intel's?
PB: For Fortran 77 code gfortran should generate code that is at least as good as g77, and comparable to many commercial compilers. For some complex Fortran 95 code we generate code that is significantly slower than commercial compilers. Most of the work on gfortran is concerned with correct implementation of missing features: there's a lot of work left to do to improve performance. Having said that, gfortran uses the same optimisers as GCC and G++, so any improvements to these will benefit gfortran. GCC 4.0 will contain many new optimisations, like autovectorisation. These should help close the gap between gfortran and commercial compilers.