Michiaki Tatsubori
Akihiko Tozawa
Toyotaro Suzumura
Scott Trent
Tamiya Onodera
Abstract
Programmers who develop Web applications often use dynamic
scripting languages such as Perl, PHP, Python, and Ruby. For general-purpose scripting, interpreter-based implementations are efficient and popular, but server-side use for Web
application development offers an opportunity to significantly enhance Web server throughput. This paper summarizes a study of the
optimization of PHP script processing. We developed a PHP processor, P9, by adapting an existing production-quality just-in-time
(JIT) compiler for a Java virtual machine, for which optimization
technologies are well established, especially for server-side
applications. This paper describes and contrasts microbenchmark
and SPECweb2005 benchmark results for a well-tuned configuration of a traditional PHP interpreter and our JIT compiler-based implementation, P9. Experimental results with the microbenchmarks
show a 2.5-9.5x advantage with P9, and the SPECweb2005 measurements show 20-30% improvements. These results show that the
acceleration of dynamic scripting language processing does matter in a realistic Web application server environment. CPU usage
profiling shows that simply introducing our JIT compiler reduces the
PHP core runtime overhead from 45% to 13% for a SPECweb2005
scenario, implying that further improvements of dynamic compilers would provide little additional return unless other major overheads, such as heavy memory copying between the language runtime
and the Web server front end, are reduced.
Categories and Subject Descriptors D.3.4 [Programming Languages]: Processors: Run-time environments, Compilers, Optimization
General Terms
Keywords
1. Introduction
Dynamic scripting languages such as Perl, PHP, Python, and
Ruby have become enormously popular for quickly implementing Web applications, and are widely used to access databases and
other middleware. In particular, PHP has been one of the most popular server-side programming languages [25]. A dynamic scripting
language runtime is usually an interpreter-based implementation,
making it easy to support scripting language-specific features such
Permission to make digital or hard copies of all or part of this work for personal or
classroom use is granted without fee provided that copies are not made or distributed
for profit or commercial advantage and that copies bear this notice and the full citation
on the first page. To copy otherwise, to republish, to post on servers or to redistribute
to lists, requires prior specific permission and/or a fee.
VEE'10, March 17-19, 2010, Pittsburgh, Pennsylvania, USA.
Copyright © 2010 ACM 978-1-60558-910-7/10/03...$10.00
Web server benchmark, showing that the acceleration of dynamic scripting language processing in a realistic Web application server environment does matter and that applying an existing JIT compiler simply adapted to PHP processing works
effectively, and
performance profiling which shows the simple JIT introduction
main.php:
<html>
<body>
<?php
include "hello.php";
hello("World");
?>
</body>
</html>
hello.php:
<?php
function hello($nm) {
  echo "Hello, " . $nm;
}
?>
Resulting output:
<html>
<body>
Hello, World
</body>
</html>
op                   extension/operands
-------------------  --------------------
ECHO                 <html>\n<body>\n
INCLUDE_OR_EVAL      hello.php, INCLUDE
INIT_FCALL_BY_NAME   hello
SEND_VAL             World
DO_FCALL_BY_NAME     1
ECHO                 \n</body></html>\n
RETURN               1
Bytecode
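To make the bytecode listing concrete, the following C sketch shows the kind of fetch-and-dispatch loop an interpreter runs over such code. It is a hypothetical, heavily simplified dispatcher, not Zend's implementation: the opcode names mirror the listing, but the Instr layout, the emit output buffer, and the hard-wired hello stand-in are assumptions made for illustration, and INCLUDE_OR_EVAL is left out.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical opcode subset modeled on the listing above. INCLUDE_OR_EVAL
   is left out, so hello() is assumed to be defined already. */
typedef enum { OP_ECHO, OP_INIT_FCALL, OP_SEND_VAL, OP_DO_FCALL, OP_RETURN } Opcode;

typedef struct {
    Opcode op;
    const char *operand;            /* string operand, or NULL */
} Instr;

static char out[256];               /* stands in for the HTTP response body */

static void emit(const char *s) {
    strncat(out, s, sizeof out - strlen(out) - 1);
}

/* Stand-in for the user function hello($nm) defined in hello.php. */
static void hello(const char *nm) {
    emit("Hello, ");
    emit(nm);
}

/* Fetch-decode-dispatch loop: the script is parsed once into an Instr
   array, which can then be executed repeatedly without re-parsing. */
static int run(const Instr *code) {
    const char *callee = NULL, *arg = NULL;
    out[0] = '\0';
    for (const Instr *pc = code;; pc++) {
        switch (pc->op) {
        case OP_ECHO:       emit(pc->operand); break;
        case OP_INIT_FCALL: callee = pc->operand; break;
        case OP_SEND_VAL:   arg = pc->operand; break;
        case OP_DO_FCALL:
            assert(callee != NULL && strcmp(callee, "hello") == 0);
            hello(arg);
            break;
        case OP_RETURN:     return 1;
        }
    }
}

/* main.php from the listing, minus the INCLUDE_OR_EVAL step. */
static const Instr main_php[] = {
    {OP_ECHO, "<html>\n<body>\n"},
    {OP_INIT_FCALL, "hello"},
    {OP_SEND_VAL, "World"},
    {OP_DO_FCALL, NULL},
    {OP_ECHO, "\n</body></html>\n"},
    {OP_RETURN, NULL},
};
```

Calling run(main_php) leaves the page text in out. Every instruction still pays for a switch dispatch at run time, which is the interpretive overhead a JIT compiler removes.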
For improved efficiency, interpreter-based runtimes typically compile their source code into some kind of intermediate code, often
called bytecode. This contributes to higher performance by:
Avoiding duplicated lexical and syntactical parsing within a
[Figure: structure of the P9 compiler: Bytecode -> IL Generator -> TR-IL (trees & CFG) -> Optimizers -> Code Generators -> instructions & meta-data]
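typedef enum {
    PNull,
    PBool,
    PInt,
    PStr1,
    PDouble,
    PReference,
    PObject,
    PStr,
    PArray,
    PResource
} P9Tag;

/* Payload types, assumed to be structs defined elsewhere in the runtime. */
typedef struct P9Double    P9Double;
typedef struct P9Reference P9Reference;
typedef struct P9Object    P9Object;
typedef struct P9String    P9String;
typedef struct P9Array     P9Array;
typedef struct P9Resource  P9Resource;

typedef struct {
    P9Tag type;                  /* selects the valid union member */
    union {
        long         longVal;
        char         charVal;
        P9Double*    dblPtr;
        P9Reference* refPtr;
        P9Object*    objPtr;
        P9String*    strPtr;
        P9Array*     aryPtr;
        P9Resource*  resPtr;
    } value;
} P9Value;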
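Runtime helper functions operate on such tagged values by dispatching on the tag. The C sketch below is a hypothetical helper implementing PHP's conversion to boolean over a simplified subset of the P9Value and P9Tag definitions above. Unlike P9, it stores the double unboxed rather than through a P9Double pointer, keeps strings as plain C strings, and assumes booleans are held in longVal; Tag, Value, and to_bool are illustrative names only.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Simplified, hypothetical subset of P9Tag/P9Value: the double is stored
   unboxed and strings are plain NUL-terminated C strings. */
typedef enum { PNull, PBool, PInt, PDouble, PStr } Tag;

typedef struct {
    Tag type;
    union {
        long longVal;               /* PInt; PBool stored as 0 or 1 (assumption) */
        double dblVal;              /* PDouble */
        const char *strVal;         /* PStr */
    } value;
} Value;

/* Hypothetical helper: PHP's conversion to boolean. A JIT can emit a call
   to such a helper when it cannot determine the operand type statically. */
static bool to_bool(const Value *v) {
    switch (v->type) {
    case PNull:   return false;
    case PBool:
    case PInt:    return v->value.longVal != 0;
    case PDouble: return v->value.dblVal != 0.0;
    case PStr:    /* "" and "0" are false; every other string is true */
        return v->value.strVal[0] != '\0' && strcmp(v->value.strVal, "0") != 0;
    }
    assert(!"unknown tag");
    return false;
}
```

Keeping small scalars such as longVal inline in the value avoids a heap allocation for the most common cases, while larger payloads are reached through a pointer.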
Helper Functions
Variables Layout
To handle such variable scoping, the Zend PHP runtime allocates variables using associative arrays, which can be searched by
using the names of the variables. However, this is clearly inefficient. The first optimization for scopes is to use the machine stack
frame to access variables.
Our Java JIT compiler provides an aggregate type whose component values can be accessed by index. We initially
allocate the local variable frame on the stack as an aggregate value.
The compiler then translates a normal local variable access, such
as $x=$y, into an index-lookup operation on this aggregate structure, which in turn is translated into a pair of memory access instructions (see footnote 3). In contrast, other accesses, such as variable-variable
access, use the variable table, which is an associative array mapping each variable name to a location inside the aggregate
value.
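A minimal C sketch of this layout follows. It assumes a function with only two locals, $x and $y, whose slots hold plain long values instead of full P9Value cells, and uses a C struct in place of the JIT's aggregate type. Frame, SLOT_X, SLOT_Y, var_table, assign_x_from_y, and lookup_dynamic are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical frame for a function with locals $x and $y. The compiler
   assigns each statically named variable a fixed slot index. */
enum { SLOT_X, SLOT_Y, NUM_SLOTS };

typedef struct {
    long slots[NUM_SLOTS];      /* the aggregate allocated on the stack */
} Frame;

/* Variable table mapping names to slot indices; consulted only for
   dynamic accesses such as variable-variables ($$name). */
static const struct { const char *name; int slot; } var_table[] = {
    {"x", SLOT_X},
    {"y", SLOT_Y},
};

/* $x=$y: one load and one store at constant offsets, no name lookup. */
static void assign_x_from_y(Frame *f) {
    f->slots[SLOT_X] = f->slots[SLOT_Y];
}

/* $$name: resolve the name at run time, then index into the frame.
   This sketch simply returns NULL for an unknown name. */
static long *lookup_dynamic(Frame *f, const char *name) {
    for (size_t i = 0; i < sizeof var_table / sizeof var_table[0]; i++)
        if (strcmp(var_table[i].name, name) == 0)
            return &f->slots[var_table[i].slot];
    return NULL;
}
```

Both paths reach the same storage, so a variable-variable write through lookup_dynamic is visible to later statically compiled accesses, while the common static case never touches the table.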
3.6
This section describes some other design choices for the
runtime. They are peripheral to the main concerns
of this paper, but must be explained because they greatly affect the
performance of the runtime and therefore our experimental results.
Memory Management
While asynchronous garbage collection mechanisms (frequently
used in Java) are often best for good throughput, we had to use
reference counting for three reasons:
To manage the lifetimes of the PHP references and resources.
To manage the copy-on-write operations for associative arrays [23].
To reclaim a heap cell when the reference count becomes 0.
The first reason is due to the PHP language semantics. The second
reason significantly improves the performance of the runtime, since
values are copied for each assignment to a variable in PHP, so that
a naive implementation will suffer from very poor performance.
For the third reason, we consider reference counting a reasonable
alternative to asynchronous garbage collection, since there is no
additional cost to offset: reference counting is already required
for the first two reasons.
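The following C sketch illustrates how a single reference count can support both copy-on-write and reclamation: assignment shares an array by incrementing its count, a write copies the array only if it is shared, and the cell is freed when the count drops to 0. It is not P9's implementation; it assumes a flat array of longs instead of PHP's ordered hash maps, and Array, array_new, array_assign, array_set, and array_release are hypothetical names.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical reference-counted array of longs (a flat vector stands in
   for PHP's ordered hash maps to keep the sketch short). */
typedef struct {
    int refcount;
    size_t len;
    long data[];                /* C99 flexible array member */
} Array;

static Array *array_new(size_t len) {
    Array *a = calloc(1, sizeof *a + len * sizeof a->data[0]);
    assert(a != NULL);
    a->refcount = 1;
    a->len = len;
    return a;
}

/* $b = $a: share the array instead of copying it eagerly. */
static Array *array_assign(Array *a) {
    a->refcount++;
    return a;
}

/* Drop one reference; reclaim the heap cell when the count reaches 0. */
static void array_release(Array *a) {
    if (--a->refcount == 0)
        free(a);
}

/* $a[i] = v: separate a private copy first if the array is shared. */
static Array *array_set(Array *a, size_t i, long v) {
    assert(i < a->len);
    if (a->refcount > 1) {
        Array *copy = array_new(a->len);
        memcpy(copy->data, a->data, a->len * sizeof a->data[0]);
        array_release(a);
        a = copy;
    }
    a->data[i] = v;
    return a;
}
```

Without the count, every PHP assignment would have to copy the whole array eagerly; with it, the copy happens only when a shared array is actually modified.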
2 This
3 Due
4. Microbenchmarks
Before discussing the end-to-end Web server benchmark results,
we discuss some microbenchmark results here. The test system
was an IBM IntelliStation M Pro with a 3.4GHz Xeon uniprocessor
running Fedora Core 7 (kernel 2.6.23).
4.1 PHPBench
[Figure 6: bar chart with one bar per PHPBench test; the rotated test-name labels are not recoverable from this copy]
Figure 6. Relative elapsed time of P9 compared to PHP 5 for each test in the PHPBench benchmark suite (lower is better)
Java virtual machine, their benefit seems to be limited due to possible gaps between the expressive power of Java bytecode and that of the dynamic
scripting languages implemented on top of Java.
Environmental Set-Up
[Figure 7: y-axis: relative elapsed time to Java with JIT (log scale, 1 to 1000); benchmarks: Fibonacci, Levenshtein, Quick Sort; runtimes: PHP 4, PHP 5, P8 1.1, P9, Python, Jython, JRuby, Java int, Java noopt]
Figure 7. Mini-application benchmark results with relative elapsed time to Java in a log scale (lower is better)
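[Figure: experimental set-up with the client, the Lighttpd Web server connected through FastCGI to several PHP runtime processes, and the Backend Simulator]
[Table: characteristics of the SPECweb2005 scenarios; cell values not recoverable from this copy]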
is transmitted using SSL encryption and 70% of the data transmitted is generated for dynamic webpages. Finally, the vendor support
site provides downloads of large unencrypted support files such as
manuals and software upgrades. Since this scenario primarily allows for accessing large and non-confidential static files, there is
no encryption, and only 12% of the data transmitted is generated
through dynamic webpages. Table 1 summarizes our analysis of
the characteristics of the experimental data.
A typical SPECweb2005 test bed has multiple client machines
controlled by a so-called Prime Client to provide a load on the System Under Test (SUT), thus simulating hundreds or thousands of
users accessing the scenario websites. To reflect a modern multitier Web server architecture, SPECweb2005 uses one or more machines to serve as a Back-End SIMulator (BESIM), emulating the
function of a Back-End database server.
Results
Figure 9 shows the result of running our test with the SPECweb2005 banking scenario, which includes 18 dynamic pages visited by each client based on the fixed think time delay and page
transition probability. The size of the workload is specified by the
number of clients. The response time for each page request is classified as GOOD (within 2 seconds), TOLERABLE (within 4 seconds), and FAIL. For example, with a workload of 1,200 clients
in the Banking scenario (top two graphs), 97% of the requests are
GOOD with the server configuration using P9, while this number
drops to 7% with the PHP 5 runtime. Also the maximum number
of total requests processed in 2-minute test runs is 18,690 at 1,200
clients for P9, and 16,373 at 1,000 clients for the PHP 5 runtime.
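As a worked check on the numbers in this paragraph, the relative gain in the maximum number of requests processed can be compared with the relative gain in peak simultaneous sessions reported for the Banking scenario (1,000 sessions for PHP 5, 1,200 for P9):

```latex
\frac{18{,}690 - 16{,}373}{16{,}373} = \frac{2{,}317}{16{,}373} \approx 0.142,
\qquad
\frac{1{,}200 - 1{,}000}{1{,}000} = 0.20
```

The roughly 20% advantage quoted for the Banking scenario therefore refers to peak sessions, while the maximum request count in a 2-minute run grows by about 14%.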
Banking Scenario
The banking scenario represents an online banking Web application, characterized as a secure Web application using SSL communication, with operations such as checking account information and transferring money
to different accounts. The average amount of data sent
to each Web client is 34.8 KB, as shown in Table 1.
Figure 9 shows the performance results of the Banking scenario
in SPECweb2005 with the two runtime configurations, PHP 5 and P9.
As shown in the graph, the peak performance of PHP 5 is 1,000
sessions, whereas it is 1,200 sessions for P9, an advantage of about
20% for P9. This performance advantage comes
from the fact that P9 exploits just-in-time compilation, whereas
PHP 5 is an interpreter-based runtime. The overhead of the language runtime core
must therefore be large even compared with the overhead
of SSL encryption and decryption, which can
become dominant in lightweight Web applications written in Java.
E-commerce Scenario
The E-commerce scenario represents an online shopping Web application that supports searching for products, displaying
product details, and finally purchasing the products. SSL processing is used only at checkout time. The average amount of data in
this scenario is around 144 KB, which is larger than in the other scenarios.
The peak performance of PHP 5 is 1,400 sessions, and P9 peaks
at 1,800 sessions. This result demonstrates that our JIT compiler-based optimization approach outperforms the original
PHP 5 runtime by about 30%.
Support Scenario
Finally, the Support scenario represents a company support website
where customers can download files such as drivers and manuals.
The portion of dynamic content is small, and many files are sent
from the Web server without involving the PHP runtime at all. The
average amount of data is 78 KB.
The peak performance of PHP 5 is 1,000 sessions, while P9
handles 1,200 sessions, an advantage of about 20% over PHP 5.
Even though SSL processing is not involved in this scenario, simply
sending many small and large files from the Web server to clients places a
large overhead on the Web server rather than on the PHP runtime.
6. Discussion
Figure 10 shows a profile of the CPU usage in the server machine
running the SPECweb2005 E-commerce scenario. It breaks down
the CPU usage for each PHP runtime according to how much CPU
time is spent in each shared library component. A total of 29% of
the CPU time is spent in the P9 runtime, whereas 57% is spent in
the PHP 5 runtime.
We conclude that the compiler approach is effective in improving the end-to-end throughput of Web servers running dynamic
scripting language programs. The improvements over the PHP 5
runtime come from the reduced interpretive overhead for program
execution.
To address the overhead in the runtime helper functions and
extensions, the technique used for Java Native Interface
(JNI) inlining [20] might be useful. The polymorphic
inline cache [11], which inserts guards to increase the opportunities for inlining, could be used to inline an invoked function whose
target is uncertain at compile time.
The large memory copy overheads are due to the communication between the Web server and the PHP runtime. This overhead
could be eliminated by specializing the PHP runtime for the Web
application while using zero-copy techniques [21].
7. Related Work
Compilers
There are a number of ongoing efforts to compile dynamic scripting
languages. A popular approach is to compile the scripts into an
existing bytecode. For example, Phalanger [2] is a PHP language
compiler for the .NET Framework. We tried the downloadable
version of Phalanger (v2.0 Beta 2) with our microbenchmarks and
found its performance to be similar to that of the Zend interpreter,
possibly because we could not tune the configuration adequately.
We were also unable to get this system to run SPECweb2005, which is
not currently in the list of supported applications. The Microsoft Dynamic Language Runtime (DLR) is
a framework to support various dynamic scripting languages for the
.NET Framework, including IronPython⁸ and IronRuby⁹, which
are for Python and Ruby, respectively. The latest versions of Phalanger are also
reported to support DLR.
Jython [12]¹⁰ and JRuby¹¹ are Python and Ruby compilers that generate Java bytecode. The P8 runtime in Project Zero¹² is another
Java-based PHP compiler. However, for a runtime built on an existing
Java VM, the lack of expressiveness in the bytecode is problematic.
JSR-292 proposes to add the invokedynamic instruction to Java
bytecode as an inexpensive mechanism for calling methods on untyped
dynamic objects.
The PyPy project [18] takes an implementation approach that
attempts to preserve the flexibility of Python, while still allowing for efficient execution. This is achieved by limiting the use
of the more dynamic features of Python to an initial, bootstrapping phase. This phase is used to construct a final, restricted Python
program, RPython [1], that is actually executed. RPython is a subset of Python with static typing and no dynamic modifications of
classes or method definitions. However, it can still take advan-
8 http://www.codeplex.com/IronPython
9 http://ironruby.rubyforge.org/
10 http://www.jython.org/
11 http://dist.codehaus.org/jruby/
12 http://www.projectzero.org/
[Figure: SPECweb2005 results for the Banking, E-commerce, and Support scenarios. For each scenario, one panel for the PHP 5 runtime and one for the P9 runtime plot throughput (requests/sec) against simultaneous sessions, split into Good, Tolerable, and Failed requests.]
[Figure: breakdown of server CPU usage for the PHP 5 and P9 configurations, collected with oprofile, by component: PHP runtime (runtime core and memory copy), SSL, Web server core, connection to clients, connection to backend, disk I/O, and profiling overhead]
8. Concluding Remarks
In this paper, we explored the potential acceleration of the dynamic
scripting language PHP in the context of server-side usage for Web
applications. We modified an existing production-quality just-in-time compiler for Java to support PHP, in order to evaluate the effectiveness of compilation technologies in this context. We compared
pure scripting runtime engine performance with microbenchmarks
and then measured the JIT acceleration of end-to-end PHP Web
application server performance with the industry-standard benchmark SPECweb2005. The experimental results showed about 20-30% advantages with our JIT-based acceleration, showing that the
acceleration of dynamic scripting language processing matters in
a realistic Web application environment, and that significant overheads still remain in the language runtime as well as in the communication with the Web server front end.
This paper found low-hanging fruit for performance improvements in server-side dynamic scripting language implementations, which are currently popular and widely used in Web application development. By applying simple yet effective implementation strategies, we can raise the bar for performance improve-
Acknowledgments
We thank the anonymous reviewers of this paper for their insightful
comments, constructive criticism, and advice, which were largely
reflected in the final version of this paper.
References
[1] D. Ancona, M. Ancona, A. Cuni, and N. D. Matsakis. RPython: a step towards reconciling dynamically and statically typed OO languages. In DLS '07: Proceedings of the 2007 Symposium on Dynamic Languages, pages 53-64, New York, NY, USA, 2007. ACM.
[2] J. Benda, T. Matousek, and L. Prosek. Phalanger: Compiling and running PHP applications on the Microsoft .NET platform. In Proceedings of .NET Technologies 2006, the 4th International Conference on .NET Technologies, Plzen, Czech Republic, May 29 - June 1, 2006, pages 11-20, 2006.
[3] R. Cartwright and M. Fagan. Soft typing. In Proceedings of the SIGPLAN '91 Conference on Programming Language Design and Implementation, pages 278-292, 1991.
[4] E. Cecchet, A. Chanda, S. Elnikety, J. Marguerite, and W. Zwaenepoel. Performance comparison of middleware architectures for generating dynamic web content. In Proceedings of Middleware 2003, ACM/IFIP/USENIX International Middleware Conference, Rio de Janeiro, Brazil, June 16-20, 2003, volume 2672 of Lecture Notes in Computer Science, pages 242-261. Springer, 2003.
[5] C. D. Chambers. The design and implementation of the SELF compiler, an optimizing compiler for object-oriented programming languages. PhD thesis, Stanford University, Stanford, CA, USA, 1992.
[6] J.-D. Choi, M. Gupta, M. J. Serrano, V. C. Sreedhar, and S. P. Midkiff. Escape analysis for Java. In Proceedings of OOPSLA '99, the 1999 ACM SIGPLAN Conference on Object-Oriented Programming Systems, Languages, and Applications, number 10 in SIGPLAN Notices vol. 34, pages 1-19, Denver, Colorado, USA, November 1999. ACM.
[7] K.-F. Faxén. Representation analysis for coercion placement. In Static Analysis, 9th International Symposium, SAS 2002, Madrid, Spain, September 17-20, 2002, Proceedings, volume 2477 of Lecture Notes in Computer Science, pages 278-293. Springer, 2002.
[8] A. Gal, C. W. Probst, and M. Franz. HotpathVM: an effective JIT compiler for resource-constrained devices. In VEE '06: Proceedings of the 2nd International Conference on Virtual Execution Environments, pages 144-153, New York, NY, USA, 2006. ACM.
[9] N. Grcevski, A. Kielstra, K. Stoodley, M. G. Stoodley, and V. Sundaresan. Java just-in-time compiler and virtual machine improvements for server and middleware applications. In Proceedings of the 3rd Virtual Machine Research and Technology Symposium, May 6-7, 2004, San Jose, CA, USA, pages 151-162. USENIX, 2004.
[10] F. Henglein. Global tagging optimization by type inference. In Proc. 1992 ACM Conf. on LISP and Functional Programming (LFP), San Francisco, California, pages 205-215. ACM Press, 1992.
[11] U. Hölzle, C. Chambers, and D. Ungar. Optimizing dynamically-typed object-oriented languages with polymorphic inline caches. In ECOOP '91: Proceedings of the European Conference on Object-Oriented Programming, pages 21-38, London, UK, 1991. Springer-Verlag.
[12] J. Hugunin. Python and Java - the best of both worlds. In Proceedings of the 6th International Python Conference, October 14-17, 1997, pages 31-38, 1997.
[13] X. Leroy. Unboxed objects and polymorphic typing. In Conference Record of the Nineteenth Annual ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages, pages 177-188, Albuquerque, New Mexico, 1992.
[14] Y. Minamide and J. Garrigue. On the runtime complexity of type-directed unboxing. In ICFP '98: Proceedings of the Third ACM SIGPLAN International Conference on Functional Programming, pages 1-12, New York, NY, USA, 1998. ACM.
[15] Y. G. Park and B. Goldberg. Escape analysis on lists. In PLDI '92: Proceedings of the ACM SIGPLAN 1992 Conference on Programming Language Design and Implementation, pages 116-127, New York, NY, USA, 1992. ACM.
[16] U. V. Ramana. Some experiments with the performance of LAMP architecture. In Proceedings of CIT 2005, Fifth International Conference on Computer and Information Technology, 21-23 September 2005, Shanghai, China, pages 916-921. IEEE Computer Society, 2005.
[17] A. Rigo. Representation-based just-in-time specialization and the Psyco prototype for Python. In Proceedings of PEPM 2004, the 2004 ACM SIGPLAN Workshop on Partial Evaluation and Semantics-based Program Manipulation, Verona, Italy, August 24-25, 2004, pages 15-26. ACM, 2004.
[18] A. Rigo and S. Pedroni. PyPy's approach to virtual machine construction. In OOPSLA '06: Companion to the 21st ACM SIGPLAN Conference on Object-Oriented Programming Systems, Languages, and Applications, pages 944-953, New York, NY, USA, 2006. ACM.
[19] Standard Performance Evaluation Corporation. SPECweb2005, 2005. http://www.spec.org/web2005/.
[20] L. Stepanian, A. D. Brown, A. Kielstra, G. Koblents, and K. Stoodley. Inlining Java native calls at runtime. In VEE '05: Proceedings of the 1st ACM/USENIX International Conference on Virtual Execution Environments, pages 121-131, New York, NY, USA, 2005. ACM.
[21] T. Suzumura, M. Tatsubori, S. Trent, A. Tozawa, and T. Onodera. Highly scalable web applications with zero-copy data transfer. In Proceedings of the 18th International Conference on World Wide Web, WWW 2009, Madrid, Spain, April 20-24, 2009, pages 921-930, 2009.
[22] L. Titchkosky, M. F. Arlitt, and C. L. Williamson. A performance comparison of dynamic web technologies. SIGMETRICS Performance Evaluation Review, 31(3):2-11, 2003.
[23] A. Tozawa, M. Tatsubori, T. Onodera, and Y. Minamide. Copy-on-write in the PHP language. In Proceedings of POPL 2009, the 36th ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages, Savannah, Georgia, USA, January 21-23, 2009, pages 200-212. ACM, 2009.
[24] S. Trent, M. Tatsubori, T. Suzumura, A. Tozawa, and T. Onodera. Performance comparison of PHP and JSP as server-side scripting languages. In Proceedings of Middleware 2008, ACM/IFIP/USENIX 9th International Middleware Conference, Leuven, Belgium, December 1-5, 2008, pages 164-182. Springer, 2008.
[25] S. R. Warner and J. S. Worley. SPECweb2005 in the real world: Using IIS and PHP. In Proceedings of 2008 SPEC Benchmark Workshop, Millbrae, CA, USA, January 27, 2008. Standard Performance Evaluation Corporation (SPEC), 2008.