
ACCELERATED SCIENCE APPLICATIONS

CLEARSPEED APPLICATION NOTE:


Accelerating computer assisted
molecular modeling for drug design

Proteins are the basic unit of life, a fundamental component of all living cells, from our
own, to the bacteria that infect us, to the plants and animals that we eat. The more we
understand of the structure and function of a protein, the more we understand about how
life works or, in some cases, how it can go wrong. Protein molecules are also the target
of most drug therapies.

To make proteins, ribosomes string together amino acids into long, linear chains. Like
skipping ropes, these chains loop and fold about each other in a variety of ways but, also
like skipping ropes, only one of these many ways actually allows the protein to function
properly. Sometimes this folding goes wrong and, in the worst case, a misfolded protein
within a cell can also prevent the cells around it from functioning.

The amazing thing about proteins is not only that they fold, but that they do so to a
unique three-dimensional shape which governs their function.

Introducing molecular modeling
Molecular modeling combines theoretical research methods with computational techniques to
reproduce the behavior of molecules at the atomistic level. Molecular modeling is a subset
of bioinformatics and, as we pass into the “post-genomic sequence era”, it is thought that
this field will play an ever more important role in scientific research and drug
development. Molecular mechanics is one aspect of molecular modeling; its benefit is that
it reduces the complexity of the system, allowing many more particles (atoms) to be
considered during simulations. This is in contrast to quantum chemistry, where each
electron is considered.

The University of Bristol's Biochemistry Department is one of the top two biochemical
research organizations in the UK. The research of the Protein Folding Group uses molecular
modeling techniques to provide vital input to rational drug design.

Research into peptide-based protease inhibitors
One of the many projects at Bristol concerns the research of protease inhibitors. Recent
studies indicate that between 1 and 5% of an organism’s genome codes for proteases, an
observation which reinforces the key role that proteases play in many biological processes
and diseases including cell signaling, pro-enzyme maturation, viral infection, blood
clotting, hypertension and Alzheimer's disease, to name but a few. A number of protease
inhibitors are already available as drugs on the market which target pathogens and form
the key component of anti-HIV and blood pressure medication.

Peptides are simply short lengths of natural polypeptide and are typical substrates and
products of proteases.

www.clearspeed.com
Many scientists believe that knowing more about peptide/protease combinations is key to
treating a wide variety of medical conditions. At Bristol University, scientists are
researching protease inhibitors using a specific type of peptide against human elastase, a
protease which causes extensive scarring of lung tissue in emphysema.

A combination of initial molecular modeling and inspection of the crystal structures of the
peptide identified five amino acid residue positions on the peptide that could be used to
affect the interaction of the peptide with proteins. Since each of the five positions could
be occupied by one of twenty amino acids, the total number of possible compounds that could
be synthesized is 20⁵ = 3.2 × 10⁶. However, it is clearly impractical to synthesize and
test the inhibitory properties of each possible peptide sequence, as this number is similar
to the total number of compounds available to the world's pharmaceutical companies for
testing. Therefore the team at Bristol devised a new approach using a docking program
based on an empirical free energy.

A new approach – Bristol University Docking Engine (BUDE)
For technical reasons related to solubility and concentration, it transpires that a library
containing about 100 different peptide sequences in one pot is the maximum convenient size
for testing and identification of a single inhibitor (or a series of inhibitors). Such a
library is easily prepared by mixed synthesis.

An experienced molecular modeler can use molecular graphics methods to generate an initial
docking position (the initial shape and structure, referred to as a pose) of a generic
cyclic peptide (e.g. alanine at each of the 5 variable positions). Since evolution has
selected the 20 natural amino acids to cover a wide range of chemical diversity, the
individual amino acids can be grouped into a variety of types that include large, small,
hydrophilic, hydrophobic, positively charged and negatively charged. Hence the molecular
modeler can also make predictions of what type of amino acid would be best matched to the
particular environment surrounding the five variable positions in the initial pose. The
modeler strives to choose an average of 4 or fewer candidate amino acids for each position,
yielding a virtual library of peptide sequences of 4⁵ = 1,024 members or fewer.

The BUDE computer algorithm described here was designed to bridge the gap between the whole
of this virtual library and a refined 10% identified as the best choice for actual
synthesis and testing.

The overall algorithm is composed of the following elements:
1. The user defines a discrete search space (a 6-D grid) around the initial ligand
   (peptide) pose.
2. The fitness of a pose is evaluated by a novel atom-atom based empirical free energy
   force field.
3. The grid positions may be evaluated exhaustively or by using a genetic-algorithm-like
   Monte Carlo search method (EMC; N. Gibbs, A.R. Clarke & R.B. Sessions,
   Proteins 43:186-202 (2001)).
4. Ligand flexibility is treated by docking different conformations of the peptide.
5. Many ligands: in this case study a virtual library of 576 different peptide sequences
   was generated and each docked as a separate BUDE job. Shell scripting is used to address
   this problem of trivial parallelization. Each sequence generates between 80 and 30,420
   conformations to be docked, depending on the number of rotamers associated with each
   amino acid in the peptide. In total, 1,966,272 docking operations are performed. Since
   each docking operation searches some 5% of the grid (4,225 poses), the energy of over
   8 billion poses must be calculated to evaluate the whole virtual peptide library. Each
   peptide ligand is represented by about 100 atoms and the protein elastase has 1,636
   atoms.

Issue – how to accelerate the pace of discovery?
Using the docking engine requires significant computational power. Dr. Richard Sessions of
the University of Bristol's Biochemistry Protein Folding Group plays a lead role in
enabling ever more sophisticated modeling techniques to be used for research. Dr. Sessions
explained:

“In order to predict the binding affinities more accurately we needed a more detailed model
to measure interaction between molecules more carefully. Unfortunately we were
significantly hampered, not by methodology, but by having enough compute power to carry out
our research… we simply didn't have enough floating point operations available to us.”

Initial investigation showed that it would take weeks to run the BUDE system and gain a
result using the local department cluster. Using the University's HPC facility was an
option, but an expensive one, as the majority of the compute power would be consumed, with
consequences for other university departments. Therefore Dr. Sessions began looking for
alternative solutions to accelerate the pace of discovery. In addition to compute power, a
number of other considerations had to be taken into account when looking for a possible
solution. These included budget constraints and the fact that the department’s current
cluster was placed in a normal office, with the associated space, cooling and power
restrictions.

ClearSpeed Accelerated TeraScale System (CATS) for incredible speedup
The high level of parallelism in the BUDE application meant that finding a solution based
on a parallel architecture would be ideal. ClearSpeed Advance acceleration technology was
chosen due to its multi-core processor design and ability to work within the existing
environment.

“The ClearSpeed Advance accelerator was specifically designed for the needs of the HPC
community. With a multi-core processor design and 96 processing elements running at only
210 MegaHertz it had 64-bit floating point speed with an average power consumption of
10 watts to provide the processing power, accuracy and energy efficiency we were looking
for”, remarked Dr. Sessions.

Configuration
The ClearSpeed Accelerated TeraScale System (CATS) is a custom 1U high 19" rack mounted
enclosure which can be populated with up to 12 ClearSpeed Advance e620 accelerator cards.
When fully populated, CATS has the capability of up to 968 GFLOPS and typically consumes
less than 850 Watts, and therefore demonstrates the power and space efficiency, together
with the scalability, enabled by ClearSpeed's accelerators. A single rack of 12 CATS nodes
plus attached 1U servers has a combined capability of over 11 TeraFLOPS of double-precision
computing power.

An accelerated system was built consisting of twelve CATS nodes, each of which was
connected to a standard 1U dual processor, dual core 3.0GHz Intel Xeon server. The
connection between the CATS node and its corresponding x86 host is via two PCI Express x8
cables.
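
The library and workload sizes quoted in the BUDE description above follow from simple
counting. The short Python sketch below reproduces them. The variable names are
illustrative assumptions; every input is a figure stated in this note.

```python
# Counting sketch: all inputs are figures quoted in this note;
# the variable names are illustrative assumptions.

AMINO_ACIDS = 20          # natural amino acids
VARIABLE_POSITIONS = 5    # variable residue positions on the cyclic peptide

# Every position free to take any natural amino acid.
full_library = AMINO_ACIDS ** VARIABLE_POSITIONS

# The modeler narrows each position to an average of 4 candidates or fewer.
virtual_library_max = 4 ** VARIABLE_POSITIONS

# Case-study workload.
sequences = 576                   # peptide sequences in the virtual library
docking_operations = 1_966_272    # conformations docked across all sequences
poses_per_docking = 4_225         # poses searched per docking operation

total_pose_energies = docking_operations * poses_per_docking
mean_conformations = docking_operations / sequences

print(f"full library:          {full_library:,}")
print(f"virtual library (max): {virtual_library_max:,}")
print(f"pose energies:         {total_pose_energies:,}")
print(f"mean conformations:    {mean_conformations:,.0f} per sequence")
```

The product of docking operations and poses per docking is what gives the "over 8 billion"
pose energies, and the mean number of conformations per sequence falls inside the quoted
range of 80 to 30,420.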

Methodology
The molecular modeler provides the receptor and ligand
start positions and defines a six-dimensional search grid.
This grid is searched via a GA-like EMC (Evolutionary
Monte Carlo) procedure. The “currency” of the EMC is the
pose descriptor; it is the principal task of the docking
engine to translate that pose descriptor into a pose energy.
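
To make the idea of a GA-like search over a discrete six-dimensional grid concrete, here is
a minimal Python sketch. It is a generic elitist evolutionary search with random
single-step mutations, not the published EMC method, and `toy_energy` is a stand-in for the
docking engine rather than BUDE's force field. The grid size, population size and all
function names are assumptions made for illustration.

```python
import random

N_DIMS = 6      # three translations plus three rotations
N_STEPS = 11    # hypothetical number of grid points per axis


def toy_energy(pose):
    """Stand-in for the docking engine: lowest energy at the grid centre."""
    centre = N_STEPS // 2
    return float(sum((index - centre) ** 2 for index in pose))


def random_pose(rng):
    """A pose descriptor here is simply six integer grid indices."""
    return tuple(rng.randrange(N_STEPS) for _ in range(N_DIMS))


def mutate(pose, rng):
    """Move one randomly chosen index by one grid step, staying on the grid."""
    axis = rng.randrange(N_DIMS)
    step = rng.choice((-1, 1))
    moved = list(pose)
    moved[axis] = min(N_STEPS - 1, max(0, moved[axis] + step))
    return tuple(moved)


def search(generations=200, population=20, survivors=5, seed=0):
    """Keep the best poses each generation and refill with mutated copies."""
    rng = random.Random(seed)
    pool = [random_pose(rng) for _ in range(population)]
    for _ in range(generations):
        pool.sort(key=toy_energy)
        parents = pool[:survivors]
        children = [mutate(rng.choice(parents), rng)
                    for _ in range(population - survivors)]
        pool = parents + children
    best = min(pool, key=toy_energy)
    return best, toy_energy(best)


if __name__ == "__main__":
    best_pose, best_energy = search()
    print(best_pose, best_energy)
```

The only thing the search ever asks of the engine is "what is the energy of this pose
descriptor?", which is why that single routine is the natural candidate for acceleration.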
The BUDE source code is about 5,000 lines of FORTRAN.
Profiling the code shows that, as expected, more than 99%
of the execution time is spent in the energy calculation
routine which is about 500 lines of code. Accelerating the
algorithm required porting the energy calculation and
geometry routines to the ClearSpeed Advance accelerator.
Before the search begins, the initial coordinates of the protein
(elastase), and the ligand (cyclic peptide) are copied to the
Advance accelerator's on-board DRAM. When the search
requires the energies of a set of pose descriptors the program
translates these into a set of transformation matrices.
This set is copied to the Advance accelerator where the
transformations are applied and the energy calculated;
the results are then copied back to the host process.

[Figure: The ClearSpeed Accelerated TeraScale System (CATS)]
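
The Methodology flow (pose descriptors to transformation matrices, transformed ligand
coordinates, then one energy per pose) can be sketched on the host in plain Python. This is
an illustrative sketch only: it makes no use of any ClearSpeed API, the rotation convention
is an assumption, and `toy_pair_energy` is a simple Lennard-Jones-like placeholder rather
than the BUDE empirical free energy force field. All function names are hypothetical.

```python
import math


def pose_to_matrix(tx, ty, tz, rx, ry, rz):
    """Build a 3x4 rigid-body transform [R | t] from a 6-D pose descriptor.

    Assumed convention: R = Rz(rz) * Ry(ry) * Rx(rx), angles in radians.
    """
    cx, sx = math.cos(rx), math.sin(rx)
    cy, sy = math.cos(ry), math.sin(ry)
    cz, sz = math.cos(rz), math.sin(rz)
    return [
        [cz * cy, cz * sy * sx - sz * cx, cz * sy * cx + sz * sx, tx],
        [sz * cy, sz * sy * sx + cz * cx, sz * sy * cx - cz * sx, ty],
        [-sy, cy * sx, cy * cx, tz],
    ]


def apply_transform(matrix, atoms):
    """Rotate and translate every (x, y, z) atom position."""
    return [
        tuple(r[0] * x + r[1] * y + r[2] * z + r[3] for r in matrix)
        for x, y, z in atoms
    ]


def toy_pair_energy(distance, r_min=4.0):
    """Placeholder pair term with a minimum of -1 at distance r_min."""
    ratio = r_min / max(distance, 1e-6)   # guard against overlapping atoms
    return ratio ** 12 - 2.0 * ratio ** 6


def pose_energy(receptor, ligand, pose):
    """Sum the placeholder term over every receptor-ligand atom pair."""
    moved = apply_transform(pose_to_matrix(*pose), ligand)
    return sum(toy_pair_energy(math.dist(a, b)) for a in receptor for b in moved)


def evaluate_poses(receptor, ligand, poses):
    """The batch step: a set of pose descriptors in, a set of energies out."""
    return [pose_energy(receptor, ligand, pose) for pose in poses]


if __name__ == "__main__":
    receptor = [(0.0, 0.0, 0.0)]
    ligand = [(0.0, 0.0, 0.0)]
    poses = [(4.0, 0.0, 0.0, 0.0, 0.0, 0.0), (8.0, 0.0, 0.0, 0.0, 0.0, 0.0)]
    print(evaluate_poses(receptor, ligand, poses))
```

If every atom pair were evaluated, a ligand of about 100 atoms against the 1,636 atoms of
elastase would give roughly 163,600 pair terms per pose, which suggests why this routine
dominates the run time.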

CLEARSPEED TECHNOLOGY

Results
In the base case on the host, each BUDE run requires 100% CPU loading. However, when
ClearSpeed accelerators are employed, the bulk of the processing is moved to the
accelerator leaving only 2.5% CPU loading per job and therefore freeing the CPU for other
applications.
BUDE is able to scale on both multiple cores in the host Xeon, and on multiple cards in
the CATS node.
The initial measured performance of a single card in the CATS node is a 3.41x speedup over
one host core (3.0GHz Xeon Woodcrest). The performance of a 12-card CATS node compared to
the four 3GHz Xeon host cores is therefore (12 × 3.41)/4 = 10.2x, the speedup over the
quad core 3GHz host node.
However, performance is only part of the outcome. The CATS system delivers performance
without compromising power consumption. The measured power consumption figures are as
follows:
CATS = 550W    HOST = 300W
Consequently the outcome is a 10.2x speedup for 1.8x the power, or 5.6x greater
performance per watt.
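
The headline ratios can be re-derived from the measured figures with a few lines of Python.
This is a sketch of the arithmetic only. The variable names are illustrative, and the power
ratio is taken as CATS versus host, as in the text above.

```python
# Re-deriving the headline ratios; all inputs are figures quoted in this note.

cards_per_node = 12
speedup_per_card = 3.41     # one accelerator card versus one host core
host_cores = 4              # dual processor, dual core host
cats_power_w = 550.0
host_power_w = 300.0

node_speedup = cards_per_node * speedup_per_card / host_cores
power_ratio = cats_power_w / host_power_w
perf_per_watt_gain = node_speedup / power_ratio

# Energy is power multiplied by time, and run time shrinks by node_speedup,
# so the energy saving equals the performance-per-watt gain.
energy_saving = node_speedup / power_ratio

print(f"node speedup:       {node_speedup:.1f}x")
print(f"power ratio:        {power_ratio:.1f}x")
print(f"performance/watt:   {perf_per_watt_gain:.1f}x")
print(f"energy saving:      {energy_saving:.1f}x")
```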

Performing twelve BUDE runs in parallel using one CATS node results in:

10.2x speedup and 5.6x less energy consumed

Further performance optimization of the code is expected to enhance this result still
further. For the latest performance figures please visit
www.clearspeed.com/acceleration/benchmarks

In Conclusion
As Dr. Sessions concludes:

“ClearSpeed acceleration has enabled us to accurately measure the interaction between
virtual libraries of peptides and their target protein in a few hours rather than a number
of weeks. We can exploit this either by dramatically reducing the time of the design phase,
or by searching a significantly wider area of chemical space. For example, a typical
library design could be assessed in 2 weeks using a machine built from twelve 3 GHz quad
core Xeons, while the same work would be completed in one day using a machine constructed
from twelve CATS.”

By selectively adapting the most numerically intensive routines to the ClearSpeed Advance
accelerator, the time to result has been decreased by an order of magnitude.

The ClearSpeed Accelerated TeraScale System (CATS) demonstrates the incredible performance
density, class-leading performance per watt and scalability of ClearSpeed solutions for
high throughput applications.

Copyright 2007 ClearSpeed Technology plc (“ClearSpeed”). All rights reserved.

All information in this document is provided only as general information in connection with
ClearSpeed products. Except as provided in ClearSpeed's terms and conditions of sale for
such products, ClearSpeed assumes no liability whatsoever, and ClearSpeed disclaims any
express or implied warranty relating to sale and/or use of ClearSpeed products, including
liability or warranties relating to fitness for a particular purpose, merchantability, or
infringement of any patent, copyright, or other intellectual property right. ClearSpeed may
make changes to specifications, product descriptions, and plans at any time, without
notice.

ClearSpeed, ClearConnect and Advance are trademarks or registered trademarks of ClearSpeed
Technology plc or its group companies. All other marks are the property of their respective
owners.

V1 0711 REAL SCIENCE. REAL TOOLS. REAL NUMBERS.

