Infernal-GPU: Accelerating RNA Alignment with CUDA

Infernal-GPU: CUDA-Accelerated RNA Alignment
Adam Bazinet
Infernal
Infernal: "INFERence of RNA ALignment"

A software package for searching DNA sequence databases for RNA structure and sequence similarities
Written and maintained by the Sean Eddy laboratory at Janelia Farm
Ribonucleic Acid (RNA)

Much like DNA, RNA consists of nucleotides (A, C, G, U)
Unlike DNA, RNA is usually single-stranded
RNA molecules play a central
role in many cellular processes
Messenger RNA (mRNA)
-Carries information about a protein to the ribosome
Non-coding RNA (ncRNA)
-a functional RNA molecule that is not translated into a protein
Many different sub-types; two of the most abundant are ribosomal RNA (rRNA) and transfer RNA (tRNA)
Infernal is primarily concerned with these functional, non-coding RNAs
Secondary Structure
An example of an RNA stem-loop secondary structure
The functional forms of single-stranded RNA molecules require a specific tertiary structure, the scaffold of which is provided by secondary structural elements
Secondary Structure
Small subunit ribosomal RNA, 5' domain, taken from the Rfam database
RNA Multiple Alignment
RNA Folding Algorithms

Use a dynamic programming algorithm to computationally predict secondary structure according to a thermodynamic model
Drawbacks:
-only call 50-70% of base pairs correctly, on average
-can be very computationally intensive (i.e., slow)
Excellent summary article by Sean Eddy:
-Nature Biotechnology 22, 1457-1458 (2004)
Infernal
Takes an RNA multiple alignment as input; the alignment's secondary structure must also be provided!
-the secondary structure is often determined in the laboratory
Builds a covariance model (CM) from it
Searches a target sequence database for possible matches to the input model
Covariance Model
CMs are a type of stochastic context-free
grammar (SCFG)
Each residue in the query RNA is represented by a state, arranged in a tree-like structure that mirrors the secondary structure of the RNA, along with additional states to model insertions and deletions
Dynamic programming calculates the probability that a substructure of the query rooted at state v aligns to a subsequence i..j in the target sequence
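A toy version of this recursion, sketched in C under heavy simplifying assumptions: a hypothetical grammar with a single nonterminal S (so the state index v disappears and the table is indexed only by the subsequence), made-up rule and emission probabilities, and none of Infernal's insert, delete, or bifurcation states. It fills alpha[i][j], the probability that S generates the subsequence x[i..j-1], from short subsequences to long ones.

```c
#include <assert.h>
#include <string.h>

#define MAXL 64  /* toy limit on sequence length */

/* Hypothetical one-nonterminal grammar (NOT an Infernal CM):
 *   S -> a S b    probability P_PAIR * pair_emit(a, b)
 *   S -> a S      probability P_LEFT * 1/4
 *   S -> (empty)  probability P_END
 */
static const double P_PAIR = 0.3, P_LEFT = 0.5, P_END = 0.2;

/* Emission distribution over the 16 ordered base pairs: the 4 Watson-Crick
 * pairs get 0.2 each, the 12 other pairs share the remaining 0.2. */
static double pair_emit(char a, char b) {
    int wc = (a == 'A' && b == 'U') || (a == 'U' && b == 'A') ||
             (a == 'G' && b == 'C') || (a == 'C' && b == 'G');
    return wc ? 0.2 : 0.2 / 12.0;
}

/* alpha[i][j] = probability that S generates the subsequence x[i..j-1].
 * Cells are filled in order of increasing length d = j - i, so every
 * smaller subproblem is ready before it is needed. */
double inside_prob(const char *x) {
    static double alpha[MAXL + 1][MAXL + 1];
    int L = (int)strlen(x);
    assert(L <= MAXL);
    for (int d = 0; d <= L; d++) {
        for (int i = 0; i + d <= L; i++) {
            int j = i + d;
            if (d == 0) {                 /* empty subsequence: S -> (empty) */
                alpha[i][j] = P_END;
                continue;
            }
            /* emit x[i] on its own, then generate x[i+1..j-1] */
            double p = P_LEFT * 0.25 * alpha[i + 1][j];
            if (d >= 2)                   /* pair x[i] with x[j-1] */
                p += P_PAIR * pair_emit(x[i], x[j - 1]) * alpha[i + 1][j - 1];
            alpha[i][j] = p;
        }
    }
    return alpha[0][L];
}
```

For example, `inside_prob("AU")` is 0.015125 while `inside_prob("AG")` is 0.004125: the Watson-Crick pair A-U earns a higher probability from the pair rule.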
Covariance Model
Computational Complexity
The most noteworthy limitation of SCFGs is their computational complexity
SCFG-based RNA analysis algorithms require time and memory proportional to at least L^3 (where L is the sequence length), because every possible pair of residues (L^2 pairs) must be tried against up to L/2 base-pairing states in the model (and in most RNA SCFGs, the time required more typically scales as L^4)
The latest version of Infernal incorporates some heuristics to ameliorate the situation, but the computational cost can still be considerable
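To make the scaling concrete, here is a minimal C sketch that counts DP cells under the simplifying assumption stated above (a model with L/2 base-pairing states, every subsequence i..j considered). `naive_cells` is a hypothetical helper, not part of Infernal.

```c
#include <assert.h>

/* Count the (state, i, j) cells a naive SCFG/CM dynamic program fills
 * for a sequence of length L, assuming (hypothetically) a model with
 * L/2 base-pairing states and every subsequence 1 <= i <= j <= L. */
unsigned long long naive_cells(unsigned long long L) {
    assert(L > 0);
    unsigned long long subseqs = L * (L + 1) / 2;  /* ~L^2 / 2 (i, j) pairs */
    unsigned long long states = L / 2;             /* ~L / 2 states */
    return subseqs * states;                       /* ~L^3 / 4 cells */
}
```

For L = 100 this gives 252,500 cells and for L = 200 it gives 2,010,000, roughly 8x more for twice the length, which is the cubic growth described above. Each cell here stands for O(1) work; rules that bifurcate a subsequence add an inner loop over split points, which is where an extra factor of L can come from.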
Accelerating Infernal
There are two programs that would benefit
the most from speedup:
-cmcalibrate (part of model building)
-cmsearch (database searching)
Both use a banded version of the Inside algorithm, which is nearly identical to the Cocke-Younger-Kasami (CYK) database search dynamic programming algorithm for CMs
CYK returns the score of the single optimal derivation (parse), whereas Inside sums over all derivations to return the total probability of the observed sequence
Banded CYK Algorithm
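The figure for this slide is not reproduced here. As a rough illustration of what banding buys, the C sketch below assumes per-state bands on subsequence length d, of the kind used by Infernal's query-dependent banding (QDB): state v only considers lengths between dmin[v] and dmax[v] instead of every length up to a maximum window W. The band values in the example and the helper names are made up for illustration.

```c
#include <assert.h>

/* Cells needed per target position j when every state v considers all
 * subsequence lengths d = 0..W. */
long unbanded_cells(int M, int W) {
    assert(M > 0 && W >= 0);
    return (long)M * (W + 1);
}

/* Cells needed per target position j when state v only considers
 * d in [dmin[v], dmax[v]] (hypothetical bands). */
long banded_cells(const int *dmin, const int *dmax, int M, int W) {
    long total = 0;
    for (int v = 0; v < M; v++) {
        assert(0 <= dmin[v] && dmin[v] <= dmax[v] && dmax[v] <= W);
        total += dmax[v] - dmin[v] + 1;
    }
    return total;
}
```

With three states, W = 100, and bands [0,5], [10,30], [20,100], each target position needs 108 cells instead of 303.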
Profiling Infernal
Used a short run of cmsearch as the test case
~25% of runtime was in FastIInsideScan, and ~22% of runtime was in ILogsum
FastIInsideScan is optimized for the CPU, so it contains 13 separate blocks of ILogsum calls, each of which is a potential target for parallelization
Parallelizing Infernal
Each block of ILogsum calls was inside a loop
Assigned each loop iteration to a separate GPU thread
Ensured there were no redundant memory transfers
However, the GPU version was ~9x slower than the optimized CPU version. Why?
Answer: with 22 billion kernel invocations, the overhead of invoking the kernel was greater than the work the kernel was actually doing!
Switched to working with RefIInsideScan, a simpler, non-optimized reference implementation
Saw an opportunity for parallelization at the level of the v-loop (loop over CM states)
The v-loop was 229 iterations, each of which was assigned to a separate GPU thread
The v-loop was nested inside the j-loop (loop over database sequence positions); the j-loop was ~17,000 iterations, which means far fewer kernel invocations than in FastIInsideScan
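Assuming roughly one kernel launch per j-loop iteration (the slides imply this but do not state it exactly), the drop in launch count relative to the FastIInsideScan version is about six orders of magnitude:

```latex
\frac{22 \times 10^{9}\ \text{launches}}{1.7 \times 10^{4}\ \text{launches}} \approx 1.3 \times 10^{6}
```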
Even after moving all memory transfers outside the j-loop, the program still ran ~7x slower than the reference CPU program
The best current explanation is that the kernel is not optimized: there are large numbers of incoherent (uncoalesced) reads/writes
Perhaps with additional work, a speedup can be attained; source code is available:
http://www.cbcb.umd.edu/~pknut777/
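Incoherent (uncoalesced) accesses occur when the threads of a warp read addresses scattered across many memory segments. The C sketch below counts the segments touched by one 32-thread warp in a simplified model; the 128-byte segment size and the 916-byte stride (229 states x 4 bytes, as if threads indexed by state v were reading from a state-major array) are illustrative assumptions, not measurements of the Infernal-GPU kernel, and `segments_touched` is a hypothetical helper.

```c
#include <assert.h>
#include <stddef.h>

/* Count the distinct aligned memory segments touched when thread t
 * (0 <= t < nthreads) reads one element of elem_bytes at byte address
 * base + t * stride_bytes. Fewer segments means better coalescing. */
int segments_touched(size_t base, size_t stride_bytes, size_t elem_bytes,
                     int nthreads, size_t seg_bytes) {
    assert(nthreads > 0 && nthreads <= 1024);
    /* an element no larger than a segment spans at most 2 segments */
    assert(elem_bytes > 0 && elem_bytes <= seg_bytes);
    size_t seen[2048];
    int nseen = 0;
    for (int t = 0; t < nthreads; t++) {
        size_t start = base + (size_t)t * stride_bytes;
        size_t first = start / seg_bytes;
        size_t last = (start + elem_bytes - 1) / seg_bytes;
        for (size_t s = first; s <= last; s++) {
            int dup = 0;
            for (int k = 0; k < nseen; k++)
                if (seen[k] == s) { dup = 1; break; }
            if (!dup) seen[nseen++] = s;
        }
    }
    return nseen;
}
```

With unit stride, a warp's 32 four-byte reads fall in a single 128-byte segment; with a stride of 229 x 4 bytes, every thread touches its own segment, so in this simplified model the memory system services 32 separate segments for the same amount of useful data.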
Takeaways
It was difficult to dive into complex scientific
code and attempt to parallelize it
Spent a LOT of time profiling the application, determining the extents of host arrays, chasing down runtime errors, etc.
Very much enjoyed learning about this unique problem area of bioinformatics
Acknowledgments
Eric Nawrocki, a graduate student who develops Infernal in the Eddy Lab, provided helpful information along the way
Many thanks, Eric!
Questions?
