
Runtimes of Reads and Loops on Internal Tables

Siegfried Boes


Company: SAP
Posted on Sep. 12, 2007 10:05 AM in ABAP, Application Server

In ABAP coding, many performance problems are related to the handling of
internal tables. Especially in the case of nested loops on two large tables, it is
important that the runtime of the inner operation scales better than linearly with
the table size; otherwise the complete nested loop shows quadratic scaling.
Therefore, it is essential to know the runtime behavior of internal table
operations. In this web log we present precise measurement results for read
statements on internal tables.

1. Nested Loop Processing

Assume your program contains a nested loop on two internal tables itab1 and itab2. The
three most commonly used forms are the following:

LOOP AT itab1 INTO wa1.
  LOOP AT itab2 INTO wa2.
    IF wa2-key = wa1-key.
      " ... processing ...
    ENDIF.
  ENDLOOP.
ENDLOOP.

Or

LOOP AT itab1 INTO wa1.
  LOOP AT itab2 INTO wa2 WHERE key = wa1-key.
    " ... processing ...
  ENDLOOP.
ENDLOOP.

Or

LOOP AT itab1 INTO wa1.
  READ TABLE itab2 INTO wa2 WITH KEY key = wa1-key.
  IF sy-subrc = 0.
    " ... processing ...
  ENDIF.
ENDLOOP.

The loop at table itab1 must process all lines. There is nothing to improve, as it will
scale linearly with the number of lines in table 1, i.e. with N1. It is obvious that the first
example loops over all lines in table itab2, i.e. N2. Altogether the nested loop scales
with N1 * N2. If both N1 and N2 grow with the amount of processed data N, which can
be the number of positions, contract lines etc., then the nested loop scales proportionally
to N*N, i.e. it scales quadratically.

It is often overlooked that examples 2 and 3 also scale quadratically, as the ‘LOOP AT
WHERE’ and the READ without BINARY SEARCH on a standard table also have to
search itab2 sequentially.

This, however, is not necessary, as only one or a few lines of itab2 correspond to one
line in itab1. These lines must be found efficiently to get a scaling behavior on N2
which is better than linear.

LOOP AT itab1 INTO wa1.
  " Efficient operation to find the line or lines
  " of table itab2 which correspond to wa1
ENDLOOP.

The following sections present measurements on different table types. How the
measurements must be done in order to get reliable results will be explained in my next
web log (coming soon).

2. Reads with Runtimes Independent of Table Size

The fastest access type is when the searched line can be accessed directly, which makes
this read type independent of the size of the internal table. This is achieved by

• ‘READ itab INDEX …’ on a standard table (red)
• ‘READ sort_tab INDEX …’ on a sorted table (blue)
• ‘READ hash_tab WITH TABLE KEY …’ (green)

Figure 1: Averaged runtimes (in microseconds) for different fast reads on internal
tables as a function of the table size N. Red indicates the read with index on a standard
table, blue the read with index on a sorted table, and green the read on a hashed table
with a complete table key.
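
The three direct reads can be sketched as follows. This is a minimal illustration and not the measurement program: the report name, the line type ty_line with its fields id and amount, and the demo data are assumptions made for this example.

```abap
REPORT zdemo_direct_reads.

" Hypothetical line type used only for this sketch.
TYPES: BEGIN OF ty_line,
         id     TYPE i,
         amount TYPE i,
       END OF ty_line.

DATA: std_tab  TYPE STANDARD TABLE OF ty_line WITH NON-UNIQUE KEY id,
      sort_tab TYPE SORTED TABLE OF ty_line WITH UNIQUE KEY id,
      hash_tab TYPE HASHED TABLE OF ty_line WITH UNIQUE KEY id,
      wa       TYPE ty_line,
      wa_std   TYPE ty_line,
      wa_sort  TYPE ty_line,
      wa_hash  TYPE ty_line.

" Illustrative demo data: 1000 lines with id = 1 .. 1000.
DO 1000 TIMES.
  wa-id     = sy-index.
  wa-amount = sy-index * 10.
  APPEND wa TO std_tab.
  INSERT wa INTO TABLE sort_tab.
  INSERT wa INTO TABLE hash_tab.
ENDDO.

" Index read on a standard table: line 500 is addressed directly.
READ TABLE std_tab INTO wa_std INDEX 500.

" Index read on a sorted table works the same way.
READ TABLE sort_tab INTO wa_sort INDEX 500.

" Hashed table: the complete table key is needed for the hash access.
READ TABLE hash_tab INTO wa_hash WITH TABLE KEY id = 500.

WRITE: / wa_std-amount, wa_sort-amount, wa_hash-amount.
```

In all three cases the line is located without searching through the table, which is why the table size does not enter the runtime.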

3. Reads with Runtimes with a Logarithmic Dependence on the Table Size

The binary search splits the search area into two halves in every step, and checks in
which half it must continue to search. Therefore the runtime must have a logarithmic
dependence on the table size N. It is realized by

• ‘READ itab WITH KEY … BINARY SEARCH’ on standard tables (red) or
• ‘READ sort_tab WITH TABLE KEY …’ (blue)

Figure 2: Averaged runtimes (in microseconds) for different reads with a logarithmic
dependence on the table size N. Red indicates the read with binary search on a standard
table, and blue the read on a sorted table with a table key.

Reads with keys which are only incomplete table keys can exploit the binary search and
the sorted search for the leading fields of the table key. The rest of the processing is a
linear scan.
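
A minimal sketch of the two logarithmic reads, again with a hypothetical line type (fields id and amount) and demo data that are assumptions of this example only. The standard table is deliberately filled in descending order to show that it must be sorted by the search key before BINARY SEARCH can be used.

```abap
REPORT zdemo_log_reads.

" Hypothetical line type used only for this sketch.
TYPES: BEGIN OF ty_line,
         id     TYPE i,
         amount TYPE i,
       END OF ty_line.

DATA: std_tab  TYPE STANDARD TABLE OF ty_line WITH NON-UNIQUE KEY id,
      sort_tab TYPE SORTED TABLE OF ty_line WITH NON-UNIQUE KEY id,
      wa       TYPE ty_line,
      wa_std   TYPE ty_line,
      wa_sort  TYPE ty_line,
      rc_std   TYPE sy-subrc,
      rc_sort  TYPE sy-subrc.

" Illustrative demo data, appended in descending order of id.
DO 1000 TIMES.
  wa-id     = 1001 - sy-index.
  wa-amount = wa-id * 10.
  APPEND wa TO std_tab.
  INSERT wa INTO TABLE sort_tab.   " the sorted table keeps its key order
ENDDO.

" BINARY SEARCH needs the standard table sorted by the search key.
SORT std_tab BY id.

" Binary search on the sorted standard table.
READ TABLE std_tab INTO wa_std WITH KEY id = 500 BINARY SEARCH.
rc_std = sy-subrc.

" Read on the sorted table with the table key.
READ TABLE sort_tab INTO wa_sort WITH TABLE KEY id = 500.
rc_sort = sy-subrc.

WRITE: / rc_std, wa_std-amount, rc_sort, wa_sort-amount.
```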

4. Loops on Internal Tables

For completeness, a remark on loops as the inner operation of nested loop processing
is necessary. A loop is required if several lines of the inner table 2 can fulfill the key
condition. The loop can be realized by

• ‘LOOP AT itab WHERE …’
• ‘LOOP AT sort_tab WHERE …’
• ‘READ itab WITH KEY … BINARY SEARCH’, save the index, ‘LOOP AT itab
FROM index’, and exit as soon as the condition is no longer fulfilled.

To be more precise, the coding for the third variant is added.
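
A minimal sketch of this variant could look as follows. The report name, the line type with the fields id and amount, the demo data and the counter hits are assumptions made for this illustration. The sketch relies on the binary search returning the first line of a group of equal keys.

```abap
REPORT zdemo_binary_loop.

" Hypothetical line type used only for this sketch.
TYPES: BEGIN OF ty_line,
         id     TYPE i,
         amount TYPE i,
       END OF ty_line.

DATA: itab1 TYPE STANDARD TABLE OF ty_line WITH NON-UNIQUE KEY id,
      itab2 TYPE STANDARD TABLE OF ty_line WITH NON-UNIQUE KEY id,
      wa1   TYPE ty_line,
      wa2   TYPE ty_line,
      idx   TYPE i,
      hits  TYPE i.

" Illustrative demo data: 100 ids in itab1, three lines per id in itab2.
DO 100 TIMES.
  wa1-id = sy-index.
  APPEND wa1 TO itab1.
  DO 3 TIMES.
    wa2-id     = wa1-id.
    wa2-amount = sy-index.
    APPEND wa2 TO itab2.
  ENDDO.
ENDDO.

" Sort the inner table once, before the outer loop starts.
SORT itab2 BY id.

LOOP AT itab1 INTO wa1.
  " Find the starting point of the matching lines by binary search.
  READ TABLE itab2 TRANSPORTING NO FIELDS
       WITH KEY id = wa1-id BINARY SEARCH.
  IF sy-subrc <> 0.
    CONTINUE.                " no matching line in itab2
  ENDIF.
  idx = sy-tabix.            " save the index of the first hit

  LOOP AT itab2 INTO wa2 FROM idx.
    IF wa2-id <> wa1-id.
      EXIT.                  " leave as soon as the key no longer matches
    ENDIF.
    hits = hits + 1.         " ... processing ...
  ENDLOOP.
ENDLOOP.

WRITE: / 'Matching lines processed:', hits.
```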

It is important that the SORT on table 2 is outside of the outer LOOP on table 1, so that
it is executed only once. The BINARY SEARCH efficiently finds the starting point of the
inner loop, but it is equally important to leave the LOOP as soon as the condition is no
longer fulfilled.

For the LOOP AT WHERE on a standard table we expect a linear dependence on the size
of table 2, as it must scan the whole table. The other two methods should show
logarithmic dependencies. Especially the workaround on standard tables, which is not
so widely known, can help to improve existing programs where a change of the table
type is no longer possible.

Figure 3: Averaged runtime (in microseconds) for ‘LOOP AT … WHERE’ with
different table sizes N. Black is the very expensive loop at where on a standard table,
which is outside of the displayed scale already for small values of N. Blue indicates the
loop at where on a sorted table, and red the workaround on a standard table using read
binary search plus loop from index.

Remarks: The measurements were executed in July 2007 on a pretty fast new machine.
Older servers will give larger absolute times; however, the relations should hold on all
systems.

Here, reliable measurements of reads and loops on internal tables have been
shown. The results are summarized in the table below. Only operations scaling
better than linear should be used inside a nested loop; otherwise quadratic coding
is produced.

Recommendations for nested loops:

• Use sorted tables as inner tables, if possible, because the usage is simple and
fault tolerant. You do not have to track the number of sorts. Sorted tables
can be used with reads and also with loops if the access is not unique.
• In cases where the key is either one field only or always completely known,
hashed tables can be even better.
• In practice, index accesses are quite rare, because the index is
usually not known.
• Standard tables should be used as inner tables only in optimized ways
where the binary search can be exploited, either in the read or in the loop.
But the table must be sorted, and the number of sorts must be kept very
small. Optimally, there is only one sort and all later table changes keep the
sort order.
• Standard tables without optimization should not be used in a nested loop.
• The order of the tables must be checked: the outer table itab1 should be the
smaller one, as it must be processed completely. There is no optimization
potential for the outer table, so it can be a standard table. A sort will have no
effect and should therefore be avoided.
• There are more sophisticated ways of programming nested loops using
nested indexes which are even a bit more performance-efficient. These are,
however, more difficult to program and to test and are only recommended
in highly performance critical tasks.
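
The first recommendation, a sorted table as the inner table, can be sketched as follows. The line type, the demo data and the counter hits are hypothetical and serve only as an illustration. The WHERE condition tests the leading key field of the sorted table for equality, so the matching lines can be located through the sorted key instead of a full scan.

```abap
REPORT zdemo_sorted_inner.

" Hypothetical line type used only for this sketch.
TYPES: BEGIN OF ty_line,
         id     TYPE i,
         amount TYPE i,
       END OF ty_line.

DATA: itab1 TYPE STANDARD TABLE OF ty_line WITH NON-UNIQUE KEY id,
      itab2 TYPE SORTED TABLE OF ty_line WITH NON-UNIQUE KEY id,
      wa1   TYPE ty_line,
      wa2   TYPE ty_line,
      hits  TYPE i.

" Illustrative demo data: 100 ids in itab1, three lines per id in itab2.
DO 100 TIMES.
  wa1-id = sy-index.
  APPEND wa1 TO itab1.
  DO 3 TIMES.
    wa2-id     = wa1-id.
    wa2-amount = sy-index.
    INSERT wa2 INTO TABLE itab2.   " no explicit SORT is needed
  ENDDO.
ENDDO.

LOOP AT itab1 INTO wa1.
  " The inner loop only visits the lines with the matching key.
  LOOP AT itab2 INTO wa2 WHERE id = wa1-id.
    hits = hits + 1.               " ... processing ...
  ENDLOOP.
ENDLOOP.

WRITE: / 'Matching lines processed:', hits.
```

Compared with the workaround on a standard table, there is no SORT to keep track of and no explicit EXIT, which is what makes this variant simple and fault tolerant.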
