TABLE OF CONTENTS
DOCUMENT VERSION
DESCRIPTION
WHY DO WE NEED STATISTICS ON VIRTUAL TABLES FOR HANA SMART DATA ACCESS?
HOW CAN STATISTICS ON VIRTUAL TABLES BE CREATED?
Creation of statistics with ABAP program
Creation of statistics with SQL console
HOW ABOUT AN EXAMPLE?
SAP BW ON HANA & HANA SMART DATA ACCESS – VIRTUAL TABLE STATISTICS
Document history
WHY DO WE NEED STATISTICS ON VIRTUAL TABLES FOR HANA SMART DATA ACCESS?
Virtual tables are used in the context of HANA Smart Data Access to connect to a remote source. To create
an optimized query execution plan, HANA should have database statistics for the virtual table. The
simplest statistic is just the number of records in the source table. If there are no statistics, a default
value is used (see the article “How can I create an Open ODS View of type Virtual Table?” on this SCN page).
Note: There is currently no way to derive the statistics for the virtual table from the DB statistics of the
table in the remote source; instead, they are essentially created by single COUNT statements. Steps in this
direction are planned, but no concrete timeline can be provided. Of course, the remote database optimizes
the query execution based on its own techniques.
As of HANA Revision 74.01, a new default cardinality for virtual tables has been introduced. If database
statistics are not available for the virtual table, HANA assumes a cardinality of 1 million records for the
virtual table (formerly 10,000 records). This should better “protect” the source database against expensive
queries caused by suboptimal query optimization. The default cardinality is set by the parameter
virtual_table_default_cardinality in indexserver.ini (section smart_data_access).
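As an illustration, the parameter could be inspected and changed from the SQL console roughly as follows. This is a minimal sketch, assuming the standard M_INIFILE_CONTENTS monitoring view and the ALTER SYSTEM ALTER CONFIGURATION statement. The value 1000000 only restates the default mentioned above and is not a tuning recommendation.

```sql
-- Show the value of the parameter for each configuration layer.
SELECT FILE_NAME, LAYER_NAME, SECTION, KEY, VALUE
  FROM M_INIFILE_CONTENTS
 WHERE FILE_NAME = 'indexserver.ini'
   AND SECTION   = 'smart_data_access'
   AND KEY       = 'virtual_table_default_cardinality';

-- Set the default cardinality at system layer (example value only).
ALTER SYSTEM ALTER CONFIGURATION ('indexserver.ini', 'SYSTEM')
  SET ('smart_data_access', 'virtual_table_default_cardinality') = '1000000'
  WITH RECONFIGURE;
```

Changing the default only affects virtual tables without statistics. Once statistics exist for a virtual table, the optimizer works with those instead.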
HOW CAN STATISTICS ON VIRTUAL TABLES BE CREATED?
Statistics can be created with the HANA Studio SQL console. Alternatively, BW provides the program
RSSDA_CREATE_TABLE_STAT, which creates statistics and can also be used to refresh them periodically.
Creation of statistics with ABAP program
Execute the program RSSDA_CREATE_TABLE_STAT with the following selections (see also note 1990181):
InfoProvider
Name of the Open ODS View with SAP HANA Smart Data Access, or name of the InfoProvider with
Near-line Storage using SDA for read access.
Fieldname
Enter a field name if statistics should be created for selected fields only. If no field is provided, the
statistics are created for all fields of the virtual table.
HANA can optionally create histogram statistics to better evaluate the costs. Please note that this option
causes a higher workload on the remote source during statistics creation than simple statistics.
Creation of statistics with SQL console
To better evaluate the costs of semi-join optimizations, however, simple statistics should be created
on all fields that are potentially used in a join condition.
The best possible query optimizations, however, rely on the full set of statistics, which also includes
histogram information. These statistics can be created as follows:
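The original statements are not reproduced in this text. The following is a minimal sketch, assuming the CREATE STATISTICS syntax described in the HANA SQL reference. The schema and virtual table names are hypothetical placeholders, and the field names are borrowed from the example later in this document.

```sql
-- Hypothetical names: replace "MY_SCHEMA"."MY_VIRTUAL_TABLE" with the schema
-- and virtual table of your own scenario.

-- Simple statistics on a field that is potentially used in a join condition.
CREATE STATISTICS ON "MY_SCHEMA"."MY_VIRTUAL_TABLE" ("STORE") TYPE SIMPLE;

-- Histogram statistics on another field. This causes a higher workload on
-- the remote source during creation than simple statistics.
CREATE STATISTICS ON "MY_SCHEMA"."MY_VIRTUAL_TABLE" ("PRODUCT") TYPE HISTOGRAM;
```

The exact set of supported statistics types depends on the HANA revision, so the SQL reference for the installed revision should be checked before use.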
As a starting point, we recommend creating simple statistics on one low-cardinality field. For further
fine-tuning, histograms, for example, can also be used; see the HANA SQL reference (section 1.8.1.12) and
note 1872652 for more information.
Note: As with other classic DB statistics, it is not necessary to re-create or refresh the statistics on virtual
tables after each change of the data in the remote source, but only after significant changes, e.g. massive
growth or a different value distribution.
HOW ABOUT AN EXAMPLE?
We show how the behavior of a query changes with and without statistics on a virtual table. The query is
built on top of an Open ODS View called “XSB_01B_BZH”. To be sure that no statistics are available for the
virtual table, we drop the statistics in the SQL console:
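The statement used at this point is not reproduced in this text. A minimal sketch, assuming the DROP STATISTICS syntax of the HANA SQL reference and a hypothetical schema and virtual table name:

```sql
-- Hypothetical names: replace with the schema and virtual table that belong
-- to the Open ODS View. This removes the statistics on the virtual table.
DROP STATISTICS ON "MY_SCHEMA"."MY_VIRTUAL_TABLE";
```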
The BW query statistics show that the database time for the query was 128 seconds.
The SQL statement sent to the remote database does not contain any filter condition, which means that no
semi-join or join relocation is applied.
SELECT
    "W1"."PRODUCT",
    "W1"."STORE",
    "W1"."DOC_CURRENCY",
    COUNT(*),
    SUM("W1"."COSTWT")
FROM
    "SAPKIT"."YSB_50MIO" "W1"
GROUP BY
    "W1"."PRODUCT",
    "W1"."STORE",
    "W1"."DOC_CURRENCY"
We execute this statement with select count (*) in the remote source (here a remote HANA DB) to find
out how many records are selected in the source. This information can also be found in the local HANA
under Provisioning > Smart Data Access.
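One way to obtain this count is to wrap the statement in an outer COUNT(*). This is a minimal sketch to be run in the remote source. The column aliases and the alias "SQ" are added for illustration and are not part of the statement generated by HANA.

```sql
-- Count the rows that the pushed-down statement returns in the remote source.
SELECT COUNT(*) AS "ROW_COUNT"
  FROM (
        SELECT "W1"."PRODUCT",
               "W1"."STORE",
               "W1"."DOC_CURRENCY",
               COUNT(*)           AS "CNT",
               SUM("W1"."COSTWT") AS "SUM_COSTWT"
          FROM "SAPKIT"."YSB_50MIO" "W1"
         GROUP BY "W1"."PRODUCT",
                  "W1"."STORE",
                  "W1"."DOC_CURRENCY"
       ) "SQ";
```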
Now statistics are created for the virtual table of the Open ODS View with the program
RSSDA_CREATE_TABLE_STAT. The HANA query optimizer is now aware of the number of rows of the
source table and can use it to improve the query plan.
To compute the statistics, a SQL query is executed in the remote database as shown below (visible when the
federation trace is set to “debug”):
SELECT
    "/BIC/EXSB_01B_BZH"."DOC_CURRENCY",
    COUNT(*)
FROM
    "SAPKIT"."YSB_50MIO" "/BIC/EXSB_01B_BZH"
GROUP BY
    "/BIC/EXSB_01B_BZH"."DOC_CURRENCY"
ORDER BY
    "/BIC/EXSB_01B_BZH"."DOC_CURRENCY" ASC
The number of selected records corresponds to the cardinality of the field. Therefore, the smaller the
cardinality, the faster the statistics are created. As mentioned at the beginning of this document, it is
planned to change this with the implementation of a new statistics concept.
Now the same query is executed again with statistics for the virtual table of InfoProvider XSB_01B_BZH:
Under Provisioning > Smart Data Access in the HANA Studio, for example, the SQL statement is shown as it
is sent to the remote source (with IN-clause):
SELECT
    SQ.*
FROM (SELECT
          "W1"."PRODUCT" AS "PRODUCT",
          "W1"."STORE" AS "STORE",
          "W1"."DOC_CURRENCY" AS "DOC_CURRENCY",
          COUNT(*) AS COL0,
          SUM("W1"."COSTWT") AS COL1
      FROM
          "SAPKIT"."YSB_50MIO" "W1"
      GROUP BY
          "W1"."PRODUCT",
          "W1"."STORE",
          "W1"."DOC_CURRENCY" ) SQ
WHERE
    SQ."STORE" IN ('CH05');
This statement is again executed in the remote source with select count (*) to find out how many records
are selected in the source database. This information can also be found in the local HANA under
Provisioning > Smart Data Access.
If HANA does not have statistics about the size of the remote fact table, the query optimizer can only
generate a plan using default values. These defaults may not be suitable in many scenarios
and may therefore lead to suboptimal query performance. In our example, the optimizer may decide to send
the 27 million records from the remote source to the local HANA if no optimization such as a semi-join is
performed (see picture 3). For details about the query execution optimizations, please see the document
“How does a BEx Query execution with SDA look like?” on this SCN page.
After statistics had been created, which provide the information that the fact table to be joined is big, the
optimizer was able to decide for the semi-join execution.
The local HANA also knows, via the selective filter on field STORE (selection of one characteristic
value for STORE), that the semi-join would reduce the result set dramatically. The cardinality of field
STORE is 406 in our example, and the remote fact table has approximately 54 million rows. Therefore, when
filtering on one characteristic value, about 133,000 records (assuming an even distribution) could be
expected to be transferred from the remote source. In fact, 77,827 rows had to be transferred from the
remote source (see picture 7).
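The estimate is simply the row count divided by the number of distinct values of the filtered field. A minimal sketch that derives these figures directly in the remote source, using the table and field from the example (the column aliases are illustrative):

```sql
-- Run in the remote source. With roughly 54 million rows and 406 distinct
-- STORE values, the last column comes out at about 133,000.
SELECT COUNT(*)                           AS "TOTAL_ROWS",
       COUNT(DISTINCT "STORE")            AS "DISTINCT_STORES",
       COUNT(*) / COUNT(DISTINCT "STORE") AS "AVG_ROWS_PER_STORE"
  FROM "SAPKIT"."YSB_50MIO";
```

The average is only an expectation under an even distribution. The actual number for a given value, such as the 77,827 rows observed here, can differ noticeably.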
So, besides the importance of the optimizer having information about the number of rows in a source
table, statistics should also be available for fields that are filtered (directly or indirectly via the join
condition). If the optimizer knows the cardinality of a field, it can better judge the selectivity of a filter
condition.
Example 1:
A table has 10,000 rows and 10,000 distinct values in a field. A single-value filter on this field returns 0 or
1 record, which means that this is a very selective filter.
Example 2:
A table has 10,000 rows and 1 distinct value in a field. A single-value filter on this field can return up to
10,000 records, which means that the filter might not be selective.
In this case, histograms are important. With histograms on this field, HANA can recognize whether no records
or all records are returned for the filter condition. If all records have to be returned, it makes
sense not to execute a semi-join optimization.
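To judge whether a field is evenly distributed or skewed, its value distribution can be inspected manually in the remote source. This is a minimal sketch using the table and field from the example above. It only illustrates the kind of information a histogram captures and is not the way HANA builds its histograms.

```sql
-- Run in the remote source: number of rows per STORE value, largest first.
SELECT "STORE",
       COUNT(*) AS "ROW_COUNT"
  FROM "SAPKIT"."YSB_50MIO"
 GROUP BY "STORE"
 ORDER BY COUNT(*) DESC;
```

If a few values account for most of the rows, simple statistics alone can misjudge the selectivity of a filter on this field, and histogram statistics are worth their additional creation cost.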
With optimizations such as semi-join or join relocation, there is always a trade-off between the cost of
executing the optimization and the savings in reading and transferring data from the source to the local
HANA.