Amazon Redshift Concurrency Scaling Preview

This prerelease documentation is confidential and is provided under the terms of your
nondisclosure agreement with Amazon Web Services (AWS) or other agreement governing
your receipt of AWS confidential information.

Welcome to the Amazon Redshift Concurrency Scaling preview. In this document you can find
steps to prepare a cluster to preview the Concurrency Scaling feature.

Important:

This feature is in preview. Don't use Concurrency Scaling with production workloads
during preview. Testing in production is not currently supported.

What is Concurrency Scaling?

A typical data warehouse has significant variance in concurrent query usage over the course of a
day. It is more cost-effective to add resources just for the period during which they are required
rather than provisioning to peak demand.

Concurrency Scaling adds transient capacity when you need to handle heavy demand from
concurrent users and queries. When your workload increases, Amazon Redshift automatically
routes queries to a scaling cluster, bypassing the WLM queues. With Concurrency Scaling you
can have virtually unlimited query concurrency.

Preview Requirements

These limitations do not apply to the GA release feature.

The cluster you use for the Concurrency Scaling preview must meet the following requirements:

• Must be in the us-east-1 AWS Region


• Node type must be dc2.8xl or ds2.8xl
• Maximum cluster size is 32 compute nodes
• Platform must be EC2-VPC. EC2-Classic is not supported
• Cluster resize is not supported
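The requirements above can be checked before launching anything. The following is a minimal sketch, not part of the preview tooling: the `ClusterSpec` class and `preview_violations` function are hypothetical names, and the sketch assumes the API spellings `dc2.8xlarge` and `ds2.8xlarge` for the node types this document abbreviates as dc2.8xl and ds2.8xl.

```python
from dataclasses import dataclass
from typing import List

# Values copied from the preview requirements list.
PREVIEW_REGION = "us-east-1"
# Assumption: accept both the document's shorthand and the API spelling.
PREVIEW_NODE_TYPES = {"dc2.8xl", "ds2.8xl", "dc2.8xlarge", "ds2.8xlarge"}
MAX_COMPUTE_NODES = 32


@dataclass
class ClusterSpec:
    """Hypothetical description of a planned test cluster."""
    region: str
    node_type: str
    number_of_nodes: int
    in_vpc: bool


def preview_violations(spec: ClusterSpec) -> List[str]:
    """Return the preview requirements that the spec does not meet."""
    problems = []
    if spec.region != PREVIEW_REGION:
        problems.append(f"region must be {PREVIEW_REGION}")
    if spec.node_type not in PREVIEW_NODE_TYPES:
        problems.append("node type must be dc2.8xl or ds2.8xl")
    if spec.number_of_nodes > MAX_COMPUTE_NODES:
        problems.append(f"at most {MAX_COMPUTE_NODES} compute nodes")
    if not spec.in_vpc:
        problems.append("platform must be EC2-VPC")
    return problems


if __name__ == "__main__":
    planned = ClusterSpec("us-east-1", "dc2.8xlarge", 2, True)
    print(preview_violations(planned) or "meets the preview requirements")
```

An empty list means the planned cluster satisfies every item in the list. The check does not cover the resize restriction, which applies after launch.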

During preview, the following types of queries are not eligible for concurrency scaling:
• Write queries. Data definition language (DDL) and data manipulation language (DML)
queries run only on the main cluster.
• Queries that reference external tables using Redshift Spectrum are served only through
the main cluster.

Prerequisites
1. Create a new cluster. We recommend creating a cluster with two or more dc2.8xl nodes.
2. Send us the cluster endpoint so we can disable billing. Please send the cluster endpoint
to RedshiftConcurrencyScalingPreview@amazon.com.
3. Load your data.

Step 1: Configure WLM for Concurrency Scaling

Concurrency scaling routes queries to the scaling cluster using a WLM query queue. To enable
concurrency scaling, set the Burst mode to auto for a queue.

Always use a test parameter group and ensure it is not used in production.

A query is routed to the concurrency scaling cluster, or scaled, based on the following criteria:
• The query doesn't use temp tables. Some queries that use subqueries create temp
tables.
• The query is read-only. Data definition language (DDL) and data manipulation language
(DML) queries are not scaled.

• The query doesn't use Redshift Spectrum to reference external tables.

• The query must encounter queueing to be eligible for concurrency scaling.
If your workload encounters very little or no queueing, we recommend lowering your
concurrency settings. This will increase the number of queries that are queued, which makes
scaling more likely to occur.
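The routing criteria above amount to a conjunction of four conditions. As an illustration only (the `QueryTraits` fields are hypothetical and do not correspond to any Redshift system table), they could be sketched as:

```python
from dataclasses import dataclass


@dataclass
class QueryTraits:
    """Hypothetical summary of a query; field names are illustrative."""
    read_only: bool          # no DDL or DML
    uses_temp_tables: bool   # includes temp tables created for subqueries
    uses_spectrum: bool      # references external tables
    was_queued: bool         # waited in the WLM queue


def may_be_scaled(q: QueryTraits) -> bool:
    """True only when all four documented criteria hold."""
    return (
        q.read_only
        and not q.uses_temp_tables
        and not q.uses_spectrum
        and q.was_queued
    )


if __name__ == "__main__":
    # A read-only query that never queued is not a scaling candidate.
    print(may_be_scaled(QueryTraits(True, False, False, was_queued=False)))
```

The last condition is why lowering the queue's concurrency can help during testing: a query that never waits in the queue fails the `was_queued` check even if it meets the other three.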
To configure WLM for Concurrency Scaling

1. Create a new parameter group for testing.



We strongly recommend creating a new parameter group to be sure it's not used on a
production cluster.

To create a parameter group for testing using the console

a. Sign in to the AWS Management Console and open the Amazon Redshift console
at https://console.aws.amazon.com/redshift/.
b. In the navigation pane, choose Parameter Groups.
c. On the Parameter Groups page, choose Create Cluster Parameter Group.
d. In the Create Cluster Parameter Group dialog box, enter a parameter group name and a
parameter group description.
e. Choose Create.

For more information, see Managing Parameter Groups Using the Console.
To create a parameter group for testing using the CLI

Run the following CLI command.



aws redshift create-cluster-parameter-group \
    --parameter-group-name concurrency-scaling-test \
    --parameter-group-family redshift-1.0 \
    --description "Testing concurrency scaling"

2. Configure a WLM queue to enable concurrency scaling



You enable Concurrency Scaling by configuring a Workload Manager (WLM) queue to
use Burst mode.

To configure a WLM queue to enable concurrency scaling

a. In the navigation pane, choose Workload management.


b. In Parameter groups, choose the concurrency-scaling-test parameter group.
c. On the Workload Management Configuration page, choose Edit.

d. On the default queue, in the Burst mode column, choose auto, then choose Save.


Step 2: Launch a Test Cluster



To use the Concurrency Scaling preview, you need to launch a new test cluster with the
following requirements:
• Must be in the us-east-1 AWS Region
• Node type must be dc2.8xl or ds2.8xl
• Maximum of 32 compute nodes
• EC2-VPC platform

In addition, configure your test cluster using the following configuration options:
• The parameter group you created for testing in Step 1.
• The Private Beta maintenance track named burst_beta (BURST_BETA in the CLI).

You can launch a test cluster using the console or the CLI. If you have a snapshot with test data,
you can restore from the snapshot using the CLI.

For more information, see Creating a Cluster by Using Launch Cluster.


Note:
After you create a cluster, please send an email to
RedshiftConcurrencyScalingPreview@amazon.com
with your cluster endpoint. We will disable billing on your test cluster while the preview is in
progress, so you won't be charged for running your tests.

To launch a test cluster using the console

1. Sign in to the AWS Management Console and open the Amazon Redshift console
at https://console.aws.amazon.com/redshift/.
2. For AWS Region, choose US East (N. Virginia).

3. Choose Launch Cluster.
Don't use Quick Launch to create your test cluster.

4. On the Cluster Details page, enter values for all options, and then choose Continue.
5. On the Node Configuration page, specify values for the following options, and then
choose Continue.
• Node Type – Choose dc2.8xl or ds2.8xl
• Cluster type – Choose Multi Node

• Number of compute nodes* - Enter a maximum of 32 nodes

6. On the Additional Configuration page, specify values for the following options, and then
choose Continue.

• Cluster Parameter Group – Choose the test parameter group you created in Step 1.
• Choose a VPC – Choose a VPC and configure the VPC options. EC2-Classic is not
supported for the preview.
7. For Maintenance Track, choose Private Beta, and then choose burst_beta. If you don't see
these options, contact the Amazon Redshift team to enable your account for the preview.

To launch a test cluster using the CLI


Run the following CLI command to launch a cluster onto the BURST_BETA maintenance track.

aws redshift create-cluster \
    --cluster-identifier <cluster_name> \
    --cluster-type multi-node \
    --node-type <node_type> \
    --number-of-nodes <number_of_nodes> \
    --master-username <master_user_name> \
    --master-user-password '<password>' \
    --maintenance-track-name BURST_BETA \
    --region us-east-1
For more information, see create-cluster.
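If you script cluster creation, it can help to assemble the command as an argument list so that each value is quoted correctly. The following is a sketch under assumptions: `create_cluster_argv` is a hypothetical helper, the cluster name, user name, and parameter group name are placeholders, and the optional `--cluster-parameter-group-name` flag is the standard create-cluster option for attaching a parameter group, which the command above does not include.

```python
import shlex
from typing import List, Optional


def create_cluster_argv(cluster_id: str, node_type: str, nodes: int,
                        master_user: str, password: str,
                        parameter_group: Optional[str] = None) -> List[str]:
    """Build the argument list for the create-cluster call shown above."""
    argv = [
        "aws", "redshift", "create-cluster",
        "--cluster-identifier", cluster_id,
        "--cluster-type", "multi-node",
        "--node-type", node_type,
        "--number-of-nodes", str(nodes),
        "--master-username", master_user,
        "--master-user-password", password,
        "--maintenance-track-name", "BURST_BETA",
        "--region", "us-east-1",
    ]
    if parameter_group is not None:
        # Not part of the command in this document; added as an assumption
        # so the test parameter group from Step 1 can be attached at launch.
        argv += ["--cluster-parameter-group-name", parameter_group]
    return argv


# Placeholder values; replace them with your own before running the command.
argv = create_cluster_argv("cs-preview-test", "dc2.8xlarge", 2,
                           "admin_user", "<password>",
                           parameter_group="concurrency-scaling-test")
# shlex.join quotes each argument so the line can be pasted into a shell.
print(shlex.join(argv))
```

The list can also be passed directly to `subprocess.run(argv, check=True)` once the placeholders are replaced with real values.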

To restore from a snapshot using the CLI


If you have a snapshot with test data, you can restore from the snapshot using the CLI. The new
cluster must meet the requirements for the concurrency scaling preview.

Restoring from a snapshot using the console is not supported during preview.

Run the following CLI command to restore from a snapshot onto the BURST_BETA maintenance
track.
aws redshift restore-from-cluster-snapshot \
    --cluster-identifier <cluster identifier> \
    --snapshot-identifier <snapshot identifier> \
    --maintenance-track-name BURST_BETA \
    --region us-east-1

For more information, see restore-from-cluster-snapshot.

Step 3: Load Test Data

We provide a test data set that you can load for testing Concurrency Scaling.

To load test data

1. Download the SQL script at
https://github.com/awslabs/amazon-redshift-utils/blob/master/src/CloudDataWarehouseBenchmark/Cloud-DWB-Derived-from-TPCDS/3TB/ddl.sql

The script contains the CREATE TABLE definitions and COPY statements to load the data
set.

2. Modify the COPY statements to add your credentials. You can use your access key
credentials or use an IAM role. For more information, see Authorization Parameters.

The following example shows a COPY statement with access key credentials.

copy store_sales from 's3://redshift-downloads/TPC-DS/3TB/store_sales/'
credentials 'aws_access_key_id=<user_access_key_id>;aws_secret_access_key=<user_secret_access_key>'
gzip delimiter '|' region 'us-east-1';

The following example shows a COPY statement with IAM role credentials.

copy store_sales from 's3://redshift-downloads/TPC-DS/3TB/store_sales/'
credentials 'aws_iam_role=<iam-role-arn>'
gzip delimiter '|' region 'us-east-1';

Note:
After loading data, create a manual snapshot to ensure that the scaled clusters have the
current data.
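
Because the script contains one COPY statement per table, adding credentials by hand is repetitive. As a rough sketch, the IAM role form of the statement in step 2 can be generated from a template. The `copy_statement` function and the role ARN below are hypothetical; the S3 path and COPY options are the ones shown in the examples.

```python
def copy_statement(table: str, s3_prefix: str, iam_role_arn: str) -> str:
    """Render a COPY statement in the IAM role form shown in step 2."""
    return (
        f"copy {table} from '{s3_prefix}'\n"
        f"credentials 'aws_iam_role={iam_role_arn}'\n"
        "gzip delimiter '|' region 'us-east-1';"
    )


# Hypothetical role ARN; substitute the role attached to your test cluster.
ROLE_ARN = "arn:aws:iam::123456789012:role/ExampleRedshiftRole"

stmt = copy_statement(
    "store_sales",
    "s3://redshift-downloads/TPC-DS/3TB/store_sales/",
    ROLE_ARN,
)
print(stmt)
```

The same function can be called once per table in the script, varying only the table name and the S3 prefix.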

Step 4: Run tests

View the Queries tab for your cluster. If a query was sent to a scaling cluster, the value in the
Executed on column is burst.

FAQ

Q: I don’t use the US East (us-east-1) region. Can you make an exception and allow preview in
another region?
We support only the US East (us-east-1) Region for the preview. However, you can copy a
snapshot across regions.
Q: How do I copy a snapshot across regions?
See Configuring Cross-Region Snapshot Copy for a Non-Encrypted Cluster.

Q: My cluster was created in the "current" track. How do I move it to the "BURST_BETA"
track?
If your cluster was created in the regular maintenance track, called 'Current', it does not
have the concurrency scaling feature installed.


Use the following command to change the maintenance track:
aws redshift modify-cluster \
    --cluster-identifier <cluster identifier> \
    --maintenance-track-name BURST_BETA

Then set the maintenance window to the earliest convenient time so that the concurrency
scaling patch can be applied.

Q: Do I need to restart the cluster when I change the burst mode on a WLM queue?
Yes, for the preview, you need to restart the cluster.

Q: I don’t see queries flagged as executed on ‘burst’, what should I check?


1. Ensure that you restarted the cluster after updating a WLM queue with burst ‘auto’.
2. Check that the system table STL_QUERY contains queries with burst_reason = 0,
using a query such as:
SELECT count(*) FROM stl_query WHERE burst_reason = 0;

If the query returns an error because the field burst_reason doesn't exist, the
cluster hasn't been patched with the concurrency scaling feature.
3. Ensure that you send more queries in parallel than the number of slots configured
for the WLM queue.
4. If you made updates to the tables, run a backup to be sure the data is on the scaling
cluster.
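
For item 3, one way to force queueing is to submit more copies of a read-only query at once than the queue has slots. This is a sketch only: `run_query` is a hypothetical callable that you supply (for example, a function that opens its own connection with your SQL driver and runs the statement), and the choice of five extra queries is arbitrary.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List


def run_in_parallel(run_query: Callable[[str], object], sql: str,
                    wlm_slots: int, extra: int = 5) -> List[object]:
    """Submit wlm_slots + extra copies of sql at once so that some must queue."""
    n = wlm_slots + extra
    # One worker per query so that all submissions are in flight together.
    with ThreadPoolExecutor(max_workers=n) as pool:
        futures = [pool.submit(run_query, sql) for _ in range(n)]
        return [f.result() for f in futures]


if __name__ == "__main__":
    # Stand-in for a real query runner, used only to show the call shape.
    def fake_run(statement: str) -> int:
        return len(statement)

    results = run_in_parallel(fake_run, "select 1", wlm_slots=5)
    print(len(results), "queries submitted")
```

Each call to `run_query` should use its own database connection, because queries sent over a single shared connection run one after another and do not queue in WLM.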

System tables
To help monitor concurrency scaling, new fields have been added to the following
system tables.

stv_inflight/stl_query
• New field named 'burst_reason' indicates where a query was run.
• Possible values:
o 0 - Concurrency scaling applied, ran on additional cluster
o 1 - Ran on main cluster

o 2 - Concurrency scaling was not enabled, ran on main cluster

o 3 - Query not eligible for concurrency scaling, ran on main cluster
o 4 - System temporary table is used, ran on main cluster
o 5 - System temporary table is used, ran on main cluster
o 6 - System table is used, ran on main cluster
o 7 - Internal use, ran on main cluster
o 8 - Internal use, ran on main cluster
o 9 - Query contains Python UDF, ran on main cluster
o 10 - Catalog table is used, ran on main cluster

o 11 - Internal use, ran on main cluster


o 12 - Internal use, ran on main cluster
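
When reviewing test results, it can be convenient to translate burst_reason codes into the descriptions listed above. The following sketch copies the table verbatim into a dictionary; the `summarize` function is a hypothetical helper, and the list of codes passed to it is assumed to come from your own query against stl_query.

```python
from collections import Counter
from typing import Dict, Iterable

# Descriptions copied from the burst_reason value list in this document.
BURST_REASON = {
    0: "Concurrency scaling applied, ran on additional cluster",
    1: "Ran on main cluster",
    2: "Concurrency scaling was not enabled, ran on main cluster",
    3: "Query not eligible for concurrency scaling, ran on main cluster",
    4: "System temporary table is used, ran on main cluster",
    5: "System temporary table is used, ran on main cluster",
    6: "System table is used, ran on main cluster",
    7: "Internal use, ran on main cluster",
    8: "Internal use, ran on main cluster",
    9: "Query contains Python UDF, ran on main cluster",
    10: "Catalog table is used, ran on main cluster",
    11: "Internal use, ran on main cluster",
    12: "Internal use, ran on main cluster",
}


def summarize(reasons: Iterable[int]) -> Dict[str, int]:
    """Count queries per burst_reason code, labeled with its description."""
    counts = Counter(reasons)
    return {
        f"{code}: {BURST_REASON.get(code, 'unknown')}": n
        for code, n in sorted(counts.items())
    }


if __name__ == "__main__":
    # Example input: burst_reason values fetched from stl_query.
    for label, n in summarize([0, 3, 0, 1]).items():
        print(n, label)
```

Only code 0 indicates that a query ran on a scaling cluster; every other code means the query ran on the main cluster.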

SVB Views

A set of system views with the prefix SVB provides details for concurrency scaling queries. Each
SVB view parallels the SVL view or STL table with the same suffix.

svb_compile
• Equivalent: svl_compile
• https://docs.aws.amazon.com/redshift/latest/dg/r_SVL_COMPILE.html
• Differences: None
svb_query_report
• Equivalent: svl_query_report
• https://docs.aws.amazon.com/redshift/latest/dg/r_SVL_QUERY_REPORT.html
• Differences: SVB does not provide Slice level information.
svb_query_summary
• Equivalent: svl_query_summary
• https://docs.aws.amazon.com/redshift/latest/dg/r_SVL_QUERY_SUMMARY.html
• Differences: None
svb_s3client
• Equivalent: stl_s3client
• https://docs.aws.amazon.com/redshift/latest/dg/r_STL_S3CLIENT.html
• Differences: None
svb_s3client_error
• Equivalent: stl_s3client_error
• https://docs.aws.amazon.com/redshift/latest/dg/r_STL_S3CLIENT_ERROR.html
• Differences: None
svb_s3query
• Equivalent: svl_s3query
• https://docs.aws.amazon.com/redshift/latest/dg/r_SVL_S3QUERY.html
• Differences: SVB does not provide Slice level information.
svb_s3query_summary
• Equivalent: svl_s3query_summary
• https://docs.aws.amazon.com/redshift/latest/dg/r_SVL_S3QUERY_SUMMARY.html
• Differences: None
svb_stream_segs
• Equivalent: stl_stream_segs

• https://docs.aws.amazon.com/redshift/latest/dg/r_STL_STREAM_SEGS.html

• Differences: None
svb_unload_log
• Equivalent: stl_unload_log
• https://docs.aws.amazon.com/redshift/latest/dg/r_STL_UNLOAD_LOG.html
• Differences: SVB does not provide Slice level information.

Current System Tables



The following tables contain updated information or columns related to concurrency
scaling queries.

• stl_explain
https://docs.aws.amazon.com/redshift/latest/dg/r_STL_EXPLAIN.html
• stl_plan_info
https://docs.aws.amazon.com/redshift/latest/dg/r_STL_PLAN_INFO.html

• stl_query
https://docs.aws.amazon.com/redshift/latest/dg/r_STL_QUERY.html
• stl_query_metrics
https://docs.aws.amazon.com/redshift/latest/dg/r_STL_QUERY_METRICS.html
• stl_querytext
https://docs.aws.amazon.com/redshift/latest/dg/r_STL_QUERYTEXT.html
• stl_utilitytext
https://docs.aws.amazon.com/redshift/latest/dg/r_STL_UTILITYTEXT.html
• stl_wlm_query
https://docs.aws.amazon.com/redshift/latest/dg/r_STL_WLM_QUERY.html
• stl_wlm_rule_action
https://docs.aws.amazon.com/redshift/latest/dg/r_STL_WLM_RULE_ACTION.html
