Figure 1: Use the point-and-click visual interface to build and refine clusters.
A visual data exploration and discovery environment

SAS Visual Statistics is an add-on to SAS Visual Analytics, meaning the products share the same interactive data exploration and predictive modeling interface. You get an integrated process for going from data to analytically derived insights.

Both the point-and-click interface and the programming interface let you easily identify predictive drivers among multiple exploratory variables, and visually discover and understand outliers and data discrepancies.

Visual data exploration makes it much easier to understand relationships in your data, derive new variables and select relevant variables to improve your model development efforts. Find out which variables are relevant as inputs to your model and which variables best define your segmentation strategy.

With integrated model building and visual data discovery, you can maintain an uninterrupted workflow, cycling quickly between hypotheses and verification. This will boost your modeling confidence, productivity and accuracy.

Key Features (continued)

  • Supports holdout data (training and validation) for model assessment.
  • Supports pruning with holdout data.
  • Supports autotuning.
• Logistic regression:
  • Models for binary data with logit and probit link functions.
  • Influence statistics.
  • Supports forward, backward, stepwise and lasso variable selection.
  • Variable selection, including iteration plot.
  • Frequency and weight variables.
  • Residual diagnostics.
  • Summary table includes model dimensions, iteration history, fit statistics, convergence status, Type III tests, parameter estimates and response profile.
  • Generate on-demand predicted labels and predicted event probabilities as new columns. Adjust the prediction cutoff to label an observation as event or nonevent.
  • Supports holdout data (training and validation) for model assessment.
• Linear regression:
  • Influence statistics.
  • Variable selection, including iteration plot.
  • Supports forward, backward, stepwise and lasso variable selection.
  • Frequency and weight variables.
  • Residual diagnostics.
  • Summary table includes overall ANOVA, model dimensions, fit statistics, model ANOVA, Type III test and parameter estimates.
  • Generate on-demand predicted values and residuals as new columns.
  • Supports holdout data (training and validation) for model assessment.
Figure 2: Quickly build and refine logistic regression models using the visual interface.
More advanced features are available in the programming interface.
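The prediction cutoff listed among the logistic regression features can be illustrated with a minimal Python sketch. This is not SAS code: the function name `label_events`, the sample probabilities and the choice to treat a probability equal to the cutoff as an event are all assumptions made for illustration.

```python
def label_events(probabilities, cutoff=0.5):
    """Label each predicted event probability as 'event' or 'nonevent'.

    Assumption: a probability equal to the cutoff counts as an event.
    """
    return ["event" if p >= cutoff else "nonevent" for p in probabilities]


# Made-up predicted event probabilities for five observations.
predicted = [0.12, 0.48, 0.50, 0.73, 0.91]

default_labels = label_events(predicted)              # cutoff of 0.5
strict_labels = label_events(predicted, cutoff=0.75)  # fewer events labeled
```

Raising the cutoff labels fewer observations as events, which is the trade-off the cutoff adjustment lets you explore.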
The visual interface provides point-and-click access to:
• Linear regression.
• Logistic regression.
• Generalized linear models.
• Generalized additive models.
• Nonparametric logistic regression.
• Clustering.
• Decision trees.

When working with large and complex data, dimension reduction techniques like clustering and decision trees can improve modeling accuracy. You can explore and evaluate segments for further analysis using k-means clustering, scatter plots and detailed summary statistics. Decision trees can be built for both classification and regression. After creating a decision tree, you can interactively prune trees and train subtrees.

SAS Visual Statistics users can copy their models into Model Studio and continue the data mining or machine learning processes in SAS Visual Data Mining and Machine Learning.

Open, code-based model development

While the visual GUI of SAS Visual Statistics is powerful and appealing, many statisticians, data scientists and quantitative specialists prefer to code their own predictive models and take advantage of more options to fine-tune the models.

Key Features (continued)

• Generalized linear models:
  • Distributions supported include beta, normal, binary, exponential, gamma, geometric, Poisson, Tweedie, inverse Gaussian and negative binomial.
  • Supports forward, backward, stepwise and lasso variable selection.
  • Variable selection, including iteration plot.
  • Offset variable support.
  • Frequency and weight variables.
  • Residual diagnostics.
  • Summary table includes model summary, iteration history, fit statistics, Type III test table and parameter estimates.
  • Informative missing option for treatment of missing values on the predictor variable.
  • Generate on-demand predicted values and residuals as new columns.
  • Supports holdout data (training and validation) for model assessment.
• Generalized additive models:
  • Distributions supported include normal, binary, gamma, Poisson, Tweedie, inverse Gaussian and negative binomial.
  • Supports one- and two-dimensional spline effects.
  • GCV, GACV and UBRE methods for selecting the smoothing effects.
  • Offset variable support.
  • Frequency and weight variables.
  • Residual diagnostics.
  • Summary table includes model summary, iteration history, fit statistics and parameter estimates.
  • Supports holdout data (training and validation) for model assessment.
• Nonparametric logistic regression:
  • Models for binary data with logit, probit, log-log and c-log-log link functions.
  • Supports one- and two-dimensional spline effects.
  • GCV, GACV and UBRE methods for selecting the smoothing effects.
  • Offset variable support.
  • Frequency and weight variables.
Figure 3: Use open source languages, including Python, to build descriptive and
predictive models.
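To make the link functions named for binary models (logit, probit, log-log and c-log-log) more concrete, here is a small Python sketch of their inverse forms, which map a linear predictor to an event probability. The function names are hypothetical, and the sign convention used for the log-log link is one common textbook convention, assumed here rather than taken from SAS documentation.

```python
import math


def inv_logit(eta):
    """Inverse logit: the logistic function."""
    return 1.0 / (1.0 + math.exp(-eta))


def inv_probit(eta):
    """Inverse probit: the standard normal CDF, written with erf."""
    return 0.5 * (1.0 + math.erf(eta / math.sqrt(2.0)))


def inv_loglog(eta):
    """Inverse log-log link (assumed convention: eta = -log(-log(p)))."""
    return math.exp(-math.exp(-eta))


def inv_cloglog(eta):
    """Inverse complementary log-log link: eta = log(-log(1 - p))."""
    return 1.0 - math.exp(-math.exp(eta))


INVERSE_LINKS = {
    "logit": inv_logit,
    "probit": inv_probit,
    "log-log": inv_loglog,
    "c-log-log": inv_cloglog,
}


def event_probabilities(eta):
    """Event probability implied by each link for one linear predictor."""
    return {name: inverse(eta) for name, inverse in INVERSE_LINKS.items()}
```

The logit and probit links are symmetric around a probability of 0.5, while the two log-log variants are asymmetric, which is why they can suit events that are rare or nearly certain.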
In addition, public REST APIs enable coders to add proven SAS Analytics to existing applications.

This puts the power of SAS in the hands of programmers who may not be familiar with SAS – but know it’s the best analytics software available.

SAS Visual Statistics on SAS Viya provides programming access to many tested and proven SAS algorithms running in the distributed in-memory environment, including those for:
• Data manipulation.
• Variable binning.
• Missing value imputation.
• Supervised and unsupervised variable selection.
• Clustering using k-means and k-modes.
• Decision trees.
• Principal component analysis.
• Linear and logistic regression.
• Nonlinear regression.
• Generalized linear regression.
• Ordinary least squares.
• Partial least squares.
• Quantile regression.
• Generalized additive models.
• Proportional hazard regression.
• Statistical process control.
• Descriptive statistics.
• Model assessment.

SAS Visual Statistics on SAS Viya also includes SAS/STAT® procedures and SAS/GRAPH®.

This open programming environment provides flexibility for data scientists and statisticians, based on their programming skills and preferences, to easily access the power of SAS for data manipulation and advanced analytics.

Key Features (continued)

  • Residual diagnostics.
  • Summary table includes model summary, iteration history, fit statistics and parameter estimates.
  • Supports holdout data (training and validation) for model assessment.

Programming access to analytical techniques
• Programmers and data scientists can access SAS Viya (CAS server) from SAS Studio using SAS procedures (PROCs) and other tasks.
• Programmers can execute CAS actions using PROC CAS or use different programming environments like Python, R, Lua and Java.
• Users can also access SAS Viya (CAS server) from their own applications using public REST APIs.
• Provides native integration to Python Pandas DataFrames. Python programmers can upload DataFrames to CAS and fetch results from CAS as DataFrames to interact with other Python packages such as Pandas, matplotlib, Plotly, Bokeh, etc.
• Includes SAS/STAT® procedures and SAS/GRAPH®.
• Principal component analysis (PCA):
  • Performs dimension reduction by computing principal components.
  • Provides the eigenvalue decomposition, NIPALS and ITERGS algorithms.
  • Outputs principal component scores across observations.
  • Creates scree plots and pattern profile plots.
• Decision trees:
  • Supports classification trees and regression trees.
  • Supports categorical and numerical features.
  • Provides criteria for splitting nodes based on measures of impurity and statistical tests.
  • Provides the cost-complexity and reduced-error methods of pruning trees.
  • Supports partitioning of data into training, validation and testing roles.
  • Supports use of validation data for selecting the best subtree.
  • Supports the use of test data for assessment of final tree model.
  • Provides various methods of handling missing values, including surrogate rules.
  • Creates tree diagrams.
  • Provides statistics for assessing model fit, including model-based (resubstitution) statistics.
  • Computes measures of variable importance.
  • Outputs leaf assignments and predicted values for observations.
• Clustering:
  • Provides the k-means algorithm for clustering continuous (interval) variables.
  • Provides the k-modes algorithm for clustering nominal variables.
  • Provides various distance measures for similarity.
  • Provides the aligned box criterion method for estimating the number of clusters.
  • Outputs cluster membership and distance measures across observations.
• Linear regression:
  • Supports linear models with continuous and classification variables.
  • Supports various parameterizations for classification effects.
  • Supports any degree of interaction and nested effects.
  • Supports polynomial and spline effects.
  • Supports forward, backward, stepwise, least angle regression and lasso selection methods.
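The decision tree features mention splitting nodes based on measures of impurity. As a rough illustration of that idea, the Python sketch below scores candidate thresholds on one numeric feature with the Gini index and keeps the split with the lowest weighted impurity. It is a generic textbook version, not the SAS implementation; `gini`, `best_split` and the sample data are assumptions.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a collection of class labels (0.0 means pure)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def best_split(values, labels):
    """Return (threshold, weighted_gini) for the best rule `value <= threshold`.

    Returns (None, parent_gini) if no split improves on the unsplit node.
    """
    pairs = sorted(zip(values, labels))
    n = len(pairs)
    best = (None, gini(labels))
    for i in range(1, n):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # no threshold can separate equal values
        threshold = (pairs[i - 1][0] + pairs[i][0]) / 2.0
        left = [label for _, label in pairs[:i]]
        right = [label for _, label in pairs[i:]]
        score = (len(left) * gini(left) + len(right) * gini(right)) / n
        if score < best[1]:
            best = (threshold, score)
    return best


# Made-up data: small values belong to class "a", large values to class "b".
split = best_split([1, 2, 3, 10, 11, 12], ["a", "a", "a", "b", "b", "b"])
```

A full tree applies the same search recursively to each resulting node, then prunes the result with a method such as cost-complexity pruning.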
Dynamic group-by processing

With SAS Visual Statistics, many users can concurrently build numerous models and process results for each group or segment without having to sort or index data each time. The grouping variables, or their properties, can change from one action to the next, and groups are processed without shuffling or reordering the data.

This means more models can be quickly created for more segments or groups on the fly without additional processing overhead. The result? Models that meet the unique needs of individual segments or groups.

Model comparison and assessment

After models have been created, they can be easily compared and assessed using a variety of statistical comparison summaries such as lift charts, ROC charts, concordance statistics and misclassification tables on one or more models from either the visual or programming interface.

And, from the visual interface, an interactive slider lets you manipulate cutoff thresholds so you can easily and visually evaluate lift at different percentiles. Combine model fitting with model diagnostics to quickly see and understand impacts on performance.

Key Features (continued)

  • Supports information criteria and validation methods for controlling model selection.
  • Offers selection of individual levels of classification effects.
  • Preserves hierarchy among effects.
  • Supports partitioning of data into training, validation and testing roles.
  • Provides a variety of diagnostic statistics.
  • Generates SAS code for production scoring.
• Logistic regression:
  • Supports binary and binomial responses.
  • Supports various parameterizations for classification effects.
  • Supports any degree of interaction and nested effects.
  • Supports polynomial and spline effects.
  • Supports forward, backward, fast backward and lasso selection methods.
  • Supports information criteria and validation methods for controlling model selection.
  • Offers selection of individual levels of classification effects.
  • Preserves hierarchy among effects.
  • Supports partitioning of data into training, validation and testing roles.
  • Provides a variety of statistics for model assessment.
  • Provides a variety of optimization methods for maximum likelihood estimation.
• Generalized linear models:
  • Supports responses with a variety of distributions, including binary, normal, Poisson and gamma.
  • Supports various parameterizations for classification effects.
  • Supports any degree of interaction and nested effects.
  • Supports polynomial and spline effects.
  • Supports forward, backward, fast backward, stepwise and group lasso selection methods.
  • Supports information criteria and validation methods for controlling model selection.
  • Offers selection of individual levels of classification effects.
  • Preserves hierarchy among effects.
Figure 4: Generalized additive models have been added to SAS Visual Statistics.
Build them using the visual interface or with code.
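For readers unfamiliar with lift, the quantity evaluated at different percentiles in the model comparison section, the following Python sketch computes it for the top-scoring share of observations. The definition used (event rate among the top percent divided by the overall event rate) is the usual one, but the function name `lift_at` and the sample scores are assumptions, not output from SAS Visual Statistics.

```python
def lift_at(scores, outcomes, percent):
    """Lift within the top `percent` of observations ranked by model score.

    `outcomes` holds 1 for an event and 0 for a nonevent, and is assumed
    to contain at least one event.
    """
    n = len(scores)
    k = max(1, round(n * percent / 100))
    ranked = sorted(zip(scores, outcomes), key=lambda pair: pair[0], reverse=True)
    top_rate = sum(outcome for _, outcome in ranked[:k]) / k
    overall_rate = sum(outcomes) / n
    return top_rate / overall_rate


# Made-up scores and observed outcomes for ten observations.
scores = [0.90, 0.80, 0.70, 0.60, 0.50, 0.40, 0.30, 0.20, 0.10, 0.05]
outcomes = [1, 1, 0, 1, 0, 0, 0, 0, 0, 0]

lift_top_20 = lift_at(scores, outcomes, 20)    # both top-2 observations are events
lift_top_50 = lift_at(scores, outcomes, 50)    # 3 events among the top 5
lift_all = lift_at(scores, outcomes, 100)      # always 1.0 by construction
```

A lift well above 1 in the top percentiles means the model concentrates events among its highest-scored observations.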
Model scoring

After you have determined which model performs best, your champion model can be easily applied against new data.

Moving all of the data preparation tasks and algorithms of a sophisticated model from a development environment to an operational system is usually one of the most difficult aspects of predictive modeling and machine learning.

With SAS Visual Statistics, you can export your models as SAS DATA step code and easily apply them to new data. Putting your predictive models into production produces the insights needed for making better decisions and taking optimal actions.

Distributed, in-memory analytical processing with SAS® Viya®

SAS Visual Statistics runs on SAS Viya, a new high-performance runtime engine that uses in-memory analytical processing to provide answers to a range of business questions in a single, scalable and governed environment. It takes advantage of distributed, parallel processing for blazingly fast speed, dramatically reducing data exploration and model development time.

SAS Viya delivers high availability and scalable processing so IT can scale computing capacity up and out to meet the needs of more users who are dealing with more data and increasingly complex analytical problems.

Key Features (continued)

  • Supports partitioning of data into training, validation and testing roles.
  • Provides a variety of statistics for model assessment.
  • Provides a variety of optimization methods for maximum likelihood estimation.
• Nonlinear regression models:
  • Fits nonlinear regression models with standard or general distributions.
  • Computes analytical derivatives of user-provided expressions for more robust parameter estimations.
  • Evaluates user-provided expressions using the ESTIMATE and PREDICT statements (procedure only).
  • Requires a data table that contains the CMP item store if not using PROC NLMOD.
  • Estimates parameters using the least squares method.
  • Estimates parameters using the maximum likelihood method.
• Quantile regression models:
  • Supports quantile regression for single or multiple quantile levels.
  • Supports multiple parameterizations for classification effects.
  • Supports any degree of interactions (crossed effects) and nested effects.
  • Supports hierarchical model selection strategy among effects.
  • Provides multiple effect-selection methods.
  • Provides effect selection based on a variety of selection criteria.
  • Supports stopping and selection rules.
• Predictive partial least squares models:
  • Provides programming syntax with classification variables, continuous variables, interactions and nestings.
  • Provides effect-construction syntax for polynomial and spline effects.
  • Supports partitioning of data into training and testing roles.
  • Provides test set validation to choose the number of extracted factors.
  • Implements the following methods: principal component regression, reduced rank regression and partial least squares regression.
Figure 5: Decision trees illustrate the most likely outcomes and consequences.
Build them using the visual interface or with code.
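Quantile regression, listed among the features above, differs from least squares in the loss it minimizes. The Python sketch below shows the standard check (pinball) loss for the simplest possible case, a model with only an intercept, where the minimizer is a sample quantile. It is a conceptual illustration under that simplifying assumption; the function names and data are hypothetical and unrelated to the SAS procedure's internals.

```python
def check_loss(residual, tau):
    """Check (pinball) loss for one residual at quantile level tau."""
    return residual * (tau - (1.0 if residual < 0 else 0.0))


def total_loss(values, candidate, tau):
    """Summed check loss when every value is predicted by `candidate`."""
    return sum(check_loss(value - candidate, tau) for value in values)


def best_constant(values, tau):
    """Observed value that minimizes the summed check loss.

    For an intercept-only model a minimizer always exists among the
    observed values, so searching them is sufficient.
    """
    return min(sorted(values), key=lambda c: total_loss(values, c, tau))


# Made-up response values with one large outlier.
values = [1, 2, 3, 4, 100]

median_fit = best_constant(values, 0.5)   # the median, unaffected by the outlier
upper_fit = best_constant(values, 0.9)    # pulled toward the upper tail
lower_fit = best_constant(values, 0.1)    # pulled toward the lower tail
```

Fitting several quantile levels this way describes how the whole distribution of the response changes, not just its mean.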
You need analytical processing power you can count on. The fault-tolerant design of SAS Viya automatically detects server failure, even in multiplatform processing environments, and redistributes processing as needed. It also manages several copies of data on the processing cluster. If a node in the cluster becomes unavailable or fails, the required data is retrieved from another block to quickly continue processing.

These self-healing mechanisms ensure high availability for uninterrupted processing and automated recovery. Support for multitenancy allows multiple tenants to share resources, even though each tenant is logically isolated.

Concurrent access to data in memory

SAS Viya enables you to load and persist data in memory on demand, and execute multipass analytical computations. All data, tables and objects are held in memory as long as required, whether it’s for interactive visual investigations or advanced analytical processing. Many users can collaborate to explore the same raw data and build models simultaneously.

Key Features (continued)

• Generalized additive models:
  • Fits generalized additive models based on low-rank regression splines.
  • Estimates the regression parameters by using penalized likelihood estimation.
  • Estimates the smoothing parameters by using either the performance iteration method or the outer iteration method.
  • Estimates the regression parameters by using maximum likelihood techniques.
  • Tests the total contribution of each spline term based on the Wald statistic.
  • Provides model-building syntax that can include classification variables, continuous variables, interactions and nestings.
  • Enables you to construct a spline term by using multiple variables.
• Proportional hazard regression:
  • Fits the Cox proportional hazards regression model to survival data and performs variable selection.
  • Provides model-building syntax with classification variables, continuous variables, interactions and nestings.
  • Provides effect-construction syntax for polynomial and spline effects.
  • Performs maximum partial likelihood estimation, stratified analysis and variable selection.
  • Partitions data into training, validation and testing roles.
  • Provides weighted analysis and grouped analysis.
• Statistical process control:
  • Perform Shewhart control chart analysis.
  • Analyze multiple process variables to identify processes that are out of statistical control.
  • Adjust control limits to compensate for unequal subgroup sizes.
  • Estimate control limits from the data, compute control limits from specified values for population parameters (known standards) or read limits from an input data table.
  • Perform tests for special causes based on runs patterns (Western Electric rules).
  • Estimate the process standard deviation using various methods (variable charts only).
  • Save chart statistics and control limits in output data tables.
Figure 7: You can call analytical procedures and actions in SAS Visual Statistics using R,
as well as other open source programming languages.
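Two of the statistical process control features, computing limits from known standards and adjusting them for unequal subgroup sizes, can be sketched in a few lines of Python. The example uses the textbook three-sigma limits for a chart of subgroup means and flags only points outside those limits. The function names, the process parameters and the measurements are assumptions for illustration, not SAS syntax.

```python
import math


def xbar_limits(mu0, sigma0, n, k=3.0):
    """Lower and upper control limits for the mean of a subgroup of size n,
    given a known process mean mu0 and standard deviation sigma0."""
    half_width = k * sigma0 / math.sqrt(n)
    return mu0 - half_width, mu0 + half_width


def out_of_control(subgroups, mu0, sigma0):
    """Indexes of subgroups whose mean falls outside its own limits.

    Limits are recomputed per subgroup, so larger subgroups get
    tighter limits.
    """
    flagged = []
    for index, sample in enumerate(subgroups):
        mean = sum(sample) / len(sample)
        lower, upper = xbar_limits(mu0, sigma0, len(sample))
        if mean < lower or mean > upper:
            flagged.append(index)
    return flagged


# Made-up measurements; the process standard is mean 10, std. dev. 1.
subgroups = [
    [10.1, 9.9, 10.0, 10.2],                                  # n = 4, in control
    [11.2, 11.4, 11.0, 11.3, 11.1, 11.2, 11.3, 11.1, 11.2],   # n = 9, mean 11.2
    [11.2, 11.2],                                             # n = 2, mean 11.2
]
flagged = out_of_control(subgroups, mu0=10.0, sigma0=1.0)
```

The second and third subgroups have the same mean, but only the larger one is flagged, because its limits are narrower.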
TO LEARN MORE »

To learn more about SAS Visual Statistics, download white papers, view screenshots and see other related material, please visit sas.com/visualstatistics.

Key Features (continued)

• Contingency tables, including measures of associations.
• Histograms with options to control binning values, maximum value thresholds, outliers and more.
• Multidimensional summaries in a single pass of the data.
• Percentiles for one or more variables.
• Summary statistics such as number of observations, number of missing values, sum of nonmissing values, mean, standard deviation, standard errors, corrected and uncorrected sums of squares, min and max, and the coefficient of variation.
• Kernel density estimates using normal, tri-cube and quadratic kernel functions.
• Constructs one-way to n-way frequency and cross-tabulation tables.

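As a reminder of how the summary statistics named above relate to one another, here is a small Python sketch that computes them for one variable. It is a plain illustration, not the SAS implementation: the function name `summarize`, the use of `None` for missing values and reporting the coefficient of variation as a percentage are assumptions.

```python
import math


def summarize(values):
    """Summary statistics for one variable; None marks a missing value.

    Assumes at least two nonmissing values and a nonzero mean.
    """
    present = [v for v in values if v is not None]
    n = len(present)
    total = sum(present)
    mean = total / n
    uss = sum(v * v for v in present)            # uncorrected sum of squares
    css = sum((v - mean) ** 2 for v in present)  # corrected sum of squares
    std = math.sqrt(css / (n - 1))               # sample standard deviation
    return {
        "n": n,
        "nmiss": len(values) - n,
        "sum": total,
        "mean": mean,
        "std": std,
        "stderr": std / math.sqrt(n),
        "uss": uss,
        "css": css,
        "min": min(present),
        "max": max(present),
        "cv": 100.0 * std / mean,                # assumed to be a percentage
    }


# Made-up data with one missing value.
stats = summarize([2.0, 4.0, None, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])
```

The corrected sum of squares subtracts the mean before squaring, so it equals the uncorrected sum of squares minus n times the squared mean.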
Group-by processing
• Build models, compute and process results on the fly for each group or segment
without having to sort or index the data each time.
• Build segment-based models instantly (i.e., stratified modeling) from a decision tree or
clustering analysis.
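The idea of building a model per group without sorting or indexing the data can be illustrated with a short Python sketch. It makes one pass over unsorted rows, keeps running sums per group, and then fits a separate least squares line for each group. This is only a conceptual analogy for group-by processing: the function `fit_by_group`, the row layout and the data are assumptions, and nothing here reflects how SAS Viya implements the feature.

```python
from collections import defaultdict


def fit_by_group(rows):
    """Fit y = intercept + slope * x separately for each group.

    `rows` is an iterable of (group, x, y) in any order. Each group is
    assumed to contain at least two distinct x values.
    """
    # Running totals per group: count, sum x, sum y, sum x*x, sum x*y.
    sums = defaultdict(lambda: [0, 0.0, 0.0, 0.0, 0.0])
    for group, x, y in rows:
        totals = sums[group]
        totals[0] += 1
        totals[1] += x
        totals[2] += y
        totals[3] += x * x
        totals[4] += x * y

    models = {}
    for group, (n, sx, sy, sxx, sxy) in sums.items():
        slope = (n * sxy - sx * sy) / (n * sxx - sx * sx)
        intercept = (sy - slope * sx) / n
        models[group] = (intercept, slope)
    return models


# Made-up rows, deliberately interleaved rather than sorted by group.
rows = [
    ("east", 1, 3), ("west", 1, 1), ("east", 2, 5),
    ("west", 2, 0), ("east", 3, 7), ("west", 3, -1),
]
models = fit_by_group(rows)
```

Changing the grouping variable only changes the dictionary key, so the same pass works for any segmentation, including segments taken from a clustering or decision tree result.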
SAS and all other SAS Institute Inc. product or service names are registered trademarks or trademarks of SAS Institute Inc.
in the USA and other countries. ® indicates USA registration. Other brand and product names are trademarks of their
respective companies. Copyright © 2018, SAS Institute Inc. All rights reserved. 108780_G80627.0718