Augmented Analytics Is the Future of Data and Analytics
Published: 27 July 2017 ID: G00326012

Analyst(s): Rita Sallam, Cindi Howson, Carlie Idoine

Augmented analytics, an approach that automates insights using machine
learning and natural-language generation, marks the next wave of disruption
in the data and analytics market. Data and analytics leaders should plan to
adopt augmented analytics as platform capabilities mature.

Key Findings
■ Augmented analytics is a next-generation data and analytics paradigm that uses machine
learning to automate data preparation, insight discovery and insight sharing for a broad range of
business users, operational workers and citizen data scientists.
■ Augmented analytics will enable expert data scientists to focus on specialized problems and on
embedding enterprise-grade models into applications. Users will spend less time exploring data
and more time acting on the most relevant insights with less bias than is the case with manual
approaches.
■ Both small startups and large vendors now offer augmented analytics capabilities that could
disrupt business intelligence (BI) and analytics, data science, data integration and embedded
analytic application vendors. Data and analytics leaders must therefore review their
investments.
■ As augmented analytics tools and capabilities become more accessible, data and analytics
leaders will need to adopt new approaches. They will also have to develop a strategy to address
the impact of augmented analytics on currently supported data and analytics capabilities, roles,
responsibilities and skills, and increase their investments in data literacy.

Recommendations
As a data and analytics leader planning to use augmented analytics for modernization, you should:

■ Launch a pilot to assess the viability of augmented analytics. Address a shortlist of business
problems that traditionally require manual, time-intensive analysis or are prone to bias.
■ Build trust in machine-assisted models by using expert data scientists to run them in parallel
with existing models to validate their accuracy, while fostering collaboration between expert
data scientists and citizen data scientists.
■ Monitor the augmented analytics capabilities and roadmaps of established BI and analytics,
data science and machine-learning platform vendors, startups and open-source products.
Focus on the requirements for upfront setup and data preparation, on the types of data that can
be analyzed, on the types and range of algorithms supported, and on the accuracy of findings.

Table of Contents

Strategic Planning Assumptions
Analysis
Definition
Description
Augmented Analytics Marks the Next Wave of Analytics Disruption
Preparing Data
Finding Patterns in Data
Sharing and Operationalizing Findings From Data
Adoption Rate
Risks
Evaluation Factors
Recommendations
Representative Vendors
Gartner Recommended Reading

List of Tables

Table 1. Examples of Augmented Data Discovery Vendors and Their Capabilities

List of Figures

Figure 1. Disruption Points in the Analytics and Business Intelligence Market
Figure 2. What Drives Student Earnings?
Figure 3. Current Data Analytics Workflow
Figure 4. Emerging Augmented Analytics Workflow
Figure 5. Use of Machine Learning to Harmonize Complex and Difficult Datasets
Figure 6. Smart Self-Service Data Preparation

Page 2 of 36 Gartner, Inc. | G00326012


Figure 7. How Augmented Data Discovery and Augmented Data Science Platforms Differ
Figure 8. Automated Machine Learning Uncovers Loan Default Drivers
Figure 9. Smart Visualization
Figure 10. Smart Labeling Automatically Focuses Users on Outliers (1)
Figure 11. Smart Labeling Automatically Focuses Users on Outliers (2)
Figure 12. Dynamic Narration of the Load Time Analysis
Figure 13. Adoption Across the Analytics Spectrum
Figure 14. Augmented Data Discovery Embedded in a Sales Application

Strategic Planning Assumptions


By 2020, due largely to the automation of data science tasks, citizen data scientists will surpass
data scientists in terms of the amount of advanced analysis they produce and the value derived
from it.

By 2020, augmented analytics — a paradigm that includes natural-language query and narration,
augmented data preparation, automated advanced analytics and visual-based data discovery
capabilities — will be a dominant driver of new purchases of business intelligence, analytics and
data science and machine learning platforms and of embedded analytics.

By 2020, the number of users of modern business intelligence and analytics platforms that are
differentiated by augmented data discovery capabilities will grow at twice the rate — and deliver
twice the business value — of those that are not.

By 2020, natural-language generation and artificial intelligence will be a standard feature of 90% of
modern BI platforms.

By 2020, 50% of analytical queries will be generated via search, natural-language processing or
voice, or will be automatically generated.

By 2020, organizations that offer users access to a curated catalog of internal and external data will
derive twice as much business value from analytics investments as those that do not.

Through 2020, the number of citizen data scientists will grow five times faster than the number of
expert data scientists.

Analysis
Analytics, the core of digital business, is at a critical inflection point. Across the analytics stack,
tools have become easier to use and more agile, enabling greater access and self-service. And yet
organizations' processes for preparing data for analysis, analyzing data, building advanced analytics
models, interpreting results and telling stories with data remain largely manual and prone to bias.

Data volumes are growing, and the data needed to optimize cross-functional digital business
decisions is becoming more complex. As a result, the number of variables driving an outcome or
best action is growing to the point where exploring every possible pattern and determining the most
relevant and actionable findings is either impossible or impractical using current manual
approaches. This leaves business people and analysts increasingly prone to confirmation bias. They
often resort to exploring their own biased hypotheses, miss key findings, and draw incorrect or
incomplete conclusions, which adversely affects decisions and outcomes. Furthermore, data science
modeling, which is also largely manual, requires specialist skills that are in short supply at a time
when insights from advanced analytics must be pervasive to fuel digital business transformation.

There is hope, however. A new paradigm — augmented analytics — has emerged. Central to this
development is the use of machine-learning automation to augment human intelligence and
contextual awareness across the entire data and analytics workflow, from data to insight to action,
spanning data management, BI and analytics, and data science and machine learning. Augmented
analytics will be crucial for delivering less biased decisions and more impartial contextual
awareness. It will transform how users interact with data, and how they consume and act on insights.

We are already seeing augmented analytics features make their way into modern BI and analytics
and data science and machine learning platforms. This is happening largely in response to
disruptive innovations from startups such as BeyondCore (acquired by Salesforce in 2016 and
rebranded Salesforce Einstein Discovery, a part of the Salesforce Einstein Analytics portfolio) and
DataRobot, as well as from traditional BI vendors like IBM (with IBM Watson Analytics). The same is
happening to self-service data preparation platforms, where machine-learning augmented data
preparation vendors such as Paxata, Trifacta and UniFi are driving innovation.

Definition
Augmented analytics includes:

■ Augmented data preparation, which uses machine-learning automation to augment data
profiling and data quality, harmonization, modeling, manipulation, enrichment, metadata
development and cataloging.
■ Augmented data discovery (formerly "smart data discovery"), which enables business
people and citizen data scientists to use machine learning to automatically find, visualize and
narrate relevant findings (such as correlations, exceptions, clusters, links and predictions)
without having to build models or write algorithms. Users explore data via visualizations, search
and natural-language query technologies, supported by natural-language-generated narration
for interpretation of results. It can be used by citizen data scientists to analyze data without
preconceived notions for early prototyping and hypothesis development with less manual
experimentation. Consequently, highly skilled data scientists have more time to focus on
building and operationalizing the most relevant models.
■ Augmented data science and machine learning, which automates key aspects of advanced
analytic modeling, such as feature selection. This reduces the requirement for specialized skills
to generate, operationalize and manage an advanced analytics model.

Many autogenerated and human-augmented machine-learning models created through augmented
analytics will also be embedded in enterprise applications — for example, those of the HR, finance,
sales, marketing, customer service, procurement and asset management departments — to
optimize the decisions and actions of all employees, not just those of analysts and data scientists.

Augmented analytics will also be a key feature of conversational analytics. This is an emerging
paradigm that enables business people to generate queries, explore data, and receive and act on
insights in natural language (voice or text) via mobile devices and personal assistants. For example,
instead of accessing a daily dashboard, a decision maker with access to Amazon Alexa might say,
"Alexa, analyze my sales results for the past three months!" or "Alexa, what are the top three things
I can do to improve my close rate today?"

Conversational analytics applications are not yet available "out of the box," and early integrations
are immature. Analytics vendors are using APIs and building integrations with the help of partners to
make these applications easier to deploy. We expect out-of-the-box and enterprise-ready instances
to appear over the next two to five years (see "Hype Cycle for Business Intelligence and Analytics,
2017").

Description
This document explores augmented analytics capabilities and their ramifications for organizational
and market disruption. It provides guidance to data and analytics leaders planning to adopt these
capabilities in order to modernize and to drive digital transformation and innovation.

Augmented Analytics Marks the Next Wave of Analytics Disruption


Over the past 10 years, visual-based data discovery tools have disrupted the traditional BI market.
These easy-to-use tools enable users to assemble data rapidly, explore hypotheses visually, and
find new insights in data. They have transformed how business users explore data, in comparison
with the IT-centric, semantic-layer-based approach of traditional BI platforms. Even so, many
activities associated with preparing data, finding patterns in large, complex combinations of data,
and sharing insights with others remain highly manual and prone to bias.

Although visual-based data discovery tools are easy to use, because users analyze data manually
by creating queries to investigate hypotheses, it is not possible for them to explore every possible
pattern and combination, let alone determine whether their findings are the most relevant,
significant and actionable. Relying on business users to find patterns manually may result in them
exploring their own biased hypotheses, missing key findings, and drawing their own incorrect or
incomplete conclusions, which may adversely affect decisions and outcomes.

That "a picture is worth a thousand words" has long been assumed in the field of data and
analytics. And rightfully so, as visualizations are a powerful and consumable way to find and
communicate patterns in data (more so than tables or lists). However, they do not always highlight
statistically significant findings. That requires user interpretation or further statistical analysis to
determine whether findings are relevant, significant and actionable. Moreover, finding insights from
advanced analytics — a key aspirational goal for most companies as they undertake the transition
to digital business — requires expert data science skills, which are extremely scarce.

Whereas manual interactive exploration using visualizations is the defining feature of visual-based
data discovery platforms, machine-learning automation of the insight discovery and exploration
process is a defining feature of augmented analytics in next-generation data and analytics platforms
(see Figure 1). It enables business users and citizen data scientists to automatically find, visualize
and narrate relevant findings, such as correlations, exceptions, clusters and predictions, without
having to build models or write algorithms. Users explore data via visualizations, search and natural-
language query technologies, supported by text- and voice-based natural-language-generated
narration and interpretation of results or the most statistically important findings in the user's
context. We are beginning to see these capabilities emerge in some existing data integration, BI and
analytics, and data science and machine-learning platforms, largely in response to, and as
imitations of, the innovations of disruptive startups (see the Representative Vendors section below).

Augmented analytics can reduce time-consuming exploration and the identification of false or less
relevant insights. Applying a range of algorithms and ensemble learning to data in parallel, and
explaining actionable findings to users, reduces the risk of missing important insights in the data, in
comparison to manual exploration. It also optimizes resulting decisions and actions. This paradigm
shift requires investment in data literacy throughout organizations, as insights are distributed to all
employees.

Figure 1. Disruption Points in the Analytics and Business Intelligence Market

Source: Gartner (July 2017)

Case study: How Salesforce Einstein Discovery showed that attendance at a top university is not
the main predictor of high earning power:

■ At Gartner's 2016 "BI Bake-Off" at the Data and Analytics Summit in Dallas, Texas, we gave
representatives of several modern BI and analytics platform vendors university and college
student demographic data, payroll data and a demo script. In addition to showcasing functional
differences across critical capabilities, we asked them to combine the datasets and derive
insights about which university graduates would have the most earning power 10 years after
graduation. Given the number of variables and combinations available to explore manually, the
representatives did what expert analysts typically do. They explored their own hypotheses first.
In this case, it was the "usual suspects" of leading universities — "because going to Harvard
means you out-earn those going to state universities, right?" While there was a relationship in
the data between attendance at top universities and earning power, all missed the most
important driver, one that is not intuitive. The biggest indicator of students' future earning power
in the data was not their university. It was their parents' income, and secondarily whether they
completed their degrees. We cannot say precisely why this is. Is it due to work and study habits
learned at home from high-performing parents? Is it because wealthier parents can pay for their
children to finish college, even if that means it takes five or six years? We can, however, say that
parental income was not a driver that the vendor representatives knew to look for.
■ By contrast, although we gave all the vendors in the vendor exhibit hall the same dataset, only
Salesforce Einstein Discovery uncovered the main driver after just a few seconds of ingesting
the data, automatically analyzing it and generating a narrative about the results (see Figure 2).

How often do business people draw suboptimal conclusions from their data? How often do they
explore what they think are the key drivers or attributes of an outcome variable and stop when they
confirm their hypotheses? How many times might there be other more important factors affecting
the outcome variable that they have not thought to explore? This is the root of the challenge with
the current paradigm. The desire to overcome it will drive the transformational nature of the next
wave of market disruption, namely automation of all aspects of the analytics workflow in order to
improve the accuracy and timeliness of advanced analysis (in light of the human context), remove
bias, and raise more users to the skill level of citizen data scientists.

Automation will free expert data scientists to focus on specialized problems and on
operationalizing and embedding enterprise-grade models into applications, while users will
increasingly act only on the most accurate and significant insights. Expanded use of automation
should also translate into fewer errors from the bias inherent in manual exploration.

Figure 2. What Drives Student Earnings?

Screen shot from Salesforce Einstein Discovery

Source: Salesforce

Augmented Analytics Will Transform the Entire Analytics Workflow and How All Employees Access and Act on Insights

Augmented analytics capabilities will rapidly achieve mainstream adoption as a key feature of self-
service data preparation, modern BI and analytics and data science platforms. More importantly,
automated insights will also be embedded in enterprise applications and conversational analytics —
and thereby reach beyond citizen data scientists to enable operational workers to assist in business
transformation.

Currently in analytics, content authors, such as analysts, citizen data scientists and expert data
scientists, perform the following data-to-insight-to-action activities iteratively to find meaningful
insights:

■ Preparing the data
■ Finding patterns in the data and building models
■ Sharing and operationalizing findings from the data

The augmented analytics paradigm shortens the time it takes business users to get accurate insights
and augments their analysis by using machine-learning algorithms to automate the three main
analytic processes used in current visual-based data discovery platforms (see Figures 3 and 4). In
both the current and the emerging workflows, users often iterate between preparing data and
finding patterns in data.

Figure 3. Current Data Analytics Workflow

Source: Gartner (July 2017)

Figure 4. Emerging Augmented Analytics Workflow

Source: Gartner (July 2017)

Preparing Data
Preparing data for analysis is the most time-consuming task facing data discovery users. Most
modern BI and analytics platforms offer basic data preparation capabilities for joining, data
manipulation and transformation. Data science platforms offer some pipelining capabilities, but they
too are often incomplete and difficult to use. This leaves much of the data profiling, quality,
modeling, manipulation, enrichment, metadata development and harmonization work to the
business user or data scientist (if self-service), or to IT staff (if deployments are centralized). Either
way, this creates a bottleneck for business users and data scientists. It also creates business risk
due to a lack of governance as organizations give more users the ability to build analytic content.
Augmented data preparation, a component of augmented analytics, uses algorithms to find
relationships in data, and to profile and recommend the best approaches for cleaning, reconciling,
enriching, manipulating and modeling data with capabilities to capture metadata and lineage for
reuse and governance.
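As a rough illustration of what "profile and recommend" can mean at its very simplest, the Python sketch below profiles a single column and emits cleaning suggestions. It does not describe how any vendor named in this report works. The function name `profile_column`, the 5% missing-value threshold and the case/whitespace rule are hypothetical heuristics that stand in for the machine-learning models commercial tools use.

```python
def profile_column(values, missing_threshold=0.05):
    """Profile one column and suggest cleaning steps (illustrative heuristics only)."""
    total = len(values)
    # Treat None and blank strings as missing.
    present = [str(v) for v in values if v is not None and str(v).strip() != ""]
    missing = total - len(present)

    # Group raw spellings under a normalized key (trimmed, lower-cased) so that
    # "North" and "north " are recognized as the same category.
    variants = {}
    for v in present:
        variants.setdefault(v.strip().lower(), set()).add(v)
    inconsistent = {
        key: sorted(spellings)
        for key, spellings in variants.items()
        if len(spellings) > 1
    }

    # Turn the profile into recommendations using the hypothetical rules above.
    recommendations = []
    if total and missing / total > missing_threshold:
        recommendations.append("fill or flag missing values")
    if inconsistent:
        recommendations.append("standardize case and whitespace")

    return {
        "rows": total,
        "missing": missing,
        "distinct": len(set(present)),
        "inconsistent_variants": inconsistent,
        "recommendations": recommendations,
    }


print(profile_column(["North", "north ", None, "South", "South"]))
```

On this hypothetical input, the profile reports one missing value and flags "North" and "north " as variants of one category, so both recommendations are returned. The user would still decide whether to accept them.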

Most current BI and analytics and data science platform vendors are making self-service data
preparation — with varying degrees of machine-learning automation — an investment priority, due
to the major impact it can have on improving time to insight and governance. The stand-alone self-
service data preparation market is crowded with venture-funded startups and established data
integration players. Many of these are using machine learning to streamline and accelerate the data
preparation process and make curated and described data accessible to all analytics content
authors in data catalogs via easy-to-use interfaces such as search (see "Market Guide for Self-
Service Data Preparation," "Embrace Self-Service Data Preparation Tools for Agility but Govern to
Avoid Data Chaos," "Rebalance Your Integration Effort With a Mix of Human and Artificial
Intelligence" and "Establish a Framework for Analytics Governance"). Self-service data preparation
enables users to combine more and more data sources from both trusted sources and external or
ad hoc sources. Moreover, it provides a more agile way to access models and reuse data than the
semantic layer approaches of traditional BI platforms. It supports the growing requirement to deploy
modern BI and analytics at scale with governance (see "How to Be Agile With Business Analytics").

Case Study 1: How a leading global manufacturer of confectionery, pet food, and other food
products reduced requirements for data preparation from five people and five weeks to one person
and one hour, and enabled one-click updates:

■ At a leading confectionery, pet food and other food products manufacturer, it used to take five
people four to five weeks to access, clean, blend, harmonize, model and reconcile its retail
points of sale, Nielsen data, pricing, and brand/category data. The company wanted to analyze
the seasonal and nonseasonal granular category performance of products across all its lines
and brands. It needed automated data preparation and blending, so that business decision
makers could see insights as soon as data sources updated. It hired ClearStory Data to fully
automate this process, so that it now takes a single person no more than an hour, with one-click
updates.

Figure 5 shows how, for augmented self-service data preparation, ClearStory Data runs algorithms
to show a matching score for harmonizing data sources. The algorithms also generate detailed
information based on every data value and every unique categorical value found in custom
categories, with semantic inferences based on all content details (not just the names of column
labels). Many vendors also profile data for quality issues and provide the user with
recommendations (not shown in the screen shot) for how to improve data quality.

Figure 5. Use of Machine Learning to Harmonize Complex and Difficult Datasets

Source: ClearStory Data

Case Study 2: How a multinational banking and financial services company reduced the length of
regulatory compliance and anti-money-laundering processes by 95% (from 21 days to one day) with
the help of augmented data preparation:

■ To protect its brand reputation and avoid regulatory risks and penalties, the bank monitors
money-laundering activities and tracks transactions made by politically exposed persons
(PEPs). The established processes were manual and script-based, relying on Microsoft Excel. With
a staff of over 70 analysts and a backlog of transactions across 43 countries, the bank needed to
automate these processes. It replaced them with Paxata, using it to harmonize internal transaction
data from across the bank's network in those 43 countries (ATMs, tellers, deposits, withdrawals,
accounts and credit cards) with data from external sources, such as money transfers and PEP data
sources. Augmented data preparation reduced the processes
from 21 days to one day. It also improved data quality and PEP monitoring (from 25% matching
to 77%). Because of the ease of use enabled by machine-learning automation in the user
workflow, these data preparation and cleaning processes are now carried out by business staff,
rather than IT staff.

Figure 6 shows how Paxata recommends joins between two datasets. In this example, there is a
92% fuzzy match for E-mail and a 91% match for Full Name. The human (citizen data scientist) can
see and interact with all the data values (not just a sample) when deciding which of these
machine-learning join suggestions to accept.
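The Python sketch below illustrates the kind of percentage shown in Figure 6. It scores candidate join keys by the share of values in one dataset that have a close counterpart in the other. The fuzzy similarity measure is the standard library's `difflib.SequenceMatcher` ratio, which is an assumption: the report does not describe Paxata's own matching method. The 0.85 threshold and the function names are also hypothetical.

```python
from difflib import SequenceMatcher


def best_ratio(value, candidates):
    """Highest SequenceMatcher similarity between value and any candidate (0.0 to 1.0)."""
    v = value.strip().lower()
    return max(
        (SequenceMatcher(None, v, c.strip().lower()).ratio() for c in candidates),
        default=0.0,
    )


def fuzzy_match_rate(left_values, right_values, threshold=0.85):
    """Share of left values that have a counterpart in right_values at or above threshold."""
    if not left_values:
        return 0.0
    hits = sum(1 for v in left_values if best_ratio(v, right_values) >= threshold)
    return hits / len(left_values)


def suggest_joins(left, right, threshold=0.85):
    """Rank every (left column, right column) pair by fuzzy match rate, best candidate first."""
    pairs = [
        (l_name, r_name, fuzzy_match_rate(l_vals, r_vals, threshold))
        for l_name, l_vals in left.items()
        for r_name, r_vals in right.items()
    ]
    return sorted(pairs, key=lambda pair: pair[2], reverse=True)


# Hypothetical data: one exact match after trimming/casing, one typo, one miss.
crm = {"email": ["ann@example.com", "bob@example.com", "zed@other.org"]}
billing = {"E-mail": ["Ann@example.com ", "bob@exmple.com", "dan@example.com"]}
print(suggest_joins(crm, billing))
```

Every value is compared (not a sample), so the score reflects the full dataset. However, this brute-force comparison grows with the product of the two column sizes and would need indexing or blocking at scale.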

Figure 6. Smart Self-Service Data Preparation

Source: Paxata

Finding Patterns in Data


Current visual-based data discovery approaches used in modern BI and analytics platforms enable
business people to visually explore relationships and patterns in data using interactive techniques
such as filtering, sorting, pivoting, linking, grouping and user-defined calculation. These approaches
have been very effective in helping users find important insights to drive competitiveness and
efficiencies, in comparison to traditional BI. However, when the data is complex, large and highly
dimensional (even just 10 or more columns), users either focus their time on exploring their own
hypotheses in a subset of the data, or must manually explore all possible combinations and
permutations to ensure a complete and accurate result. This can be very time-consuming. In many
cases, therefore, users default to the former approach for expediency, or they may not even know
all the possible permutations to explore. As such, they are likely to miss important insights and
relationships.
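A quick count shows why manual exploration breaks down even at 10 columns. The Python sketch below counts only which columns are examined together. It ignores filters, value ranges and chart choices, so it gives a lower bound on the real exploration space. The function names are illustrative.

```python
from math import comb


def column_groupings(n_columns, max_group=3):
    """Number of distinct column groupings of each size from 1 to max_group."""
    return {k: comb(n_columns, k) for k in range(1, max_group + 1)}


def nonempty_column_subsets(n_columns):
    """Every non-empty combination of columns: 2^n - 1."""
    return 2 ** n_columns - 1


for n in (10, 20):
    print(n, column_groupings(n), nonempty_column_subsets(n))
```

With 10 columns, there are 45 pairs, 120 triples and 1,023 non-empty subsets. At 20 columns, the number of subsets exceeds one million (1,048,575).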

With augmented data discovery, instead of an analyst manually testing all the combinations of data,
algorithms for detecting correlations, segments, clusters, outliers and relationships are automatically
applied to the data. Only the most statistically significant and relevant results are presented to the
user, in the form of smart visualizations that are optimized for the user's interpretation. Applying a
range of algorithms to the data in parallel reduces the risk of missing important insights in the data.
Most augmented data discovery platforms make the underlying models open for inspection, testing
and validation by specialist data scientists. This is important for building trust and confirming the
accuracy of automated insights.
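The automated approach described above can be sketched as a simple key-driver screen. Every candidate column is tested against an outcome, only those that pass a significance cutoff are kept, and the strongest few are shown. The Python sketch below makes two assumptions. First, it uses Pearson correlation as the only "algorithm." Second, it uses a t-statistic cutoff of 2.0 as a rough stand-in for a 5% significance test. Real augmented data discovery platforms apply a wider range of algorithms. They should also correct for multiple comparisons, which this sketch does not do. All names and data are hypothetical.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return 0.0  # a constant column cannot explain variation in the outcome
    return sxy / math.sqrt(sxx * syy)


def t_statistic(r, n):
    """t value for testing whether correlation r differs from zero with n observations."""
    if abs(r) >= 1.0:
        return math.inf
    return r * math.sqrt((n - 2) / (1 - r * r))


def top_drivers(candidates, outcome, t_cutoff=2.0, top_k=3):
    """Return up to top_k (name, r) pairs that pass the cutoff, strongest first."""
    n = len(outcome)
    if n < 3:
        return []  # too few observations to test anything
    passed = []
    for name, values in candidates.items():
        r = pearson(values, outcome)
        if abs(t_statistic(r, n)) >= t_cutoff:
            passed.append((name, r))
    passed.sort(key=lambda item: abs(item[1]), reverse=True)
    return passed[:top_k]


# Hypothetical toy data, not taken from any case study in this report.
outcome = [1, 2, 3, 4, 5, 6, 7, 8]
candidates = {
    "feature_a": [10, 20, 30, 40, 50, 60, 70, 80],
    "noise": [3, 1, 4, 1, 5, 9, 2, 6],
    "feature_c": [2, 1, 4, 3, 6, 5, 8, 7],
}
print(top_drivers(candidates, outcome))
```

On this data, `feature_a` tracks the outcome exactly and `feature_c` tracks it closely (r of about 0.90). The screen returns both and suppresses `noise` (r of about 0.48, t of about 1.3), so the user never has to guess which columns to check.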

Establishing processes that encourage, or even require,
collaboration between citizen data scientists and expert data
scientists is important to broaden adoption of governed
augmented analytics deployments.

Machine-learning automation is also making its way into data science platforms to streamline the
feature-engineering and model generation process. Whereas the user for augmented data discovery
is a businessperson or citizen data scientist, and the output is an insight (both visual and narrated in
natural language), the output of augmented data science is a model and the user is an expert data
scientist. The intent is to make the specialist data scientist more productive and the enterprise-
grade models they build less prone to bias.

Given the scarcity of expert data scientists in the market and the ever-increasing demand for their
skills, even higher productivity will be expected and more analytics work will need to be performed
by a new class of citizen data scientist.

Differences Between Augmented Data Discovery and Augmented Data Science Platforms

As shown in Figure 7, augmented data discovery platforms deliver insights to citizen data scientists.
A model is generated and can be embedded in an application, after further vetting by a specialist
data scientist. But the goal or deliverable is insight. Natural-language query (NLQ) and natural-
language generation (NLG) are important user experience features.
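Commercial natural-language generation is far more sophisticated, but a template can show the basic idea of turning a statistical finding into a sentence. The Python sketch below is hypothetical. The function `narrate_driver` and its strength thresholds (0.7 and 0.4) are illustrative choices, not any vendor's method.

```python
def narrate_driver(outcome, driver, r, runner_up=None):
    """Render a correlation-based finding as one or two plain-English sentences."""
    # Hypothetical thresholds for describing the strength of a correlation.
    if abs(r) >= 0.7:
        strength = "strong"
    elif abs(r) >= 0.4:
        strength = "moderate"
    else:
        strength = "weak"
    direction = "rises" if r > 0 else "falls"
    sentence = (
        f"The strongest driver of {outcome} is {driver}: "
        f"as {driver} increases, {outcome} {direction} "
        f"({strength} correlation, r = {r:.2f})."
    )
    if runner_up:
        sentence += f" The next most important factor is {runner_up}."
    return sentence


print(narrate_driver("close rate", "response time", -0.81))
```

This example prints "The strongest driver of close rate is response time: as response time increases, close rate falls (strong correlation, r = -0.81)." Fixed templates like this are brittle. They also say nothing about causation, which is one reason the report stresses data literacy alongside automated narration.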

Augmented data science platforms, by contrast, automatically generate a model for either a citizen
data scientist or a specialist data scientist, or for embedding. These platforms assist in model
building, life cycle management and governance.

The differences between the two types of platform are subtle and narrowing to the point where
greater convergence over time is likely.

Figure 7. How Augmented Data Discovery and Augmented Data Science Platforms Differ

Source: Gartner (July 2017)

Augmented data discovery and augmented data science and machine learning both reduce time-
consuming exploration and the identification of false and less relevant insights. They require a
collaborative process that focuses a business analyst on what is important and provides a data
scientist with a starting point or early prototype to explore and operationalize models for only
relevant patterns. Both the analyst and the data scientist become more productive by reducing the
experimentation and initial exploration phase. This ultimately results in faster times to insight and
action.

Case Study 1: How a leading U.S. healthcare provider discovered a key driver of length of stay and
unexpected drivers of transportation costs, and improved preventative care:

■ In many state healthcare markets, a leading U.S. healthcare provider pays a "case rate" for
hospital stays on behalf of patient members. This is a "one size fits all" rate that is designed to
be fair to both the insurance company (the healthcare provider) and the hospital. The case rate
eliminates the financial incentive for insurance companies to send patients home sooner
than they should or for hospitals to keep them in the hospital longer than they should. The case
rates do contain some provisions for additional payments for outlier cases that have unforeseen
extra days. Those payments are designed to be a compromise and are not really desirable for
either hospitals or insurance companies. By using Salesforce Einstein Discovery, the healthcare
provider discovered that, for the same procedures, the day of the week on which the member
was admitted to some hospitals made a difference to the overall hospital care cost, despite the
one-size-fits-all case rate, due to the occasional triggering of outlier payments. This finding
revealed that some patients were getting suboptimal care over the weekend, when many
specialist physicians were unavailable. Patients admitted on a Friday were not getting specialist
care over their first weekend in hospital, which made their stay longer than necessary and
increased costs for all parties. The healthcare provider communicated this finding to the
hospitals, which were then able to schedule these admissions on Monday, instead of Thursday
or Friday, in rural locations where specialists are not available on weekends.
■ In another example, unexpected insights led the same healthcare provider to change its
contracts with transportation companies that drive patients to hospitals and treatment
providers. Higher costs for more serious conditions are expected, but the healthcare provider
found that costs were unexpectedly high for patients under 12 years. Further analysis revealed
that children who are transported for treatment often have both parents traveling with them.
Even though the parents did not require additional vehicles or resources, by contract, the
transportation company could charge the healthcare provider more when more than one
guardian or parent accompanied the child. This unexpected finding prompted the healthcare
provider to restructure some transportation contracts that were written with adults in mind, to
allow for two additional passengers for children at no cost to the healthcare provider or the
family. The money saved could then be used to improve care elsewhere in the system. The
result was also good for patient outcomes: full parental support is desirable for sick children, their
parents and their physicians, and it was good for the healthcare provider too, because it added
essentially no cost.
■ The healthcare provider was also able to optimize the total cost of care and outcomes through
unexpected insights derived from its use of Salesforce Einstein Discovery. By analyzing the total
cost of care across patient demographics, geographies, treatment modalities and providers, it
found that some providers are more efficient than others at handling certain patient populations.
Enabling members who fit a given profile to get their primary preventative care in the best
setting for them made patients healthier and kept them out of hospital, which also reduced
costs.

Case Study 2: How Jacobi Medical Center discovered a way to identify a perforated appendix in
children using ultrasound, instead of potentially more harmful computed tomography (CT) scans:

■ For a child with acute appendicitis, the course of treatment depends largely on one question:
Has the appendix already burst? If it has, conservative treatment may be considered. If it has
not, immediate surgery is called for, with the goal of removing the appendix before it bursts. It is
therefore critically important that doctors have a quick and reliable way to determine whether a
young patient's appendix has already burst. To determine the condition of the appendix, CT
scans had been used, due to their level of accuracy in differentiating between perforated and
nonperforated appendicitis in young patients. However, CT scans involve a potentially harmful
amount of radiation for young patients. DataRobot Nutonian's Eureqa discovered parameters
strongly correlated with perforation and revealed the importance of two categories within the
age variable. This led to a formula whose application gives ultrasound a level of diagnostic
accuracy equal to CT scans in pediatric patients. As a result, doctors can now diagnose
perforated appendicitis via ultrasound alone in more cases, thereby reducing the number of
cases requiring exposure to the potentially harmful radiation of CT scans. (See further
"Ultrasound for Differentiation Between Perforated and Nonperforated Appendicitis in Pediatric
Patients," American Journal of Roentgenology, May 2013.)

Case Study 3: How a bank built credit scores for "thin file" consumers in less than an hour (as
opposed to weeks):

■ For the millions of people worldwide who lack a credit history, traditional credit-scoring
techniques are inapplicable. But by employing automated machine-learning models that use the
DataRobot augmented data science platform, a bank was able to microsegment customers into
granular buckets, testing 10 models per hour (as opposed to one every two to four weeks using
traditional techniques). Figure 8 shows the models on the right, which were autogenerated
based on the features on the left, to predict the likelihood of loan default.

Figure 8. Automated Machine Learning Uncovers Loan Default Drivers

Source: DataRobot
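The loop behind "testing 10 models per hour" can be sketched in three steps. First, fit a batch of candidate models automatically. Next, score each one on held-out data. Finally, present the results as a ranked leaderboard. The minimal Python sketch below uses one-feature threshold rules (decision stumps) as the candidate models only to keep it dependency-free. An augmented data science platform such as the one described would search much richer model families. The function names, features and toy loan records are hypothetical.

```python
def stump_predict(row, feature, threshold, above_is_default):
    """Predict default (True/False) from one feature and one threshold."""
    above = row[feature] > threshold
    return above if above_is_default else not above


def fit_stump(train, feature, label="default"):
    """Try every observed value as a threshold, in both directions; keep the best."""
    best = None
    for threshold in sorted({r[feature] for r in train}):
        for above_is_default in (True, False):
            correct = sum(
                stump_predict(r, feature, threshold, above_is_default) == r[label]
                for r in train
            )
            if best is None or correct > best[0]:
                best = (correct, threshold, above_is_default)
    return best[1], best[2]


def accuracy(rows, feature, threshold, above_is_default, label="default"):
    """Share of rows whose label the stump predicts correctly."""
    hits = sum(stump_predict(r, feature, threshold, above_is_default) == r[label] for r in rows)
    return hits / len(rows)


def auto_search(train, holdout, features):
    """Fit one candidate model per feature and rank them by holdout accuracy."""
    leaderboard = []
    for feature in features:
        threshold, above_is_default = fit_stump(train, feature)
        score = accuracy(holdout, feature, threshold, above_is_default)
        leaderboard.append((score, feature, threshold, above_is_default))
    leaderboard.sort(key=lambda entry: entry[0], reverse=True)
    return leaderboard


# Toy, made-up loan records for "thin file" applicants.
train = [
    {"utilization": 0.9, "months_employed": 3, "default": True},
    {"utilization": 0.8, "months_employed": 30, "default": True},
    {"utilization": 0.2, "months_employed": 5, "default": False},
    {"utilization": 0.3, "months_employed": 40, "default": False},
]
holdout = [
    {"utilization": 0.85, "months_employed": 50, "default": True},
    {"utilization": 0.1, "months_employed": 2, "default": False},
]
for score, feature, threshold, above in auto_search(train, holdout, ["utilization", "months_employed"]):
    side = ">" if above else "<="
    print(f"default if {feature} {side} {threshold}: holdout accuracy {score:.2f}")
```

The models are scored on records they never saw during fitting. This lets the automated ranking favor models that generalize rather than ones that memorize the training set.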

The Difference Between Augmented Data Discovery and Smart Visualization

Augmented data discovery should be clearly distinguished from smart visualization. The latter is a
feature of pattern detection that automatically presents data in the best visualization type, order,
color, label generation or level of detail, to optimize insight for the user, without additional
manipulation (filtering, sorting, label positioning, and so on).

Figure 9 shows how, when a user drags and drops a number of measures onto the visualization
palette, a SAS smart visualization system automatically generates the best chart for the data (in this
case a correlation matrix in a best-practice color scheme), without the user having to write
algorithms.
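A smart visualization feature of this kind can be approximated by rules that map the roles of the selected fields to a chart type. The Python sketch below is an illustration built on assumptions and does not reproduce SAS's actual logic. The role names (`measure`, `dimension`, `time`) and the rule order are hypothetical. The sketch only mirrors the behavior described above, in which several measures produce a correlation matrix.

```python
from math import sqrt


def recommend_chart(fields):
    """fields: list of (name, role) pairs; role is 'measure', 'dimension' or 'time'."""
    roles = [role for _, role in fields]
    n_measures = roles.count("measure")
    n_dims = roles.count("dimension")
    if "time" in roles and n_measures >= 1:
        return "line chart"
    if n_measures >= 3 and n_dims == 0:
        return "correlation matrix"
    if n_measures == 2 and n_dims == 0:
        return "scatter plot"
    if n_measures == 1 and n_dims == 1:
        return "bar chart"
    if n_measures == 1 and n_dims == 0:
        return "single value (KPI)"
    return "table"  # fallback when no rule matches


def pearson(xs, ys):
    """Pearson correlation of two equal-length, non-constant numeric lists."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def correlation_matrix(columns):
    """columns: dict of measure name -> values. Returns {(a, b): r} rounded to 3 places."""
    return {(a, b): round(pearson(columns[a], columns[b]), 3) for a in columns for b in columns}


# Hypothetical measures dragged onto the canvas.
measures = {"price": [1, 2, 3, 4], "units": [8, 6, 4, 2], "revenue": [8, 12, 12, 8]}
fields = [(name, "measure") for name in measures]
print(recommend_chart(fields))
for pair, r in correlation_matrix(measures).items():
    print(pair, r)
```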

Figures 10 and 11 show how, with Qlik, labels are exposed to show outliers and an ideal size and
format for insight consumption as the user drills and filters (without additional manual formatting).

Figure 9. Smart Visualization

Screen shot from SAS Visual Analytics 8.1 on Viya

Source: SAS

Figure 10. Smart Labeling Automatically Focuses Users on Outliers (1)

Source: Qlik

Figure 11. Smart Labeling Automatically Focuses Users on Outliers (2)

Source: Qlik
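Smart labeling of the kind shown for Qlik can be approximated by two small rules. The first labels only the points that stand out, and the second formats each label compactly. The Python sketch below defines "stands out" as a z-score (distance from the mean, measured in population standard deviations) greater than 2. That is a common choice swapped in for illustration, not Qlik's documented method. The function names and store figures are hypothetical.

```python
from statistics import mean, pstdev


def outlier_labels(points, z_threshold=2.0):
    """points: list of (label, value). Return labels whose |z-score| exceeds the threshold."""
    values = [v for _, v in points]
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:
        return []  # all values identical: nothing stands out
    return [name for name, v in points if abs(v - mu) / sigma > z_threshold]


def compact(value):
    """Format a number for a chart label, e.g. 1234567 -> '1.2M'."""
    for divisor, suffix in ((1e9, "B"), (1e6, "M"), (1e3, "K")):
        if abs(value) >= divisor:
            return f"{value / divisor:.1f}{suffix}"
    return f"{value:g}"


# Hypothetical sales by store; only the unusual store should get a label.
sales = [10_500, 11_200, 9_800, 10_100, 12_000, 10_300, 9_600, 11_100, 10_400, 41_000]
points = [(f"Store {i}", v) for i, v in enumerate(sales, start=1)]
values_by_name = dict(points)
for name in outlier_labels(points):
    print(name, compact(values_by_name[name]))
```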

Sharing and Operationalizing Findings From Data


Modern BI and analytics platforms have made significant advances by visualizing data in interactive
dashboards or storyboards and offering collaboration capabilities to assist with the sharing and
socializing of findings. However, visualizations often obscure what is truly significant in the data, and
many users lack the ability to fully interpret statistically significant visual-based insights.

With the addition of NLG, augmented data discovery platforms automatically present a written or
spoken, context-based narrative of findings in the data that, alongside the visualization, informs the
user about what is most important for them to act on in the data.
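At its simplest, NLG turns computed facts into sentences through templates. The Python sketch below illustrates that idea only and does not describe how any particular vendor's NLG engine works. It computes the period-over-period change in a metric, chooses the wording from the sign of the change, and names the segment that contributed most. The function name, its arguments and the sample figures are hypothetical.

```python
def narrate(metric, series, contributions):
    """series: list of (period label, value); contributions: segment -> change in value."""
    (prev_label, prev), (cur_label, cur) = series[-2], series[-1]
    if prev == 0:
        raise ValueError("percentage change is undefined when the previous value is 0")
    pct = (cur - prev) / prev * 100
    if pct == 0:
        sentences = [f"{metric} was flat from {prev_label} to {cur_label}."]
    else:
        verb = "rose" if pct > 0 else "fell"
        sentences = [f"{metric} {verb} {abs(pct):.1f}% from {prev_label} to {cur_label}."]
    if contributions:
        # Name the segment with the largest absolute change.
        segment, delta = max(contributions.items(), key=lambda kv: abs(kv[1]))
        sentences.append(f"The largest single contributor was {segment} ({delta:+,.0f}).")
    return " ".join(sentences)


print(narrate("Freight cost", [("May", 200.0), ("June", 230.0)], {"Road": 25.0, "Rail": 5.0}))
```

Production NLG layers more on top of this, including statistical tests, configurable tone and verbosity, and domain vocabularies. Even so, the template step is where computed numbers become readable sentences.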

Case Study: How a healthcare analytics vendor improved patient outcomes through narration of
insights:

■ A healthcare analytics vendor developed a proprietary platform on top of Qlik Sense to help
payers and providers visualize data, so that they could understand their market's population
cohort, track progress, and assess performance. However, users without advanced analytical
skills found it difficult to quickly understand and interpret the visualizations. The vendor
integrated the Narratives for Qlik extension into its analytics platform, customizing the narrative
language with domain-specific terms. With NLG
coupled with interactive visualizations, payers and providers can now immediately understand
whether reimbursement rates correlate with high-quality outcomes, while communicating
shared savings and trends in the reimbursement rate.

Figure 12 shows an example of how Narratives for Microsoft Power BI identifies and explains
insights about factors that influence transit time, trends that impact freight costs, and metrics on
load times and capacity utilization for a logistics company.

Figure 12. Dynamic Narration of the Load Time Analysis

Source: Narrative Science and Microsoft

As organizations transform into digital businesses, analytics becomes a critical enabler. Expanding
access to insights from analytics to all workers will be key to driving transformative business
impact. However, access to analytics content from BI and analytics and data science platforms has
mostly been limited to power users, business analytics users and specialist data scientists with
varying degrees of analytical and technical skills. Gartner surveys show that only around 32% of
employees have access to BI and analytics tools (see "Survey Analysis: Why BI and Analytics
Adoption Remains Low and How to Expand Its Reach").

Increasingly, automated models are being embedded in enterprise applications for sales, marketing,
HR and finance teams.

Conversational analytics (the combination of NLQ, augmented data discovery, natural-language
narration and chatbots) will be enabled by personal digital assistants on mobile devices, such as
Apple Siri and Microsoft Cortana, and on smart speakers, such as Amazon Echo (Alexa) and Google Home.

Conversational analytics has the potential to address analytics adoption challenges by enabling any
employee to interact with data using natural language when mobile or outside a dashboard, in order
to gain the most relevant, optimized and actionable insights for their role and context. For example,
instead of logging into a dashboard, any user, from C-level executives to analysts and operational
workers, can interact with personal digital (analytics) assistants (such as Amazon Alexa, Microsoft
Cortana and Google Home) or their mobile phone (via voice) to ask for an analysis that is relevant to
them. Sales managers might ask for an analysis of sales or the sales pipeline, based on their role.
They will be served an explanation or narrative of statistically significant drivers of change, and
might be sent visualizations (on a device) to show important trends, patterns or outliers, based on
their role.
Conversational analytics will also be embedded in the workflow of applications that every employee
uses.
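Before any analysis or narration can happen, a conversational request has to be turned into a query. The Python sketch below shows that step in its crudest form, keyword matching against a small vocabulary, for illustration only. Real assistants such as Alexa or Cortana rely on trained natural-language understanding services. The vocabulary, field names and sample rows here are hypothetical.

```python
import re
from collections import defaultdict

# Hypothetical vocabulary mapping spoken words to dataset fields.
MEASURES = {"sales": "sales_amount", "pipeline": "pipeline_value"}
DIMENSIONS = {"region": "region", "product": "product"}


def parse_question(text):
    """Extract a measure and an optional 'by <dimension>' grouping from a question."""
    words = re.findall(r"[a-z]+", text.lower())
    measure = next((MEASURES[w] for w in words if w in MEASURES), None)
    group_by = None
    for i, w in enumerate(words[:-1]):
        if w == "by" and words[i + 1] in DIMENSIONS:
            group_by = DIMENSIONS[words[i + 1]]
    return measure, group_by


def answer(question, rows):
    """Aggregate the requested measure and return a one-line spoken-style answer."""
    measure, group_by = parse_question(question)
    if measure is None:
        return "Sorry, I did not recognize a measure in that question."
    totals = defaultdict(float)
    for row in rows:
        key = row[group_by] if group_by else "all"
        totals[key] += row[measure]
    ranked = sorted(totals.items(), key=lambda kv: kv[1], reverse=True)
    parts = ", ".join(f"{k} {v:,.0f}" for k, v in ranked)
    return f"{measure} by {group_by or 'total'}: {parts}."


rows = [
    {"region": "East", "product": "A", "sales_amount": 120.0},
    {"region": "West", "product": "A", "sales_amount": 90.0},
    {"region": "East", "product": "B", "sales_amount": 30.0},
]
print(answer("What were sales by region?", rows))
```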

Conversational analytics applications are not available "out of the box" today, and early integrations
are immature. Analytics vendors are using APIs and building integrations through partnerships to
make them easier to deploy. We expect out-of-the-box and enterprise-ready applications to appear
over the next two to five years.

Recommendations

Data and analytics leaders should:

■ Evaluate the extent of machine-learning automation features, and their role in the data
preparation and cataloging process, when evaluating a self-service data preparation and data-
cataloging vendor.
■ Consider augmented data discovery as a complement to existing visual-based exploration
capabilities if you need to deliver more advanced insights to a broader range of users without
expanding your use of data scientists, or if analysts work with high-dimensional data that is
very time-consuming to explore using current data discovery approaches.
■ Assess and plan your organization's readiness for augmented data discovery in terms of
alignment with business outcomes, current challenges with existing data discovery approaches,
and skills. Like visual-based data discovery self-service users of modern BI and analytics
platforms, citizen data scientists will need ongoing training and support to hone their skills.
■ Identify where automating algorithms to detect patterns in data could reduce the exploration
phase of insight generation and model building and improve highly skilled data scientists'
productivity. Recognize, however, that you still need specialized data scientists to validate the
model, the findings and their application.
■ Adapt current deployment models in order to upskill citizen data scientists through mainstream
adoption of augmented analytics. Anticipate the need for analytics governance and
collaboration between data engineers, analysts and data scientists.
■ Run augmented data discovery initiatives in parallel with existing analytics and decision
processes to prove their value and build trust in augmented data discovery. Expert data
scientists are often wary of "black box" approaches. They are also likely to be wary of someone
less skilled doing what has historically been their job.
■ Encourage citizen data scientists to work collaboratively and iteratively with expert data
scientists — both internal, if available, and external.
■ Invest in data literacy training to ensure users of augmented analytics have the skills to interpret
and act on insights.

Adoption Rate
Business leaders want their organizations to be data-driven and recognize the need to transform
them into digital businesses. And yet, for the most part, organizations are simply amassing more
data, rather than transforming that data into actionable insights.

Judging from Gartner ITScore assessments shown in Figure 13, only 34% of organizations agree
that they are able to undertake diagnostic analytics or investigate why KPIs are performing in a
certain way ("Why is one product selling better than another?", "Why are expenses higher this
month?", "Why did this patient respond better to a particular treatment?"). An even smaller
percentage can fully undertake predictive and prescriptive analytics, although 72% and 3% are
partially or minimally doing these respective tasks. (Note that ITScore assessments may be skewed
in favor of lower-maturity organizations as they are often used to diagnose challenges with evolving
data and analytic maturity.)

Figure 13. Adoption Across the Analytics Spectrum

Note: ITScore assessments may be skewed in favor of lower-maturity organizations as they are often used to diagnose challenges with
evolving data and analytic maturity.

Source: Gartner (July 2017)

Augmented data discovery has the potential to shift organizations' analytic maturity, as the
performance of root-cause analysis, predictive analysis and prescriptive analysis will no longer rely
exclusively on data scientists. Instead, existing information analysts will evolve into citizen data
scientists able to spend less time on data preparation and basic descriptive analysis, and more on
advanced analysis aided by smarter software. Furthermore, augmented data discovery capabilities
will be embedded in front-line applications to optimize the actions of operational workers.

Workforce analytics, supply chain analytics and CRM analytics will be the largest functional domains
to benefit from machine learning (see "Market Opportunity Map: Analytics and Business
Intelligence, Worldwide").

Salesforce is embedding its Einstein Discovery capability within its sales, service and marketing
applications. Figure 14 shows Salesforce's "sales pipeline waterfall" dashboard with Einstein
Discovery embedded in the right-hand panel. Einstein Discovery makes recommendations about
how to increase a sales pipeline by taking various actions.

Figure 14. Augmented Data Discovery Embedded in a Sales Application

Source: Salesforce

The enablement of citizen data scientists through mainstream adoption of augmented data
discovery will require BI and analytic leaders to further emphasize the need for analytics governance
and collaboration between analysts and data scientists.

Existing BI organizational models will need to evolve in order to support adoption of augmented
data discovery capabilities and a growing footprint of citizen data scientists embedded within
business units. The rise of self-service visual-based data discovery stimulated the first wave of
transition from centrally provisioned traditional BI to decentralized data discovery (see "Organizing
Your Teams for Modern Data and Analytics Deployment"). However, the emergence of augmented
data discovery represents an entirely new level of business user autonomy, which could not only
yield sizable returns but, if left unchecked, could also have adverse results.

BI and analytics leaders must develop guidelines outlining the "rules of the road" with respect to
where primary responsibility lies for accessing, preparing, provisioning and validating data accessed
by augmented data discovery tools. Similar rules must be outlined to govern the use of analytic
content and insights created as outputs of augmented data discovery and data science and
machine learning tools to ensure the accuracy and validity of findings and recommendations.
Enabling citizen data scientists within an organization to use augmented data preparation,
augmented data discovery and augmented data science and machine learning tools will promote
widespread use of higher-value analytics within business processes. However, inputs and outputs
need to be validated, which requires collaboration between IT staff, business users and data
science teams.

Recommendations

Data and analytics leaders should:

■ Educate business leaders and decision makers about the potential transformational impact that
augmented analytics can have, if used by a wide audience. Stress the need for responsible use
and governance to capitalize on the analytics produced and avoid potential unintended
consequences.
■ Develop guidelines for appropriate use of augmented analytics tools and capabilities, with an
emphasis on people and process.

Risks
The emergence of visual-based data discovery has democratized analytics by enabling a broad
range of less technical users to prepare and analyze data using easy-to-use, visually interactive, yet
much more sophisticated tools. As augmented analytics tools and capabilities become more
accessible, BI leaders must understand the impact of the new technologies, plan for adoption of
new approaches to BI and analytics, invest in the necessary data literacy for users to fully capitalize
on insights, and develop a strategy to address the impact on currently supported BI and data
discovery capabilities.

To date, data discovery tools have required a significant amount of manual analysis and human
intervention and interpretation. This approach is not scalable or extensible in today's environment,
given the exponential growth in data volume and complexity, and the fact that all employees need
insights to do their jobs. Augmented analytics represents the next wave of market disruption that
data and analytics leaders will need to embrace in order to build and sustain a competitive
advantage, particularly in industries undergoing digital transformation. Efforts to incorporate
augmented analytics will likely encounter resistance for several reasons:

■ Reliance on traditional analytic processes and the misguided assumption that manual data
discovery through interactive exploration of data can identify all actionable and statistically
relevant findings.
■ The perception that augmented analytics tools are not transparent and represent a "black box"
approach to decision making.
■ The threat to job security, as redivision of workloads and redefinition of work processes upsets
the status quo.
■ Business leaders' reliance on intuition and traditional decision-making practices, and their
resistance to change.
■ The belief that analytics maturity follows a linear progression and maturation process, such that
predictive and prescriptive analytics can be considered only once a solid data foundation has
been established.

These challenges will require a focused effort from data and analytics leaders to challenge the
current processes and approaches to analytics, demonstrate the gaps and flaws in traditional BI
and manual visual-based data discovery approaches, and create an environment and culture that
supports change. An effective way to demonstrate the potential value of augmented analytics is to
identify business problems or specific business decisions where efforts to use traditional
approaches to BI have failed to deliver results in a timely, relevant and actionable way. This
approach can help business users form a relevant, contextual connection between a specific
problem and a technical solution, identify gaps, errors or issues in historical analytic
processes, and lay the foundations for the opportunity presented by new, more automated
approaches.

Augmented analytics will surface previously unknown patterns and insights, and these discoveries
should be used to strengthen the argument that manual approaches are not as effective. This
document contains case studies that demonstrate this potential. In many cases, manual
approaches lead to incorrect and biased assumptions because only a subset of data combinations
has been analyzed, or because statistical significance was mistakenly assumed. Conversely,
however, it should not be assumed that findings and insights surfaced through augmented
analytics can be taken at face value without being verified or tested.
Instead, any findings should inform a decision maker who can interpret the discovered insights
using experience and human intuition in order to decide on a course of action. Augmented analytics
represents a new approach to problem solving that supports humans in the decision-making
process. It enables faster time to insight and, ultimately, faster time to action and impact on
business outcomes.
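One way to verify or test an automatically surfaced difference before acting on it is a permutation test. The test asks how often a difference at least as large would appear if the group labels were shuffled at random. The Python sketch below is a generic statistical check offered for illustration and is not a feature of any product named in this document. The function name, permutation count and sample values are assumptions.

```python
import random


def permutation_p_value(group_a, group_b, n_permutations=10_000, seed=0):
    """Two-sided permutation test for a difference in group means."""
    def mean(xs):
        return sum(xs) / len(xs)

    observed = abs(mean(group_a) - mean(group_b))
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    rng = random.Random(seed)  # fixed seed for reproducible results
    extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)
        diff = abs(mean(pooled[:n_a]) - mean(pooled[n_a:]))
        if diff >= observed - 1e-12:  # small tolerance for float rounding
            extreme += 1
    # Add-one correction: the estimated p-value is never exactly zero.
    return (extreme + 1) / (n_permutations + 1)


weekend = [4.1, 5.0, 4.6, 5.3, 4.8, 5.1]  # hypothetical lengths of stay, in days
weekday = [3.2, 3.9, 3.5, 3.0, 3.7, 3.4]
print(f"p = {permutation_p_value(weekend, weekday):.4f}")
```

A small p-value suggests the gap is unlikely to be due to chance alone. It says nothing about cause, which still requires the human judgment described above.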

Beyond adoption risk, acquiring the data literacy skills needed for responsible use of augmented
analytics will be a challenge. As augmented analytics makes actionable insights available to users,
they must provide context, interpret findings and act on the discoveries or prescriptive information.
This requires everyone in the organization (not just managers) to have knowledge of data analysis,
statistics and interpretation. Organizations need to recruit people with analytics skills across all job
categories and invest in data literacy as an ongoing priority. According to research by CEB (see
"The Talent Implications of Digitization: Changes in Enterprise Demand for Technology Skills and
Experience") and the CEB TalentNeuron data service, demand for data analysis skills, as reflected in
job descriptions from 2012 to 2016, was 4.3 times higher in non-IT jobs than in IT jobs. This figure
will only increase as analytics becomes a necessary component of every job.

Recommendations

Data and analytics leaders should:

■ Start with a small list of specific business problems that cannot be solved, or that are too time-
consuming to solve, using traditional BI and data discovery methods, and launch a pilot to
assess the viability of augmented analytics.
■ Use augmented analytics tools to confirm or challenge findings surfaced by human
interpretation of manual data discovery exercises. Use augmented analytics capabilities as the
first step in identifying patterns that can be further explored and presented using traditional data
discovery tools and techniques.
■ Engage both business analysts and data scientists in learning about and incorporating
augmented analytics tools into the analytic process in an effort to define the best division of
responsibility between the roles.
■ Recruit people with analytics skills across all job functions, and expand investments in
companywide data literacy.

Evaluation Factors
Startups and some large vendors are beginning to offer a range of augmented data discovery
capabilities that have the potential to disrupt current visual-based data discovery vendors in the
long term. This will force BI and analytic leaders to re-evaluate investments. Initially, most
organizations should augment modern BI and analytic platforms with augmented data discovery
tools.

Many of the normal considerations when evaluating vendors apply to vendors of augmented data
discovery tools. They include strategic factors (vendor viability, global presence, support and
pricing) and functionality factors.

In terms of functionality, distinct considerations for the evaluation of augmented data discovery
vendors include:

■ Data access and preparation: The breadth of data sources to analyze differentiates
augmented data discovery tools from most current modern BI and analytics platforms.
However, less mature products may initially only support well-modeled, relational data sources.
More advanced products support ingestion of messy, multistructured data sources, stored in a
variety of formats, including JSON, Hadoop and NoSQL, both on-premises and in the cloud. A
product should profile the data and make intelligent recommendations about how to cleanse
data, in addition to recommending datasets to combine. For example, when analyzing traffic
fatalities, it may make sense to combine data about these fatalities with public data about
population density.
■ Algorithms, transparency and interoperability: Consider the range of algorithms supported.
Ideally, the product should allow additional algorithms to be added to the out-of-the-box
libraries, or refined using data science languages such as Python, R and Scala.
■ Interactivity and narration of insights: Users should be able to ask questions using NLQ or
search, either entered via a search box, or via a conversational chatbot. Relevant results should

be narrated by text or voice. Vendors may develop their own NLQ and NLG interfaces or partner
with third-party providers. For example, SAS and Sisense integrate with Amazon (for Alexa). Qlik
and Microsoft partner with Narrative Science, while MicroStrategy partners with Automated
Insights, and Information Builders with Yseop. Consider ease of configuration of language tone,
verbosity and domain or vertical-specific ontologies, APIs and native integrations, and
languages and types of algorithms supported within the narration.
■ Presentation of insights: Modern BI and analytic platforms often recommend the best way to
display data, once the report author has selected the relevant data. All the features necessary
for visual exploration (chart types, data manipulation, interactivity) continue to apply to
augmented data discovery tools. However, in addition, the software should automatically
generate the most statistically relevant insights. Assess how the product supports automated
forecasting, trends, predictions, clustering, segments, correlations, factor analysis, decision
trees and so on. Evaluate the types of model and algorithm supported (linear, Holt-Winters and
so on), as well as the openness of models that are autogenerated.
■ Deployment options: Advances in machine learning and NLG are increasingly offered as cloud-only,
or at least cloud-first, options. IBM Watson Analytics, SAP Analytics Cloud and
Salesforce Einstein, for example, are cloud-only. Microsoft Power BI, whose Quick Insights
capability automatically generates findings, is primarily a cloud offering; the newly introduced
on-premises option lacks the augmented data discovery capabilities. DataRobot, on the other
hand, offers both cloud and on-premises deployment options, as do Sisense and ThoughtSpot.
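The Python sketch below illustrates the "automated forecasting" and "types of model" criteria from the list above. It fits two simple forecasting models to the same history and scores each on the most recent observations it did not see. It then keeps the model with the lower mean absolute error. The two models are an ordinary least-squares linear trend and Holt's linear trend method, which is Holt-Winters without the seasonal component. The smoothing parameters, holdout size and function names are assumptions. Commercial tools typically tune parameters and compare many more models.

```python
def linear_trend_forecast(history, horizon):
    """Ordinary least-squares line through (t, y), extended `horizon` steps ahead."""
    n = len(history)
    t_mean = (n - 1) / 2
    y_mean = sum(history) / n
    sxy = sum((t - t_mean) * (y - y_mean) for t, y in enumerate(history))
    sxx = sum((t - t_mean) ** 2 for t in range(n))
    slope = sxy / sxx
    intercept = y_mean - slope * t_mean
    return [intercept + slope * (n + h) for h in range(horizon)]


def holt_forecast(history, horizon, alpha=0.5, beta=0.3):
    """Holt's linear trend method (double exponential smoothing), no seasonality."""
    level, trend = history[0], history[1] - history[0]
    for y in history[1:]:
        prev_level = level
        level = alpha * y + (1 - alpha) * (level + trend)
        trend = beta * (level - prev_level) + (1 - beta) * trend
    return [level + (h + 1) * trend for h in range(horizon)]


def auto_forecast(series, holdout=3, horizon=3):
    """Pick the candidate with the lowest holdout MAE, then refit it on all data."""
    if len(series) < holdout + 2:
        raise ValueError("need at least two training points beyond the holdout")
    train, test = series[:-holdout], series[-holdout:]
    candidates = {"linear trend": linear_trend_forecast, "Holt linear trend": holt_forecast}
    mae = {}
    for name, model in candidates.items():
        predictions = model(train, holdout)
        mae[name] = sum(abs(p - a) for p, a in zip(predictions, test)) / holdout
    best = min(mae, key=mae.get)
    return best, mae, candidates[best](series, horizon)


best, mae, forecast = auto_forecast([float(x) for x in range(1, 11)])
print(best, mae, forecast)
```

The models are compared on data held back from fitting, not on the data they were fitted to. This keeps the automated choice from simply rewarding the most flexible model.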

Augmented data discovery is a critical capability evaluated for vendors that qualify for inclusion in
the "Magic Quadrant for Business Intelligence and Analytics Platforms" (see "Critical Capabilities for
Business Intelligence and Analytics Platforms").

Recommendations

Data and analytics leaders should:

■ Look for opportunities to use sandboxing and free trials to test and explore how augmented
data discovery complements existing data discovery and data science initiatives.
■ Recognize that these tools will mature and evolve over the next couple of years. Consult "Cool
Vendors in Analytics, 2017" and "Cool Vendors in Data Science and Machine Learning, 2017."
Also monitor moves by the vendors in "Magic Quadrant for Business Intelligence and Analytics
Platforms."
■ Familiarize yourself with, and monitor, the augmented data discovery capabilities and roadmaps
of your BI and analytics and data science platform vendors, as well as emerging startups as
they mature. Do so particularly in terms of:
■ The upfront set-up required and the data preparation required
■ The types of data that can be analyzed
■ The types of algorithms supported
■ The accuracy of the findings
■ The extent to which models can be audited and refined by more specialized data scientists
■ The integration of analytic workflow across descriptive, diagnostic, predictive and
prescriptive capabilities

Recommendations
Data and analytics leaders planning to modernize using next-generation analytics should:

■ Pilot and validate:
■ Launch an augmented analytics pilot to assess viability and prove value. Start with a small
list of specific business problems that are currently tackled manually and are time-intensive or
prone to bias.
■ Identify where automating algorithms to detect patterns in data could reduce the
exploration phase of analysis and improve highly skilled data scientists' productivity, while
recognizing that they still need to validate models, findings and applications.
■ Evaluate the extent of machine-learning automation features and their role in the data
preparation and cataloging process when evaluating self-service data preparation and
data cataloging vendors.
■ Assess organizational readiness; invest in updating roles and responsibilities and in
ongoing and corporatewide data literacy training:
■ Assess and plan your organization's readiness for augmented data discovery in terms of
alignment with business outcomes, current challenges with existing data discovery
approaches, and skills. Like visual-based data discovery self-service users in modern BI
and analytics platforms, citizen data scientists will need ongoing training and support to
hone their skills.
■ Not neglect governance:
■ Educate business leaders and decision makers about the potential transformational impact
that augmented data discovery and advanced analytics can have if used by a wider
audience. Remember to stress the need for responsible use and governance to avoid
unintended consequences.
■ Develop guidelines for appropriate use of augmented data discovery tools and capabilities,
with an emphasis on the people and process pillars of the BI and analytics framework.
■ Modify current deployment models, emphasizing the need for analytic governance and
providing incentives for collaboration between citizen data scientists and expert data
scientists.
■ Reduce the risks of expert data scientist push-back and user misinterpretation of
insights:
■ Use augmented analytics tools to confirm or challenge findings surfaced by human
interpretation of manual data discovery exercises. Validate findings with expert data
scientists.
■ Accelerate adoption of augmented analytics and reduce risks from errors in interpretation
through user enablement and data literacy programs to develop the citizen data scientist
and data engineer roles.
■ Engage both business analysts and data scientists in learning about augmented analytics
tools and incorporating them into the analytic process, to identify the best division of
responsibility between the roles.
■ Begin evaluation:
■ Look for opportunities to use sandboxing and free trials to test and explore how augmented
analytics complements existing data integration, BI and analytics, and data science initiatives.
■ Recognize that these tools will mature and evolve over the next couple of years. Consult
"Cool Vendors in Analytics, 2017" and "Cool Vendors in Data Science and Machine
Learning, 2017." Also monitor moves by the vendors in "Magic Quadrant for Business
Intelligence and Analytics Platforms."
■ Familiarize yourself with, and monitor, the augmented analytics capabilities and roadmaps
of your BI and analytics, data science and machine-learning, and self-service data
preparation platform vendors, as well as emerging startups as they mature. Pay particular
attention to the upfront setup required, the data preparation required, the types of data
that can be analyzed, the types and range of algorithms supported, and the accuracy of the
findings.

Representative Vendors
IBM (Watson Analytics), SAP (Lumira), Microsoft (Power BI) and SAS (Visual Analytics) are using
their assets in advanced analytics to innovate in the area of augmented data discovery. Salesforce
acquired the augmented analytics innovator BeyondCore, now Salesforce Einstein Discovery and
part of the Salesforce Einstein Analytics portfolio, to differentiate itself from the rest of the
market and to drive the next wave of market disruption. Because these large vendors are creating
awareness of these innovations, they will accelerate the adoption of newer augmented analytics
players, including well-funded startups and other organizations such as Ayasdi, DataRobot (which
recently acquired Nutonian), Endor, Progress (DataRPM) and SparkBeyond.

NLG players that are investing in analytics include Narrative Science, Yseop and Automated
Insights.

What is notable in terms of market dynamics is that the traditional BI players that led in the
semantic-layer-based era, although slow to react to the market shift toward visual-based
exploration, have invested in augmented analytics earlier than the visual-based data discovery
disruptors, such as Tableau and Qlik, which are only now beginning to invest in this area.

On the data integration front, augmented data preparation innovators include Paxata, Trifacta,
Datawatch and UniFi, along with cataloging vendors such as Alation. Meanwhile, traditional data
integration vendors such as IBM, SAS, Informatica and Oracle are also investing to create new
business-user-oriented, machine-learning-assisted augmented data integration and cataloging
environments.

However, offerings are still immature, and it is early days in terms of adoption. Table 1 shows a
sample of augmented analytics vendors and capabilities (the depth and breadth of capabilities vary
by vendor). Others are emerging all the time. The vendors are listed by primary category. Some
vendors may fit into multiple categories. For example, Salesforce's Einstein Analytics could be
placed in "modern BI and analytics platforms," "augmented data discovery specialists" or
"augmented data science and machine learning platforms," but is placed in the first category
because it is targeted at business users.

Most augmented data discovery capabilities can run a range of advanced descriptive algorithms to
identify what factors affect an outcome variable. Vendors are differentiated by the depth and
breadth of algorithms that run automatically and that are on their roadmap.
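As a rough illustration of what "identifying what factors affect an outcome variable" can mean in its simplest descriptive form, the Python sketch below ranks categorical factors by how far the average outcome differs between their best and worst levels. It assumes tabular records held as dictionaries, categorical factors and a numeric outcome. The function `rank_drivers` and the `sales` sample data are hypothetical, and the technique is a deliberately crude stand-in, not the algorithm used by any vendor named in this document.

```python
from collections import defaultdict
from statistics import mean


def rank_drivers(rows, outcome):
    """Rank categorical factors by the spread of mean outcome across their levels.

    Hypothetical helper: a crude effect measure, not any vendor's algorithm.
    """
    factors = [key for key in rows[0] if key != outcome]
    scores = {}
    for factor in factors:
        # Group outcome values by the level of this factor.
        groups = defaultdict(list)
        for row in rows:
            groups[row[factor]].append(row[outcome])
        level_means = [mean(values) for values in groups.values()]
        # Gap between the highest and lowest level mean.
        scores[factor] = max(level_means) - min(level_means)
    # Largest gap first: the factor most associated with the outcome by this measure.
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical sample data: two categorical factors and a numeric outcome.
sales = [
    {"region": "North", "channel": "Web", "revenue": 120.0},
    {"region": "North", "channel": "Store", "revenue": 80.0},
    {"region": "South", "channel": "Web", "revenue": 110.0},
    {"region": "South", "channel": "Store", "revenue": 70.0},
]

for factor, score in rank_drivers(sales, "revenue"):
    print(f"{factor}: {score:.1f}")
# channel: 40.0
# region: 10.0
```

Here `channel` ranks above `region` because average revenue differs by 40 between web and store sales but only by 10 between regions. A production tool would need to go further, for example by testing statistical significance and handling numeric factors.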

Most of these tools allow more highly skilled data scientists to view the underlying model and
export it to an advanced analytics platform for further refinement into enterprise-grade,
governed advanced descriptive, predictive and prescriptive models.

A number of data science and machine learning platforms, such as those of DataRobot, SAS
(Visual Statistics and SAS Visual Data Mining and Machine Learning), RapidMiner and KNIME, are
making it easier for expert data scientists and less-skilled citizen data scientists to build advanced
descriptive, predictive and prescriptive models. These differ from augmented data discovery
offerings, which automatically run a number of algorithms in parallel without requiring the
less-skilled user to build a model. Most augmented data discovery offerings still allow an expert
data scientist to inspect and modify the underlying model and export it for refinement in another
advanced analytics platform or language (SAS, R, Python, Scala and so on).
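To illustrate the idea of automatically running several algorithms in parallel and surfacing the best result without the user building a model, here is a minimal Python sketch. It assumes a small set of hypothetical candidate "algorithms" (an overall-mean baseline and one group-mean predictor per factor), scores each on held-out rows by mean absolute error (MAE), and runs the candidates concurrently with a thread pool. All function names and data are illustrative; real platforms evaluate far richer model families.

```python
from concurrent.futures import ThreadPoolExecutor
from statistics import mean


def fit_overall_mean(train, factor, outcome):
    """Baseline candidate: ignore every factor and predict the training mean."""
    average = mean(row[outcome] for row in train)
    return lambda row: average


def fit_group_mean(train, factor, outcome):
    """Candidate: predict the training mean for the row's level of `factor`."""
    groups = {}
    for row in train:
        groups.setdefault(row[factor], []).append(row[outcome])
    level_means = {level: mean(values) for level, values in groups.items()}
    fallback = mean(row[outcome] for row in train)  # used for unseen levels
    return lambda row: level_means.get(row[factor], fallback)


def evaluate(candidate, train, holdout, outcome):
    """Fit one candidate on the training rows and return its holdout MAE."""
    name, fit, factor = candidate
    model = fit(train, factor, outcome)
    error = mean(abs(model(row) - row[outcome]) for row in holdout)
    return name, error


def auto_select(train, holdout, factors, outcome):
    """Run every candidate concurrently and return (best, all_results)."""
    candidates = [("overall mean", fit_overall_mean, None)]
    candidates += [(f"mean by {factor}", fit_group_mean, factor) for factor in factors]
    with ThreadPoolExecutor() as pool:
        results = list(pool.map(lambda c: evaluate(c, train, holdout, outcome), candidates))
    best = min(results, key=lambda result: result[1])
    return best, results


# Hypothetical training and holdout rows.
train = [
    {"region": "North", "channel": "Web", "revenue": 120.0},
    {"region": "North", "channel": "Store", "revenue": 80.0},
    {"region": "South", "channel": "Web", "revenue": 110.0},
    {"region": "South", "channel": "Store", "revenue": 70.0},
]
holdout = [
    {"region": "North", "channel": "Web", "revenue": 118.0},
    {"region": "South", "channel": "Store", "revenue": 72.0},
]

best, results = auto_select(train, holdout, ["region", "channel"], "revenue")
for name, error in results:
    print(f"{name}: MAE {error:.1f}")
print("selected:", best[0])
# overall mean: MAE 23.0
# mean by region: MAE 18.0
# mean by channel: MAE 3.0
# selected: mean by channel
```

The thread pool keeps the sketch within the standard library. Because of Python's global interpreter lock, CPU-heavy model fitting would more naturally use a process pool, which would also require replacing the lambdas with picklable top-level functions.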

Table 1. Examples of Augmented Data Discovery Vendors and Their Capabilities

Sample Vendors | Augmented Self-Service Data Preparation (Prepare Data) | Smart Visualizations (Find Patterns in Data) | Augmented Insight/Model Generation (Find Patterns in Data) | NLG of Findings (Share Findings)

Modern BI and Analytics Vendors

ClearStory ✔ ✔ ✔ (basic) R ✔ partner R

IBM (Watson Analytics) R ✔ ✔ ✔ (basic) R

SAP (Lumira) ✔ ✔ (basic) R ✔ (basic) R ✔ (basic) R

SAP (Analytics Cloud) ✔ (basic) R ✔ (basic) R ✔ ✔ (basic) R

SAS (Visual Analytics) ✔ (basic) R ✔ ✔ (basic) R R

Microsoft (Power BI) ✔ (basic) R ✔ (basic) R ✔ partner R

Salesforce (Einstein Analytics, including Einstein Discovery) ✔ (basic) ✔ ✔ ✔

Qlik ✔ (basic) R ✔ R ✔ partner R

Tableau ✔ (basic) R R R ✔ partner R

Sisense ✔ (basic) R ✔ (basic) R ✔ partner R

AnswerRocket ✔ (with NLQ) ✔

Augmented Data Discovery Specialists

Big Squid ✔ (basic) ✔

Emcien ✔

Endor ✔ (basic) R ✔

NEC (Predictive Analysis Automation Platform) ✔

Progress (DataRPM) ✔ ✔ ✔ ✔ (basic)

Augmented Data Science and Machine Learning Platforms

DataRobot ✔

KNIME ✔ (basic) R

RapidMiner ✔ (basic) R

SAS (Visual Statistics, Visual Data Mining and Machine Learning) ✔ (basic) R ✔ ✔ (basic) R R

SparkBeyond ✔ ✔ ✔ (basic) R

Augmented Data Preparation Vendors

Paxata ✔

Trifacta ✔

Datawatch ✔

UniFi ✔

Alation ✔ (catalog focus)

Cambridge Semantics ✔ (catalog focus)

SAS (Cognitive Data Preparation in SAS Data Preparation and SAS Data Loader for Hadoop) ✔ R

Natural-Language Generation

Automated Insights ✔ ✔

Narrative Science ✔ ✔

Yseop ✔ ✔

BI = business intelligence; NLG = natural-language generation; NLQ = natural-language query; R = on the roadmap
A check mark with a "(basic)" designation means there are some augmented analytics capabilities in a generally available product, but
they are basic when compared to stand-alone disruptors in the space.
This is not an exhaustive list. New vendors enter the market all the time.

Source: Gartner (July 2017)

Acronym Key and Glossary Terms
Citizen data scientist: Gartner defines a "citizen data scientist" as a person who creates or
generates models that use predictive or prescriptive analytics but whose primary job function is
outside the field of statistics and analytics. The person is not typically a member of an analytics
team (for example, an analytics center of excellence) and does not necessarily have a job
description that lists analytics as his or her primary role. This person is typically in a line of
business, outside IT and outside a BI team. However, an IT or BI professional may be a citizen data
scientist if the professional's work on analytics is secondary to his or her primary role. Citizen
data scientists are "power users" who are able to use simple and moderately sophisticated
analytic applications that would previously have required more expertise.

Gartner Recommended Reading


Some documents may not be available as part of your current Gartner subscription.

"Citizen Data Science Augments Data Discovery and Simplifies Data Science"

"Pursue Citizen Data Science to Expand Analytics Use Cases"

"Doing Machine Learning Without Hiring Data Scientists"

"Critical Capabilities for Business Intelligence and Analytics Platforms"

"Cool Vendors in Data Science and Machine Learning, 2017"

"Cool Vendors in Analytics, 2017"

"Hype Cycle for Analytics and Business Intelligence, 2017"

"Hype Cycle for Data Science and Machine Learning, 2017"

"Rebalance Your Integration Effort With a Mix of Human and Artificial Intelligence"

Evidence
This document draws on Gartner analysts' research; surveys of vendors' reference customers,
vendor briefings and hands-on testing of platforms conducted for 2017's Magic Quadrant and
Critical Capabilities reports; and discussions with users of Gartner's client inquiry service.

More on This Topic


This is part of three in-depth collections of research. See the collections:

■ The Salesforce Vendor Rating Companion Guide, 2017
■ Applying Artificial Intelligence to Drive Business Transformation: A Gartner Trend Insight Report
■ Fostering Data Literacy and Information as a Second Language: A Gartner Trend Insight Report

GARTNER HEADQUARTERS

Corporate Headquarters
56 Top Gallant Road
Stamford, CT 06902-7700
USA
+1 203 964 0096

Regional Headquarters
AUSTRALIA
BRAZIL
JAPAN
UNITED KINGDOM

For a complete list of worldwide locations, visit http://www.gartner.com/technology/about.jsp

© 2017 Gartner, Inc. and/or its affiliates. All rights reserved. Gartner is a registered trademark of Gartner, Inc. and its affiliates. This
publication may not be reproduced or distributed in any form without Gartner's prior written permission. It consists of the opinions of
Gartner's research organization, which should not be construed as statements of fact. While the information contained in this publication
has been obtained from sources believed to be reliable, Gartner disclaims all warranties as to the accuracy, completeness or adequacy of
such information. Although Gartner research may address legal and financial issues, Gartner does not provide legal or investment advice
and its research should not be construed or used as such. Your access and use of this publication are governed by Gartner Usage Policy.
Gartner prides itself on its reputation for independence and objectivity. Its research is produced independently by its research
organization without input or influence from any third party. For further information, see "Guiding Principles on Independence and
Objectivity."
