FIGURE 1
[Figure graphic not reproduced in this text extraction: chart comparing 2017 and 2022 revenue; the values 2.1, 1.8, and a total of 8.4 appear in the original graphic.]
The content for this excerpt was taken directly from Worldwide Storage for Cognitive/AI Workloads Forecast, 2018–2022 (Doc #US43707918). All or parts of the following sections are included in this excerpt: Executive Summary, Market Forecast, Appendix, and Learn More. Also included are Figures 1, 2, and 3 and Tables 1, 2, and 3.
EXECUTIVE SUMMARY
Intelligent applications based on artificial intelligence (AI), machine learning (ML), and continual deep
learning (DL) are the next wave of technology transforming how consumers and enterprises work,
learn, and play. While data is at the core of the new digital economy, what matters is how you sense the environment and manage the data from edge to core to cloud, how you analyze it in near real time, and how you learn from it and then act on it to affect outcomes. IoT, mobile devices, big data, machine learning,
and cognitive/artificial intelligence all combine to continually sense and collectively learn from an
environment. What differentiates winning organizations is how they leverage these technologies to deliver meaningful, value-added predictions and actions, whether for personalized life efficiency/convenience, improved industrial processes, healthcare, experiential engagement, or any enterprise decision making.
AI has been around for decades, but due to the pervasiveness of data, seemingly infinite scalability of
cloud computing, availability of AI accelerators, and sophistication of the ML and DL algorithms, AI has
grabbed the center stage of business intelligence. IDC predicts that, by 2019, 40% of digital
transformation (DX) initiatives will use AI services; by 2021, 75% of commercial enterprise apps will
use AI, over 90% of consumers will interact with customer support bots, and over 50% of new
industrial robots will leverage AI. AI solutions will continue to see significant corporate investment over
the next several years.
As per IDC research, IT automation tops the list of use cases for AI workloads, followed by workforce
management, customer relationship management, supply chain logistics, fraud analysis and
investigation, and predictive analytics (e.g., sales forecasting or failure prediction for preventive
maintenance). Privacy and security concerns will present serious landmines for AI-based DX efforts.
The IT organization will need to be among the enterprise's best and first use case environments for AI
— in development, infrastructure management, and cybersecurity.
Artificial intelligence is not only changing the way data and business processes are carried out; it is
leading to a broad reconfiguration of underlying infrastructure as well. Machine learning and deep
learning algorithms need huge quantities of training data, and AI effectiveness depends heavily on
high-quality (and diverse) data inputs. Both training and inferencing are compute intensive and need
high performance for fast execution. All this drives the need for lots of compute, storage, and
networking resources, and to be truly successful, one needs to employ intelligent infrastructure.
IDC has predicted that, by 2021, 50% of enterprise infrastructure will employ artificial intelligence to
prevent issues before they occur, improve performance proactively, and optimize available resources.
IT and businesses will benefit from self-configurable, self-healing, self-optimizing infrastructure to
improve enterprise productivity, manage risks, and drive overall cost reduction.
This IDC study presents the worldwide 2018–2022 forecast for storage hardware and software for
cognitive/AI workloads.
"Cognitive/AI is poised to transform next-generation IT. Machine learning and deep learning require
huge amounts of training (and diverse) data. Storage for cognitive/AI needs to be dynamically
scalable, adaptable, high performing, cloud integrated, and intelligent," said Ritu Jyoti, research
director, IDC's Enterprise Storage, Server and Infrastructure software team. "AI services continue to be
integral to digital transformation (DX) initiatives."
In the past few years, we have witnessed the rise of digital transformation and the disruptions and
opportunities it poses for traditional businesses and society. Organizations of every size and industry
risk fundamental disruption because of new technologies, new players, new ecosystems, and new
ways of doing business. IDC predicts worldwide spending on digital transformation technologies will
expand at a CAGR of 17.9% through 2021 to more than $2.1 trillion. Large and diverse data sets
create new challenges, but when combined with AI technologies and exponential computing power,
they create ever-greater opportunities.
Cognitive/AI applications and workloads will continue to shape the storage industry, with the central
theme "scale, performance, intelligence, and self-management." Naturally, storage suppliers are
aggressively building or need to build AI workload–centric products and solutions:
Make your offerings intelligent to help customers improve enterprise productivity, manage risks, and drive overall cost reduction. Self-configurable, self-healing, and self-optimizing storage can help.
Table 1 provides a top-line view of IDC's forecast of the storage for cognitive/AI workloads for 2017–
2022 (also refer back to Figure 1). For major market forces driving this forecast, see the Market
Context section.
TABLE 1
Revenue ($B)
[Table data not reproduced in this text extraction.]
Note: The data includes a new definition of internal storage, and it does not include tape media.
Both storage systems and storage software for cognitive/AI workloads are expected to exhibit
growth in the next five years, exceeding growth of the respective broad markets (enterprise
storage systems and enterprise storage software).
In the DX era, with massive growth in data volumes and greater emphasis on analytics
(including real time), storage systems revenue will grow, incorporating larger capacity and
newer memory/flash technologies.
As cognitive/AI workloads become business critical, storage software revenue for data
protection, availability, integration, security, and data location optimization will grow.
Table 2 provides estimates for spending on storage systems for deployment in support of a variety of
cognitive/AI workloads use cases.
IDC estimates:
IT automation is the topmost use case for AI workloads. Revenue for storage used for IT automation will start from a larger base and see decent growth.
Organizations are prioritizing use cases that have immediate revenue and cost impact. Back-office functions are prioritized because computers are already in extensive use there, alleviating the skill set and human concerns related to AI adoption.
Front-office use of AI is expected to have a large impact on business, and with change management in place, retooling of existing staff, and improved trustworthiness of data and algorithms, AI adoption for front-office functions is expected to grow the fastest.
Table 3 provides estimates for spending on storage systems for deployment in private cloud, public
cloud, edge, and traditional deployment scenarios.
IDC estimates:
Revenue for storage used for cognitive/AI workloads in the public and private cloud
consumption/storage deployment scenarios will grow much faster than in the traditional (on/off-premises) storage deployment scenario. Agility and flexibility are
fundamental to cognitive/AI workloads deployments. Many businesses don't have the ability to
scale their existing infrastructure in capacity or performance, skill set to support newer
technologies, or the floor space to support the scale of capacity. All of this is leading to more and more cognitive/AI deployments on the public cloud or with hosted service providers. However, private cloud/on-premises traditional deployment is expected to be
significant for the training phase of AI workload, especially where intellectual property
protection is critical. Today, in some cases, the training phase is run on-premises, also due to
the lack of powerful infrastructure availability in public cloud. Inferencing is typically run on
public cloud.
Revenue for storage used for cognitive/AI workloads at the edge is expected to grow the
fastest as the bulk of inferencing for IoT data will happen at the edge.
MARKET CONTEXT
Modern data sources and characteristics are different. Today, data consists of small to large files,
structured, semistructured, and unstructured content and data access varies from random to
sequential. By 2025, more than a quarter of the global data set will be real time in nature and real-time
IoT data will make up more than 95% of it. In the new age of big data, modern applications and
compute technologies leverage massively parallel architecture for performance.
If we examine the data pipeline for AI workloads (see Figure 2), we can see that the application profiles, compute, and I/O profiles change from ingestion to real-world inferencing. Machine learning
and deep learning need huge quantities of training data. Both training and inferencing are compute
intensive and need high performance for fast execution. Artificial intelligence applications push the
limits on thousands of GPU cores or thousands of CPU servers. Parallel compute demands parallel
storage. While the training phase requires large data stores, inferencing has less need for them. The
inference models are often stored in a DevOps-style repository where they benefit from ultra-low-
latency access. Although the training phase is nominally complete once the execution model has been developed from the data and the workload has moved to the inferencing stage, retraining of the model is often needed as new or modified data comes to light. In some cases, the real-time nature of the
application may require near-constant retraining and update of the model. Also, over a period of time,
organizations may benefit from retraining the model due to additional data sources and insights in
play.
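The train-infer-retrain loop described above can be sketched in a few lines of Python. Everything here is an illustrative assumption, not an IDC-defined pipeline: the "model" is just summary statistics, and the drift threshold is arbitrary.

```python
# Minimal sketch of the AI data pipeline described above: train a model,
# run inference on new data, and retrain when that data drifts away from
# the distribution the model was trained on. All names and thresholds are
# illustrative assumptions.

import statistics


def train(samples):
    """'Training': summarize the data with a mean and spread."""
    return {"mean": statistics.mean(samples),
            "stdev": statistics.pstdev(samples)}


def infer(model, value):
    """'Inference': score how far a new value sits from the trained model."""
    return abs(value - model["mean"]) / (model["stdev"] or 1.0)


def needs_retraining(model, recent, threshold=2.0):
    """Retrain when recent data drifts far from the training distribution."""
    drift = abs(statistics.mean(recent) - model["mean"])
    return drift > threshold * (model["stdev"] or 1.0)


training_data = [10, 11, 9, 10, 12, 11]
model = train(training_data)

recent_batch = [30, 31, 29]   # new data arriving at the inference stage
if needs_retraining(model, recent_batch):
    model = train(training_data + recent_batch)   # retrain on combined data
```

The point of the sketch is the control flow, not the statistics: in a production pipeline, each stage would have very different storage and I/O profiles, which is exactly the infrastructure challenge the section describes.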
Today, customers are using different infrastructure solutions and approaches to support the data
pipeline for AI, generally leading to data silos. Some of them create duplicate copies of the data for the
AI pipeline with the intent of not disturbing the stable applications.
If data doesn't flow smoothly through the entire pipeline, productivity will be compromised and
organizations will need to commit increasing amounts of effort/resources to manage the pipeline.
Organizations need to adopt dynamically adaptable and scalable intelligent infrastructure that is tuned for varied data formats and access, has the power to process and analyze large volumes of data, and has the speed to support faster compute calculations and decision making. It also needs to be efficient to help
generate more inferencing models, more accurately, than traditional analytical and programming
approaches leading to overall improvement in productivity.
As per IDC's Cognitive, ML and AI Workloads Infrastructure Market Survey conducted in January 2018 (n = 405; 1,000+ U.S. employees and 500+ Canadian employees), traditional SAN/NAS is today largely used for on-premises runs of AI/ML/DL workloads, owing to its existing deployment footprint and the earlier stages of AI adoption. But given the need to scale dynamically, store large volumes of data at relatively low cost, and deliver high performance, software-defined storage, hyperconverged infrastructure, and all-flash arrays with newer memory technologies will gain adoption, aligned with each offering's specific advantages and the data pipeline stage of the AI deployment.
A large percentage of the enterprises using the public cloud for AI right now are using it as a test bed: an inexpensive way to get started and to figure out which applications are going to be amenable to AI.
The decision to run the AI pipeline on public cloud versus on-premises is also typically driven by data
gravity — where the data currently is or is likely to be stored, easy access to compute resources,
applications, and the speed by which the capabilities need to be explored and deployed. Movement of
large data sets out of the cloud is cost prohibitive, so it is more than likely that the total pipeline will run
on the public cloud once the data is committed there.
Intelligent Storage
The future of enterprise storage is not just about feeds and speeds but about intelligence and self-management, leading to self-configurable, self-healing, and self-optimizing storage. Predictive analytics and artificial
intelligence will enable companies to sharply reduce downtime and ensure optimal application
performance, essentially switching from "firefighter mode" to a more proactive IT strategy.
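As a toy illustration of that proactive, analytics-driven approach, the sketch below flags telemetry readings that deviate sharply from a device's learned baseline. The z-score method, the threshold, and the sample values are all assumptions for illustration, not any vendor's actual algorithm.

```python
# Illustrative sketch: flag storage latency readings that deviate sharply
# from the fleet baseline, so operators can investigate before an outage.
# The z-score approach and the threshold of 2.0 are arbitrary assumptions.

import statistics


def find_anomalies(latencies_ms, z_threshold=2.0):
    """Return indices of readings more than z_threshold deviations from the mean."""
    mean = statistics.mean(latencies_ms)
    stdev = statistics.pstdev(latencies_ms) or 1.0
    return [i for i, v in enumerate(latencies_ms)
            if abs(v - mean) / stdev > z_threshold]


# Mostly steady latency with one adverse spike at index 5.
samples = [1.1, 1.0, 1.2, 0.9, 1.1, 9.5, 1.0, 1.1]
print(find_anomalies(samples))   # prints [5]
```

Real products in this space compare each array against fleet-wide "norms" learned from telemetry across many customers, but the basic idea is the same: separate normal spikes from atypical behavior statistically rather than by fixed alert thresholds.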
Today, AI/ML capabilities in storage are in their infancy, and advancements are being made. Below we
cover a few representative vendor examples supporting/working toward intelligent storage. Note that
this list is not meant to be comprehensive.
Hewlett Packard Enterprise (HPE) is updating its acquired Nimble Storage InfoSight array
management system with a machine learning-driven recommendation engine and adding InfoSight to
3PAR arrays. The new AI Recommendation Engine (AIRE) is supposed to improve both infrastructure
management and application reliability. InfoSight now, HPE says, preemptively advises IT how to
avoid issues, improve performance, and optimize available resources. HPE's vision is to create
autonomous datacenters.
IBM recently announced plans to add new artificial intelligence functionality to its cloud-based storage
management platform, IBM Spectrum Control Storage Insights. The new cognitive capability will
stream device metadata from IBM Storage systems to the cloud, augmenting human understanding
with machine learning insights to help optimize the performance, capacity, and health of clients'
storage infrastructure. IBM cognitive storage management capabilities offered by this platform already
include optimization with data tiering and reclamation of unused storage. IBM plans to add further
artificial intelligence that will reduce the time it takes to monitor complex infrastructures and to find and
resolve issues that impact application performance.
NetApp Active IQ leverages machine learning to teach the telemetry system new patterns, so it's
continually learning and adapting to your evolving environments.
Dell EMC CloudIQ proactively monitors and measures the overall health of an unlimited number of Dell
EMC Unity storage systems through intelligent, comprehensive, and predictive analytics. It's these
context-aware insights backed by extensive telemetry data that help you ensure stringent application
performance requirements are being met while proactively differentiating standard and "normal" spikes
in performance from atypical adverse behaviors and providing potential root causes and remediation
for the latter. CloudIQ dynamically compares the behavior of each Dell EMC Unity storage array
against the "norms" while looking for anomalies that could indicate potential problems.
Eventually cloud-based AI and ML will be used to fully automate the operation of the storage. This will
mean storage systems that do more of the data management themselves, enabling organizations to
shift dollars away from today's IT maintenance budgets over to tomorrow's DX initiatives.
In addition, there are several other viable emerging memory candidates that will be valuable for cognitive/AI workloads. Intel's 3D XPoint (3DXP), a new storage-class memory, is expected to go mainstream in a year or so. It should operate in the 10–20 microsecond realm, instead of the 100–200 microsecond range for flash. This roughly 10x performance improvement will manifest as both storage cache and tier to deliver better, faster storage for cognitive/AI workloads.
MARKET DEFINITION
Artificial intelligence (AI), machine learning (ML), and deep learning (DL) are all interrelated. AI is
defined as the study and research of providing software and hardware that emulates human beings.
ML is a discipline and set of algorithms that evolved from statistics and is now considered a part of AI. ML does not require much explicit programming in advance to gain intelligent insight from
data, because of its ability to use learning algorithms that simulate human learning capabilities by
developing statistical models based on data, lots of data. The learning can be supervised by humans, unsupervised, or reinforced. DL is a subset of ML. It relies on a class of algorithms called neural networks
with extensive layers between input/output and, again, lots of data.
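A toy example makes the "statistical models based on data" idea concrete: instead of explicit rules, a least-squares line is fitted to labeled examples and then generalizes to an unseen input. The data and function names below are invented for illustration and are not tied to any product mentioned in this study.

```python
# Pedagogical sketch of supervised learning: fit a statistical model
# (ordinary least-squares linear regression, computed from scratch) to
# labeled data, then predict for an input not seen during training.

def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Labeled training data: the "supervision" is the known y for each x.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 6, 8, 10]          # underlying relationship: y = 2x

slope, intercept = fit_line(xs, ys)
prediction = slope * 6 + intercept   # predict for unseen input x = 6
```

Deep learning replaces this single linear function with many stacked nonlinear layers, which is why it needs far more data and compute, and correspondingly more storage throughput, than classical ML.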
IDC defines the storage for cognitive/AI workloads as the storage used for running the cognitive/AI
software platforms, content analytics, advanced and predictive analytics, and search systems using
machine learning and deep learning algorithms.
Cognitive/AI software platforms provide the tools and technologies to analyze, organize, access, and
provide advisory services based on a range of structured and unstructured information. These
platforms facilitate the development of intelligent, advisory, and cognitively enabled applications. The
technology components of cognitive software platforms include text analytics, rich media analytics
(such as audio, video, and image), tagging, searching, machine learning, categorization, clustering,
hypothesis generation, question answering, visualization, filtering, alerting, and navigation. These
platforms typically include knowledge representation tools such as knowledge graphs, triple stores, or
Content analytics systems provide tools for recognizing, understanding, and extracting value from text
or by using similar technologies to generate human readable text. This submarket also includes
language analyzers and automated language translation as well as text clustering and categorization
tools. This submarket also includes software for recognizing, identifying, and extracting information
from audio, voice, and speech data as well as speech identification and recognition plus converting
sounds into useful text. Finally, this submarket includes software for recognizing, identifying, and
extracting information from images and video, including pattern recognition, objects, colors, and other
attributes such as people, faces, cars, and scenery. These tools are used for computer vision
applications and clustering, categorization, and search applications. Representative software vendors
in this submarket include SAP (HANA Text Analysis), Google (Cloud Speech API), Nuance (Automatic
Speech Recognition and Natural Language Understanding), and IBM Intelligent Video Analytics.
Advanced and predictive analytics tools include data mining and statistical software. These tools use a
range of techniques to create, test, and execute statistical models. Some techniques used are
machine learning, regression, neural networks, rule induction, and clustering. Advanced and predictive
analytics tools and techniques are used to discover relationships in data and make predictions that are
hidden, not apparent, or too complex to be extracted using query, reporting, and multidimensional
analysis software. Products on the market vary in scope. Some products include their own
programming language and algorithms for building models, but other products include scoring engines
and model management features that can execute models built using proprietary or open source
modeling languages. Representative software vendors in this market include SAS, IBM (SPSS),
RapidMiner, MathWorks (MATLAB), Fuzzy Logix, Microsoft (Microsoft R Server), SAP (SAP Predictive
Analytics), Oracle (Oracle Advanced Analytics), Quest Software (Statistica), and Wolfram
(Mathematica).
Search systems include departmental, enterprise, and task-based search and discovery systems as
well as cloud-based and personal information access systems. This submarket also includes unified
information access tools and systems that combine text analytics, clustering, categorization, and
search into a comprehensive information access system. Representative software vendors in this
market include Palantir (Gotham), Google (Site Search and Cloud Search), IBM (Watson Explorer),
and Elastic (Elasticsearch).
For further details on the cognitive/AI software platforms, content analytics, advanced and predictive
analytics, and search systems, please refer to IDC's Worldwide Big Data and Analytics Software
Taxonomy, 2017 (IDC #US42353216, March 2017).
Worldwide storage for cognitive/AI workloads is a subset of the worldwide storage for big data
analytics market (see Figure 3). It draws upon most of the revenue from the "rest of the BDA" segment,
which covers cognitive/AI software platforms, content analytics tools, and advanced and predictive
analytics tools; "nonrelational data stores" and "continuous analytic tools," and a small subset from
"relational data warehouses." The storage can be varied by storage model, array and infrastructure
type, and data organization. It can be consumed as nonservice or as a service.
For details on IDC's definition of worldwide storage for big data and analytics, refer to IDC's Worldwide
Storage for Big Data and Analytics Taxonomy, 2017 (IDC #US42555117, May 2017).
FIGURE 3
[Figure graphic not reproduced in this text extraction.]
METHODOLOGY
The five-year annual forecast published in this study is an annual rollup of IDC's worldwide storage for
cognitive/AI workloads. This forecast covers all of the storage systems and associated storage
software. The forecast draws on the following sources:
Worldwide Storage for Big Data and Analytics Forecast, 2017–2021 (IDC #US43013117,
September 2017)
IDC's Worldwide Storage for Big Data and Analytics Taxonomy, 2017 (IDC #US42555117,
May 2017)
IDC models of historical IT technology adoption/diffusion
The forecast was developed by integrating this information with IDC-developed forecast assumptions
about key market growth drivers and inhibitors.
The tables and figures in this study are generated from a proprietary IDC database and analytical
tools. Our census process researches enterprise storage information on a product-, vendor-, and
Note: All numbers in this document may not be exact due to rounding.
RELATED RESEARCH
What Type of Storage Architecture Will Be Used for On-Premises Run of AI/ML/DL
Workloads? (IDC #US43587818, February 2018)
Are You Ready for Intelligent Infrastructure in Enterprise Datacenters? (IDC #DR2018_T5_RJ,
February 2018)
IDC's Forecast Scenario Assumptions for the ICT Markets and Historical Market Values and
Exchange Rates, Version 4, 2017 (IDC #US43531218, January 2018)
Applying Cognitive/Artificial Intelligence Techniques to Next-Generation IT (IDC
#US43176317, November 2017)
Worldwide Storage for Big Data and Analytics Forecast, 2017–2021 (IDC #US43013117,
September 2017)
IDC's Worldwide Storage for Big Data and Analytics Taxonomy, 2017 (IDC #US42555117,
May 2017)
Global Headquarters
5 Speen Street
Framingham, MA 01701
USA
508.872.8200
Twitter: @IDC
idc-community.com
www.idc.com
Copyright Notice
This IDC research document was published as part of an IDC continuous intelligence service, providing written
research, analyst interactions, telebriefings, and conferences. Visit www.idc.com to learn more about IDC
subscription and consulting services. To view a list of IDC offices worldwide, visit www.idc.com/offices. Please
contact the IDC Hotline at 800.343.4952, ext. 7988 (or +1.508.988.7988) or sales@idc.com for information on
applying the price of this document toward the purchase of an IDC service or for information on additional copies
or web rights.
Copyright 2018 IDC. Reproduction is forbidden unless authorized. All rights reserved.