Big Data Analytics: Future Architectures, Skills and Roadmaps for the CIO
White Paper
By Philip Carter
September 2011
Sponsored by
The ‘Big Data Era’ has arrived — multi-petabyte data warehouses, social
media interactions, real-time sensory data feeds, geospatial information
and other new data sources are presenting organisations with a range of
challenges, but also significant opportunities. IDC believes that as CIOs
start to adopt the new class of technologies required to process, discover
and analyse these massive data sets that cannot be dealt with using
traditional databases and architectures, it will become clear that the real
value will be derived from the high-end analytics that can be performed
on the increasing volumes, velocity and variety of data that organisations
are generating – or Big Data analytics.
One of the key differences between traditional analytics and analytics in the Big Data era is that we are now gathering data that we may or may not need. From the perspective of analysis, this means ‘we don’t know what we don’t know’; the variables and models are therefore likely to be entirely new, requiring a different infrastructure strategy and, perhaps most importantly, new skill sets.
The objective of this white paper is to explore the initial impact that Big Data is having on organisations, particularly on IT departments, which are being forced to re-assess architectures, delivery models and future roadmaps. It will explore the following areas in more detail:
Defining Big Data. This is not in the context of the quantity or threshold that actually quantifies Big Data (as this is changing all the time, and will be applied differently, depending on the vertical and market segment), but more in terms of a new generation of technologies and architectures, designed to economically extract value from very large volumes of a wide variety of data, by enabling high-speed capture, discovery and/or analysis.

Hadoop, MapReduce, Key Value Store? There is a lot of hype around the new technologies that are being used by the market to deal with the Big Data phenomenon. We will highlight some of these and their relative importance.

The Value of Big Data… in Analytics. The bottom line here is that it is getting more complicated to process and analyse these large and growing data sets – and it essentially requires a re-assessment of the broader information management strategies for the majority of organisations that have started their business analytics journey.

Why Big Data Analytics is Important (and Different). Many have asked the question – what is new with this trend? This section will highlight the traditional use of business analytics in the old ‘pre-Big Data’ world, versus Big Data analytics in the ‘New World’. This will also look at the various use cases that IDC expects to see being most commonly used across a variety of industries.

The Skill Factor – the Rise of the Data Scientist. With the raft of new technologies and organisational structures that need to be put in place as the Big Data phenomenon becomes a reality, there will be increasing demand for ‘data scientists’ – the next-generation analytical professionals who are able to extract information from large data sets and then present value-added content of business value to non-data experts – who also have the unique skill of understanding the new models that need to be put in place.

Mapping out the Big Data Analytics Journey. The Big Data analytics journey will be an iterative one – it is therefore important to map this out in the context of a broader framework. This section aims to do exactly that, and also provide some recommendations to CIOs as they embark on this exciting journey into the brave new world of Big Data analytics.
Situation Overview
The Rise of Business Analytics
Much has been written on how the amount of data in the world is exploding in volume. According to the recent IDC Digital Universe study, the amount of information created and replicated will surpass 1.8 zettabytes (1.8 trillion gigabytes) in 2011 – growing by a factor of 9 in just five years.
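The figures quoted above can be checked with a little arithmetic. The sketch below takes only the 1.8 trillion gigabytes and the factor of 9 over five years from the text; the use of decimal (SI) units and the assumption of steady compound growth are made purely for illustration and are not taken from the IDC study.

```python
# Decimal (SI) units are assumed: 1 zettabyte = 10^21 bytes, 1 gigabyte = 10^9 bytes.
GIGABYTES_PER_ZETTABYTE = 10**21 // 10**9

volume_2011_gb = 1.8e12      # "1.8 trillion gigabytes", as quoted in the text
growth_factor = 9            # "a factor of 9", as quoted in the text
years = 5                    # "in just five years"

# Convert the gigabyte figure to zettabytes.
volume_2011_zb = volume_2011_gb / GIGABYTES_PER_ZETTABYTE

# Assumption: growth is spread evenly, i.e. compounds at a constant yearly rate.
implied_annual_growth = growth_factor ** (1 / years) - 1

print(f"Volume in 2011: {volume_2011_zb:.1f} zettabytes")
print(f"Implied compound annual growth: {implied_annual_growth:.0%}")
```

Under these assumptions, a ninefold increase over five years corresponds to compound growth of roughly 55% per year.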
Big Data is a dynamic that seemed to appear from almost nowhere. But in reality, Big Data is not new – and it is moving into mainstream and getting a lot more attention. The growth of Big Data is being enabled by inexpensive storage, a proliferation of sensor and data capture technology, increasing connections to information via the cloud and virtualised storage infrastructure, as well as innovative software and analysis tools. It is no surprise then that business analytics as a technology area is rising on the radars of CIOs and line-of-business (LOB) executives. To validate this, as part of a recent survey of 5,722 end users in the US market, business analytics ranked in the top five IT initiatives of organisations. The key drivers for business analytics adoption remained conservative or defensive. The focus on cost control, customer retention and optimising operations is likely a reflection of the continued economic uncertainty. However, top drivers vary significantly by organisation size and industry. Similarly, IDC surveyed 693 European organisations in February 2011 where 51% of respondents said that BI and analytics are high-priority technologies. In emerging markets such as Asia/Pacific, the focus is very much on capturing the next wave of growth. According to more than 1000 CIOs and LOB executives that were interviewed as part of the Asia/Pacific C-Suite Barometer in February 2011, business analytics was rated as the number one technology area that would enable their organisations to gain a competitive edge in the year ahead.
[Figure: Top 5, shown as a bar chart (horizontal axis: %, 0 to 35): business intelligence/analytics; network; social media/online channel; collaboration (including video, mobility); cloud computing/services.]
With more businesses in Asia investing in IT to ride the hyper growth wave in emerging markets, they are harnessing analytics-led solutions to gain better customer insights, manage risk and financial metrics more effectively, and at the same time, strive for unique market differentiation. Historically, organisations have made significant investments in applications with the objective of automating business processes and capturing data to improve operational efficiency. Many of these projects are still ongoing, but what is becoming increasingly clear to the senior management of these entities is that they (and their business managers) have not been able to get hold of the right information (mainly due to poorly integrated systems and questionable data quality) at the right time (due to performance and scalability issues) to the right stakeholders within their organisations for the critical decision-making capabilities needed to drive the necessary business impact. And where they are unable to do this, the line of business is procuring and deploying their own solutions in a new wave of ‘shadow IT’ investments focusing on business analytics, thereby forcing CIOs to re-examine these issues with a specific focus on driving better IT-business alignment. These are taking place even without the ‘Big Data’ dynamic in the picture – which when added, creates the ‘perfect storm’ for Big Data analytics to take centre stage.
A Note on Terminology: BI or Analytics?
We have some challenges when defining and using terminology for business analytics. Because the BI market is mature, many terms have been around for a long time and have either become obsolete or have been redefined over the years. For example, the term ‘BI’ itself is sometimes used in a narrow sense (only query, reporting, and analysis [QRA] technology) and at times, in a broad sense to refer to the whole of what IDC calls business analytics (including data warehousing and analytic applications in addition to front-end tools). The term ‘analytics’ is relatively new and its meaning is often unclear – does it refer to advanced analytics including predictive analytics, optimisation and forecasting, or analytic applications? In some submarkets, such as Web analytics, the term ‘analytics’ simply means a dashboard on top of some data.

For the purpose of this white paper, we interpret ‘BI’ to mean either QRA tools or BI across the board (in its narrow definition), or ‘business analytics’ (in its broad definition) in IDC terminology. We interpret ‘analytics’ to mean either advanced analytics (data mining, statistics, optimisation and forecasting) or analytic applications (FPSM, CRM and marketing analytics, supply chain analytics, etc.). Business Analytics is a combination of the above (and also includes data warehousing technologies) and this is highlighted by IDC’s Business Analytics Taxonomy for 2011 (see figure 2 below):
[Figure: Data volumes over time, spanning semi-structured data (e.g. weblogs, social media feeds) and unstructured data (video, rich media, etc.). Data = big, complex, high velocity and wide variety. Source: IDC, 2011]
The Volume. One source of volume sits mainly in the structured data realm. Some of this is held in transactional data stores and is linked to the ever-present electronic trail that individuals and businesses create in the wake of rapidly increasing online activity. Sensory data (machine-to-machine) contributes to this area too. The other is in existing data warehouses or data marts, which have over time grown to petabyte scale.

The Variety. The other aspect of this Big Data phenomenon is the need to analyse semi-structured and unstructured data. Text, video and other forms of media will require a completely different architecture and technologies to perform the required analysis. For example, if you look at the social media phenomenon, many marketing departments are looking at ways to do sentiment and brand analysis based on what is being posted on Facebook, Twitter and YouTube. This dynamic becomes more complex in Asia with local social media sites like RenRen in China and Nate in Korea.

The Velocity. There will also be demand to analyse this data on a more regular basis – for example, taking into account all transactions rather than a sample to obtain a more complete view of risk on a trade in real time.

In summary, Big Data refers to data sets whose volume, variety, velocity and complexity make it impossible for current databases and architectures to store and manage. IDC intentionally does not define Big Data as larger than a certain threshold (e.g. terabytes), mainly since this threshold would be a moving target depending on the sector, as well as the fact that it will obviously grow over time. More important is the value that organisations can derive from this phenomenon – and the resulting need to rethink their information strategies to extract the value.
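The velocity example above, analysing all transactions rather than a sample, can be illustrated with a small sketch. Everything in it is hypothetical: the trade amounts are synthetic, the positions of the large trades are chosen by hand, and the function names are invented for this illustration. It only shows why a sample can miss the few transactions that dominate risk.

```python
from typing import List


def build_trades(count: int = 10_000) -> List[int]:
    """Synthetic exposures: mostly small trades plus a few very large ones."""
    trades = [100] * count
    for position in (1234, 5678, 9012):   # hypothetical large trades
        trades[position] = 1_000_000
    return trades


def sampled_estimate(trades: List[int], step: int) -> int:
    """Estimate total exposure from every `step`-th trade, scaled back up."""
    return sum(trades[::step]) * step


def full_exposure(trades: List[int]) -> int:
    """Total exposure computed over every trade."""
    return sum(trades)


trades = build_trades()
estimate = sampled_estimate(trades, step=100)
actual = full_exposure(trades)

print(f"Estimate from a 1-in-100 sample: {estimate:,}")
print(f"Exposure across all trades:      {actual:,}")
```

In this constructed case the sample never touches the three large trades, so the scaled estimate understates the total exposure. Real samples may do better or worse; the point is that only the full pass is guaranteed to see every outlier.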
Other Definitions: Hadoop, MapReduce, Key Value Store
With the focus on Big Data going mainstream, a range of new technologies has hit the market. The table below gives an overview of these technologies, with associated context (note that the list is not exhaustive).
Technology: Context

Big Table: Proprietary distributed database system built on the Google File System. Inspiration for HBase.

Hadoop: An open source (free) software framework for processing huge data sets on certain kinds of problems on a distributed system. Its development was inspired by Google’s MapReduce and Google File System. It was originally developed at Yahoo! and is now managed as a project of the Apache Software Foundation.

HBase: An open source (free) distributed, non-relational database modeled on Google’s Big Table. It was originally developed by Powerset and is now managed as a project by the Apache Software Foundation as part of Hadoop.

MapReduce: A software framework introduced by Google for processing huge data sets on certain kinds of problems on a distributed system. Also implemented in Hadoop.

Non-relational database/Key Value Store: A non-relational database is one that does not store data in tables (rows and columns) – in contrast to a relational database. Key Value Stores allow for the management of schema-less (NoSQL) entities.
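To make the MapReduce entry in the table above more concrete, here is a minimal single-process sketch of the programming model in Python, using the familiar word-count example. It is not Hadoop's or Google's API: the function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) are hypothetical, and a real framework would run the map and reduce steps in parallel across many machines.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(document: str) -> Iterator[Tuple[str, int]]:
    """Map step: emit a (word, 1) pair for every word in one document."""
    for word in document.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Shuffle step: group all emitted values by their key."""
    grouped: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    """Reduce step: combine the values for one key into a single result."""
    return key, sum(values)


def word_count(documents: List[str]) -> Dict[str, int]:
    """Run map, shuffle and reduce in sequence on a list of documents."""
    pairs = (pair for document in documents for pair in map_phase(document))
    return dict(reduce_phase(key, values) for key, values in shuffle(pairs).items())


counts = word_count(["big data analytics", "big data era"])
print(counts)
```

Because each map call looks at one document and each reduce call looks at one key, the work can in principle be split across a distributed system, which is the property the table entry refers to.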
Although some of these terms will be used throughout this white paper, the focus is not to examine them in too much detail – because as one IT executive recently mentioned – ‘to know the technology is one thing, but to apply it in the right environment is something entirely different’. The new technology needs to be tied back to business requirements as much as possible – not just examining the technology for the sake of it. Having said that, most IT executives are not aware of the technologies and trends developing in this area – and where they are aware of it, their strategy is to put a couple of people in their enterprise architecture team to experiment with the new technologies (i.e. in memory, Hadoop, MapReduce, Key Value Stores etc) that are being used to deal with the ‘Big Data’ phenomenon.
Many have asked the question – what is new with this trend? This section highlights the traditional use of business analytics in the old ‘pre-Big Data’ world, versus Big Data analytics in the ‘Brave New World’. This will also look at the various use cases that IDC expects to see being used most commonly across a variety of industries. The majority of IT organisations have progressed in terms of their infrastructure architectures over time: from predominantly mainframe-based environments in the 1980s to a focus on client-server in the 1990s and the Web at the turn of the century, to what is now popularly known as ‘private cloud’. This supposed state of ‘nirvana’ constitutes a consolidated, virtualised set of infrastructure resources (server, storage and network) that can be self-provisioned in an automated fashion by business users – complete with SLAs that have the security, performance, availability and cost profiles transparent to all in the form of a service catalog. Very few organisations, if any, have achieved this state of infrastructure ‘nirvana’, and most are still battling with a spaghetti-like tangle of compute resources in their datacenter. And now, we have this external force of Big Data as mentioned earlier that is forcing CIOs to re-architect their infrastructure – particularly in the context of how analytics capabilities are deployed in an enterprise-wide fashion.

Below is an overview of the changes that IDC sees happening in the infrastructure world that are increasingly impacting the Big Data analytics world:
Based on IDC’s research in this space, here are three suggestions for CIOs in dealing with these issues:

... the high-end analytical skills needed to help drive the necessary business impact across multiple functions.
The bottom line here is that it is getting more complicated to process and analyse these large, complex and growing data sets – and it essentially requires a re-assessment of the broader information management strategy for the majority of organisations that have started their business analytics journey. But the impact is potentially enormous. If you look at optimising the price on every item in a global retail chain or detecting fraud in real time – you get a sense of the type of problems that Big Data analytics can be used to solve.

However, despite the clear potential of such analytics – it is important to understand that it will not necessarily be relevant or applicable to every use case. IDC believes that these use cases can be best mapped out across two of the Big Data dimensions – namely velocity and variety as outlined below:
[Figure: Use cases mapped by data velocity and data variety: predictive maintenance in aerospace; social media sentiment analysis; disease analysis on electronic health records; demand forecasting in manufacturing.]
A better sense of the potential impact of deploying Big Data analytics to drive high value impact can be derived by exploring these use cases in more detail:

Real-time Fraud Detection in Banks. Involves the ability to detect, prevent and manage fraud across multiple products, lines of business and channels for a bank. This requires the ability to capture the history for different types of entities (e.g. card, account, customer, terminal ID or IP address) involved in transactions, amplifying accuracy in detecting customer behaviours that fall outside the norm during point-of-sale (POS) transactions. This information can be used by multiple predictive models, for fraud detection and credit risk assessment.

Markdown Optimisation in Retail. The ability for retailers to optimise prices for a wide range of products in real time based on demand forecasting scenarios (that include the impact of promotions, seasonality and important calendar events) has a major impact on margins. These capabilities can also be augmented by social media sentiment analysis to ascertain customer demand for certain products on a more real-time basis.

Disease Analysis on Electronic Health Records. As healthcare services evolve, analysts can get hold of a patient’s entire medical history in electronic format. This will present a major opportunity for Big Data analytics. For example, in the case of a disease such as diabetes, the ability to correlate patient medical history with dietary data (potentially from market basket analysis in retail) and optimised exercise schedules will provide medical practitioners with new insights that they had only previously dreamt of.
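As a rough illustration of the fraud detection use case above, the sketch below keeps a history of transaction amounts per entity and flags an amount that falls far outside that entity's norm. The technique (a simple z-score against past amounts), the threshold of three standard deviations, the minimum history of five transactions and all names such as `EntityHistory` and `card-001` are assumptions made for this example; production fraud models are considerably more sophisticated.

```python
from collections import defaultdict
from statistics import mean, pstdev
from typing import DefaultDict, List


class EntityHistory:
    """Keeps past transaction amounts per entity (card, account, terminal ID, ...)."""

    def __init__(self, threshold: float = 3.0, min_history: int = 5) -> None:
        self.threshold = threshold        # assumed cut-off, in standard deviations
        self.min_history = min_history    # assumed minimum history before scoring
        self.amounts: DefaultDict[str, List[float]] = defaultdict(list)

    def record(self, entity_id: str, amount: float) -> None:
        """Add a completed transaction to the entity's history."""
        self.amounts[entity_id].append(amount)

    def is_outside_norm(self, entity_id: str, amount: float) -> bool:
        """Return True when the amount deviates strongly from past behaviour."""
        past = self.amounts.get(entity_id, [])
        if len(past) < self.min_history:
            return False                  # too little history to judge
        average = mean(past)
        spread = pstdev(past)
        if spread == 0:
            return amount != average
        return abs(amount - average) / spread > self.threshold


history = EntityHistory()
for amount in [20.0, 25.0, 22.0, 19.0, 24.0, 21.0]:
    history.record("card-001", amount)   # hypothetical card identifier

typical = history.is_outside_norm("card-001", 23.0)
suspicious = history.is_outside_norm("card-001", 950.0)
print(typical, suspicious)
```

The same history could be keyed by account, customer, terminal ID or IP address, so that one transaction is scored against several entity profiles at once.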
High-end analytics will require new sets of skills in two key categories:

... the software interacts with the hardware to leverage the data will be required.
... what we don’t know’ – i.e. there is so much unstructured data that the variables and analytical models are likely to be entirely new. This means that there is a need to re-think the way the analytical power users approach their work by creating a ‘Sandbox Mentality’ where discovery is always the starting point. Generally, a background in data mining and statistics would be a good starting point for this type of analysis. Moving forward, there will be increasing demand for ‘data scientists’ – the next-generation business analysts with strong statistical skills who are able to extract information from large data sets and then present value to non-analytical experts – but with the unique skill of understanding the new algorithms and analytical models that will have the most significant business impact in the short term. Globally, IDC is seeing a lot of interest in this more analytically inclined skill set. Roles and responsibilities have not been defined – but it basically fits in with the earlier comments in terms of ‘we don’t know what we don’t know’ – i.e. there is so much unstructured data that the variables and analytical models are likely to be entirely new. It requires a very ‘out-of-the-box’ type of thinking and creativity in terms of the analytics that needs to be done on these new data types and structures.

For example, if you look at the social media phenomenon (contributing to the semi-structured and unstructured data part of Big Data), many marketing departments are looking at ways to do sentiment and brand analysis based on what is being posted on Facebook, Twitter and YouTube (massive amounts as you can expect). This dynamic becomes more complex in Asia with local social media sites like RenRen in China and Nate in Korea. Currently, IT is not the first port of call for the chief marketing officer since it lacks the skills to understand what needs to be done (and in many cases, is still trying to work out what role it should play in the policy or governance of the use of social media). So the make-up of the IT department needs to be re-assessed in terms of technical, business and relationship skills.

The maturity model below highlights how IDC sees these skills (both technical and business) mapping out in the context of the organisations that have adopted business analytics over time – with a view to how this could evolve in the era of Big Data analytics:
Technology & Tools: Simple historical BI reporting and dashboards | Data warehouse implemented, broad usage of BI tools, limited analytical data marts | In-database mining, and limited usage of parallel processing and analytical appliance | Widespread adoption of appliance for multiple workloads. Architecture and governance for emerging technologies

Data Governance: Little or none (Skunk works) | Initial data warehouse model and architecture | Data definitions and models standardised | Clear master data management strategy

% of Customers (IDC Estimates): 20% | 65% | 10% | 5%
In terms of capturing and developing the right skills in the era of Big Data analytics, the creation of a Business Analytics Competency Centre that sits across the business and IT departments will be critical. IDC believes that this type of structure not only clarifies the roles and responsibilities of key stakeholders for this transformation, it also drives internal visibility, provides a mechanism for education as well as bridging the IT/business gap (and the marketing and sales teams in particular – as key individuals from these departments will need to be represented) since improving decision making amongst front-office staff will be the primary focus of these projects.

In conjunction with the skills dimension, IDC believes that this structure should be involved in the following areas:
- Technology identification/deployment
- Business case creation and ROI justification
- Data governance frameworks with clear policies and guidelines around master data management, data quality and data models
- Ensure IT/Business alignment by involving the critical stakeholders at the right time
- Involve the CIO as the supporter of the necessary transformation from an IT perspective that will in turn create the necessary business impact

Very few organisations have reached the level of maturity that can truly harness the potential that Big Data analytics represents – and practically speaking, it is a major challenge to have ticked off all the relevant boxes, but this transformation is a necessary one in order for organisations to truly differentiate themselves in the current economic environment. The CIO (and the IT department) needs to play a critical role in this transformation. The next section highlights some suggestions that IDC believes should be taken into account in the context of this journey.
Conclusion
Despite the varying levels of maturity and adoption of business analytics, businesses are definitely gearing
up for the utilisation of more advanced solutions and offerings in this space. In line with this, organisations
need to plan strategically and build a robust roadmap before adopting business analytics. The new
generation of business managers is more aware of the benefits of competing on business analytics and will
be looking to drive adoption of this technology area more aggressively. Moving forward, IDC believes that a
new approach is required to proactively ‘effect’ the necessary change, with a specific focus on the following
areas:
- Elevating the status of the CIO to that of one with more transformative impact on the organisation by playing an integral role in the deployment of the enterprise analytics strategy – and ensuring that these technologies have the expected business impact
- An assessment of alternative delivery models (such as the appliance, in memory and Hadoop for Big Data)
- Capturing higher-level LOB attention and visibility as the next wave of business analytics projects are integrated with complex event processing (CEP) and business activity monitoring (BAM) technologies to drive a new class of projects that IDC defines as ‘decision management’
The CIO is gradually becoming much more important in the boardroom and is playing a key role in purchase decisions for advanced applications such as business analytics. Moreover, the CIO and the IT department need to leverage a broader set of business analytics capabilities to create a new information management strategy that deals with the emerging Big Data dynamic and delivers improved decision-making capabilities to the business stakeholders across the organisation.
#AP14962U