
DATA INNOVATIONS

MARCH 2016

SHOWCASE

Agile Self-Service Accelerates Decision Making
A new class of self-service tool optimized for the
caching capabilities of modern microprocessors has
in-chip technology that can radically reduce the
need for upfront data preparation and modeling.
Such accelerated analytics can help drive faster
decision making and bottom-line impact.
Traditional takes on self-service business intelligence (BI) do little to address the single biggest problem with BI and analytics: data engineering. By far, the bulk of the self-service analyst's or data scientist's time is consumed with engineering data, i.e., identifying, profiling, and preparing data for analysis. Because most self-service tools do not provide built-in tools or facilities to assist with data prep, analysts and data scientists must undertake this on their own. More commonly, someone (nominally, an IT someone) must prepare and provision new data sources or feeds for them and then model that data before it can be analyzed.
Until recently, self-service products did little to address this. Putting unprecedented do-it-yourself capabilities into the hands of analysts and data scientists, and exposing these capabilities in the context of a highly intuitive visual discovery user experience (UX), was an accomplishment decades in the making. Tackling the first and last mile of analytics development, data prep, could come later, if ever. Call it a case of kicking the proverbial can down the road.
Enter a new class of self-service tool that is optimized for the caching capabilities of modern microprocessors, i.e., the Level 1 (L1) through Level 3 (L3) memory caches integrated into the silicon of commodity microprocessors.
Proponents claim this in-chip technology radically reduces the need for upfront data preparation and modeling. "Because of the speed we're able to get from in-chip analytics, we can minimize the back and forth between the microprocessor and the RAM," says Jeremy Sokolic, vice president of product with self-service analytics specialist Sisense Inc. The idea is to persist small sets of data in the CPU cache and to reuse those small sets across multiple queries.
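Application code cannot address the L1-L3 caches directly, of course; that locality is a hardware effect. The reuse pattern Sokolic describes can, however, be sketched at the software level. The table, column names, and caching helper below are illustrative assumptions, not Sisense's implementation:

```python
# Illustrative sketch only: real in-chip analytics keeps hot data in the
# CPU's L1-L3 caches, which application code cannot address directly.
# This models the *reuse* idea in software: pull one small column out of
# a table once, then serve multiple queries from that same cached set.
from functools import lru_cache

ORDERS = [  # toy row-oriented source table (hypothetical data)
    {"region": "east", "amount": 120.0},
    {"region": "west", "amount": 80.0},
    {"region": "east", "amount": 200.0},
]

@lru_cache(maxsize=None)
def column(name):
    """Extract one small column and keep it cached for later queries."""
    return tuple(row[name] for row in ORDERS)

def total_amount():
    return sum(column("amount"))      # first call populates the cache

def average_amount():
    amounts = column("amount")        # second query reuses the cached set
    return sum(amounts) / len(amounts)
```

The point of the sketch is that the second query never touches the row-oriented source again; it is served entirely from the small, already-extracted set.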
Sokolic and Sisense argue that most of the work of
preparing data for analysis involves optimizing data
models for performance and data sets for consistency.


In the Sisense model, he says, analysts and data scientists use wizard-based tools to quickly prepare their data without additional modeling. Because of the speed of Sisense's in-chip, cache-optimized architecture, it's possible to perform analytics on the resulting data sets up to 10 times faster. The in-chip design is fast enough to make up for insufficiencies of the quick-and-dirty data modeling.
In other words, Sokolic argues, Sisense is an end-to-end self-service platform. It combines do-it-yourself self-service analysis with a functional data prep automation feature set. The IT umbilicus truly goes away. "Because of the speed of processing we can get, we don't care how messy your data model is. Typically, if you have many data sources, it requires a DBA to come in and figure out the different data sources and figure out how you can pre-join all of those tables together to make processing fast enough. In a traditional configuration, if I want to do a join across five or six tables, it actually takes a lot of processing time," Sokolic says. With Sisense, if you have to link across tables, for example, if you have logical relationships between, say, five different tables, the in-chip processing works so fast that the extensive data prep and modeling can be eliminated.
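Sokolic's scenario, resolving logical relationships between tables at query time instead of asking a DBA to pre-join them, can be illustrated with a minimal sketch. The customer and order tables below are hypothetical, not drawn from Sisense:

```python
# Hypothetical data, not Sisense's engine: a query-time hash join across
# logically related tables, in place of a DBA-built pre-joined table.
customers = {1: "Acme", 2: "Globex"}                   # id -> name
orders    = [(101, 1, "widget"), (102, 2, "gadget"),   # (order_id, cust_id, sku)
             (103, 1, "gear")]

def join_orders_to_customers():
    """Resolve each order's customer name at query time via a hash lookup."""
    return [(oid, customers[cid], sku) for oid, cid, sku in orders]
```

With fast enough processing, the claim goes, this lookup can happen on every query, so no pre-joined table ever needs to be built or maintained.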

The Business-Analytics Transformation


Business analytics matters now more than ever. A company that creates a data-driven decision-making culture has an essential advantage relative to its competitors. On the one hand, data-driven decision making is all about analytics breadth, that is, putting analytics insights at the fingertips of as many potential decision-makers as possible. On the other hand, it's about the capacity to iterate quickly, easily, and sanely.
The faster the iterative cycle (the development, testing, and, in most cases, failing of analytical prototypes), the faster insights can be identified and refined. You fail fast in order to find what works. The insights you discover can be used to drive down capital expenditures and maximize marketing spend, as well as to permit the emergence of a performance-based culture tied to meaningful KPIs that evolve as business conditions change.
In a data-driven culture, decision making just becomes much more fact-based. "Measuring people helps get them focused. There's huge value in KPIs. The challenge is giving people a real-time way to measure, analyze, and view them. People tend to respond very well when they have a metric that's transparent and visible. Analytics helps you to do that across all your business, especially when you have a tool like Sisense, where you can mash up multiple disparate data sets," says Sokolic.
The Holy Grail of decision support, BI, and analytics is the capacity to ask, and to discover answers to, new or random questions. For the most part, traditional BI tries to anticipate the most common questions business people might ask. Ad hoc query is one tool for asking not-so-common (or emergent) questions, but it usually entails significant IT-led data prep and modeling work, along with the pre-computation of multidimensional cubes at different levels of granularity. Specialized BI technologies, such as ROLAP, are able to work around this, but these technologies are no less IT-dependent.
Self-service data visualization products purport to
address these shortcomings, but these tools tend to
work best with flat sources (such as spreadsheets) or
when extracting, blending, and/or integrating data
from a limited number of relational sources. They
flounder and founder when used in ways that mimic
the traditional data warehouse. (Trying to join data
from a half-dozen OLTP sources will break almost all
self-service data visualization tools.)
Self-service vendors are working on this. Some are
augmenting the basic data blending capabilities that
theyve long provided with automated data import and
export facilities, designed primarily for spreadsheets.
Some offer wizard-driven cross-data-source join
capabilities, which permit data to be integrated (in
accordance with a predefined structure) from multiple
data sources. Others have invested in and continue to
enhance data engineering tools that theyve either built
themselves or acquired from best-of-breed players.
Nearly all partner with best-of-breed data integration
and data preparation vendors, too.
None of these approaches is ideal, however. The
traditional, MOLAP-driven BI model sacrifices
granularity and business agility; if and when more
detail is needed, the line of business must go back to IT.
The ROLAP-driven BI model is typically less performant
than MOLAP and also depends on IT for critical
data integration (DI) and data source provisioning
capabilities. The self-service BI front-end model cannot scale to support complex data and complex analyses. The self-service data prep model, unlike the IT-led DI paradigm, isn't designed for reuse and operationalization.
Business users typically want to bring together multiple data sets in almost any discipline. "They're always thinking: What's my 360-degree view of the customer? How are customers using my product? How did customers buy my product? What support network do they have? All of this data tends to sit in disparate systems, so in order to get a 360-degree view of the customer, you have to be able to bring it all together in the same place at the right time," Sokolic says.

Timeliness Is Next to Godliness


At the right time. That's the trick. A data-driven, decision-making culture is inescapably time-dependent. People need relevant data when they need it, i.e., in the context of the call center interaction with a dissatisfied customer, or when they need to know which of several potentially contraindicated medicines to prescribe for an ICU patient. Delay of any kind can be disastrous.
"As soon as you get beyond the simple one to two data sources, or one or two tables, as soon as you start to see larger data sets, with most tools you really need IT support. In order to make that data analyzable, the IT organization needs to remodel it so that the queries can be executed with sufficiently acceptable performance. Another alternative is that they can take an extract or subset of the data. You trade the capacity to have granularity for performance. In order to do that, however, you have to, as a business user, be able to predefine the questions you're going to ask," he says.
"The analyst doesn't know where they're going to go. They don't know where the data is going to take them. In many tools, because of the way the data extract is created, the tool is kind of bounded. You have to then go back to your IT organization and say, 'Can you change the data? Can you get me into the queue?' There are other sets of tools that do try to give you all of the data, but in order to do that, they put you through a very rigorous remodeling effort, where, at the outset, you have to spend a lot of time setting the data up, modeling the data, etc., when you have a new IT project."

The upshot, Sokolic argues, is that traditional BI and analytics technologies are inescapably slow and reactive. They introduce too much latency to be used as a basis for data-driven decision making. To a real degree, this is because they depend in critical ways on IT for support and provisioning. The traditional MOLAP or ROLAP approach does so explicitly; the self-service approach, implicitly. (Self-service requires IT intervention once it reaches a sufficient degree of development or complexity.)
Sokolic says Sisense's end-to-end BI and analytics stack effectively breaks this dependence on IT. It marries a richly visual UX with wizard-driven data prep and data integration capabilities.
"There are two primary ways to eliminate data prep: one, don't force the use of extracts; two, don't force the creation of star schemas or of other dimensional models. Those two things account for 90+ percent of the work in analytics. That's why we've built ETL into our tool. As data comes in, you can use Excel-like formulas if you want to change a data format, for example, by turning hours into days or turning weeks into months. You can do all of that by means of formulas," he indicates.
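A rough sketch of what such ingest-time formula transforms might look like follows. The column names and the weeks-to-months factor are assumptions, and this is not Sisense's actual formula syntax:

```python
# A sketch of Excel-like formula transforms applied as data is loaded.
# The columns and the weeks->months divisor (~4.345 weeks per month)
# are illustrative assumptions, not Sisense's formula language.
rows = [{"task": "etl", "hours": 48, "weeks": 13},
        {"task": "qa",  "hours": 12, "weeks": 26}]

FORMULAS = {                      # derived column -> formula over a row
    "days":   lambda r: r["hours"] / 24,
    "months": lambda r: round(r["weeks"] / 4.345, 1),
}

def transform(rows):
    """Apply each formula to every incoming row, adding derived columns."""
    return [dict(r, **{col: f(r) for col, f in FORMULAS.items()})
            for r in rows]
```

The transform runs inline as rows arrive, which is the point of the quote: format changes happen via formulas at load time, not via a separate modeling project.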
In the same way, Sokolic claims, Sisense's in-chip, cache-optimized architecture permits it to preserve detail, i.e., granularity, without sacrificing performance or agility.
"We don't aggregate the data. Our processing works fast enough that we can give you the detail data even when you're joining information from multiple tables pulled from disparate sources," he asserts. "Our ability to process complex data and take advantage of the in-chip speed of commodity hardware is a real differentiator in terms of keeping total cost of ownership low and helping people solve the business problems they want to solve. In the conventional BI model, as your data gets more complex, your data prep process gets longer and much more complicated.
"If you're trying to take four or five or six or more tables from two or three different data sources, all of a sudden you need a DBA to do that. You need that expertise."


Growing Data Complexity, In-Chip Processing, and the Challenge of Tomorrow
According to a recent IDC survey, data storage volumes are expected to roughly double every two years between 2013 and 2020. That works out to a compound annual growth rate (CAGR) of 40 percent. In some vertical markets, such as healthcare, IDC says that data volumes are growing at CAGRs approaching 50 percent. That's almost mind-boggling growth.
Now, imagine a data lake of, say, 12 PB. In seven years' time, at a 40 percent CAGR, that same lake will expand to 126.5 PB. Of course, not all of the data in that lake will be of interest to potential analysts; by today's standards, just a fraction of it. Here's the thing, however: that fraction is getting bigger, too. As data volumes grow, so too does the proportion of data that could potentially be useful to business analysts, statisticians, and data scientists. In 2013, that proportion came to approximately 22 percent, according to IDC; by 2020, it's expected to exceed 37 percent. For comparison's sake, 37 percent of 126.5 PB is 46.8 PB. That's a mind-boggling volume of data.
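The arithmetic behind these figures can be checked directly:

```python
# Reproduce the article's growth arithmetic: a 12 PB lake compounding
# at 40 percent per year for seven years, with 37 percent of the result
# counted as analytically useful.
def grow(start_pb, cagr, years):
    """Size after compounding `start_pb` at annual rate `cagr` for `years`."""
    return start_pb * (1 + cagr) ** years

lake_2020 = grow(12, 0.40, 7)   # 12 * 1.4**7, roughly 126.5 PB
useful    = 0.37 * lake_2020    # roughly 46.8 PB of useful data
```

Both results round to the figures quoted in the text: 126.5 PB for the lake and 46.8 PB for its useful fraction.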
It's even more complicated, however. The data lake isn't a single, homogeneous data management environment but actually consists of multiple fit-for-purpose repositories. In other words, a data lake could host one or more relational database-like engines (Hive, Impala, and Presto, to name a few), one or more key-value or columnar data storage facilities (HBase and Parquet, for example), and one or more file system layers for persisting binaries, objects, and other kinds of polystructured data. This compounds the complexity of accessing, extracting, and preparing data for analysis.
The data-lake example is just a thought experiment. It's a way of describing the kinds of problems that self-service analysts are facing and will continue to face as data volumes grow in size and complexity. "I'm not sure these projections fully account for the Internet of Things. I saw a statistic that by 2020, two-thirds of all connected devices will be things, and the interesting thing about things is that they can spit data out at a much, much faster rate than can human beings. Humans are limited in terms of how much data they can create, but machines are not," Sokolic argues.

The key thing for an analytics tool is that it helps people to swim, not to drown, in all of this data. "A good tool makes testing and prototyping much easier. It should eliminate that part of the problem," he continues. "Data visualization can't help you with that. It can help you with the formulation and testing of hypotheses, but it does little to accelerate data prep and modeling. With Sisense, this is exactly what we do. We have the rich visualization capabilities, but we also accelerate data prep, and we don't do so by creating extracts or views of a subset of the data."

Conclusion
BI projects fail because BI uptake and use fails. The
single best indicator of BI success is a thriving ecosystem
of BI consumers. Okay, you ask, but why do so many BI
projects fail? Why is the history of BI littered with so
many prominent implementation failures?
Broadly speaking, people dont use BI technologies
for two reasons: first, because the tools are hard or
counter-intuitive; second, because theyre plagued by
performance problems.
Data visualization and self-service help to address the first of these issues. They do nothing, however, to address the second. In practice, few self-service data visualization technologies meaningfully exploit hardware-specific optimizations to accelerate performance.
This includes most so-called in-memory analytics technologies. The value of in-memory as an accelerant is determined by (a) the total amount of physical RAM in a system, (b) the input-output (I/O) throughput of the bus that connects the system CPUs to physical RAM, and (c) the extent to which the analytics software is able to exploit any of several hardware-specific features, from in-chip parallelism and vector processing to the L1, L2, and L3 memory caches that are embedded into the microprocessor itself. It's possible to run an analytics tool or even a database entirely in system memory and not realize any meaningful performance benefits. Simply put, the software itself must be architected to fully exploit the capabilities of the underlying hardware.
Unlike almost all self-service data visualization-based technologies, Sisense constitutes a complete, single-stack offering, he argues. It supports analytics query and visual data discovery, automated in-tool data preparation, and finally, advanced (also in-tool) ETL transformations. It's an all-in-one product.
This is the reason Sisense can be quickly deployed,
Sokolic maintains.
"Another key advantage is time-to-market. In the course of a 90-minute proof-of-concept, we can take a customer from nothing to a working production dashboard. If they have the appropriate access credentials, we can create a data store, do some visualizations on top of it, and show working dashboards and other kinds of analytics," he says.
In some proofs-of-concept, customer prospects are even able to mash together data from disparate data sources. This, too, is a function of Sisense's cache-optimized architecture, Sokolic maintains.
"Being able to process data without the complexity of modeling all of it beforehand is critical. We don't create extracts or views, so we don't limit what elements you can combine together in a visualization. With Sisense, you can pull in all of your data and access any data attribute in any visualization. As you're doing that kind of ad hoc analysis and thinking, 'Let me try that; maybe I need to slice the data by person, or by region, or by product,' you never have to go back to the IT person and say, 'I need to remodel this' or 'I need you to give me a different extract,'" he explains.
"This creates real agility for the analyst because you're just free to roam about a data set at any depth of detail you want. The other part of agility is that as your business changes, your data changes, too. You get new data sources. Older data sources go away," he notes. "We have a better way. Access that new data source with your credentials, drop it into the messy data model (it doesn't matter how messy it is), link it up logically, and you're off and running. There's the agility of being able to answer granular questions and there's the agility of adding new data sources."


sisense.com

tdwi.org

Sisense simplifies business analytics for complex data. Powered by its unique In-Chip and Single Stack technologies, Sisense delivers unmatched performance, agility, and value, eliminating much of the costly data preparation traditionally needed with business analytics tools and providing a single, complete tool to analyze and visualize large, disparate data sets without IT resources. With more than one thousand customers in over 50 countries, including global brands like Target and Samsung, Sisense was recently designated a hot company to watch by CIO, CRN, and Information Management and recognized as one of the 10 Most Innovative IT Ventures at Under the Radar. Its solution won the Audience Choice award at the O'Reilly Strata conference and its CTO won the World Technology Award for the invention of In-Chip analytics.

TDWI is your source for in-depth education and research on all things data. For 20 years, TDWI has been helping data professionals get smarter so the companies they work for can innovate and grow faster.

Try Sisense for free


Product page

TDWI provides individuals and teams with a comprehensive portfolio of business and technical education and research to acquire the knowledge and skills they need, when and where they need them. The in-depth, best-practices-based information TDWI offers can be quickly applied to develop world-class talent across your organization's business and IT functions to enhance analytical, data-driven decision making and performance.
TDWI advances the art and science of realizing business
value from data by providing an objective forum where
industry experts, solution providers, and practitioners
can explore and enhance data competencies, practices,
and technologies.
TDWI offers five major conferences, topical seminars,
onsite education, a worldwide membership program,
business intelligence certification, live Webinars,
resourceful publications, industry news, an in-depth
research program, and a comprehensive website:
tdwi.org.

© 2016 by TDWI, a division of 1105 Media, Inc. All rights reserved.


Reproductions in whole or in part are prohibited except by written permission.
E-mail requests or feedback to info@tdwi.org.
Product and company names mentioned herein may be trademarks and/or
registered trademarks of their respective companies.

