
Infobright Approximate Query (IAQ): Statistics-driven Analytics for Big Data

Overview

Companies store and use data with the expectation of profitable insight; otherwise, the data is useless. As data volumes and concurrency requirements grow, those same companies turn to massive scale-out architectures to shoulder the burden. In many cases, large clusters of Hadoop, Teradata, and Vertica (among others) can process this information; however, maintaining and extending these state-of-the-art technologies carries significant cost and complexity. Over time, these clusters become oversaturated, and companies are ultimately forced to re-evaluate their data lake architecture and impose stricter constraints on data retention, user access, and query complexity.

This whitepaper discusses how Infobright Approximate Query (IAQ) solves this problem by complementing
existing data lakes to deliver insight with efficiency and performance while providing direct access to your
data.

Changing the Paradigm of Data Analysis

Using strong mathematical foundations, IAQ changes the paradigm around the fundamental question about data: how do we gain insight and make decisions at optimal scale? While the data industry has focused its energy on brute-force methodologies and on throwing hardware at massive data sets, these approaches extract exact answers only at a steep cost in time and resources. More often than not, exact answers are not a requirement for timely business decisions. Motivated to find a new approach, Infobright’s team of PhD data scientists set aside standard database design theory and applied core mathematical concepts to redefine complex query algorithms. IAQ utilizes patented statistical models that reduce the overall size of the resulting data volumes. This reduction lowers infrastructure cost while improving query time. Users gain rapid access and insight into multi-petabyte environments in a fraction of the time and at a fraction of the cost of existing options.

Rather than evaluating each data point at query execution, IAQ intelligently captures knowledge from each data point during data ingestion. This knowledge then serves as the foundation for any subsequent query. Because the captured knowledge is much smaller than the full data set, queries return quickly with accuracy comparable to that of traditional approaches. Even when comparing different data sets or tables, IAQ understands the data and can return result sets similar to those of traditional enterprise technologies. With the ability to compare two disparate data sets, IAQ outperforms any sampling technique.

Chart A: IAQ generates highly accurate insight that aligns with traditional approaches, while utilizing 95-98% less hardware and executing 100-1000x faster.
Chart B: IAQ identifies the same top ten IMSIs with flagged events, in the same order and with similar magnitudes.

IAQ consumes data directly from flat files or from ingest agents. These ingest agents pull data from specific data storage technologies, evaluate the data, and create knowledge. With this method of knowledge creation, IAQ uses only 2-5% of the disk space of the original data set. Given the compactness and depth of this knowledge, IAQ returns query results 100-1000x faster than traditional technologies on comparable hardware. When coupled with a large cluster environment, IAQ handles the vast majority of trending and investigative queries, freeing the existing infrastructure to resolve exact queries with greater resource availability.
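
To make the ingest-agent flow concrete, the following is a minimal, hypothetical Python sketch (not the IAQ implementation): it reads a flat file in fixed-size row batches, reduces each batch to a small summary, and persists only the summaries. The batch size, file layout, and the particular statistics kept here are illustrative assumptions.

```python
# Illustrative sketch of an ingest-agent flow (not the IAQ implementation):
# read a flat file in row batches, reduce each batch to a small summary,
# and persist only the summaries, which occupy a small fraction of the raw size.
import csv
import json
import os

BATCH_ROWS = 65_536  # hypothetical batch size; the real product may differ


def summarize_batch(values):
    """Reduce one column batch to a handful of statistics (assumes a numeric column)."""
    nums = [float(v) for v in values if v not in ("", None)]
    return {
        "count": len(values),
        "nulls": len(values) - len(nums),
        "min": min(nums) if nums else None,
        "max": max(nums) if nums else None,
        "sum": sum(nums) if nums else 0.0,
    }


def ingest(flat_file, column, knowledge_file):
    batch, summaries = [], []
    with open(flat_file, newline="") as f:
        for row in csv.DictReader(f):
            batch.append(row[column])
            if len(batch) == BATCH_ROWS:
                summaries.append(summarize_batch(batch))
                batch = []
    if batch:
        summaries.append(summarize_batch(batch))
    with open(knowledge_file, "w") as out:
        json.dump(summaries, out)
    # The summary layer is typically orders of magnitude smaller than the raw file.
    print(os.path.getsize(knowledge_file), "bytes of knowledge for",
          os.path.getsize(flat_file), "bytes of raw data")
```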

IAQ also natively supports massively parallel ingest agents for Hadoop. Additional capabilities will be added.

Exploiting the Power of Statistics

Similar to columnar technologies, IAQ splits raw data into vertical segments. After a certain number of rows, IAQ cuts these vertical segments into “data packs”. IAQ evaluates these packs and creates statistics. Using comprehensive, intelligent pattern-recognition algorithms, IAQ determines the ‘best fit’ statistical model for each pack. Depending on spread, outliers, frequency, and other factors, IAQ creates single- and multi-column histograms that represent the originating data set. As needed, IAQ stores outliers and other information to provide the most accurate representation of the data. Combined, these packs form an enhanced statistics layer that models the original data set.
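
As a minimal sketch of what such a pack-level statistics layer could look like, the code below keeps an equi-width histogram per pack with outliers stored separately. The actual patented IAQ models are more sophisticated; the pack size, bucket count, and outlier cutoff used here are illustrative assumptions.

```python
# Minimal sketch of pack-level statistics (assumed structure, not the patented IAQ models):
# each pack keeps an equi-width histogram plus explicitly stored outliers whose values
# would otherwise distort the buckets.
from dataclasses import dataclass, field
from typing import List

PACK_ROWS = 65_536   # assumed pack size
BUCKETS = 32         # assumed histogram resolution
OUTLIER_Z = 4.0      # assumed cutoff for treating a value as an outlier


@dataclass
class PackStats:
    low: float
    high: float
    counts: List[int] = field(default_factory=lambda: [0] * BUCKETS)
    outliers: List[float] = field(default_factory=list)


def build_pack(values: List[float]) -> PackStats:
    """Pick a simple 'best fit' summary: histogram over the core values, outliers kept exactly."""
    mean = sum(values) / len(values)
    var = sum((v - mean) ** 2 for v in values) / len(values)
    std = var ** 0.5 or 1.0
    core = [v for v in values if abs(v - mean) <= OUTLIER_Z * std]
    outliers = [v for v in values if abs(v - mean) > OUTLIER_Z * std]
    low, high = min(core), max(core)
    stats = PackStats(low=low, high=high, outliers=outliers)
    width = (high - low) / BUCKETS or 1.0
    for v in core:
        idx = min(int((v - low) / width), BUCKETS - 1)
        stats.counts[idx] += 1
    return stats


def build_column(values: List[float]) -> List[PackStats]:
    """Cut one vertical segment into packs and summarize each pack."""
    return [build_pack(values[i:i + PACK_ROWS])
            for i in range(0, len(values), PACK_ROWS)]
```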

When a user issues a query, the engine optimizer evaluates the enhanced statistics layer and applies probability rules to generate a result set. As the query moves from phase to phase, IAQ transforms these statistics into each corresponding intermediate result. At the conclusion of the query, the final set of histograms is used to generate the result set. This process uniquely leverages the histograms to decide how a result will be generated. As query complexity increases, IAQ continues to deliver highly accurate results. The efficiency of the IAQ engine is targeted to exceed that of a 15% sample, so overall accuracy is much higher than with traditional sampling techniques.
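
As an illustration of the general idea (deliberately simpler than IAQ’s probability rules), the sketch below estimates a COUNT over a range predicate directly from the pack histograms built in the previous sketch: fully covered buckets contribute all of their rows, partially covered buckets contribute a proportional share, and stored outliers are checked exactly.

```python
# Sketch of answering a range predicate approximately from pack histograms
# (an assumed approach, not IAQ's actual probability rules).
def estimate_count(packs, lo, hi):
    """Approximate SELECT COUNT(*) WHERE lo <= col <= hi over a list of PackStats."""
    total = 0.0
    for p in packs:
        width = (p.high - p.low) / len(p.counts) or 1.0
        for i, c in enumerate(p.counts):
            b_lo, b_hi = p.low + i * width, p.low + (i + 1) * width
            overlap = max(0.0, min(hi, b_hi) - max(lo, b_lo))
            total += c * (overlap / width)                         # proportional share of the bucket
        total += sum(1 for v in p.outliers if lo <= v <= hi)       # outliers are kept exactly
    return round(total)


# Usage, assuming build_column() from the previous sketch:
# packs = build_column(raw_values)
# print(estimate_count(packs, 10.0, 99.5))
```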
IAQ in Action
Using IAQ

IAQ was purpose-built to co-exist with massive scale-out architectures as well as to replace traditional architectures for data profiling and trend analysis. Use cases range from customer experience profiling for telecommunications providers, airlines, and pharmaceutical companies, to network intrusion detection for security analysis and impression-bid analysis in the adtech industry.

Security & Network Intrusion Detection

Given the sizable volumes of data in each investigation, each query can be expected to execute for multiple hours. As networks engage in interactive data mining sessions looking for correlations, IAQ can drop overall time to resolution to minutes, allowing networks to quickly remediate bad actors looking to harm the stability of the network.

Impression Bidding in AdTech

The sheer volume of data on impression bids can grow dramatically over a span of hours. When attempting to merge bid data with client data to generate a target profile, much of the detailed auction data is discarded in order to achieve fast (enough) results. More powerful than sampling, IAQ will join massive data sets to generate result sets at scale.

The IAQ Advantage

IAQ is purpose-built to provide rapid insight with very little infrastructure for organizations with multiple terabytes to petabytes of data. By complementing Hadoop, Vertica, Teradata, or another scale-out architecture, IAQ lifts considerable burden off existing clusters and delivers high-value answers for complex queries. IAQ can perform 100 to 1000 times faster than exact queries generated by costly solutions.

To discuss your use case and learn more about IAQ, contact us to speak with our Specialists and join our beta program.

About Infobright Inc.

Infobright delivers a high-performance analytic database platform that serves as key underlying infrastructure for the Internet of Things. Specifically focused on enabling the rapid analysis of machine-generated data, Infobright powers applications to perform interactive, complex queries, resulting in better, faster business decisions that enable companies to decrease costs, increase revenue, and improve market share. Infobright’s platform is used by market-leading companies such as Dell, Mitel, Exfo, Bango, Viavi, and Polystar. For more information on Infobright’s customers and solutions, please visit www.infobright.com and follow us on Twitter @Infobright.

© 2016 Infobright Inc. All Rights Reserved.
