
BROUGHT TO YOU IN PARTNERSHIP WITH

Working with Time Series Data

CONTENTS

• What is time series data?
• Time series data: use cases across industries
• Open source is key
• Purpose-built databases are better
• Time series data full stack in a nutshell
• Hands-on learning

WRITTEN BY DANIELLA PONTES
PRODUCT MARKETING TEAM AT INFLUXDATA

Working with Time Series Data


According to DB-Engines, since 2016, time series has been the
fastest-growing database category. This popularity is fueled by
the "sensorification" of the physical world (i.e., IoT) and the rapidly
increasing instrumentation requirements of the next generation
of software.

As we enter the era of workflow automation, machine learning, and
artificial intelligence, it is time for time series data.

What is Time Series Data?

A common question that comes to mind when the subject is time
series data is: "What actually is it?" The more you think about it, the
more you see it everywhere. Time is a constituent and inexorable part
of everything that is observable. If it does not have a timestamp, it has
no place in our universe. If time is always there as a dimension, time
series means to treat every piece of recorded data for what it is:
unique. Therefore, it should not be replaced, but accumulated,
appended as the next chapter in the "secret" time series life of the data.

Think of weather records showing that the Earth is getting warmer
and ice caps are melting beyond historic limits. Or consider
economic indicators showing how well a current administration
is doing its job, such as changes in the GDP, income and wages,
unemployment rate, inflation, etc. Other common examples include
records of the evolving status of a patient under treatment, or tweets
per hour for a specific hashtag, e.g. #royalbaby. The reality is that
time series data is everywhere.
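The tweets-per-hour example above can be made concrete: bucketing raw event timestamps into fixed windows is exactly how a stream of events becomes a time series. A minimal Python sketch (the timestamps are invented for illustration):

```python
from collections import Counter
from datetime import datetime

# Hypothetical tweet timestamps for one hashtag (ISO 8601 strings).
tweets = [
    "2019-05-06T10:05:00", "2019-05-06T10:42:00",
    "2019-05-06T11:01:00", "2019-05-06T11:15:00",
    "2019-05-06T11:59:00", "2019-05-06T12:30:00",
]

# Truncate each timestamp to the hour to form the time axis, then
# count events per bucket: an event stream becomes a time series.
per_hour = Counter(
    datetime.fromisoformat(t).replace(minute=0, second=0) for t in tweets
)

for hour, count in sorted(per_hour.items()):
    print(hour.isoformat(), count)
```

Graphing the sorted buckets gives the tweets-per-hour series described above, with the timestamp as the x-axis.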

• Time series is everywhere and is an integral part of a modern and
sensorized environment.
• Open source to leverage collaboration and keep freedom of choice,
and purpose-built for the required leap in speed and efficiency.

For more information, visit: influxdata.com

What classifies the data around you as time series depends on what it
represents: a collection of measurements of the same thing over time,
where you will find the time axis as you graph it. Bringing it to the
tech world, time series are measurements that are tracked, monitored,
downsampled, and aggregated over time. This could be server metrics,
application performance monitoring, network data, sensor data,
events, clicks, trades in a market, and many other types of analytics
data.

Time series:

[Figure: Time Series Example: Stock Ticker Prices Over Time]

Also time series:

[Figure]

So, what do time series give you besides a gigantic collection of data
consuming your resources? If you can handle time series data properly,
you can have application environments with close to zero downtime,
more and faster asset and resource consumption predictions using
machine learning, workflow automation integrated with business KPIs,
and, ultimately, a better customer experience leading to happier users.

Time Series Data: Use Cases Across Industries
Time series data personas are those responsible for maintaining a
sustainable, functional, and evolving environment without operational
downtime, inefficiencies, and performance issues, while providing
business units with the means to monitor their own performance in
order to stay on track with their goals. By doing so, time series
personas provide the pillars for growth and success.

Time series projects usually start at DevOps and NetOps. They are the
ones that must keep up with the speed of growth and complexity of
application environments, while coping with increasing intolerance to
downtime or low performance. Monitoring production and
pre-production as part of the application development lifecycle, as
well as monitoring business performance indicators, is the way to
detect trends early, anticipate resource requirements, and provide the
means to resolve issues in a timely manner.

Here are some case studies that illustrate specific scenarios where
operational and business challenges can be addressed with the
adoption of a time series platform solution.

SaaS companies looking to ensure customer success find in time
series a good strategy for achieving the ultimate goal of 100% uptime.
They invest in DevOps monitoring solutions that provide visibility into
performance, detecting early trends that could negatively impact the
service. Monitoring is offered to the whole organization as a tool to
assist teams (not just operations) in collecting metrics that are
relevant to them. Planning for scale, which includes not only being
able to collect high volumes of data but also automating the data life
cycle while avoiding lock-in to disparate systems, is fundamental for a
comprehensive approach.

E-commerce companies also rely on time series to keep the user
experience at its best. Ops monitoring teams work around the clock,
24x7, assuring the health of IT infrastructure and storefront
applications. They integrate application monitoring into the
application development process, so applications come to production
under the seal of "real user monitoring" (RUM) approval. For
e-commerce companies with multiple data centers, handling volume
and having HA becomes essential.

Companies that provide mobile services via smartphone applications
are also benefiting from time series monitoring. Many of those
companies have convenience and agility as core values. And although
the service sometimes relies on a single smartphone app, the
infrastructure and backend part of the application can be massive,
with many components and microservices. Because being fast, always
ready, and within reach are all part of mobile services, these
companies must detect problems before their users experience them.
Once a user tries another service that works, that other service
becomes the preferred choice. So, it is paramount to monitor metrics
from systems, applications, and KPIs at high sampling rates to catch
anything that goes wrong practically as it happens. Certainly, the
volume of data collected at rates of millions of reads per second will
be high, but as data gets cold, it can be aggregated for trending
analysis. Therefore, beyond high volume and HA, the data life cycle is
also fundamental for companies using time series monitoring to stay
ahead of the competition in the mobile applications market.

Cloud service providers also turn to multi-tenant, metric-driven
visibility to provide their customers with ways to monitor their
application environments, regardless of whether they run on
Kubernetes clusters, containers, virtual machines, bare metal, or a
combination of these options, which is very probable.


Regardless of the diversity of environments, everything needs to be
ready and available from one data source. Collecting all metrics in one
time series platform that can support multi-tenancy and HA is a
necessary foundation for scalable cloud hosting services.

In the time of time series, professionals are gaining insights into how
change happens over time and ways to predict its course in the future
while monitoring the present. But in order to leverage the benefits of
time series to their full extent, it is necessary to be able to collect,
store, and analyze data in real time and at scale.

Open Source is Key
Open source means, among other things, free to use, but, most
importantly, it means that ideas and information are shared openly
and the community is encouraged to collaborate transparently. Every
breakthrough is celebrated by all and takes everyone one step ahead,
driving innovation at a much faster pace. Because of the many eyes
and brains continuously testing and applying it in different use cases,
open source is more reliable and secure.

In complex environments, open source provides the necessary
freedom to adapt. With more shared code, APIs, plugins, and custom
scripts, the community helps you implement solutions that fit with
your legacy applications and future choices, giving you more
bandwidth to concentrate on delivering the business logic for your
specific use cases.

The power of the open source community to drive innovation is
unsurpassed by any proprietary software solution because it is not
about the license; it is about collaboration, which makes the whole
greater than the sum of its parts, and transparency, where nothing is
hidden, giving you a chance to make educated decisions. Open source
also keeps your options open by avoiding vendor lock-in.

Purpose-Built Databases are Better
Time series data platforms pose intrinsic challenges in scalability, high
availability, and usability when professionals try to use legacy
database and query engine models. Addressing these challenges
properly led to the development of time series databases. Those
embarking on time series data projects face a major decision: try to
adapt an existing relational database to manage time series data, or
adopt a time series database with purpose-built storage and query
engines?

Purpose-built time series database engines reach benchmarks on the
order of hundreds of millions of time series, millions of writes per
second, and over thousands of queries per second. Time series
databases have a few properties that make them very different from
other data stores: automated data lifecycle management, data
summarization, continuous queries, and large range scans of many
records. There is still, of course, the obvious differentiation of having
a time dimension present in everything done with the data: collecting,
storing, replicating, querying, transforming, and evicting.

Although time must be a central point in the overall platform
architecture design of the database, just being able to query on time
doesn't cover all the requirements of an effective and efficient
solution. To achieve the envisioned scalability and usability paradigm
of a very large and granular data set, it is necessary to devise a
combined strategy in data model and storage engine design. Add to
that a query language that eases selection of series, continuous
queries, and transformation of queried data, and you have a complete
time series platform, such as InfluxData's TICK Stack.

Time Series Data Full Stack in a Nutshell
InfluxData is an open source, purpose-built, full-stack platform for
managing time series data. We'll begin by explaining each component
that comprises the platform before moving into examples of how it
can be used with time series data.

The functional architecture consists of four components — Telegraf,
InfluxDB, Chronograf, and Kapacitor (the TICK Stack). These
components are built as an end-to-end solution for time series data
use cases. Telegraf is built for data collection; InfluxDB is the database
and storage tier; Chronograf is for visualization; and Kapacitor is the
rules engine for processing, monitoring, and alerting. InfluxDB is a key
component in the time series data platform architecture because it is
designed to take the peculiar characteristics of time series data as
insights to better handle it, in contrast to legacy storage engines and
database designs that only try to work around them, leading to
limitations and performance issues.

[Figure: Getting Started with InfluxDB]

INFLUXDB DATA MODEL DESIGN
Efficiency and effectiveness have to start in the data structure,
ensuring that time-stamped values are collected with the necessary
precision and metadata to provide flexibility and speed in graphing,
querying, and alerting. The InfluxDB data model has a flexible schema
that accommodates the needs of diverse time series use cases. The
data format takes the following form:

<measurement name>,<tag set> <field set> <timestamp>

The measurement name is a string, the tag set is a collection of
key/value pairs where all values are strings, and the field set is a
collection of key/value pairs where the values can be int64, float64,
bool, or string. There are no hard limits on the number of tags and
fields.

Being able to have multiple fields and multiple tags under the same
measurement optimizes the transmission of the data, avoiding
multiple retransmissions, which can render network protocols bloated
when transmitting data with shared tag sets. This design choice is also
particularly important for IoT use cases, where the agent stored on the
monitored remote devices sending the metrics has to be
energy-efficient for a longer lifespan.

Furthermore, support for multiple types of data encoding beyond
float64 values means that metadata can be collected along with the
time series, not limiting monitoring to numeric values only. Precision
is another parameter that must be taken into account when defining
the data model. Timestamps in InfluxDB can have second, millisecond,
microsecond, or nanosecond precision.

The measurement name and tag sets are kept in an inverted index,
which makes lookups for specific series very fast.

See below CPU metrics mapped in InfluxDB line protocol (with a
second-precision timestamp):

cpu,host=serverA,region=uswest idle=23,user=42,system=12 1464623548

INFLUXDB TIME SERIES DATABASE SPECIALTIES
A good strategy when adopting a platform for time series should take
into account the management of all types of time series data — not
only metrics (numeric values regularly collected or pulled), but also
events (pushed at irregular time intervals) like faults, peaks, and
human-triggered ones, such as clicks.

Since we are talking about a constant influx of granular data from a
number of data sources, a performant database solution for time
series has to handle high write rates with fast queries at scale, and
that is where most other types of databases stumble when used for
time series. In order to reach the bar raised by time series, InfluxDB
utilizes an architecture design with high compression, super-fast
storage and query engines, and a purpose-built stack.

InfluxDB uses an append-only file for newly arriving data so that new
points are quickly ingested and durable; columnar on-disk storage for
efficient queries and aggregations over time; a time-bound file
structure that facilitates management of data in shard-sized chunks; a
reverse index mapping measurements to tags, fields, and series for
quick access to targeted data; and data compaction and compression
for read optimization and volume control.

InfluxDB was also designed to provide easy and automated data
lifecycle management: retention policies, sharding, replication, and
rollup capabilities are built in. In time series, it's common to keep
high-precision data around for a short period of time and then
aggregate and downsample it into longer-term trend data. This kind of
data lifecycle management is difficult for application developers to
implement on top of regular databases, but it is fundamental for time
series.

Hands-on Learning
The easiest way to get a feel for what time series data can do for you
is to try it. Since the TICK Stack is open source, you can freely
download and install Telegraf, InfluxDB, Chronograf, and Kapacitor.
Let's start with InfluxDB, feed it some data, and build some queries.

With InfluxDB installed, it is time to interact with the database. Let's
use the command line interface (influx) to write data manually, query
that data interactively, and view query output in different formats.
To access the CLI, first launch the influxd database process and then
launch influx in your terminal. Once you've connected to an InfluxDB
node, you'll see the following output:

$ influx
Connected to http://localhost:8086 version 1.7.x
InfluxDB shell version 1.7.x

Note that the versions of InfluxDB and the CLI must be identical.

Creating a Database in InfluxDB
A fresh install of InfluxDB has no databases (apart from the system's
_internal), so you can create one with the CREATE DATABASE
<db-name> InfluxQL statement, where <db-name> is the name of the
database you wish to create. Let's create a database called mydb:

> CREATE DATABASE mydb

To see if the database was created, use the following statement:

> SHOW DATABASES
name: databases
name
----
_internal
mydb

Now, it is time to populate the mydb database.

Writing and Querying the Database
InfluxDB is populated with points from popular clients via POSTs to
the HTTP /write endpoint, or via the CLI. Datapoints can be inserted
individually or in batches.

Points consist of a timestamp; a measurement ("cpu_load", for
example); at least one key-value field (the measured value itself, e.g.
"value=0.64" or "temperature=21.2"); and zero to many key-value
tags containing any metadata about the value (e.g. "host=server01",
"region=EMEA", "dc=Frankfurt").


Conceptually, you can think of a measurement as an SQL table, where
the primary index is always time. Tags and fields are effectively
columns in the table. Tags are indexed, and fields are not.

Points are written to InfluxDB using line protocol:

<measurement>[,<tag-key>=<tag-value>...] <field-key>=<field-value>[,<field2-key>=<field2-value>...] [unix-nano-timestamp]

To insert a single time series datapoint with the measurement name
of cpu and tags host and region, with the measured value of 0.64, into
InfluxDB using the CLI, enter INSERT followed by a point in line
protocol format:

> INSERT cpu,host=serverA,region=us_west value=0.64

Now query for the data written:

> SELECT "host", "region", "value" FROM "cpu"
name: cpu
---------
time                           host    region  value
2015-10-21T19:28:07.580664347Z serverA us_west 0.64

Great! You successfully installed InfluxDB and can write and query
the data.

The next step is to collect data via Telegraf and send it to InfluxDB.

Data Collection With Telegraf
Telegraf is a plugin-driven server agent for collecting and reporting
metrics. Telegraf has plugins to pull metrics from third-party APIs, or
to listen for metrics via StatsD and Kafka consumer services. It also
has output plugins to send metrics to a variety of datastores.

Install Telegraf from the InfluxData downloads page:
portal.influxdata.com/downloads

Before starting the Telegraf server, you need to edit and/or create
an initial configuration that specifies your desired inputs (where the
metrics come from) and outputs (where the metrics go). Telegraf can
collect data from the system it is running on, which is just what is
needed to start getting familiar with Telegraf.

The example below shows how to create a configuration file called
telegraf.conf with two inputs:

1. One input reads metrics about the system's cpu usage (cpu).

2. Another input reads metrics about the system's memory
usage (mem).

InfluxDB is defined as the desired output:

telegraf -sample-config -input-filter cpu:mem -output-filter influxdb > telegraf.conf

Start the Telegraf service and direct it to the relevant configuration file:

MACOS HOMEBREW
telegraf -config telegraf.conf

LINUX (SYSVINIT AND UPSTART INSTALLATIONS)
sudo service telegraf start

LINUX (SYSTEMD INSTALLATIONS)
systemctl start telegraf

You will see the following output (note: the output below is from a Mac):

NetOps-MacBook-Air:~ Admin$ telegraf --config telegraf.conf
2019-01-12T18:49:48Z I! Starting Telegraf 1.8.3
2019-01-12T18:49:48Z I! Loaded inputs: inputs.cpu inputs.mem
2019-01-12T18:49:48Z I! Loaded aggregators:
2019-01-12T18:49:48Z I! Loaded processors:
2019-01-12T18:49:48Z I! Loaded outputs: influxdb
2019-01-12T18:49:48Z I! Tags enabled: host=NetOps-MacBook-Air.local
2019-01-12T18:49:48Z I! Agent Config: Interval:10s, Quiet:false, Hostname:"NetOps-MacBook-Air.local", Flush Interval:10s

Once Telegraf is up and running, it will start collecting data and
writing it to the desired output.

Returning to our sample configuration, we show below what the cpu
and mem data look like in InfluxDB. Note that we used the default
input and output configuration settings to get this data.

List all measurements in the telegraf database:

> SHOW MEASUREMENTS
name: measurements
------------------
name
cpu
mem

List all field keys by measurement:

> SHOW FIELD KEYS
name: cpu
---------
fieldKey          fieldType


usage_guest       float
usage_guest_nice  float
usage_idle        float
usage_iowait      float
usage_irq         float
usage_nice        float
usage_softirq     float
usage_steal       float
usage_system      float
usage_user        float

name: mem
---------
fieldKey          fieldType
active            integer
available         integer
available_percent float
buffered          integer
cached            integer
free              integer
inactive          integer
total             integer
used              integer
used_percent      float

Select a sample of the data in the field usage_idle in the
measurement cpu:

> SELECT usage_idle FROM cpu WHERE cpu = 'cpu-total' LIMIT 5
name: cpu
---------
time                 usage_idle
2016-01-16T00:03:00Z 97.56189047261816
2016-01-16T00:03:10Z 97.76305923519121
2016-01-16T00:03:20Z 97.32533433320835
2016-01-16T00:03:30Z 95.68857785553611
2016-01-16T00:03:40Z 98.63715928982245

That's it! You now have the foundation for using Telegraf to collect
and write metrics to your database.

Data Visualization and Graphing With Chronograf
Chronograf is the administrative interface and visualization engine
for the TICK Stack. It is simple to use and includes templates and
libraries that allow you to build dashboards of your data and create
alerting and automation rules.

The Chronograf builds are available on InfluxData's Downloads page.

1. Choose the download link for your operating system.

   • Note: If your download includes a TAR package, we recommend
   specifying a location for the underlying datastore, chronograf-v1.db,
   outside of the directory from which you start Chronograf. This
   allows you to preserve and reference your existing datastore,
   including configurations and dashboards, when you download
   future versions.

2. Install Chronograf:

   • MacOS: tar zxvf chronograf-1.6.2_darwin_amd64.tar.gz
   • Ubuntu & Debian: sudo dpkg -i chronograf_1.6.2_amd64.deb
   • RedHat and CentOS: sudo yum localinstall chronograf-1.6.2.x86_64.rpm

3. Start Chronograf (for example, ./chronograf from the unpacked TAR
directory on MacOS, or sudo systemctl start chronograf on
systemd-based Linux).

4. Connect Chronograf to your InfluxDB instance or InfluxDB
Enterprise cluster:

   • Point your web browser to localhost:8888.
   • Fill out the form with the following details:
      − Connection String: Enter the hostname or IP of the machine
      that InfluxDB is running on, and be sure to include InfluxDB's
      default port, 8086.
      − Connection Name: Enter a name for your connection string.
      − Username and Password: These fields can remain blank unless
      you've enabled authorization in InfluxDB.
      − Telegraf Database Name: Optionally, enter a name for your
      Telegraf database. The default name is telegraf.
   • Click "Add Source."

Pre-Canned Dashboards
Pre-created dashboards are delivered with Chronograf; you just have
to enable the desired Telegraf plugins. In our example, we already
enabled the cpu and mem plugins at config file creation. Let's take a
look (note: this example runs on a Mac).

Select the Dashboard icon in the navigation bar on the left, and then
select "System":

[Screenshot: the "System" pre-canned dashboard]

Voilà!

Taking a closer look at the cpu measurement, you can see that it
shows three measured fields (cpu.idle, cpu.user, and cpu.system).
You can filter each one and move along the measurement line to show
exact values and timestamps.
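The same queries Chronograf issues can also be run programmatically against InfluxDB 1.x's HTTP /query endpoint. The sketch below only constructs the request URL (assuming a local instance on the default port 8086); actually sending it requires a running server:

```python
from urllib.parse import urlencode

def build_query_url(host, db, influxql):
    # InfluxDB 1.x serves InfluxQL over HTTP: GET /query?db=<db>&q=<statement>
    params = urlencode({"db": db, "q": influxql})
    return f"http://{host}:8086/query?{params}"

url = build_query_url(
    "localhost", "telegraf",
    "SELECT usage_idle FROM cpu WHERE cpu = 'cpu-total' LIMIT 5",
)
print(url)
# With a server up, urllib.request.urlopen(url) returns the same rows
# the CLI printed, encoded as JSON.
```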


cpu.user metric filtered:

[Screenshot: the cpu.user metric filtered]

The selected "System" App option in the example is just one of many
pre-created dashboards available in Chronograf. See the list below:

• apache
• consul
• docker
• elasticsearch
• haproxy
• iis
• influxdb
• kubernetes
• memcached
• mesos
• mysql
• nginx
• nsq
• phpfpm
• ping
• postgresql
• rabbitmq
• redis
• riak
• system
• varnish
• win_system

Now that you are set to start exploring the world of time series data,
what is next? Learning from others' experience is always a good idea.
Get more insights from case studies in various industry segments:
telecom and service providers, e-commerce, financial markets, IoT,
research, manufacturing, telemetry, and, of course, the horizontal case
of DevOps and NetOps in any organization, and see how time series
monitoring has generated positive results, from better resource
management via automation and prediction to five-star customer
experience.

Written by Daniella Pontes, Product Marketer at InfluxData


Daniella Pontes is part of the product marketing team at InfluxData. She started her career in telecommunications,
wireless technology, and global Internet service provisioning. As security became a major concern for enterprises, she
worked on enterprise policy management, SaaS, and data encryption solutions. Prior to joining InfluxData, she spent
some years living in Japan, Germany, and Brazil. Having worked in various market segments, from embedded smart
antenna technology to Internet security and e-commerce, doing product management, partnerships, marketing, and
business development, she has broad experience working cross-functionally and with customers and partners.

Devada, Inc.
600 Park Offices Drive, Suite 150
Research Triangle Park, NC
888.678.0399 / 919.678.0300

DZone communities deliver over 6 million pages each month to more
than 3.3 million software developers, architects, and decision makers.
DZone offers something for everyone, including news, tutorials, cheat
sheets, research guides, feature articles, source code, and more.
"DZone is a developer's dream," says PC Magazine.

Copyright © 2019 DZone, Inc. All rights reserved. No part of this
publication may be reproduced, stored in a retrieval system, or
transmitted, in any form or by means electronic, mechanical,
photocopying, or otherwise, without prior written permission of
the publisher.

