CONTENTS
• Time Series Data
• Open Source is Key
• Hands-on Learning
Time series is everywhere and is an integral part of a modern and sensorized environment.

Open source to leverage collaboration and keep freedom of choice, and purpose-built for the required leap in speed and efficiency.
To the tech world, time series are measurements that are tracked, monitored, downsampled, and aggregated over time. This could be server metrics, application performance monitoring, network data, sensor data, events, clicks, trades in a market, and many other types of analytics data.

Used properly, time series lets you have application environments with close to zero downtime, more and faster asset and resource consumption predictions using machine learning, workflow automation integrated with business KPIs, and ultimately, a better customer experience leading to happier users.

Time Series Data: Use Cases Across Industries

Time series data personas are those responsible for maintaining a sustainable, functional, and evolving environment without operational downtime, inefficiencies, and performance issues, while providing business units with the means to monitor their own performance in order to stay on track with their goals. By doing so, time series personas provide the pillars for growth and success.

Time series projects usually start at DevOps and NetOps. They are the ones that must keep up with the speed of growth and complexity of the application environments, while coping with increasing intolerance to downtime or low performance. Monitoring production and pre-production as part of the application development lifecycle, as well as monitoring business performance indicators, is the way to detect trends early, anticipate resource requirements, and provide the means to resolve issues in a timely manner.

Here are some case studies that illustrate specific scenarios where operational and business challenges can be addressed with the adoption of a time series platform solution.

Consider a mobile app: the infrastructure and backend part of the application could be massive, with many components and microservices. Because being fast, always ready, and within reach are all part of what mobile services promise, they must detect problems before their users experience them. Once a user tries another service that works, that other service becomes the preferred choice. So it is paramount to monitor metrics from systems, applications, and KPIs at high sampling rates to catch anything that goes wrong practically as it happens. Certainly, the volume of data collected at rates of millions of reads per second will be high, but as data gets cold, it can be aggregated for trending analysis. Therefore, beyond high volume and HA, the data life cycle is also fundamental for companies using time series monitoring to stay ahead of the competition in the mobile applications market.

Cloud service providers also turn to multi-tenant, metric-driven visibility to provide their customers with ways to monitor their application environments, regardless of whether they run on Kubernetes clusters, containers, virtual machines, bare metal, or a combination of these options, which is very probable.
Regardless of the diversity of environments, everything needs to be ready and available from one data source. Collecting all metrics in one time series platform that can support multi-tenancy and HA is a necessary foundation for scalable cloud hosting services.

In the time of time series, professionals are getting insights into how changes happen over time and ways to predict their course in the future while monitoring the present. But in order to leverage the benefits of time series to their full extent, it is necessary to be able to collect, store, and analyze data in real time and at scale.

Open Source is Key

Open source means, among other things, free to use; but most importantly, it means that ideas and information are shared openly and the community is encouraged to collaborate transparently. Every breakthrough is celebrated by all and takes everyone one step ahead, driving innovation at a much faster pace. Because of the many eyes and brains continuously testing and applying it in different use cases, open source is more reliable and secure.

In complex environments, open source provides the necessary freedom to adapt. With more shared code, APIs, plugins, and custom scripts, the community helps you implement solutions that fit with your legacy applications and future choices, giving you more bandwidth to concentrate on delivering the business logic for your specific use cases. This is the power of the open source community to drive innovation.

Although time must be a central point in the overall platform architecture design of the database, just being able to query on time doesn't cover all the requirements of an effective and efficient solution. To achieve the envisioned scalability and usability paradigm of a very large and granular data set, it is necessary to devise a combined strategy in data model and storage engine design. Add to that a query language that eases selection of series, continuous queries, and transformation of queried data, and you have a complete time series platform, such as InfluxData's TICK Stack.

Time Series Data Full Stack in a Nutshell

InfluxData is an open source, purpose-built, full-stack time series platform for managing time series data. We'll begin by explaining each component that comprises the platform before moving into examples of how it can be used with time series data.

The functional architecture consists of four components: Telegraf, InfluxDB, Chronograf, and Kapacitor (the TICK Stack). These components are built as an end-to-end solution for time series data use cases. Telegraf is built for data collection; InfluxDB is the database and storage tier; Chronograf is for visualization; and Kapacitor is the rules engine for processing, monitoring, and alerting. InfluxDB is a key component in the time series data platform architecture because it is designed to take the peculiar characteristics of time series data as insights to better handle them, in contrast to legacy storage engine and database designs that only try to work around them, leading to limitations and performance issues.
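To give a feel for the Kapacitor side of the stack, a minimal TICKscript stream task in the style of Kapacitor's getting-started examples is sketched below; it is not part of this Refcard's walk-through, and the threshold and log path are illustrative assumptions.

stream
    // watch the cpu measurement written by Telegraf
    |from()
        .measurement('cpu')
    |alert()
        // raise a critical alert when idle CPU drops below 70%
        .crit(lambda: "usage_idle" < 70.0)
        // record alerts in a local log file
        .log('/tmp/alerts.log')

A task like this would be registered and enabled with the kapacitor CLI against the database and retention policy receiving the Telegraf data.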
In InfluxDB's data model, each point carries a measurement name, a tag set, and a field set. The tag set is a collection of key/value pairs where all values are strings, and the field set is a collection of key/value pairs where the values can be int64, float64, bool, or string. There are no hard limits on the number of tags and fields.

Being able to have multiple fields and multiple tags under the same measurement optimizes the transmission of the data, avoiding multiple retransmissions, which can render network protocols bloated when transmitting data with shared tag sets. This design choice is also particularly important for IoT use cases, where the agent stored on the monitored remote devices sending the metrics has to be energy-efficient for a longer lifespan.

Furthermore, support for multiple types of data encoding beyond float64 values means that metadata can be collected along with the time series, rather than being limited to monitoring only numeric values. Precision is another parameter that must be taken into account when defining the data model: timestamps in InfluxDB can have second, millisecond, microsecond, or nanosecond precision.

The data model also takes into account the management of all types of time series data: not only metrics (numeric values regularly collected or pulled), but also events (pushed at irregular time intervals) such as faults and peaks. Other design goals include quick access to targeted data, and data compaction and compression.

InfluxDB was also designed to provide easy and automated data lifecycle management: retention policies, sharding, replication, and rollup capabilities are built in. In time series, it's common to keep high-precision data around for a short period of time, and then aggregate and downsample it into longer-term trend data. This kind of data lifecycle management is difficult for application developers to implement on top of regular databases, but it is fundamental for time series.
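As a rough illustration of those built-in capabilities (this sketch is not from the Refcard and assumes a database named mydb, like the one created later in the hands-on section), a retention policy plus a continuous query for downsampling could be declared in InfluxQL:

CREATE RETENTION POLICY "one_week" ON "mydb" DURATION 7d REPLICATION 1 DEFAULT
CREATE RETENTION POLICY "one_year" ON "mydb" DURATION 52w REPLICATION 1
CREATE CONTINUOUS QUERY "cq_cpu_hourly" ON "mydb" BEGIN
  SELECT mean("usage_idle") AS "usage_idle"
  INTO "mydb"."one_year"."cpu_hourly"
  FROM "cpu"
  GROUP BY time(1h), *
END

The first policy caps raw data at seven days, while the continuous query keeps rolling hourly averages of usage_idle into the longer-lived one_year policy.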
Hands-on Learning

The easiest way to get a feel for what time series data can do for you is to try it. Since the TICK Stack is open source, you can freely download and install Telegraf, InfluxDB, Chronograf, and Kapacitor. Let's start with InfluxDB, feed it some data, and build some queries.
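How you install the components depends on your platform. As one hedged example, on macOS with Homebrew (assuming formulas for all four components are available; InfluxData also publishes packages for Linux distributions):

brew install influxdb telegraf chronograf kapacitor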
With InfluxDB installed, it is time to interact with the database. Let's use the command line interface (influx) to write data manually, query that data interactively, and view query output in different formats. To access the CLI, first launch the influxd database process and then launch influx in your terminal:

$ influx

Once you've connected to an InfluxDB node, you'll be at the interactive > prompt. A fresh install of InfluxDB has no user-created databases (apart from the system database _internal). You can create a database with the CREATE DATABASE <db-name> InfluxQL statement, where <db-name> is the name of the database you wish to create. Let's create a database called mydb.
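A minimal sketch of that interaction, typed at the CLI's > prompt, might look like this:

> CREATE DATABASE mydb
> SHOW DATABASES
> USE mydb

SHOW DATABASES should now list mydb alongside _internal, and USE mydb makes it the target of subsequent writes and queries in the session.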
Data can be written to InfluxDB via the HTTP write endpoint (POST) or via the CLI. Datapoints are inserted in line protocol format. Points consist of a timestamp, a measurement ("cpu_load", for example), at least one key-value field (the measured value itself, e.g. "value=0.64" or "temperature=21.2"), and zero to many key-value tags containing any metadata about the value (e.g. "host=server01", "region=EMEA", "dc=Frankfurt").
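Putting those pieces together, a point in line protocol takes the form measurement,tags fields timestamp, with the trailing timestamp optional (the server assigns one when it is omitted). The values below are illustrative; the second command shows how such a point could be pushed to a local InfluxDB 1.x instance through the HTTP write endpoint, assuming the mydb database from above:

cpu_load,host=server01,region=EMEA,dc=Frankfurt value=0.64

curl -i -XPOST 'http://localhost:8086/write?db=mydb' \
  --data-binary 'cpu_load,host=server01,region=EMEA,dc=Frankfurt value=0.64'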
Conceptually, you can think of a measurement as an SQL table, where the primary index is always time. Tags and fields are effectively columns in the table. Tags are indexed, and fields are not.

To insert a single time series datapoint with the measurement name of cpu and tags host and region, with the measured value of 0.64, into InfluxDB using the CLI, enter INSERT followed by a point in line protocol format:

> INSERT cpu,host=serverA,region=us_west value=0.64

Now query for the data written:

> SELECT "host", "region", "value" FROM "cpu"
name: cpu
--------
time host region value
2015-10-21T19:28:07.580664347Z serverA us_west 0.64

Great! You successfully installed InfluxDB and can write and query the data.

The next step is to collect data via Telegraf and send it to InfluxDB.

Before starting the Telegraf server, you need to edit and/or create an initial configuration that specifies your desired inputs (where the metrics come from) and outputs (where the metrics go). Telegraf can collect data from the system it is running on, which is just what is needed to start getting familiar with Telegraf.

The example below shows how to create a configuration file called telegraf.conf with two inputs:

1. One input reads metrics about the system's cpu usage (cpu).
2. Another input reads metrics about the system's memory usage (mem).

InfluxDB is defined as the desired output.

telegraf -sample-config -input-filter cpu:mem -output-filter influxdb > telegraf.conf
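The generated telegraf.conf is long, but for this example it boils down to sections like the following (a trimmed sketch, not the full generated file; the URL and database name shown are the defaults assumed here):

# read CPU usage metrics, per core and in total
[[inputs.cpu]]
  percpu = true
  totalcpu = true

# read system memory metrics
[[inputs.mem]]

# write everything to a local InfluxDB instance
[[outputs.influxdb]]
  urls = ["http://127.0.0.1:8086"]
  database = "telegraf"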
Then start the Telegraf service.

LINUX (SYSVINIT AND UPSTART INSTALLATIONS)

sudo service telegraf start

LINUX (SYSTEMD INSTALLATIONS)

systemctl start telegraf

You will see output like the following (Note: the output below is from running Telegraf on a Mac):

NetOps-MacBook-Air:~ Admin$ telegraf --config telegraf.conf
2019-01-12T18:49:48Z I! Starting Telegraf 1.8.3
2019-01-12T18:49:48Z I! Loaded inputs: inputs.cpu inputs.mem
2019-01-12T18:49:48Z I! Loaded aggregators:
2019-01-12T18:49:48Z I! Loaded processors:
2019-01-12T18:49:48Z I! Loaded outputs: influxdb
2019-01-12T18:49:48Z I! Tags enabled: host=NetOps-MacBook-Air.local
2019-01-12T18:49:48Z I! Agent Config: Interval:10s, Quiet:false, Hostname:"NetOps-MacBook-Air.local", Flush Interval:10s

Back in the influx CLI, list the measurements now being written:

> SHOW MEASUREMENTS
name: measurements
------------------
name
cpu
mem

List all field keys by measurement:

> SHOW FIELD KEYS
name: cpu
---------
fieldKey fieldType
name: mem
---------
fieldKey fieldType
active integer
available integer
available_percent float
buffered integer
cached integer
free integer
inactive integer
total integer
used integer
used_percent float

Select a sample of the data in the field usage_idle in the measurement cpu:

> SELECT usage_idle FROM cpu WHERE cpu = 'cpu-total' LIMIT 5
name: cpu
---------
time usage_idle
2016-01-16T00:03:00Z 97.56189047261816
2016-01-16T00:03:10Z 97.76305923519121
2016-01-16T00:03:20Z 97.32533433320835
2016-01-16T00:03:30Z 95.68857785553611
2016-01-16T00:03:40Z 98.63715928982245
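Beyond sampling raw rows, the same data can be rolled up on the fly. The extra query below is not part of the Refcard's walk-through, but it shows the GROUP BY time() pattern that downsampling builds on, returning the average idle CPU in five-minute buckets over the last half hour:

> SELECT MEAN("usage_idle") FROM "cpu" WHERE cpu = 'cpu-total' AND time > now() - 30m GROUP BY time(5m)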
That's it! You now have the foundation for using Telegraf to collect and write metrics to your database.

Data Visualization and Graphing With Chronograf

Chronograf is the administrative interface and visualization engine for the TICK Stack. It is simple to use and includes templates and libraries that allow you to build dashboards of your data and to create alerting and automation rules.

4. Connect Chronograf to your InfluxDB instance or InfluxDB Enterprise cluster:

• Point your web browser to localhost:8888.
• Fill out the form with the following details:
  - Connection String: Enter the hostname or IP of the machine that InfluxDB is running on, and be sure to include InfluxDB's default port, 8086.
  - Connection Name: Enter a name for your connection string.
  - Username and Password: These fields can remain blank unless you've enabled authorization in InfluxDB.
  - Telegraf Database Name: Optionally, enter a name for your Telegraf database. The default name is Telegraf.
• Click "Add Source."

Pre-Canned Dashboards

Pre-created dashboards are delivered with Chronograf; you just have to enable the respective Telegraf plugins. In our example, we already enabled the cpu and mem plugins at config file creation. Let's take a look (Note: this example runs on a Mac):

Select the Dashboard icon in the navigation bar on the left, and then select "System":

Voilà!

The selected "System" App option for the example is just one of many pre-created dashboards available in Chronograf. See the list below:

• apache
• consul
• docker
• elasticsearch
• haproxy
• iis
• influxdb
• kubernetes
• memcached
• mesos
• mysql
• nginx
• nsq
• phpfpm
• ping
• postgresql
• rabbitmq
• redis
• riak
• system
• varnish
• win_system

(Figure: cpu.user metric, filtered.)
Now that you are set to start exploring the world of time series data, what is next? Learning from others' experience is always a good idea. Get more insights from case studies in various industry segments: telecom and service providers, e-commerce, financial markets, IoT, research, manufacturing, telemetry, and, of course, the horizontal case of DevOps and NetOps in any organization. See how time series monitoring has generated positive results in these organizations, from better resource management via automation and prediction to five-star customer experiences.