
End-to-end Monitoring

with the Prometheus Operator

By @mxinden
Max Inden
Test-Engineer at CoreOS

@mxinden
Max.Inden@CoreOS.com
Secure, simplify and automate container
infrastructure
Why Monitoring?

Alerting
Long-term trends


What is Prometheus?

Open Source Monitoring
Built by SoundCloud
Inspired by Borgmon

Pull-based
Multi-Dimensional
Metrics, not logging, not tracing
No magic!

[Diagram: Prometheus scrapes the /metrics endpoint of each target every 15s]

# HELP http_requests_total Total number of HTTP requests made.
# TYPE http_requests_total counter
http_requests_total{code="200",path="/status"} 8

Metric name: http_requests_total
Labels: code="200", path="/status"
Value: 8
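
To make the three parts of a sample line concrete, here is a minimal sketch that splits such a line into metric name, labels and value. It is an illustration only, not the parser Prometheus uses: the function names are made up for this example, and it assumes no escaped quotes inside label values and no trailing timestamp.

```python
import re

# One sample line: metric name, optional {labels}, then the value.
SAMPLE_RE = re.compile(
    r'^(?P<name>[a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(?P<labels>.*)\})?\s+(?P<value>\S+)$'
)
# One key="value" pair inside the braces (assumes no escaped quotes).
LABEL_RE = re.compile(r'(?P<key>[a-zA-Z_][a-zA-Z0-9_]*)="(?P<val>[^"]*)"')


def parse_sample(line):
    """Split one sample line into (name, labels, value)."""
    match = SAMPLE_RE.match(line.strip())
    if match is None:
        raise ValueError("not a sample line: %r" % line)
    labels = {
        m.group("key"): m.group("val")
        for m in LABEL_RE.finditer(match.group("labels") or "")
    }
    return match.group("name"), labels, float(match.group("value"))


def parse_exposition(text):
    """Parse every sample line, skipping blank lines and # HELP / # TYPE comments."""
    samples = []
    for line in text.splitlines():
        if not line.strip() or line.lstrip().startswith("#"):
            continue
        samples.append(parse_sample(line))
    return samples
```

Calling `parse_exposition` on the three lines above yields a single sample; the `# HELP` and `# TYPE` lines are metadata and are skipped in this sketch.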
[Diagram: Prometheus scrapes the targets; PromQL queries the collected metrics]
Current percentage of HTTP errors across all service instances?

sum by(path) (rate(http_requests_total{code="500"}[5m]))
/ sum by(path) (rate(http_requests_total[5m]))

{path="/status"} 0.0039
{path="/"} 0.0011
{path="/api/v1/topics/:topic"} 0.087
{path="/api/v1/topics"} 0.0342
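
The arithmetic behind this query can be sketched in a few lines. This is a simplified model, not how Prometheus implements `rate()`: it assumes exactly two scrapes per series, no counter resets and no extrapolation, and it represents each series as a hypothetical `(code, path)` tuple.

```python
from collections import defaultdict


def rate(earlier, later, seconds):
    """Per-second increase of each counter series between two scrapes."""
    return {
        series: (later[series] - earlier[series]) / seconds
        for series in later
        if series in earlier
    }


def sum_by_path(rates, code=None):
    """Sum per-series rates by path, optionally keeping only one status code."""
    totals = defaultdict(float)
    for (series_code, path), value in rates.items():
        if code is None or series_code == code:
            totals[path] += value
    return dict(totals)


def error_ratio(earlier, later, seconds):
    """Ratio of code="500" request rate to total request rate, per path."""
    rates = rate(earlier, later, seconds)
    errors = sum_by_path(rates, code="500")
    total = sum_by_path(rates)
    # Only paths present on both sides of the division produce a result.
    return {path: errors[path] / total[path] for path in errors if total.get(path)}
```

Note that the result is a ratio between 0 and 1, like the values shown above; multiply by 100 for a percentage.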
[Diagram: PromQL feeds the Web UI and dashboards; Prometheus also evaluates alert definitions]

Alert Definition
Is any disk about to run full within 4 hours?

ALERT DiskWillFillIn4Hours
IF predict_linear(node_filesystem_free[1h], 4*3600) < 0

[Graph: node_filesystem_free from -1h to now, extrapolated linearly to +4h against the 0 line]
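
The idea behind `predict_linear` can be sketched as an ordinary least-squares fit over the samples in the window, extended into the future. This is an illustrative reimplementation under that assumption, not Prometheus code; the helper `disk_will_fill` is a made-up name that mirrors the alert condition.

```python
def predict_linear(samples, now, horizon):
    """Fit a least-squares line through (timestamp, value) samples
    and evaluate it at now + horizon (all times in seconds)."""
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples")
    mean_t = sum(t for t, _ in samples) / n
    mean_v = sum(v for _, v in samples) / n
    var_t = sum((t - mean_t) ** 2 for t, _ in samples)
    if var_t == 0:
        raise ValueError("samples must have distinct timestamps")
    cov = sum((t - mean_t) * (v - mean_v) for t, v in samples)
    slope = cov / var_t
    intercept = mean_v - slope * mean_t
    return slope * (now + horizon) + intercept


def disk_will_fill(samples, now, horizon=4 * 3600):
    """True if the fitted trend predicts less than zero free space at now + horizon."""
    return predict_linear(samples, now, horizon) < 0
```

With one hour of samples that lose free space at a steady pace, the alert condition is true as soon as the fitted line crosses zero within the next four hours.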
[Diagram: Prometheus evaluates the alert definition every 1m and forwards alerts to Alertmanager]
Alertmanager

Deduplicates
Groups
Routes (to Team A, Team B, Team C)
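
The three steps can be sketched as a small pipeline over alerts represented as label dictionaries. This is a toy model to show the order of operations, not Alertmanager's implementation or configuration format; the function names and the `team` label are assumptions made for the example.

```python
from collections import defaultdict


def deduplicate(alerts):
    """Keep one alert per identical label set."""
    seen = {}
    for alert in alerts:
        seen.setdefault(frozenset(alert.items()), alert)
    return list(seen.values())


def group(alerts, by):
    """Bundle alerts that share the same values for the `by` labels."""
    groups = defaultdict(list)
    for alert in alerts:
        groups[tuple(alert.get(label) for label in by)].append(alert)
    return dict(groups)


def route(groups, receiver_for):
    """Hand each group to the receiver chosen from its first alert."""
    notifications = defaultdict(list)
    for members in groups.values():
        notifications[receiver_for(members[0])].append(members)
    return dict(notifications)
```

Each receiver ends up with a list of groups, so a team gets one notification per group rather than one per alert.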


[Diagram: targets expose /metrics, Prometheus scrapes them and sends alerts to Alertmanager]
Monitoring

Application
Cluster
Cluster Monitoring
What is Kubernetes?

Platform for running containerized applications

Announced 2014 by Google
Influenced by Borg & Omega
v1.01 in July 2015
Kubernetes joins the CNCF
Master
API-Server
etcd
Controller-Manager
Scheduler
Kube-DNS
...

Worker
Kubelet
Kube-Proxy
...
Application Monitoring
[Diagram: User, AppX and Location pods, each behind a Service. How does Prometheus find them? It asks the K8s-API-Server]
Service Discovery
Static target list
DNS discovery
Kubernetes discovery
...
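
The difference between a static target list and discovery can be sketched as follows. This is a toy model, not Prometheus's Kubernetes discovery: `pods` is a hypothetical in-memory stand-in for what the API server would return, and the field names `ip`, `labels` and `ready` are assumptions made for the example.

```python
def static_targets(addresses):
    """Static target list: addresses are maintained by hand."""
    return sorted(addresses)


def discovered_targets(pods, selector, port):
    """Discovery: derive targets from the pods that currently exist.

    Each pod is a dict with `ip`, `labels` and `ready` keys.
    """
    targets = []
    for pod in pods:
        matches = all(pod["labels"].get(k) == v for k, v in selector.items())
        if matches and pod["ready"]:
            targets.append("%s:%d" % (pod["ip"], port))
    return sorted(targets)
```

When a pod is rescheduled and gets a new IP, the discovered list follows automatically, while the static list has to be edited.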
[Diagram: Prometheus covers both Cluster-Monitoring (Master and Worker components) and Application-Monitoring (User, AppX and Location services), with the K8s-API-Server in between]
Problem
Prometheus is stateful and difficult to
configure!
Introducing the
Prometheus Operator
What is a K8s Operator?

Application-specific operational knowledge, written as code
Prometheus Operator
Kubernetes native configuration
Automated management and upgrades
of Prometheus & Alertmanager
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  name: my-app
spec:
  ...

apiVersion: monitoring.coreos.com/v1alpha1
kind: Prometheus
metadata:
  name: prometheus-k8s
spec:
  ...
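
The pattern an operator follows can be sketched as a reconcile step: compare the declared resources with what is actually running and derive the actions that close the gap. This is a generic, in-memory illustration of that pattern, not the Prometheus Operator's code; the function names and the `replicas` and `version` fields used in the usage note are assumptions made for the example.

```python
def reconcile(desired, actual):
    """Return the actions that move `actual` towards `desired`.

    Both arguments map a resource name to its spec (a plain dict here).
    """
    actions = []
    for name, spec in sorted(desired.items()):
        if name not in actual:
            actions.append(("create", name))
        elif actual[name] != spec:
            actions.append(("update", name))
    for name in sorted(actual):
        if name not in desired:
            actions.append(("delete", name))
    return actions


def apply(actions, desired, actual):
    """Apply the actions to the in-memory `actual` state and return it."""
    for verb, name in actions:
        if verb == "delete":
            del actual[name]
        else:
            actual[name] = dict(desired[name])
    return actual
```

For example, if the declared `prometheus-k8s` resource asks for `replicas: 2` while one replica is running, `reconcile` returns a single update action; once it is applied, a second call returns no actions.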
Kube-Prometheus
Single command to install:

Prometheus & Alertmanager Cluster


Alerting rules
Dashboarding
Demo
Recap
What is Prometheus?

Pull-based
Multi-Dimensional
Metrics, not logging, not tracing
No magic!

[Diagram: Prometheus scrapes each target's /metrics every 15s, evaluates the alert definition every 1m and sends alerts to Alertmanager]
Prometheus-Operator & Kube-Prometheus

Where to go from here?

Prometheus.io
/coreos/prometheus-operator
We are hiring!
San Francisco, New York & Berlin
