Monitoring

To gain better insight into how jobs and services operate, you can collect metrics in the monitoring service of your choice. Metrics are exposed over JMX and StatsD.

Transcoders, on the other hand, expose metrics only over StatsD.

New in version 4.2.3.

StatsD

By default, metrics are not sent to a StatsD server. To enable this, update the metrics configuration. For example, to have metrics sent to a StatsD server on localhost listening on UDP port 8125, use:

PUT API/configuration/metrics
Content-Type: application/xml

<MetricsConfigurationDocument xmlns="http://xml.vidispine.com/schema/vidispine">
  <statsd/>
</MetricsConfigurationDocument>

Metrics sent to StatsD are prefixed with "vs." by default. To send metrics with the prefix "vs1." instead, for example to tell multiple running instances apart, use:

PUT API/configuration/metrics
Content-Type: application/xml

<MetricsConfigurationDocument xmlns="http://xml.vidispine.com/schema/vidispine">
  <statsd>
    <host>metrics.example.com</host>
    <port>6125</port>
    <prefix>vs1</prefix>
  </statsd>
</MetricsConfigurationDocument>

Here, metrics are sent to an external StatsD server on the non-standard port 6125. Note that the "." between the prefix and the metric name is added automatically.
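
The configuration document can also be generated and submitted from a script. The following is a minimal Python sketch, not an official client: the base URL http://localhost:8080/API is a placeholder, authentication is left out, and the helper names build_metrics_config and build_put_request are made up for this illustration. The request is only prepared, not sent.

```python
import urllib.request
import xml.etree.ElementTree as ET

NS = "http://xml.vidispine.com/schema/vidispine"


def build_metrics_config(host=None, port=None, prefix=None):
    """Build a MetricsConfigurationDocument; unset fields are left out."""
    ET.register_namespace("", NS)
    root = ET.Element(f"{{{NS}}}MetricsConfigurationDocument")
    statsd = ET.SubElement(root, f"{{{NS}}}statsd")
    for tag, value in (("host", host), ("port", port), ("prefix", prefix)):
        if value is not None:
            ET.SubElement(statsd, f"{{{NS}}}{tag}").text = str(value)
    return ET.tostring(root, encoding="utf-8")


def build_put_request(base_url, body):
    """Prepare (but do not send) the PUT request for the metrics configuration."""
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/configuration/metrics",
        data=body,
        method="PUT",
        headers={"Content-Type": "application/xml"},
    )


body = build_metrics_config(host="metrics.example.com", port=6125, prefix="vs1")
# Placeholder base URL (an assumption); add credentials as your setup requires.
request = build_put_request("http://localhost:8080/API", body)
# urllib.request.urlopen(request) would submit the configuration.
```

Calling build_metrics_config() without arguments produces a document with an empty statsd element, which corresponds to the localhost example above.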

Filtering metrics

You can set inclusion and exclusion filters to restrict which metrics are sent to the StatsD server. The default is to include all and exclude none.

Inclusion/exclusion filters may have a leading or trailing wildcard. For example, to exclude all storage.fs metrics:

<MetricsConfigurationDocument xmlns="http://xml.vidispine.com/schema/vidispine">
  <statsd>
    <exclude>storage.fs.*</exclude>
  </statsd>
</MetricsConfigurationDocument>
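
To reason about which metrics a given set of filters would let through, the matching can be approximated in a few lines of Python. This is a sketch under stated assumptions rather than the server's actual logic: it assumes filters are compared against the unprefixed metric name (as in the storage.fs.* example), that a filter carries at most one wildcard, and that an exclusion takes precedence over an inclusion. The function names are hypothetical.

```python
def matches(pattern, name):
    """Match a metric name against a filter with an optional leading or trailing '*'."""
    if pattern.startswith("*"):
        return name.endswith(pattern[1:])
    if pattern.endswith("*"):
        return name.startswith(pattern[:-1])
    return name == pattern


def is_sent(name, include=(), exclude=()):
    """Assumed semantics: no include filters means include all; excludes win."""
    included = not include or any(matches(p, name) for p in include)
    excluded = any(matches(p, name) for p in exclude)
    return included and not excluded


# With the exclusion filter from the example above:
print(is_sent("storage.fs.stat", exclude=["storage.fs.*"]))
print(is_sent("storage.file.found", exclude=["storage.fs.*"]))
```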

Tagged metrics

Some metrics are tagged with additional information. These are sent to StatsD in the format:

<metricname>:<value>|<type>|#<tag>+

A job.step.execution.time metric might for example be sent as:

vs.job.step.execution.time:123|ms|#type:placeholder-import,step:100,sync

If your StatsD server does not support such tags then they can be disabled by setting tags to false:

<MetricsConfigurationDocument xmlns="http://xml.vidispine.com/schema/vidispine">
  <statsd>
    ...
    <tags>false</tags>
  </statsd>
</MetricsConfigurationDocument>
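
If you consume these lines yourself, for example in a test harness, a small parser can handle both the tagged and the untagged form. The Python sketch below is illustrative only: parse_statsd_line is a hypothetical helper, and it assumes that tags are separated by commas and that a tag without a value (such as sync) is a plain flag, as in the example line above.

```python
def parse_statsd_line(line):
    """Split '<name>:<value>|<type>[|#<tag>,<tag>...]' into its parts."""
    name, _, rest = line.partition(":")
    fields = rest.split("|")
    value, metric_type = fields[0], fields[1]
    tags = {}
    for field in fields[2:]:
        if field.startswith("#"):
            for tag in field[1:].split(","):
                key, sep, tag_value = tag.partition(":")
                # Tags without a value, such as "sync", are stored as True.
                tags[key] = tag_value if sep else True
    return name, float(value), metric_type, tags


tagged = parse_statsd_line(
    "vs.job.step.execution.time:123|ms|#type:placeholder-import,step:100,sync"
)
untagged = parse_statsd_line("vs.job.created:1|c")
```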

JMX

Each metric is exposed as a JMX MBean in the “metrics” domain. You can view the metrics using, for example:

  • A JMX client such as VisualVM with the VisualVM-MBeans plugin, or JConsole.
  • Programmatically using the Java JMX client interface.
  • Over HTTP/JSON using a bridge such as Jolokia.
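
As an illustration of the HTTP/JSON route, the following Python sketch prepares a Jolokia search request that lists the MBeans in the “metrics” domain. It assumes a Jolokia agent is already deployed; the agent URL http://localhost:8080/jolokia is a placeholder, and jolokia_request is a hypothetical helper. The request is only prepared, not sent.

```python
import json
import urllib.request

# Placeholder agent URL (an assumption); use the address of your Jolokia agent.
AGENT_URL = "http://localhost:8080/jolokia"


def jolokia_request(agent_url, payload):
    """Prepare (but do not send) a Jolokia POST request."""
    return urllib.request.Request(
        url=agent_url,
        data=json.dumps(payload).encode("utf-8"),
        method="POST",
        headers={"Content-Type": "application/json"},
    )


# List all MBeans registered in the "metrics" domain.
search = jolokia_request(AGENT_URL, {"type": "search", "mbean": "metrics:*"})
# urllib.request.urlopen(search) would return a JSON document with the MBean names.
```

A name returned by the search can then be read with a payload of the form {"type": "read", "mbean": "<name>"}.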

Metrics

Metrics are exposed as either meters, timers or gauges. The name of a metric is meant to be self-explanatory. Timers are suffixed with time and meters are named as past tense verbs, while gauges make up the rest.

The StatsD type used for each metric, and the statistics exposed over JMX for each type, are:

Type   StatsD type  MBean attributes
Meter  c            The count, mean and 1/5/15-minute rates.
Gauge  g            The value.
Timer  ms           The count, min/max/mean/stdev, rates and percentiles.
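
The mapping from metric type to StatsD type can be illustrated by formatting a few sample lines. This Python sketch only mirrors the type mapping and the default "vs" prefix described in this document; format_metric is a hypothetical helper, and the values are made up.

```python
# StatsD type codes per metric type, as listed in this section.
STATSD_TYPES = {"meter": "c", "gauge": "g", "timer": "ms"}


def format_metric(name, value, kind, prefix="vs"):
    """Format an untagged StatsD line; the '.' after the prefix is added here."""
    return f"{prefix}.{name}:{value}|{STATSD_TYPES[kind]}"


print(format_metric("job.created", 1, "meter"))
print(format_metric("service.load.5", 0.5, "gauge"))
print(format_metric("solr.query.time", 42, "timer", prefix="vs1"))
```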

Indexing

  • Meters:
    • reindex.{index}.started
    • reindex.{index}.finished
    • indexer.solr.request.failed
  • Timers:
    • indexer.solr.update.time
    • indexer.solr.delete.time
    • indexer.solr.commit.time
    • indexer.{index}.index.time
      • With index being one of item/collection/acl/file.

Job

  • Meters:
    • job.created
    • job.started
    • job.finished
    • job.failed
    • job.blocked
  • Gauges:
    • job.total.{state}
      • Where state is the name of a job state, lower cased and with _ replaced with -. For example finished-warning.
  • Timers:
    • job.{type}.step.{step}.{sync}.execution.time
    • job.step.execution.time
      • Tagged with type:{type}, step:{step} and sync/async.
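
The naming rules for the job metrics can be expressed as a short Python sketch, which may help when building dashboard queries. The helper names are hypothetical. The sketch also assumes that the {sync} placeholder takes the values sync and async, matching the tags, and that {type} is already in its lower-cased, hyphenated form.

```python
def job_state_gauge(state):
    """Gauge name for a job state: lower-cased, with '_' replaced by '-'."""
    return "job.total." + state.lower().replace("_", "-")


def job_step_timer(job_type, step, synchronous):
    """Untagged timer name for a job step (placeholder values are assumptions)."""
    sync = "sync" if synchronous else "async"
    return f"job.{job_type}.step.{step}.{sync}.execution.time"


print(job_state_gauge("FINISHED_WARNING"))
print(job_step_timer("placeholder-import", 100, True))
```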

Solr

  • Meters:
    • solr.request.failed
  • Timers:
    • solr.query.time
    • solr.update.time
    • solr.commit.soft.time
    • solr.commit.hard.time
    • solr.optimize.time

Storage

  • Meters:
    • storage.file.found
    • storage.file.changed
    • storage.file.deleted
    • storage.file.hashed
    • storage.file.checksum.bytes.read
    • storage.fs.stat
      • The number of stat calls made.

Transfer

  • Meters:
    • transfer.bytes.transferred
    • transfer.started
    • transfer.finished
    • transfer.finished-part
    • transfer.failed
    • transfer.blocked

Service

  • Meters:
    • service.exception
  • Gauges:
    • service.load.5
      • The 5 minute load.
    • service.load.60
      • The 60 minute load.

Transcoder

  • Gauges:
    • transcoder.{transcoder-id}.jobs.running
    • transcoder.{transcoder-id}.jobs.finished
    • transcoder.{transcoder-id}.jobs.failed
    • transcoder.{transcoder-id}.jobs.{transcoder-job-type}.running
    • transcoder.{transcoder-id}.jobs.{transcoder-job-type}.finished
    • transcoder.{transcoder-id}.jobs.{transcoder-job-type}.failed
  • Counters:
    • transcoder.{transcoder-id}.muxer.video.frames
    • transcoder.{transcoder-id}.encoder.{codec}.frames
    • transcoder.{transcoder-id}.decoder.{codec}.frames
    • transcoder.{transcoder-id}.io.{protocol}.{direction}.bytes