Closed
Description
For data that typically changes less often than the sampling window, the delta extrapolation intended to avoid aliasing actually adds inaccuracy.
For example, there is a spike in the rate to 80 (see http://bit.ly/1A4suIw), which is an order of magnitude greater than the actual rate (which is <10).
Activity
juliusv commented on Mar 10, 2015
Adding a screenshot of that in case the data disappears:
Adding some explanations to this issue:
So it's a bit more tricky than "it deals poorly with slow moving data". Actually, slow-moving data should be handled fine as long as the underlying time series are "there". Meaning, rate deals poorly with time series appearing / disappearing completely, or having very infrequent samples (no matter if the sample values stay constant or not) compared to the time window over which we do the rate. This is due to the way that rates are extrapolated from the actual samples that are found within the time window.
In this example, the reason for the high initial spike of the time series is a tricky one: you're doing a sum over the rates of multiple individual time series. One of these time series (
buildbot_finished_builds_total{builder="flocker/installed-package/fedora-20",buildmaster="build.clusterhq.com",instance="build.clusterhq.com:80",job="buildbot",result="failure",slave_class="fedora-vagrant",slave_number="0"}
) doesn't exist at all for the first part of the graph, and springs into existence in the middle of the 4h rate window. Meaning, the rate function will see something like this:

Seeing no earlier data points to the left, rate() then naively (and sometimes correctly?) assumes that if there had just been more points, the growth would have probably looked similar. The extrapolation generally happens because no matter what time window you choose, you're unlikely to find samples exactly matching the window boundaries, so you wouldn't actually get the rate over the selected window of time, but over whatever deltaT there really is between samples. You'd also get funny temporal aliasing due to that.
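To make that concrete, here is a rough sketch of the extrapolation behavior described above (in Go, with made-up numbers; this is illustrative only, not the actual promql implementation): the increase observed between the first and last sample inside the window gets scaled up to the full window length, so a series that only covers the tail end of a 4h window has its observed growth multiplied up accordingly.

```go
package main

import "fmt"

// sample is one (timestamp, value) pair; timestamps are in seconds.
type sample struct {
	ts, val float64
}

// extrapolatedDelta sketches the extrapolation discussed above: the increase
// observed between the first and last sample inside the window is scaled up
// to the full window, assuming the series would have grown similarly where
// no samples were found. Illustrative only, not the real promql code.
func extrapolatedDelta(samples []sample, windowSeconds float64) float64 {
	if len(samples) < 2 {
		return 0
	}
	first, last := samples[0], samples[len(samples)-1]
	observed := last.val - first.val
	covered := last.ts - first.ts
	return observed * (windowSeconds / covered)
}

func main() {
	window := 4.0 * 3600 // 4h window
	// A counter that only exists for the last 30 minutes of the window:
	// it actually grew by 6, but the covered interval is only 1800s, so
	// the observed delta gets scaled up by 14400/1800 = 8x.
	samples := []sample{{ts: 12600, val: 0}, {ts: 14400, val: 6}}
	fmt.Println(extrapolatedDelta(samples, window)) // 48 instead of 6
}
```

In this sketch, the per-second rate derived from the extrapolated delta is simply the slope over the covered 30 minutes, reported as if it held for the entire 4h window.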
Now the question is: how should rates behave in abnormal situations like this, given that the above is just a more extreme case of the general extrapolation, which is intended? Should they really act differently, or should one just avoid/ignore time series that disappear/appear like that (if the dimensions are known beforehand, one can initialize their counters to 0)? That's not always possible, but I'm not sure what a sane(r) behavior for rate would be that wouldn't be wrong just as often in the other direction.
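For the pre-initialization workaround mentioned above, a minimal sketch using client_golang's CounterVec (the label set is simplified and made up for illustration; the actual buildbot exporter may look different): touching each known label combination once with WithLabelValues makes the child series exist with value 0 from process start, so it never springs into existence mid-window.

```go
package main

import "github.com/prometheus/client_golang/prometheus"

// finishedBuilds mirrors the builds counter from the example above, with a
// simplified, made-up label set for illustration.
var finishedBuilds = prometheus.NewCounterVec(
	prometheus.CounterOpts{
		Name: "buildbot_finished_builds_total",
		Help: "Number of finished builds by builder and result.",
	},
	[]string{"builder", "result"},
)

func init() {
	prometheus.MustRegister(finishedBuilds)
	// If the possible label values are known beforehand, touching each child
	// once initializes it to 0, so the series is exported from process start
	// instead of appearing when the first matching event happens.
	for _, result := range []string{"success", "failure"} {
		finishedBuilds.WithLabelValues("flocker/installed-package/fedora-20", result)
	}
}

func main() {
	// Increment as builds finish, e.g.:
	finishedBuilds.WithLabelValues("flocker/installed-package/fedora-20", "failure").Inc()
	// ... expose /metrics via promhttp as usual ...
}
```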
promql: Remove deprecated 2nd argument to delta()
promql: Remove extrapolation from rate/increase/delta.
promql: Limit extrapolation of delta/rate/increase