
rate and delta deal poorly with slow moving data. #581

Closed

tomprince (Contributor) opened this issue:

For data that typically changes less often than the sampling window, the delta expansion used to avoid aliasing actually adds inaccuracy.

For example, there is a spike in the rate to 80 (see http://bit.ly/1A4suIw), which is an order of magnitude greater than the actual rate (<10).

Activity

juliusv (Member) commented on Mar 10, 2015

Adding a screenshot of that in case the data disappears:

[screenshot: rate graph]

Adding some explanations to this issue:

So it's a bit more tricky than "it deals poorly with slow moving data". Actually, slow-moving data should be handled fine as long as the underlying time series are "there". Meaning, rate deals poorly with time series appearing / disappearing completely, or having very infrequent samples (no matter if the sample values stay constant or not) compared to the time window over which we do the rate. This is due to the way that rates are extrapolated from the actual samples that are found under the time window.

In this example, the reason for the high initial spike of the time series is a tricky one: you're doing the sum over multiple rates of individual time series. One of these time series (buildbot_finished_builds_total{builder="flocker/installed-package/fedora-20",buildmaster="build.clusterhq.com",instance="build.clusterhq.com:80",job="buildbot",result="failure",slave_class="fedora-vagrant",slave_number="0"}) doesn't exist at all for the first part of the graph, and springs into existence in the middle of the 4h-rate window. Meaning, the rate function will see something like this:

                             4h window
|------------------------------------------------------------|              ...
                                                                       X
                                                                  X
                                                            X
                                                      X

Seeing no earlier data points to the left, rate() then naively (and sometimes correctly?) assumes that if there had just been more points, the growth would probably have looked similar. The extrapolation happens because no matter what time window you choose, you're unlikely to find samples that exactly match the window boundaries, so without it you wouldn't actually get the rate over the selected window of time, but over whatever deltaT there really is between samples. You'd also get funny temporal aliasing due to that.
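To make the effect concrete, here is a minimal Go sketch of this style of window expansion (a simplified model for illustration, not Prometheus's actual implementation): the increase observed between the first and last sample found under the window is scaled up to the full window length, so a series that only exists for the last few minutes of a 4h window has its delta inflated by a large factor.

```go
package main

import "fmt"

// sample is a (timestamp, value) pair from a counter time series.
type sample struct {
	ts    float64 // seconds
	value float64
}

// extrapolatedDelta mimics the kind of expansion described above:
// the increase observed between the first and last sample under the
// window is scaled up to the full window length, on the assumption
// that the series behaved similarly where no samples were found.
func extrapolatedDelta(samples []sample, windowSeconds float64) float64 {
	if len(samples) < 2 {
		return 0
	}
	first, last := samples[0], samples[len(samples)-1]
	observedDelta := last.value - first.value
	observedSpan := last.ts - first.ts
	return observedDelta * windowSeconds / observedSpan
}

func main() {
	// A series that only springs into existence near the end of a
	// 4h (14400s) window: four samples over its last ~3 minutes.
	samples := []sample{
		{ts: 14220, value: 0},
		{ts: 14280, value: 1},
		{ts: 14340, value: 2},
		{ts: 14400, value: 3},
	}
	window := 14400.0

	fmt.Println("observed increase:", samples[len(samples)-1].value-samples[0].value) // 3
	fmt.Println("extrapolated delta:", extrapolatedDelta(samples, window))            // 240
}
```

With these made-up numbers, an observed increase of 3 over the last 3 minutes becomes an extrapolated delta of 240 over the 4h window, an 80x inflation.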

Now the question is: how should rates behave in abnormal situations like this, given that the above is just a more extreme case of the general extrapolation behavior, which is intended? Should they really act differently, or should one just avoid/ignore time series that disappear/appear like that (if the dimensions are known beforehand, one can initialize their counters to 0; see the sketch below)? That's not always possible, and I'm not sure what a saner behavior for rate would be that wouldn't be wrong just as often in the other direction.
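For reference, here is a minimal sketch of that workaround using the Go client library. The metric name and result label are taken from the series above, but the label set and exporter code are simplified and hypothetical: calling WithLabelValues for every known label combination up front creates the child series at 0, so it exists from the start instead of springing into existence mid-window.

```go
package main

import (
	"net/http"

	"github.com/prometheus/client_golang/prometheus"
	"github.com/prometheus/client_golang/prometheus/promhttp"
)

var finishedBuilds = prometheus.NewCounterVec(
	prometheus.CounterOpts{
		Name: "buildbot_finished_builds_total",
		Help: "Total number of finished builds.",
	},
	[]string{"result"},
)

func init() {
	prometheus.MustRegister(finishedBuilds)
	// Pre-initialize every known label combination to 0 so the time
	// series is exposed from the first scrape onward.
	for _, result := range []string{"success", "failure"} {
		finishedBuilds.WithLabelValues(result)
	}
}

func main() {
	// On a finished failing build:
	finishedBuilds.WithLabelValues("failure").Inc()

	http.Handle("/metrics", promhttp.Handler())
	http.ListenAndServe(":8080", nil)
}
```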

2 commits referencing this issue were added on Oct 10, 2015: 37daaf1, 06c55d7
2 commits referencing this issue were added on Nov 30, 2015: 4c7b899, 1c3152a
