Duplicate metrics when exposing prometheus metrics via spring boot #130

Closed
ddewaele opened this issue Jul 16, 2016 · 34 comments

@ddewaele

I've got a number of duplicate Prometheus metrics in my Spring Boot app (e.g. normalized_servo_rest_totaltime):

# HELP gauge_servo_response_assetmgmt_star_star gauge_servo_response_assetmgmt_star_star
# TYPE gauge_servo_response_assetmgmt_star_star gauge
gauge_servo_response_assetmgmt_star_star 16.0
# HELP gauge_servo_response_star_star gauge_servo_response_star_star
# TYPE gauge_servo_response_star_star gauge
gauge_servo_response_star_star 13.0
# HELP normalized_servo_rest_totaltime normalized_servo_rest_totaltime
# TYPE normalized_servo_rest_totaltime gauge
normalized_servo_rest_totaltime 0.0
# HELP normalized_servo_rest_count normalized_servo_rest_count
# TYPE normalized_servo_rest_count gauge
normalized_servo_rest_count 0.0
# HELP gauge_servo_rest_min gauge_servo_rest_min
# TYPE gauge_servo_rest_min gauge
gauge_servo_rest_min 0.0
# HELP gauge_servo_rest_max gauge_servo_rest_max
# TYPE gauge_servo_rest_max gauge
gauge_servo_rest_max 0.0
# HELP gauge_servo_response_error gauge_servo_response_error
# TYPE gauge_servo_response_error gauge
gauge_servo_response_error 10098.0
# HELP gauge_servo_response_star_star_favicon_ico gauge_servo_response_star_star_favicon_ico
# TYPE gauge_servo_response_star_star_favicon_ico gauge
gauge_servo_response_star_star_favicon_ico 6.0
# HELP gauge_servo_response_prometheus gauge_servo_response_prometheus
# TYPE gauge_servo_response_prometheus gauge
gauge_servo_response_prometheus 0.0
# HELP gauge_servo_response_uaa_star_star gauge_servo_response_uaa_star_star
# TYPE gauge_servo_response_uaa_star_star gauge
gauge_servo_response_uaa_star_star 7.0
# HELP normalized_servo_rest_totaltime normalized_servo_rest_totaltime
# TYPE normalized_servo_rest_totaltime gauge
normalized_servo_rest_totaltime 0.0
# HELP normalized_servo_rest_count normalized_servo_rest_count
# TYPE normalized_servo_rest_count gauge
normalized_servo_rest_count 0.0
# HELP gauge_servo_rest_min gauge_servo_rest_min
# TYPE gauge_servo_rest_min gauge
gauge_servo_rest_min 0.0
# HELP gauge_servo_rest_max gauge_servo_rest_max
# TYPE gauge_servo_rest_max gauge
gauge_servo_rest_max 0.0

In my Spring Boot metrics I don't see any duplicates:

"gauge.servo.response.assetmgmt.star-star": 16,
"gauge.servo.response.star-star": 13,
"normalized.servo.restclient.totaltime": 0,
"normalized.servo.restclient.count": 0,
"gauge.servo.restclient.min": 0,
"gauge.servo.restclient.max": 0,
"normalized.servo.rest.totaltime": 0,
"normalized.servo.rest.count": 0,
"gauge.servo.rest.min": 0,
"gauge.servo.rest.max": 0,
"gauge.servo.response.error": 10098,
"gauge.servo.response.star-star.favicon.ico": 6,
"gauge.servo.response.prometheus": 2,
@brian-brazil
Contributor

The code looks fine. Can you verify that there are actually no duplicates being passed to the Prometheus client code?

@ddewaele
Author

Will check ... I noticed that it doesn't happen immediately. When the Spring Boot app starts there are no duplicates; after a while they pop up. I'll see if I can debug it a bit more and provide some more info.

@ddewaele
Author

Here are the duplicates being generated:

This is a Spring Boot app with some Spring Cloud dependencies (in this case Netflix Zuul).

The Spring Boot metrics endpoint captures all metrics in a map, thus avoiding duplicates (but potentially also overwriting other values):

https://github.com/spring-projects/spring-boot/blob/master/spring-boot-actuator/src/main/java/org/springframework/boot/actuate/endpoint/MetricsEndpoint.java#L77
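
For illustration, a simplified sketch of the effect described above (this is not the actual MetricsEndpoint code): when metrics from all providers are collected into a map keyed by name, a later metric with the same name silently replaces the earlier one, so no duplicates ever appear in the /metrics output.

import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class MetricMapSketch {
    // Minimal stand-in for Spring Boot's Metric<?> type.
    record Metric(String name, Number value) {}

    public static void main(String[] args) {
        List<Metric> fromAllProviders = new ArrayList<>(List.of(
                new Metric("normalized.servo.rest.count", 0),
                new Metric("normalized.servo.rest.count", 5))); // duplicate name

        Map<String, Number> byName = new LinkedHashMap<>();
        for (Metric metric : fromAllProviders) {
            byName.put(metric.name(), metric.value()); // last value wins
        }
        System.out.println(byName); // {normalized.servo.rest.count=5}
    }
}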

@brian-brazil
Contributor

Nothing we can do on our end then; this is a bug in Spring Boot.

@ddewaele
Author

Will log an issue on the Spring Boot / Spring Cloud side ...

However, I assume the Prometheus Spring Boot client should follow the Spring Boot semantics? (If Spring Boot decides it's not a bug, and that it's the metrics provider's responsibility to ensure uniqueness, then the Prometheus Spring Boot client should follow suit and also skip duplicates?)

@brian-brazil
Contributor

It depends on what they say. From the screenshot shown these are not duplicates, they're insufficiently distinguished.

@ddewaele
Author

Logged an issue spring-projects/spring-boot#6404

@maust

maust commented Aug 27, 2016

I ran into the same issue (probably caused by Spring Cloud/Eureka). Here's a (dirty) workaround: https://gist.github.com/maust/3166c8b4471f34ee57f1995c79a56c20

@maust

maust commented Aug 31, 2016

@ddewaele Seems like Spring Boot is not going to solve it; any progress (with Servo)?
@brian-brazil Would you accept a PR with the (Spring-way) workaround of overriding duplicate names?

@brian-brazil
Contributor

If the data model is such that there's no correct mapping to time series, we may have to drop support from the java client so as to avoid providing broken metrics.

@vtatai

vtatai commented Aug 31, 2016

I'm also facing the duplicates metric issue, but I do not use Spring Boot. My use case is that metrics can be added in several points in the app to the default registry, and in some cases the same metric name can be reused. Since the CollectorRegistry class does not expose the current collectors, there is no way of knowing if the collector is already there or not.

I solved this problem by implementing equals and hashCode inside each of the collector classes, like Counter, Gauge, etc. So when the same collector is added to the registry, since it is a set, it does not get added twice. Is that a good solution? If so, I can submit a patch for it. Perhaps it also solves the issue above.
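
For illustration, a minimal sketch of the name-based equality idea described above, shown on a toy collector class rather than the real simpleclient Counter/Gauge (which would need the change upstream):

import java.util.HashSet;
import java.util.Objects;
import java.util.Set;

class NamedCollector {
    private final String name;

    NamedCollector(String name) {
        this.name = name;
    }

    @Override
    public boolean equals(Object o) {
        return o instanceof NamedCollector other && name.equals(other.name);
    }

    @Override
    public int hashCode() {
        return Objects.hash(name);
    }

    public static void main(String[] args) {
        Set<NamedCollector> registry = new HashSet<>();
        registry.add(new NamedCollector("rest_count"));
        registry.add(new NamedCollector("rest_count")); // equal by name, not added twice
        System.out.println(registry.size()); // prints 1
    }
}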

@brian-brazil
Contributor

That'll be solved properly inside the simpleclient at some point, though if you're running into this it likely means you're using instrumentation incorrectly. Metric instances should all be static fields of classes.
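
For reference, direct instrumentation along the recommended lines: the metric is created once as a static field with a fixed name and only updated at runtime (the class and metric names here are just examples):

import io.prometheus.client.Counter;

public class OrderService {
    // Created and registered exactly once, when the class is loaded.
    static final Counter ORDERS_PROCESSED = Counter.build()
            .name("orders_processed_total")
            .help("Total number of orders processed.")
            .register(); // uses CollectorRegistry.defaultRegistry

    public void processOrder() {
        // ... business logic ...
        ORDERS_PROCESSED.inc();
    }
}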

@vtatai

vtatai commented Aug 31, 2016

I am bridging hundreds of metrics coming from another framework (Finagle), so I cannot create static fields.

@brian-brazil
Contributor

brian-brazil commented Aug 31, 2016 via email

@vtatai

vtatai commented Aug 31, 2016

I'm not using a custom collector - I guess this is a question more about the CollectorRegistry. For instance, if at one point I do:

Gauge.build().name(dynamicName).register(CollectorRegistry.defaultRegistry)

and then later (somewhere else) I do

Gauge.build().name(dynamicName).register(CollectorRegistry.defaultRegistry)

with the same value for dynamicName, then it will fail once Prometheus scrapes the data. From what I can tell there is no way of knowing which collectors the registry is storing. I could create a custom collector registry, but I feel like this could be easily solved in the CollectorRegistry class itself, or perhaps in the SimpleCollector (by implementing equals / hashCode).
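
If names really do have to be computed at runtime (e.g. when bridging another framework), one possible call-site guard is to cache collectors by name so each name is registered at most once. This is a hypothetical helper, not part of simpleclient, and labels on a single metric are usually a better fit than dynamic metric names:

import io.prometheus.client.Gauge;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

public final class GaugeCache {
    private static final ConcurrentMap<String, Gauge> GAUGES = new ConcurrentHashMap<>();

    // Returns the gauge for this name, registering it on first use only.
    public static Gauge gauge(String name) {
        return GAUGES.computeIfAbsent(name, n ->
                Gauge.build()
                        .name(n)
                        .help("Bridged metric " + n)
                        .register()); // default registry; runs at most once per name
    }

    private GaugeCache() {}
}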

@brian-brazil
Contributor

You should never be using dynamic names with direct instrumentation, nor should you ever be providing duplicate names or attempting to dedup on the fly. You're approaching this from the wrong standpoint, make sure the names are unique and static to start with.

@vtatai

vtatai commented Sep 1, 2016

Ok I see, thanks!

@pidster

pidster commented Nov 14, 2016

@ddewaele please can you share a simple example that reproduces this? If duplicate metrics are being generated, maybe there's a model mismatch or multiple copies of a Spring bean.

From a quick scan of the code, I can see that the SpringBootMetricsCollector.java class is annotated with @Component. If this was discovered during a classpath scan, and combined with the @EnableSpringBootMetricsCollector (which creates another bean of the same type), I think it could produce a duplicate bean.
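
One way to confirm that suspicion (assuming the collector class from simpleclient_spring_boot is io.prometheus.client.spring.boot.SpringBootMetricsCollector) is to list how many beans of that type ended up in the application context:

import io.prometheus.client.spring.boot.SpringBootMetricsCollector;
import java.util.Map;
import org.springframework.boot.CommandLineRunner;
import org.springframework.context.ApplicationContext;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;

@Configuration
public class CollectorBeanCheck {
    @Bean
    CommandLineRunner reportCollectorBeans(ApplicationContext ctx) {
        return args -> {
            Map<String, SpringBootMetricsCollector> beans =
                    ctx.getBeansOfType(SpringBootMetricsCollector.class);
            // More than one entry here would point at a duplicate bean definition.
            System.out.println("SpringBootMetricsCollector beans: " + beans.keySet());
        };
    }
}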

@kiview

kiview commented Nov 23, 2016

If @maust's implementation is a valid(?) workaround, wouldn't it be okay to integrate this code into @EnableSpringBootMetricsCollector? I'm currently using this workaround as well and it seems to work.

@brian-brazil
Contributor

This is an upstream bug. To work around it in any way would be to introduce a bug in our own code.

We need to wait on upstream to tell us what the intended semantics are.

@kiview

kiview commented Nov 23, 2016

I'd argue that this is a bug inside SpringBootMetricsCollector. If Spring Boot allows multiple metrics with the same name inside a PublicMetrics source, the SpringBootMetricsCollector should be able to deal with this.

I don't think we will see any additional work from Spring Boot on this issue; they clearly state that it's up to the metrics sources to determine how they want to write their metrics:
spring-projects/spring-boot#6404 (comment)

@brian-brazil
Contributor

We're still waiting on upstream to clarify if the labels are merely annotations or actually part of the identity of the metric.

Blindly throwing away data on collision is not acceptable. We need to know what the intended semantics are here.

@kiview

kiview commented Nov 23, 2016

Okay I see your point there.

So do you think we should trigger the Spring-Boot team again, so they can clearly state if this is allowed behavior? Because we won't be able to communicate with all third party metrics providers about their intended metrics usages.

@brian-brazil
Contributor

I had asked a clarifying question (I had thought on spring-projects/spring-boot#6404 but can't seem to find it now) but never got an answer back.

@jzampieron

Any progress here? This is effectively a show-stopper for anyone using Spring Boot with Prometheus and the Spring Cloud Netflix libraries.

Spring has closed spring-projects/spring-boot#6404

It seems as if the SpringBootMetricsCollector in the plug-in should more uniquely namespace the values.

@brian-brazil
Contributor

We're still waiting on Spring Boot to clarify their intended semantics.

@jzampieron

Is there an active ticket with Spring Boot for them to define the expected semantics of the data?

As noted, 6404 has been closed. Do I need to reopen a ticket and drive it to closure?

@kiview

kiview commented Jan 24, 2017

@jzampieron I think this sounds like a good idea, I assume there won't be any clear answer otherwise.

@jzampieron

jzampieron commented Jan 24, 2017

So here's another take on it... there's still a bug here in Prometheus's spring-boot support.

No matter what input is provided by the Spring Boot actuator, Prometheus's endpoint should either produce valid, consumable output or log and return an internal server error (500) with detailed information.

Producing invalid output to be consumed by your consumer is a bug by definition.

I'll work to get a Spring Boot thread started to define/document the proper semantics. If someone here would take an action to at least log the error and handle and report it in a reasonable manner, that would be good.

Please and thanks.

@brian-brazil
Contributor

brian-brazil commented Jan 24, 2017

No matter what input is provided by the Spring Boot actuator, Prometheus's endpoint should either produce valid, consumable output or log and return an internal server error (500) with detailed information.

I don't intend to add that level of sanity checking to the java client. I take the approach that we'll try and protect you if you're doing direct instrumentation, but if you're writing custom collectors it's presumed you know what you're up to.

Producing invalid output to be consumed by your consumer is a bug by definition.

GIGO

I'll work to get a Spring Boot thread started to define/document the proper semantics

Looks like our current behaviour is correct, it just needs to be clarified in upstream docs and the providers fixed.

@jzampieron

Alright, so then it sounds like the correct thing to do is open a ticket against spring-cloud-netflix to fix the provider.

@cch0

cch0 commented Mar 21, 2017

I ran into a similar (if not the same) issue, and I had to exclude the dependency on spring-cloud-starter-ribbon (which brings in Servo) when adding spring-cloud-starter-consul-discovery to the Spring Boot app.
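
For reference, a sketch of what that exclusion might look like in a Maven build (artifact ids taken from the comment above; adjust to your own dependency setup):

<dependency>
    <groupId>org.springframework.cloud</groupId>
    <artifactId>spring-cloud-starter-consul-discovery</artifactId>
    <exclusions>
        <exclusion>
            <groupId>org.springframework.cloud</groupId>
            <artifactId>spring-cloud-starter-ribbon</artifactId>
        </exclusion>
    </exclusions>
</dependency>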

@maust

maust commented Apr 1, 2017

Instead of my previous workaround, disabling Servo metrics might be better:

spring.metrics.servo.enabled: false
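
In YAML form, assuming the property is set in application.yml, that would be:

spring:
  metrics:
    servo:
      enabled: false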

@brian-brazil
Contributor

We now know these are bugs in the various users of Spring metrics, so file bugs with each of them as needed.

This is not a problem with client_java; we're just one of the consumers of Spring metrics that cares.
