Closed
Description
I've got a number of duplicate Prometheus metrics in my Spring Boot app (ex: normalized_servo_rest_totaltime):
# HELP gauge_servo_response_assetmgmt_star_star gauge_servo_response_assetmgmt_star_star
# TYPE gauge_servo_response_assetmgmt_star_star gauge
gauge_servo_response_assetmgmt_star_star 16.0
# HELP gauge_servo_response_star_star gauge_servo_response_star_star
# TYPE gauge_servo_response_star_star gauge
gauge_servo_response_star_star 13.0
# HELP normalized_servo_rest_totaltime normalized_servo_rest_totaltime
# TYPE normalized_servo_rest_totaltime gauge
normalized_servo_rest_totaltime 0.0
# HELP normalized_servo_rest_count normalized_servo_rest_count
# TYPE normalized_servo_rest_count gauge
normalized_servo_rest_count 0.0
# HELP gauge_servo_rest_min gauge_servo_rest_min
# TYPE gauge_servo_rest_min gauge
gauge_servo_rest_min 0.0
# HELP gauge_servo_rest_max gauge_servo_rest_max
# TYPE gauge_servo_rest_max gauge
gauge_servo_rest_max 0.0
# HELP gauge_servo_response_error gauge_servo_response_error
# TYPE gauge_servo_response_error gauge
gauge_servo_response_error 10098.0
# HELP gauge_servo_response_star_star_favicon_ico gauge_servo_response_star_star_favicon_ico
# TYPE gauge_servo_response_star_star_favicon_ico gauge
gauge_servo_response_star_star_favicon_ico 6.0
# HELP gauge_servo_response_prometheus gauge_servo_response_prometheus
# TYPE gauge_servo_response_prometheus gauge
gauge_servo_response_prometheus 0.0
# HELP gauge_servo_response_uaa_star_star gauge_servo_response_uaa_star_star
# TYPE gauge_servo_response_uaa_star_star gauge
gauge_servo_response_uaa_star_star 7.0
# HELP normalized_servo_rest_totaltime normalized_servo_rest_totaltime
# TYPE normalized_servo_rest_totaltime gauge
normalized_servo_rest_totaltime 0.0
# HELP normalized_servo_rest_count normalized_servo_rest_count
# TYPE normalized_servo_rest_count gauge
normalized_servo_rest_count 0.0
# HELP gauge_servo_rest_min gauge_servo_rest_min
# TYPE gauge_servo_rest_min gauge
gauge_servo_rest_min 0.0
# HELP gauge_servo_rest_max gauge_servo_rest_max
# TYPE gauge_servo_rest_max gauge
gauge_servo_rest_max 0.0
In my Spring Boot metrics I don't see any duplicates:
"gauge.servo.response.assetmgmt.star-star": 16,
"gauge.servo.response.star-star": 13,
"normalized.servo.restclient.totaltime": 0,
"normalized.servo.restclient.count": 0,
"gauge.servo.restclient.min": 0,
"gauge.servo.restclient.max": 0,
"normalized.servo.rest.totaltime": 0,
"normalized.servo.rest.count": 0,
"gauge.servo.rest.min": 0,
"gauge.servo.rest.max": 0,
"gauge.servo.response.error": 10098,
"gauge.servo.response.star-star.favicon.ico": 6,
"gauge.servo.response.prometheus": 2,
Activity
brian-brazil commented on Jul 16, 2016
The code looks fine. Can you verify that there are actually no duplicates being passed to the Prometheus client code?
ddewaele commented on Jul 16, 2016
Will check ... noticed that it doesn't happen immediately. When the Spring Boot app starts there are no duplicates; after a while they pop up. I'll see if I can debug it a bit more and provide some more info.
ddewaele commented on Jul 16, 2016
Here are the duplicates being generated:
This is a Spring Boot app with some Spring cloud dependencies (in this case Netflix Zuul)
The Spring Boot metrics endpoint captures all metrics in a map, thus avoiding duplicates (but potentially also overwriting other values).
https://github.com/spring-projects/spring-boot/blob/master/spring-boot-actuator/src/main/java/org/springframework/boot/actuate/endpoint/MetricsEndpoint.java#L77
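A simplified illustration of that difference, assuming nothing about the actual Spring Boot or Prometheus internals: folding metrics into a map keyed by name hides a duplicate registration, while an exporter that walks the raw collection emits both entries.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Toy illustration only, not the real Spring Boot or Prometheus code.
public class DuplicateMetricIllustration {
    public static void main(String[] args) {
        // The same metric name registered twice upstream (e.g. by Servo).
        List<String[]> raw = new ArrayList<>();
        raw.add(new String[] {"normalized.servo.rest.totaltime", "0"});
        raw.add(new String[] {"normalized.servo.rest.totaltime", "0"});

        // The /metrics endpoint style: a map keyed by name, so the duplicate is hidden.
        Map<String, String> endpointView = new LinkedHashMap<>();
        for (String[] metric : raw) {
            endpointView.put(metric[0], metric[1]); // last value wins
        }

        System.out.println(raw.size());          // 2 -> what an exporter iterating the raw list sees
        System.out.println(endpointView.size()); // 1 -> what the map-based endpoint shows
    }
}
```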
brian-brazil commented on Jul 16, 2016
Nothing we can do on our end then; this is a bug in Spring Boot.
ddewaele commented on Jul 16, 2016
Will log an issue on the Spring Boot / Spring Cloud side ...
However, I assume that the Prometheus Spring Boot client should follow the Spring Boot semantics? (If Spring Boot decides it's not a bug, and that it's the metrics provider's responsibility to ensure uniqueness, then the Prometheus Spring Boot client should follow suit and also skip duplicates?)
brian-brazil commented on Jul 16, 2016
It depends on what they say. From the screenshot shown, these are not duplicates; they're insufficiently distinguished.
ddewaele commented on Jul 16, 2016
Logged an issue: spring-projects/spring-boot#6404
maust commented on Aug 27, 2016
I ran into the same issue (probably caused by Spring Cloud/Eureka). Here's a (dirty) workaround: https://gist.github.com/maust/3166c8b4471f34ee57f1995c79a56c20
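The gist itself is not reproduced here; a minimal sketch in the same spirit might simply de-duplicate the actuator metrics by name before they are exported. The class and method below are invented for illustration; only Spring Boot 1.x's Metric type is assumed.

```java
import java.util.Collection;
import java.util.LinkedHashMap;
import java.util.Map;

import org.springframework.boot.actuate.metrics.Metric;

// Sketch only: keep the first metric seen for each name so the exporter never
// receives two series with the same name. Whether "first wins" is the right
// policy depends on which provider produces the duplicate.
public class MetricDeduplicator {

    public static Collection<Metric<?>> deduplicate(Collection<Metric<?>> metrics) {
        Map<String, Metric<?>> byName = new LinkedHashMap<>();
        for (Metric<?> metric : metrics) {
            byName.putIfAbsent(metric.getName(), metric);
        }
        return byName.values();
    }
}
```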
maust commented on Aug 31, 2016
@ddewaele Seems like Spring Boot is not going to solve it. Any progress (with Servo)?
@brian-brazil Would you accept a PR with the (Spring-way) workaround of overriding duplicate names?
brian-brazil commented on Aug 31, 2016
If the data model is such that there's no correct mapping to time series, we may have to drop support from the java client so as to avoid providing broken metrics.
vtatai commented on Aug 31, 2016
I'm also facing the duplicate metrics issue, but I do not use Spring Boot. My use case is that metrics can be added at several points in the app to the default registry, and in some cases the same metric name can be reused. Since the CollectorRegistry class does not expose the current collectors, there is no way of knowing if the collector is already there or not. I solved this problem by implementing equals and hashCode inside each of the collector classes, like Counter, Gauge, etc. So when the same collector is added to the registry, since it is a set it does not get added twice. Is that a good solution? If so, I can submit a patch for it. Perhaps that also solves the issue above as well.
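A toy illustration of the set-based de-duplication being described (these are not the actual simpleclient classes, which live in io.prometheus.client): if collectors compare equal by metric name, a set-backed registry silently ignores the second registration.

```java
import java.util.HashSet;
import java.util.Objects;
import java.util.Set;

// Stand-in for a collector whose identity is its metric name.
class NamedCollector {
    final String name;

    NamedCollector(String name) {
        this.name = name;
    }

    @Override
    public boolean equals(Object o) {
        return o instanceof NamedCollector && ((NamedCollector) o).name.equals(name);
    }

    @Override
    public int hashCode() {
        return Objects.hash(name);
    }
}

class RegistryDemo {
    public static void main(String[] args) {
        Set<NamedCollector> registry = new HashSet<>();
        registry.add(new NamedCollector("gauge_servo_rest_min"));
        registry.add(new NamedCollector("gauge_servo_rest_min")); // ignored: equal by name
        System.out.println(registry.size()); // 1
    }
}
```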
brian-brazil commented on Aug 31, 2016
That'll be solved properly inside the simpleclient at some point, though if you're running into this it likely means you're using instrumentation incorrectly. Metric instances should all be static fields of classes.
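The recommended pattern, as a small sketch against the simpleclient API: each metric is created and registered once as a static field, so duplicate registration cannot happen at runtime.

```java
import io.prometheus.client.Counter;

public class RequestHandler {
    // Created and registered exactly once, when the class is loaded.
    static final Counter requests = Counter.build()
            .name("requests_total")
            .help("Total requests handled.")
            .register();

    public void handle() {
        requests.inc();
        // ... actual request handling ...
    }
}
```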
vtatai commented on Aug 31, 2016
I am bridging hundreds of metrics coming from another framework (Finagle), so I cannot create static fields.
brian-brazil commented on Aug 31, 2016
vtatai commented on Aug 31, 2016
I'm not using a custom collector - I guess this is a question more about the CollectorRegistry. For instance, if at one point I do:
Gauge.build().name(dynamicName).register(CollectorRegistry.defaultRegistry)
and then later (somewhere else) I do:
Gauge.build().name(dynamicName).register(CollectorRegistry.defaultRegistry)
with the same value for dynamicName, then it will fail once Prometheus scrapes the data. From what I can tell there is no way of knowing which collectors the registry is storing. I could create a custom collector registry, but I feel like this could be easily solved in the CollectorRegistry class itself, or perhaps in the SimpleCollector (by implementing equals / hashCode).
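One way to work around this from application code, as a hedged sketch (the cache class and helper below are invented for illustration): keep your own map of already-registered collectors keyed by the dynamic name, so each name is registered at most once.

```java
import io.prometheus.client.CollectorRegistry;
import io.prometheus.client.Gauge;

import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Illustrative helper: registers each dynamically-named gauge only once.
public class GaugeCache {

    private static final ConcurrentMap<String, Gauge> GAUGES = new ConcurrentHashMap<>();

    public static Gauge gauge(String dynamicName) {
        return GAUGES.computeIfAbsent(dynamicName, name ->
                Gauge.build()
                        .name(name)
                        .help("Bridged metric: " + name)
                        .register(CollectorRegistry.defaultRegistry));
    }
}
```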
brian-brazil commented on Jan 24, 2017
We're still waiting on Spring Boot to clarify their intended semantics.
jzampieron commented on Jan 24, 2017
Is there an active ticket with Spring Boot for them to define the expected semantics of the data?
As noted, 6404 has been closed. Do I need to reopen a ticket and drive it to closure?
kiview commented on Jan 24, 2017
@jzampieron I think this sounds like a good idea; I assume there won't be any clear answer otherwise.
jzampieron commented on Jan 24, 2017
So here's another take on it... there's still a bug here in Prometheus's Spring Boot support.
No matter what input is provided by the Spring Boot Actuator, Prometheus's endpoint should produce valid, consumable output, or otherwise log and return an internal server error (500) with detailed information.
Producing invalid output to be consumed by your consumer is a bug by definition.
I'll work to get a Spring Boot thread started to define/document the proper semantics. If someone here would take an action to at least log the error, handle it, and report it in a reasonable manner, that would be good.
Please and thanks.
brian-brazil commented on Jan 24, 2017
I don't intend to add that level of sanity checking to the java client. I take the approach that we'll try and protect you if you're doing direct instrumentation, but if you're writing custom collectors it's presumed you know what you're up to.
GIGO
Looks like our current behaviour is correct; it just needs to be clarified in the upstream docs and the providers fixed.
jzampieron commented on Jan 24, 2017
Alright, so then it sounds like the correct thing to do is open a ticket against spring-cloud-netflix to fix the provider.
cch0 commented on Mar 21, 2017
I ran into a similar (if not the same) issue, and I had to exclude the dependency on spring-cloud-starter-ribbon (which brings in Servo) when adding spring-cloud-starter-consul-discovery to the Spring Boot app.
maust commented on Apr 1, 2017
Instead of my previous workaround, disabling Servo metrics might be better:
spring.metrics.servo.enabled: false
brian-brazil commented on Apr 1, 2017
We know now these are bugs in the various users of spring metrics, so file bugs with each of them as needed.
This is not a problem with client_java; we're just one of the consumers of Spring metrics that cares.