
Wrong result when computing accuracy using tf.metrics.accuracy #15115

Closed
secsilm opened this issue Dec 5, 2017 · 12 comments



secsilm commented Dec 5, 2017

System information

  • OS Platform and Distribution (e.g., Linux Ubuntu 16.04): Windows 10 1709
  • TensorFlow installed from (source or binary): pip
  • TensorFlow version (use command below): 1.4.0
  • Python version: Python 3.5.2 :: Anaconda custom (64-bit)

Describe the problem

I found that the result returned by tf.metrics.accuracy is incorrect when training my model. To verify this, I wrote a simple program.

import tensorflow as tf

sess = tf.Session()
labels = tf.placeholder(tf.int32)
predictions = tf.placeholder(tf.int32)
acc, _ = tf.metrics.accuracy(labels, predictions)
my_acc = tf.reduce_mean(tf.cast(tf.equal(labels, predictions), tf.float32))

feed_dict = {
    labels: [1, 2, 3, 4, 5], 
    predictions: [1, 2, 3, 4, 5]
}
sess.run(tf.global_variables_initializer())
sess.run(tf.local_variables_initializer())

sess.run(acc, feed_dict)  # 0.0
sess.run(my_acc, feed_dict)  # 1.0

You can see that acc and my_acc are different, and acc is wrong. I double-checked the docs and am still confused. Is there anything I missed? Thank you.


secsilm commented Dec 5, 2017

But the result in evaluation mode is correct, while it is wrong in train mode. I'm using tf.estimator and tf.data.TFRecordDataset. The relevant code is below.

def cifar_model_fn(features, labels, mode):
    # some other codes

    logits = tf.layers.dense(inputs=dropout, units=10)
    predictions = {
        'classes': tf.argmax(input=logits, axis=1, name='classes'),
        'probabilities': tf.nn.softmax(logits, name='softmax_tensor')
    }
    onehot_labels = tf.one_hot(indices=tf.cast(labels, tf.int32), depth=10)
    loss = tf.losses.softmax_cross_entropy(onehot_labels, logits)
    accuracy, update_op = tf.metrics.accuracy(labels=labels, predictions=predictions['classes'], name='accuracy')
    # my way to compute accuracy, which gives correct result when training
    my_acc = tf.reduce_mean(tf.cast(tf.equal(tf.cast(labels, tf.int64), predictions['classes']), tf.float32))

    if mode == tf.estimator.ModeKeys.TRAIN:
        tensors_to_log = {
            'Accuracy': accuracy,
            'My accuracy': my_acc}
        logging_hook = tf.train.LoggingTensorHook(tensors=tensors_to_log, every_n_iter=100)
        optimizer = tf.train.AdamOptimizer(learning_rate=FLAGS.learning_rate)
        train_op = optimizer.minimize(loss=loss, global_step=tf.train.get_global_step())
        return tf.estimator.EstimatorSpec(mode=mode, loss=loss, train_op=train_op, training_hooks=[logging_hook])

    eval_metric_ops = {
        'accuracy': (accuracy, update_op)
    }
    return tf.estimator.EstimatorSpec(mode=mode, loss=loss, eval_metric_ops=eval_metric_ops)

def main():
    # create dataset ....

    cifar10_classifier = tf.estimator.Estimator(model_fn=cifar_model_fn, model_dir=FLAGS.model_dir)
    cifar10_classifier.train(input_fn=train_input_fn)
    eval_results = cifar10_classifier.evaluate(input_fn=eval_input_fn)

My goal is to log the training accuracy during training. The following is part of the console output; Accuracy is always 0.0.

INFO:tensorflow:loss = 1.08232, step = 7301 (9.312 sec)
INFO:tensorflow:My accuracy = 0.59375, Accuracy = 0.0 (9.311 sec)
INFO:tensorflow:global_step/sec: 10.8054
INFO:tensorflow:loss = 1.29713, step = 7401 (9.257 sec)
INFO:tensorflow:My accuracy = 0.484375, Accuracy = 0.0 (9.256 sec)
INFO:tensorflow:global_step/sec: 11.4928
INFO:tensorflow:loss = 1.31355, step = 7501 (8.693 sec)
INFO:tensorflow:My accuracy = 0.578125, Accuracy = 0.0 (8.693 sec)
INFO:tensorflow:global_step/sec: 12.1267
INFO:tensorflow:loss = 1.25855, step = 7601 (8.252 sec)
INFO:tensorflow:My accuracy = 0.515625, Accuracy = 0.0 (8.254 sec)
INFO:tensorflow:global_step/sec: 12.0489
INFO:tensorflow:loss = 1.21857, step = 7701 (8.303 sec)
INFO:tensorflow:My accuracy = 0.5625, Accuracy = 0.0 (8.304 sec)
INFO:tensorflow:global_step/sec: 12.1971
INFO:tensorflow:loss = 0.983289, step = 7801 (8.196 sec)
INFO:tensorflow:My accuracy = 0.6875, Accuracy = 0.0 (8.195 sec)


jmaye commented Dec 5, 2017

Beware that tf.metrics.accuracy is computed over an entire session run, not per batch. I just kicked it out because I did not get the utility of it. Instead, I'm doing the same thing as you with reduce_mean to have batch accuracy.


reedwm commented Dec 5, 2017

As @jmaye said, tf.metrics.accuracy is not meant to compute the accuracy of a single batch. It returns both the accuracy and an update_op, and update_op is intended to be run every batch, which updates the accuracy. See #9498 for more discussion on this.
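
For example, taking the minimal snippet from the first post and also running the update op (a rough sketch against TF 1.x graph mode):

import tensorflow as tf

labels = tf.placeholder(tf.int32)
predictions = tf.placeholder(tf.int32)
# Keep the update op instead of discarding it.
acc, acc_update_op = tf.metrics.accuracy(labels, predictions)

feed_dict = {labels: [1, 2, 3, 4, 5], predictions: [1, 2, 3, 4, 5]}

sess = tf.Session()
sess.run(tf.local_variables_initializer())  # initializes the metric's total/count local variables

sess.run(acc, feed_dict)            # 0.0 -- nothing has been accumulated yet
sess.run(acc_update_op, feed_dict)  # 1.0 -- accumulates this batch and returns the updated accuracy
sess.run(acc, feed_dict)            # 1.0 -- now reflects the accumulated batch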

reedwm closed this as completed Dec 5, 2017

secsilm commented Dec 6, 2017

@reedwm @jmaye Thank you for your replies. I read #9498 and understand that update_op is called every batch. But I still have one question: when is the accuracy computed?

The doc says:

The accuracy function creates two local variables, total and count that are used to compute the frequency with which predictions matches labels. This frequency is ultimately returned as accuracy: an idempotent operation that simply divides total by count.

but it doesn't say when the accuracy is actually computed, i.e. when total is divided by count.

In my code above, the accuracy is computed when evaluation finishes. But is it supposed to compute the accuracy only when the entire training process finishes?


reedwm commented Dec 6, 2017

accuracy is recomputed every time it is evaluated.

How tf.metrics.accuracy works in general is that it maintains two variables, total and count, each of which starts at 0. Whenever accuracy is evaluated, it returns total / count (or 0 if count is 0), and does not modify total or count. accuracy also does not look at labels or predictions. It just does a division.

When update_op is evaluated, it uses labels and predictions to increase total and count. It increases total by the number of predictions that match the labels, and increases count by the total number of predictions. The idea is that you run update_op for every new batch of labels and predictions you get. Then, whenever you want to check the accuracy of all the batches you've seen so far, you evaluate accuracy, which will hold the current accuracy over those batches. accuracy will change only when you run update_op, since only running update_op modifies total and count.

In your Estimator's case, update_op is never run during training, only during evaluation. The Estimator will automatically call the update_op every batch when you return the EstimatorSpec with eval_metric_ops during evaluation, but it does not do so during training.
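
To make this concrete, here is a small sketch (TF 1.x graph mode, with made-up batch values) of the behavior described above: accuracy only changes when update_op runs.

import tensorflow as tf

labels = tf.placeholder(tf.int32)
predictions = tf.placeholder(tf.int32)
accuracy, update_op = tf.metrics.accuracy(labels, predictions)

with tf.Session() as sess:
    sess.run(tf.local_variables_initializer())  # total = 0, count = 0

    # Batch 1: 2 of 4 predictions are correct -> total = 2, count = 4
    sess.run(update_op, {labels: [0, 1, 2, 3], predictions: [0, 1, 0, 0]})
    print(sess.run(accuracy))  # 0.5

    # Evaluating accuracy again without running update_op changes nothing.
    print(sess.run(accuracy))  # still 0.5

    # Batch 2: 4 of 4 predictions are correct -> total = 6, count = 8
    sess.run(update_op, {labels: [4, 5, 6, 7], predictions: [4, 5, 6, 7]})
    print(sess.run(accuracy))  # 0.75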


secsilm commented Dec 7, 2017

@reedwm Thank you for your detailed explanation. I got the idea.


dchatterjee172 commented Dec 20, 2018

@reedwm @jmaye
if mode == tf.estimator.ModeKeys.EVAL:
    a = tf.random.uniform(dtype=tf.int32, maxval=1000, shape=[])
    eval_metric_ops = {
        "eval_mean_bert_loss": tf.metrics.mean(total_loss_bert),
        "eval_mean_original_loss": tf.metrics.mean(total_loss_original),
        "eval_mean_loss": tf.metrics.mean(a)}
    output_spec = tf.estimator.EstimatorSpec(
        mode=mode,
        loss=a,
        eval_metric_ops=eval_metric_ops)

eval_spec = tf.estimator.EvalSpec(input_fn=eval_input_fn, steps=100, start_delay_secs=0, throttle_secs=120)

Output:
INFO:tensorflow:Saving dict for global step 2000: eval_mean_bert_loss = 4.9399266, eval_mean_loss = 483.38, eval_mean_original_loss = 4.982164, global_step = 2000, loss = 483.38
How can loss and eval_mean_loss be exactly the same? The batch size is 1 here. Shouldn't loss be the value of a at the 100th batch, and eval_mean_loss be the mean of all 100 values of a?


reedwm commented Jan 3, 2019

@dchatterjee172, can you provide a self-contained, relatively short example I can run to reproduce? It does seem like loss should be different from eval_mean_loss. I am also unsure how loss can be a non-integer, since you passed an int32 value for loss.


dchatterjee172 commented Jan 4, 2019

@reedwm
tf.metrics.mean will cast the values argument to float.
https://github.com/tensorflow/tensorflow/blob/r1.12/tensorflow/python/ops/metrics_impl.py#L392
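
For what it's worth, a tiny sketch (TF 1.x) of that cast in action: the streaming mean comes back as a float even when the values fed in are int32.

import tensorflow as tf

x = tf.placeholder(tf.int32)
mean, update_op = tf.metrics.mean(x)  # values are cast to float32 internally

with tf.Session() as sess:
    sess.run(tf.local_variables_initializer())
    sess.run(update_op, {x: [1, 2]})
    print(sess.run(mean))  # 1.5 -- a float, even though the input is int32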


reedwm commented Jan 4, 2019

Even with the cast in tf.metrics.mean, you pass loss=a, which is an int. In any case, a self-contained example would make this easier to debug.

@sharan-amutharasu

Beware that tf.metrics.accuracy is computed over an entire session run, not per batch. I just kicked it out because I did not get the utility of it. Instead, I'm doing the same thing as you with reduce_mean to have batch accuracy.

It can be helpful when there is a lot of oscillation in batch accuracy. The aggregated accuracy will be a lot smoother, so it's easier to observe learning from its values.

@MingleiLI

Why is the value returned by tf.metrics.accuracy always 0?
