
Wrong result when computing accuracy using tf.metrics.accuracy #15115

Closed
secsilm opened this issue Dec 5, 2017 · 12 comments



secsilm commented Dec 5, 2017

System information

  • OS Platform and Distribution (e.g., Linux Ubuntu 16.04): Windows 10 1709
  • TensorFlow installed from (source or binary): pip
  • TensorFlow version (use command below): 1.4.0
  • Python version: Python 3.5.2 :: Anaconda custom (64-bit)

Describe the problem

I found that the result returned by tf.metrics.accuracy is incorrect when training my model. To verify this, I wrote a simple program.

import tensorflow as tf

sess = tf.Session()
labels = tf.placeholder(tf.int32)
predictions = tf.placeholder(tf.int32)
acc, _ = tf.metrics.accuracy(labels, predictions)
my_acc = tf.reduce_mean(tf.cast(tf.equal(labels, predictions), tf.float32))

feed_dict = {
    labels: [1, 2, 3, 4, 5], 
    predictions: [1, 2, 3, 4, 5]
}
sess.run(tf.global_variables_initializer())
sess.run(tf.local_variables_initializer())

sess.run(acc, feed_dict)  # 0.0
sess.run(my_acc, feed_dict)  # 1.0

You can see that acc and my_acc are different, and acc is wrong. I double-checked the docs and am still confused. Is there anything I missed? Thank you.


secsilm commented Dec 5, 2017

But the result in evaluation mode is correct, while it is wrong in train mode. I'm using tf.estimator and tf.data.TFRecordDataset. The relevant code is below.

def cifar_model_fn(features, labels, mode):
    # some other codes

    logits = tf.layers.dense(inputs=dropout, units=10)
    predictions = {
        'classes': tf.argmax(input=logits, axis=1, name='classes'),
        'probabilities': tf.nn.softmax(logits, name='softmax_tensor')
    }
    onehot_labels = tf.one_hot(indices=tf.cast(labels, tf.int32), depth=10)
    loss = tf.losses.softmax_cross_entropy(onehot_labels, logits)
    accuracy, update_op = tf.metrics.accuracy(labels=labels, predictions=predictions['classes'], name='accuracy')
    # my way to compute accuracy, which gives correct result when training
    my_acc = tf.reduce_mean(tf.cast(tf.equal(tf.cast(labels, tf.int64), predictions['classes']), tf.float32))

    if mode == tf.estimator.ModeKeys.TRAIN:
        tensors_to_log = {
            'Accuracy': accuracy,
            'My accuracy': my_acc}
        logging_hook = tf.train.LoggingTensorHook(tensors=tensors_to_log, every_n_iter=100)
        optimizer = tf.train.AdamOptimizer(learning_rate=FLAGS.learning_rate)
        train_op = optimizer.minimize(loss=loss, global_step=tf.train.get_global_step())
        return tf.estimator.EstimatorSpec(mode=mode, loss=loss, train_op=train_op, training_hooks=[logging_hook])

    eval_metric_ops = {
        'accuracy': (accuracy, update_op)
    }
    return tf.estimator.EstimatorSpec(mode=mode, loss=loss, eval_metric_ops=eval_metric_ops)

def main():
    # create dataset ....

    cifar10_classifier = tf.estimator.Estimator(model_fn=cifar_model_fn, model_dir=FLAGS.model_dir)
    cifar10_classifier.train(input_fn=train_input_fn)
    eval_results = cifar10_classifier.evaluate(input_fn=eval_input_fn)

My goal is to log the training accuracy during training. The following is part of the console output; Accuracy is always 0.0.

INFO:tensorflow:loss = 1.08232, step = 7301 (9.312 sec)
INFO:tensorflow:My accuracy = 0.59375, Accuracy = 0.0 (9.311 sec)
INFO:tensorflow:global_step/sec: 10.8054
INFO:tensorflow:loss = 1.29713, step = 7401 (9.257 sec)
INFO:tensorflow:My accuracy = 0.484375, Accuracy = 0.0 (9.256 sec)
INFO:tensorflow:global_step/sec: 11.4928
INFO:tensorflow:loss = 1.31355, step = 7501 (8.693 sec)
INFO:tensorflow:My accuracy = 0.578125, Accuracy = 0.0 (8.693 sec)
INFO:tensorflow:global_step/sec: 12.1267
INFO:tensorflow:loss = 1.25855, step = 7601 (8.252 sec)
INFO:tensorflow:My accuracy = 0.515625, Accuracy = 0.0 (8.254 sec)
INFO:tensorflow:global_step/sec: 12.0489
INFO:tensorflow:loss = 1.21857, step = 7701 (8.303 sec)
INFO:tensorflow:My accuracy = 0.5625, Accuracy = 0.0 (8.304 sec)
INFO:tensorflow:global_step/sec: 12.1971
INFO:tensorflow:loss = 0.983289, step = 7801 (8.196 sec)
INFO:tensorflow:My accuracy = 0.6875, Accuracy = 0.0 (8.195 sec)


jmaye commented Dec 5, 2017

Beware that tf.metrics.accuracy is computed over an entire session run, not per batch. I just kicked it out because I did not get the utility of it. Instead, I'm doing the same thing as you with reduce_mean to have batch accuracy.


reedwm commented Dec 5, 2017

As @jmaye said, tf.metrics.accuracy is not meant to compute the accuracy of a single batch. It returns both the accuracy and an update_op, and update_op is intended to be run every batch, which updates the accuracy. See #9498 for more discussion on this.
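
For example, taking the minimal snippet from the first post and also running the update op (a rough sketch against TF 1.x graph mode):

import tensorflow as tf

labels = tf.placeholder(tf.int32)
predictions = tf.placeholder(tf.int32)
# Keep the update op instead of discarding it.
acc, acc_update_op = tf.metrics.accuracy(labels, predictions)

feed_dict = {labels: [1, 2, 3, 4, 5], predictions: [1, 2, 3, 4, 5]}

sess = tf.Session()
sess.run(tf.local_variables_initializer())  # initializes the metric's total/count local variables

sess.run(acc, feed_dict)            # 0.0 -- nothing has been accumulated yet
sess.run(acc_update_op, feed_dict)  # 1.0 -- accumulates this batch and returns the updated accuracy
sess.run(acc, feed_dict)            # 1.0 -- now reflects the accumulated batch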

reedwm closed this as completed Dec 5, 2017

secsilm commented Dec 6, 2017

@reedwm @jmaye Thank you for your replies. I read #9498 and understand that update_op is called every batch. But I still have one question: when is the accuracy computed?

The doc says:

The accuracy function creates two local variables, total and count that are used to compute the frequency with which predictions matches labels. This frequency is ultimately returned as accuracy: an idempotent operation that simply divides total by count.

but it doesn't say when the accuracy is actually computed, i.e. when total is divided by count.

In my code above, the accuracy is computed when evaluation finishes. But is it supposed to compute the accuracy only when the entire training process finishes?


reedwm commented Dec 6, 2017

accuracy is recomputed every time it is evaluated.

How tf.metrics.accuracy works in general is that it maintains two variables, total and count, each of which starts at 0. Whenever accuracy is evaluated, it returns total / count (or 0 if count is 0), and does not modify total or count. accuracy also does not look at labels or predictions. It just does a division.

When update_op is evaluated, it uses labels and predictions to increase total and count. It increases total by the number of predictions that match the labels, and increases count by the total number of predictions. The idea is that you run update_op for every new batch of labels and predictions you get. Then, whenever you want to check the accuracy of all the batches you've seen so far, you evaluate accuracy, which will hold the current accuracy over those batches. accuracy will change only when you run update_op, since only running update_op modifies total and count.

In your Estimator's case, update_op is never run during training, only during evaluation. The Estimator will automatically call the update_op every batch when you return the EstimatorSpec with eval_metric_ops during evaluation, but it does not do so during training.
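
To make this concrete, here is a small sketch (TF 1.x graph mode, with made-up batch values) of the behavior described above: accuracy only changes when update_op runs.

import tensorflow as tf

labels = tf.placeholder(tf.int32)
predictions = tf.placeholder(tf.int32)
accuracy, update_op = tf.metrics.accuracy(labels, predictions)

with tf.Session() as sess:
    sess.run(tf.local_variables_initializer())  # total = 0, count = 0

    # Batch 1: 2 of 4 predictions are correct -> total = 2, count = 4
    sess.run(update_op, {labels: [0, 1, 2, 3], predictions: [0, 1, 0, 0]})
    print(sess.run(accuracy))  # 0.5

    # Evaluating accuracy again without running update_op changes nothing.
    print(sess.run(accuracy))  # still 0.5

    # Batch 2: 4 of 4 predictions are correct -> total = 6, count = 8
    sess.run(update_op, {labels: [4, 5, 6, 7], predictions: [4, 5, 6, 7]})
    print(sess.run(accuracy))  # 0.75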


secsilm commented Dec 7, 2017

@reedwm Thank you for your detailed explanation. I got the idea.


dchatterjee172 commented Dec 20, 2018

@reedwm @jmaye
if mode == tf.estimator.ModeKeys.EVAL:
    a = tf.random.uniform(dtype=tf.int32, maxval=1000, shape=[])
    eval_metric_ops = {
        "eval_mean_bert_loss": tf.metrics.mean(total_loss_bert),
        "eval_mean_original_loss": tf.metrics.mean(total_loss_original),
        "eval_mean_loss": tf.metrics.mean(a)}
    output_spec = tf.estimator.EstimatorSpec(
        mode=mode,
        loss=a,
        eval_metric_ops=eval_metric_ops)

eval_spec = tf.estimator.EvalSpec(input_fn=eval_input_fn, steps=100, start_delay_secs=0, throttle_secs=120)

Output:
INFO:tensorflow:Saving dict for global step 2000: eval_mean_bert_loss = 4.9399266, eval_mean_loss = 483.38, eval_mean_original_loss = 4.982164, global_step = 2000, loss = 483.38
How can loss and eval_mean_loss be exactly the same? The batch size is 1 here. Shouldn't loss be the value of a at the 100th batch, and eval_mean_loss be the mean of all 100 values of a?


reedwm commented Jan 3, 2019

@dchatterjee172, can you provide a self-contained, relatively short example I can run to reproduce? It does seem like loss should be different from eval_mean_loss. I am also unsure how loss can be a non-integer, since you passed an int32 value for loss.


dchatterjee172 commented Jan 4, 2019

@reedwm
tf.metrics.mean will cast the values argument to float.
https://github.com/tensorflow/tensorflow/blob/r1.12/tensorflow/python/ops/metrics_impl.py#L392
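
For what it's worth, a tiny sketch (TF 1.x) of that cast in action: the streaming mean comes back as a float even when the values fed in are int32.

import tensorflow as tf

x = tf.placeholder(tf.int32)
mean, update_op = tf.metrics.mean(x)  # values are cast to float32 internally

with tf.Session() as sess:
    sess.run(tf.local_variables_initializer())
    sess.run(update_op, {x: [1, 2]})
    print(sess.run(mean))  # 1.5 -- a float, even though the input is int32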


reedwm commented Jan 4, 2019

Even with the cast in tf.metrics.mean, you pass loss=a, which is an int. In any case, a self-contained example would make this easier to debug.

@sharan-amutharasu

Beware that tf.metrics.accuracy is computed over an entire session run, not per batch. I just kicked it out because I did not get the utility of it. Instead, I'm doing the same thing as you with reduce_mean to have batch accuracy.

It can be helpful when there is a lot of oscillation in batch accuracy. The aggregated accuracy will be a lot smoother, so it's easier to observe learning from its values.

@MingleiLI

Why is the value returned by tf.metrics.accuracy always 0?
