Closed
Description
Hi there,
I am trying to implement a classification problem with three classes: 0, 1 and 2. I would like to fine-tune my cost function so that misclassification is weighted somehow. In particular, predicting 1 instead of 2 should incur twice the cost of predicting 0. Written as a table, it should look something like this:
Costs:

             Predicted
           0    | 1    | 2
         --------------------
Actual 0 | 0    | 0.25 | 0.25
Actual 1 | 0.25 | 0    | 0.5
Actual 2 | 0.25 | 0.5  | 0
I really like the Keras framework; it would be nice if this could be implemented without having to dig into TensorFlow or Theano code.
Thanks
Activity
ayalalazaro commented on Mar 29, 2016
Sorry, the table has lost its format, I am sending an image:
[image: the cost matrix rendered as a table]
carlthome commented on Mar 29, 2016
Similar: #2121
tboquet commented on Mar 29, 2016
You could use class_weight.
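For reference, `class_weight` is passed to `model.fit` and scales each sample's loss by the class of its *true* label only. A minimal sketch with a hypothetical toy model and random data (names and shapes are illustrative, not from this thread):

```python
# Sketch: class_weight scales each sample's loss by its TRUE class only,
# so it cannot express costs that depend on the *predicted* class.
import numpy as np
from tensorflow import keras

model = keras.Sequential([
    keras.Input(shape=(4,)),
    keras.layers.Dense(16, activation="relu"),
    keras.layers.Dense(3, activation="softmax"),
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")

x = np.random.rand(64, 4).astype("float32")
y = np.random.randint(0, 3, size=(64,))

# Errors on samples whose true class is 2 count double.
history = model.fit(x, y, epochs=1, verbose=0,
                    class_weight={0: 1.0, 1: 1.0, 2: 2.0})
```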
ayalalazaro commented on Mar 29, 2016
class_weight applies a weight to all data belonging to a class; the weight should instead depend on the misclassification.
tboquet commented on Mar 30, 2016
You are absolutely right, I'm sorry I misunderstood your question. I will try to come back with something tomorrow using partial to define the weights. What you want to achieve should be doable with the Keras abstract backend.
tboquet commented on Mar 31, 2016
Ok so I had the time to quickly test it.
This is a fully reproducible example on MNIST where we put a higher cost when a 1 is misclassified as a 7 and when a 7 is misclassified as a 1.
So if you want to pass constants included in the cost function, just build a new function with partial.
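The original code block is not preserved above. A sketch of the idea (using plain TF ops rather than the Theano-era backend calls, with an illustrative 3-class cost matrix of my own choosing) could look like this:

```python
# Sketch of the partial-based approach: scale the usual categorical
# crossentropy by a factor looked up in a cost matrix, where the row is
# the true class and the column is the (hard) predicted class.
from functools import partial

import numpy as np
import tensorflow as tf

def w_categorical_crossentropy(y_true, y_pred, weights):
    nb_cl = weights.shape[0]
    final_mask = tf.zeros_like(y_pred[:, 0])
    # One-hot mask of the arg-max prediction.
    y_pred_max = tf.reduce_max(y_pred, axis=1, keepdims=True)
    y_pred_max_mat = tf.cast(tf.equal(y_pred, y_pred_max), tf.float32)
    for c_p in range(nb_cl):      # predicted class
        for c_t in range(nb_cl):  # true class
            final_mask += (float(weights[c_t, c_p])
                           * y_pred_max_mat[:, c_p]
                           * y_true[:, c_t])
    return tf.keras.losses.categorical_crossentropy(y_true, y_pred) * final_mask

w_array = np.ones((3, 3))
w_array[1, 2] = 1.5  # e.g. predicting 2 when the truth is 1 costs 50% more
w_array[2, 1] = 1.5

# partial bakes the constants in, leaving the (y_true, y_pred) signature
# Keras expects; the result can be passed to model.compile(loss=ncce).
ncce = partial(w_categorical_crossentropy, weights=w_array)
ncce.__name__ = 'w_categorical_crossentropy'
```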
ayalalazaro commented on Apr 1, 2016
Wow, that's nice. Thanks for the detailed answer!
tboquet commented on Apr 1, 2016
Try to test it on a toy example to verify that it actually works. If it's what you are looking for, feel free to close the issue!
Keras 1.0 will provide a more flexible way to introduce new objectives and metrics.
ayalalazaro commented on Apr 2, 2016
Well, I am stuck, I can't make it run in my model; it says:
This is the model I am using:
tboquet commented on Apr 4, 2016
Sure, sorry, I was using Theano functionalities. I replaced the following line in my previous example:
It should do the trick!
ayalalazaro commented on Apr 5, 2016
Sounds like the way to go; I was using TensorFlow as the backend. I'll tell you if it works as soon as possible. Thanks!
ayalalazaro commented on Apr 5, 2016
I still get an error:
I've tried your first reply under the Theano backend and it works, though.
tboquet commented on Apr 5, 2016
Ok, I was not sure about how K.shape would behave with TensorFlow. It seems you should use:
ayalalazaro commented on Apr 6, 2016
I get more or less the same:
It seems like it cannot get the shape of y_pred as an integer, right?
tboquet commented on Apr 6, 2016
Mm, ok, I will take a look at it today and work directly with tensors to try to find a way to have it work properly for both backends.
[83 remaining comments collapsed]
william-allen-harris commented on Sep 13, 2020
Hello, does anyone know how to do this for sparse categorical crossentropy?
hiyamgh commented on Feb 7, 2021
Hello, thank you for this awesome thread. I have a small question though: I am trying to implement this solution in tensorflow rather than in keras backend. Are these weights applied to the logits (the output of the last layer of the neural network, which are raw values that we did NOT apply softmax to) or to the probabilities (AFTER softmax)?
In other words, is K.categorical_cross_entropy the equivalent of tf.nn.softmax_cross_entropy_with_logits or not?
Thank you in advance.
hiyamgh commented on Feb 7, 2021
I guess I found the answer. I have seen the documentation of tf.backend.categorical_cross_entropy and it states the following:
I will just do it on the logits then?
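For what it's worth, my reading of the docs (please verify): the crossentropy expects probabilities by default and exposes a from_logits flag; with from_logits=True it matches tf.nn.softmax_cross_entropy_with_logits. A quick check via the equivalent tf.keras.losses function, with arbitrary example values:

```python
# Checking the equivalence: categorical crossentropy with from_logits=True
# should match tf.nn.softmax_cross_entropy_with_logits; the default
# (from_logits=False) expects probabilities, i.e. post-softmax values.
import tensorflow as tf

y_true = tf.constant([[0., 1., 0.]])
logits = tf.constant([[1.0, 2.0, 0.5]])

a = tf.keras.losses.categorical_crossentropy(y_true, logits, from_logits=True)
b = tf.nn.softmax_cross_entropy_with_logits(labels=y_true, logits=logits)
c = tf.keras.losses.categorical_crossentropy(y_true, tf.nn.softmax(logits))
```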
TejashwiniDuluri commented on Feb 10, 2021
Yeah, I have the same question on this.
isaranto commented on Apr 8, 2021
With the tf.keras implementation I would propose a more vectorized approach (avoiding the for loop):
You can modify the above to fit your needs, but it worked for me with an example weight matrix. It is useful when you want to achieve cost-sensitive learning where certain mispredictions are more/less important than others.
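The original code block is not preserved above; a sketch of what such a vectorized weighted loss can look like (function and variable names are mine, not necessarily the original's):

```python
# Vectorized sketch: the per-sample weight weights[true, pred] is picked out
# with one matmul plus a one-hot mask, instead of a double Python loop.
import numpy as np
import tensorflow as tf

def weighted_categorical_crossentropy(weights):
    w = tf.constant(weights, dtype=tf.float32)
    num_classes = w.shape[0]

    def loss(y_true, y_pred):
        # One-hot of the hard (arg-max) prediction.
        pred_one_hot = tf.one_hot(tf.argmax(y_pred, axis=-1), num_classes)
        # Row selected by the true one-hot, column by the hard prediction.
        sample_w = tf.reduce_sum(tf.matmul(y_true, w) * pred_one_hot, axis=-1)
        return sample_w * tf.keras.losses.categorical_crossentropy(y_true, y_pred)

    return loss
```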
rafaelagrc commented on Apr 12, 2021
Hello. How can we implement this for Sparse Categorical Cross Entropy?
jsbryaniv commented on Nov 17, 2021
I would also like to know how to implement this for SparseCategoricalCrossEntropy
RachelRamirez commented on Feb 14, 2023
Has anyone verified the code above works? If so, can they share a minimal working example? @isaranto, have you verified that your vectorized version of the original method works on the MNIST network example as given?

I put a high weight on the misclassification that naturally seems to be highest when running the dense neural network given by @tboquet, and the results are not intuitive: the number of misclassifications does not decrease. I compared the confusion matrices for w[7,9] = 1, 1.1, 1.2, 1.5, 1.7, 2, ... 10, ... 100, and one would expect the number of misclassifications on [7,9] to decrease as the weight increases, but there doesn't seem to be a consistent pattern. If anything, in 7 out of the 30 times I ran it, the misclassification for [7,9] increased dramatically (e.g. from 20 to 386). So I tried negative numbers, and that did have the immediate effect of decreasing the misclassification rates. However, using negative numbers isn't consistent with any of the above discussion.
Here's the code I've used; it's long, so I posted a link to my public Google Colab notebook: https://github.com/RachelRamirez/misclassification_matrix/blob/main/w%5B7%2C9%5D%3D100_Misclassification_Cost_Matrix_Example.ipynb
This is the output of one of the worst confusion matrices (run 14) using w[7,9]=100. It seems like it's rewarding the misclassification instead of penalizing it.
CM:
isaranto commented on Feb 14, 2023
Hey @RachelRamirez, it has been 2 years since I wrote that comment, so I don't remember it all that well.
The thing is that it works, meaning that training is done right; what is not trivial at all is choosing the weight values you are going to put in there. I think playing around with some high and low weights and checking what happens to the confusion matrix and your metrics is the way to go.
RachelRamirez commented on Feb 14, 2023
Thank you for the quick reply. I have played with lots of weights, and all of the numbers seem to reward misclassifications until I use a negative weight, which isn't consistent with the comments above. I wish I could comment on a more recent thread, but this seems to be the only issue that addresses misclassification costs, and it is continually referenced in all the other Keras threads.
PhilAlton commented on Feb 14, 2023
Hi @RachelRamirez - I verified this class https://stackoverflow.com/a/61963004 extensively at the time... Hopefully it provides a starting point for your specific problem?
RachelRamirez commented on Feb 15, 2023
@PhilAlton Thanks! I verified that your process works in line with how I expected it to, using the MNIST example! If I raise the cost of a misclassification, the resulting costly misclassification goes down. I still wish Keras would make this an easier feature to implement.
PhilAlton commented on Feb 16, 2023
Yep @RachelRamirez - I remember this being a real pain at the time! Tbh, we might be massively overcomplicating this... Loss functions do take a "sample_weights" argument, but it's not well documented (imo). It wasn't 100% clear to me whether this was equivalent to class weights, plus I only discovered it when I had my own implementation working...
eliadl commented on Feb 16, 2023
@PhilAlton Loss functions support a sample_weights argument only in their __call__ method, but not in __init__. (example)
That's basically why we needed this #2115 (comment).
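To illustrate the point with a minimal sketch (the tensors and weights below are arbitrary examples): per-sample weights are accepted when the loss object is called, not when it is constructed.

```python
# sample_weight is a call-time argument of Keras loss objects; the
# constructor does not accept it, so per-sample weights cannot be
# baked in at compile time this way.
import tensorflow as tf

cce = tf.keras.losses.CategoricalCrossentropy()

y_true = tf.constant([[0., 1., 0.], [0., 0., 1.]])
y_pred = tf.constant([[0.1, 0.8, 0.1], [0.2, 0.3, 0.5]])

# Accepted here, at call time:
weighted = cce(y_true, y_pred, sample_weight=tf.constant([1.0, 2.0]))

# But this would raise a TypeError:
# tf.keras.losses.CategoricalCrossentropy(sample_weight=[1.0, 2.0])
```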
PhilAlton commented on Feb 16, 2023
@eliadl - ah yes, it's all coming back to me now! @RachelRamirez - if you were sufficiently motivated, you could raise a pull request to get this included... Not something I've done before! (Imbalanced problems are very common, though accessing this via __call__ is clearly TF/Keras' preferred approach, e.g. https://keras.io/examples/structured_data/imbalanced_classification/ - though it's not intuitive that the weights should be passed through model.fit.)