
Custom loss function y_true y_pred shape mismatch #4781

Closed
RishabGargeya opened this issue Dec 21, 2016 · 13 comments

@RishabGargeya

Hello,

I am trying to create a custom loss function in Keras, where the target values for my network and the output of my network are of different shapes. Here is the custom loss function I have defined:

import theano.tensor as T

def custom_loss(y_true, y_pred):
    sml = T.nnet.sigmoid(-y_pred)
    s1ml = T.nnet.sigmoid(1.0 - y_pred)
    a = sml
    b = s1ml - sml
    c = 1.0 - s1ml
    p = T.stack((a, b, c), axis=1)
    part1 = T.log(p + 1.0e-20)  # symbolic log (np.log would break the Theano graph)
    part2 = y_true * part1
    cost = -(part2).sum()
    return cost

y_pred is of shape (batch_size, 1) and y_true is of shape (batch_size,3), and I aim to calculate a single error value using the above code. However, Keras gives me the following error:

ValueError: Input dimension mis-match. (input[0].shape[1] = 3, input[1].shape[1] = 1)
Apply node that caused the error: Elemwise{Composite{EQ(i0, RoundHalfAwayFromZero(i1))}}(dense_3_target, Elemwise{Add}[(0, 0)].0)
Toposort index: 83
Inputs types: [TensorType(float32, matrix), TensorType(float32, matrix)]
Inputs shapes: [(1001, 3), (1001, 1)]
Inputs strides: [(12, 4), (4, 4)]
Inputs values: ['not shown', 'not shown']
Outputs clients: [[Sum{acc_dtype=int64}(Elemwise{Composite{EQ(i0, RoundHalfAwayFromZero(i1))}}.0)]]

Does Keras not allow you to have different y_true and y_pred shapes? My cost function requires a singular output of my network and must calculate the cost against a y_true matrix of shape (batch_size,3).

Here is the output of model.summary():

____________________________________________________________________________________________________
Layer (type)                       Output Shape        Param #     Connected to                     
===================================================================================================
convolution2d_1 (Convolution2D)    (None, 30, 1, 591)  1830        convolution2d_input_1[0][0]      
____________________________________________________________________________________________________
maxpooling2d_1 (MaxPooling2D)      (None, 30, 1, 147)  0           convolution2d_1[0][0]            
____________________________________________________________________________________________________
convolution2d_2 (Convolution2D)    (None, 30, 1, 138)  9030        maxpooling2d_1[0][0]             
____________________________________________________________________________________________________
maxpooling2d_2 (MaxPooling2D)      (None, 30, 1, 34)   0           convolution2d_2[0][0]            
____________________________________________________________________________________________________
convolution2d_3 (Convolution2D)    (None, 30, 1, 25)   9030        maxpooling2d_2[0][0]             
____________________________________________________________________________________________________
maxpooling2d_3 (MaxPooling2D)      (None, 30, 1, 6)    0           convolution2d_3[0][0]            
____________________________________________________________________________________________________
flatten_1 (Flatten)                (None, 180)         0           maxpooling2d_3[0][0]             
____________________________________________________________________________________________________
dense_1 (Dense)                    (None, 20)          3620        flatten_1[0][0]                  
____________________________________________________________________________________________________
activation_1 (Activation)          (None, 20)          0           dense_1[0][0]                    
____________________________________________________________________________________________________
dense_2 (Dense)                    (None, 20)          420         activation_1[0][0]               
____________________________________________________________________________________________________
activation_2 (Activation)          (None, 20)          0           dense_2[0][0]                    
____________________________________________________________________________________________________
dense_3 (Dense)                    (None, 1)           21          activation_2[0][0]               
====================================================================================================
Total params: 23951

Thank you for the help!

@RishabGargeya RishabGargeya changed the title Custom objective function shape mismatch Custom loss function y_true y_pred shape mismatch Dec 21, 2016
@bstriner
Contributor

Short of hacking into Keras internals, easiest solution is to pad the output to match the shape of the target. Add a lambda layer to pad 0s or use RepeatVector.
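
For instance, a rough sketch of the padding route for the example in this thread (layer and variable names here are hypothetical, and the loss is rewritten with the Keras backend): repeat the single output so its shape matches the (batch_size, 3) target, then use only the first column inside the loss.

import keras.backend as K
from keras.layers import Input, Dense, RepeatVector, Flatten
from keras.models import Model

# ... build the network up to the single-unit output as before ...
features = Input((180,), name='features')     # hypothetical stand-in for the flattened conv features
score = Dense(1, name='score')(features)      # shape (batch_size, 1)
padded = Flatten()(RepeatVector(3)(score))    # shape (batch_size, 3), now matches y_true

def custom_loss(y_true, y_pred):
    s = y_pred[:, 0:1]                         # every column is a copy, so keep just one
    sml = K.sigmoid(-s)
    s1ml = K.sigmoid(1.0 - s)
    p = K.concatenate([sml, s1ml - sml, 1.0 - s1ml], axis=1)  # (batch_size, 3)
    return -K.sum(y_true * K.log(p + K.epsilon()))

model = Model(features, padded)
model.compile('adam', loss=custom_loss)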

Alternatively, add a dummy output that matches the target shape, so there are two outputs. Train with a dummy target and the real target so Keras doesn't complain about the shapes. You will need to get the tensors directly from within the loss function and ignore y_true and y_pred.

This is a common issue in Keras, but you can usually get around it with dummy outputs and targets.

Cheers,
Ben

@stale stale bot added the stale label May 23, 2017
@ssfrr

ssfrr commented May 31, 2017

Is there any plan to relax this restriction? It seems like when you're writing a custom loss function it's not uncommon that you're doing some complicated comparison, not just seeing how close your model output is to some target.

@stale stale bot removed the stale label May 31, 2017
@bstriner
Contributor

Not sure if there are any better solutions out there or any plans. It is easy enough to add custom losses by just adding them to the model. The problem is that if this ends up meaning you don't need any targets, there is nothing to pass for the outputs.

@ssfrr let's open a feature request for keras-contrib and continue the discussion there. Need some subclass of Model that supports dummy outputs. It can directly interpret an output tensor as a loss, in which case the corresponding target is not required. Shouldn't be too hard to put together.

If it looks good we can always try to push it back into keras.

Cheers

@bstriner
Contributor

bstriner commented Jun 1, 2017

After a little more reading, it looks like setting loss weight to None will drop the tensor. Did not know that was a feature.

Something like this might work, but I haven't tested it yet: set the loss weight to None, then separately add the loss to the model and also add it as a metric. It will then still be used as a loss, but it will not require a target. There is some skip_indices logic in training that I am still reading through.

@bstriner
Contributor

bstriner commented Jun 1, 2017

Wow! @ssfrr @RishabGargeya so this is a little weird architecturally and I didn't think it would work but try the below code. It trains a model where the inputs are x and y (not one-hot), and the targets are None.

@fchollet do you have any thoughts on how to approach this type of problem? In some situations, like sequence learning, you need your output sequence to also be an Input so you can use it in an RNN, and you don't want the redundancy of it being both an input and a target. I had been using dummy targets, but that still meant I had to pass zeros or something to train, which is kind of awkward. This is also the kind of thing you might do if you don't want to one-hot encode your targets.

I had no idea about how to skip outputs. Maybe need more examples or docs about that feature.

The below approach works for passing your target as an input but it is verbose and you have to add the losses and the metrics in the right order. If there isn't something significantly better, I can abstract it into a custom model.

import keras.backend as K
from keras.callbacks import CSVLogger
from keras.datasets import mnist
from keras.layers import Input, Lambda, Dense, Flatten, BatchNormalization, Activation
from keras.models import Model


def main():
    # Both inputs and targets are `Input` tensors
    input_x = Input((28, 28), name='input_x', dtype='uint8')  # uint8 [0-255]
    y_true = Input((1,), name='y_true', dtype='uint8')  # uint8 [0-9]
    # Build prediction network as usual
    h = Flatten()(input_x)
    h = Lambda(lambda _x: K.cast(_x, 'float32'),
               output_shape=lambda _x: _x,
               name='cast')(h)  # cast uint8 to float32
    h = BatchNormalization()(h)  # normalize pixels
    for i in range(3):  # hidden relu and batchnorm layers
        h = Dense(256)(h)
        h = BatchNormalization()(h)
        h = Activation('relu')(h)
    y_pred = Dense(10, activation='softmax', name='y_pred')(h)  # softmax output layer
    # Lambda layer performs loss calculation (negative log likelihood)
    def nll(tensors):
        _yt, _yp = tensors
        rows = K.reshape(K.arange(K.shape(_yt)[0]), (-1, 1))
        return -K.log(_yp[rows, _yt] + K.epsilon())

    loss = Lambda(nll,
                  output_shape=lambda shapes: shapes[0],
                  name='loss')([y_true, y_pred])

    # Model `inputs` are both x and y. `outputs` is the loss.
    model = Model(inputs=[input_x, y_true], outputs=[loss])
    # Manually add the loss to the model. Required because the loss_weight will be None.
    model.add_loss(K.sum(loss, axis=None))
    # Compile with the loss weight set to None, so it will be omitted
    model.compile('adam', loss=[None], loss_weights=[None])
    # Add accuracy to the metrics
    # Cannot add as a metric to compile, because metrics for skipped outputs are skipped
    accuracy = K.mean(K.equal(K.argmax(y_pred, axis=1), K.flatten(y_true)))
    model.metrics_names.append('accuracy')
    model.metrics_tensors.append(accuracy)
    # Model summary
    model.summary()

    # Train model
    train, test = mnist.load_data()
    cb = CSVLogger("mnist_training.csv")
    model.fit(list(train), [None], epochs=300, batch_size=64, callbacks=[cb], validation_data=(list(test), [None]))


if __name__ == "__main__":
    main()

Cheers

@bstriner
Contributor

bstriner commented Jun 1, 2017

For now I think the easiest option for anyone who doesn't want to play with internals is to use dummy targets with an identity loss (lambda _yt, _yp: _yp). Just pass anything as the target, as long as it is the right shape.
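
A rough sketch of that dummy-target pattern, with hypothetical names and an illustrative loss, just to show the shapes: the real target enters the graph as an Input, the model outputs a per-sample loss, and the compiled loss is the identity.

import numpy as np
import keras.backend as K
from keras.layers import Input, Dense, Lambda
from keras.models import Model

x_in = Input((180,), name='x')             # hypothetical feature input
y_in = Input((3,), name='y_as_input')      # the real target enters the graph as an input
score = Dense(1)(x_in)

def per_sample_loss(tensors):
    y_true, y_pred = tensors
    p = K.sigmoid(y_pred)                  # (batch, 1), broadcasts against (batch, 3)
    return -K.sum(y_true * K.log(p + K.epsilon()), axis=1, keepdims=True)

loss_out = Lambda(per_sample_loss,
                  output_shape=lambda shapes: (shapes[0][0], 1),
                  name='loss')([y_in, score])

model = Model([x_in, y_in], loss_out)
model.compile('adam', loss=lambda _yt, _yp: _yp)   # identity loss: Keras just averages the output

# the dummy target only has to match the output shape:
# model.fit([x_train, y_train], np.zeros((len(x_train), 1)), batch_size=64)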

@waleedka
Contributor

waleedka commented Jun 4, 2017

@bstriner Thanks! I've been looking for this as well and this saved me a lot of time.

@stale

stale bot commented Sep 2, 2017

This issue has been automatically marked as stale because it has not had recent activity. It will be closed after 30 days if no further activity occurs, but feel free to re-open a closed issue if needed.

@zach-nervana
Contributor

Is there any chance this will get supported in a more natural way? This is quite a hack.

@Pfaeff

Pfaeff commented May 14, 2018

I am trying to use a custom loss function that takes two tensors of different shapes and returns a single value. When compiling the model, I tell Keras to use the identity function as the loss function. The actual loss function is inside the model, which has two inputs: one for the data and one for the labels. It seems to work fine, but the model does not converge properly. I am guessing it has something to do with the model outputting a single scalar and Keras somehow trying to match that to my dummy label vector, which has batch_size entries. When I use TensorFlow to train my model directly, everything works and converges just fine. There must be something happening behind the scenes that messes with the gradient, and I can't figure out a solution.

@staghvaeeyan

@bstriner, any updates on this issue? Another use case is quantile regression, where multiple quantiles are predicted against a single y_true.
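
For reference, a sketch of the pinball loss that use case calls for (the quantile levels are made up); with stock Keras the single target still has to be tiled to the output shape before fit so the shape check passes, after which the loss broadcasts column-wise.

import numpy as np
import keras.backend as K

QUANTILES = K.constant([0.1, 0.5, 0.9])    # hypothetical quantile levels, one per output column

def pinball_loss(y_true, y_pred):
    # y_true: the single target repeated across columns, shape (batch, 3)
    # y_pred: one column per predicted quantile, shape (batch, 3)
    e = y_true - y_pred
    return K.mean(K.maximum(QUANTILES * e, (QUANTILES - 1.0) * e))

# before fitting: y_tiled = np.repeat(y.reshape(-1, 1), 3, axis=1)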

@gledsonmelotti

gledsonmelotti commented Feb 19, 2020

(quoting @bstriner's Jun 1, 2017 example in full; see above)

@bstriner Taking advantage of your idea in the algorithm above, I would like to know whether it is possible to nest one classifier inside another, for example, to use the last layer's features to train an SVM. I tried to do this with a custom cost function, but unfortunately I always get an error about the tensor format. Can you help me? The code is below.

import numpy as np
import tensorflow as tf
from keras import backend as K
from keras.layers import GlobalAveragePooling2D, Dense, Activation, Dropout
from keras.models import Model
from keras.regularizers import l2
from keras.optimizers import SGD
from keras.losses import categorical_hinge
from sklearn.svm import SVC
from sklearn.model_selection import GridSearchCV
from sklearn.preprocessing import StandardScaler

# Classification block (x, model_input and DROPOUT come from the convolutional base defined earlier)
x = GlobalAveragePooling2D()(x)
x = Dense(4096, kernel_regularizer=l2(1e-4), name='Dense_1')(x)
x = Activation('relu', name='relu1')(x)
x = Dropout(DROPOUT)(x)
x = Dense(4096, kernel_regularizer=l2(1e-4), name='Dense_2')(x)
x = Activation('relu', name='relu2')(x)
model_output = Dropout(DROPOUT)(x)
model = Model(model_input, model_output)
model.summary()

def custom_loss_value(y_true, y_pred):
    # Evaluate the symbolic tensors and fit a scikit-learn SVM on the resulting features
    X = K.eval(y_pred)
    Y = np.ravel(K.eval(y_true))
    X = StandardScaler().fit_transform(X)
    param_grid = {'C': [0.1, 1, 8, 10], 'gamma': [0.001, 0.01, 0.1, 1]}
    SVM = GridSearchCV(SVC(kernel='rbf', probability=True), cv=3, param_grid=param_grid,
                       scoring='roc_auc', verbose=1)
    SVM.fit(X, Y)
    Final_Model = SVM.best_estimator_
    Predict = Final_Model.predict(X)
    return categorical_hinge(tf.convert_to_tensor(Y, dtype=tf.float32),
                             tf.convert_to_tensor(Predict, dtype=tf.float32))

sgd = SGD(lr=0.001, decay=1e-6, momentum=0.9, nesterov=True)
model.compile(loss=custom_loss_value, optimizer=sgd, metrics=['accuracy'])
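
One differentiable alternative to fitting an sklearn SVM inside the loss (a sketch only, assuming a NUM_CLASSES constant and the model_input / model_output tensors above): put a linear layer with L2 weight decay on top of the features and train it with categorical_hinge, which behaves like a multiclass linear SVM and keeps the whole graph trainable.

from keras.layers import Dense
from keras.models import Model
from keras.optimizers import SGD
from keras.regularizers import l2

# model_output holds the 4096-d features from the dropout layer above; NUM_CLASSES is assumed
svm_scores = Dense(NUM_CLASSES, activation='linear', kernel_regularizer=l2(1e-4),
                   name='svm_head')(model_output)
svm_model = Model(model_input, svm_scores)
svm_model.compile(loss='categorical_hinge',
                  optimizer=SGD(lr=0.001, decay=1e-6, momentum=0.9, nesterov=True),
                  metrics=['accuracy'])
# labels must be one-hot encoded for categorical_hinge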
