call_back_error when using multi_gpu_model  #10218

Closed
@guiyuliu

When I use multi_gpu_model, training completes the first epoch, but going into the 2nd epoch it fails with errors about save_model and deepcopy raised from the callbacks.
Could anyone help me?
FYI: the error does not occur when I don't use multi_gpu_model.

Keras: 2.1.0
TensorFlow: 1.4.0
Ubuntu: 16.04

My callback setup is:

from keras.callbacks import ModelCheckpoint

# Save the full model whenever validation accuracy improves.
checkpoint = ModelCheckpoint(weight_path,
                             monitor='val_acc',
                             verbose=1,
                             save_best_only=True,
                             mode='max')
callbacks_list = [checkpoint]

# `//` keeps the step counts integral (same result as `/` on ints in Python 2.7).
parallel_model.fit_generator(nturgbd_train_datagen(augment),
                             steps_per_epoch=num_training_samples // batch_size + 1,
                             epochs=epochs,
                             verbose=1,
                             callbacks=callbacks_list,
                             validation_data=nturgbd_test_datagen(),
                             validation_steps=samples_per_validation // batch_size + 1)
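
As a side note on the step counts above: `num_training_samples/batch_size+1` adds one extra step even when the sample count divides evenly by the batch size. A minimal sketch of a ceiling-based alternative follows; the helper name `steps_for` and the sample sizes are made up for illustration and are not from the original script.

```python
import math


def steps_for(num_samples, batch_size):
    """Number of batches needed to cover num_samples exactly once."""
    # float() forces true division on Python 2 as well as Python 3.
    return int(math.ceil(num_samples / float(batch_size)))


# Hypothetical sizes, for illustration only.
print(steps_for(1000, 32))  # 31 full batches plus 1 partial batch
print(steps_for(1024, 32))  # divides evenly, so no extra step is added
```

Whether the extra step matters depends on how the generator wraps around at the end of the data, which the thread does not show.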

Here is the error:

Epoch 00001: val_acc improved from -inf to 0.16544, saving model to weights/rot_lstm/cs/rot100/001_0.165.hdf5
Traceback (most recent call last):
  File "VA_train.py", line 460, in <module>
    train()
  File "VA_train.py", line 406, in train
    validation_steps=samples_per_validation/batch_size+1,
  File "/home/lgy/anaconda2/envs/pycharm_tf/lib/python2.7/site-packages/keras/legacy/interfaces.py", line 87, in wrapper
    return func(*args, **kwargs)
  File "/home/lgy/anaconda2/envs/pycharm_tf/lib/python2.7/site-packages/keras/engine/training.py", line 2136, in fit_generator
    callbacks.on_epoch_end(epoch, epoch_logs)
  File "/home/lgy/anaconda2/envs/pycharm_tf/lib/python2.7/site-packages/keras/callbacks.py", line 73, in on_epoch_end
    callback.on_epoch_end(epoch, logs)
  File "/home/lgy/anaconda2/envs/pycharm_tf/lib/python2.7/site-packages/keras/callbacks.py", line 414, in on_epoch_end
    self.model.save(filepath, overwrite=True)
  File "/home/lgy/anaconda2/envs/pycharm_tf/lib/python2.7/site-packages/keras/engine/topology.py", line 2556, in save
    save_model(self, filepath, overwrite, include_optimizer)
  File "/home/lgy/anaconda2/envs/pycharm_tf/lib/python2.7/site-packages/keras/models.py", line 108, in save_model
    'config': model.get_config()
  File "/home/lgy/anaconda2/envs/pycharm_tf/lib/python2.7/site-packages/keras/engine/topology.py", line 2397, in get_config
    return copy.deepcopy(config)
  File "/home/lgy/anaconda2/envs/pycharm_tf/lib/python2.7/copy.py", line 163, in deepcopy
    y = copier(x, memo)
  File "/home/lgy/anaconda2/envs/pycharm_tf/lib/python2.7/copy.py", line 257, in _deepcopy_dict
    y[deepcopy(key, memo)] = deepcopy(value, memo)
  File "/home/lgy/anaconda2/envs/pycharm_tf/lib/python2.7/copy.py", line 163, in deepcopy
    y = copier(x, memo)
  File "/home/lgy/anaconda2/envs/pycharm_tf/lib/python2.7/copy.py", line 230, in _deepcopy_list
    y.append(deepcopy(a, memo))
  File "/home/lgy/anaconda2/envs/pycharm_tf/lib/python2.7/copy.py", line 163, in deepcopy
    y = copier(x, memo)
  File "/home/lgy/anaconda2/envs/pycharm_tf/lib/python2.7/copy.py", line 257, in _deepcopy_dict
    y[deepcopy(key, memo)] = deepcopy(value, memo)
  File "/home/lgy/anaconda2/envs/pycharm_tf/lib/python2.7/copy.py", line 163, in deepcopy
    y = copier(x, memo)
  File "/home/lgy/anaconda2/envs/pycharm_tf/lib/python2.7/copy.py", line 257, in _deepcopy_dict
    y[deepcopy(key, memo)] = deepcopy(value, memo)
  File "/home/lgy/anaconda2/envs/pycharm_tf/lib/python2.7/copy.py", line 163, in deepcopy
    y = copier(x, memo)
  File "/home/lgy/anaconda2/envs/pycharm_tf/lib/python2.7/copy.py", line 237, in _deepcopy_tuple
    y.append(deepcopy(a, memo))
  File "/home/lgy/anaconda2/envs/pycharm_tf/lib/python2.7/copy.py", line 163, in deepcopy
    y = copier(x, memo)
  File "/home/lgy/anaconda2/envs/pycharm_tf/lib/python2.7/copy.py", line 237, in _deepcopy_tuple
    y.append(deepcopy(a, memo))
  File "/home/lgy/anaconda2/envs/pycharm_tf/lib/python2.7/copy.py", line 190, in deepcopy
    y = _reconstruct(x, rv, 1, memo)
  File "/home/lgy/anaconda2/envs/pycharm_tf/lib/python2.7/copy.py", line 334, in _reconstruct
    state = deepcopy(state, memo)
  File "/home/lgy/anaconda2/envs/pycharm_tf/lib/python2.7/copy.py", line 163, in deepcopy
    y = copier(x, memo)
  File "/home/lgy/anaconda2/envs/pycharm_tf/lib/python2.7/copy.py", line 257, in _deepcopy_dict
    y[deepcopy(key, memo)] = deepcopy(value, memo)
  File "/home/lgy/anaconda2/envs/pycharm_tf/lib/python2.7/copy.py", line 190, in deepcopy
    y = _reconstruct(x, rv, 1, memo)
  File "/home/lgy/anaconda2/envs/pycharm_tf/lib/python2.7/copy.py", line 334, in _reconstruct
    state = deepcopy(state, memo)
  File "/home/lgy/anaconda2/envs/pycharm_tf/lib/python2.7/copy.py", line 163, in deepcopy
    y = copier(x, memo)
  File "/home/lgy/anaconda2/envs/pycharm_tf/lib/python2.7/copy.py", line 257, in _deepcopy_dict
    y[deepcopy(key, memo)] = deepcopy(value, memo)
  File "/home/lgy/anaconda2/envs/pycharm_tf/lib/python2.7/copy.py", line 163, in deepcopy
    y = copier(x, memo)
  File "/home/lgy/anaconda2/envs/pycharm_tf/lib/python2.7/copy.py", line 257, in _deepcopy_dict
    y[deepcopy(key, memo)] = deepcopy(value, memo)
  File "/home/lgy/anaconda2/envs/pycharm_tf/lib/python2.7/copy.py", line 182, in deepcopy
    rv = reductor(2)
TypeError: can't pickle NotImplementedType objects

birolkuyumcu commented on May 17, 2018

See https://keras.io/utils/#multi_gpu_model:

Save the model via the template model (which shares the same weights).

Try this:

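from keras.callbacks import ModelCheckpoint

class ParallelModelCheckpoint(ModelCheckpoint):
    """ModelCheckpoint that saves the template model, not the multi-GPU wrapper."""

    def __init__(self, model, filepath, monitor='val_loss', verbose=0,
                 save_best_only=False, save_weights_only=False,
                 mode='auto', period=1):
        # Keep a reference to the template model that was passed to multi_gpu_model.
        self.single_model = model
        super(ParallelModelCheckpoint, self).__init__(
            filepath, monitor, verbose, save_best_only,
            save_weights_only, mode, period)

    def set_model(self, model):
        # Ignore the parallel model handed in during fit; checkpoint the template.
        super(ParallelModelCheckpoint, self).set_model(self.single_model)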

Use it like this:
check_point = ParallelModelCheckpoint(single_model, 'best.hd5')
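
To make the mechanism concrete: during training, the fit loop hands its own model (here, the parallel one) to every callback through `set_model`, and the subclass swaps in the template model instead. The sketch below reproduces only that hand-off with plain Python stand-ins, so it runs without Keras or GPUs. All `Stub*` classes are hypothetical simplifications, not the real Keras classes.

```python
class StubModel(object):
    """Hypothetical stand-in for a Keras model; save() just reports what happened."""

    def __init__(self, name):
        self.name = name

    def save(self, filepath):
        return "%s saved to %s" % (self.name, filepath)


class StubCheckpoint(object):
    """Hypothetical, heavily simplified stand-in for ModelCheckpoint."""

    def __init__(self, filepath):
        self.filepath = filepath
        self.model = None

    def set_model(self, model):
        self.model = model

    def on_epoch_end(self, epoch):
        # The real callback saves self.model at this point.
        return self.model.save(self.filepath)


class StubParallelCheckpoint(StubCheckpoint):
    """Same override pattern as ParallelModelCheckpoint in the comment above."""

    def __init__(self, single_model, filepath):
        self.single_model = single_model
        super(StubParallelCheckpoint, self).__init__(filepath)

    def set_model(self, model):
        # Discard the model passed by the training loop; keep the template.
        super(StubParallelCheckpoint, self).set_model(self.single_model)


template = StubModel("template")
parallel = StubModel("parallel")

cb = StubParallelCheckpoint(template, "best.hd5")
cb.set_model(parallel)      # what the training loop would do
print(cb.on_epoch_end(0))   # the template model is the one that gets saved
```

The design choice is that the callback never calls `save` on the multi-GPU wrapper at all, which is the call that fails in the traceback above.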

guiyuliu (Author) commented on May 17, 2018

@birolkuyumcu Thank you! Actually, the problem was solved when I set save_weights_only=True.

birolkuyumcu commented on May 17, 2018

Did you try loading the saved weights?
I think it doesn't work.

Participants: @fchollet, @birolkuyumcu, @guiyuliu

call_back_error when using multi_gpu_model · Issue #10218 · keras-team/keras