call_back_error when using multi_gpu_model #10218

Closed
guiyuliu opened this issue May 17, 2018 · 3 comments

@guiyuliu

guiyuliu commented May 17, 2018

When I use multi_gpu_model, training runs fine through the first epoch,
but when it moves on to the 2nd epoch, the callback fails with errors from save_model and deepcopy.
Could anyone help me?
FYI: when I don't use multi_gpu_model, the error does not occur.

Keras: 2.1.0
TensorFlow: 1.4.0
Ubuntu: 16.04

My callback setup and training call are:

checkpoint = ModelCheckpoint(weight_path,
                             monitor='val_acc',
                             verbose=1,
                             save_best_only=True, mode='max')
callbacks_list = [checkpoint]

parallel_model.fit_generator(nturgbd_train_datagen(augment),
                             steps_per_epoch=num_training_samples/batch_size+1,
                             epochs=epochs,
                             verbose=1,
                             callbacks=callbacks_list,
                             validation_data=nturgbd_test_datagen(),
                             validation_steps=samples_per_validation/batch_size+1,
                             )

Here is the error:

Epoch 00001: val_acc improved from -inf to 0.16544, saving model to weights/rot_lstm/cs/rot100/001_0.165.hdf5
Traceback (most recent call last):
  File "VA_train.py", line 460, in <module>
    train()
  File "VA_train.py", line 406, in train
    validation_steps=samples_per_validation/batch_size+1,
  File "/home/lgy/anaconda2/envs/pycharm_tf/lib/python2.7/site-packages/keras/legacy/interfaces.py", line 87, in wrapper
    return func(*args, **kwargs)
  File "/home/lgy/anaconda2/envs/pycharm_tf/lib/python2.7/site-packages/keras/engine/training.py", line 2136, in fit_generator
    callbacks.on_epoch_end(epoch, epoch_logs)
  File "/home/lgy/anaconda2/envs/pycharm_tf/lib/python2.7/site-packages/keras/callbacks.py", line 73, in on_epoch_end
    callback.on_epoch_end(epoch, logs)
  File "/home/lgy/anaconda2/envs/pycharm_tf/lib/python2.7/site-packages/keras/callbacks.py", line 414, in on_epoch_end
    self.model.save(filepath, overwrite=True)
  File "/home/lgy/anaconda2/envs/pycharm_tf/lib/python2.7/site-packages/keras/engine/topology.py", line 2556, in save
    save_model(self, filepath, overwrite, include_optimizer)
  File "/home/lgy/anaconda2/envs/pycharm_tf/lib/python2.7/site-packages/keras/models.py", line 108, in save_model
    'config': model.get_config()
  File "/home/lgy/anaconda2/envs/pycharm_tf/lib/python2.7/site-packages/keras/engine/topology.py", line 2397, in get_config
    return copy.deepcopy(config)
  File "/home/lgy/anaconda2/envs/pycharm_tf/lib/python2.7/copy.py", line 163, in deepcopy
    y = copier(x, memo)
  File "/home/lgy/anaconda2/envs/pycharm_tf/lib/python2.7/copy.py", line 257, in _deepcopy_dict
    y[deepcopy(key, memo)] = deepcopy(value, memo)
  File "/home/lgy/anaconda2/envs/pycharm_tf/lib/python2.7/copy.py", line 163, in deepcopy
    y = copier(x, memo)
  File "/home/lgy/anaconda2/envs/pycharm_tf/lib/python2.7/copy.py", line 230, in _deepcopy_list
    y.append(deepcopy(a, memo))
  File "/home/lgy/anaconda2/envs/pycharm_tf/lib/python2.7/copy.py", line 163, in deepcopy
    y = copier(x, memo)
  File "/home/lgy/anaconda2/envs/pycharm_tf/lib/python2.7/copy.py", line 257, in _deepcopy_dict
    y[deepcopy(key, memo)] = deepcopy(value, memo)
  File "/home/lgy/anaconda2/envs/pycharm_tf/lib/python2.7/copy.py", line 163, in deepcopy
    y = copier(x, memo)
  File "/home/lgy/anaconda2/envs/pycharm_tf/lib/python2.7/copy.py", line 257, in _deepcopy_dict
    y[deepcopy(key, memo)] = deepcopy(value, memo)
  File "/home/lgy/anaconda2/envs/pycharm_tf/lib/python2.7/copy.py", line 163, in deepcopy
    y = copier(x, memo)
  File "/home/lgy/anaconda2/envs/pycharm_tf/lib/python2.7/copy.py", line 237, in _deepcopy_tuple
    y.append(deepcopy(a, memo))
  File "/home/lgy/anaconda2/envs/pycharm_tf/lib/python2.7/copy.py", line 163, in deepcopy
    y = copier(x, memo)
  File "/home/lgy/anaconda2/envs/pycharm_tf/lib/python2.7/copy.py", line 237, in _deepcopy_tuple
    y.append(deepcopy(a, memo))
  File "/home/lgy/anaconda2/envs/pycharm_tf/lib/python2.7/copy.py", line 190, in deepcopy
    y = _reconstruct(x, rv, 1, memo)
  File "/home/lgy/anaconda2/envs/pycharm_tf/lib/python2.7/copy.py", line 334, in _reconstruct
    state = deepcopy(state, memo)
  File "/home/lgy/anaconda2/envs/pycharm_tf/lib/python2.7/copy.py", line 163, in deepcopy
    y = copier(x, memo)
  File "/home/lgy/anaconda2/envs/pycharm_tf/lib/python2.7/copy.py", line 257, in _deepcopy_dict
    y[deepcopy(key, memo)] = deepcopy(value, memo)
  File "/home/lgy/anaconda2/envs/pycharm_tf/lib/python2.7/copy.py", line 190, in deepcopy
    y = _reconstruct(x, rv, 1, memo)
  File "/home/lgy/anaconda2/envs/pycharm_tf/lib/python2.7/copy.py", line 334, in _reconstruct
    state = deepcopy(state, memo)
  File "/home/lgy/anaconda2/envs/pycharm_tf/lib/python2.7/copy.py", line 163, in deepcopy
    y = copier(x, memo)
  File "/home/lgy/anaconda2/envs/pycharm_tf/lib/python2.7/copy.py", line 257, in _deepcopy_dict
    y[deepcopy(key, memo)] = deepcopy(value, memo)
  File "/home/lgy/anaconda2/envs/pycharm_tf/lib/python2.7/copy.py", line 163, in deepcopy
    y = copier(x, memo)
  File "/home/lgy/anaconda2/envs/pycharm_tf/lib/python2.7/copy.py", line 257, in _deepcopy_dict
    y[deepcopy(key, memo)] = deepcopy(value, memo)
  File "/home/lgy/anaconda2/envs/pycharm_tf/lib/python2.7/copy.py", line 182, in deepcopy
    rv = reductor(2)
TypeError: can't pickle NotImplementedType objects
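
The traceback shows `get_config()` calling `copy.deepcopy` on a nested config, with the failure coming from a single leaf object deep inside it. The failure mode can be sketched without Keras, as below. The `config` layout and the use of `threading.Lock` are assumptions for illustration only: the object in the traceback is `NotImplemented` under Python 2.7, and a lock is used here because it reliably refuses to be deep-copied on current Python 3.

```python
import copy
import threading

# Hypothetical stand-in for a model config: nested dicts, lists and tuples,
# similar in shape to what deepcopy walks through in the traceback.
# The lock stands in for whatever uncopyable object sat inside the real config.
config = {
    "layers": [
        {"name": "layer_1", "config": {"some_arg": (0, threading.Lock())}},
    ]
}

plain_config = {"layers": [{"name": "layer_1", "config": {"units": 8}}]}


def try_deepcopy(obj):
    """Return None if deepcopy succeeds, else the TypeError message."""
    try:
        copy.deepcopy(obj)
    except TypeError as exc:
        return str(exc)
    return None


error = try_deepcopy(config)
clean_error = try_deepcopy(plain_config)
print("config with uncopyable leaf:", error)
print("plain config:", clean_error)
```

One uncopyable value anywhere in the nested structure is enough to make the whole copy, and therefore the whole model save, fail.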

@birolkuyumcu

https://keras.io/utils/#multi_gpu_model

The docs say to save the model via the template model (which shares the same weights).

Try this:

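from keras.callbacks import ModelCheckpoint

class ParallelModelCheckpoint(ModelCheckpoint):
    """ModelCheckpoint that saves the template model instead of the parallel one."""
    def __init__(self, model, filepath, monitor='val_loss', verbose=0,
                 save_best_only=False, save_weights_only=False,
                 mode='auto', period=1):
        # Keep a reference to the template model that was passed to multi_gpu_model.
        self.single_model = model
        super(ParallelModelCheckpoint, self).__init__(
            filepath, monitor, verbose, save_best_only,
            save_weights_only, mode, period)

    def set_model(self, model):
        # Ignore the parallel model handed over by fit and checkpoint the template.
        super(ParallelModelCheckpoint, self).set_model(self.single_model)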

Use it like this:
check_point = ParallelModelCheckpoint(single_model ,'best.hd5')
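
To show why overriding `set_model` is enough, here is a minimal sketch of the redirect that runs without Keras or a GPU. Every class in it (`FakeParallelModel`, `FakeTemplateModel`, `FakeCheckpoint`, `FakeParallelCheckpoint`) and the `run_one_epoch` helper are hypothetical stand-ins, not Keras APIs. They only mimic the two calls that matter here: the training loop handing its model to the callback, and the callback saving at the end of an epoch.

```python
class FakeParallelModel:
    """Stand-in for the multi-GPU wrapper: a full save fails."""

    def save(self, filepath, overwrite=True):
        raise TypeError("cannot serialize parallel model config")


class FakeTemplateModel:
    """Stand-in for the single-device template model that owns the weights."""

    def __init__(self):
        self.saved_to = []

    def save(self, filepath, overwrite=True):
        self.saved_to.append(filepath)


class FakeCheckpoint:
    """Stand-in checkpoint callback: saves whichever model it was given."""

    def __init__(self, filepath):
        self.filepath = filepath
        self.model = None

    def set_model(self, model):
        self.model = model

    def on_epoch_end(self, epoch, logs=None):
        self.model.save(self.filepath.format(epoch=epoch + 1), overwrite=True)


class FakeParallelCheckpoint(FakeCheckpoint):
    """Same callback, but always bound to the template model."""

    def __init__(self, template_model, filepath):
        super().__init__(filepath)
        self.template_model = template_model

    def set_model(self, model):
        # Discard the model supplied by the training loop; use the template.
        super().set_model(self.template_model)


def run_one_epoch(fitted_model, callback):
    """Mimic the training loop: register the model, then end one epoch."""
    callback.set_model(fitted_model)
    callback.on_epoch_end(0)


template = FakeTemplateModel()
parallel = FakeParallelModel()

try:
    run_one_epoch(parallel, FakeCheckpoint("weights_{epoch:03d}.hdf5"))
    plain_failed = False
except TypeError:
    plain_failed = True

run_one_epoch(parallel, FakeParallelCheckpoint(template, "weights_{epoch:03d}.hdf5"))
print(plain_failed, template.saved_to)
```

The plain callback fails because it tries to save the parallel wrapper, while the redirected one writes the template model. In real code the template would be the model that was passed into `multi_gpu_model`.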

@guiyuliu
Author

@birolkuyumcu Thank you! Actually, the problem was solved when I set save_weights_only=True.

@birolkuyumcu

birolkuyumcu commented May 17, 2018

Did you try loading the saved weights?
I don't think that will work.
