Closed
Description
When I use multi_gpu_model, training runs through the first epoch,
but before the 2nd epoch starts, the checkpoint callback fails with errors from save_model and deepcopy.
Could anyone help me?
FYI: when I don't use multi_gpu_model, the error does not occur.
Keras: 2.1.0
TensorFlow: 1.4.0
Ubuntu: 16.04
My callback setup and training call are:
checkpoint = ModelCheckpoint(weight_path,
                             monitor='val_acc',
                             verbose=1,
                             save_best_only=True, mode='max')
callbacks_list = [checkpoint]
parallel_model.fit_generator(nturgbd_train_datagen(augment),
                             steps_per_epoch=num_training_samples/batch_size+1,
                             epochs=epochs,
                             verbose=1,
                             callbacks=callbacks_list,
                             validation_data=nturgbd_test_datagen(),
                             validation_steps=samples_per_validation/batch_size+1,
                             )
Here is the error:
Epoch 00001: val_acc improved from -inf to 0.16544, saving model to weights/rot_lstm/cs/rot100/001_0.165.hdf5
Traceback (most recent call last):
File "VA_train.py", line 460, in <module>
train()
File "VA_train.py", line 406, in train
validation_steps=samples_per_validation/batch_size+1,
File "/home/lgy/anaconda2/envs/pycharm_tf/lib/python2.7/site-packages/keras/legacy/interfaces.py", line 87, in wrapper
return func(*args, **kwargs)
File "/home/lgy/anaconda2/envs/pycharm_tf/lib/python2.7/site-packages/keras/engine/training.py", line 2136, in fit_generator
callbacks.on_epoch_end(epoch, epoch_logs)
File "/home/lgy/anaconda2/envs/pycharm_tf/lib/python2.7/site-packages/keras/callbacks.py", line 73, in on_epoch_end
callback.on_epoch_end(epoch, logs)
File "/home/lgy/anaconda2/envs/pycharm_tf/lib/python2.7/site-packages/keras/callbacks.py", line 414, in on_epoch_end
self.model.save(filepath, overwrite=True)
File "/home/lgy/anaconda2/envs/pycharm_tf/lib/python2.7/site-packages/keras/engine/topology.py", line 2556, in save
save_model(self, filepath, overwrite, include_optimizer)
File "/home/lgy/anaconda2/envs/pycharm_tf/lib/python2.7/site-packages/keras/models.py", line 108, in save_model
'config': model.get_config()
File "/home/lgy/anaconda2/envs/pycharm_tf/lib/python2.7/site-packages/keras/engine/topology.py", line 2397, in get_config
return copy.deepcopy(config)
File "/home/lgy/anaconda2/envs/pycharm_tf/lib/python2.7/copy.py", line 163, in deepcopy
y = copier(x, memo)
File "/home/lgy/anaconda2/envs/pycharm_tf/lib/python2.7/copy.py", line 257, in _deepcopy_dict
y[deepcopy(key, memo)] = deepcopy(value, memo)
File "/home/lgy/anaconda2/envs/pycharm_tf/lib/python2.7/copy.py", line 163, in deepcopy
y = copier(x, memo)
File "/home/lgy/anaconda2/envs/pycharm_tf/lib/python2.7/copy.py", line 230, in _deepcopy_list
y.append(deepcopy(a, memo))
File "/home/lgy/anaconda2/envs/pycharm_tf/lib/python2.7/copy.py", line 163, in deepcopy
y = copier(x, memo)
File "/home/lgy/anaconda2/envs/pycharm_tf/lib/python2.7/copy.py", line 257, in _deepcopy_dict
y[deepcopy(key, memo)] = deepcopy(value, memo)
File "/home/lgy/anaconda2/envs/pycharm_tf/lib/python2.7/copy.py", line 163, in deepcopy
y = copier(x, memo)
File "/home/lgy/anaconda2/envs/pycharm_tf/lib/python2.7/copy.py", line 257, in _deepcopy_dict
y[deepcopy(key, memo)] = deepcopy(value, memo)
File "/home/lgy/anaconda2/envs/pycharm_tf/lib/python2.7/copy.py", line 163, in deepcopy
y = copier(x, memo)
File "/home/lgy/anaconda2/envs/pycharm_tf/lib/python2.7/copy.py", line 237, in _deepcopy_tuple
y.append(deepcopy(a, memo))
File "/home/lgy/anaconda2/envs/pycharm_tf/lib/python2.7/copy.py", line 163, in deepcopy
y = copier(x, memo)
File "/home/lgy/anaconda2/envs/pycharm_tf/lib/python2.7/copy.py", line 237, in _deepcopy_tuple
y.append(deepcopy(a, memo))
File "/home/lgy/anaconda2/envs/pycharm_tf/lib/python2.7/copy.py", line 190, in deepcopy
y = _reconstruct(x, rv, 1, memo)
File "/home/lgy/anaconda2/envs/pycharm_tf/lib/python2.7/copy.py", line 334, in _reconstruct
state = deepcopy(state, memo)
File "/home/lgy/anaconda2/envs/pycharm_tf/lib/python2.7/copy.py", line 163, in deepcopy
y = copier(x, memo)
File "/home/lgy/anaconda2/envs/pycharm_tf/lib/python2.7/copy.py", line 257, in _deepcopy_dict
y[deepcopy(key, memo)] = deepcopy(value, memo)
File "/home/lgy/anaconda2/envs/pycharm_tf/lib/python2.7/copy.py", line 190, in deepcopy
y = _reconstruct(x, rv, 1, memo)
File "/home/lgy/anaconda2/envs/pycharm_tf/lib/python2.7/copy.py", line 334, in _reconstruct
state = deepcopy(state, memo)
File "/home/lgy/anaconda2/envs/pycharm_tf/lib/python2.7/copy.py", line 163, in deepcopy
y = copier(x, memo)
File "/home/lgy/anaconda2/envs/pycharm_tf/lib/python2.7/copy.py", line 257, in _deepcopy_dict
y[deepcopy(key, memo)] = deepcopy(value, memo)
File "/home/lgy/anaconda2/envs/pycharm_tf/lib/python2.7/copy.py", line 163, in deepcopy
y = copier(x, memo)
File "/home/lgy/anaconda2/envs/pycharm_tf/lib/python2.7/copy.py", line 257, in _deepcopy_dict
y[deepcopy(key, memo)] = deepcopy(value, memo)
File "/home/lgy/anaconda2/envs/pycharm_tf/lib/python2.7/copy.py", line 182, in deepcopy
rv = reductor(2)
TypeError: can't pickle NotImplementedType objects
Activity
birolkuyumcu commented on May 17, 2018
https://keras.io/utils/#multi_gpu_model
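Try this: use a custom checkpoint callback that saves the original single-GPU model rather than the parallel one.
Use it like this:
check_point = ParallelModelCheckpoint(single_model, 'best.hd5')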
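ParallelModelCheckpoint is not a Keras class, and its definition does not appear in this thread. The following is a minimal sketch of what such a callback could look like, assuming it subclasses ModelCheckpoint and overrides set_model so that the single-GPU template is saved instead of the multi_gpu_model wrapper. The class body is an assumption, not the commenter's original code.

```python
from keras.callbacks import ModelCheckpoint


class ParallelModelCheckpoint(ModelCheckpoint):
    """Checkpoint callback that saves the single-GPU template model.

    Assumption: one possible definition; the thread does not show
    the original class.
    """

    def __init__(self, single_model, filepath, **kwargs):
        # Remaining keyword arguments (monitor, save_best_only, mode, ...)
        # are passed through to ModelCheckpoint unchanged.
        super(ParallelModelCheckpoint, self).__init__(filepath, **kwargs)
        self.single_model = single_model

    def set_model(self, model):
        # Keras calls set_model with the model being trained (here the
        # multi_gpu_model wrapper). Ignore it and attach the template,
        # so that on_epoch_end saves the template instead.
        super(ParallelModelCheckpoint, self).set_model(self.single_model)
```

Here single_model would be the model that was passed to multi_gpu_model. The Keras documentation linked above recommends saving through that template model, and this sketch only automates that step inside the callback.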
guiyuliu commented on May 17, 2018
@birolkuyumcu Thank you! Actually, the problem was solved when I set save_weights_only=True.
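For reference, the change described here can be sketched as follows, reusing the arguments from the original question. With save_weights_only=True, ModelCheckpoint calls save_weights() instead of save(), so the get_config() and deepcopy step shown in the traceback is not reached. The helper name make_weights_checkpoint is hypothetical and only exists to keep the example self-contained.

```python
from keras.callbacks import ModelCheckpoint


def make_weights_checkpoint(weight_path):
    """Build the checkpoint from the question, saving weights only."""
    return ModelCheckpoint(weight_path,
                           monitor='val_acc',
                           verbose=1,
                           save_best_only=True,
                           save_weights_only=True,  # save_weights() instead of save()
                           mode='max')
```

It would replace the original callback as callbacks_list = [make_weights_checkpoint(weight_path)]. The resulting file holds weights only, so the architecture has to be rebuilt in code before the weights can be loaded.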
birolkuyumcu commented on May 17, 2018
Did you try loading the saved weights?
I don't think it works.