[BUG] Some Errors : using multi_gpu : can't save_model
Closed
Description
Using multi_gpu_model to train a model:

```python
import numpy as np
import keras
from keras.utils import multi_gpu_model
from keras.callbacks import ModelCheckpoint  # imported but unused in this snippet

def multi_gpu_test_simple_model():
    print('####### test simple model')
    num_samples = 1000
    input_dim = 10
    output_dim = 1
    hidden_dim = 10
    gpus = 2
    epochs = 8

    # Build a small template model.
    model = keras.models.Sequential()
    model.add(keras.layers.Dense(hidden_dim, input_shape=(input_dim,)))
    model.add(keras.layers.Dense(output_dim))

    # Random training data.
    x = np.random.random((num_samples, input_dim))
    y = np.random.random((num_samples, output_dim))

    # Replicate the model on multiple GPUs and train the replica.
    parallel_model = multi_gpu_model(model, gpus=gpus)
    parallel_model.compile(loss='mse', optimizer='rmsprop')
    parallel_model.fit(x, y, epochs=epochs)

    # Saving the multi-GPU model fails here.
    from keras.models import save_model
    save_model(parallel_model, '1.h5', overwrite=True, include_optimizer=True)
```
Error: can't pickle NotImplementedType objects
Please solve this problem.
Activity
gabrielleyr commented on Nov 1, 2017
I had the same error when saving the whole model. Try saving the weights only:
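A minimal sketch of that approach (the filename is illustrative; parallel_model is the wrapper from the original post):

```python
# Save only the weights of the multi-GPU model instead of the full model.
parallel_model.save_weights('weights.h5', overwrite=True)
```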
fchollet commented on Nov 1, 2017
kuba-lilz commented on Nov 2, 2017
@fchollet I tried doing exactly that. I created a simple custom version of ModelCheckpoint that saves the original model on epoch end. It breaks with a
TypeError: can't pickle module objects
error.
kuba-lilz commented on Nov 2, 2017
Here's minimal code (with parts irrelevant to the problem omitted) that I used to save the base model while training a multi_gpu_model:
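A rough sketch of such a callback (illustrative names and paths, not the exact code from this comment):

```python
import keras

class SaveBaseModel(keras.callbacks.Callback):
    """Save the original (single-GPU) template model after every epoch."""

    def __init__(self, base_model, path):
        super(SaveBaseModel, self).__init__()
        self.base_model = base_model  # the model that was passed to multi_gpu_model
        self.path = path

    def on_epoch_end(self, epoch, logs=None):
        # Weights are shared with the parallel replica, so saving the
        # template model's weights is enough.
        self.base_model.save_weights(self.path.format(epoch=epoch))

# usage: parallel_model.fit(x, y, epochs=epochs,
#                           callbacks=[SaveBaseModel(model, 'weights_{epoch:02d}.h5')])
```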
When I use model.save_weights(...) in the callback, as in the code above, the model can be saved successfully. When using just model.save(...), the code breaks with the error from the previous post. However, even when using model.save_weights(...), loading the model with model.load_weights(...) fails.
My TensorFlow version is 1.3.0 and Keras version is 2.0.8.
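A possible reason (an assumption; the actual failure is not quoted here): weights written from the parallel wrapper follow the wrapper's layer topology, so they load cleanly only into the same wrapped structure, along the lines of:

```python
# Assumption: the weights file was written via parallel_model.save_weights(...).
# Rebuild the same multi-GPU wrapper before loading, so the layer topology matches.
model = build_model()                           # hypothetical: recreates the original architecture
parallel_model = multi_gpu_model(model, gpus=2)
parallel_model.load_weights('weights.h5')
```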
kuba-lilz commented on Nov 2, 2017
Looking at the keras.callbacks.ModelCheckpoint class, it uses the self.model object in on_epoch_end(...) and calls its save(...) and save_weights(...) methods, as appropriate. I don't see how it obtains the self.model object, though. Please correct me if I'm wrong, but maybe some magic is going on under the hood of the save(...)/save_weights(...) calls that dynamically assigns meaning to the model object based on TensorFlow graph variables. Maybe model.save_weights() is not looking at the model object at all - maybe it's fetching leaf tensors from the TensorFlow graph? In that case, even if I do provide a valid CPU-built model object, what gets saved is whatever the magic inside save_weights() found, which might be the multi_gpu_model tensors.
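For what it's worth, fit() hands the model to each callback via set_model(), which is what populates self.model; in the multi-GPU case that will be the parallel replica. A commonly used workaround (a sketch, not an official Keras API) is to subclass ModelCheckpoint and pin it to the template model:

```python
from keras.callbacks import ModelCheckpoint

class TemplateModelCheckpoint(ModelCheckpoint):
    """ModelCheckpoint that always checkpoints the original template model,
    even when it is attached to the multi-GPU replica during fit()."""

    def __init__(self, template_model, filepath, **kwargs):
        super(TemplateModelCheckpoint, self).__init__(filepath, **kwargs)
        self.template_model = template_model

    def set_model(self, model):
        # Ignore the parallel model that fit() passes in; checkpoint the template instead.
        super(TemplateModelCheckpoint, self).set_model(self.template_model)

# usage (illustrative):
# checkpoint = TemplateModelCheckpoint(model, 'model_{epoch:02d}.h5', save_weights_only=True)
# parallel_model.fit(x, y, epochs=epochs, callbacks=[checkpoint])
```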
PBehr commented on Nov 4, 2017
As mentioned in #8123, the problem is the import tensorflow as tf line inside the multi_gpu_model function.
jmcconnell commented on Nov 8, 2017
I've followed fchollet's advice about saving the model and weights separately. Wound up with the approach sketched below, which is working fine. You'll have to update original_model() to pull the correct layer for your architecture.
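A sketch of that kind of helper (illustrative, not the exact code that was posted; the layer name depends on your architecture):

```python
def original_model(parallel_model):
    # The template model is wrapped as a single layer inside the multi-GPU model.
    # The layer name depends on your architecture, e.g. 'sequential_1' or 'model_1'.
    return parallel_model.get_layer('sequential_1')

# After (or during) training, save the template model rather than the parallel one.
base = original_model(parallel_model)
base.save('model.h5')              # architecture + weights
base.save_weights('weights.h5')    # weights only
```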
shu-hai commented on Nov 12, 2017
@jmcconnell, I use the functional API to create the model. When I use your code below, it gives this error:
in get_layer ValueError: No such layer: sequential_1
I am frustrated about saving the model/weights after each epoch when using multi_gpu_model. It always gives some kind of error.
shu-hai commented on Nov 12, 2017
@fchollet, would you please give an example of how to save the model/weights after each epoch when using multi_gpu_model?
I always get some errors.
EPellegrini87 commented on Nov 13, 2017
@PBehr's suggestion worked for me.
jmcconnell commented on Nov 14, 2017
@shu-hai yeah, I mentioned you'll have to update original_model() to work with your architecture. I'm sure there is a more generalized way of grabbing the correct layer, but I am new to Keras and didn't have time to look into it.
Run parallel_model.summary() to see what the layer for your original model is called.