Closed
Description
Hey guys,
I was wondering how the initial internal states in a recurrent layer are dealt with? So far it appears they are reset at every run. Is there any way to preserve them?
I'd like to be able to feed a `.predict_proba()` function data one time step at a time for a time series task, as the points come in, without also feeding the entire history all over again. Is this somehow possible? Thanks
Activity
ssamot commented on May 31, 2015
Adding on top of the comment above - it should be possible to train incrementally as well, if you can keep state. Otherwise one would possibly need to remember really large sequences and replay them if training online.
fchollet commented on May 31, 2015
I agree that we need some form of stateful RNNs; that would be particularly useful for sequence generation. It's unclear yet whether that should be set up as a mode of existing RNNs, or as different layers altogether.
Anybody interested in looking at it?
ssamot commented on May 31, 2015
Well - my personal preference would be to use the same layer - something along the lines of `stateless=True` by default. You would only need to somehow preserve `h_tm1`, `c_tm1` from calls between train - right?
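Something like this toy sketch of the idea (purely illustrative - `ToyRNN` and everything in it is made up here, not Keras code):

```python
import numpy as np

class ToyRNN:
    """Toy illustration of a `stateless` flag - not Keras code."""
    def __init__(self, dim, stateless=True):
        self.stateless = stateless
        self.h_tm1 = np.zeros(dim)      # persisted hidden state

    def step(self, x):
        h = np.tanh(x + self.h_tm1)     # stand-in for the real recurrent update
        if not self.stateless:
            self.h_tm1 = h              # preserve state between calls
        return h
```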
ssamot commented on May 31, 2015
OK - I'll try doing an implementation/test with simple character generation and use the old state to init `theano.scan`. Let's see...
ssamot commented on Jun 1, 2015
This seems to require a possibly much deeper re-write than I anticipated initially - whatever `theano.scan` uses as initial input seems to be compiled once and stays there - one might need to save state in shared variables somehow? Any ideas?

lemuriandezapada commented on Jun 1, 2015
Is there no way to read or set the internal states by hand?
I wouldn't really touch the training procedure, but the running/prediction procedure needs to have some sort of persistent mode.
ssamot commented on Jun 1, 2015
The reading you can do, but not the setting - AFAIK the scan operation is symbolic and the internal state is re-initialised after every pass. I cannot see how the state could be set manually in an easy manner.
lemuriandezapada commented on Jun 1, 2015
That's a bummer. This implies reading out the weights and reimplementing the whole net in a home-made, slower, numpy manner.
vzhong commented on Jun 2, 2015
Can't you set the initial internal states through a shared variable? For example here's the Theano example of the recurrence for a vanilla RNN:
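(A minimal runnable sketch of that recurrence - dimensions and weight names are illustrative - with the initial hidden state kept in a shared variable:)

```python
import numpy as np
import theano
import theano.tensor as T

n_in, n_hidden = 8, 16
floatX = theano.config.floatX

# h0 plays the role of self.h0 in the layer code:
# the initial hidden state lives in a shared variable
h0 = theano.shared(np.zeros(n_hidden, dtype=floatX), name='h0')
W_in = theano.shared(0.01 * np.random.randn(n_in, n_hidden).astype(floatX))
W_rec = theano.shared(0.01 * np.random.randn(n_hidden, n_hidden).astype(floatX))

x = T.matrix('x')  # one sequence: (timesteps, n_in)

def step(x_t, h_tm1):
    # one vanilla-RNN step: compute the new hidden state
    return T.tanh(T.dot(x_t, W_in) + T.dot(h_tm1, W_rec))

# scan unrolls the recurrence, starting every call from h0
h, _ = theano.scan(fn=step, sequences=x, outputs_info=h0)
```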
In this case can't you change the initial hidden state by setting `self.h0`?

ssamot commented on Jun 2, 2015
Yes of course - but `h` is symbolic, right? You can do something like `self.h0 = shared_zeros(shape=(1, self.output_dim))` to create the shared variable - but how do you then set `h0 = h`? `h` is the symbolic output of the scan, so you can't do a plain assignment. (updated for clarity)
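(For what it's worth, the standard Theano route around this is to perform the assignment through the `updates` argument when compiling the function - a minimal sketch, reusing `x`, `h`, and `h0` from the snippet above:)

```python
# compile a step function that, on every call, copies the last hidden
# state produced by the scan back into the shared variable h0
f = theano.function(inputs=[x], outputs=h,
                    updates=[(h0, h[-1])])
```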
fchollet commented on Jun 8, 2015
Anything new on this front? I will try to repro Karpathy's RNN experiments in Keras, and add everything that's needed in the process.
ssamot commented on Jun 11, 2015
Just an addition - I am currently struggling to come up with a good API for this. If you are to do state properly, apart from keeping state, you will need a mask of some sort that tells you when to keep the current activation unchanged and possibly ignore padded elements.
Thus you would need to change `get_output(self, train)` to something like `get_output(self, train, mask_train, mask_run)`, where each mask would be a 3d tensor with 0-1 values for each element - the first saying whether to train on it, the second whether to keep the hidden activations or not. This would change the overall internal API - does it make sense?

fchollet commented on Jun 11, 2015
If you need a mask, why not make it an attribute of the layer? Then there would be no need to change the overarching API. But to be honest I am not sure I see what you are describing - could you provide more details?
ssamot commented on Jun 11, 2015
You cannot make it an attribute of the layer unless there is another way to get at the batch you are sending. Imagine a scenario where you have to read 10 characters, then output a single character, keep the state, and output another character after receiving three more characters.
If you don't have a mask you would have to use a padded 3d tensor - not that efficient. Does this make more sense now?
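(A toy numpy sketch of the gating idea - names and shapes here are purely illustrative, not an existing Keras API:)

```python
import numpy as np

def masked_step(h_tm1, h_candidate, m_t):
    # m_t == 1: take the new activation; m_t == 0: hold the old state
    return m_t * h_candidate + (1.0 - m_t) * h_tm1

h_tm1 = np.zeros(4, dtype='float32')
h_new = np.ones(4, dtype='float32')
print(masked_step(h_tm1, h_new, 0.0))  # padded step -> state unchanged
print(masked_step(h_tm1, h_new, 1.0))  # real step   -> state updated
```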
jonilaserson commented on Jun 11, 2015
A. I think it would help if you write what use-cases of stateful RNNs you would like to be able to model.

B. I'm not sure I see the problem in the example you've given. Why can't you output a 'null' character every time you don't have a character to output, and treat it as a character-to-character RNN?

C. Again, I'm not sure if that was the issue you were facing, but if it was about making predictions on a batch, then here is a thought: maybe it is ok to allow only one input sequence, instead of a batch, when doing prediction (feedforward) on stateful RNNs? The training can still be in batch because you can provide the mask in advance.