Closed
Description
Hey guys,
I was wondering how the initial internal states in a recurrent layer are dealt with? So far it appears they are reset at every run. Is there any way to preserve them?
I'd like to be able to feed a `.predict_proba()` function data one time step at a time for a time series task, as the points come in, without also feeding the entire history all over again. Is this somehow possible? Thanks
Activity
ssamot commented on May 31, 2015
Adding on top of the comment above - it should be possible to train incrementally as well, if you can keep state. Otherwise one would possibly need to remember really large sequences and replay them if training online.
fchollet commented on May 31, 2015
I agree that we need some form of stateful RNNs; that would be particularly useful for sequence generation. It's unclear yet whether that should be set up as a mode of existing RNNs, or as different layers altogether.
Anybody interested in looking at it?
ssamot commented on May 31, 2015
Well - my personal preference would be to use the same layer - something along the lines of `stateless=True` by default. You would only need to somehow preserve `h_tm1`, `c_tm1` from calls between train - right?
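Something like this toy sketch of the idea (purely illustrative - `ToyRNN` and everything in it is made up here, not Keras code):

```python
import numpy as np

class ToyRNN:
    """Toy illustration of a `stateless` flag - not Keras code."""
    def __init__(self, dim, stateless=True):
        self.stateless = stateless
        self.h_tm1 = np.zeros(dim)      # persisted hidden state

    def step(self, x):
        h = np.tanh(x + self.h_tm1)     # stand-in for the real recurrent update
        if not self.stateless:
            self.h_tm1 = h              # preserve state between calls
        return h
```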
ssamot commented on May 31, 2015
OK - I'll try doing an implementation/test with simple character generation and use the old state to init `theano.scan`. Let's see...
ssamot commented on Jun 1, 2015
This seems to require a possibly much deeper re-write than I anticipated initially - whatever `theano.scan` uses as initial input seems to be compiled once and stays there - one might need to save state in shared variables somehow? Any ideas?

lemuriandezapada commented on Jun 1, 2015
Is there no way to read or set the internal states by hand?
I wouldn't really touch the training procedure, but the running/prediction procedure needs to have some sort of persistent mode.
ssamot commented on Jun 1, 2015
The reading you can do, but not the setting - AFAIK the scan operation is symbolic and the internal state is re-initialised after every pass. I cannot see how the state could be set manually in an easy manner.
lemuriandezapada commented on Jun 1, 2015
That's a bummer. This implies reading out the weights and reimplementing the whole net in a home-made, slower, numpy manner.
vzhong commented on Jun 2, 2015
Can't you set the initial internal states through a shared variable? For example here's the Theano example of the recurrence for a vanilla RNN:
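(A minimal runnable sketch of that recurrence - dimensions and weight names are illustrative - with the initial hidden state kept in a shared variable:)

```python
import numpy as np
import theano
import theano.tensor as T

n_in, n_hidden = 8, 16
floatX = theano.config.floatX

# h0 plays the role of self.h0 in the layer code:
# the initial hidden state lives in a shared variable
h0 = theano.shared(np.zeros(n_hidden, dtype=floatX), name='h0')
W_in = theano.shared(0.01 * np.random.randn(n_in, n_hidden).astype(floatX))
W_rec = theano.shared(0.01 * np.random.randn(n_hidden, n_hidden).astype(floatX))

x = T.matrix('x')  # one sequence: (timesteps, n_in)

def step(x_t, h_tm1):
    # one vanilla-RNN step: compute the new hidden state
    return T.tanh(T.dot(x_t, W_in) + T.dot(h_tm1, W_rec))

# scan unrolls the recurrence, starting every call from h0
h, _ = theano.scan(fn=step, sequences=x, outputs_info=h0)
```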
In this case can't you change the initial hidden state by setting `self.h0`?

ssamot commented on Jun 2, 2015
Yes of course - but `h` is symbolic, right? You can do something like `self.h0 = shared_zeros(shape=(1, self.output_dim))` to create the shared variable - but how do you then set `h0 = h`? `h` is the symbolic output of the scan, so you can't do a plain assignment. (updated for clarity)
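(For what it's worth, the standard Theano route around this is to perform the assignment through the `updates` argument when compiling the function - a minimal sketch, reusing `x`, `h`, and `h0` from the snippet above:)

```python
# compile a step function that, on every call, copies the last hidden
# state produced by the scan back into the shared variable h0
f = theano.function(inputs=[x], outputs=h,
                    updates=[(h0, h[-1])])
```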
fchollet commented on Jun 8, 2015
Anything new on this front? I will try to repro Karpathy's RNN experiments in Keras, and add everything that's needed in the process.
ssamot commented on Jun 11, 2015
Just an addition - I am currently struggling to come up with a good API for this. If you are to do state properly, apart from keeping state, you will need a mask of some sort that tells you when to keep the current activation unchanged and possibly ignore padded elements.
Thus you would need to change `get_output(self, train)` to something like `get_output(self, train, mask_train, mask_run)`, where each mask would be a 3d tensor with 0-1 values for each element - the first saying whether to train on it, the second whether to keep the hidden activations or not. This would change the overall internal API - does it make sense?

fchollet commented on Jun 11, 2015
If you need a mask, why not make it an attribute of the layer? Then there would be no need to change the overarching API. But to be honest I am not sure I see what you are describing - could you provide more details?
ssamot commented on Jun 11, 2015
You cannot make it an attribute of the layer unless there is another way to get at the batch you are sending. Imagine a scenario where you have to read 10 characters, then output a single character, keep the state, and output another character after receiving three more characters.
If you don't have a mask you would have to use a padded 3d tensor - not that efficient. Does this make more sense now?
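(A toy numpy sketch of the gating idea - names and shapes here are purely illustrative, not an existing Keras API:)

```python
import numpy as np

def masked_step(h_tm1, h_candidate, m_t):
    # m_t == 1: take the new activation; m_t == 0: hold the old state
    return m_t * h_candidate + (1.0 - m_t) * h_tm1

h_tm1 = np.zeros(4, dtype='float32')
h_new = np.ones(4, dtype='float32')
print(masked_step(h_tm1, h_new, 0.0))  # padded step -> state unchanged
print(masked_step(h_tm1, h_new, 1.0))  # real step   -> state updated
```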
jonilaserson commented on Jun 11, 2015
A. I think it would help if you write what use-cases of stateful RNNs you would like to be able to model.

B. I'm not sure I see the problem in the example you've given. Why can't you output a 'null' character every time you don't have a character to output, and treat it as a character-to-character RNN?

C. Again, I'm not sure if that was the issue you were facing, but if it was about making predictions on a batch, then here is a thought: maybe it is ok to allow only one input sequence, instead of a batch, when doing prediction (feedforward) on stateful RNNs? The training can still be in batch because you can provide the mask in advance.