"real time" recurrent nets #98

Closed

Description

@lemuriandezapada

Hey guys,

I was wondering how the initial internal states in a recurrent layer are dealt with. So far it appears they are reset at every run. Is there any way to preserve them?

I'd like to be able to feed a .predict_proba() function data one time step at a time for a time series task, as the points come in, without also feeding the entire history all over again. Is this somehow possible? Thanks
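
For illustration, the requested usage might look something like the following sketch (hypothetical API - no stateful mode existed at the time of this thread, and `model`, `incoming_stream`, and `n_features` are made-up names):

    # Hypothetical: feed one new observation per call, with the recurrent
    # layer's internal state persisting between calls instead of resetting.
    for x_t in incoming_stream():
        x_t = x_t.reshape((1, 1, n_features))   # (batch, timesteps, features)
        p_t = model.predict_proba(x_t)          # would reuse carried-over state

    # Current behaviour instead forces replaying the whole history each time:
    # p_t = model.predict_proba(history.reshape((1, len(history), n_features)))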

ssamot commented on May 31, 2015

Adding on top of the comment above - it should be possible to train incrementally as well, if you can keep state. Otherwise one would possibly need to remember really large sequences and replay them if training online.

fchollet (Collaborator) commented on May 31, 2015

I agree that we need some form of stateful RNNs; that would be particularly useful for sequence generation. It's unclear yet whether that should be set up as a mode of existing RNNs, or as different layers altogether.

Anybody interested in looking at it?

ssamot commented on May 31, 2015

Well - my personal preference would be to use the same layer - something along the lines of `stateless = True` by default. You would only need to somehow preserve `h_tm1`, `c_tm1` between calls to train - right?

ssamot commented on May 31, 2015

OK - I'll try doing an implementation/test with simple character generation and use the old state to init `theano.scan`. Let's see...

ssamot commented on Jun 1, 2015

This seems to require a possibly much deeper rewrite than I initially anticipated - whatever `theano.scan` uses as initial input seems to be compiled once and stay there. One might need to save state in shared variables somehow? Any ideas?
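
For reference, a minimal standalone sketch of the mechanism being suggested here - a Theano shared variable keeps its value across compiled-function calls, and the `updates` argument writes a new value back on every call:

    import numpy as np
    import theano
    import theano.tensor as T

    # Shared variables persist between calls, unlike symbolic inputs
    # (and unlike scan's compiled-in initial state, which is reused
    # as-is on every call).
    state = theano.shared(np.zeros((1, 4), dtype=theano.config.floatX))
    x = T.matrix('x')
    new_state = state + x
    step = theano.function([x], new_state, updates=[(state, new_state)])
    # Each call to step(...) now advances `state`; state.get_value()
    # reads it and state.set_value(...) resets it by hand.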

lemuriandezapada (Author) commented on Jun 1, 2015

Is there no way to read or set the internal states by hand?
I wouldn't really touch the training procedure, but the running/prediction procedure needs to have some sort of persistent mode.

ssamot commented on Jun 1, 2015

The reading you can do, but not the setting - AFAIK the scan operation is symbolic and the internal state is re-initialised after every pass. I cannot see how the state could be set manually in an easy manner.

lemuriandezapada (Author) commented on Jun 1, 2015

That's a bummer. That implies reading out the weights and reimplementing the whole net in a home-grown, slower, numpy manner.

vzhong (Contributor) commented on Jun 2, 2015

Can't you set the initial internal states through a shared variable? For example, here's the Theano example of the recurrence for a vanilla RNN:

        def recurrence(x_t, h_tm1):
            # One step: new hidden state from current input and previous state.
            h_t = T.nnet.sigmoid(T.dot(x_t, self.wx)
                                 + T.dot(h_tm1, self.wh) + self.bh)
            # Output distribution for this step.
            s_t = T.nnet.softmax(T.dot(h_t, self.w) + self.b)
            return [h_t, s_t]

        # self.h0 seeds the hidden state; None means s needs no initial value.
        [h, s], _ = theano.scan(fn=recurrence,
                                sequences=x,
                                outputs_info=[self.h0, None],
                                n_steps=x.shape[0])

        p_y_given_x_sentence = s[:, 0, :]
        y_pred = T.argmax(p_y_given_x_sentence, axis=1)

In this case can't you change the initial hidden state by setting self.h0?

ssamot commented on Jun 2, 2015

Yes, of course - but `h` is symbolic, right? You can do something like `self.h0 = shared_zeros(shape=(1, self.output_dim))` to create the shared variable - but how do you set `h0 = h`?

so you can't do:

    f = theano.function(inputs=[self.h0], outputs=outputs)
    self.h0.set_value(f()[-1])

(updated for clarity)
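
For later readers: under the setup in vzhong's snippet above, the way around this is Theano's `updates` mechanism - a shared variable cannot appear in `inputs`, but it can be written back at the end of every call. A minimal sketch, reusing `x`, `h`, `s`, and `self.h0` from that snippet:

    # The final hidden state of this call becomes the initial state of the
    # next one; self.h0 is a shared variable and stays out of `inputs`.
    f = theano.function(inputs=[x],
                        outputs=s,
                        updates=[(self.h0, h[-1])])
    # self.h0.set_value(...) can still reset or seed the state by hand.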

fchollet (Collaborator) commented on Jun 8, 2015

Anything new on this front? I will try to reproduce Karpathy's RNN experiments in Keras, and add everything that's needed in the process.

ssamot commented on Jun 11, 2015

Just an addition - I am currently struggling to come up with a good API for this. If you are to do state properly, apart from keeping state you will need a mask of some sort that tells you when to keep the current activation unchanged and possibly ignore padded elements.

Thus you would need to change `get_output(self, train)` to something like `get_output(self, train, mask_train, mask_run)`, where each mask would be a 3D tensor of 0-1 values per element - the first for training, the second for whether to keep the hidden activations or not. This would change the overall internal API - does it make sense?
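
A sketch of the gating such a run mask would perform inside the recurrence step, adapting vzhong's snippet above (`T`, `wx`, `wh`, `bh` as there; `m_t` is the hypothetical 0-1 mask entry for the current step, broadcast over the hidden dimension):

    def recurrence(x_t, m_t, h_tm1):
        h_new = T.nnet.sigmoid(T.dot(x_t, wx) + T.dot(h_tm1, wh) + bh)
        # m_t == 1: accept the new activation; m_t == 0: hold the old
        # state unchanged, e.g. over padded elements of the batch.
        return m_t * h_new + (1.0 - m_t) * h_tm1

The mask would then be passed alongside the input, e.g. `sequences=[x, m]` in the call to `theano.scan`.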

fchollet (Collaborator) commented on Jun 11, 2015

> Thus you would need to change `get_output(self, train)` to something like `get_output(self, train, mask_train, mask_run)`, where each mask would be a 3D tensor of 0-1 values per element - the first for training, the second for whether to keep the hidden activations or not. This would change the overall internal API - does it make sense?

If you need a mask, why not make it an attribute of the layer? Then there would be no need to change the overarching API. But to be honest I am not sure I see what you are describing - could you provide more details?

ssamot commented on Jun 11, 2015

You cannot make it an attribute of the layer unless there is another way to get at the batch you are sending. Imagine a scenario where you have to read 10 characters, then output a single character; keep the state, and output another character after receiving three more characters.

If you don't have a mask, you would have to use a padded 3D tensor - not that efficient. Does this make more sense now?
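
To illustrate the trade-off with made-up values: batching sequences of different effective lengths without a mask forces everything into one padded 3D tensor, whereas a per-element 0-1 mask lets the hidden state simply hold still over the padding:

    import numpy as np

    # Two samples batched together; 1 = real element (advance the state),
    # 0 = padding (hold the hidden state unchanged).
    mask = np.array([[1]*13,            # 10 chars, then 3 more chars
                     [1]*10 + [0]*3],   # 10 chars, then padding
                    dtype='float32')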

jonilaserson commented on Jun 11, 2015

A. I think it would help if you wrote down what use cases of stateful RNNs you would like to be able to model.

B. I'm not sure I see the problem in the example you've given. Why can't you output a 'null' character every time you don't have a character to output, and treat it as a character-to-character RNN?

C. Again, I'm not sure if that was the issue you were facing, but if it was about making predictions on a batch, then here is a thought: maybe it is OK to allow only one input sequence, instead of a batch, when doing prediction (feedforward) on stateful RNNs? The training can still be done in batches because you can provide the mask in advance.


125 remaining items