For RNN cells, we get the initial state using cell.zero_state() and the last state after processing a sequence using rnn.dynamic_rnn(). However, to use the last state as the initial state for the next run, one must create a tf.placeholder(). As far as I know, currently there is no way to create and fill such a placeholder (or nested tuple of placeholders) automatically. Such a feature would be very useful so that we don't have to adjust the placeholder manually when changing the RNN cell.
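For concreteness, the manual workaround looks roughly like this. This is a minimal sketch against the TF 1.x-era API (`tf.nn.rnn_cell`, `tf.nn.dynamic_rnn`); the shapes and the `batches` iterable are assumptions:

```python
import numpy as np
import tensorflow as tf

batch_size, num_steps, num_units, depth = 32, 20, 128, 50

inputs = tf.placeholder(tf.float32, [batch_size, num_steps, depth])
cell = tf.nn.rnn_cell.BasicLSTMCell(num_units, state_is_tuple=True)

# The manual step this issue asks to automate: one placeholder per state
# part, wrapped back into the tuple type the cell expects.
c = tf.placeholder(tf.float32, [batch_size, num_units])
h = tf.placeholder(tf.float32, [batch_size, num_units])
initial_state = tf.nn.rnn_cell.LSTMStateTuple(c, h)

outputs, final_state = tf.nn.dynamic_rnn(
    cell, inputs, initial_state=initial_state)

with tf.Session() as sess:
    sess.run(tf.initialize_all_variables())
    state_c = np.zeros([batch_size, num_units], np.float32)
    state_h = np.zeros([batch_size, num_units], np.float32)
    for x in batches:  # hypothetical iterable of [batch, steps, depth] arrays
        # Fetch the last state and feed it back as the next initial state.
        state_c, state_h = sess.run(
            [final_state.c, final_state.h],
            {inputs: x, c: state_c, h: state_h})
```

Changing the cell (e.g. to a `MultiRNNCell`, or to a different state size) means rewriting the placeholder block by hand, which is what this request wants to avoid.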
Activity
ebrevdo commented on Jun 14, 2016
Is this request specifically for truncated BPTT? or something more general?
danijar commented on Jun 15, 2016
It's for both truncated BPTT and architectures using LSTM decoders. In the second case, the cells are initialized with some encoded activation. For an example see: Skip-Thought Vectors (Kiros et al. 2015).
tomrunia commented on Jul 10, 2016
I am also interested in having a good way to remember the LSTM states for the next batch. I also asked this question on StackOverflow: http://stackoverflow.com/questions/38241410/tensorflow-remember-lstm-state-for-next-batch-stateful-lstm
ebrevdo commented on Jul 10, 2016
We are working on a system that makes this easy. Something should already be in the GitHub repo within a couple of weeks.
tomrunia commented on Jul 12, 2016
This might be of interest to @danijar : #2695
Great that you are working on things to make this easier, I will wait for an update :-)
yangzw commented on Jul 31, 2016
In the example ptb_word_lm.py, line 257, `m.initial_state` is fed in the `feed_dict`. However, there is no placeholder for `m.initial_state`. Why does this work?
danijar commented on Jul 31, 2016
@yangzw You can feed in values for any tensor. Placeholders are just special in that they throw an error if you don't feed them, while a variable would silently use its last value.
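To make this concrete, here is a minimal sketch (TF 1.x-era API; shapes are made up) where the tensors returned by `cell.zero_state()` are fed directly, with no placeholder in sight:

```python
import numpy as np
import tensorflow as tf

batch_size, num_steps, num_units = 4, 5, 8
inputs = tf.placeholder(tf.float32, [batch_size, num_steps, 3])
cell = tf.nn.rnn_cell.BasicLSTMCell(num_units, state_is_tuple=True)

# No state placeholders: zero_state just returns ordinary tensors.
initial_state = cell.zero_state(batch_size, tf.float32)
outputs, final_state = tf.nn.dynamic_rnn(
    cell, inputs, initial_state=initial_state)

with tf.Session() as sess:
    sess.run(tf.initialize_all_variables())
    x = np.zeros([batch_size, num_steps, 3], np.float32)
    # First run: initial_state evaluates to zeros, nothing is fed.
    c, h = sess.run([final_state.c, final_state.h], {inputs: x})
    # Second run: override the zero-state tensors by feeding them directly.
    c, h = sess.run([final_state.c, final_state.h],
                    {inputs: x, initial_state.c: c, initial_state.h: h})
```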
ebrevdo commented on Aug 17, 2016
We now have a comprehensive solution for truncated BPTT, introduced in 955efc9. See `tf.contrib.training.batch_sequences_with_states`. Unfortunately, for now the only examples are in the unit tests.
Automatically creating nested placeholders would be useful. I'll look into adding this.
wpm commented on Aug 28, 2016
Is there example code for the `my_parser` function in the example in the documentation for `batch_sequences_with_states`? I'm trying to figure it out from the documentation but still have questions.
nitishgupta commented on Aug 29, 2016
@ebrevdo: In my opinion, there should be something more general. For cases in which one trains an RNN with the whole sequence fed at once, but inference requires fetching and feeding the states on a per-time-step basis, the current solution is not very neat. Is there any hope of fetching and feeding a list of state tuples for `MultiRNNCell` in some way?
danijar commented on Aug 29, 2016
I'm currently solving this by holding the state in a non-trainable variable that I initialize from the default state. The variable name is prefixed by `state/`, and I have helper functions to return a dictionary from name to tensor containing all variables matching this prefix. Similarly, I have a helper function to assign variable values from this dictionary.
This is a general way to handle context, but it's not straightforward using the existing TensorFlow features. Moreover, it doesn't work with the new decision to represent LSTM states as tuples.
I can contribute code to TensorFlow for a feature like this, but we should think this through first, and see that it matches TensorFlow's preferred way to handle states.
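For reference, a minimal sketch of the variable-based approach described above, assuming a flat (`state_is_tuple=False`) LSTM state since, as noted, it doesn't carry over to tuple states; shapes are made up:

```python
import tensorflow as tf

batch_size, num_units, depth = 32, 128, 50
inputs = tf.placeholder(tf.float32, [batch_size, None, depth])
cell = tf.nn.rnn_cell.BasicLSTMCell(num_units, state_is_tuple=False)

# Hold the state between session.run calls in a non-trainable variable
# under the state/ prefix.
with tf.variable_scope('state'):
    state_var = tf.get_variable(
        'lstm', initializer=tf.zeros([batch_size, cell.state_size]),
        trainable=False)

outputs, final_state = tf.nn.dynamic_rnn(
    cell, inputs, initial_state=state_var)

# Run update_state together with the train op to carry the state over;
# run reset_state at sequence boundaries.
update_state = tf.assign(state_var, final_state)
reset_state = tf.assign(state_var, tf.zeros_like(state_var))
```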
ebrevdo commented on Aug 29, 2016
@danijar: I recommend against using a non-trainable variable because it is not thread-safe (you can't run multiple inference threads against the same graph). However, it's not too hard to create some placeholder tensors and wrap them in the necessary tuple type. Then, when calling session.run, one can pull out the "next state" tuple, store it, and feed it as an input to the next session.run. This is decidedly more thread-safe than the variable solution (and in fact zero-copy if you're doing this in a C++ client; sadly not zero-copy in Python, since the TF runtime must copy feed_dict inputs from Python, which does its own memory management).
@nitishgupta You can now fetch an arbitrary tuple type in Python. I don't think you can feed one, though (but that may have changed recently?). Since per-step RNN inference is usually meant to be done in a C++ client, I don't have any plans to add Python sugar for this.
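A sketch of that placeholder round trip for a `MultiRNNCell` (a minimal illustration, not from the thread; the shapes and the `steps` iterable are assumptions, and the nested structures are flattened so plain lists can be fetched and fed):

```python
import numpy as np
import tensorflow as tf

num_layers, num_units, batch_size, depth = 2, 64, 1, 16
cell = tf.nn.rnn_cell.MultiRNNCell(
    [tf.nn.rnn_cell.BasicLSTMCell(num_units)] * num_layers,
    state_is_tuple=True)
inputs = tf.placeholder(tf.float32, [batch_size, 1, depth])  # one step per run

# One placeholder pair per layer, wrapped back into the nested tuple type.
state_ph = tuple(
    tf.nn.rnn_cell.LSTMStateTuple(
        tf.placeholder(tf.float32, [batch_size, num_units]),
        tf.placeholder(tf.float32, [batch_size, num_units]))
    for _ in range(num_layers))

outputs, next_state = tf.nn.dynamic_rnn(cell, inputs, initial_state=state_ph)

# Flatten both structures so plain lists can be fetched and fed.
flat_ph = [t for s in state_ph for t in (s.c, s.h)]
flat_next = [t for s in next_state for t in (s.c, s.h)]

with tf.Session() as sess:
    sess.run(tf.initialize_all_variables())
    values = [np.zeros([batch_size, num_units], np.float32) for _ in flat_ph]
    for x in steps:  # hypothetical iterable of [batch, 1, depth] arrays
        feed = {inputs: x}
        feed.update(zip(flat_ph, values))
        values = sess.run(flat_next, feed)  # becomes the next initial state
```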
@wpm An example of `my_parser` is something that reads a serialized `SequenceExample` via a reader and deserializes it using `parse_single_sequence_example`. The `parse_single_sequence_example` call returns `context` and `sequences` dictionaries that exactly match some of the inputs of `batch_sequences_with_states`.
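For illustration, a hypothetical `my_parser` along those lines; the feature names (`length`, `tokens`, `labels`) are assumptions that must match how the `SequenceExample`s were written:

```python
import tensorflow as tf

def my_parser(serialized_example):
    # Split a serialized SequenceExample into its context and sequence parts.
    context, sequences = tf.parse_single_sequence_example(
        serialized_example,
        context_features={
            'length': tf.FixedLenFeature([], dtype=tf.int64),
        },
        sequence_features={
            'tokens': tf.FixedLenSequenceFeature([], dtype=tf.int64),
            'labels': tf.FixedLenSequenceFeature([], dtype=tf.int64),
        })
    # These dictionaries line up with the input_context / input_sequences
    # arguments of batch_sequences_with_states.
    return context, sequences
```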