tatk.policy.pg package
Submodules
tatk.policy.pg.pg module
class tatk.policy.pg.pg.PG(is_train=False, dataset='Multiwoz')
Bases: tatk.policy.policy.Policy
- __init__(is_train=False, dataset='Multiwoz'): Initialize self. See help(type(self)) for accurate signature.
- est_return(r, mask): Estimate the return (V-target) of every step in a batch. Trajectories are stored contiguously, one after another, in a single flat batch, and mask=0 marks the last step of the current trajectory.
  :param r: reward, Tensor, [b]
  :param mask: 0 at the end of a trajectory, 1 otherwise, Tensor, [b]
  :return: V-target(s), Tensor
- init_session(): Restore (reset) the policy's internal state after a session ends.
- load(filename)
- predict(state): Predict a system action given the dialog state.
  Args:
    state (dict): Dialog state. Please refer to util/state.py.
  Returns:
    action: System act, of the form (act_type, {slot_name_1: value_1, slot_name_2: value_2, …})
- save(directory, epoch)
- update(epoch, batchsz, s, a, r, mask)
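The backward recursion that est_return describes can be sketched in plain Python. This is a minimal sketch, assuming the standard discounted return R_t = r_t + gamma * R_{t+1} * mask_t; the discount factor gamma and its default of 0.99 are assumptions, not values from this page, and plain lists stand in for the [b]-shaped Tensors the real method takes.

```python
from typing import List


def est_return(r: List[float], mask: List[int], gamma: float = 0.99) -> List[float]:
    """Discounted return for each step of a flat batch of trajectories.

    r[t]    -- reward at step t
    mask[t] -- 0 if step t ends its trajectory, 1 otherwise
    gamma   -- discount factor (assumed; not documented for PG)
    """
    returns = [0.0] * len(r)
    prev_return = 0.0
    # Walk backwards so each step can reuse the return of the step after it.
    for t in reversed(range(len(r))):
        # mask[t] == 0 drops the bootstrapped term, so a return never
        # leaks across the boundary into the next trajectory.
        returns[t] = r[t] + gamma * prev_return * mask[t]
        prev_return = returns[t]
    return returns
```

With gamma=0.5, rewards [1, 1, 1, 2] and mask [1, 1, 0, 0] (two trajectories: steps 0 to 2, then step 3 alone), the sketch returns [1.75, 1.5, 1.0, 2.0]; the zero at mask[2] keeps step 2 from bootstrapping off the second trajectory.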
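To make the documented return shape of predict concrete, the sketch below defines a type alias and a checker for the (act_type, {slot_name: value, ...}) form. The names SystemAct and is_system_act are hypothetical, and the act type and slot names in the example value are illustrative only, not taken from the MultiWOZ ontology.

```python
from typing import Dict, Tuple

# Hypothetical alias mirroring the documented return shape of PG.predict:
# (act_type, {slot_name_1: value_1, slot_name_2: value_2, ...})
SystemAct = Tuple[str, Dict[str, str]]


def is_system_act(action: object) -> bool:
    """Return True if `action` matches the (act_type, {slot: value}) form."""
    if not (isinstance(action, tuple) and len(action) == 2):
        return False
    act_type, slots = action
    # The act type is a string; the slots are a dict keyed by slot name.
    return (
        isinstance(act_type, str)
        and isinstance(slots, dict)
        and all(isinstance(name, str) for name in slots)
    )


# Illustrative value only; "Inform", "area" and "pricerange" are made up here.
example: SystemAct = ("Inform", {"area": "centre", "pricerange": "cheap"})
```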