tatk.policy.ppo package

Submodules

tatk.policy.ppo.ppo module

class tatk.policy.ppo.ppo.PPO(is_train=False, dataset='Multiwoz')

Bases: tatk.policy.policy.Policy

__init__(is_train=False, dataset='Multiwoz'): Initialize self. See help(type(self)) for accurate signature.

est_adv(r, v, mask): Estimate the advantage and the value target for every step in a batch. Trajectories are saved one after another in a continuous buffer, and a trajectory ends at the step where mask = 0.

:param r: reward, Tensor of shape [b]
:param v: estimated value, Tensor of shape [b]
:param mask: 0 marks the end of a trajectory, otherwise 1, Tensor of shape [b]
:return: A(s, a) and V-target(s), both Tensors
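A minimal sketch of the computation the docstring describes, assuming est_adv uses generalized advantage estimation (GAE). The docstring does not name the method, so treat this as an illustration rather than the tatk implementation; the discount `gamma` and GAE weight `tau` are assumed hyperparameters that do not appear in the signature above. Plain Python lists stand in for the [b]-shaped Tensors so the example runs without PyTorch.

```python
from typing import List, Tuple


def est_adv(r: List[float], v: List[float], mask: List[int],
            gamma: float = 0.99, tau: float = 0.95) -> Tuple[List[float], List[float]]:
    """Sketch: GAE over a buffer holding several trajectories back to back.

    mask[t] == 0 marks the last step of a trajectory, so nothing is
    bootstrapped from the first step of the following trajectory.
    gamma and tau are assumed hyperparameters (not from the tatk docs).
    """
    b = len(r)
    A = [0.0] * b
    v_target = [0.0] * b
    prev_v = 0.0  # value of the next state; 0 past the end of the buffer
    prev_A = 0.0  # advantage of the next step; 0 past the end of the buffer
    # Walk backward so each step can reuse the next step's value and advantage.
    for t in reversed(range(b)):
        # One-step TD residual; mask[t] == 0 drops the bootstrapped value.
        delta = r[t] + gamma * prev_v * mask[t] - v[t]
        # Exponentially weighted sum of residuals, cut at trajectory ends.
        A[t] = delta + gamma * tau * prev_A * mask[t]
        # Value target as the GAE (lambda) return.
        v_target[t] = A[t] + v[t]
        prev_v = v[t]
        prev_A = A[t]
    return A, v_target
```

Because every bootstrap term is multiplied by mask[t], the last step of one trajectory never sees the value or advantage of the next trajectory's first step, which is what lets several trajectories share one continuous buffer.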

init_session(): Restore the policy to its initial state after a dialog session ends.

load(filename)

predict(state)

Predict a system action given the dialog state.

Args:

state (dict): Dialog state. Please refer to util/state.py.

Returns:

action: System act, in the form (act_type, {slot_name_1: value_1, slot_name_2: value_2, …})
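To make the documented return shape concrete, the sketch below builds and checks an action of the form (act_type, {slot_name: value, ...}). The helpers `make_action` and `is_valid_action` are hypothetical and not part of tatk, and the act type and slot values in the example are illustrative only.

```python
from typing import Any, Dict, Tuple

# Documented action shape: (act_type, {slot_name_1: value_1, ...})
Action = Tuple[str, Dict[str, str]]


def make_action(act_type: str, **slots: str) -> Action:
    """Hypothetical helper: build a system act in the documented form."""
    return (act_type, dict(slots))


def is_valid_action(action: Any) -> bool:
    """Hypothetical helper: check an object against the documented shape."""
    return (
        isinstance(action, tuple)
        and len(action) == 2
        and isinstance(action[0], str)
        and isinstance(action[1], dict)
        and all(isinstance(k, str) for k in action[1])
    )


# Illustrative act type and slots, not taken from the MultiWOZ ontology.
action = make_action("Restaurant-Inform", Name="pizza hut", Area="centre")
act_type, slots = action
```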

save(directory, epoch)

update(epoch, batchsz, s, a, r, mask)
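The update signature takes rewards r and the trajectory mask, which suggests it first estimates advantages with est_adv and then optimizes the policy on batches of states s and actions a; this is an assumption, not confirmed by the docs above. The sketch shows only the clipped surrogate objective that PPO optimizes, using plain lists of log-probabilities. The function name `ppo_clip_loss` and the clip range `epsilon` are hypothetical.

```python
import math
from typing import List


def ppo_clip_loss(log_prob_new: List[float], log_prob_old: List[float],
                  advantages: List[float], epsilon: float = 0.2) -> float:
    """Sketch of the PPO clipped surrogate loss (negated, so lower is better).

    ratio = exp(log pi_new(a|s) - log pi_old(a|s)); each sample contributes
    min(ratio * A, clip(ratio, 1 - epsilon, 1 + epsilon) * A).
    epsilon is an assumed hyperparameter.
    """
    total = 0.0
    for lp_new, lp_old, adv in zip(log_prob_new, log_prob_old, advantages):
        ratio = math.exp(lp_new - lp_old)
        # Keep the probability ratio within [1 - epsilon, 1 + epsilon].
        clipped = min(max(ratio, 1.0 - epsilon), 1.0 + epsilon)
        # The pessimistic (smaller) term removes the incentive to move too far.
        total += min(ratio * adv, clipped * adv)
    # Average over the batch and negate: minimizing this maximizes the objective.
    return -total / len(advantages)
```

Taking the minimum means a large ratio earns no extra credit when the advantage is positive, but is still fully penalized when the advantage is negative.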