convlab2.policy package

Submodules

convlab2.policy.evaluate module

convlab2.policy.evaluate.evaluate(dataset_name, model_name, load_path, calculate_reward=True)
convlab2.policy.evaluate.init_logging(log_dir_path, path_suffix=None)
convlab2.policy.evaluate.sample(env, policy, batchsz, process_num)

Given batchsz tasks in total, the tasks are split equally across the worker processes; when the processes return, all sampled data are merged and returned.

Parameters
  • env – environment instance

  • policy – policy network

  • batchsz – total number of sampled items

  • process_num – number of worker processes

Returns

batch
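
The equal split described above can be sketched with a small helper (hypothetical; how any remainder is assigned is an implementation detail):

    def split_batchsz(batchsz, process_num):
        # Split batchsz tasks equally; give leftover tasks to the last process.
        per_proc = batchsz // process_num
        sizes = [per_proc] * process_num
        sizes[-1] += batchsz - per_proc * process_num
        return sizes

    print(split_batchsz(1000, 4))  # [250, 250, 250, 250]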

convlab2.policy.evaluate.sampler(pid, queue, evt, env, policy, batchsz)

This is a sampler function; it is called by multiprocessing.Process to sample data from the environment in multiple processes.

Parameters
  • pid – process id

  • queue – multiprocessing.Queue, to collect sampled data

  • evt – multiprocessing.Event, to keep the process alive

  • env – environment instance

  • policy – policy network, to generate actions from the current policy

  • batchsz – total number of sampled items
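
The queue/event pattern described above can be sketched with plain multiprocessing. This toy example (the payload is a stand-in for real environment rollouts) shows why the event is needed: each worker stays alive until the parent has drained the queue and merged the data.

    import multiprocessing as mp

    def worker(pid, queue, evt, batchsz):
        data = [(pid, i) for i in range(batchsz)]  # stand-in for sampled transitions
        queue.put(data)
        evt.wait()  # keep the process alive until the parent has read the queue

    if __name__ == '__main__':
        queue, evt = mp.Queue(), mp.Event()
        procs = [mp.Process(target=worker, args=(pid, queue, evt, 4))
                 for pid in range(2)]
        for p in procs:
            p.start()
        merged = [item for _ in procs for item in queue.get()]  # merge all data
        evt.set()  # release the workers
        for p in procs:
            p.join()
        print(merged)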

convlab2.policy.policy module

Policy Interface

class convlab2.policy.policy.Policy

Bases: convlab2.util.module.Module

Policy module interface.

predict(state)

Predict the next agent action given the dialogue state.

Args:
state (dict or list of list):

When the policy takes a dialogue state as input, the type is dict; when it takes dialogue acts as input, the type is list of list.

Returns:
action (list of list or str):

When the policy outputs dialogue acts, the type is list of list; when it outputs an utterance directly, the type is str.
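
A minimal sketch of implementing the interface, assuming a rule-based policy that maps a dialogue-state dict to dialogue acts (the act values below are illustrative, not from a real ontology):

    from convlab2.policy.policy import Policy

    class GreetPolicy(Policy):
        # Hypothetical rule-based policy; only the predict() contract is real.
        def predict(self, state):
            # Input: dialogue state (dict); output: dialogue acts (list of list).
            if not state.get('user_action'):
                return [['greet', 'general', 'none', 'none']]
            return [['inform', 'general', 'none', 'none']]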

convlab2.policy.rlmodule module

class convlab2.policy.rlmodule.ContinuousPolicy(s_dim, h_dim, a_dim)

Bases: torch.nn.modules.module.Module

forward(s)

Defines the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for the forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

get_log_prob(s, a)
Parameters
  • s – [b, s_dim]

  • a – [b, a_dim]

Returns

[b, 1]

select_action(s, sample=True)
Parameters

s – [s_dim]

Returns

[a_dim]
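
A hedged sketch of how a continuous policy with these signatures is typically realized, here as a diagonal Gaussian (the real architecture may differ):

    import torch

    class GaussianPolicySketch(torch.nn.Module):
        def __init__(self, s_dim, h_dim, a_dim):
            super().__init__()
            self.net = torch.nn.Sequential(
                torch.nn.Linear(s_dim, h_dim), torch.nn.ReLU(),
                torch.nn.Linear(h_dim, a_dim))
            self.log_std = torch.nn.Parameter(torch.zeros(a_dim))

        def forward(self, s):
            return self.net(s), self.log_std.exp()  # mean and std of the Gaussian

        def select_action(self, s, sample=True):
            mu, std = self(s)  # call the instance so registered hooks run
            return torch.normal(mu, std) if sample else mu  # [a_dim]

        def get_log_prob(self, s, a):
            mu, std = self(s)
            dist = torch.distributions.Normal(mu, std)
            return dist.log_prob(a).sum(-1, keepdim=True)  # [b, 1]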

class convlab2.policy.rlmodule.DiscretePolicy(s_dim, h_dim, a_dim)

Bases: torch.nn.modules.module.Module

forward(s)

Defines the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for the forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

get_log_prob(s, a)
Parameters
  • s – [b, s_dim]

  • a – [b, 1]

Returns

[b, 1]

select_action(s, sample=True)
Parameters

s – [s_dim]

Returns

[1]
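
The shapes suggest a categorical head: select_action samples a single action index ([1]) and get_log_prob scores the chosen index ([b, 1]). A sketch under that assumption:

    import torch

    logits = torch.randn(5)                      # stand-in for forward(s)
    dist = torch.distributions.Categorical(logits=logits)
    a = dist.sample((1,))                        # [1]: one action index
    log_p = dist.log_prob(a).unsqueeze(-1)       # [1, 1], cf. [b, 1] batched
    print(a, log_p.shape)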

class convlab2.policy.rlmodule.EpsilonGreedyPolicy(s_dim, h_dim, a_dim, epsilon_spec={'end': 0.0, 'end_epoch': 200, 'start': 0.1})

Bases: torch.nn.modules.module.Module

forward(s)

Defines the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for the forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

select_action(s, is_train=True)
Parameters

s – [s_dim]

Returns

[1]

update_epsilon(epoch)
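
The default epsilon_spec anneals exploration from 0.1 at the start to 0.0 at epoch 200. A plausible reading of update_epsilon is a linear schedule (an assumption; the library may interpolate differently):

    def epsilon_at(epoch, start=0.1, end=0.0, end_epoch=200):
        # Linear interpolation from start to end, clamped after end_epoch.
        frac = min(epoch / end_epoch, 1.0)
        return start + frac * (end - start)

    print(epsilon_at(0), epsilon_at(100), epsilon_at(200))  # 0.1, 0.05, 0.0 (up to float rounding)
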
class convlab2.policy.rlmodule.Memory

Bases: object

append(new_memory)
get_batch(batch_size=None)
push(*args)

Saves a transition.

class convlab2.policy.rlmodule.MemoryReplay(max_size)

Bases: object

Unlike Memory, MemoryReplay has a limited size. It is mainly used for off-policy algorithms.

append(new_memory)
get_batch(batch_size=None)
push(*args)

Saves a transition.

reset()
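
To make the replay interface concrete, here is a self-contained toy stand-in with the same methods (an illustration, not the library's storage scheme): a bounded buffer that overwrites the oldest transition once max_size is reached, as off-policy training requires.

    import random

    class TinyReplay:
        def __init__(self, max_size):
            self.max_size, self.buf, self.pos = max_size, [], 0

        def push(self, *args):
            # Save one transition; overwrite the oldest entry when full.
            if len(self.buf) < self.max_size:
                self.buf.append(args)
            else:
                self.buf[self.pos] = args
            self.pos = (self.pos + 1) % self.max_size

        def get_batch(self, batch_size=None):
            return random.sample(self.buf, batch_size or len(self.buf))

        def reset(self):
            self.buf, self.pos = [], 0

    replay = TinyReplay(max_size=3)
    for t in range(5):
        replay.push(t, t + 1)   # transitions 3 and 4 evict 0 and 1
    print(replay.get_batch(2))
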
class convlab2.policy.rlmodule.MultiDiscretePolicy(s_dim, h_dim, a_dim)

Bases: torch.nn.modules.module.Module

forward(s)

Defines the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for the forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

get_log_prob(s, a)
Parameters
  • s – [b, s_dim]

  • a – [b, a_dim]

Returns

[b, 1]

select_action(s, sample=True)
Parameters

s – [s_dim]

Returns

[a_dim]
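
The [a_dim] return shape suggests one independent binary decision per action dimension, which fits multi-label dialogue acts. A Bernoulli-per-dimension sketch (an assumption about the implementation):

    import torch

    logits = torch.randn(8)                         # stand-in for forward(s)
    dist = torch.distributions.Bernoulli(logits=logits)
    a = dist.sample()                               # [a_dim] 0/1 action vector
    log_p = dist.log_prob(a).sum(-1, keepdim=True)  # [1], cf. [b, 1] batched
    print(a, log_p)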

class convlab2.policy.rlmodule.Transition(state, action, reward, next_state, mask)

Bases: tuple

property action

Alias for field number 1

property mask

Alias for field number 4

property next_state

Alias for field number 3

property reward

Alias for field number 2

property state

Alias for field number 0
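
The "Alias for field number N" properties above are exactly what collections.namedtuple generates, so Transition is equivalent to:

    from collections import namedtuple

    Transition = namedtuple(
        'Transition', ('state', 'action', 'reward', 'next_state', 'mask'))

    t = Transition(state=[0.0], action=1, reward=0.5, next_state=[1.0], mask=1)
    print(t.reward == t[2])  # True: field names alias positional access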

class convlab2.policy.rlmodule.Value(s_dim, hv_dim)

Bases: torch.nn.modules.module.Module

forward(s)
Parameters

s – [b, s_dim]

Returns

[b, 1]
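
A shape check matching the signature (layer sizes here are arbitrary): a state-value head maps a batch of states [b, s_dim] to scalar values [b, 1].

    import torch

    s_dim, hv_dim = 340, 64
    value = torch.nn.Sequential(
        torch.nn.Linear(s_dim, hv_dim), torch.nn.ReLU(),
        torch.nn.Linear(hv_dim, 1))
    print(value(torch.randn(32, s_dim)).shape)  # torch.Size([32, 1])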

convlab2.policy.vec module

Vector Interface

class convlab2.policy.vec.Vector

Bases: object

action_devectorize(action_vec)

Recover a dialogue act from its vector representation.

Args:
action_vec (np.array):

Dialog act vector

Returns:
action (tuple):

Dialog act

generate_dict()

Initialize the dictionaries for mapping states/actions to vectors.

state_vectorize(state)

Vectorize a dialogue state.

Args:
state (tuple):

Dialog state

Returns:
state_vec (np.array):

Dialog state vector
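
A sketch of the Vector contract with an invented two-act mapping; real vectorizers build these dictionaries from a dataset ontology.

    import numpy as np
    from convlab2.policy.vec import Vector

    class ToyVector(Vector):
        def __init__(self):
            self.generate_dict()

        def generate_dict(self):
            acts = [('greet', 'general', 'none', 'none'),
                    ('inform', 'hotel', 'area', 'west')]
            self.act2id = {a: i for i, a in enumerate(acts)}
            self.id2act = {i: a for i, a in enumerate(acts)}

        def state_vectorize(self, state):
            vec = np.zeros(len(self.act2id))
            # ... set features derived from `state` here ...
            return vec

        def action_devectorize(self, action_vec):
            # Recover the dialogue act with the highest score.
            return self.id2act[int(np.argmax(action_vec))]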

Module contents