tatk.util package

Submodules

tatk.util.allennlp_file_utils module

Copied from allennlp (https://github.com/allenai/allennlp/blob/master/allennlp/common/file_utils.py). Utilities for working with the local dataset cache.

class tatk.util.allennlp_file_utils.Tqdm

Bases: object

default_mininterval = 0.1
static set_default_mininterval(value: float) → None
static set_slower_interval(use_slower_interval: bool) → None

If use_slower_interval is True, we will dramatically slow down tqdm's default output rate. tqdm's default output rate is great for interactively watching progress, but it is not great for log files. You might want to set this if you are primarily going to be looking at output through log files, not the terminal.

static tqdm(*args, **kwargs)
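The class keeps one class-level default interval that every `Tqdm.tqdm` call picks up. The following is a minimal sketch of that pattern. It assumes the "slow" interval is 10 seconds, and it replaces the real `tqdm.tqdm` with a hypothetical `fake_tqdm` stub that just returns its keyword arguments. `TqdmSketch` is likewise a hypothetical name, not the tatk class.

```python
from typing import Any, Dict


def fake_tqdm(*args: Any, **kwargs: Any) -> Dict[str, Any]:
    # Hypothetical stand-in for tqdm.tqdm: returns the keyword arguments it received.
    return kwargs


class TqdmSketch:
    default_mininterval: float = 0.1

    @staticmethod
    def set_default_mininterval(value: float) -> None:
        TqdmSketch.default_mininterval = value

    @staticmethod
    def set_slower_interval(use_slower_interval: bool) -> None:
        # Assumed "slow" value of 10 s; 0.1 s restores the interactive default.
        TqdmSketch.default_mininterval = 10.0 if use_slower_interval else 0.1

    @staticmethod
    def tqdm(*args: Any, **kwargs: Any) -> Dict[str, Any]:
        # The class default is merged first, so an explicit mininterval still wins.
        new_kwargs = {"mininterval": TqdmSketch.default_mininterval, **kwargs}
        return fake_tqdm(*args, **new_kwargs)
```

Because the setting lives on the class, calling `set_slower_interval(True)` once at startup affects every progress bar created afterwards, including those created deep inside library code.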
tatk.util.allennlp_file_utils.cached_path(url_or_filename: Union[str, pathlib.Path], cache_dir: str = None) → str

Given something that might be a URL (or might be a local path), determine which. If it’s a URL, download the file and cache it, and return the path to the cached file. If it’s already a local path, make sure the file exists and then return the path.
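The URL-versus-local-path decision can be sketched with `urllib.parse.urlparse`. In this sketch, `resolve_path` is a hypothetical name, and the `download` callable stands in for the real cache lookup and download step, so no network access is needed. The set of recognized schemes (`http`, `https`, `s3`) is an assumption based on the S3 helpers documented in this module.

```python
import os
from pathlib import Path
from typing import Callable, Union
from urllib.parse import urlparse


def resolve_path(url_or_filename: Union[str, Path],
                 download: Callable[[str], str]) -> str:
    """Hypothetical sketch of cached_path's dispatch; `download` returns a cached local path."""
    url_or_filename = str(url_or_filename)
    parsed = urlparse(url_or_filename)
    if parsed.scheme in ("http", "https", "s3"):
        # Remote resource: delegate to the (stubbed) downloader.
        return download(url_or_filename)
    if os.path.exists(url_or_filename):
        # Existing local file: return the path unchanged.
        return url_or_filename
    if parsed.scheme == "":
        # Looks like a local path, but nothing exists there.
        raise FileNotFoundError(f"file {url_or_filename} not found")
    # Some other scheme (e.g. ftp://) that this sketch does not handle.
    raise ValueError(f"unable to parse {url_or_filename} as a URL or as a local path")
```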

tatk.util.allennlp_file_utils.filename_to_url(filename: str, cache_dir: str = None) → Tuple[str, str]

Return the url and etag (which may be None) stored for filename. Raise FileNotFoundError if filename or its stored metadata do not exist.
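One way to store per-file metadata is a JSON sidecar next to each cached file. The following sketch assumes that layout (`<cached file>.json` holding `url` and `etag`). `write_meta` and `read_meta` are hypothetical names, and unlike `filename_to_url`, they take a full path rather than a file name plus `cache_dir`.

```python
import json
import os
from typing import Optional, Tuple


def write_meta(cache_path: str, url: str, etag: Optional[str]) -> None:
    # Assumed layout: metadata lives next to the cached file as "<file>.json".
    with open(cache_path + ".json", "w") as meta_file:
        json.dump({"url": url, "etag": etag}, meta_file)


def read_meta(cache_path: str) -> Tuple[str, Optional[str]]:
    """Hypothetical sketch of filename_to_url: raise if the file or its metadata is missing."""
    if not os.path.exists(cache_path):
        raise FileNotFoundError(f"file {cache_path} not found")
    meta_path = cache_path + ".json"
    if not os.path.exists(meta_path):
        raise FileNotFoundError(f"file {meta_path} not found")
    with open(meta_path) as meta_file:
        metadata = json.load(meta_file)
    return metadata["url"], metadata["etag"]
```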

tatk.util.allennlp_file_utils.get_file_extension(path: str, dot=True, lower: bool = True)
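`get_file_extension` has no docstring. The following is a sketch of what its signature suggests, assuming it builds on `os.path.splitext`. The name `get_extension` is hypothetical.

```python
import os


def get_extension(path: str, dot: bool = True, lower: bool = True) -> str:
    # os.path.splitext returns the last extension including its dot, or "" if there is none.
    ext = os.path.splitext(path)[1]
    # Optionally drop the leading dot.
    ext = ext if dot else ext[1:]
    return ext.lower() if lower else ext
```

For example, `get_extension("data/Train.JSON")` gives `".json"`, `dot=False` gives `"json"`, and `"archive.tar.gz"` yields only `".gz"`, because `splitext` looks at the last suffix.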
tatk.util.allennlp_file_utils.get_from_cache(url: str, cache_dir: str = None) → str

Given a URL, look for the corresponding dataset in the local cache. If it’s not there, download it. Then return the path to the cached file.
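The following is a sketch of an ETag-keyed download cache. The cache key combines the URL and its ETag, so a changed remote file gets a fresh entry, and the download goes to a temporary file first so an interrupted transfer never leaves a partial file in the cache. `cached_download`, `get_etag`, and `download_to` are hypothetical. The two callables stand in for the real HEAD request or `s3_etag`, and for `http_get` or `s3_get`. The exact key format here is an illustration, not the module's naming scheme.

```python
import hashlib
import os
import tempfile
from typing import Callable, Optional


def cached_download(url: str,
                    cache_dir: str,
                    get_etag: Callable[[str], Optional[str]],
                    download_to: Callable[[str, str], None]) -> str:
    """Hypothetical sketch: download `url` into `cache_dir` once per (url, etag) pair."""
    os.makedirs(cache_dir, exist_ok=True)
    etag = get_etag(url)
    # Key on URL and ETag so a changed remote file maps to a new cache entry.
    key = url + "|" + (etag or "")
    cache_path = os.path.join(cache_dir, hashlib.sha256(key.encode("utf-8")).hexdigest())
    if not os.path.exists(cache_path):
        # Download to a temp file in the same directory, then move it into place.
        fd, tmp_path = tempfile.mkstemp(dir=cache_dir)
        os.close(fd)
        try:
            download_to(url, tmp_path)
            os.replace(tmp_path, cache_path)
        finally:
            if os.path.exists(tmp_path):
                os.remove(tmp_path)  # only reached if the download failed
    return cache_path
```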

tatk.util.allennlp_file_utils.get_s3_resource()
tatk.util.allennlp_file_utils.http_get(url: str, temp_file: IO) → None
tatk.util.allennlp_file_utils.is_url_or_existing_file(url_or_filename: Union[str, pathlib.Path, None]) → bool

Given something that might be a URL (or might be a local path), check whether it is a URL or an existing file path.
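Unlike a function that downloads or raises, this check only returns a boolean. The following is a sketch, assuming `http`, `https`, and `s3` are the recognized URL schemes and that `~` is expanded before the existence check. `looks_like_url_or_file` is a hypothetical name.

```python
import os
from pathlib import Path
from typing import Union
from urllib.parse import urlparse


def looks_like_url_or_file(url_or_filename: Union[str, Path, None]) -> bool:
    """Hypothetical sketch: True for http(s)/s3 URLs or for paths that exist on disk."""
    if url_or_filename is None:
        return False
    # Assumption: expand "~" so home-relative paths are checked correctly.
    url_or_filename = os.path.expanduser(str(url_or_filename))
    parsed = urlparse(url_or_filename)
    return parsed.scheme in ("http", "https", "s3") or os.path.exists(url_or_filename)
```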

tatk.util.allennlp_file_utils.read_set_from_file(filename: str) → Set[str]

Extract a de-duped collection (set) of text from a file. Expected file format is one item per line.
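The following is a sketch of reading a one-item-per-line file into a set, assuming UTF-8 text and that trailing whitespace is stripped from each line. `read_lines_as_set` is a hypothetical name.

```python
from typing import Set


def read_lines_as_set(filename: str) -> Set[str]:
    """Hypothetical sketch: one item per line, duplicates collapsed."""
    collection: Set[str] = set()
    with open(filename, "r", encoding="utf-8") as file_:
        for line in file_:
            # rstrip() drops the trailing newline (and any other trailing whitespace).
            collection.add(line.rstrip())
    return collection
```

Note that in this sketch a blank line would contribute an empty string `""` to the set.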

tatk.util.allennlp_file_utils.s3_etag(url: str) → Optional[str]

Check ETag on S3 object.

tatk.util.allennlp_file_utils.s3_get(url: str, temp_file: IO) → None

Pull a file directly from S3.

tatk.util.allennlp_file_utils.s3_request(func: Callable)

Wrapper function for s3 requests in order to create more helpful error messages.
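The wrapper idea can be sketched as a decorator that converts a low-level "not found" error into a `FileNotFoundError` that names the URL. `NotFoundError` is a hypothetical stand-in for the S3 client's 404 error, and `wrap_not_found` and `fetch` are illustrative names only.

```python
import functools
from typing import Any, Callable, TypeVar

F = TypeVar("F", bound=Callable[..., Any])


class NotFoundError(Exception):
    """Hypothetical stand-in for a 404 error raised by an S3 client."""


def wrap_not_found(func: F) -> F:
    """Hypothetical sketch: re-raise 'not found' errors as FileNotFoundError with the URL."""

    @functools.wraps(func)  # keep the wrapped function's name and docstring
    def wrapper(url: str, *args: Any, **kwargs: Any) -> Any:
        try:
            return func(url, *args, **kwargs)
        except NotFoundError as exc:
            # Keep the original exception as the cause for debugging.
            raise FileNotFoundError(f"file {url} not found") from exc

    return wrapper  # type: ignore[return-value]


@wrap_not_found
def fetch(url: str) -> str:
    # Toy request function: pretend keys ending in "missing" do not exist.
    if url.endswith("missing"):
        raise NotFoundError(url)
    return "ok"
```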

tatk.util.allennlp_file_utils.session_with_backoff() → requests.sessions.Session

We ran into an issue where http requests to s3 were timing out, possibly because we were making too many requests too quickly. This helper function returns a requests session that has retry-with-backoff built in.

See https://stackoverflow.com/questions/23267409/how-to-implement-retry-mechanism-into-python-requests-library
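The documented function builds the retry behavior into a `requests` session. To show the retry-with-backoff idea itself without third-party packages, the following sketch swaps in a plain-Python retry loop. It is not the `requests`/`urllib3` mechanism. `call_with_backoff`, its defaults, the retried exception types, and the `backoff_factor * 2**attempt` schedule are all choices made for this sketch. The `sleep` parameter is injectable so the delays can be observed in tests.

```python
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def call_with_backoff(func: Callable[[], T],
                      retries: int = 5,
                      backoff_factor: float = 1.0,
                      retry_on: Tuple[Type[BaseException], ...] = (ConnectionError, TimeoutError),
                      sleep: Callable[[float], None] = time.sleep) -> T:
    """Hypothetical sketch: retry `func`, waiting backoff_factor * 2**attempt between tries."""
    for attempt in range(retries + 1):
        try:
            return func()
        except retry_on:
            if attempt == retries:
                raise  # out of retries: surface the last error
            # Exponential backoff: 1x, 2x, 4x, ... the base factor.
            sleep(backoff_factor * (2 ** attempt))
    raise AssertionError("unreachable")
```

Exponential delays give an overloaded server progressively more time to recover, which addresses the "too many requests too quickly" problem described above.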

tatk.util.allennlp_file_utils.split_s3_path(url: str) → Tuple[str, str]

Split a full s3 path into the bucket name and path.
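The following is a sketch of splitting an `s3://` URL with `urllib.parse.urlparse`. It assumes the bucket is the network location, that the key is the path without its leading slash, and that a URL missing either part is rejected. `split_s3` is a hypothetical name.

```python
from typing import Tuple
from urllib.parse import urlparse


def split_s3(url: str) -> Tuple[str, str]:
    """Hypothetical sketch: 's3://bucket/a/b.json' -> ('bucket', 'a/b.json')."""
    parsed = urlparse(url)
    if not parsed.netloc or not parsed.path:
        raise ValueError(f"bad s3 path {url}")
    bucket_name = parsed.netloc
    # Drop the leading "/" so the result is a plain S3 object key.
    s3_path = parsed.path[1:] if parsed.path.startswith("/") else parsed.path
    return bucket_name, s3_path
```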

tatk.util.allennlp_file_utils.url_to_filename(url: str, etag: str = None) → str

Convert url into a hashed filename in a repeatable way. If etag is specified, append its hash to the url’s, delimited by a period.
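The following is a sketch of the hashing scheme, assuming SHA-256 hex digests for both the URL and the ETag. The hash function is an assumption, not confirmed by this page, and `hashed_filename` is a hypothetical name.

```python
import hashlib
from typing import Optional


def hashed_filename(url: str, etag: Optional[str] = None) -> str:
    """Hypothetical sketch: sha256(url), plus '.' + sha256(etag) when an ETag is given."""
    filename = hashlib.sha256(url.encode("utf-8")).hexdigest()
    if etag:
        # Period-delimited ETag hash, so a new remote version gets a new name.
        filename += "." + hashlib.sha256(etag.encode("utf-8")).hexdigest()
    return filename
```

Hashing keeps the name filesystem-safe whatever characters the URL contains, and the same URL always maps to the same name.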

tatk.util.file_util module

tatk.util.file_util.cached_path(file_path, cached_dir=None)
tatk.util.file_util.dump_json(content, filepath)
tatk.util.file_util.read_zipped_json(zip_path, filepath)
tatk.util.file_util.write_zipped_json(zip_path, filepath)
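The `file_util` entries carry no docstrings. The following sketch makes two assumptions: that `read_zipped_json` loads the JSON member `filepath` from the archive at `zip_path`, and that `dump_json` writes an object to a file as JSON. The names `read_json_from_zip` and `dump_json_sketch`, the `indent=2` formatting, and the UTF-8 output are hypothetical choices. `write_zipped_json` is left out because its behavior cannot be inferred from the signature alone.

```python
import json
import zipfile
from typing import Any


def read_json_from_zip(zip_path: str, filepath: str) -> Any:
    """Hypothetical sketch: load the JSON member `filepath` from the archive at `zip_path`."""
    with zipfile.ZipFile(zip_path, "r") as archive:
        # ZipFile.open yields a binary stream; json.load accepts bytes input.
        with archive.open(filepath) as member:
            return json.load(member)


def dump_json_sketch(content: Any, filepath: str) -> None:
    """Hypothetical sketch: write `content` as indented UTF-8 JSON."""
    with open(filepath, "w", encoding="utf-8") as f:
        json.dump(content, f, indent=2, ensure_ascii=False)
```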

tatk.util.module module

Module interface.

class tatk.util.module.Module

Bases: abc.ABC

from_cache(*args, **kwargs)

Restore the internal state for a multi-turn dialog.

init_session()

Initialize the class variables for a new session.

test(*args, **kwargs)

Model testing entry point

to_cache(*args, **kwargs)

Save the internal state for a multi-turn dialog.

train(*args, **kwargs)

Model training entry point
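Implementing the session and cache hooks might look like the following. To keep the example runnable without tatk, `ModuleSketch` is a hypothetical stand-in that mirrors only `init_session`, `to_cache`, and `from_cache`. The concrete signatures (a dict in, a dict out) are assumptions; the documented methods accept `*args, **kwargs`. `TurnHistory` is a toy component.

```python
from abc import ABC
from typing import Any, Dict, List


class ModuleSketch(ABC):
    """Hypothetical stand-in for tatk.util.module.Module (session/cache hooks only)."""

    def init_session(self) -> None:
        pass

    def to_cache(self) -> Dict[str, Any]:
        return {}

    def from_cache(self, cache: Dict[str, Any]) -> None:
        pass


class TurnHistory(ModuleSketch):
    """Toy component whose only per-dialog state is the utterance history."""

    def __init__(self) -> None:
        self.init_session()

    def init_session(self) -> None:
        # Reset per-dialog state at the start of every new session.
        self.history: List[str] = []

    def update(self, utterance: str) -> int:
        self.history.append(utterance)
        return len(self.history)

    def to_cache(self) -> Dict[str, Any]:
        # Copy so later turns do not mutate the saved snapshot.
        return {"history": list(self.history)}

    def from_cache(self, cache: Dict[str, Any]) -> None:
        self.history = list(cache["history"])
```

Saving and restoring through plain dictionaries lets a dialog be paused in one component instance and resumed in another, for example across requests to a stateless server.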

tatk.util.train_util module

tatk.util.train_util.init_logging_handler(log_dir, extra='')
tatk.util.train_util.to_device(data)
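`init_logging_handler(log_dir, extra='')` has no docstring. The following sketch uses the standard `logging` module and assumes the function sends log records both to the console and to a timestamped file under `log_dir` whose name ends with `extra`. The function name, the file-naming scheme, and the DEBUG level are all hypothetical.

```python
import logging
import os
import time


def init_logging_sketch(log_dir: str, extra: str = "") -> str:
    """Hypothetical sketch: log to the console and to a timestamped file in log_dir."""
    os.makedirs(log_dir, exist_ok=True)
    stamp = time.strftime("%Y-%m-%d-%H-%M-%S", time.localtime())
    log_path = os.path.join(log_dir, f"log_{stamp}{extra}.txt")
    logger = logging.getLogger()  # root logger, so all module loggers propagate here
    logger.setLevel(logging.DEBUG)
    logger.addHandler(logging.StreamHandler())
    logger.addHandler(logging.FileHandler(log_path))  # opens (creates) the file immediately
    return log_path
```

Calling this function twice in one process would attach duplicate handlers and print every message twice, so it is meant to run once at the start of training.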