[![Main Repo](https://img.shields.io/badge/Main_project-cotk-blue.svg?logo=github)](https://github.com/thu-coai/cotk) [![This Repo](https://img.shields.io/badge/Model_repo-pytorch--seq2seq-blue.svg?logo=github)](https://github.com/thu-coai/seq2seq-pytorch) [![Coverage Status](https://coveralls.io/repos/github/thu-coai/seq2seq-pytorch/badge.svg?branch=master)](https://coveralls.io/github/thu-coai/seq2seq-pytorch?branch=master) [![Build Status](https://travis-ci.com/thu-coai/seq2seq-pytorch.svg?branch=master)](https://travis-ci.com/thu-coai/seq2seq-pytorch) # Seq2Seq (PyTorch) Seq2seq with attention mechanism is a basic model for single turn dialog. In addition, batch normalization and dropout has been applied. You can also choose beamsearch, greedy, random sample, random sample from top-k when decoding. You can refer to the following paper for details: Sutskever, I., Vinyals, O., & Le, Q. V. (2014). Sequence to sequence learning with neural networks. In *Advances in neural information processing systems* (pp. 3104-3112). Bahdanau, D., Cho, K., & Bengio, Y. (2015). Neural machine translation by jointly learning to align and translate. In *International Conference on Learning Representation*. ## Require Packages * **python3** * cotk * pytorch == 1.0.0 * tensorboardX >= 1.4 ## Quick Start * Using ``cotk download thu-coai/seq2seq-pytorch/master`` to download codes. * Execute ``python run.py`` to train the model. * The default dataset is ``OpenSubtitles``. You can use ``--dataset`` to specify other ``dataloader`` class and ``--dataid`` to specify other data path (can be a local path, a url or a resources id). For example: ``--dataset OpenSubtitles --dataid resources://OpenSubtitles`` * It doesn't use pretrained word vector by default setting. You can use ``--wvclass`` to specify ``wordvector`` class and ``--wvpath`` to specify pretrained word embeddings. For example: ``--wvclass gloves``. For example: ``--dataset Glove --dataid resources://Glove300`` * If you don't have GPUs, you can add `--cpu` for switching to CPU, but it may cost very long time for either training or test. * You can view training process by tensorboard, the log is at `./tensorboard`. * For example, ``tensorboard --logdir=./tensorboard``. (You have to install tensorboard first.) * After training, execute ``python run.py --mode test --restore best`` for test. * You can use ``--restore filename`` to specify checkpoints files, which are in ``./model``. For example: ``--restore pretrained-opensubtitles`` for loading ``./model/pretrained-opensubtitles.model`` * ``--restore last`` means last checkpoint, ``--restore best`` means best checkpoints on dev. * ``--restore NAME_last`` means last checkpoint with model named NAME. The same as``--restore NAME_best``. * Find results at ``./output``. ## Arguments ```none usage: run.py [-h] [--name NAME] [--restore RESTORE] [--mode MODE] [--lr LR] [--eh_size EH_SIZE] [--dh_size DH_SIZE] [--droprate DROPRATE] [--batchnorm] [--decode_mode {max,sample,gumbel,samplek,beam}] [--top_k TOP_K] [--length_penalty LENGTH_PENALTY] [--dataset DATASET] [--dataid DATAID] [--epoch EPOCH] [--batch_per_epoch BATCH_PER_EPOCH] [--wvclass WVCLASS] [--wvid WVID] [--out_dir OUT_DIR] [--log_dir LOG_DIR] [--model_dir MODEL_DIR] [--cache_dir CACHE_DIR] [--cpu] [--debug] [--cache] [--seed SEED] A seq2seq model with GRU encoder and decoder. Attention, beamsearch, dropout and batchnorm is supported. optional arguments: -h, --help show this help message and exit --name NAME The name of your model, used for tensorboard, etc. Default: runXXXXXX_XXXXXX (initialized by current time) --restore RESTORE Checkpoints name to load. "NAME_last" for the last checkpoint of model named NAME. "NAME_best" means the best checkpoint. You can also use "last" and "best", by default use last model you run. Attention: "NAME_last" and "NAME_best" are not guaranteed to work when 2 models with same name run in the same time. "last" and "best" are not guaranteed to work when 2 models run in the same time. Default: None (don't load anything) --mode MODE "train" or "test". Default: train --lr LR Learning rate. Default: 0.001 --eh_size EH_SIZE Size of encoder GRU --dh_size DH_SIZE Size of decoder GRU --droprate DROPRATE The probability to be zerod in dropout. 0 indicates for don't use dropout --batchnorm Use bathnorm --decode_mode {max,sample,gumbel,samplek,beam} The decode strategy when freerun. Choices: max, sample, gumbel(=sample), samplek(sample from topk), beam(beamsearch). Default: beam --top_k TOP_K The top_k when decode_mode == "beam" or "samplek" --length_penalty LENGTH_PENALTY The beamsearch penalty for short sentences. The penalty will get larger when this becomes smaller. --dataset DATASET Dataloader class. Default: OpenSubtitles --dataid DATAID Resource id for data set. It can be a resource name or a local path. Default: resources://OpenSubtitles --epoch EPOCH Epoch for training. Default: 100 --batch_per_epoch BATCH_PER_EPOCH Batches per epoch. Default: 1500 --wvclass WVCLASS Wordvector class, none for not using pretrained wordvec. Default: Glove --wvid WVID Resource id for pretrained wordvector. Default: resources://Glove300d --out_dir OUT_DIR Output directory for test output. Default: ./output --log_dir LOG_DIR Log directory for tensorboard. Default: ./tensorboard --model_dir MODEL_DIR Checkpoints directory for model. Default: ./model --cache_dir CACHE_DIR Checkpoints directory for cache. Default: ./cache --cpu Use cpu. --debug Enter debug mode (using ptvsd). --cache Use cache for speeding up load data and wordvec. (It may cause problems when you switch dataset.) --seed SEED Specify random seed. Default: 0 ``` ## TensorBoard Example Execute ``tensorboard --logdir=./tensorboard``, you will see the plot in tensorboard pages: ![tensorboard_plot_example](./images/tensorboard_plot_example.png) Following plot are shown in this model: * gen/loss (``gen`` means training process) * gen/perplexity (``=exp(gen/word_loss)``) * gen/word_loss (``=gen/loss`` in this model) * dev/loss * dev/perplexity_avg_on_batch * test/loss * test/perplexity_avg_on_batch And text output: ![tensorboard_plot_example](./images/tensorboard_text_example.png) Following text are shown in this model: * args * dev/show_str%d (``%d`` is according to ``args.show_sample`` in ``run.py``) ## Case Study of Model Results Execute ``python run.py --mode test --restore best`` The following output will be in `./output/[name]_[dev|test].txt`: ```none perplexity: 48.194050 bleu: 0.320098 post: my name is josie . resp: , pennsylvania , the state . gen: i' m a teacher . post: i put premium gasoline in her . resp: josie , i told you . gen: i don' t know . post: josie , dont hang up resp: they do it to aii the new kids . gen: aii right , you guys , you know what ? post: about playing a part . resp: and thats the theme of as you like it . gen: i don' t know . ...... ``` ## Experiment ### Subset Experiment Based on `OpenSubtitles_small` (a smaller version of `OpenSubtitles`), we did the following experiments. | encoder | decoder | batchnorm | learning rate | droprate | dev perplexity | test perplexity | | :-----: | :-----: | :-------: | :-----------: | :------: | :------------: | :-------------: | | 175 | 175 | no | 0.0001 | 0.2 | 88.971 | 94.698 | | **128** | **128** | no | 0.0001 | **0** | 95.207 | 100.676 | | **128** | **128** | no | 0.0001 | 0.2 | 93.559 | 99.287 | | **128** | **128** | **yes** | 0.0001 | **0** | 105.649 | 112.818 | | **128** | **128** | **yes** | 0.0001 | 0.2 | 95.894 | 102.243 | | **150** | **150** | no | 0.0001 | 0.2 | 90.153 | 95.072 | | **256** | **256** | no | 0.0001 | 0.2 | 89.374 | 94.272 | | 175 | 175 | **yes** | 0.0001 | 0.2 | 97.704 | 102.851 | | 175 | 175 | no | 0.0001 | **0.1** | 90.149 | 95.310 | | 175 | 175 | no | 0.0001 | **0.3** | 89.750 | 95.042 | | 175 | 175 | no | **0.0005** | 0.2 | **88.688** | **93.421** | The following experiments are based on the parameters of the first experiment. To reproduce the experiment, run the following command to train the model: ```bash python run.py --dataset=OpenSubtitles --dataid resources://OpenSubtitles_small --eh_size 175 --dh_size 175 --droprate 0.2 --epoch 35 --lr 0.0001 ``` Based on the best parameters, we did another five experiments in order to analyze the model's performance totally. | seed | dev perplexity | test perplexity | dev bleu(x1e-3) | test bleu(x1e-3) | | :----------------: | :------------: | :-------------: | :------------: | :-------------: | | 7913 | 89.123 | 93.898 | 3.24 | 2.82 | | 1640 | 88.224 | 94.019 | 4.51 | 4.13 | | 3739 | 87.969 | 92.730 | 2.37 | 2.15 | | 972 | 88.083 | 93.515 | 2.99 | 2.34 | | 3594 | 87.933 | 94.258 | 4.22 | 3.44 | | Average | 88.266 ± 0.440 | 93.684 ± 0.534 |3.47 ± 7.92|2.98 ± 7.29| Run the following command to test the trained model: ```bash python run.py --dataset=OpenSubtitles --dataid resources://OpenSubtitles_small --eh_size 175 --dh_size 175 --decode_mode samplek --droprate 0.2 --epoch 35 --mode test --restore [model name] --seed [seed value] ``` ### Fullset experiment Based on the result of subset experiment, we did the following experiment on `OpenSubtitles`. | Parameters | Value | | :-----------: | :----: | | encoder | 256 | | decoder | 256 | | batchnorm | no | | learning rate | 0.0003 | | droprate | 0.2 | | epoch | 200 | | seed | 0 | | decode mode |dev perplexity | test perplexity | dev bleu(x1e-3) | test bleu(x1e-3) | |:--: | :------------: | :-------------: | :------: | :-------: | | **samplek** | 41.457 | 43.868 | 3.24 | 2.82 | | **beam** | 41.457 | 43.868 | 11.5 | 10.4 | To reproduce the experiment, run the following command to train the model: ```bash python run.py --dataset=OpenSubtitles --dataid resources://OpenSubtitles --eh_size 256 --dh_size 256 --droprate 0.2 --lr 0.0003 --epoch 200 ``` Run the following command to test the trained model: ```bash python run.py --dataset=OpenSubtitles --dataid resources://OpenSubtitles --eh_size 256 --dh_size 256 --decode_mode [decode mode] --droprate 0.2 --lr 0.0003 --epoch 200 --mode test --restore best ``` ## Performance | | Perplexity | BLEU | | ------------- | ---------- | ----- | | OpenSubtitles | 43.868 | 0.0115 | ## Author [HUANG Fei](https://github.com/hzhwcmhf)