Commit 00945388 by xuchen

remove the cache

parent dcc5a635
# JetBrains PyCharm IDE
.idea/
# Byte-compiled / optimized / DLL files
__pycache__/
*.py[cod]
*$py.class
# C extensions
*.so
# macOS dir files
.DS_Store
# Distribution / packaging
.Python
env/
build/
develop-eggs/
dist/
downloads/
eggs/
.eggs/
lib/
lib64/
parts/
sdist/
var/
wheels/
*.egg-info/
.installed.cfg
*.egg
# Checkpoints
checkpoints
# PyInstaller
# Usually these files are written by a python script from a template
# before PyInstaller builds the exe, so as to inject date/other infos into it.
*.manifest
*.spec
# Installer logs
pip-log.txt
pip-delete-this-directory.txt
# Unit test / coverage reports
htmlcov/
.tox/
.coverage
.coverage.*
.cache
nosetests.xml
coverage.xml
*.cover
.hypothesis/
# Translations
*.mo
*.pot
# Django stuff:
*.log
local_settings.py
# Flask stuff:
instance/
.webassets-cache
# Scrapy stuff:
.scrapy
# Sphinx documentation
docs/_build/
# PyBuilder
target/
# Jupyter Notebook
.ipynb_checkpoints
# pyenv
.python-version
# celery beat schedule file
celerybeat-schedule
# SageMath parsed files
*.sage.py
# dotenv
.env
# virtualenv
.venv
venv/
ENV/
# Spyder project settings
.spyderproject
.spyproject
# Rope project settings
.ropeproject
# mkdocs documentation
/site
# mypy
.mypy_cache/
# Generated files
/fairseq/temporal_convolution_tbc
/fairseq/modules/*_layer/*_forward.cu
/fairseq/modules/*_layer/*_backward.cu
/fairseq/version.py
# data
data-bin/
# reranking
/examples/reranking/rerank_data
# Cython-generated C++ source files
/fairseq/data/data_utils_fast.cpp
/fairseq/data/token_block_utils_fast.cpp
# VSCODE
.vscode/ftp-sync.json
.vscode/settings.json
# Experimental Folder
experimental/*
# Weights and Biases logs
wandb/
/examples/translation/iwslt14.tokenized.de-en/
/toy/
[submodule "fairseq/model_parallel/megatron"]
path = fairseq/model_parallel/megatron
url = https://github.com/ngoyal2707/Megatron-LM
branch = fairseq
# Code of Conduct
## Our Pledge
In the interest of fostering an open and welcoming environment, we as
contributors and maintainers pledge to make participation in our project and
our community a harassment-free experience for everyone, regardless of age, body
size, disability, ethnicity, sex characteristics, gender identity and expression,
level of experience, education, socio-economic status, nationality, personal
appearance, race, religion, or sexual identity and orientation.
## Our Standards
Examples of behavior that contributes to creating a positive environment
include:
* Using welcoming and inclusive language
* Being respectful of differing viewpoints and experiences
* Gracefully accepting constructive criticism
* Focusing on what is best for the community
* Showing empathy towards other community members
Examples of unacceptable behavior by participants include:
* The use of sexualized language or imagery and unwelcome sexual attention or
advances
* Trolling, insulting/derogatory comments, and personal or political attacks
* Public or private harassment
* Publishing others' private information, such as a physical or electronic
address, without explicit permission
* Other conduct which could reasonably be considered inappropriate in a
professional setting
## Our Responsibilities
Project maintainers are responsible for clarifying the standards of acceptable
behavior and are expected to take appropriate and fair corrective action in
response to any instances of unacceptable behavior.
Project maintainers have the right and responsibility to remove, edit, or
reject comments, commits, code, wiki edits, issues, and other contributions
that are not aligned to this Code of Conduct, or to ban temporarily or
permanently any contributor for other behaviors that they deem inappropriate,
threatening, offensive, or harmful.
## Scope
This Code of Conduct applies within all project spaces, and it also applies when
an individual is representing the project or its community in public spaces.
Examples of representing a project or community include using an official
project e-mail address, posting via an official social media account, or acting
as an appointed representative at an online or offline event. Representation of
a project may be further defined and clarified by project maintainers.
## Enforcement
Instances of abusive, harassing, or otherwise unacceptable behavior may be
reported by contacting the project team at <conduct@pytorch.org>. All
complaints will be reviewed and investigated and will result in a response that
is deemed necessary and appropriate to the circumstances. The project team is
obligated to maintain confidentiality with regard to the reporter of an incident.
Further details of specific enforcement policies may be posted separately.
Project maintainers who do not follow or enforce the Code of Conduct in good
faith may face temporary or permanent repercussions as determined by other
members of the project's leadership.
## Attribution
This Code of Conduct is adapted from the [Contributor Covenant][homepage], version 1.4,
available at https://www.contributor-covenant.org/version/1/4/code-of-conduct.html
[homepage]: https://www.contributor-covenant.org
For answers to common questions about this code of conduct, see
https://www.contributor-covenant.org/faq
# Contributing to Facebook AI Research Sequence-to-Sequence Toolkit (fairseq)
We want to make contributing to this project as easy and transparent as
possible.
## Pull Requests
We actively welcome your pull requests.
1. Fork the repo and create your branch from `master`.
2. If you've added code that should be tested, add tests.
3. If you've changed APIs, update the documentation.
4. Ensure the test suite passes.
5. Make sure your code lints.
6. If you haven't already, complete the Contributor License Agreement ("CLA").
## Contributor License Agreement ("CLA")
In order to accept your pull request, we need you to submit a CLA. You only need
to do this once to work on any of Facebook's open source projects.
Complete your CLA here: <https://code.facebook.com/cla>
## Issues
We use GitHub issues to track public bugs. Please ensure your description is
clear and has sufficient instructions to be able to reproduce the issue.
## License
By contributing to Facebook AI Research Sequence-to-Sequence Toolkit (fairseq),
you agree that your contributions will be licensed under the LICENSE file in
the root directory of this source tree.
MIT License
Copyright (c) Facebook, Inc. and its affiliates.
Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:
The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.
# Fairseq-S2T
Adapt the fairseq toolkit for the speech-to-text task.
Implementation of the paper:
[Stacked Acoustic-and-Textual Encoding: Integrating the Pre-trained Models into Speech Translation Encoders](https://arxiv.org/abs/2105.05752)
## Key Features
### Training
- Support the complete Kaldi-style recipe
- ASR, MT, and ST pipelines (bin)
- Read the training configuration from a YAML file
- CTC multi-task learning
- MT training in the ST-like way (online tokenizer; may still contain bugs)
- Speed perturbation during pre-processing (requires torchaudio ≥ 0.8.0); see the sketch below
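Speed perturbation can be implemented with torchaudio's sox effects. A minimal, illustrative sketch (the actual pre-processing lives in the data preparation scripts):
```python
# Illustrative sketch of speed perturbation with torchaudio >= 0.8.0.
import torch
import torchaudio

def speed_perturb(waveform: torch.Tensor, sample_rate: int, factor: float) -> torch.Tensor:
    """Change the speaking rate by `factor` (e.g., 0.9, 1.0, 1.1) via sox effects."""
    effects = [["speed", f"{factor}"], ["rate", f"{sample_rate}"]]
    perturbed, _ = torchaudio.sox_effects.apply_effects_tensor(
        waveform, sample_rate, effects
    )
    return perturbed
```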
### Model
- Conformer architecture
- Load pre-trained models for ST
- Relative position encoding
- Stacked acoustic-and-textual encoding (see the sketch below)
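Conceptually, stacked acoustic-and-textual encoding (SATE) runs the acoustic encoder first, passes its output through an adapter, and feeds the adapted representation into a textual encoder. A rough sketch with hypothetical module names (see the paper for the actual design):
```python
import torch

class SATEEncoder(torch.nn.Module):
    """Sketch: acoustic encoder -> adapter -> textual encoder."""

    def __init__(self, acoustic_encoder, adapter, textual_encoder):
        super().__init__()
        self.acoustic_encoder = acoustic_encoder
        self.adapter = adapter
        self.textual_encoder = textual_encoder

    def forward(self, speech_features):
        acoustic_out = self.acoustic_encoder(speech_features)
        # the adapter bridges the gap between the pre-trained
        # acoustic (ASR) and textual (MT) representations
        adapted = self.adapter(acoustic_out)
        return self.textual_encoder(adapted)
```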
## Installation
* Note: we have only tested the following environment.
1. Python == 3.6
2. torch == 1.8, torchaudio == 0.8.0, cuda == 10.2
3. apex
```
pip install -v --disable-pip-version-check --no-cache-dir --global-option="--cpp_ext" --global-option="--cuda_ext" ./
```
4. nccl
```
make -j src.build CUDA_HOME=<path to cuda install>
```
5. gcc ≥ 4.9 (we use version 5.4)
6. Python libraries
```
pip install pandas sentencepiece configargparse gpustat tensorboard editdistance
```
## Code Tree
The shell scripts for each benchmark are in the egs folder. We provide the ASR pipeline for LibriSpeech and all pipelines (ASR, MT, and ST) for MuST-C, as well as a template for other benchmarks.
Here is an example for MuST-C:
```markdown
mustc
├── asr
│   ├── binary.sh
│   ├── conf
│   ├── decode.sh
│   ├── local
│   ├── run.sh
│   └── train.sh
├── mt
│   ├── binary.sh
│   ├── conf
│   ├── decode.sh
│   ├── local
│   ├── run.sh
│   └── train.sh
└── st
    ├── binary.sh
    ├── conf
    ├── decode.sh
    ├── ensemble.sh
    ├── local
    ├── run.sh
    └── train.sh
```
* run.sh: the core script, which runs the whole pipeline
* train.sh: calls run.sh for training
* decode.sh: calls run.sh for decoding
* binary.sh: only generates the binarized datasets
* conf: the folder that stores the configuration files (.yaml)
* local: the folder that stores the utility shell scripts
* monitor.sh: checks the GPUs and runs the program automatically
* parse_options.sh: parses the parameters for run.sh
* path.sh: unused
* utils.sh: the utility shell functions
## Citations
```bibtex
@inproceedings{xu-etal-2021-stacked,
    title = "Stacked Acoustic-and-Textual Encoding: Integrating the Pre-trained Models into Speech Translation Encoders",
    author = "Xu, Chen  and
      Hu, Bojie  and
      Li, Yanyang  and
      Zhang, Yuhao  and
      Huang, Shen  and
      Ju, Qi  and
      Xiao, Tong  and
      Zhu, Jingbo",
    booktitle = "Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers)",
    month = aug,
    year = "2021",
    address = "Online",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2021.acl-long.204",
    doi = "10.18653/v1/2021.acl-long.204",
    pages = "2619--2630",
}
```
# Minimal makefile for Sphinx documentation
#
# You can set these variables from the command line.
SPHINXOPTS =
SPHINXBUILD = python -msphinx
SPHINXPROJ = fairseq
SOURCEDIR = .
BUILDDIR = _build
# Put it first so that "make" without argument is like "make help".
help:
	@$(SPHINXBUILD) -M help "$(SOURCEDIR)" "$(BUILDDIR)" $(SPHINXOPTS) $(O)
.PHONY: help Makefile
# Catch-all target: route all unknown targets to Sphinx using the new
# "make mode" option. $(O) is meant as a shortcut for $(SPHINXOPTS).
%: Makefile
	@$(SPHINXBUILD) -M $@ "$(SOURCEDIR)" "$(BUILDDIR)" $(SPHINXOPTS) $(O)
.wy-table-responsive table td kbd {
white-space: nowrap;
}
.wy-table-responsive table td {
white-space: normal !important;
}
.wy-table-responsive {
overflow: visible !important;
}
.. _Command-line Tools:
Command-line Tools
==================
Fairseq provides several command-line tools for training and evaluating models:
- :ref:`fairseq-preprocess`: Data pre-processing: build vocabularies and binarize training data
- :ref:`fairseq-train`: Train a new model on one or multiple GPUs
- :ref:`fairseq-generate`: Translate pre-processed data with a trained model
- :ref:`fairseq-interactive`: Translate raw text with a trained model
- :ref:`fairseq-score`: BLEU scoring of generated translations against reference translations
- :ref:`fairseq-eval-lm`: Language model evaluation
.. _fairseq-preprocess:
fairseq-preprocess
~~~~~~~~~~~~~~~~~~
.. automodule:: fairseq_cli.preprocess
.. argparse::
:module: fairseq.options
:func: get_preprocessing_parser
:prog: fairseq-preprocess
.. _fairseq-train:
fairseq-train
~~~~~~~~~~~~~
.. automodule:: fairseq_cli.train
.. argparse::
:module: fairseq.options
:func: get_training_parser
:prog: fairseq-train
.. _fairseq-generate:
fairseq-generate
~~~~~~~~~~~~~~~~
.. automodule:: fairseq_cli.generate
.. argparse::
:module: fairseq.options
:func: get_generation_parser
:prog: fairseq-generate
.. _fairseq-interactive:
fairseq-interactive
~~~~~~~~~~~~~~~~~~~
.. automodule:: fairseq_cli.interactive
.. argparse::
:module: fairseq.options
:func: get_interactive_generation_parser
:prog: fairseq-interactive
.. _fairseq-score:
fairseq-score
~~~~~~~~~~~~~
.. automodule:: fairseq_cli.score
.. argparse::
:module: fairseq_cli.score
:func: get_parser
:prog: fairseq-score
.. _fairseq-eval-lm:
fairseq-eval-lm
~~~~~~~~~~~~~~~
.. automodule:: fairseq_cli.eval_lm
.. argparse::
:module: fairseq.options
:func: get_eval_lm_parser
:prog: fairseq-eval-lm
#!/usr/bin/env python3
# -*- coding: utf-8 -*-
#
# fairseq documentation build configuration file, created by
# sphinx-quickstart on Fri Aug 17 21:45:30 2018.
#
# This file is execfile()d with the current directory set to its
# containing dir.
#
# Note that not all possible configuration values are present in this
# autogenerated file.
#
# All configuration values have a default; values that are commented out
# serve to show the default.
# If extensions (or modules to document with autodoc) are in another directory,
# add these directories to sys.path here. If the directory is relative to the
# documentation root, use os.path.abspath to make it absolute, like shown here.
import os
import sys
from fairseq import __version__
# source code directory, relative to this file, for sphinx-autobuild
sys.path.insert(0, os.path.abspath(".."))
source_suffix = [".rst"]
# -- General configuration ------------------------------------------------
# If your documentation needs a minimal Sphinx version, state it here.
#
# needs_sphinx = '1.0'
# Add any Sphinx extension module names here, as strings. They can be
# extensions coming with Sphinx (named 'sphinx.ext.*') or your custom
# ones.
extensions = [
    "sphinx.ext.autodoc",
    "sphinx.ext.intersphinx",
    "sphinx.ext.viewcode",
    "sphinx.ext.napoleon",
    "sphinxarg.ext",
]
# Add any paths that contain templates here, relative to this directory.
templates_path = ["_templates"]
# The master toctree document.
master_doc = "index"
# General information about the project.
project = "fairseq"
copyright = "Facebook AI Research (FAIR)"
author = "Facebook AI Research (FAIR)"
github_doc_root = "https://github.com/pytorch/fairseq/tree/master/docs/"
# The version info for the project you're documenting, acts as replacement for
# |version| and |release|, also used in various other places throughout the
# built documents.
#
# The short X.Y version.
version = __version__
# The full version, including alpha/beta/rc tags.
release = __version__
# The language for content autogenerated by Sphinx. Refer to documentation
# for a list of supported languages.
#
# This is also used if you do content translation via gettext catalogs.
# Usually you set "language" from the command line for these cases.
language = None
# List of patterns, relative to source directory, that match files and
# directories to ignore when looking for source files.
# This patterns also effect to html_static_path and html_extra_path
exclude_patterns = ["_build", "Thumbs.db", ".DS_Store"]
# The name of the Pygments (syntax highlighting) style to use.
pygments_style = "sphinx"
highlight_language = "python"
# If true, `todo` and `todoList` produce output, else they produce nothing.
todo_include_todos = False
# -- Options for HTML output ----------------------------------------------
# The theme to use for HTML and HTML Help pages. See the documentation for
# a list of builtin themes.
#
html_theme = "sphinx_rtd_theme"
# Theme options are theme-specific and customize the look and feel of a theme
# further. For a list of options available for each theme, see the
# documentation.
#
# html_theme_options = {}
# Add any paths that contain custom static files (such as style sheets) here,
# relative to this directory. They are copied after the builtin static files,
# so a file named "default.css" will overwrite the builtin "default.css".
html_static_path = ["_static"]
html_context = {
    "css_files": [
        "_static/theme_overrides.css",  # override wide tables in RTD theme
    ],
}
# Custom sidebar templates, must be a dictionary that maps document names
# to template names.
#
# This is required for the alabaster theme
# refs: http://alabaster.readthedocs.io/en/latest/installation.html#sidebars
# html_sidebars = {
# '**': [
# 'about.html',
# 'navigation.html',
# 'relations.html', # needs 'show_related': True theme option to display
# 'searchbox.html',
# 'donate.html',
# ]
# }
# Example configuration for intersphinx: refer to the Python standard library.
intersphinx_mapping = {
    "numpy": ("http://docs.scipy.org/doc/numpy/", None),
    "python": ("https://docs.python.org/", None),
    "torch": ("https://pytorch.org/docs/master/", None),
}
.. role:: hidden
:class: hidden-section
.. _Criterions:
Criterions
==========
Criterions compute the loss function given the model and batch, roughly::
    loss = criterion(model, batch)
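For example, a custom criterion is registered with ``@register_criterion`` and
implements ``forward(model, sample)``. A minimal, illustrative sketch, assuming
the standard interface where ``forward`` returns the loss, the sample size, and
a logging dict (and the base class exposes ``self.padding_idx``):

.. code-block:: python

    import torch.nn.functional as F
    from fairseq.criterions import FairseqCriterion, register_criterion

    @register_criterion('my_cross_entropy')
    class MyCrossEntropy(FairseqCriterion):

        def forward(self, model, sample, reduce=True):
            net_output = model(**sample['net_input'])
            lprobs = model.get_normalized_probs(net_output, log_probs=True)
            target = model.get_targets(sample, net_output)
            loss = F.nll_loss(
                lprobs.view(-1, lprobs.size(-1)),
                target.view(-1),
                ignore_index=self.padding_idx,
                reduction='sum' if reduce else 'none',
            )
            sample_size = sample['ntokens']
            logging_output = {
                'loss': loss.data,
                'ntokens': sample['ntokens'],
                'sample_size': sample_size,
            }
            return loss, sample_size, logging_output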
.. automodule:: fairseq.criterions
:members:
.. autoclass:: fairseq.criterions.FairseqCriterion
:members:
:undoc-members:
.. autoclass:: fairseq.criterions.adaptive_loss.AdaptiveLoss
:members:
:undoc-members:
.. autoclass:: fairseq.criterions.composite_loss.CompositeLoss
:members:
:undoc-members:
.. autoclass:: fairseq.criterions.cross_entropy.CrossEntropyCriterion
:members:
:undoc-members:
.. autoclass:: fairseq.criterions.label_smoothed_cross_entropy.LabelSmoothedCrossEntropyCriterion
:members:
:undoc-members:
.. role:: hidden
:class: hidden-section
.. module:: fairseq.data
Data Loading and Utilities
==========================
.. _datasets:
Datasets
--------
**Datasets** define the data format and provide helpers for creating
mini-batches.
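A custom dataset subclasses :class:`fairseq.data.FairseqDataset` and implements
at least ``__getitem__``, ``__len__`` and ``collater``. A minimal, illustrative
sketch:

.. code-block:: python

    import torch
    from fairseq.data import FairseqDataset

    class ToyDataset(FairseqDataset):
        """Wraps a list of pre-tokenized integer sequences."""

        def __init__(self, sequences):
            self.sequences = [torch.LongTensor(s) for s in sequences]

        def __getitem__(self, index):
            return self.sequences[index]

        def __len__(self):
            return len(self.sequences)

        def collater(self, samples):
            # pad to the longest sequence in the mini-batch (0 = pad)
            max_len = max(s.numel() for s in samples)
            batch = torch.zeros(len(samples), max_len, dtype=torch.long)
            for i, s in enumerate(samples):
                batch[i, : s.numel()] = s
            return batch

        def num_tokens(self, index):
            return self.sequences[index].numel()

        def size(self, index):
            return self.sequences[index].numel()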
.. autoclass:: fairseq.data.FairseqDataset
:members:
.. autoclass:: fairseq.data.LanguagePairDataset
:members:
.. autoclass:: fairseq.data.MonolingualDataset
:members:
**Helper Datasets**
These datasets wrap other :class:`fairseq.data.FairseqDataset` instances and
provide additional functionality:
.. autoclass:: fairseq.data.BacktranslationDataset
:members:
.. autoclass:: fairseq.data.ConcatDataset
:members:
.. autoclass:: fairseq.data.ResamplingDataset
:members:
.. autoclass:: fairseq.data.RoundRobinZipDatasets
:members:
.. autoclass:: fairseq.data.TransformEosDataset
:members:
Dictionary
----------
.. autoclass:: fairseq.data.Dictionary
:members:
Iterators
---------
.. autoclass:: fairseq.data.CountingIterator
:members:
.. autoclass:: fairseq.data.EpochBatchIterator
:members:
.. autoclass:: fairseq.data.GroupedIterator
:members:
.. autoclass:: fairseq.data.ShardedIterator
:members:
[writers]
option-limit=0
Evaluating Pre-trained Models
=============================
First, download a pre-trained model along with its vocabularies:
.. code-block:: console
> curl https://dl.fbaipublicfiles.com/fairseq/models/wmt14.v2.en-fr.fconv-py.tar.bz2 | tar xvjf -
This model uses a `Byte Pair Encoding (BPE)
vocabulary <https://arxiv.org/abs/1508.07909>`__, so we'll have to apply
the encoding to the source text before it can be translated. This can be
done with the
`apply\_bpe.py <https://github.com/rsennrich/subword-nmt/blob/master/subword_nmt/apply_bpe.py>`__
script using the ``wmt14.en-fr.fconv-py/bpecodes`` file. ``@@`` is
used as a continuation marker and the original text can be easily
recovered with e.g. ``sed s/@@ //g`` or by passing the ``--remove-bpe``
flag to :ref:`fairseq-generate`. Prior to BPE, input text needs to be tokenized
using ``tokenizer.perl`` from
`mosesdecoder <https://github.com/moses-smt/mosesdecoder>`__.
Let's use :ref:`fairseq-interactive` to generate translations interactively.
Here, we use a beam size of 5 and preprocess the input with the Moses
tokenizer and the given Byte-Pair Encoding vocabulary. It will automatically
remove the BPE continuation markers and detokenize the output.
.. code-block:: console
> MODEL_DIR=wmt14.en-fr.fconv-py
> fairseq-interactive \
--path $MODEL_DIR/model.pt $MODEL_DIR \
--beam 5 --source-lang en --target-lang fr \
--tokenizer moses \
--bpe subword_nmt --bpe-codes $MODEL_DIR/bpecodes
| loading model(s) from wmt14.en-fr.fconv-py/model.pt
| [en] dictionary: 44206 types
| [fr] dictionary: 44463 types
| Type the input sentence and press return:
Why is it rare to discover new marine mammal species?
S-0 Why is it rare to discover new marine mam@@ mal species ?
H-0 -0.0643349438905716 Pourquoi est-il rare de découvrir de nouvelles espèces de mammifères marins?
P-0 -0.0763 -0.1849 -0.0956 -0.0946 -0.0735 -0.1150 -0.1301 -0.0042 -0.0321 -0.0171 -0.0052 -0.0062 -0.0015
This generation script produces three types of outputs: a line prefixed
with *S* shows the source sentence after pre-processing; *H* is the
hypothesis along with an average log-likelihood; and *P* is the
positional score per token position, including the
end-of-sentence marker, which is omitted from the text.
Other types of output lines you might see are *D*, the detokenized hypothesis;
*T*, the reference target; *A*, alignment info; and *E*, the history of generation steps.
See the `README <https://github.com/pytorch/fairseq#pre-trained-models>`__ for a
full list of pre-trained models available.
Training a New Model
====================
The following tutorial is for machine translation. For an example of how
to use Fairseq for other tasks, such as :ref:`language modeling`, please see the
``examples/`` directory.
Data Pre-processing
-------------------
Fairseq contains example pre-processing scripts for several translation
datasets: IWSLT 2014 (German-English), WMT 2014 (English-French) and WMT
2014 (English-German). To pre-process and binarize the IWSLT dataset:
.. code-block:: console
> cd examples/translation/
> bash prepare-iwslt14.sh
> cd ../..
> TEXT=examples/translation/iwslt14.tokenized.de-en
> fairseq-preprocess --source-lang de --target-lang en \
--trainpref $TEXT/train --validpref $TEXT/valid --testpref $TEXT/test \
--destdir data-bin/iwslt14.tokenized.de-en
This will write binarized data that can be used for model training to
``data-bin/iwslt14.tokenized.de-en``.
Training
--------
Use :ref:`fairseq-train` to train a new model. Here are a few example settings that work
well for the IWSLT 2014 dataset:
.. code-block:: console
> mkdir -p checkpoints/fconv
> CUDA_VISIBLE_DEVICES=0 fairseq-train data-bin/iwslt14.tokenized.de-en \
--optimizer nag --lr 0.25 --clip-norm 0.1 --dropout 0.2 --max-tokens 4000 \
--arch fconv_iwslt_de_en --save-dir checkpoints/fconv
By default, :ref:`fairseq-train` will use all available GPUs on your machine. Use the
``CUDA_VISIBLE_DEVICES`` environment variable to select specific GPUs and/or to
change the number of GPU devices that will be used.
Also note that the batch size is specified in terms of the maximum
number of tokens per batch (``--max-tokens``). You may need to use a
smaller value depending on the available GPU memory on your system.
Generation
----------
Once your model is trained, you can generate translations using
:ref:`fairseq-generate` **(for binarized data)** or
:ref:`fairseq-interactive` **(for raw text)**:
.. code-block:: console
> fairseq-generate data-bin/iwslt14.tokenized.de-en \
--path checkpoints/fconv/checkpoint_best.pt \
--batch-size 128 --beam 5
| [de] dictionary: 35475 types
| [en] dictionary: 24739 types
| data-bin/iwslt14.tokenized.de-en test 6750 examples
| model fconv
| loaded checkpoint checkpoints/fconv/checkpoint_best.pt
S-721 danke .
T-721 thank you .
...
To generate translations with only a CPU, use the ``--cpu`` flag. BPE
continuation markers can be removed with the ``--remove-bpe`` flag.
Advanced Training Options
=========================
Large mini-batch training with delayed updates
----------------------------------------------
The ``--update-freq`` option can be used to accumulate gradients from
multiple mini-batches and delay updating, creating a larger effective
batch size. Delayed updates can also improve training speed by reducing
inter-GPU communication costs and by saving idle time caused by variance
in workload across GPUs. See `Ott et al.
(2018) <https://arxiv.org/abs/1806.00187>`__ for more details.
To train on a single GPU with an effective batch size that is equivalent
to training on 8 GPUs:
.. code-block:: console
> CUDA_VISIBLE_DEVICES=0 fairseq-train --update-freq 8 (...)
Training with half precision floating point (FP16)
--------------------------------------------------
.. note::
FP16 training requires a Volta GPU and CUDA 9.1 or greater
Recent GPUs enable efficient half precision floating point computation,
e.g., using `Nvidia Tensor Cores
<https://docs.nvidia.com/deeplearning/sdk/mixed-precision-training/index.html>`__.
Fairseq supports FP16 training with the ``--fp16`` flag:
.. code-block:: console
> fairseq-train --fp16 (...)
Distributed training
--------------------
Distributed training in fairseq is implemented on top of ``torch.distributed``.
The easiest way to launch jobs is with the `torch.distributed.launch
<https://pytorch.org/docs/stable/distributed.html#launch-utility>`__ tool.
For example, to train a large English-German Transformer model on 2 nodes each
with 8 GPUs (in total 16 GPUs), run the following command on each node,
replacing ``node_rank=0`` with ``node_rank=1`` on the second node and making
sure to update ``--master_addr`` to the IP address of the first node:
.. code-block:: console
> python -m torch.distributed.launch --nproc_per_node=8 \
--nnodes=2 --node_rank=0 --master_addr="192.168.1.1" \
--master_port=12345 \
$(which fairseq-train) data-bin/wmt16_en_de_bpe32k \
--arch transformer_vaswani_wmt_en_de_big --share-all-embeddings \
--optimizer adam --adam-betas '(0.9, 0.98)' --clip-norm 0.0 \
--lr-scheduler inverse_sqrt --warmup-init-lr 1e-07 --warmup-updates 4000 \
--lr 0.0005 \
--dropout 0.3 --weight-decay 0.0 --criterion label_smoothed_cross_entropy --label-smoothing 0.1 \
--max-tokens 3584 \
--max-epoch 70 \
--fp16
On SLURM clusters, fairseq will automatically detect the number of nodes and
GPUs, but a port number must be provided:
.. code-block:: console
> salloc --gpus=16 --nodes 2 (...)
> srun fairseq-train --distributed-port 12345 (...)
Sharding very large datasets
----------------------------
It can be challenging to train over very large datasets, particularly if your
machine does not have much system RAM. Most tasks in fairseq support training
over "sharded" datasets, in which the original dataset has been preprocessed
into non-overlapping chunks (or "shards").
For example, instead of preprocessing all your data into a single "data-bin"
directory, you can split the data and create "data-bin1", "data-bin2", etc.
Then you can adapt your training command like so:
.. code-block:: console
> fairseq-train data-bin1:data-bin2:data-bin3 (...)
Training will now iterate over each shard, one by one, with each shard
corresponding to an "epoch", thus reducing system memory usage.
.. fairseq documentation master file, created by
sphinx-quickstart on Fri Aug 17 21:45:30 2018.
You can adapt this file completely to your liking, but it should at least
contain the root `toctree` directive.
:github_url: https://github.com/pytorch/fairseq
fairseq documentation
=====================
Fairseq is a sequence modeling toolkit written in `PyTorch
<http://pytorch.org/>`_ that allows researchers and developers to
train custom models for translation, summarization, language modeling and other
text generation tasks.
.. toctree::
:maxdepth: 1
:caption: Getting Started
getting_started
command_line_tools
.. toctree::
:maxdepth: 1
:caption: Extending Fairseq
overview
tutorial_simple_lstm
tutorial_classifying_names
.. toctree::
:maxdepth: 2
:caption: Library Reference
tasks
models
criterions
optim
lr_scheduler
data
modules
Indices and tables
==================
* :ref:`genindex`
* :ref:`search`
.. role:: hidden
:class: hidden-section
.. _Learning Rate Schedulers:
Learning Rate Schedulers
========================
Learning Rate Schedulers update the learning rate over the course of training.
Learning rates can be updated after each update via :func:`step_update` or at
epoch boundaries via :func:`step`.
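A custom scheduler subclasses :class:`FairseqLRScheduler` and is registered
with ``@register_lr_scheduler``. A minimal, illustrative sketch (a simple
exponential decay; the name and decay rate are made up for illustration):

.. code-block:: python

    from fairseq.optim.lr_scheduler import (
        FairseqLRScheduler,
        register_lr_scheduler,
    )

    @register_lr_scheduler('exp_decay')
    class ExpDecaySchedule(FairseqLRScheduler):

        def step_update(self, num_updates):
            # decay the learning rate slightly after every update
            lr = self.args.lr[0] * (0.9999 ** num_updates)
            self.optimizer.set_lr(lr)
            return lr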
.. automodule:: fairseq.optim.lr_scheduler
:members:
.. autoclass:: fairseq.optim.lr_scheduler.FairseqLRScheduler
:members:
:undoc-members:
.. autoclass:: fairseq.optim.lr_scheduler.cosine_lr_scheduler.CosineSchedule
:members:
:undoc-members:
.. autoclass:: fairseq.optim.lr_scheduler.fixed_schedule.FixedSchedule
:members:
:undoc-members:
.. autoclass:: fairseq.optim.lr_scheduler.inverse_square_root_schedule.InverseSquareRootSchedule
:members:
:undoc-members:
.. autoclass:: fairseq.optim.lr_scheduler.reduce_lr_on_plateau.ReduceLROnPlateau
:members:
:undoc-members:
.. autoclass:: fairseq.optim.lr_scheduler.triangular_lr_scheduler.TriangularSchedule
:members:
:undoc-members:
@ECHO OFF
pushd %~dp0
REM Command file for Sphinx documentation
if "%SPHINXBUILD%" == "" (
set SPHINXBUILD=python -msphinx
)
set SOURCEDIR=.
set BUILDDIR=_build
set SPHINXPROJ=fairseq
if "%1" == "" goto help
%SPHINXBUILD% >NUL 2>NUL
if errorlevel 9009 (
echo.
echo.The Sphinx module was not found. Make sure you have Sphinx installed,
echo.then set the SPHINXBUILD environment variable to point to the full
echo.path of the 'sphinx-build' executable. Alternatively you may add the
echo.Sphinx directory to PATH.
echo.
echo.If you don't have Sphinx installed, grab it from
echo.http://sphinx-doc.org/
exit /b 1
)
%SPHINXBUILD% -M %1 %SOURCEDIR% %BUILDDIR% %SPHINXOPTS%
goto end
:help
%SPHINXBUILD% -M help %SOURCEDIR% %BUILDDIR% %SPHINXOPTS%
:end
popd
.. role:: hidden
:class: hidden-section
.. module:: fairseq.models
.. _Models:
Models
======
A Model defines the neural network's ``forward()`` method and encapsulates all
of the learnable parameters in the network. Each model also provides a set of
named *architectures* that define the precise network configuration (e.g.,
embedding dimension, number of layers, etc.).
Both the model type and architecture are selected via the ``--arch``
command-line argument. Once selected, a model may expose additional command-line
arguments for further configuration.
.. note::
All fairseq Models extend :class:`BaseFairseqModel`, which in turn extends
:class:`torch.nn.Module`. Thus any fairseq Model can be used as a
stand-alone Module in other PyTorch code.
Convolutional Neural Networks (CNN)
-----------------------------------
.. module:: fairseq.models.fconv
.. autoclass:: fairseq.models.fconv.FConvModel
:members:
.. autoclass:: fairseq.models.fconv.FConvEncoder
:members:
:undoc-members:
.. autoclass:: fairseq.models.fconv.FConvDecoder
:members:
Long Short-Term Memory (LSTM) networks
--------------------------------------
.. module:: fairseq.models.lstm
.. autoclass:: fairseq.models.lstm.LSTMModel
:members:
.. autoclass:: fairseq.models.lstm.LSTMEncoder
:members:
.. autoclass:: fairseq.models.lstm.LSTMDecoder
:members:
Transformer (self-attention) networks
-------------------------------------
.. module:: fairseq.models.transformer
.. autoclass:: fairseq.models.transformer.TransformerModel
:members:
.. autoclass:: fairseq.models.transformer.TransformerEncoder
:members:
.. autoclass:: fairseq.models.transformer.TransformerEncoderLayer
:members:
.. autoclass:: fairseq.models.transformer.TransformerDecoder
:members:
.. autoclass:: fairseq.models.transformer.TransformerDecoderLayer
:members:
Adding new models
-----------------
.. currentmodule:: fairseq.models
.. autofunction:: fairseq.models.register_model
.. autofunction:: fairseq.models.register_model_architecture
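Putting these together, registering a new model and a named architecture
typically looks like the following sketch (names are illustrative; the
encoder/decoder construction is elided):

.. code-block:: python

    from fairseq.models import (
        FairseqEncoderDecoderModel,
        register_model,
        register_model_architecture,
    )

    @register_model('my_model')
    class MyModel(FairseqEncoderDecoderModel):

        @staticmethod
        def add_args(parser):
            # extra command-line arguments exposed by this model
            parser.add_argument('--my-dim', type=int, metavar='N')

        @classmethod
        def build_model(cls, args, task):
            my_model_base(args)  # apply the named architecture defaults
            encoder, decoder = ..., ...  # construct encoder/decoder here
            return cls(encoder, decoder)

    @register_model_architecture('my_model', 'my_model_base')
    def my_model_base(args):
        args.my_dim = getattr(args, 'my_dim', 256)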
.. autoclass:: fairseq.models.BaseFairseqModel
:members:
:undoc-members:
.. autoclass:: fairseq.models.FairseqEncoderDecoderModel
:members:
:undoc-members:
.. autoclass:: fairseq.models.FairseqEncoderModel
:members:
:undoc-members:
.. autoclass:: fairseq.models.FairseqLanguageModel
:members:
:undoc-members:
.. autoclass:: fairseq.models.FairseqMultiModel
:members:
:undoc-members:
.. autoclass:: fairseq.models.FairseqEncoder
:members:
.. autoclass:: fairseq.models.CompositeEncoder
:members:
.. autoclass:: fairseq.models.FairseqDecoder
:members:
.. _Incremental decoding:
Incremental decoding
--------------------
.. autoclass:: fairseq.models.FairseqIncrementalDecoder
:members:
:undoc-members:
Modules
=======
Fairseq provides several stand-alone :class:`torch.nn.Module` classes that may
be helpful when implementing a new :class:`~fairseq.models.BaseFairseqModel`.
.. automodule:: fairseq.modules
:members:
:undoc-members:
.. role:: hidden
:class: hidden-section
.. _optimizers:
Optimizers
==========
Optimizers update the Model parameters based on the gradients.
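A custom optimizer wraps a :class:`torch.optim.Optimizer`; a minimal,
illustrative sketch, assuming the standard registration interface:

.. code-block:: python

    import torch
    from fairseq.optim import FairseqOptimizer, register_optimizer

    @register_optimizer('my_sgd')
    class MySGD(FairseqOptimizer):

        def __init__(self, args, params):
            super().__init__(args)
            self._optimizer = torch.optim.SGD(params, **self.optimizer_config)

        @property
        def optimizer_config(self):
            # kwargs forwarded to the wrapped torch optimizer
            return {'lr': self.args.lr[0]}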
.. automodule:: fairseq.optim
:members:
.. autoclass:: fairseq.optim.FairseqOptimizer
:members:
:undoc-members:
.. autoclass:: fairseq.optim.adadelta.Adadelta
:members:
:undoc-members:
.. autoclass:: fairseq.optim.adagrad.Adagrad
:members:
:undoc-members:
.. autoclass:: fairseq.optim.adafactor.FairseqAdafactor
:members:
:undoc-members:
.. autoclass:: fairseq.optim.adam.FairseqAdam
:members:
:undoc-members:
.. autoclass:: fairseq.optim.fp16_optimizer.FP16Optimizer
:members:
:undoc-members:
.. autoclass:: fairseq.optim.nag.FairseqNAG
:members:
:undoc-members:
.. autoclass:: fairseq.optim.sgd.SGD
:members:
:undoc-members:
Overview
========
Fairseq can be extended through user-supplied `plug-ins
<https://en.wikipedia.org/wiki/Plug-in_(computing)>`_. We support five kinds of
plug-ins:
- :ref:`Models` define the neural network architecture and encapsulate all of the
learnable parameters.
- :ref:`Criterions` compute the loss function given the model outputs and targets.
- :ref:`Tasks` store dictionaries and provide helpers for loading/iterating over
Datasets, initializing the Model/Criterion and calculating the loss.
- :ref:`Optimizers` update the Model parameters based on the gradients.
- :ref:`Learning Rate Schedulers` update the learning rate over the course of
training.
**Training Flow**
Given a ``model``, ``criterion``, ``task``, ``optimizer`` and ``lr_scheduler``,
fairseq implements the following high-level training flow::
    for epoch in range(num_epochs):
        itr = task.get_batch_iterator(task.dataset('train'))
        for num_updates, batch in enumerate(itr):
            task.train_step(batch, model, criterion, optimizer)
            average_and_clip_gradients()
            optimizer.step()
            lr_scheduler.step_update(num_updates)
        lr_scheduler.step(epoch)
where the default implementation for ``task.train_step`` is roughly::
    def train_step(self, batch, model, criterion, optimizer, **unused):
        loss = criterion(model, batch)
        optimizer.backward(loss)
        return loss
**Registering new plug-ins**
New plug-ins are *registered* through a set of ``@register`` function
decorators, for example::
    @register_model('my_lstm')
    class MyLSTM(FairseqEncoderDecoderModel):
        (...)
Once registered, new plug-ins can be used with the existing :ref:`Command-line
Tools`. See the Tutorial sections for more detailed walkthroughs of how to add
new plug-ins.
**Loading plug-ins from another directory**
New plug-ins can be defined in a custom module stored in the user system. In
order to import the module, and make the plugin available to *fairseq*, the
command line supports the ``--user-dir`` flag that can be used to specify a
custom location for additional modules to load into *fairseq*.
For example, assuming this directory tree::
    /home/user/my-module/
    └── __init__.py
with ``__init__.py``::
    from fairseq.models import register_model_architecture
    from fairseq.models.transformer import transformer_vaswani_wmt_en_de_big

    @register_model_architecture('transformer', 'my_transformer')
    def transformer_mmt_big(args):
        transformer_vaswani_wmt_en_de_big(args)
it is possible to invoke the :ref:`fairseq-train` script with the new architecture with::
fairseq-train ... --user-dir /home/user/my-module -a my_transformer --task translation
.. role:: hidden
:class: hidden-section
.. module:: fairseq.tasks
.. _Tasks:
Tasks
=====
Tasks store dictionaries and provide helpers for loading/iterating over
Datasets, initializing the Model/Criterion and calculating the loss.
Tasks can be selected via the ``--task`` command-line argument. Once selected, a
task may expose additional command-line arguments for further configuration.
Example usage::
    # setup the task (e.g., load dictionaries)
    task = fairseq.tasks.setup_task(args)

    # build model and criterion
    model = task.build_model(args)
    criterion = task.build_criterion(args)

    # load datasets
    task.load_dataset('train')
    task.load_dataset('valid')

    # iterate over mini-batches of data
    batch_itr = task.get_batch_iterator(
        task.dataset('train'), max_tokens=4096,
    )
    for batch in batch_itr:
        # compute the loss
        loss, sample_size, logging_output = task.get_loss(
            model, criterion, batch,
        )
        loss.backward()
Translation
-----------
.. autoclass:: fairseq.tasks.translation.TranslationTask
.. _language modeling:
Language Modeling
-----------------
.. autoclass:: fairseq.tasks.language_modeling.LanguageModelingTask
Adding new tasks
----------------
.. autofunction:: fairseq.tasks.register_task
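Registering a task follows the same pattern as models and criterions.
A minimal, illustrative sketch:

.. code-block:: python

    from fairseq.tasks import FairseqTask, register_task

    @register_task('my_task')
    class MyTask(FairseqTask):

        @classmethod
        def setup_task(cls, args, **kwargs):
            # load dictionaries here, then construct the task
            return cls(args)

        def load_dataset(self, split, **kwargs):
            # populate self.datasets[split] with a FairseqDataset
            ...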
.. autoclass:: fairseq.tasks.FairseqTask
:members:
:undoc-members:
encoder-attention-type: relative
decoder-attention-type: relative
max-encoder-relative-length: 100
max-decoder-relative-length: 20
train-subset: train_st
valid-subset: dev_st
max-epoch: 50
max-update: 100000
num-workers: 8
patience: 10
no-progress-bar: True
log-interval: 100
seed: 1
report-accuracy: True
#load-pretrained-encoder-from:
#load-pretrained-decoder-from:
arch: s2t_transformer_s
share-decoder-input-output-embed: True
optimizer: adam
clip-norm: 10.0
lr-scheduler: inverse_sqrt
warmup-init-lr: 1e-7
warmup-updates: 10000
lr: 2e-3
#adam_betas: (0.9,0.98)
criterion: label_smoothed_cross_entropy
label_smoothing: 0.1
conv-kernel-sizes: 5,5
conv-channels: 1024
dropout: 0.1
activation-fn: relu
encoder-embed-dim: 256
encoder-ffn-embed-dim: 2048
encoder-layers: 12
decoder-layers: 6
encoder-attention-heads: 4
decoder-embed-dim: 256
decoder-ffn-embed-dim: 2048
decoder-attention-heads: 4
attention-dropout: 0.1
activation-dropout: 0.1
train-subset: train_st
valid-subset: dev_st
max-epoch: 50
max-update: 100000
num-workers: 8
patience: 10
no-progress-bar: True
log-interval: 100
seed: 1
report-accuracy: True
#load-pretrained-encoder-from:
#load-pretrained-decoder-from:
arch: s2t_transformer_s
share-decoder-input-output-embed: True
optimizer: adam
clip-norm: 10.0
lr-scheduler: inverse_sqrt
warmup-init-lr: 1e-7
warmup-updates: 10000
lr: 2e-3
#adam_betas: (0.9,0.98)
ctc-weight: 0.3
criterion: label_smoothed_cross_entropy_with_ctc
label_smoothing: 0.1
conv-kernel-sizes: 5,5
conv-channels: 1024
dropout: 0.1
activation-fn: relu
encoder-embed-dim: 256
encoder-ffn-embed-dim: 2048
encoder-layers: 12
decoder-layers: 6
encoder-attention-heads: 4
decoder-embed-dim: 256
decoder-ffn-embed-dim: 2048
decoder-attention-heads: 4
attention-dropout: 0.1
activation-dropout: 0.1
train-subset: train_st
valid-subset: dev_st
max-epoch: 50
max-update: 100000
num-workers: 8
patience: 10
no-progress-bar: True
log-interval: 100
seed: 1
report-accuracy: True
#load-pretrained-encoder-from:
#load-pretrained-decoder-from:
arch: s2t_conformer_s
share-decoder-input-output-embed: True
optimizer: adam
clip-norm: 10.0
lr-scheduler: inverse_sqrt
warmup-init-lr: 1e-7
warmup-updates: 10000
lr: 2e-3
#adam_betas: (0.9,0.98)
ctc-weight: 0.3
criterion: label_smoothed_cross_entropy_with_ctc
label_smoothing: 0.1
conv-kernel-sizes: 5,5
conv-channels: 1024
dropout: 0.1
activation-fn: relu
encoder-embed-dim: 256
encoder-ffn-embed-dim: 2048
encoder-layers: 12
decoder-layers: 6
encoder-attention-heads: 4
macaron-style: True
use-cnn-module: True
cnn-module-kernel: 31
decoder-embed-dim: 256
decoder-ffn-embed-dim: 2048
decoder-attention-heads: 4
attention-dropout: 0.1
activation-dropout: 0.1
train-subset: train_st
valid-subset: dev_st
max-epoch: 50
max-update: 100000
num-workers: 8
patience: 10
no-progress-bar: True
log-interval: 100
seed: 1
report-accuracy: True
#load-pretrained-encoder-from:
#load-pretrained-decoder-from:
arch: s2t_conformer_s
share-decoder-input-output-embed: True
optimizer: adam
clip-norm: 10.0
lr-scheduler: inverse_sqrt
warmup-init-lr: 1e-7
warmup-updates: 10000
lr: 2e-3
#adam_betas: (0.9,0.98)
ctc-weight: 0.3
criterion: label_smoothed_cross_entropy_with_ctc
label_smoothing: 0.1
conv-kernel-sizes: 5,5
conv-channels: 1024
dropout: 0.1
activation-fn: relu
encoder-embed-dim: 256
encoder-ffn-embed-dim: 2048
encoder-layers: 12
decoder-layers: 6
encoder-attention-heads: 4
macaron-style: True
use-cnn-module: True
cnn-module-kernel: 31
encoder-attention-type: relative
decoder-attention-type: relative
max-encoder-relative-length: 100
max-decoder-relative-length: 20
#decoder-embed-dim: 256
#decoder-ffn-embed-dim: 2048
#decoder-attention-heads: 4
#attention-dropout: 0.1
#activation-dropout: 0.1
#train-subset: train-clean-100,train-clean-360,train-other-500
train-subset: train-clean-100
valid-subset: dev-clean
max-epoch: 100
max-update: 300000
num-workers: 0
patience: 10
no-progress-bar: True
log-interval: 100
seed: 1
report-accuracy: True
arch: s2t_transformer_s
share-decoder-input-output-embed: True
optimizer: adam
clip-norm: 10.0
lr-scheduler: inverse_sqrt
warmup-init-lr: 1e-7
warmup-updates: 10000
lr: 2e-3
#adam_betas: (0.9,0.98)
ctc-weight: 0.3
criterion: label_smoothed_cross_entropy_with_ctc
label_smoothing: 0.1
conv-kernel-sizes: 5,5
conv-channels: 1024
dropout: 0.1
activation-fn: relu
encoder-embed-dim: 256
encoder-ffn-embed-dim: 2048
encoder-layers: 3
decoder-layers: 3
encoder-attention-heads: 4
macaron-style: True
use-cnn-module: True
cnn-module-kernel: 31
adapter: subsample
#decoder-embed-dim: 256
#decoder-ffn-embed-dim: 2048
#decoder-attention-heads: 4
#attention-dropout: 0.1
#activation-dropout: 0.1
train-subset: train_st
valid-subset: dev_st
max-epoch: 50
max-update: 100000
num-workers: 8
patience: 10
no-progress-bar: True
log-interval: 100
seed: 1
report-accuracy: True
#load-pretrained-encoder-from:
#load-pretrained-decoder-from:
arch: s2t_transformer_s
share-decoder-input-output-embed: True
optimizer: adam
clip-norm: 10.0
lr-scheduler: inverse_sqrt
warmup-init-lr: 1e-7
warmup-updates: 10000
lr: 2e-3
#adam_betas: (0.9,0.98)
ctc-weight: 0.3
criterion: label_smoothed_cross_entropy_with_ctc
label_smoothing: 0.1
conv-kernel-sizes: 5,5
conv-channels: 1024
dropout: 0.1
activation-fn: relu
encoder-embed-dim: 256
encoder-ffn-embed-dim: 2048
encoder-layers: 12
decoder-layers: 6
encoder-attention-heads: 4
encoder-attention-type: relative
decoder-attention-type: relative
max-encoder-relative-length: 100
max-decoder-relative-length: 20
decoder-embed-dim: 256
decoder-ffn-embed-dim: 2048
decoder-attention-heads: 4
attention-dropout: 0.1
activation-dropout: 0.1
train-subset: train_st
valid-subset: dev_st
max-epoch: 50
max-update: 100000
num-workers: 8
patience: 10
no-progress-bar: True
log-interval: 100
seed: 1
report-accuracy: True
#load-pretrained-encoder-from:
#load-pretrained-acoustic-encoder-from:
#load-pretrained-text-encoder-from:
#load-pretrained-decoder-from:
arch: s2t_sate
share-decoder-input-output-embed: True
optimizer: adam
clip-norm: 10.0
lr-scheduler: inverse_sqrt
warmup-init-lr: 1e-7
warmup-updates: 10000
lr: 2e-3
#adam_betas: (0.9,0.98)
ctc-weight: 0.3
criterion: label_smoothed_cross_entropy_with_ctc
label_smoothing: 0.1
encoder-normalize-before: True
decoder-normalize-before: True
conv-kernel-sizes: 5,5
conv-channels: 1024
dropout: 0.1
activation-fn: relu
encoder-embed-dim: 256
encoder-ffn-embed-dim: 2048
encoder-layers: 12
text-encoder-layers: 6
decoder-layers: 6
encoder-attention-heads: 4
macaron-style: True
use-cnn-module: True
cnn-module-kernel: 31
acoustic-encoder: transformer
adapter: league
#decoder-embed-dim: 256
#decoder-ffn-embed-dim: 2048
#decoder-attention-heads: 4
#attention-dropout: 0.1
#activation-dropout: 0.1
train-subset: train_st
valid-subset: dev_st
max-epoch: 50
max-update: 100000
num-workers: 8
patience: 10
no-progress-bar: True
log-interval: 100
seed: 1
report-accuracy: True
#load-pretrained-encoder-from:
#load-pretrained-acoustic-encoder-from:
#load-pretrained-text-encoder-from:
#load-pretrained-decoder-from:
arch: s2t_sate
share-decoder-input-output-embed: True
optimizer: adam
clip-norm: 10.0
lr-scheduler: inverse_sqrt
warmup-init-lr: 1e-7
warmup-updates: 10000
lr: 2e-3
#adam_betas: (0.9,0.98)
ctc-weight: 0.3
criterion: label_smoothed_cross_entropy_with_ctc
label_smoothing: 0.1
encoder-normalize-before: True
decoder-normalize-before: True
conv-kernel-sizes: 5,5
conv-channels: 1024
dropout: 0.1
activation-fn: relu
encoder-embed-dim: 256
encoder-ffn-embed-dim: 2048
encoder-layers: 12
text-encoder-layers: 6
decoder-layers: 6
encoder-attention-heads: 4
macaron-style: True
use-cnn-module: True
cnn-module-kernel: 31
acoustic-encoder: conformer
adapter: league
#decoder-embed-dim: 256
#decoder-ffn-embed-dim: 2048
#decoder-attention-heads: 4
#attention-dropout: 0.1
#activation-dropout: 0.1
train-subset: train_st
valid-subset: dev_st
max-epoch: 50
max-update: 100000
num-workers: 8
patience: 10
no-progress-bar: True
log-interval: 100
seed: 1
report-accuracy: True
#load-pretrained-encoder-from:
#load-pretrained-acoustic-encoder-from:
#load-pretrained-text-encoder-from:
#load-pretrained-decoder-from:
arch: s2t_sate
share-decoder-input-output-embed: True
optimizer: adam
clip-norm: 10.0
lr-scheduler: inverse_sqrt
warmup-init-lr: 1e-7
warmup-updates: 10000
lr: 2e-3
#adam_betas: (0.9,0.98)
ctc-weight: 0.3
criterion: label_smoothed_cross_entropy_with_ctc
label_smoothing: 0.1
encoder-normalize-before: True
decoder-normalize-before: True
conv-kernel-sizes: 5,5
conv-channels: 1024
dropout: 0.1
activation-fn: relu
encoder-embed-dim: 256
encoder-ffn-embed-dim: 2048
encoder-layers: 12
text-encoder-layers: 6
decoder-layers: 6
encoder-attention-heads: 4
macaron-style: True
use-cnn-module: True
cnn-module-kernel: 31
acoustic-encoder: conformer
adapter: league
encoder-attention-type: relative
decoder-attention-type: relative
max-encoder-relative-length: 100
max-decoder-relative-length: 20
decoder-embed-dim: 256
decoder-ffn-embed-dim: 2048
decoder-attention-heads: 4
attention-dropout: 0.1
activation-dropout: 0.1
train-subset: train_st
valid-subset: dev_st
max-epoch: 50
max-update: 100000
num-workers: 8
patience: 10
no-progress-bar: True
log-interval: 100
seed: 1
report-accuracy: True
#load-pretrained-encoder-from:
#load-pretrained-acoustic-encoder-from:
#load-pretrained-text-encoder-from:
#load-pretrained-decoder-from:
arch: s2t_sate
share-decoder-input-output-embed: True
optimizer: adam
clip-norm: 10.0
lr-scheduler: inverse_sqrt
warmup-init-lr: 1e-7
warmup-updates: 10000
lr: 2e-3
#adam_betas: (0.9,0.98)
ctc-weight: 0.3
criterion: label_smoothed_cross_entropy_with_ctc
label_smoothing: 0.1
encoder-normalize-before: True
decoder-normalize-before: True
conv-kernel-sizes: 5,5
conv-channels: 1024
dropout: 0.1
activation-fn: relu
encoder-embed-dim: 256
encoder-ffn-embed-dim: 2048
encoder-layers: 12
text-encoder-layers: 6
decoder-layers: 6
encoder-attention-heads: 4
macaron-style: True
use-cnn-module: True
cnn-module-kernel: 31
acoustic-encoder: transformer
adapter: league
encoder-attention-type: relative
decoder-attention-type: relative
max-encoder-relative-length: 100
max-decoder-relative-length: 20
#decoder-embed-dim: 256
#decoder-ffn-embed-dim: 2048
#decoder-attention-heads: 4
#attention-dropout: 0.1
#activation-dropout: 0.1
train-subset: train_st
valid-subset: dev_st
max-epoch: 50
max-update: 100000
num-workers: 8
patience: 10
no-progress-bar: True
log-interval: 100
seed: 1
report-accuracy: True
#load-pretrained-encoder-from:
#load-pretrained-decoder-from:
arch: s2t_conformer_m
share-decoder-input-output-embed: True
optimizer: adam
clip-norm: 10.0
lr-scheduler: inverse_sqrt
warmup-init-lr: 1e-7
warmup-updates: 10000
lr: 1e-3
#adam_betas: (0.9,0.98)
ctc-weight: 0.3
criterion: label_smoothed_cross_entropy_with_ctc
label_smoothing: 0.1
conv-kernel-sizes: 5,5
conv-channels: 1024
#dropout: 0.1
#activation-fn: relu
#encoder-embed-dim: 256
#encoder-ffn-embed-dim: 2048
#encoder-layers: 12
#decoder-layers: 6
#encoder-attention-heads: 4
#decoder-embed-dim: 256
#decoder-ffn-embed-dim: 2048
#decoder-attention-heads: 4
#attention-dropout: 0.1
#activation-dropout: 0.1
# conformer
#macaron-style: True
#use-cnn-module: True
#cnn-module-kernel: 31
# relative position encoding
#encoder-attention-type: relative
#decoder-attention-type: relative
#max-encoder-relative-length: 100
#max-decoder-relative-length: 20
train-subset: train_st
valid-subset: dev_st
max-epoch: 50
max-update: 100000
num-workers: 8
patience: 10
no-progress-bar: True
log-interval: 100
seed: 1
report-accuracy: True
#load-pretrained-encoder-from:
#load-pretrained-acoustic-encoder-from:
#load-pretrained-text-encoder-from:
#load-pretrained-decoder-from:
arch: s2t_sate
share-decoder-input-output-embed: True
optimizer: adam
clip-norm: 10.0
lr-scheduler: inverse_sqrt
warmup-init-lr: 1e-7
warmup-updates: 10000
lr: 2e-3
#adam_betas: (0.9,0.98)
criterion: label_smoothed_cross_entropy
label_smoothing: 0.1
encoder-normalize-before: True
decoder-normalize-before: True
conv-kernel-sizes: 5,5
conv-channels: 1024
dropout: 0.1
activation-fn: relu
encoder-embed-dim: 256
encoder-ffn-embed-dim: 2048
encoder-layers: 12
text-encoder-layers: 6
decoder-layers: 6
encoder-attention-heads: 4
macaron-style: True
use-cnn-module: True
cnn-module-kernel: 31
acoustic-encoder: transformer
adapter: league
#decoder-embed-dim: 256
#decoder-ffn-embed-dim: 2048
#decoder-attention-heads: 4
#attention-dropout: 0.1
#activation-dropout: 0.1
train-subset: train_st,train_covost
valid-subset: dev_st
max-epoch: 50
max-update: 100000
num-workers: 8
patience: 10
no-progress-bar: True
log-interval: 100
seed: 1
report-accuracy: True
#load-pretrained-encoder-from:
#load-pretrained-acoustic-encoder-from:
#load-pretrained-text-encoder-from:
#load-pretrained-decoder-from:
arch: s2t_sate
share-decoder-input-output-embed: True
optimizer: adam
clip-norm: 10.0
lr-scheduler: inverse_sqrt
warmup-init-lr: 1e-7
warmup-updates: 10000
lr: 2e-3
#adam_betas: (0.9,0.98)
criterion: label_smoothed_cross_entropy
label_smoothing: 0.1
encoder-normalize-before: True
decoder-normalize-before: True
conv-kernel-sizes: 5,5
conv-channels: 1024
dropout: 0.1
activation-fn: relu
encoder-embed-dim: 256
encoder-ffn-embed-dim: 2048
encoder-layers: 12
text-encoder-layers: 6
decoder-layers: 6
encoder-attention-heads: 4
macaron-style: True
use-cnn-module: True
cnn-module-kernel: 31
acoustic-encoder: transformer
adapter: league
encoder-attention-type: relative
decoder-attention-type: relative
max-encoder-relative-length: 100
max-decoder-relative-length: 20
#decoder-embed-dim: 256
#decoder-ffn-embed-dim: 2048
#decoder-attention-heads: 4
#attention-dropout: 0.1
#activation-dropout: 0.1
#! /bin/bash
gpu_num=1
data_dir=
test_subset=(test-clean test-other)
exp_name=
if [ "$#" -eq 1 ]; then
exp_name=$1
fi
n_average=10
beam_size=5
len_penalty=1.0
max_tokens=10000
dec_model=checkpoint_best.pt
cmd="./run.sh
--stage 2
--stop_stage 2
--gpu_num ${gpu_num}
--exp_name ${exp_name}
--n_average ${n_average}
--beam_size ${beam_size}
--len_penalty ${len_penalty}
--max_tokens ${max_tokens}
--dec_model ${dec_model}
"
if [[ -n ${data_dir} ]]; then
cmd="$cmd --data_dir ${data_dir}"
fi
if [[ -n ${test_subset} ]]; then
test_subset=`echo ${test_subset[*]} | sed 's/ /,/g'`
cmd="$cmd --test_subset ${test_subset}"
fi
echo $cmd
eval $cmd
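# monitor.sh: poll the GPUs until ${gpu_num} of them are free, then run the
# command stored in ${cmd} (assumed to be set before this point).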
gpu_num=1
while :
do
all_devices=$(seq 0 `gpustat | sed '1,2d' | wc -l`);
count=0
for dev in ${all_devices[@]}
do
line=`expr $dev + 2`
use=`gpustat -p | head -n $line | tail -1 | cut -d '|' -f4 | wc -w`
if [[ $use -eq 0 ]]; then
device[$count]=$dev
count=`expr $count + 1`
if [[ $count -eq $gpu_num ]]; then
break
fi
fi
done
if [[ ${#device[@]} -lt $gpu_num ]]; then
sleep 60s
else
echo "Run $cmd"
eval $cmd
sleep 10s
exit
fi
done
#!/usr/bin/env bash
# Copyright 2012 Johns Hopkins University (Author: Daniel Povey);
# Arnab Ghoshal, Karel Vesely
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# THIS CODE IS PROVIDED *AS IS* BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
# KIND, EITHER EXPRESS OR IMPLIED, INCLUDING WITHOUT LIMITATION ANY IMPLIED
# WARRANTIES OR CONDITIONS OF TITLE, FITNESS FOR A PARTICULAR PURPOSE,
# MERCHANTABILITY OR NON-INFRINGEMENT.
# See the Apache 2 License for the specific language governing permissions and
# limitations under the License.
# Parse command-line options.
# To be sourced by another script (as in ". parse_options.sh").
# Option format is: --option-name arg
# and shell variable "option_name" gets set to value "arg."
# The exception is --help, which takes no arguments, but prints the
# $help_message variable (if defined).
###
### The --config file options have lower priority to command line
### options, so we need to import them first...
###
# Now import all the configs specified by command-line, in left-to-right order
for ((argpos=1; argpos<$#; argpos++)); do
if [ "${!argpos}" == "--config" ]; then
argpos_plus1=$((argpos+1))
config=${!argpos_plus1}
[ ! -r $config ] && echo "$0: missing config '$config'" && exit 1
. $config # source the config file.
fi
done
###
### Now we process the command line options
###
while true; do
[ -z "${1:-}" ] && break; # break if there are no arguments
case "$1" in
# If the enclosing script is called with --help option, print the help
# message and exit. Scripts should put help messages in $help_message
--help|-h) if [ -z "$help_message" ]; then echo "No help found." 1>&2;
else printf "$help_message\n" 1>&2 ; fi;
exit 0 ;;
--*=*) echo "$0: options to scripts must be of the form --name value, got '$1'"
exit 1 ;;
# If the first command-line argument begins with "--" (e.g. --foo-bar),
# then work out the variable name as $name, which will equal "foo_bar".
--*) name=`echo "$1" | sed s/^--// | sed s/-/_/g`;
# Next we test whether the variable in question is undefined -- if so it's
# an invalid option and we die. Note: $0 evaluates to the name of the
# enclosing script.
# The test [ -z ${foo_bar+xxx} ] will return true if the variable foo_bar
# is undefined. We then have to wrap this test inside "eval" because
# foo_bar is itself inside a variable ($name).
eval '[ -z "${'$name'+xxx}" ]' && echo "$0: invalid option $1" 1>&2 && exit 1;
oldval="`eval echo \\$$name`";
# Work out whether we seem to be expecting a Boolean argument.
if [ "$oldval" == "true" ] || [ "$oldval" == "false" ]; then
was_bool=true;
else
was_bool=false;
fi
# Set the variable to the right value-- the escaped quotes make it work if
# the option had spaces, like --cmd "queue.pl -sync y"
eval $name=\"$2\";
# Check that Boolean-valued arguments are really Boolean.
if $was_bool && [[ "$2" != "true" && "$2" != "false" ]]; then
echo "$0: expected \"true\" or \"false\": $1 $2" 1>&2
exit 1;
fi
shift 2;
;;
*) break;
esac
done
# Check for an empty argument to the --cmd option, which can easily occur as a
# result of scripting errors.
[ ! -z "${cmd+xxx}" ] && [ -z "$cmd" ] && echo "$0: empty argument to --cmd option" 1>&2 && exit 1;
true; # so this script returns exit code 0.
MAIN_ROOT=$PWD/../../..
KALDI_ROOT=$MAIN_ROOT/tools/kaldi
export PATH=$PWD/utils/:$KALDI_ROOT/tools/openfst/bin:$PATH
[ ! -f $KALDI_ROOT/tools/config/common_path.sh ] && echo >&2 "The standard file $KALDI_ROOT/tools/config/common_path.sh is not present -> Exit!" && exit 1
. $KALDI_ROOT/tools/config/common_path.sh
export LC_ALL=C
export LD_LIBRARY_PATH=${LD_LIBRARY_PATH}:$MAIN_ROOT/src/lib
export LD_LIBRARY_PATH=${LD_LIBRARY_PATH}:$MAIN_ROOT/tools/chainer_ctc/ext/warp-ctc/build
. "${MAIN_ROOT}"/tools/activate_python.sh && . "${MAIN_ROOT}"/tools/extra_path.sh
export PATH=$MAIN_ROOT/utils:$MAIN_ROOT/espnet/bin:$PATH
export OMP_NUM_THREADS=1
# check extra module installation
if ! which tokenizer.perl > /dev/null; then
echo "Error: it seems that moses is not installed." >&2
echo "Error: please install moses as follows." >&2
echo "Error: cd ${MAIN_ROOT}/tools && make moses.done" >&2
return 1
fi
# NOTE(kan-bayashi): Use UTF-8 in Python to avoid UnicodeDecodeError when LC_ALL=C
export PYTHONIOENCODING=UTF-8
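# get_devices: wait until $1 GPUs are free (less than 100 MB of memory in use,
# per gpustat), then echo their ids as a comma-separated list; if $2 (use_cpu)
# is 1, echo -1 instead of waiting.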
get_devices(){
gpu_num=$1
use_cpu=$2
device=()
while :
do
record=`mktemp -t temp.record.XXXXXX`
gpustat > $record
all_devices=$(seq 0 `cat $record | sed '1,2d' | wc -l`);
count=0
for dev in ${all_devices[@]}
do
line=`expr $dev + 2`
use=`cat $record | head -n $line | tail -1 | cut -d '|' -f3 | cut -d '/' -f1`
if [[ $use -lt 100 ]]; then
device[$count]=$dev
count=`expr $count + 1`
if [[ $count -eq $gpu_num ]]; then
break
fi
fi
done
if [[ ${#device[@]} -lt $gpu_num ]]; then
if [[ $use_cpu -eq 1 ]]; then
device=(-1)
else
sleep 60s
fi
else
break
fi
done
echo ${device[*]} | sed 's/ /,/g'
return $?
}
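# Example usage (assuming gpustat is installed and on PATH):
#   device=$(get_devices 2 0)   # block until 2 GPUs with under 100 MB in use are free
#   export CUDA_VISIBLE_DEVICES=${device}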
#! /bin/bash
# Processing LibriSpeech Datasets
# Copyright 2021 Natural Language Processing Laboratory
# Xu Chen (xuchenneu@163.com)
# Set bash to 'debug' mode: it will exit on
# -e 'error', -u 'undefined variable', -o pipefail 'error in pipeline'; -x prints commands.
set -e
#set -u
set -o pipefail
export PYTHONIOENCODING=UTF-8
eval=1
time=$(date "+%m%d_%H%M")
stage=0
stop_stage=0
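# A stage N runs when stage <= N <= stop_stage, e.g. --stage 0 --stop_stage 2
# runs data preparation, training and decoding in sequence.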
######## hardware ########
# devices
device=()
gpu_num=8
update_freq=1
root_dir=~/st/Fairseq-S2T
pwd_dir=$PWD
# dataset
src_lang=en
lang=${src_lang}
dataset=librispeech
task=speech_to_text
vocab_type=unigram
vocab_size=10000
speed_perturb=0
use_specific_dict=0
specific_prefix=valid
specific_dir=/home/xuchen/st/data/mustc/st_lcrm/en-de
asr_vocab_prefix=spm_unigram10000_st_share
org_data_dir=/media/data/${dataset}
data_dir=~/st/data/${dataset}
test_subset=dev-clean,dev-other,test-clean,test-other
# exp
exp_prefix=${time}
extra_tag=
extra_parameter=
exp_tag=baseline
exp_name=
# config
train_config=train_ctc.yaml
data_config=config.yaml
# training setting
fp16=1
max_tokens=40000
step_valid=0
# decoding setting
dec_model=checkpoint_best.pt
n_average=10
beam_size=5
len_penalty=1.0
if [[ ${speed_perturb} -eq 1 ]]; then
data_dir=${data_dir}_sp
exp_prefix=${exp_prefix}_sp
fi
if [[ ${use_specific_dict} -eq 1 ]]; then
data_dir=${data_dir}_${specific_prefix}
exp_prefix=${exp_prefix}_${specific_prefix}
fi
. ./local/parse_options.sh || exit 1;
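# parse_options.sh lets any variable defined above be overridden from the command
# line, e.g. ./run.sh --gpu_num 4 --max_tokens 20000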
# full path
train_config=$pwd_dir/conf/${train_config}
if [[ -z ${exp_name} ]]; then
exp_name=${exp_prefix}_$(basename ${train_config%.*})_${exp_tag}
if [[ -n ${extra_tag} ]]; then
exp_name=${exp_name}_${extra_tag}
fi
fi
model_dir=$root_dir/../checkpoints/$dataset/asr/${exp_name}
if [ ${stage} -le -1 ] && [ ${stop_stage} -ge -1 ]; then
echo "stage -1: Data Download"
# pass
fi
if [ ${stage} -le 0 ] && [ ${stop_stage} -ge 0 ]; then
### Task dependent. You have to prepare the data yourself in the following part.
### But you can utilize Kaldi recipes in most cases
echo "stage 0: Data Preparation"
if [[ ! -e ${data_dir} ]]; then
mkdir -p ${data_dir}
fi
source ~/tools/audio/bin/activate
cmd="python ${root_dir}/examples/speech_to_text/prep_librispeech_data.py
--data-root ${org_data_dir}
--output-root ${data_dir}
--vocab-type ${vocab_type}
--vocab-size ${vocab_size}"
if [[ ${use_specific_dict} -eq 1 ]]; then
cp -r ${specific_dir}/${asr_vocab_prefix}.* ${data_dir}/${lang}
cmd="$cmd
--asr-prefix ${asr_vocab_prefix}"
fi
if [[ ${speed_perturb} -eq 1 ]]; then
cmd="$cmd
--speed-perturb"
fi
echo -e "\033[34mRun command: \n${cmd} \033[0m"
[[ $eval -eq 1 ]] && eval $cmd
fi
if [ ${stage} -le 1 ] && [ ${stop_stage} -ge 1 ]; then
echo "stage 1: ASR Network Training"
[[ ! -d ${data_dir} ]] && echo "The data dir ${data_dir} does not exist!" && exit 1;
if [[ -z ${device} || ${#device[@]} -eq 0 ]]; then
if [[ ${gpu_num} -eq 0 ]]; then
device=()
else
source ./local/utils.sh
device=$(get_devices $gpu_num 0)
fi
fi
echo -e "dev=${device} data=${data_dir} model=${model_dir}"
if [[ ! -d ${model_dir} ]]; then
mkdir -p ${model_dir}
else
echo "${model_dir} exists."
fi
cp ${BASH_SOURCE[0]} ${model_dir}
cp ${PWD}/train.sh ${model_dir}
cp ${train_config} ${model_dir}
cmd="python3 -u ${root_dir}/fairseq_cli/train.py
${data_dir}
--config-yaml ${data_config}
--train-config ${train_config}
--task ${task}
--max-tokens ${max_tokens}
--skip-invalid-size-inputs-valid-test
--update-freq ${update_freq}
--log-interval 100
--save-dir ${model_dir}
--tensorboard-logdir ${model_dir}"
if [[ -n ${extra_parameter} ]]; then
cmd="${cmd}
${extra_parameter}"
fi
if [[ ${gpu_num} -gt 0 ]]; then
cmd="${cmd}
--distributed-world-size $gpu_num
--ddp-backend no_c10d"
fi
if [[ $fp16 -eq 1 ]]; then
cmd="${cmd}
--fp16"
fi
if [[ $step_valid -eq 1 ]]; then
validate_interval=1
save_interval=1
keep_last_epochs=10
no_epoch_checkpoints=0
save_interval_updates=500
keep_interval_updates=10
else
validate_interval=1
keep_last_epochs=10
fi
if [[ -n $no_epoch_checkpoints && $no_epoch_checkpoints -eq 1 ]]; then
cmd="$cmd
--no-epoch-checkpoints"
fi
if [[ -n $validate_interval ]]; then
cmd="${cmd}
--validate-interval $validate_interval "
fi
if [[ -n $save_interval ]]; then
cmd="${cmd}
--save-interval $save_interval "
fi
if [[ -n $keep_last_epochs ]]; then
cmd="${cmd}
--keep-last-epochs $keep_last_epochs "
fi
if [[ -n $save_interval_updates ]]; then
cmd="${cmd}
--save-interval-updates $save_interval_updates"
if [[ -n $keep_interval_updates ]]; then
cmd="${cmd}
--keep-interval-updates $keep_interval_updates"
fi
fi
echo -e "\033[34mRun command: \n${cmd} \033[0m"
# save info
log=./history.log
echo "${time} | ${device} | ${data_dir} | ${model_dir} " >> $log
tail -n 50 $log > tmp.log
mv tmp.log $log
export CUDA_VISIBLE_DEVICES=${device}
cmd="nohup ${cmd} >> ${model_dir}/train.log 2>&1 &"
if [[ $eval -eq 1 ]]; then
eval $cmd
sleep 2s
tail -n `wc -l ${model_dir}/train.log | awk '{print $1+1}'` -f ${model_dir}/train.log
fi
fi
wait
if [ ${stage} -le 2 ] && [ ${stop_stage} -ge 2 ]; then
echo "stage 2: ASR Decoding"
if [[ ${n_average} -ne 1 ]]; then
# Average models
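# (averaging the last n_average epoch checkpoints usually gives a slightly more
# robust model than checkpoint_best.pt alone)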
dec_model=avg_${n_average}_checkpoint.pt
cmd="python ${root_dir}/scripts/average_checkpoints.py
--inputs ${model_dir}
--num-epoch-checkpoints ${n_average}
--output ${model_dir}/${dec_model}"
echo -e "\033[34mRun command: \n${cmd} \033[0m"
[[ $eval -eq 1 ]] && eval $cmd
else
dec_model=${dec_model}
fi
if [[ -z ${device} || ${#device[@]} -eq 0 ]]; then
if [[ ${gpu_num} -eq 0 ]]; then
device=()
else
source ./local/utils.sh
device=$(get_devices $gpu_num 0)
fi
fi
export CUDA_VISIBLE_DEVICES=${device}
#tmp_file=$(mktemp ${model_dir}/tmp-XXXXX)
#trap 'rm -rf ${tmp_file}' EXIT
result_file=${model_dir}/decode_result
[[ -f ${result_file} ]] && rm ${result_file}
test_subset=(${test_subset//,/ })
for subset in ${test_subset[@]}; do
subset=${subset}
cmd="python ${root_dir}/fairseq_cli/generate.py
${data_dir}
--config-yaml ${data_config}
--gen-subset ${subset}
--task speech_to_text
--path ${model_dir}/${dec_model}
--results-path ${model_dir}
--max-tokens ${max_tokens}
--beam ${beam_size}
--lenpen ${len_penalty}
--scoring wer"
echo -e "\033[34mRun command: \n${cmd} \033[0m"
if [[ $eval -eq 1 ]]; then
eval $cmd
tail -n 1 ${model_dir}/generate-${subset}.txt >> ${result_file}
fi
done
cat ${result_file}
fi
#! /bin/bash
# training the model
gpu_num=8
update_freq=2
max_tokens=20000
extra_tag=
extra_parameter=
#extra_tag="${extra_tag}"
#extra_parameter="${extra_parameter} "
exp_tag=
train_config=train_ctc.yaml
cmd="./run.sh
--stage 1
--stop_stage 1
--gpu_num ${gpu_num}
--update_freq ${update_freq}
--train_config ${train_config}
--max_tokens ${max_tokens}
"
if [[ -n ${exp_tag} ]]; then
cmd="$cmd --exp_tag ${exp_tag}"
fi
if [[ -n ${extra_tag} ]]; then
cmd="$cmd --extra_tag ${extra_tag}"
fi
if [[ -n ${extra_parameter} ]]; then
cmd="$cmd --extra_parameter \"${extra_parameter}\""
fi
echo $cmd
eval $cmd
set -e
eval=1
lcrm=0
tokenizer=0
root_dir=~/st/Fairseq-S2T
data_dir=/home/xuchen/st/data/test
vocab_dir=/home/xuchen/st/data/mustc/st_lcrm/en-de
asr_vocab_prefix=spm_unigram10000_st_share
src_lang=en
tgt_lang=de
splits=(2019)
source ~/tools/audio/bin/activate
splits=`echo ${splits[*]} | sed 's/ /,/g'`
cp -r ${vocab_dir}/${asr_vocab_prefix}.* ${data_dir}/${src_lang}-${tgt_lang}
rm -rf ${data_dir}/${src_lang}-${tgt_lang}/fbank80.zip
cmd="python ${root_dir}/examples/speech_to_text/prep_st_data.py
--data-root ${data_dir}
--output-root ${data_dir}
--splits ${splits}
--task asr
--src-lang ${src_lang}
--tgt-lang ${tgt_lang}
--add-src
--share
--asr-prefix ${asr_vocab_prefix}
--cmvn-type utterance"
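# --cmvn-type utterance applies cepstral mean and variance normalization per
# utterance rather than with global statistics.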
if [[ ${lcrm} -eq 1 ]]; then
cmd="$cmd
--lowercase-src
--rm-punc-src"
fi
if [[ ${tokenizer} -eq 1 ]]; then
cmd="$cmd
--tokenizer"
fi
echo -e "\033[34mRun command: \n${cmd} \033[0m"
[[ $eval -eq 1 ]] && eval ${cmd}
deactivate
train-subset: train_asr
valid-subset: dev_asr
max-epoch: 50
max-update: 100000
num-workers: 8
patience: 10
no-progress-bar: True
log-interval: 100
seed: 1
report-accuracy: True
#load-pretrained-encoder-from:
#load-pretrained-decoder-from:
arch: s2t_transformer_s
share-decoder-input-output-embed: True
optimizer: adam
clip-norm: 10.0
lr-scheduler: inverse_sqrt
warmup-init-lr: 1e-7
warmup-updates: 10000
lr: 2e-3
#adam_betas: (0.9,0.98)
criterion: label_smoothed_cross_entropy
label_smoothing: 0.1
conv-kernel-sizes: 5,5
conv-channels: 1024
dropout: 0.1
activation-fn: relu
encoder-embed-dim: 256
encoder-ffn-embed-dim: 2048
encoder-layers: 12
decoder-layers: 6
encoder-attention-heads: 4
decoder-embed-dim: 256
decoder-ffn-embed-dim: 2048
decoder-attention-heads: 4
attention-dropout: 0.1
activation-dropout: 0.1
arch: s2t_conformer_s
macaron-style: True
use-cnn-module: True
cnn-module-kernel: 31
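# Conformer-specific options: macaron-style half-step feed-forward pairs and a
# depthwise convolution module with kernel size 31 in each encoder block.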
train-subset: train_asr
valid-subset: dev_asr
max-epoch: 50
max-update: 100000
num-workers: 8
patience: 10
no-progress-bar: True
log-interval: 100
seed: 1
report-accuracy: True
#load-pretrained-encoder-from:
#load-pretrained-decoder-from:
arch: s2t_transformer_s
share-decoder-input-output-embed: True
optimizer: adam
clip-norm: 10.0
lr-scheduler: inverse_sqrt
warmup-init-lr: 1e-7
warmup-updates: 10000
lr: 2e-3
#adam_betas: (0.9,0.98)
ctc-weight: 0.3
criterion: label_smoothed_cross_entropy_with_ctc
label_smoothing: 0.1
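# assumed weighting: loss = ctc-weight * CTC + (1 - ctc-weight) * label-smoothed CE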
conv-kernel-sizes: 5,5
conv-channels: 1024
dropout: 0.1
activation-fn: relu
encoder-embed-dim: 256
encoder-ffn-embed-dim: 2048
encoder-layers: 12
decoder-layers: 6
encoder-attention-heads: 4
decoder-embed-dim: 256
decoder-ffn-embed-dim: 2048
decoder-attention-heads: 4
attention-dropout: 0.1
activation-dropout: 0.1
use-enc-dlcl: True
use-dec-dlcl: True
encoder-attention-type: local
hard-mask-window: 0
gauss-mask-sigma: 3
init-mask-weight: 0
encoder-attention-type: relative
decoder-attention-type: relative
max-encoder-relative-length: 100
max-decoder-relative-length: 20
#! /bin/bash
gpu_num=1
data_dir=
test_subset=(tst-COMMON)
exp_name=
if [ "$#" -eq 1 ]; then
exp_name=$1
fi
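# Usage: ./decode.sh [exp_name]
# (if exp_name is omitted, run.sh falls back to the name it builds from exp_prefix/exp_tag)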
n_average=10
beam_size=5
len_penalty=1.0
max_tokens=10000
dec_model=checkpoint_best.pt
cmd="./run.sh
--stage 2
--stop_stage 2
--gpu_num ${gpu_num}
--exp_name ${exp_name}
--n_average ${n_average}
--beam_size ${beam_size}
--len_penalty ${len_penalty}
--max_tokens ${max_tokens}
--dec_model ${dec_model}
"
if [[ -n ${data_dir} ]]; then
cmd="$cmd --data_dir ${data_dir}"
fi
if [[ ${#test_subset[@]} -gt 0 ]]; then
subsets=$(echo ${test_subset[*]} | sed 's/ /,/g')
cmd="$cmd --test_subset ${subsets}"
fi
echo $cmd
eval $cmd
gpu_num=1
cmd="sh train.sh"
while :
do
record=$(mktemp -t temp.record.XXXXXX)
gpustat > $record
all_devices=($(seq 0 "$(sed '1,2d' ${record} | wc -l)"));
count=0
for dev in "${all_devices[@]}"
do
line=$((dev + 2))
use=$(head -n $line ${record} | tail -1 | cut -d '|' -f3 | cut -d '/' -f1)
if [[ $use -lt 100 ]]; then
device[$count]=$dev
count=$((count + 1))
if [[ $count -eq $gpu_num ]]; then
break
fi
fi
done
if [[ ${#device[@]} -lt $gpu_num ]]; then
sleep 60s
else
echo "Run $cmd"
eval $cmd
sleep 10s
exit
fi
done
#!/usr/bin/env bash
# Copyright 2012 Johns Hopkins University (Author: Daniel Povey);
# Arnab Ghoshal, Karel Vesely
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# THIS CODE IS PROVIDED ON AN *AS IS* BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
# KIND, EITHER EXPRESS OR IMPLIED, INCLUDING WITHOUT LIMITATION ANY IMPLIED
# WARRANTIES OR CONDITIONS OF TITLE, FITNESS FOR A PARTICULAR PURPOSE,
# MERCHANTABILITY OR NON-INFRINGEMENT.
# See the Apache 2 License for the specific language governing permissions and
# limitations under the License.
# Parse command-line options.
# To be sourced by another script (as in ". parse_options.sh").
# Option format is: --option-name arg
# and shell variable "option_name" gets set to value "arg."
# The exception is --help, which takes no arguments, but prints the
# $help_message variable (if defined).
###
### The --config file options have lower priority than command line
### options, so we need to import them first...
###
# Now import all the configs specified by command-line, in left-to-right order
for ((argpos=1; argpos<$#; argpos++)); do
if [ "${!argpos}" == "--config" ]; then
argpos_plus1=$((argpos+1))
config=${!argpos_plus1}
[ ! -r $config ] && echo "$0: missing config '$config'" && exit 1
. $config # source the config file.
fi
done
###
### Now we process the command line options
###
while true; do
[ -z "${1:-}" ] && break; # break if there are no arguments
case "$1" in
# If the enclosing script is called with --help option, print the help
# message and exit. Scripts should put help messages in $help_message
--help|-h) if [ -z "$help_message" ]; then echo "No help found." 1>&2;
else printf "$help_message\n" 1>&2 ; fi;
exit 0 ;;
--*=*) echo "$0: options to scripts must be of the form --name value, got '$1'"
exit 1 ;;
# If the first command-line argument begins with "--" (e.g. --foo-bar),
# then work out the variable name as $name, which will equal "foo_bar".
--*) name=`echo "$1" | sed s/^--// | sed s/-/_/g`;
# Next we test whether the variable in question is undefined-- if so it's
# an invalid option and we die. Note: $0 evaluates to the name of the
# enclosing script.
# The test [ -z ${foo_bar+xxx} ] will return true if the variable foo_bar
# is undefined. We then have to wrap this test inside "eval" because
# foo_bar is itself inside a variable ($name).
eval '[ -z "${'$name'+xxx}" ]' && echo "$0: invalid option $1" 1>&2 && exit 1;
oldval="`eval echo \\$$name`";
# Work out whether we seem to be expecting a Boolean argument.
if [ "$oldval" == "true" ] || [ "$oldval" == "false" ]; then
was_bool=true;
else
was_bool=false;
fi
# Set the variable to the right value-- the escaped quotes make it work if
# the option had spaces, like --cmd "queue.pl -sync y"
eval $name=\"$2\";
# Check that Boolean-valued arguments are really Boolean.
if $was_bool && [[ "$2" != "true" && "$2" != "false" ]]; then
echo "$0: expected \"true\" or \"false\": $1 $2" 1>&2
exit 1;
fi
shift 2;
;;
*) break;
esac
done
# Check for an empty argument to the --cmd option, which can easily occur as a
# result of scripting errors.
[ ! -z "${cmd+xxx}" ] && [ -z "$cmd" ] && echo "$0: empty argument to --cmd option" 1>&2 && exit 1;
true; # so this script returns exit code 0.
get_devices(){
gpu_num=$1
use_cpu=$2
device=()
while :
do
record=$(mktemp -t temp.record.XXXXXX)
gpustat > $record
all_devices=($(seq 0 "$(sed '1,2d' ${record} | wc -l)"));
count=0
for dev in "${all_devices[@]}"
do
line=$((dev + 2))
use=$(head -n $line ${record} | tail -1 | cut -d '|' -f3 | cut -d '/' -f1)
if [[ $use -lt 100 ]]; then
device[$count]=$dev
count=$((count + 1))
if [[ $count -eq $gpu_num ]]; then
break
fi
fi
done
if [[ ${#device[@]} -lt $gpu_num ]]; then
if [[ $use_cpu -eq 1 ]]; then
device=(-1)
else
sleep 60s
fi
else
break
fi
done
echo ${device[*]} | sed 's/ /,/g'
return $?
}
#! /bin/bash
# Processing MuST-C Datasets
# Copyright 2021 Natural Language Processing Laboratory
# Xu Chen (xuchenneu@163.com)
# Set bash to 'debug' mode: it will exit on
# -e 'error', -u 'undefined variable', -o pipefail 'error in pipeline'; -x prints commands.
set -e
#set -u
set -o pipefail
export PYTHONIOENCODING=UTF-8
eval=1
time=$(date "+%m%d_%H%M")
stage=0
stop_stage=0
######## hardware ########
# devices
#device=()
gpu_num=8
update_freq=1
root_dir=~/st/Fairseq-S2T
pwd_dir=$PWD
# dataset
src_lang=en
tgt_lang=de
lang=${src_lang}-${tgt_lang}
dataset=mustc
task=speech_to_text
vocab_type=unigram
vocab_size=5000
speed_perturb=0
lcrm=0
tokenizer=0
use_specific_dict=0
specific_prefix=valid
specific_dir=/home/xuchen/st/data/mustc/st_lcrm/en-de
asr_vocab_prefix=spm_unigram10000_st_share
org_data_dir=~/st/data/${dataset}
data_dir=~/st/data/${dataset}/asr
test_subset=tst-COMMON
# exp
exp_prefix=$(date "+%m%d")
extra_tag=
extra_parameter=
exp_tag=baseline
exp_name=
# config
train_config=ctc
data_config=config_asr.yaml
# training setting
fp16=1
max_tokens=40000
step_valid=0
# decoding setting
dec_model=checkpoint_best.pt
n_average=10
beam_size=5
len_penalty=1.0
if [[ ${speed_perturb} -eq 1 ]]; then
data_dir=${data_dir}_sp
exp_prefix=${exp_prefix}_sp
fi
if [[ ${lcrm} -eq 1 ]]; then
data_dir=${data_dir}_lcrm
exp_prefix=${exp_prefix}_lcrm
fi
if [[ ${use_specific_dict} -eq 1 ]]; then
data_dir=${data_dir}_${specific_prefix}
exp_prefix=${exp_prefix}_${specific_prefix}
fi
if [[ ${tokenizer} -eq 1 ]]; then
data_dir=${data_dir}_tok
exp_prefix=${exp_prefix}_tok
fi
. ./local/parse_options.sh || exit 1;
if [[ -z ${exp_name} ]]; then
# exp_name=${exp_prefix}_$(basename ${train_config%.*})_${exp_tag}
exp_name=${exp_prefix}_${train_config}_${exp_tag}
if [[ -n ${extra_tag} ]]; then
exp_name=${exp_name}_${extra_tag}
fi
fi
model_dir=$root_dir/../checkpoints/$dataset/asr/${exp_name}
if [ ${stage} -le -1 ] && [ ${stop_stage} -ge -1 ]; then
echo "stage -1: Data Download"
# pass
fi
if [ ${stage} -le 0 ] && [ ${stop_stage} -ge 0 ]; then
### Task dependent. You have to prepare the data yourself in the following part.
### But you can utilize Kaldi recipes in most cases
echo "stage 0: ASR Data Preparation"
if [[ ! -e ${data_dir}/${lang} ]]; then
mkdir -p ${data_dir}/${lang}
fi
cmd="python ${root_dir}/examples/speech_to_text/prep_mustc_data.py
--data-root ${org_data_dir}
--output-root ${data_dir}
--task asr
--vocab-type ${vocab_type}
--vocab-size ${vocab_size}"
if [[ ${use_specific_dict} -eq 1 ]]; then
cp -r ${specific_dir}/${asr_vocab_prefix}.* ${data_dir}/${lang}
cmd="$cmd
--asr-prefix ${asr_vocab_prefix}"
fi
if [[ ${speed_perturb} -eq 1 ]]; then
cmd="$cmd
--speed-perturb"
fi
if [[ ${lcrm} -eq 1 ]]; then
cmd="$cmd
--lowercase-src
--rm-punc-src"
fi
if [[ ${tokenizer} -eq 1 ]]; then
cmd="$cmd
--tokenizer"
fi
echo -e "\033[34mRun command: \n${cmd} \033[0m"
[[ $eval -eq 1 ]] && eval ${cmd}
fi
data_dir=${data_dir}/${lang}
if [ ${stage} -le 1 ] && [ ${stop_stage} -ge 1 ]; then
echo "stage 1: ASR Network Training"
[[ ! -d ${data_dir} ]] && echo "The data dir ${data_dir} does not exist!" && exit 1;
if [[ -z ${device} || ${#device[@]} -eq 0 ]]; then
if [[ ${gpu_num} -eq 0 ]]; then
device=""
else
source ./local/utils.sh
device=$(get_devices $gpu_num 0)
fi
fi
echo -e "dev=${device} data=${data_dir} model=${model_dir}"
if [[ ! -d ${model_dir} ]]; then
mkdir -p ${model_dir}
else
echo "${model_dir} exists."
fi
cp ${BASH_SOURCE[0]} ${model_dir}
cp ${PWD}/train.sh ${model_dir}
config_list=(${train_config//,/ })
idx=0
for config in "${config_list[@]}"
do
config_path=$pwd_dir/conf/${config}.yaml
if [[ ! -f ${config_path} ]]; then
echo "No config file ${config_path}"
exit
fi
cp ${config_path} ${model_dir}
if [[ $idx -eq 0 ]]; then
extra_parameter="${extra_parameter}
--train-config ${config_path}"
else
extra_parameter="${extra_parameter}
--train-config${idx} ${config_path}"
fi
idx=$((idx + 1))
done
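# Each config in config_list is forwarded to train.py: the first as --train-config,
# the rest as --train-config1, --train-config2, ... (a convention of this Fairseq-S2T fork).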
cmd="python3 -u ${root_dir}/fairseq_cli/train.py
${data_dir}
--config-yaml ${data_config}
--task ${task}
--max-tokens ${max_tokens}
--skip-invalid-size-inputs-valid-test
--update-freq ${update_freq}
--log-interval 100
--save-dir ${model_dir}
--tensorboard-logdir ${model_dir}"
if [[ -n ${extra_parameter} ]]; then
cmd="${cmd}
${extra_parameter}"
fi
if [[ ${gpu_num} -gt 0 ]]; then
cmd="${cmd}
--distributed-world-size $gpu_num
--ddp-backend no_c10d"
fi
if [[ $fp16 -eq 1 ]]; then
cmd="${cmd}
--fp16"
fi
if [[ $step_valid -eq 1 ]]; then
validate_interval=1
save_interval=1
keep_last_epochs=10
no_epoch_checkpoints=0
save_interval_updates=500
keep_interval_updates=10
else
validate_interval=1
keep_last_epochs=10
fi
if [[ -n $no_epoch_checkpoints && $no_epoch_checkpoints -eq 1 ]]; then
cmd="$cmd
--no-epoch-checkpoints"
fi
if [[ -n $validate_interval ]]; then
cmd="${cmd}
--validate-interval $validate_interval "
fi
if [[ -n $save_interval ]]; then
cmd="${cmd}
--save-interval $save_interval "
fi
if [[ -n $keep_last_epochs ]]; then
cmd="${cmd}
--keep-last-epochs $keep_last_epochs "
fi
if [[ -n $save_interval_updates ]]; then
cmd="${cmd}
--save-interval-updates $save_interval_updates"
if [[ -n $keep_interval_updates ]]; then
cmd="${cmd}
--keep-interval-updates $keep_interval_updates"
fi
fi
echo -e "\033[34mRun command: \n${cmd} \033[0m"
# save info
log=./history.log
echo "${time} | ${device} | ${data_dir} | ${exp_name} | ${model_dir} " >> $log
tail -n 50 ${log} > tmp.log
mv tmp.log $log
export CUDA_VISIBLE_DEVICES=${device}
cmd="nohup ${cmd} >> ${model_dir}/train.log 2>&1 &"
if [[ $eval -eq 1 ]]; then
eval $cmd
sleep 2s
tail -n "$(wc -l ${model_dir}/train.log | awk '{print $1+1}')" -f ${model_dir}/train.log
fi
fi
wait
if [ ${stage} -le 2 ] && [ ${stop_stage} -ge 2 ]; then
echo "stage 2: ASR Decoding"
if [[ ${n_average} -ne 1 ]]; then
# Average models
dec_model=avg_${n_average}_checkpoint.pt
cmd="python ${root_dir}/scripts/average_checkpoints.py
--inputs ${model_dir}
--num-epoch-checkpoints ${n_average}
--output ${model_dir}/${dec_model}"
echo -e "\033[34mRun command: \n${cmd} \033[0m"
[[ $eval -eq 1 ]] && eval $cmd
else
dec_model=${dec_model}
fi
if [[ -z ${device} || ${#device[@]} -eq 0 ]]; then
if [[ ${gpu_num} -eq 0 ]]; then
device=""
else
source ./local/utils.sh
device=$(get_devices $gpu_num 0)
fi
fi
export CUDA_VISIBLE_DEVICES=${device}
result_file=${model_dir}/decode_result
[[ -f ${result_file} ]] && rm ${result_file}
test_subset=(${test_subset//,/ })
for subset in "${test_subset[@]}"; do
subset=${subset}_asr
cmd="python ${root_dir}/fairseq_cli/generate.py
${data_dir}
--config-yaml ${data_config}
--gen-subset ${subset}
--task speech_to_text
--path ${model_dir}/${dec_model}
--results-path ${model_dir}
--max-tokens ${max_tokens}
--beam ${beam_size}
--lenpen ${len_penalty}
--scoring wer
--wer-tokenizer 13a
--wer-lowercase
--wer-remove-punct
"
echo -e "\033[34mRun command: \n${cmd} \033[0m"
if [[ $eval -eq 1 ]]; then
eval $cmd
tail -n 1 ${model_dir}/generate-${subset}.txt >> ${result_file}
fi
done
cat ${result_file}
fi
#! /bin/bash
# training the model
gpu_num=8
update_freq=1
max_tokens=40000
exp_tag=
config_list=(ctc local_attn)
# exp full name
exp_name=
extra_tag=
extra_parameter=
#extra_tag="${extra_tag}"
#extra_parameter="${extra_parameter} "
train_config=$(echo ${config_list[*]} | sed 's/ /,/g')
cmd="./run.sh
--stage 1
--stop_stage 1
--gpu_num ${gpu_num}
--update_freq ${update_freq}
--train_config ${train_config}
--max_tokens ${max_tokens}
"
if [[ -n ${exp_name} ]]; then
cmd="$cmd --exp_name ${exp_name}"
fi
if [[ -n ${exp_tag} ]]; then
cmd="$cmd --exp_tag ${exp_tag}"
fi
if [[ -n ${extra_tag} ]]; then
cmd="$cmd --extra_tag ${extra_tag}"
fi
if [[ -n ${extra_parameter} ]]; then
# cmd="$cmd --extra_parameter \"${extra_parameter}\""
cmd="$cmd --extra_parameter ${extra_parameter}"
fi
echo ${cmd}
eval ${cmd}
set -e
eval=1
root_dir=~/st/Fairseq-S2T
data_dir=/home/xuchen/st/data/wmt/test
vocab_dir=/home/xuchen/st/data/wmt/mt/en-de/unigram32000_share
src_vocab_prefix=spm_unigram32000_share
tgt_vocab_prefix=spm_unigram32000_share
src_lang=en
tgt_lang=de
tokenize=1
splits=(newstest2014 newstest2016)
for split in ${splits[@]}; do
src_file=${data_dir}/${split}.${src_lang}
tgt_file=${data_dir}/${split}.${tgt_lang}
if [[ ${tokenize} -eq 1 ]]; then
cmd="tokenizer.perl -l ${src_lang} --threads 8 -no-escape < ${src_file} > ${src_file}.tok"
echo -e "\033[34mRun command: \n${cmd} \033[0m"
[[ $eval -eq 1 ]] && eval ${cmd}
cmd="tokenizer.perl -l ${tgt_lang} --threads 8 -no-escape < ${tgt_file} > ${tgt_file}.tok"
echo -e "\033[34mRun command: \n${cmd} \033[0m"
[[ $eval -eq 1 ]] && eval ${cmd}
src_file=${src_file}.tok
tgt_file=${tgt_file}.tok
fi
cmd="cat ${src_file}"
if [[ ${lcrm} -eq 1 ]]; then
cmd="python local/lower_rm.py ${src_file}"
fi
cmd="${cmd}
| spm_encode --model ${vocab_dir}/${src_vocab_prefix}.model
--output_format=piece
> ${src_file}.spm"
echo -e "\033[34mRun command: \n${cmd} \033[0m"
[[ $eval -eq 1 ]] && eval ${cmd}
cmd="spm_encode
--model ${vocab_dir}/${tgt_vocab_prefix}.model
--output_format=piece
< ${tgt_file}
> ${tgt_file}.spm"
echo -e "\033[34mRun command: \n${cmd} \033[0m"
[[ $eval -eq 1 ]] && eval ${cmd}
src_file=${src_file}.spm
tgt_file=${tgt_file}.spm
mkdir -p ${data_dir}/final
cmd="cp ${src_file} ${data_dir}/final/${split}.${src_lang}"
echo -e "\033[34mRun command: \n${cmd} \033[0m"
[[ $eval -eq 1 ]] && eval ${cmd}
cmd="cp ${tgt_file} ${data_dir}/final/${split}.${tgt_lang}"
echo -e "\033[34mRun command: \n${cmd} \033[0m"
[[ $eval -eq 1 ]] && eval ${cmd}
done
n_set=${#splits[*]}
for ((i=0;i<$n_set;i++)); do
dataset[$i]=${data_dir}/final/${splits[$i]}
done
pref=`echo ${dataset[*]} | sed 's/ /,/g'`
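# fairseq preprocess accepts comma-separated --testpref prefixes and will emit
# test, test1, ... binarized splits under data-bin.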
cmd="python ${root_dir}/fairseq_cli/preprocess.py
--source-lang ${src_lang}
--target-lang ${tgt_lang}
--testpref ${pref}
--destdir ${data_dir}/data-bin
--srcdict ${vocab_dir}/${src_vocab_prefix}.txt
--tgtdict ${vocab_dir}/${tgt_vocab_prefix}.txt
--workers 64"
echo -e "\033[34mRun command: \n${cmd} \033[0m"
[[ $eval -eq 1 ]] && eval ${cmd}
train-subset: train
valid-subset: valid
max-epoch: 50
max-update: 100000
num-workers: 8
patience: 10
no-progress-bar: True
log-interval: 100
seed: 1
report-accuracy: True
skip-invalid-size-inputs-valid-test: True
#load-pretrained-encoder-from:
#load-pretrained-decoder-from:
arch: transformer
share-decoder-input-output-embed: True
optimizer: adam
clip-norm: 10.0
lr-scheduler: inverse_sqrt
warmup-init-lr: 1e-7
warmup-updates: 8000
lr: 1e-3
adam_betas: (0.9,0.997)
criterion: label_smoothed_cross_entropy
label_smoothing: 0.1
dropout: 0.1
attention-dropout: 0.1
activation-dropout: 0.1
activation-fn: relu
encoder-normalize-before: True
decoder-normalize-before: True
encoder-embed-dim: 512
encoder-ffn-embed-dim: 2048
encoder-layers: 6
decoder-layers: 6
encoder-attention-heads: 8
decoder-embed-dim: 512
decoder-ffn-embed-dim: 2048
decoder-attention-heads: 8
train-subset: train
valid-subset: valid
max-epoch: 50
max-update: 100000
num-workers: 8
patience: 10
no-progress-bar: True
log-interval: 100
seed: 1
report-accuracy: True
skip-invalid-size-inputs-valid-test: True
#load-pretrained-encoder-from:
#load-pretrained-decoder-from:
arch: dlcl_transformer
share-decoder-input-output-embed: True
optimizer: adam
clip-norm: 10.0
lr-scheduler: inverse_sqrt
warmup-init-lr: 1e-7
warmup-updates: 8000
lr: 1e-3
adam_betas: (0.9,0.997)
criterion: label_smoothed_cross_entropy
label_smoothing: 0.1
dropout: 0.1
attention-dropout: 0.1
activation-dropout: 0.1
activation-fn: relu
encoder-normalize-before: True
decoder-normalize-before: True
encoder-embed-dim: 512
encoder-ffn-embed-dim: 2048
encoder-layers: 6
decoder-layers: 6
encoder-attention-heads: 8
decoder-embed-dim: 512
decoder-ffn-embed-dim: 2048
decoder-attention-heads: 8
use-enc-dlcl: True
use-dec-dlcl: True
train-subset: train
valid-subset: valid
max-epoch: 50
max-update: 100000
num-workers: 8
patience: 10
no-progress-bar: True
log-interval: 100
seed: 1
report-accuracy: True
skip-invalid-size-inputs-valid-test: True
#load-pretrained-encoder-from:
#load-pretrained-decoder-from:
arch: dlcl_transformer
share-decoder-input-output-embed: True
optimizer: adam
clip-norm: 10.0
lr-scheduler: inverse_sqrt
warmup-init-lr: 1e-7
warmup-updates: 8000
lr: 1e-3
adam_betas: (0.9,0.997)
criterion: label_smoothed_cross_entropy
label_smoothing: 0.1
dropout: 0.1
attention-dropout: 0.1
activation-dropout: 0.1
activation-fn: relu
encoder-normalize-before: True
decoder-normalize-before: True
encoder-embed-dim: 512
encoder-ffn-embed-dim: 2048
encoder-layers: 6
decoder-layers: 6
encoder-attention-heads: 8
decoder-embed-dim: 512
decoder-ffn-embed-dim: 2048
decoder-attention-heads: 8
encoder-attention-type: relative
decoder-attention-type: relative
max-encoder-relative-length: 20
max-decoder-relative-length: 20
use-enc-dlcl: True
use-dec-dlcl: True
train-subset: train
valid-subset: valid
max-epoch: 50
max-update: 100000
num-workers: 8
patience: 10
no-progress-bar: True
log-interval: 100
seed: 1
report-accuracy: True
skip-invalid-size-inputs-valid-test: True
#load-pretrained-encoder-from:
#load-pretrained-decoder-from:
arch: transformer
share-decoder-input-output-embed: True
optimizer: adam
clip-norm: 10.0
lr-scheduler: inverse_sqrt
warmup-init-lr: 1e-7
warmup-updates: 8000
lr: 1e-3
adam_betas: (0.9,0.997)
criterion: label_smoothed_cross_entropy
label_smoothing: 0.1
dropout: 0.1
attention-dropout: 0.1
activation-dropout: 0.1
activation-fn: relu
encoder-normalize-before: True
decoder-normalize-before: True
encoder-embed-dim: 512
encoder-ffn-embed-dim: 2048
encoder-layers: 6
decoder-layers: 6
encoder-attention-heads: 8
decoder-embed-dim: 512
decoder-ffn-embed-dim: 2048
decoder-attention-heads: 8
encoder-attention-type: relative
decoder-attention-type: relative
max-encoder-relative-length: 20
max-decoder-relative-length: 20
train-subset: train
valid-subset: valid
max-epoch: 50
max-update: 100000
num-workers: 8
patience: 10
no-progress-bar: True
log-interval: 100
seed: 1
report-accuracy: True
skip-invalid-size-inputs-valid-test: True
#load-pretrained-encoder-from:
#load-pretrained-decoder-from:
arch: transformer
share-decoder-input-output-embed: True
optimizer: adam
clip-norm: 10.0
lr-scheduler: inverse_sqrt
warmup-init-lr: 1e-7
warmup-updates: 8000
lr: 1e-3
adam_betas: (0.9,0.997)
criterion: label_smoothed_cross_entropy
label_smoothing: 0.1
dropout: 0.1
attention-dropout: 0.1
activation-dropout: 0.1
activation-fn: relu
encoder-normalize-before: True
decoder-normalize-before: True
encoder-embed-dim: 256
encoder-ffn-embed-dim: 2048
encoder-layers: 6
decoder-layers: 6
encoder-attention-heads: 4
decoder-embed-dim: 256
decoder-ffn-embed-dim: 2048
decoder-attention-heads: 4
#! /bin/bash
gpu_num=1
data_dir=
test_subset=(test)
exp_name=
if [ "$#" -eq 1 ]; then
exp_name=$1
fi
n_average=10
beam_size=5
len_penalty=1.0
max_tokens=10000
dec_model=checkpoint_best.pt
cmd="./run.sh
--stage 2
--stop_stage 2
--gpu_num ${gpu_num}
--exp_name ${exp_name}
--n_average ${n_average}
--beam_size ${beam_size}
--len_penalty ${len_penalty}
--max_tokens ${max_tokens}
--dec_model ${dec_model}
"
if [[ -n ${data_dir} ]]; then
cmd="$cmd --data_dir ${data_dir}"
fi
if [[ -n ${test_subset} ]]; then
test_subset=`echo ${test_subset[*]} | sed 's/ /,/g'`
cmd="$cmd --test_subset ${test_subset}"
fi
echo $cmd
eval $cmd
import sys
import string
in_file = sys.argv[1]
with open(in_file, "r", encoding="utf-8") as f:
    for line in f.readlines():
        line = line.strip().lower()
        for w in string.punctuation:
            line = line.replace(w, "")
        # collapse the double spaces left behind by punctuation removal
        line = line.replace("  ", " ")
        print(line)
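# Usage: python local/lower_rm.py input.txt > output.txt
# (used above to lowercase the source side and strip punctuation before spm_encode)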
gpu_num=1
while :
do
all_devices=$(seq 0 `gpustat | sed '1,2d' | wc -l`);
count=0
for dev in ${all_devices[@]}
do
line=`expr $dev + 2`
use=`gpustat -p | head -n $line | tail -1 | cut -d '|' -f4 | wc -w`
if [[ $use -eq 0 ]]; then
device[$count]=$dev
count=`expr $count + 1`
if [[ $count -eq $gpu_num ]]; then
break
fi
fi
done
if [[ ${#device[@]} -lt $gpu_num ]]; then
sleep 60s
else
echo "Run $cmd"
eval $cmd
sleep 10s
exit
fi
done
#!/usr/bin/env bash
# Copyright 2012 Johns Hopkins University (Author: Daniel Povey);
# Arnab Ghoshal, Karel Vesely
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# THIS CODE IS PROVIDED ON AN *AS IS* BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
# KIND, EITHER EXPRESS OR IMPLIED, INCLUDING WITHOUT LIMITATION ANY IMPLIED
# WARRANTIES OR CONDITIONS OF TITLE, FITNESS FOR A PARTICULAR PURPOSE,
# MERCHANTABILITY OR NON-INFRINGEMENT.
# See the Apache 2 License for the specific language governing permissions and
# limitations under the License.
# Parse command-line options.
# To be sourced by another script (as in ". parse_options.sh").
# Option format is: --option-name arg
# and shell variable "option_name" gets set to value "arg."
# The exception is --help, which takes no arguments, but prints the
# $help_message variable (if defined).
###
### The --config file options have lower priority than command line
### options, so we need to import them first...
###
# Now import all the configs specified by command-line, in left-to-right order
for ((argpos=1; argpos<$#; argpos++)); do
if [ "${!argpos}" == "--config" ]; then
argpos_plus1=$((argpos+1))
config=${!argpos_plus1}
[ ! -r $config ] && echo "$0: missing config '$config'" && exit 1
. $config # source the config file.
fi
done
###
### Now we process the command line options
###
while true; do
[ -z "${1:-}" ] && break; # break if there are no arguments
case "$1" in
# If the enclosing script is called with --help option, print the help
# message and exit. Scripts should put help messages in $help_message
--help|-h) if [ -z "$help_message" ]; then echo "No help found." 1>&2;
else printf "$help_message\n" 1>&2 ; fi;
exit 0 ;;
--*=*) echo "$0: options to scripts must be of the form --name value, got '$1'"
exit 1 ;;
# If the first command-line argument begins with "--" (e.g. --foo-bar),
# then work out the variable name as $name, which will equal "foo_bar".
--*) name=`echo "$1" | sed s/^--// | sed s/-/_/g`;
# Next we test whether the variable in question is undefined-- if so it's
# an invalid option and we die. Note: $0 evaluates to the name of the
# enclosing script.
# The test [ -z ${foo_bar+xxx} ] will return true if the variable foo_bar
# is undefined. We then have to wrap this test inside "eval" because
# foo_bar is itself inside a variable ($name).
eval '[ -z "${'$name'+xxx}" ]' && echo "$0: invalid option $1" 1>&2 && exit 1;
oldval="`eval echo \\$$name`";
# Work out whether we seem to be expecting a Boolean argument.
if [ "$oldval" == "true" ] || [ "$oldval" == "false" ]; then
was_bool=true;
else
was_bool=false;
fi
# Set the variable to the right value-- the escaped quotes make it work if
# the option had spaces, like --cmd "queue.pl -sync y"
eval $name=\"$2\";
# Check that Boolean-valued arguments are really Boolean.
if $was_bool && [[ "$2" != "true" && "$2" != "false" ]]; then
echo "$0: expected \"true\" or \"false\": $1 $2" 1>&2
exit 1;
fi
shift 2;
;;
*) break;
esac
done
# Check for an empty argument to the --cmd option, which can easily occur as a
# result of scripting errors.
[ ! -z "${cmd+xxx}" ] && [ -z "$cmd" ] && echo "$0: empty argument to --cmd option" 1>&2 && exit 1;
true; # so this script returns exit code 0.
MAIN_ROOT=$PWD/../../..
KALDI_ROOT=$MAIN_ROOT/tools/kaldi
export PATH=$PWD/utils/:$KALDI_ROOT/tools/openfst/bin:$PATH
[ ! -f $KALDI_ROOT/tools/config/common_path.sh ] && echo >&2 "The standard file $KALDI_ROOT/tools/config/common_path.sh is not present -> Exit!" && exit 1
. $KALDI_ROOT/tools/config/common_path.sh
export LC_ALL=C
export LD_LIBRARY_PATH=${LD_LIBRARY_PATH}:$MAIN_ROOT/src/lib
export LD_LIBRARY_PATH=${LD_LIBRARY_PATH}:$MAIN_ROOT/tools/chainer_ctc/ext/warp-ctc/build
. "${MAIN_ROOT}"/tools/activate_python.sh && . "${MAIN_ROOT}"/tools/extra_path.sh
export PATH=$MAIN_ROOT/utils:$MAIN_ROOT/espnet/bin:$PATH
export OMP_NUM_THREADS=1
# check extra module installation
if ! which tokenizer.perl > /dev/null; then
echo "Error: it seems that moses is not installed." >&2
echo "Error: please install moses as follows." >&2
echo "Error: cd ${MAIN_ROOT}/tools && make moses.done" >&2
return 1
fi
# NOTE(kan-bayashi): Use UTF-8 in Python to avoid UnicodeDecodeError when LC_ALL=C
export PYTHONIOENCODING=UTF-8
get_devices(){
gpu_num=$1
use_cpu=$2
device=()
while :
do
record=`mktemp -t temp.record.XXXXXX`
gpustat > $record
all_devices=$(seq 0 `cat $record | sed '1,2d' | wc -l`);
count=0
for dev in ${all_devices[@]}
do
line=`expr $dev + 2`
use=`cat $record | head -n $line | tail -1 | cut -d '|' -f3 | cut -d '/' -f1`
if [[ $use -lt 100 ]]; then
device[$count]=$dev
count=`expr $count + 1`
if [[ $count -eq $gpu_num ]]; then
break
fi
fi
done
if [[ ${#device[@]} -lt $gpu_num ]]; then
if [[ $use_cpu -eq 1 ]]; then
device=(-1)
else
sleep 60s
fi
else
break
fi
done
echo ${device[*]} | sed 's/ /,/g'
return $?
}
#! /bin/bash
# training the model
gpu_num=1
update_freq=1
max_tokens=4096
extra_tag=
extra_parameter=
#extra_tag="${extra_tag}"
#extra_parameter="${extra_parameter} "
exp_tag=baseline
train_config=train.yaml
cmd="./run.sh
--stage 1
--stop_stage 1
--gpu_num ${gpu_num}
--update_freq ${update_freq}
--train_config ${train_config}
--max_tokens ${max_tokens}
"
if [[ -n ${exp_tag} ]]; then
cmd="$cmd --exp_tag ${exp_tag}"
fi
if [[ -n ${extra_tag} ]]; then
cmd="$cmd --extra_tag ${extra_tag}"
fi
if [[ -n ${extra_parameter} ]]; then
cmd="$cmd --extra_parameter \"${extra_parameter}\""
fi
echo $cmd
eval $cmd
set -e
eval=1
lcrm=1
tokenizer=0
root_dir=~/st/Fairseq-S2T
data_dir=/home/xuchen/st/data/test
vocab_dir=/home/xuchen/st/data/mustc/st_lcrm/en-de
asr_vocab_prefix=spm_unigram10000_st_share
st_vocab_prefix=spm_unigram10000_st_share
src_lang=en
tgt_lang=de
splits=(2019)
splits=$(echo ${splits[*]} | sed 's/ /_/g')
cp -r ${vocab_dir}/${asr_vocab_prefix}.* ${data_dir}/${src_lang}-${tgt_lang}
cp -r ${vocab_dir}/${st_vocab_prefix}.* ${data_dir}/${src_lang}-${tgt_lang}
rm -rf ${data_dir}/${src_lang}-${tgt_lang}/fbank80.zip
cmd="python ${root_dir}/examples/speech_to_text/prep_st_data.py
--data-root ${data_dir}
--output-root ${data_dir}
--splits ${splits}
--task st
--src-lang ${src_lang}
--tgt-lang ${tgt_lang}
--add-src
--share
--asr-prefix ${asr_vocab_prefix}
--st-spm-prefix ${st_vocab_prefix}
--cmvn-type utterance"
if [[ ${lcrm} -eq 1 ]]; then
cmd="$cmd
--lowercase-src
--rm-punc-src"
fi
if [[ ${tokenizer} -eq 1 ]]; then
cmd="$cmd
--tokenizer"
fi
echo -e "\033[34mRun command: \n${cmd} \033[0m"
[[ $eval -eq 1 ]] && eval ${cmd}
train-subset: train_st
valid-subset: dev_st
max-epoch: 50
max-update: 100000
num-workers: 8
patience: 10
no-progress-bar: True
log-interval: 100
seed: 1
report-accuracy: True
#load-pretrained-encoder-from:
#load-pretrained-decoder-from:
arch: s2t_transformer_s
share-decoder-input-output-embed: True
optimizer: adam
clip-norm: 10.0
lr-scheduler: inverse_sqrt
warmup-init-lr: 1e-7
warmup-updates: 10000
lr: 2e-3
#adam_betas: (0.9,0.98)
criterion: label_smoothed_cross_entropy
label_smoothing: 0.1
conv-kernel-sizes: 5,5
conv-channels: 1024
dropout: 0.1
activation-fn: relu
encoder-embed-dim: 256
encoder-ffn-embed-dim: 2048
encoder-layers: 12
decoder-layers: 6
encoder-attention-heads: 4
decoder-embed-dim: 256
decoder-ffn-embed-dim: 2048
decoder-attention-heads: 4
attention-dropout: 0.1
activation-dropout: 0.1
arch: s2t_conformer_s
macaron-style: True
use-cnn-module: True
cnn-module-kernel: 31
train-subset: train_st
valid-subset: dev_st
max-epoch: 50
max-update: 100000
num-workers: 8
patience: 10
no-progress-bar: True
log-interval: 100
seed: 1
report-accuracy: True
#load-pretrained-encoder-from:
#load-pretrained-decoder-from:
arch: s2t_transformer_s
share-decoder-input-output-embed: True
optimizer: adam
clip-norm: 10.0
lr-scheduler: inverse_sqrt
warmup-init-lr: 1e-7
warmup-updates: 10000
lr: 2e-3
#adam_betas: (0.9,0.98)
ctc-weight: 0.3
criterion: label_smoothed_cross_entropy_with_ctc
label_smoothing: 0.1
conv-kernel-sizes: 5,5
conv-channels: 1024
dropout: 0.1
activation-fn: relu
encoder-embed-dim: 256
encoder-ffn-embed-dim: 2048
encoder-layers: 12
decoder-layers: 6
encoder-attention-heads: 4
decoder-embed-dim: 256
decoder-ffn-embed-dim: 2048
decoder-attention-heads: 4
attention-dropout: 0.1
activation-dropout: 0.1
use-enc-dlcl: True
use-dec-dlcl: True
encoder-attention-type: local
hard-mask-window: 0
gauss-mask-sigma: 3
init-mask-weight: 0
encoder-attention-type: relative
decoder-attention-type: relative
max-encoder-relative-length: 100
max-decoder-relative-length: 20
arch: s2t_sate_s
acoustic-encoder: transformer
adapter: league
#load-pretrained-encoder-from:
#load-pretrained-acoustic-encoder-from:
#load-pretrained-text-encoder-from:
#load-pretrained-decoder-from:
train-subset: train_st
valid-subset: dev_st
max-epoch: 50
max-update: 100000
num-workers: 8
patience: 10
no-progress-bar: True
log-interval: 100
seed: 1
report-accuracy: True
#load-pretrained-encoder-from:
#load-pretrained-acoustic-encoder-from:
#load-pretrained-text-encoder-from:
#load-pretrained-decoder-from:
arch: s2t_sate
share-decoder-input-output-embed: True
optimizer: adam
clip-norm: 10.0
lr-scheduler: inverse_sqrt
warmup-init-lr: 1e-7
warmup-updates: 10000
lr: 2e-3
#adam_betas: (0.9,0.98)
ctc-weight: 0.3
criterion: label_smoothed_cross_entropy_with_ctc
label_smoothing: 0.1
encoder-normalize-before: True
decoder-normalize-before: True
conv-kernel-sizes: 5,5
conv-channels: 1024
dropout: 0.1
activation-fn: relu
encoder-embed-dim: 256
encoder-ffn-embed-dim: 2048
encoder-layers: 12
text-encoder-layers: 6
decoder-layers: 6
encoder-attention-heads: 4
macaron-style: True
use-cnn-module: True
cnn-module-kernel: 31
acoustic-encoder: transformer
adapter: league
#decoder-embed-dim: 256
#decoder-ffn-embed-dim: 2048
#decoder-attention-heads: 4
#attention-dropout: 0.1
#activation-dropout: 0.1
#! /bin/bash
gpu_num=1
data_dir=
test_subset=(tst-COMMON)
exp_name=
if [ "$#" -eq 1 ]; then
exp_name=$1
fi
n_average=10
beam_size=5
len_penalty=1.0
max_tokens=10000
dec_model=checkpoint_best.pt
cmd="./run.sh
--stage 2
--stop_stage 2
--gpu_num ${gpu_num}
--exp_name ${exp_name}
--n_average ${n_average}
--beam_size ${beam_size}
--len_penalty ${len_penalty}
--max_tokens ${max_tokens}
--dec_model ${dec_model}
"
if [[ -n ${data_dir} ]]; then
cmd="$cmd --data_dir ${data_dir}"
fi
if [[ ${#test_subset[@]} -gt 0 ]]; then
subsets=$(echo ${test_subset[*]} | sed 's/ /,/g')
cmd="$cmd --test_subset ${subsets}"
fi
echo $cmd
eval $cmd
set -e
gpu_num=1
root_dir=/home/xuchen/st/Fairseq-S2T
ckpt=/home/xuchen/st/checkpoints/mustc-v2/st
model_txt=$1
set=$2
test_subset=$3
#data_dir=/home/xuchen/st/data/mustc-v2/st_lcrm/en-de
#test_subset=(tst-COMMON)
data_dir=/media/data/tst/$set/en-de
#test_subset=(office)
#test_subset=(webrtc1)
#test_subset=(adap2)
data_config=config_st_share.yaml
result_file=./result
beam_size=5
lenpen=0.6
max_tokens=10000
models=()
i=0
for line in `cat $model_txt`; do
i=`expr $i + 1`
model_dir=$ckpt/$line
[[ ! -d $model_dir ]] && echo "missing model dir: $model_dir" && exit 1;
if [[ -f $model_dir/avg_10_checkpoint.pt ]]; then
model=$model_dir/avg_10_checkpoint.pt
else
model=$model_dir/checkpoint_best.pt
fi
[[ ! -f $model ]] && echo "missing checkpoint: $model" && exit 1;
models[$i]=$model
done
models=`echo ${models[*]} | sed 's/ /:/g'`
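# generate.py decodes with the ensemble of all checkpoints in the
# colon-separated --path list built above.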
res_dir=$ckpt/ensemble/$set
i=0
while :
do
if [[ -d $res_dir/$i ]]; then
i=`expr $i + 1`
else
res_dir=$res_dir/$i
break
fi
done
mkdir -p $res_dir
cp $model_txt $res_dir
if [[ -z ${device} || ${#device[@]} -eq 0 ]]; then
if [[ ${gpu_num} -eq 0 ]]; then
device=()
else
source ./local/utils.sh
device=$(get_devices $gpu_num 0)
fi
fi
export CUDA_VISIBLE_DEVICES=${device}
for subset in ${test_subset[@]}; do
subset=${subset}_st
cmd="python ${root_dir}/fairseq_cli/generate.py
${data_dir}
--config-yaml ${data_config}
--gen-subset ${subset}
--task speech_to_text
--path ${models}
--results-path ${res_dir}
--skip-invalid-size-inputs-valid-test
--max-tokens ${max_tokens}
--beam ${beam_size}
--lenpen ${lenpen}
--scoring sacrebleu"
echo -e "\033[34mRun command: \n${cmd} \033[0m"
eval $cmd
tail -n 1 ${res_dir}/generate-${subset}.txt
cd $res_dir
evaluate.sh translation-${subset}.txt $set
cd -
done
gpu_num=1
cmd="sh train.sh"
while :
do
record=$(mktemp -t temp.record.XXXXXX)
gpustat > $record
all_devices=($(seq 0 "$(sed '1,2d' ${record} | wc -l)"));
count=0
for dev in "${all_devices[@]}"
do
line=$((dev + 2))
use=$(head -n $line ${record} | tail -1 | cut -d '|' -f3 | cut -d '/' -f1)
if [[ $use -lt 100 ]]; then
device[$count]=$dev
count=$((count + 1))
if [[ $count -eq $gpu_num ]]; then
break
fi
fi
done
if [[ ${#device[@]} -lt $gpu_num ]]; then
sleep 60s
else
echo "Run $cmd"
eval $cmd
sleep 10s
exit
fi
done
#!/usr/bin/env bash
# Copyright 2012 Johns Hopkins University (Author: Daniel Povey);
# Arnab Ghoshal, Karel Vesely
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# THIS CODE IS PROVIDED ON AN *AS IS* BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
# KIND, EITHER EXPRESS OR IMPLIED, INCLUDING WITHOUT LIMITATION ANY IMPLIED
# WARRANTIES OR CONDITIONS OF TITLE, FITNESS FOR A PARTICULAR PURPOSE,
# MERCHANTABILITY OR NON-INFRINGEMENT.
# See the Apache 2 License for the specific language governing permissions and
# limitations under the License.
# Parse command-line options.
# To be sourced by another script (as in ". parse_options.sh").
# Option format is: --option-name arg
# and shell variable "option_name" gets set to value "arg."
# The exception is --help, which takes no arguments, but prints the
# $help_message variable (if defined).
###
### The --config file options have lower priority than command line
### options, so we need to import them first...
###
# Now import all the configs specified by command-line, in left-to-right order
for ((argpos=1; argpos<$#; argpos++)); do
if [ "${!argpos}" == "--config" ]; then
argpos_plus1=$((argpos+1))
config=${!argpos_plus1}
[ ! -r $config ] && echo "$0: missing config '$config'" && exit 1
. $config # source the config file.
fi
done
###
### Now we process the command line options
###
while true; do
[ -z "${1:-}" ] && break; # break if there are no arguments
case "$1" in
# If the enclosing script is called with --help option, print the help
# message and exit. Scripts should put help messages in $help_message
--help|-h) if [ -z "$help_message" ]; then echo "No help found." 1>&2;
else printf "$help_message\n" 1>&2 ; fi;
exit 0 ;;
--*=*) echo "$0: options to scripts must be of the form --name value, got '$1'"
exit 1 ;;
# If the first command-line argument begins with "--" (e.g. --foo-bar),
# then work out the variable name as $name, which will equal "foo_bar".
--*) name=`echo "$1" | sed s/^--// | sed s/-/_/g`;
# Next we test whether the variable in question is undefined-- if so it's
# an invalid option and we die. Note: $0 evaluates to the name of the
# enclosing script.
# The test [ -z ${foo_bar+xxx} ] will return true if the variable foo_bar
# is undefined. We then have to wrap this test inside "eval" because
# foo_bar is itself inside a variable ($name).
eval '[ -z "${'$name'+xxx}" ]' && echo "$0: invalid option $1" 1>&2 && exit 1;
oldval="`eval echo \\$$name`";
# Work out whether we seem to be expecting a Boolean argument.
if [ "$oldval" == "true" ] || [ "$oldval" == "false" ]; then
was_bool=true;
else
was_bool=false;
fi
# Set the variable to the right value-- the escaped quotes make it work if
# the option had spaces, like --cmd "queue.pl -sync y"
eval $name=\"$2\";
# Check that Boolean-valued arguments are really Boolean.
if $was_bool && [[ "$2" != "true" && "$2" != "false" ]]; then
echo "$0: expected \"true\" or \"false\": $1 $2" 1>&2
exit 1;
fi
shift 2;
;;
*) break;
esac
done
# Check for an empty argument to the --cmd option, which can easily occur as a
# result of scripting errors.
[ ! -z "${cmd+xxx}" ] && [ -z "$cmd" ] && echo "$0: empty argument to --cmd option" 1>&2 && exit 1;
true; # so this script returns exit code 0.