Showing 1–50 of 81 results for author: Sennrich, R

Searching in archive cs.
  1. arXiv:2404.00397  [pdf, other]

    cs.CL

    An Analysis of BPE Vocabulary Trimming in Neural Machine Translation

    Authors: Marco Cognetta, Tatsuya Hiraoka, Naoaki Okazaki, Rico Sennrich, Yuval Pinter

    Abstract: We explore threshold vocabulary trimming in Byte-Pair Encoding subword tokenization, a postprocessing step that replaces rare subwords with their component subwords. The technique is available in popular tokenization libraries but has not been subjected to rigorous scientific scrutiny. While the removal of rare subwords is suggested as best practice in machine translation implementations, both as…

    Submitted 30 March, 2024; originally announced April 2024.

    Comments: 15 pages
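
    The trimming step analyzed here is simple enough to sketch in a few lines. Below is a minimal toy implementation in Python, assuming a plain-dict view of the BPE merge table; the trim_vocab helper, its signature, and the toy data are illustrative, not the interface of any particular tokenization library.

    ```python
    from collections import Counter

    def trim_vocab(corpus_tokens, merges, min_freq):
        """Replace subwords rarer than min_freq with their component subwords.

        corpus_tokens: list of lists of subwords (a BPE-segmented corpus)
        merges: dict mapping a merged subword to the pair it was built from
        min_freq: frequency threshold below which a subword is trimmed
        """
        freq = Counter(tok for sent in corpus_tokens for tok in sent)

        def decompose(tok):
            # Recursively split rare subwords; atoms without a merge history
            # are kept even if rare.
            if freq[tok] >= min_freq or tok not in merges:
                return [tok]
            left, right = merges[tok]
            return decompose(left) + decompose(right)

        return [[piece for tok in sent for piece in decompose(tok)]
                for sent in corpus_tokens]

    # Toy example: "lowest" was merged from "low" + "est" but is rare.
    corpus = [["low", "est"], ["lowest"], ["low"], ["low"]]
    merges = {"lowest": ("low", "est")}
    print(trim_vocab(corpus, merges, min_freq=2))
    # [['low', 'est'], ['low', 'est'], ['low'], ['low']]
    ```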

  2. arXiv:2402.04251  [pdf, other]

    cs.CL

    Linear-time Minimum Bayes Risk Decoding with Reference Aggregation

    Authors: Jannis Vamvas, Rico Sennrich

    Abstract: Minimum Bayes Risk (MBR) decoding is a text generation technique that has been shown to improve the quality of machine translations, but is expensive, even if a sampling-based approximation is used. Besides requiring a large number of sampled sequences, it requires the pairwise calculation of a utility metric, which has quadratic complexity. In this paper, we propose to approximate pairwise metric…

    Submitted 6 February, 2024; originally announced February 2024.
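
    To make the quadratic-versus-linear distinction concrete, here is a toy sketch with a unigram-F1 utility standing in for a real metric such as chrF. The aggregation step (averaging reference count vectors so that each hypothesis is scored once) follows the idea described in the abstract, but all names and the simplified utility are assumptions for illustration.

    ```python
    from collections import Counter

    def f1_utility(hyp_counts, ref_counts):
        """Unigram F1 between hypothesis and (possibly fractional) reference counts."""
        overlap = sum(min(hyp_counts[t], ref_counts[t]) for t in hyp_counts)
        if overlap == 0:
            return 0.0
        p = overlap / sum(hyp_counts.values())
        r = overlap / sum(ref_counts.values())
        return 2 * p * r / (p + r)

    def mbr_pairwise(hyps, refs):
        """Standard sampling-based MBR: O(|hyps| * |refs|) utility calls."""
        counts = {s: Counter(s.split()) for s in set(hyps) | set(refs)}
        return max(hyps, key=lambda h: sum(f1_utility(counts[h], counts[r]) for r in refs))

    def mbr_aggregated(hyps, refs):
        """Reference aggregation: average the reference count vectors once,
        then score every hypothesis a single time -- O(|hyps| + |refs|)."""
        agg = Counter()
        for r in refs:
            agg.update(Counter(r.split()))
        agg = Counter({t: c / len(refs) for t, c in agg.items()})
        return max(hyps, key=lambda h: f1_utility(Counter(h.split()), agg))

    samples = ["the cat sat", "a cat sat", "the cat sat down", "a dog stood"]
    print(mbr_pairwise(samples, samples))    # hypotheses double as pseudo-references
    print(mbr_aggregated(samples, samples))
    ```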

  3. arXiv:2401.16313  [pdf, other]

    cs.CL

    Machine Translation Meta Evaluation through Translation Accuracy Challenge Sets

    Authors: Nikita Moghe, Arnisa Fazla, Chantal Amrhein, Tom Kocmi, Mark Steedman, Alexandra Birch, Rico Sennrich, Liane Guillou

    Abstract: Recent machine translation (MT) metrics calibrate their effectiveness by correlating with human judgement but without any insights about their behaviour across different error types. Challenge sets are used to probe specific dimensions of metric behaviour but there are very few such datasets and they either focus on a limited number of phenomena or a limited number of language pairs. We introduce…

    Submitted 29 January, 2024; originally announced January 2024.

    Comments: arXiv admin note: substantial text overlap with arXiv:2210.15615

  4. arXiv:2401.14400  [pdf, other]

    cs.CL

    Modular Adaptation of Multilingual Encoders to Written Swiss German Dialect

    Authors: Jannis Vamvas, Noëmi Aepli, Rico Sennrich

    Abstract: Creating neural text encoders for written Swiss German is challenging due to a dearth of training data combined with dialectal variation. In this paper, we build on several existing multilingual encoders and adapt them to Swiss German using continued pre-training. Evaluation on three diverse downstream tasks shows that simply adding a Swiss German adapter to a modular encoder achieves 97.5% of ful…

    Submitted 25 January, 2024; originally announced January 2024.

    Comments: First Workshop on Modular and Open Multilingual NLP (MOOMIN 2024)

  5. arXiv:2401.06769  [pdf, other]

    cs.CL

    Machine Translation Models are Zero-Shot Detectors of Translation Direction

    Authors: Michelle Wastl, Jannis Vamvas, Rico Sennrich

    Abstract: Detecting the translation direction of parallel text has applications for machine translation training and evaluation, but also has forensic applications such as resolving plagiarism or forgery allegations. In this work, we explore an unsupervised approach to translation direction detection based on the simple hypothesis that…

    Submitted 12 January, 2024; originally announced January 2024.

  6. arXiv:2312.12683  [pdf, other]

    cs.CL

    Turning English-centric LLMs Into Polyglots: How Much Multilinguality Is Needed?

    Authors: Tannon Kew, Florian Schottmann, Rico Sennrich

    Abstract: The vast majority of today's large language models are English-centric, having been pretrained predominantly on English text. Yet, in order to meet user expectations, models need to be able to respond appropriately in multiple languages once deployed in downstream applications. Given limited exposure to other languages during pretraining, cross-lingual transfer is important for achieving decent pe…

    Submitted 19 December, 2023; originally announced December 2023.

  7. arXiv:2312.00536  [pdf, other]

    cs.CL

    Trained MT Metrics Learn to Cope with Machine-translated References

    Authors: Jannis Vamvas, Tobias Domhan, Sony Trenous, Rico Sennrich, Eva Hasler

    Abstract: Neural metrics trained on human evaluations of MT tend to correlate well with human judgments, but their behavior is not fully understood. In this paper, we perform a controlled experiment and compare a baseline metric that has not been trained on human evaluations (Prism) to a trained version of the same metric (Prism+FT). Surprisingly, we find that Prism+FT becomes more robust to machine-transla…

    Submitted 1 December, 2023; originally announced December 2023.

    Comments: WMT 2023

  8. arXiv:2311.16865  [pdf, other]

    cs.CL

    A Benchmark for Evaluating Machine Translation Metrics on Dialects Without Standard Orthography

    Authors: Noëmi Aepli, Chantal Amrhein, Florian Schottmann, Rico Sennrich

    Abstract: For sensible progress in natural language processing, it is important that we are aware of the limitations of the evaluation metrics we use. In this work, we evaluate how robust metrics are to non-standardized dialects, i.e. spelling differences in language varieties that do not have a standard orthography. To investigate this, we collect a dataset of human translations and human judgments for aut…

    Submitted 28 November, 2023; originally announced November 2023.

    Comments: WMT 2023 Research Paper

    ACM Class: I.2.7

  9. arXiv:2311.07439  [pdf, other]

    cs.CL cs.AI cs.LG

    Investigating Multi-Pivot Ensembling with Massively Multilingual Machine Translation Models

    Authors: Alireza Mohammadshahi, Jannis Vamvas, Rico Sennrich

    Abstract: Massively multilingual machine translation models allow for the translation of a large number of languages with a single model, but have limited performance on low- and very-low-resource translation directions. Pivoting via high-resource languages remains a strong strategy for low-resource directions, and in this paper we revisit ways of pivoting through multiple languages. Previous work has used…

    Submitted 14 November, 2023; v1 submitted 13 November, 2023; originally announced November 2023.

  10. arXiv:2309.07098  [pdf, other]

    cs.CL

    Mitigating Hallucinations and Off-target Machine Translation with Source-Contrastive and Language-Contrastive Decoding

    Authors: Rico Sennrich, Jannis Vamvas, Alireza Mohammadshahi

    Abstract: Hallucinations and off-target translation remain unsolved problems in MT, especially for low-resource languages and massively multilingual models. In this paper, we introduce two related methods to mitigate these failure cases with a modified decoding objective, without either requiring retraining or external models. In source-contrastive decoding, we search for a translation that is probable give…

    Submitted 29 January, 2024; v1 submitted 13 September, 2023; originally announced September 2023.

    Comments: EACL 2024
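
    The source-contrastive part of the objective can be illustrated with a small reranking sketch. It assumes a generic log_prob(source, target) scorer from any translation model; the weight lam, the toy model, and the use of reranking instead of modified beam search are simplifications, not the paper's exact setup.

    ```python
    import random

    def source_contrastive_score(log_prob, src, hyp, contrast_srcs, lam=0.3):
        """Reward hypotheses that are probable given the true source but
        improbable given unrelated (contrastive) sources."""
        contrastive = sum(log_prob(s, hyp) for s in contrast_srcs) / len(contrast_srcs)
        return log_prob(src, hyp) - lam * contrastive

    def rerank(log_prob, src, hypotheses, pool, lam=0.3, n_contrast=2):
        # Contrastive sources: random unrelated inputs, e.g. from the same batch.
        contrast = random.sample(pool, n_contrast)
        return max(hypotheses,
                   key=lambda h: source_contrastive_score(log_prob, src, h, contrast, lam))

    # Toy demo: a dummy "model" that rewards word overlap and penalizes length.
    # The degenerate hypothesis scores well under *any* source, so the
    # contrastive term demotes it, mimicking hallucination mitigation.
    def toy_log_prob(src, tgt):
        s, t = set(src.split()), set(tgt.split())
        return len(s & t) - len(t)

    random.seed(0)
    pool = ["the dog sleeps", "it rains today", "birds can fly"]
    print(rerank(toy_log_prob, "the cat sits", ["the cat sits", "the the the"], pool))
    # -> the cat sits
    ```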

  11. arXiv:2307.15703  [pdf, other]

    cs.CL cs.AI cs.LG

    Uncertainty in Natural Language Generation: From Theory to Applications

    Authors: Joris Baan, Nico Daheim, Evgenia Ilia, Dennis Ulmer, Haau-Sing Li, Raquel Fernández, Barbara Plank, Rico Sennrich, Chrysoula Zerva, Wilker Aziz

    Abstract: Recent advances of powerful Language Models have allowed Natural Language Generation (NLG) to emerge as an important technology that can not only perform traditional tasks like summarisation or translation, but also serve as a natural language interface to a variety of applications. As such, it is crucial that NLG systems are trustworthy and reliable, for example by indicating when they are likely…

    Submitted 28 July, 2023; originally announced July 2023.

  12. arXiv:2305.13303  [pdf, other]

    cs.CL

    Towards Unsupervised Recognition of Token-level Semantic Differences in Related Documents

    Authors: Jannis Vamvas, Rico Sennrich

    Abstract: Automatically highlighting words that cause semantic differences between two documents could be useful for a wide range of applications. We formulate recognizing semantic differences (RSD) as a token-level regression task and study three unsupervised approaches that rely on a masked language model. To assess the approaches, we begin with basic English sentences and gradually move to more complex,…

    Submitted 20 October, 2023; v1 submitted 22 May, 2023; originally announced May 2023.

    Comments: EMNLP 2023

  13. arXiv:2305.11140  [pdf, other]

    cs.CL

    Exploiting Biased Models to De-bias Text: A Gender-Fair Rewriting Model

    Authors: Chantal Amrhein, Florian Schottmann, Rico Sennrich, Samuel Läubli

    Abstract: Natural language generation models reproduce and often amplify the biases present in their training data. Previous research explored using sequence-to-sequence rewriting models to transform biased model outputs (or original texts) into more gender-fair language by creating pseudo training data through linguistic rules. However, this approach is not practical for languages with more complex morphol…

    Submitted 18 May, 2023; originally announced May 2023.

    Comments: accepted to ACL 2023

    ACM Class: I.2.7

  14. arXiv:2305.08414  [pdf, other]

    cs.CL cs.AI

    What's the Meaning of Superhuman Performance in Today's NLU?

    Authors: Simone Tedeschi, Johan Bos, Thierry Declerck, Jan Hajic, Daniel Hershcovich, Eduard H. Hovy, Alexander Koller, Simon Krek, Steven Schockaert, Rico Sennrich, Ekaterina Shutova, Roberto Navigli

    Abstract: In the last five years, there has been a significant focus in Natural Language Processing (NLP) on developing larger Pretrained Language Models (PLMs) and introducing benchmarks such as SuperGLUE and SQuAD to measure their abilities in language understanding, reasoning, and reading comprehension. These PLMs have achieved impressive results on these benchmarks, even surpassing human performance in…

    Submitted 15 May, 2023; originally announced May 2023.

    Comments: 9 pages, long paper at ACL 2023 proceedings

  15. arXiv:2305.01778  [pdf, other]

    cs.CL cs.CV

    SLTUNET: A Simple Unified Model for Sign Language Translation

    Authors: Biao Zhang, Mathias Müller, Rico Sennrich

    Abstract: Despite recent successes with neural models for sign language translation (SLT), translation quality still lags behind spoken languages because of the data scarcity and modality gap between sign video and text. To address both problems, we investigate strategies for cross-modality representation sharing for SLT. We propose SLTUNET, a simple unified neural model designed to support multiple SLT-rela…

    Submitted 2 May, 2023; originally announced May 2023.

    Comments: ICLR 2023

  16. arXiv:2303.13310  [pdf, other]

    cs.CL

    SwissBERT: The Multilingual Language Model for Switzerland

    Authors: Jannis Vamvas, Johannes Graën, Rico Sennrich

    Abstract: We present SwissBERT, a masked language model created specifically for processing Switzerland-related text. SwissBERT is a pre-trained model that we adapted to news articles written in the national languages of Switzerland -- German, French, Italian, and Romansh. We evaluate SwissBERT on natural language understanding tasks related to Switzerland and find that it tends to outperform previous model…

    Submitted 16 January, 2024; v1 submitted 23 March, 2023; originally announced March 2023.

    Comments: SwissText 2023 [v3: Changed template because the proceedings moved to a different publisher. Same content.]

  17. arXiv:2302.10871  [pdf, other]

    cs.CL cs.SD eess.AS

    Efficient CTC Regularization via Coarse Labels for End-to-End Speech Translation

    Authors: Biao Zhang, Barry Haddow, Rico Sennrich

    Abstract: For end-to-end speech translation, regularizing the encoder with the Connectionist Temporal Classification (CTC) objective using the source transcript or target translation as labels can greatly improve quality metrics. However, CTC demands an extra prediction layer over the vocabulary space, bringing in nonnegligible model parameters and computational overheads, although this layer is typically n…

    Submitted 21 February, 2023; originally announced February 2023.

    Comments: EACL 2023
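
    A rough PyTorch sketch of the idea: keep the CTC regularizer, but project encoder states to a much smaller coarse label space instead of the full vocabulary. The bucket-hashing rule used to coarsen labels below is a placeholder for whatever many-to-one mapping one chooses; all dimensions and names are illustrative.

    ```python
    import torch
    import torch.nn as nn

    VOCAB = 32000    # full subword vocabulary
    COARSE = 256     # coarse label space: a far smaller CTC projection

    def coarsen(labels: torch.Tensor) -> torch.Tensor:
        # Illustrative coarsening: hash token ids into COARSE - 1 buckets,
        # reserving id 0 for the CTC blank.
        return labels % (COARSE - 1) + 1

    class CoarseCTCHead(nn.Module):
        def __init__(self, d_model=512):
            super().__init__()
            self.proj = nn.Linear(d_model, COARSE)   # vs. nn.Linear(d_model, VOCAB)
            self.ctc = nn.CTCLoss(blank=0, zero_infinity=True)

        def forward(self, enc_out, enc_lens, labels, label_lens):
            # enc_out: (T, B, d_model) encoder states
            log_probs = self.proj(enc_out).log_softmax(-1)
            return self.ctc(log_probs, coarsen(labels), enc_lens, label_lens)

    # Toy usage: regularize an ST encoder with CTC over coarsened transcript ids.
    head = CoarseCTCHead()
    enc = torch.randn(50, 4, 512)               # T=50 frames, batch of 4
    labels = torch.randint(1, VOCAB, (4, 20))   # subword transcript ids
    loss = head(enc, torch.full((4,), 50), labels, torch.full((4,), 20))
    print(loss.item())
    ```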

  18. arXiv:2209.02982  [pdf, other]

    cs.CL

    Improving the Cross-Lingual Generalisation in Visual Question Answering

    Authors: Farhad Nooralahzadeh, Rico Sennrich

    Abstract: While several benefits were realized for multilingual vision-language pretrained models, recent benchmarks across various tasks and languages showed poor cross-lingual generalisation when multilingually pre-trained vision-language models are applied to non-English data, with a large gap between (supervised) English performance and (zero-shot) cross-lingual transfer. In this work, we explore the po…

    Submitted 30 November, 2022; v1 submitted 7 September, 2022; originally announced September 2022.

    Comments: Accepted at AAAI 2023

  19. arXiv:2207.11717  [pdf, other]

    cs.LG

    A Priority Map for Vision-and-Language Navigation with Trajectory Plans and Feature-Location Cues

    Authors: Jason Armitage, Leonardo Impett, Rico Sennrich

    Abstract: In a busy city street, a pedestrian surrounded by distractions can pick out a single sign if it is relevant to their route. Artificial agents in outdoor Vision-and-Language Navigation (VLN) are also confronted with detecting supervisory signal on environment features and location in inputs. To boost the prominence of relevant features in transformer-based architectures without costly preprocessing…

    Submitted 18 November, 2022; v1 submitted 24 July, 2022; originally announced July 2022.

    Comments: Accepted to WACV 2023

    ACM Class: I.2

  20. arXiv:2206.04571  [pdf, other]

    cs.CL cs.LG cs.SD eess.AS

    Revisiting End-to-End Speech-to-Text Translation From Scratch

    Authors: Biao Zhang, Barry Haddow, Rico Sennrich

    Abstract: End-to-end (E2E) speech-to-text translation (ST) often depends on pretraining its encoder and/or decoder using source transcripts via speech recognition or text translation tasks, without which translation performance drops substantially. However, transcripts are not always available, and how significant such pretraining is for E2E ST has rarely been studied in the literature. In this paper, we re…

    Submitted 9 June, 2022; originally announced June 2022.

    Comments: ICML

  21. arXiv:2205.15960  [pdf, other]

    cs.CL

    NusaX: Multilingual Parallel Sentiment Dataset for 10 Indonesian Local Languages

    Authors: Genta Indra Winata, Alham Fikri Aji, Samuel Cahyawijaya, Rahmad Mahendra, Fajri Koto, Ade Romadhony, Kemal Kurniawan, David Moeljadi, Radityo Eko Prasojo, Pascale Fung, Timothy Baldwin, Jey Han Lau, Rico Sennrich, Sebastian Ruder

    Abstract: Natural language processing (NLP) has a significant impact on society via technologies such as machine translation and search engines. Despite its success, NLP technology is only widely available for high-resource languages such as English and Chinese, while it remains inaccessible to many languages due to the unavailability of data resources and benchmarks. In this work, we focus on developing re…

    Submitted 12 April, 2023; v1 submitted 31 May, 2022; originally announced May 2022.

    Comments: EACL 2023

  22. arXiv:2204.13692  [pdf, other]

    cs.CL

    NMTScore: A Multilingual Analysis of Translation-based Text Similarity Measures

    Authors: Jannis Vamvas, Rico Sennrich

    Abstract: Being able to rank the similarity of short text segments is an interesting bonus feature of neural machine translation. Translation-based similarity measures include direct and pivot translation probability, as well as translation cross-likelihood, which has not been studied so far. We analyze these measures in the common framework of multilingual NMT, releasing the NMTScore library (available at…

    Submitted 19 October, 2022; v1 submitted 28 April, 2022; originally announced April 2022.

    Comments: Findings of EMNLP 2022
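
    A minimal sketch of a direct-translation-probability similarity of the kind the paper analyzes, written against a generic log_prob(source, target) callable rather than the NMTScore library's actual API (which this sketch does not attempt to reproduce). With a real NMT model, log-probabilities are non-positive, so the score falls in (0, 1]; the toy scorer below only preserves the ordering.

    ```python
    import math

    def direct_similarity(log_prob, a, b):
        """Symmetrized, length-normalized direct translation probability.

        log_prob: callable (source, target) -> log p(target | source), e.g.
        from a multilingual NMT model. Length normalization keeps longer
        segments from being penalized; averaging both directions makes the
        measure symmetric.
        """
        def norm(src, tgt):
            return log_prob(src, tgt) / max(1, len(tgt.split()))
        return math.exp(0.5 * (norm(a, b) + norm(b, a)))

    # Toy demo: a dummy scorer where shared words raise the log-probability.
    def toy_log_prob(src, tgt):
        s, t = set(src.split()), set(tgt.split())
        return float(len(s & t) - len(t))

    print(direct_similarity(toy_log_prob, "a small cat", "a small cat"))      # 1.0
    print(direct_similarity(toy_log_prob, "a small cat", "quantum physics"))  # lower
    ```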

  23. arXiv:2203.01927  [pdf, other]

    cs.CL

    As Little as Possible, as Much as Necessary: Detecting Over- and Undertranslations with Contrastive Conditioning

    Authors: Jannis Vamvas, Rico Sennrich

    Abstract: Omission and addition of content is a typical issue in neural machine translation. We propose a method for detecting such phenomena with off-the-shelf translation models. Using contrastive conditioning, we compare the likelihood of a full sequence under a translation model to the likelihood of its parts, given the corresponding source or target sequence. This allows us to pinpoint superfluous words i…

    Submitted 3 March, 2022; originally announced March 2022.

    Comments: ACL 2022
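
    The omission-detection direction of contrastive conditioning can be sketched in a few lines: delete one source token at a time and check whether the translation becomes more (or equally) likely without it. The per-token deletion scheme, the threshold eps, and the toy lexicon model are simplifications for illustration, not the paper's exact procedure.

    ```python
    def flag_omissions(log_prob, src_tokens, tgt, eps=0.0):
        """Flag source tokens whose deletion does not lower p(tgt | src).

        If removing a token barely hurts (or helps) the likelihood of the
        translation, its content is probably not covered by the translation.
        """
        full = log_prob(" ".join(src_tokens), tgt)
        flagged = []
        for i, tok in enumerate(src_tokens):
            partial = " ".join(src_tokens[:i] + src_tokens[i + 1:])
            if log_prob(partial, tgt) >= full - eps:
                flagged.append(tok)
        return flagged

    # Toy demo: a dummy model that penalizes target words with no source
    # counterpart, using a tiny lexicon in place of a real NMT model.
    LEX = {"el": "the", "gato": "cat", "duerme": "sleeps"}
    def toy_log_prob(src, tgt):
        covered = {LEX.get(w) for w in src.split()}
        return -sum(1 for w in tgt.split() if w not in covered)

    print(flag_omissions(toy_log_prob, ["el", "gato", "duerme"], "the cat"))
    # -> ['duerme']: deleting it does not lower p("the cat" | source)
    ```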

  24. arXiv:2202.05148  [pdf, other]

    cs.CL

    Identifying Weaknesses in Machine Translation Metrics Through Minimum Bayes Risk Decoding: A Case Study for COMET

    Authors: Chantal Amrhein, Rico Sennrich

    Abstract: Neural metrics have achieved impressive correlation with human judgements in the evaluation of machine translation systems, but before we can safely optimise towards such metrics, we should be aware of (and ideally eliminate) biases toward bad translations that receive high scores. Our experiments show that sample-based Minimum Bayes Risk decoding can be used to explore and quantify such weaknesse…

    Submitted 26 September, 2022; v1 submitted 10 February, 2022; originally announced February 2022.

    Comments: accepted to AACL-IJCNLP 2022

    ACM Class: I.2.7

  25. arXiv:2110.13229  [pdf, other]

    cs.LG cs.CL

    Distributionally Robust Recurrent Decoders with Random Network Distillation

    Authors: Antonio Valerio Miceli-Barone, Alexandra Birch, Rico Sennrich

    Abstract: Neural machine learning models can successfully model language that is similar to their training distribution, but they are highly susceptible to degradation under distribution shift, which occurs in many practical applications when processing out-of-domain (OOD) text. This has been attributed to "shortcut learning": relying on weak correlations over arbitrary large contexts. We propose a method…

    Submitted 24 April, 2022; v1 submitted 25 October, 2021; originally announced October 2021.

    Comments: 8 pages, 1 figure

  26. arXiv:2109.07465  [pdf, other]

    cs.CL

    On the Limits of Minimal Pairs in Contrastive Evaluation

    Authors: Jannis Vamvas, Rico Sennrich

    Abstract: Minimal sentence pairs are frequently used to analyze the behavior of language models. It is often assumed that model behavior on contrastive pairs is predictive of model behavior at large. We argue that two conditions are necessary for this assumption to hold: First, a tested hypothesis should be well-motivated, since experiments show that contrastive evaluation can lead to false positives. Secon…

    Submitted 15 September, 2021; originally announced September 2021.

    Comments: BlackboxNLP 2021

  27. arXiv:2109.06772  [pdf, other]

    cs.CL

    Improving Zero-shot Cross-lingual Transfer between Closely Related Languages by injecting Character-level Noise

    Authors: Noëmi Aepli, Rico Sennrich

    Abstract: Cross-lingual transfer between a high-resource language and its dialects or closely related language varieties should be facilitated by their similarity. However, current approaches that operate in the embedding space do not take surface similarity into account. This work presents a simple yet effective strategy to improve cross-lingual transfer between closely related varieties. We propose to augm…

    Submitted 11 March, 2022; v1 submitted 14 September, 2021; originally announced September 2021.

    Comments: ACL 2022

    ACM Class: I.2.7
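
    The augmentation itself is straightforward to sketch. The mix of edit operations and the noise rate p below are illustrative choices rather than the paper's exact recipe.

    ```python
    import random

    def add_char_noise(text, p=0.1, alphabet="abcdefghijklmnopqrstuvwxyz"):
        """Randomly perturb characters so a model sees surface variation.

        With probability p per character, delete it, duplicate it, or replace
        it with a random letter; all other characters pass through unchanged.
        """
        out = []
        for ch in text:
            if random.random() < p:
                op = random.choice(["delete", "duplicate", "replace"])
                if op == "duplicate":
                    out.append(ch + ch)
                elif op == "replace":
                    out.append(random.choice(alphabet))
                # "delete": append nothing
            else:
                out.append(ch)
        return "".join(out)

    random.seed(1)
    print(add_char_noise("zero-shot transfer between closely related varieties"))
    ```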

  28. arXiv:2109.03415  [pdf, other]

    cs.CL

    Vision Matters When It Should: Sanity Checking Multimodal Machine Translation Models

    Authors: Jiaoda Li, Duygu Ataman, Rico Sennrich

    Abstract: Multimodal machine translation (MMT) systems have been shown to outperform their text-only neural machine translation (NMT) counterparts when visual context is available. However, recent studies have also shown that the performance of MMT models is only marginally impacted when the associated image is replaced with an unrelated image or noise, which suggests that the visual context might not be ex…

    Submitted 7 September, 2021; originally announced September 2021.

    Comments: EMNLP 2021

  29. arXiv:2109.01396  [pdf, other]

    cs.CL

    Language Modeling, Lexical Translation, Reordering: The Training Process of NMT through the Lens of Classical SMT

    Authors: Elena Voita, Rico Sennrich, Ivan Titov

    Abstract: Differently from the traditional statistical MT that decomposes the translation task into distinct separately learned components, neural machine translation uses a single neural network to model the entire translation process. Despite neural machine translation being de-facto standard, it is still not clear how NMT models acquire different competences over the course of training, and how this mirr…

    Submitted 3 September, 2021; originally announced September 2021.

    Comments: EMNLP 2021

  30. arXiv:2109.01100  [pdf, other]

    cs.CL

    How Suitable Are Subword Segmentation Strategies for Translating Non-Concatenative Morphology?

    Authors: Chantal Amrhein, Rico Sennrich

    Abstract: Data-driven subword segmentation has become the default strategy for open-vocabulary machine translation and other NLP tasks, but may not be sufficiently generic for optimal learning of non-concatenative morphology. We design a test suite to evaluate segmentation strategies on different types of morphological phenomena in a controlled, semi-synthetic setting. In our experiments, we compare how wel…

    Submitted 2 September, 2021; originally announced September 2021.

    Comments: Findings of EMNLP 2021

    ACM Class: I.2.7

  31. arXiv:2107.12203  [pdf, other]

    cs.CL

    Revisiting Negation in Neural Machine Translation

    Authors: Gongbo Tang, Philipp Rönchen, Rico Sennrich, Joakim Nivre

    Abstract: In this paper, we evaluate the translation of negation both automatically and manually, in English--German (EN--DE) and English--Chinese (EN--ZH). We show that the ability of neural machine translation (NMT) models to translate negation has improved with deeper and more advanced networks, although the performance varies between language pairs and translation directions. The accuracy of manual eval…

    Submitted 26 July, 2021; originally announced July 2021.

    Comments: To appear at TACL and to be presented at ACL 2021. Authors' final version

  32. arXiv:2105.08504  [pdf, other]

    cs.CL cs.LG

    Understanding the Properties of Minimum Bayes Risk Decoding in Neural Machine Translation

    Authors: Mathias Müller, Rico Sennrich

    Abstract: Neural Machine Translation (NMT) currently exhibits biases such as producing translations that are too short and overgenerating frequent words, and shows poor robustness to copy noise in training data or domain shift. Recent work has tied these shortcomings to beam search -- the de facto standard inference algorithm in NMT -- and Eikema & Aziz (2020) propose to use Minimum Bayes Risk (MBR) decodin…

    Submitted 18 May, 2021; originally announced May 2021.

    Comments: V1: ACL 2021 camera-ready

  33. arXiv:2104.07012  [pdf, other]

    cs.CL cs.LG

    Sparse Attention with Linear Units

    Authors: Biao Zhang, Ivan Titov, Rico Sennrich

    Abstract: Recently, it has been argued that encoder-decoder models can be made more interpretable by replacing the softmax function in the attention with its sparse variants. In this work, we introduce a novel, simple method for achieving sparsity in attention: we replace the softmax activation with a ReLU, and show that sparsity naturally emerges from such a formulation. Training stability is achieved with…

    Submitted 6 October, 2021; v1 submitted 14 April, 2021; originally announced April 2021.

    Comments: EMNLP2021, code is available at https://github.com/bzhangGo/zero
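
    The core substitution is small: apply ReLU to the scaled dot-product scores instead of softmax, so negative scores become exact zeros. The output LayerNorm in this sketch merely stands in for the stabilization the truncated abstract alludes to; treat it as a placeholder and see the linked repository for the actual implementation.

    ```python
    import torch
    import torch.nn as nn

    class ReLUAttention(nn.Module):
        """Scaled dot-product attention with ReLU in place of softmax.

        Sparsity emerges because ReLU zeroes out negative scores exactly,
        whereas softmax assigns every position some probability mass.
        """
        def __init__(self, d_model=64):
            super().__init__()
            self.scale = d_model ** -0.5
            self.norm = nn.LayerNorm(d_model)   # placeholder stabilization

        def forward(self, q, k, v):
            weights = torch.relu(q @ k.transpose(-2, -1) * self.scale)
            return self.norm(weights @ v), weights

    attn = ReLUAttention()
    q = torch.randn(2, 5, 64)   # (batch, query positions, d_model)
    k = torch.randn(2, 7, 64)
    v = torch.randn(2, 7, 64)
    out, w = attn(q, k, v)
    print(out.shape, f"{(w == 0).float().mean().item():.0%} of weights are exactly zero")
    ```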

  34. arXiv:2104.03945  [pdf, other]

    cs.CL

    On Biasing Transformer Attention Towards Monotonicity

    Authors: Annette Rios, Chantal Amrhein, Noëmi Aepli, Rico Sennrich

    Abstract: Many sequence-to-sequence tasks in natural language processing are roughly monotonic in the alignment between source and target sequence, and previous work has facilitated or enforced learning of monotonic attention behavior via specialized attention functions or pretraining. In this work, we introduce a monotonicity loss function that is compatible with standard attention mechanisms and test it o…

    Submitted 8 April, 2021; originally announced April 2021.

    Comments: To be published in: Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (NAACL-HLT 2021)
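
    One plausible instantiation of such a loss penalizes any backward jump in the attention's expected source position; this formulation is compatible with standard attention weights but is illustrative, not necessarily the paper's exact definition.

    ```python
    import torch

    def monotonicity_loss(attn):
        """Penalize attention whose expected source position moves backwards.

        attn: (batch, target_len, source_len) attention weights, rows sum to 1.
        For roughly monotonic tasks, the expected source position should be
        non-decreasing across target steps.
        """
        positions = torch.arange(attn.size(-1), dtype=attn.dtype)
        expected = (attn * positions).sum(-1)                  # (batch, target_len)
        backward_jumps = torch.relu(expected[:, :-1] - expected[:, 1:])
        return backward_jumps.mean()

    diag = torch.eye(4).unsqueeze(0)          # perfectly monotonic alignment
    print(monotonicity_loss(diag))            # tensor(0.)
    print(monotonicity_loss(diag.flip(1)))    # reversed alignment is penalized
    ```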

  35. arXiv:2103.11878  [pdf, other]

    cs.CL cs.AI

    BlonDe: An Automatic Evaluation Metric for Document-level Machine Translation

    Authors: Yuchen Eleanor Jiang, Tianyu Liu, Shuming Ma, Dongdong Zhang, Jian Yang, Haoyang Huang, Rico Sennrich, Ryan Cotterell, Mrinmaya Sachan, Ming Zhou

    Abstract: Standard automatic metrics, e.g. BLEU, are not reliable for document-level MT evaluation. They can neither distinguish document-level improvements in translation quality from sentence-level ones, nor identify the discourse phenomena that cause context-agnostic translations. This paper introduces a novel automatic metric BlonDe to widen the scope of automatic MT evaluation from sentence to document…

    Submitted 5 July, 2022; v1 submitted 22 March, 2021; originally announced March 2021.

    Comments: 9 pages, accepted to NAACL 2022

  36. arXiv:2011.05978  [pdf, other]

    cs.CL cs.HC

    The Impact of Text Presentation on Translator Performance

    Authors: Samuel Läubli, Patrick Simianer, Joern Wuebker, Geza Kovacs, Rico Sennrich, Spence Green

    Abstract: Widely used computer-aided translation (CAT) tools divide documents into segments such as sentences and arrange them in a side-by-side, spreadsheet-like view. We present the first controlled evaluation of these design choices on translator performance, measuring speed and accuracy in three experimental text processing tasks. We find significant evidence that sentence-by-sentence presentation enabl…

    Submitted 11 November, 2020; originally announced November 2020.

    Comments: Accepted for publication in Target

  37. arXiv:2011.03469  [pdf, other]

    cs.CL

    Understanding Pure Character-Based Neural Machine Translation: The Case of Translating Finnish into English

    Authors: Gongbo Tang, Rico Sennrich, Joakim Nivre

    Abstract: Recent work has shown that deeper character-based neural machine translation (NMT) models can outperform subword-based models. However, it is still unclear what makes deeper character-based models successful. In this paper, we conduct an investigation into pure character-based models in the case of translating Finnish into English, including exploring the ability to learn word senses and morpholog…

    Submitted 6 November, 2020; originally announced November 2020.

    Comments: accepted by COLING 2020, camera-ready version

  38. arXiv:2011.01846  [pdf, other]

    cs.CL

    Detecting Word Sense Disambiguation Biases in Machine Translation for Model-Agnostic Adversarial Attacks

    Authors: Denis Emelin, Ivan Titov, Rico Sennrich

    Abstract: Word sense disambiguation is a well-known source of translation errors in NMT. We posit that some of the incorrect disambiguation choices are due to models' over-reliance on dataset artifacts found in training data, specifically superficial word co-occurrences, rather than a deeper understanding of the source text. We introduce a method for the prediction of disambiguation errors based on statisti…

    Submitted 3 November, 2020; originally announced November 2020.

    Comments: Accepted to EMNLP 2020

  39. arXiv:2011.01703  [pdf, other]

    cs.CL

    Subword Segmentation and a Single Bridge Language Affect Zero-Shot Neural Machine Translation

    Authors: Annette Rios, Mathias Müller, Rico Sennrich

    Abstract: Zero-shot neural machine translation is an attractive goal because of the high cost of obtaining data and building translation systems for new translation directions. However, previous papers have reported mixed success in zero-shot translation. It is hard to predict in which settings it will be effective, and what limits performance compared to a fully supervised system. In this paper, we investi…

    Submitted 3 November, 2020; originally announced November 2020.

    Comments: Accepted at WMT 2020

  40. arXiv:2010.14481  [pdf, other]

    cs.CL cs.LG

    Fast Interleaved Bidirectional Sequence Generation

    Authors: Biao Zhang, Ivan Titov, Rico Sennrich

    Abstract: Independence assumptions during sequence generation can speed up inference, but parallel generation of highly inter-dependent tokens comes at a cost in quality. Instead of assuming independence between neighbouring tokens (semi-autoregressive decoding, SA), we take inspiration from bidirectional sequence generation and introduce a decoder that generates target words from the left-to-right and righ…

    Submitted 27 October, 2020; originally announced October 2020.

    Comments: WMT2020, source code is at https://github.com/bzhangGo/zero/tree/master/docs/interleaved_bidirectional_transformer

  41. arXiv:2010.10907  [pdf, other]

    cs.CL

    Analyzing the Source and Target Contributions to Predictions in Neural Machine Translation

    Authors: Elena Voita, Rico Sennrich, Ivan Titov

    Abstract: In Neural Machine Translation (and, more generally, conditional language modeling), the generation of a target token is influenced by two types of context: the source and the prefix of the target sequence. While many attempts to understand the internal workings of NMT models have been made, none of them explicitly evaluates relative source and target contributions to a generation decision. We argu…

    Submitted 25 June, 2021; v1 submitted 21 October, 2020; originally announced October 2020.

    Comments: ACL 2021 (more accurate results with the improved LRP code)

  42. arXiv:2010.08518  [pdf, other]

    cs.CL cs.LG cs.SD eess.AS

    Adaptive Feature Selection for End-to-End Speech Translation

    Authors: Biao Zhang, Ivan Titov, Barry Haddow, Rico Sennrich

    Abstract: Information in speech signals is not evenly distributed, making it an additional challenge for end-to-end (E2E) speech translation (ST) to learn to focus on informative features. In this paper, we propose adaptive feature selection (AFS) for encoder-decoder based E2E ST. We first pre-train an ASR encoder and apply AFS to dynamically estimate the importance of each encoded speech feature to SR. A S…

    Submitted 20 October, 2020; v1 submitted 16 October, 2020; originally announced October 2020.

    Comments: EMNLP2020 Findings; source code is at https://github.com/bzhangGo/zero

  43. arXiv:2009.14824  [pdf, other]

    cs.CL

    On Romanization for Model Transfer Between Scripts in Neural Machine Translation

    Authors: Chantal Amrhein, Rico Sennrich

    Abstract: Transfer learning is a popular strategy to improve the quality of low-resource machine translation. For an optimal transfer of the embedding layer, the child and parent model should share a substantial part of the vocabulary. This is not the case when transferring to languages with a different script. We explore the benefit of romanization in this scenario. Our results show that romanization entai…

    Submitted 30 September, 2020; originally announced September 2020.

    Comments: accepted at Findings of EMNLP 2020

  44. arXiv:2005.03642  [pdf, other]

    cs.CL

    On Exposure Bias, Hallucination and Domain Shift in Neural Machine Translation

    Authors: Chaojun Wang, Rico Sennrich

    Abstract: The standard training algorithm in neural machine translation (NMT) suffers from exposure bias, and alternative algorithms have been proposed to mitigate this. However, the practical impact of exposure bias is under debate. In this paper, we link exposure bias to another well-known problem in NMT, namely the tendency to generate hallucinations under domain shift. In experiments on three datasets w…

    Submitted 7 May, 2020; originally announced May 2020.

    Comments: ACL 2020

  45. arXiv:2004.11867  [pdf, other]

    cs.CL

    Improving Massively Multilingual Neural Machine Translation and Zero-Shot Translation

    Authors: Biao Zhang, Philip Williams, Ivan Titov, Rico Sennrich

    Abstract: Massively multilingual models for neural machine translation (NMT) are theoretically attractive, but often underperform bilingual models and deliver poor zero-shot translations. In this paper, we explore ways to improve them. We argue that multilingual NMT requires stronger modeling capacity to support language pairs with varying typological characteristics, and overcome this bottleneck via langua…

    Submitted 24 April, 2020; originally announced April 2020.

    Comments: ACL2020

  46. arXiv:2004.11854  [pdf, other]

    cs.CL

    On Sparsifying Encoder Outputs in Sequence-to-Sequence Models

    Authors: Biao Zhang, Ivan Titov, Rico Sennrich

    Abstract: Sequence-to-sequence models usually transfer all encoder outputs to the decoder for generation. In this work, by contrast, we hypothesize that these encoder outputs can be compressed to shorten the sequence delivered for decoding. We take Transformer as the testbed and introduce a layer of stochastic gates in-between the encoder and the decoder. The gates are regularized using the expected value o…

    Submitted 24 April, 2020; originally announced April 2020.
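
    A simplified sketch of a stochastic gate layer between encoder and decoder: each source position receives a gate sampled from a relaxed Bernoulli (binary Concrete) distribution during training, and the expected number of open gates serves as the sparsity penalty. The exact relaxation and regularizer in the paper may differ; names and dimensions are illustrative.

    ```python
    import torch
    import torch.nn as nn

    class StochasticGates(nn.Module):
        """Prune encoder outputs with per-position stochastic gates."""

        def __init__(self, d_model=64, temperature=0.5):
            super().__init__()
            self.scorer = nn.Linear(d_model, 1)
            self.temperature = temperature

        def forward(self, enc_out):
            # enc_out: (batch, source_len, d_model)
            logits = self.scorer(enc_out).squeeze(-1)       # (batch, source_len)
            if self.training:
                # Relaxed Bernoulli: sigmoid of logits plus logistic noise.
                u = torch.rand_like(logits).clamp(1e-6, 1 - 1e-6)
                noise = torch.log(u) - torch.log1p(-u)
                gates = torch.sigmoid((logits + noise) / self.temperature)
            else:
                gates = (logits > 0).float()                # hard pruning at test time
            expected_open = torch.sigmoid(logits).sum(-1).mean()
            return enc_out * gates.unsqueeze(-1), expected_open

    gate = StochasticGates()
    enc = torch.randn(2, 9, 64)
    pruned, penalty = gate(enc)
    print(pruned.shape, penalty.item())   # add weight * penalty to the training loss
    ```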

  47. A Set of Recommendations for Assessing Human-Machine Parity in Language Translation

    Authors: Samuel Läubli, Sheila Castilho, Graham Neubig, Rico Sennrich, Qinlan Shen, Antonio Toral

    Abstract: The quality of machine translation has increased remarkably over the past years, to the degree that it was found to be indistinguishable from professional human translation in a number of empirical investigations. We reassess Hassan et al.'s 2018 investigation into Chinese to English news translation, showing that the finding of human-machine parity was owed to weaknesses in the evaluation design…

    Submitted 3 April, 2020; originally announced April 2020.

    Journal ref: Journal of Artificial Intelligence Research 67 (2020) 653-672

  48. arXiv:2003.08385  [pdf, other]

    cs.CL

    X-Stance: A Multilingual Multi-Target Dataset for Stance Detection

    Authors: Jannis Vamvas, Rico Sennrich

    Abstract: We extract a large-scale stance detection dataset from comments written by candidates of elections in Switzerland. The dataset consists of German, French and Italian text, allowing for a cross-lingual evaluation of stance detection. It contains 67 000 comments on more than 150 political issues (targets). Unlike stance detection models that have specific target issues, we use the dataset to train a…

    Submitted 10 June, 2020; v1 submitted 18 March, 2020; originally announced March 2020.

    Comments: SwissText + KONVENS 2020. Data and code are available at https://github.com/ZurichNLP/xstance

  49. arXiv:1911.03362  [pdf, other]

    cs.CL cs.LG stat.ML

    Domain, Translationese and Noise in Synthetic Data for Neural Machine Translation

    Authors: Nikolay Bogoychev, Rico Sennrich

    Abstract: The quality of neural machine translation can be improved by leveraging additional monolingual resources to create synthetic training data. Source-side monolingual data can be (forward-)translated into the target language for self-training; target-side monolingual data can be back-translated. It has been widely reported that back-translation delivers superior results, but could this be due to arte…

    Submitted 3 October, 2020; v1 submitted 6 November, 2019; originally announced November 2019.

  50. arXiv:1911.03109  [pdf, other]

    cs.CL

    Domain Robustness in Neural Machine Translation

    Authors: Mathias Müller, Annette Rios, Rico Sennrich

    Abstract: Translating text that diverges from the training domain is a key challenge for machine translation. Domain robustness (the generalization of models to unseen test domains) is low for both statistical (SMT) and neural machine translation (NMT). In this paper, we study the performance of SMT and NMT models on out-of-domain test sets. We find that in unknown domains, SMT and NMT suffer from very di…

    Submitted 24 September, 2020; v1 submitted 8 November, 2019; originally announced November 2019.

    Comments: V2: AMTA camera-ready