Porting fairseq wmt19 translation system to transformers
A guest blog post by Stas Bekman
This article is an attempt to document how the fairseq wmt19 translation system was ported to transformers.

I was looking for an interesting project to work on, and Sam Shleifer suggested I try porting a high-quality translator.

I read the short paper Facebook FAIR's WMT19 News Translation Task Submission, which describes the original system, and decided to give it a try.

Initially, I had no idea how to approach this complex project, and Sam helped me break it down into smaller tasks, which was a great help.

I chose to work with the pre-trained en-ru/ru-en models during porting, since I speak both languages. It would have been much more difficult to work with the de-en/en-de pairs, as I don't speak German, and being able to evaluate translation quality by simply reading the outputs and making sense of them at the advanced stages of the porting process saved me a lot of time.

Also, because I did the initial porting with the en-ru/ru-en models, I was totally unaware that the de-en/en-de models used a merged vocabulary, whereas the en-ru/ru-en pairs used 2 separate vocabularies of different sizes. So once I had done the more complicated work of supporting 2 separate vocabularies, it was trivial to get the merged vocabulary to work.
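To give a sense of what the end result looks like, here is a minimal usage sketch, assuming the FSMT classes and the facebook/wmt19-* checkpoints that the finished port published to the Hugging Face hub:

```python
# A minimal sketch assuming the final ported artifacts:
# the FSMT classes in transformers and the facebook/wmt19-en-ru checkpoint.
from transformers import FSMTForConditionalGeneration, FSMTTokenizer

mname = "facebook/wmt19-en-ru"
tokenizer = FSMTTokenizer.from_pretrained(mname)
model = FSMTForConditionalGeneration.from_pretrained(mname)

# The en-ru tokenizer carries two separate vocabularies,
# one for the source language and one for the target language.
input_text = "Machine learning is great, isn't it?"
input_ids = tokenizer.encode(input_text, return_tensors="pt")
outputs = model.generate(input_ids)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

The de-en/en-de checkpoints load the same way; the difference is that their source and target vocabulary files contain the same merged vocabulary.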