Amazon Alexa AI researchers recently unveiled the Alexa Teacher Model (AlexaTM 20B), which beats GPT-3 on several NLP benchmarks. The 20-billion-parameter sequence-to-sequence (seq2seq) language model demonstrates state-of-the-art (SOTA) few-shot learning capabilities. The model has not yet been released publicly.
Check out the GitHub repository here.
Unlike OpenAI’s GPT-3 or Google’s PaLM, which are decoder-only models, AlexaTM 20B is a seq2seq model containing both an encoder and a decoder, which allows better performance on machine translation (MT) and summarization tasks.
A sequence-to-sequence model is an encoder-decoder architecture, originally built on recurrent neural networks and now typically on Transformers, used to solve complex language problems such as machine translation, chatbot building, question answering and text summarization.
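To make the encoder-decoder idea concrete, here is a minimal sketch of seq2seq inference using the Hugging Face transformers library. It uses the publicly available t5-small checkpoint purely as a stand-in, since AlexaTM 20B itself has not been released publicly.

```python
# Minimal sketch of encoder-decoder (seq2seq) inference with Hugging Face
# transformers. t5-small is used as a stand-in checkpoint; AlexaTM 20B is not
# publicly available.
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

tokenizer = AutoTokenizer.from_pretrained("t5-small")
model = AutoModelForSeq2SeqLM.from_pretrained("t5-small")

# The encoder reads the full input sequence; the decoder then generates the
# output token by token while attending to the encoder's representations.
inputs = tokenizer(
    "translate English to German: The weather is nice today.",
    return_tensors="pt",
)
output_ids = model.generate(**inputs, max_new_tokens=40)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```

This encoder-decoder split is what distinguishes seq2seq models from decoder-only models such as GPT-3, which generate continuations of a single token stream.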
With roughly one-eighth the number of parameters, the new language model by Amazon outperformed GPT-3 on the SQuADv2 and SuperGLUE benchmarks. The multilingual model also achieves strong performance on few-shot MT tasks on the Flores-101 dataset, even for low-resource languages.
On other benchmarks such as MLSum, AlexaTM 20B outperformed all other models on 1-shot summarization in Spanish, German and French, and on most language pairs in 1-shot MT tasks. The improvement was especially significant for low-resource languages such as Tamil, Telugu and Marathi. On language pairs involving English, the model outperformed GPT-3 on MT tasks but came second to the much larger PaLM model.
Saleh Soltan, a senior applied scientist at Amazon, said: “the proposed style of pretraining enables seq2seq models that outperform much larger decoder-only LLMs across different tasks, both in a few-shot setting and fine-tuning.”
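The article does not detail that pretraining style, but one common way to pretrain an encoder-decoder model is a denoising (span-corruption) objective, where the decoder learns to reconstruct spans dropped from the input. The toy training step below illustrates the general technique with t5-small as a stand-in; it is not AlexaTM 20B's exact recipe or data.

```python
# Toy sketch of one denoising (span-corruption) training step for an
# encoder-decoder model. Illustrative only: t5-small stands in for AlexaTM 20B,
# and the corrupted example is hand-written rather than drawn from real data.
import torch
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

tokenizer = AutoTokenizer.from_pretrained("t5-small")
model = AutoModelForSeq2SeqLM.from_pretrained("t5-small")
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)

# Spans in the input are replaced with sentinel tokens; the decoder must
# generate the missing spans in order.
corrupted = "The quick brown <extra_id_0> jumps over the <extra_id_1> dog."
targets = "<extra_id_0> fox <extra_id_1> lazy <extra_id_2>"

inputs = tokenizer(corrupted, return_tensors="pt")
labels = tokenizer(targets, return_tensors="pt").input_ids

loss = model(**inputs, labels=labels).loss  # cross-entropy over target tokens
loss.backward()
optimizer.step()
optimizer.zero_grad()
```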